[ML] Instrument Data Frame Analysis Job to report state information to Java backend #906

valeriy42 · 2019-12-16T09:32:16Z

In this PR I introduce a new class, CDataFrameAnalysisState, which centralizes the collection of statistics related to data frame analytics jobs, e.g. progress and memory usage. The related variables are therefore moved out of the CDataFrameAnalysisRunner class.

Right now we have two callback functions related to the job state: progressRecorder and memoryUsageRecorder. However, every new kind of statistic (quality of results, time, parameters) would likely require yet another update callback in the maths classes, so I pass a reference to the state object instead. To avoid a dependency of maths on api, I introduce a state interface class owned by maths.
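For readers who want the layering spelled out, here is a minimal sketch of the idea, not the code in this PR. Every name in it (CStateInterface, CAnalysisState, runModels, the method names and their signatures) is an assumption made for illustration. It shows the interface living in maths, a concrete collector living in api, and an algorithm that reports through the interface only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: every name below is illustrative, not the PR's code.
namespace maths {
//! Interface owned by maths, so algorithms never need to include api headers.
class CStateInterface {
public:
    virtual ~CStateInterface() = default;
    virtual void updateProgress(double fractionalProgress) = 0;
    virtual void updateMemoryUsage(std::int64_t delta) = 0;
    virtual void nextStep(std::size_t step) = 0;
};

//! Stand-in for a maths algorithm which reports once per model.
inline void runModels(std::size_t numberModels, CStateInterface& state) {
    std::size_t step{0};
    for (std::size_t i = 0; i < numberModels; ++i) {
        state.updateMemoryUsage(1024); // pretend each model uses 1 KiB
        state.updateProgress(1.0 / static_cast<double>(numberModels));
        state.nextStep(step++);
    }
}
}

namespace api {
//! Concrete collector owned by api; this is the object that would be reported.
class CAnalysisState final : public maths::CStateInterface {
public:
    void updateProgress(double fractionalProgress) override {
        // Fixed point (units of 1/1024) so an integer atomic can accumulate it.
        m_Progress.fetch_add(static_cast<int>(fractionalProgress * 1024.0 + 0.5));
    }
    void updateMemoryUsage(std::int64_t delta) override { m_Memory.fetch_add(delta); }
    void nextStep(std::size_t step) override { m_Step.store(step); }

    double progress() const { return static_cast<double>(m_Progress.load()) / 1024.0; }
    std::int64_t memoryUsage() const { return m_Memory.load(); }
    std::size_t step() const { return m_Step.load(); }

private:
    std::atomic<int> m_Progress{0};
    std::atomic<std::int64_t> m_Memory{0};
    std::atomic<std::size_t> m_Step{0};
};
}

//! Usage: the algorithm sees only the maths interface.
inline void exampleUsage() {
    api::CAnalysisState state;
    maths::runModels(4, state);
    assert(state.memoryUsage() == 4096);
    assert(state.step() == 3);
}
```

With this shape, adding a new statistic means adding one method to the interface rather than threading another callback through each maths class.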

Related to #976

# Conflicts: # include/maths/CBoostedTreeImpl.h

valeriy42 · 2019-12-17T15:17:18Z

lib/maths/COutliers.cc

    for (const auto& model : m_Models) {
        model.addOutlierScores(points, scores, m_RecordMemoryUsage);
+        state.nextStep(step++);


IMHO, this constitutes a single step of the outlier detection job. WDYT, @tveasey?

I agree. This seems like a natural place to me as well.

tveasey

Right, I've done a first pass. I really like the design here. I've left some minor comments. My two larger comments are that I have some reservations about the naming, and that CDataFrameAnalysisStateInterface feels superfluous (it's also slightly odd that it is a concrete type). I'd rejig things slightly to remove it, factor out CStubAnalysisState, and use that throughout the unit tests. Overall, good job!

include/api/CDataFrameAnalysisRunner.h

include/api/CDataFrameAnalysisState.h

tveasey · 2019-12-19T10:56:44Z

include/api/CDataFrameAnalysisState.h

+
+private:
+    SInternalState m_InternalState;
+    core::CConcurrentQueue<SInternalState, 10> m_StateQueue;


Pushing will block if we try to push more than 10 state documents. Since we pop at fixed time intervals, we need to be aware of this behaviour. For example, if the state gets updated many times, the worker threads could end up blocked. I don't think this will cause a problem as it stands, but I think we should document this behaviour in the class description.

Comment added that elements can be dropped if the queue is full.
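To make the trade-off in this thread concrete, here is a minimal sketch of a bounded queue whose producers never wait: when the queue is full the new element is dropped and counted. This is not core::CConcurrentQueue and makes no claim about its API; CBoundedDroppingQueue, tryPush, tryPop and dropped are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical sketch, not the project's queue: a fixed capacity queue which
// drops new elements when full instead of blocking the producing thread.
template<typename T, std::size_t CAPACITY>
class CBoundedDroppingQueue {
public:
    //! Returns false, and records a drop, if the queue is already full.
    bool tryPush(T value) {
        std::lock_guard<std::mutex> lock{m_Mutex};
        if (m_Items.size() >= CAPACITY) {
            ++m_Dropped;
            return false;
        }
        m_Items.push_back(std::move(value));
        return true;
    }

    //! Returns false if there is nothing to pop, otherwise fills in value.
    bool tryPop(T& value) {
        std::lock_guard<std::mutex> lock{m_Mutex};
        if (m_Items.empty()) {
            return false;
        }
        value = std::move(m_Items.front());
        m_Items.pop_front();
        return true;
    }

    //! Number of elements discarded because the queue was full.
    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock{m_Mutex};
        return m_Dropped;
    }

private:
    mutable std::mutex m_Mutex;
    std::deque<T> m_Items;
    std::size_t m_Dropped{0};
};

//! Usage: twelve updates arrive before the consumer pops, so two are dropped.
inline void exampleUsage() {
    CBoundedDroppingQueue<int, 10> queue;
    for (int i = 0; i < 12; ++i) {
        queue.tryPush(i);
    }
    assert(queue.dropped() == 2);
}
```

Dropping keeps the worker threads running at the cost of losing intermediate state documents, which is acceptable only if the consumer cares about the most recent state rather than every update.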

include/api/CDataFrameAnalysisState.h

include/maths/CDataFrameAnalysisStateInterface.h

lib/api/CDataFrameAnalyzer.cc

tveasey · 2019-12-19T11:14:39Z

lib/maths/COutliers.cc

    for (const auto& model : m_Models) {
        model.addOutlierScores(points, scores, m_RecordMemoryUsage);
+        state.nextStep(step++);


I agree. This seems like a natural place to me as well.

tveasey

Thanks for working through those comments. I've done another pass. It now seems to me that we can ditch the queue altogether: as currently implemented it isn't serving a purpose, and in fact the concurrent line writer already removes the need for the synchronisation the queue would provide.
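As an illustration of why a queue becomes unnecessary once the writer itself is safe to share, here is a small sketch of a writer that emits each line under a lock. It is not the project's concurrent line writer; CLockedLineWriter, writeLine and the JSON-like payloads are assumptions for the example.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch, not the project's writer: each call emits one complete
// line while holding a lock, so lines from different callers cannot interleave.
class CLockedLineWriter {
public:
    explicit CLockedLineWriter(std::ostream& stream) : m_Stream(stream) {}

    void writeLine(const std::string& line) {
        std::lock_guard<std::mutex> lock{m_Mutex};
        m_Stream << line << '\n';
    }

private:
    std::mutex m_Mutex;
    std::ostream& m_Stream;
};

//! Usage: state documents are written directly, with no intermediate queue.
inline std::string exampleUsage() {
    std::ostringstream out;
    CLockedLineWriter writer{out};
    writer.writeLine("{\"progress\":50}");
    writer.writeLine("{\"progress\":100}");
    return out.str();
}
```

If worker threads share one such writer, each state document reaches the stream as a whole line, so buffering documents in a separate queue adds nothing.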

include/api/CDataFrameAnalysisInstrumentation.h

include/maths/CDataFrameAnalysisInstrumentationInterface.h

lib/api/CDataFrameAnalyzer.cc

lib/api/CDataFrameAnalysisInstrumentation.cc

include/api/CDataFrameAnalysisInstrumentation.h

tveasey

I think this has come together well. The unit test failure is caused by requiring that training is instrumented. I think this is too severe. Aside from that, LGTM.
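One way to relax that requirement, sketched here under assumptions rather than taken from the PR, is to fall back to a no-op stub whenever no instrumentation is supplied. CInstrumentation, CInstrumentationStub, CStepCounter and CTrainer are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: all class names are illustrative, not the PR's code.
class CInstrumentation {
public:
    virtual ~CInstrumentation() = default;
    virtual void nextStep(std::size_t step) = 0;
};

//! Does nothing; used when the caller supplies no instrumentation.
class CInstrumentationStub final : public CInstrumentation {
public:
    void nextStep(std::size_t) override {}
};

//! Counts steps; handy in unit tests.
class CStepCounter final : public CInstrumentation {
public:
    void nextStep(std::size_t) override { ++m_Steps; }
    std::size_t steps() const { return m_Steps; }

private:
    std::size_t m_Steps{0};
};

class CTrainer {
public:
    CTrainer() = default;
    // Not copyable because m_Instrumentation may point at our own stub.
    CTrainer(const CTrainer&) = delete;
    CTrainer& operator=(const CTrainer&) = delete;

    //! Passing nullptr is allowed and switches back to the no-op stub.
    void instrumentation(CInstrumentation* instrumentation) {
        if (instrumentation != nullptr) {
            m_Instrumentation = instrumentation;
        } else {
            m_Instrumentation = &m_Stub;
        }
    }

    //! Runs the given number of rounds, reporting one step per round.
    std::size_t train(std::size_t rounds) {
        for (std::size_t round = 0; round < rounds; ++round) {
            m_Instrumentation->nextStep(round);
        }
        return rounds;
    }

private:
    CInstrumentationStub m_Stub;
    CInstrumentation* m_Instrumentation{&m_Stub};
};

//! Usage: training works with and without instrumentation attached.
inline void exampleUsage() {
    CTrainer trainer;
    trainer.train(3); // uninstrumented, no failure
    CStepCounter counter;
    trainer.instrumentation(&counter);
    trainer.train(5);
    assert(counter.steps() == 5);
}
```

Because the pointer is never null inside train, the training loop needs no checks, and tests that don't care about instrumentation can simply omit it.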

lib/maths/CBoostedTreeImpl.cc

lib/api/CDataFrameAnalysisInstrumentation.cc

# Conflicts: # lib/api/unittest/CDataFrameAnalyzerFeatureImportanceTest.cc

# Conflicts: # docs/CHANGELOG.asciidoc # include/maths/CBoostedTreeFactory.h

tveasey

One minor comment. Otherwise, LGTM. Let's get this in!

lib/maths/CBoostedTreeImpl.cc

valeriy42 added 5 commits December 12, 2019 13:38

initial draft of classes and interfaces

691978a

introduce State classes, build works

40a7f57

Merge branch 'master' into instrumentalization

6dc23cf

outlier test is working

faf73b7

pass state to Boosted Tree

69c3076

valeriy42 added >non-issue WIP :ml >feature v8.0.0 v7.6.0 >enhancement labels Dec 16, 2019

valeriy42 added 7 commits December 16, 2019 10:34

Merge branch 'master' into feature/instrumentalize

e01c96b

# Conflicts: # include/maths/CBoostedTreeImpl.h

remove callbacks from Boosted Tree

dd09a7f

add exports and formatting

bce8396

coutlier test fixed

0ebe0c4

CBoostedTreeTest fixed

0b16c51

CBoostedTreeTest fixed

06cf2ef

unit test fixed

af27d08

valeriy42 commented Dec 17, 2019

valeriy42 added 7 commits December 17, 2019 16:19

unit test fixed

a231d8b

adjust boosted tree impl to call next step

13d6abf

formatting fixed

0a5300c

internal state without atomics

dc36e41

PR walk through

1677010

fix unit test

8c8ef5b

PR cleanup and docs

4803933

valeriy42 removed the WIP label Dec 18, 2019

valeriy42 requested a review from tveasey December 18, 2019 15:51

add enhancement note

23754d0

valeriy42 removed the >non-issue label Dec 19, 2019

tveasey reviewed Dec 19, 2019

valeriy42 added 3 commits December 19, 2019 13:24

rename state to instrumentation

18aeb91

review comments

26938c2

minor fixes

3fb85fe

tveasey reviewed Dec 19, 2019

valeriy42 added 2 commits December 19, 2019 17:39

state class removed

d33bd1c

Merge branch 'master' into feature/instrumentalize

d6144ca

tveasey reviewed Jan 14, 2020

lib/maths/CBoostedTreeImpl.cc (outdated)

droberts195 reviewed Jan 15, 2020

lib/api/CDataFrameAnalysisInstrumentation.cc (outdated)

droberts195 added v7.7.0 and removed v7.6.0 labels Jan 15, 2020

valeriy42 added 6 commits January 16, 2020 11:30

Merge branch 'master' into unit-test-fix

c1428b2

# Conflicts: # lib/api/unittest/CDataFrameAnalyzerFeatureImportanceTest.cc

changelog updated to 7.7

8c87266

deal with empty instrumentation pointer

715ef71

dealing with nullptr

b756a88

Merge branch 'master' into feature/instrumentalize

3fabdc9

# Conflicts: # docs/CHANGELOG.asciidoc # include/maths/CBoostedTreeFactory.h

deactivate writing state

161f417

tveasey approved these changes Jan 29, 2020

lib/maths/CBoostedTreeImpl.cc (outdated)

use typedefs

ec7324e

valeriy42 merged commit 2b05f39 into elastic:master Jan 29, 2020

valeriy42 deleted the feature/instrumentalize branch January 29, 2020 12:10

valeriy42 restored the feature/instrumentalize branch January 29, 2020 12:11

valeriy42 mentioned this pull request Jan 29, 2020

[ML] Instrument Data Frame Analysis Job to report state information to Java backend #978

Merged

valeriy42 added a commit that referenced this pull request Jan 29, 2020

[ML] Instrument Data Frame Analysis Job to report state information t…

2c1d9d2

…o Java backend (#978) Backport to #906

tveasey mentioned this pull request Feb 14, 2020

[ML] Fix a bug with progress reporting during model training #1001

Merged

valeriy42 deleted the feature/instrumentalize branch May 6, 2020 11:16
