Time dependent metric system #578

wangcj05 · 2018-02-15T16:43:45Z

Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

close #333

What are the significant changes in functionality due to this change request?

Time-dependent metrics system is implemented.

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

1. Review all computer code.
2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True.
6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
8. If an analytic test is changed/added, is the analytic documentation updated/added?

moosebuild · 2018-02-24T23:20:11Z

Job Test qsubs on 9c5f090 : invalidated by @wangcj05

moosebuild · 2018-05-08T17:00:52Z

Job Test linux on 9c5f090 : invalidated by @wangcj05

moosebuild · 2018-06-05T23:33:15Z

Job Test mac on b73550b : invalidated by @wangcj05

env error

alfoa · 2018-06-06T17:06:23Z

framework/MetricDistributor.py

+    self.messageHandler          = messageHandler
+    self.estimator                = estimator
+    self.canHandleDynamicData = self.estimator.isDynamic()
+    self.canHandlePairwiseData = self.estimator.isPairwise()


Comment the variables in the constructor
e.g.

# pointer to the estimator to be used
self.estimator = estimator
# can handle dynamic data? True if HistorySet are accepted
self.canHandleDynamicData = self.estimator.isDynamic()
(etc.)

alfoa · 2018-06-06T17:07:26Z

framework/MetricDistributor.py

+      @ Out, state, dict, it contains all the information needed by the class to be initialized
+    """
+    state = self.__dict__.copy()
+    return state


Since no particular action is performed in these methods, why do you define them? The objects should be picklable without these methods being defined.
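A minimal sketch of the point above, using a hypothetical stand-in class (`MetricDistributorSketch` is not the RAVEN class, and its attributes are placeholders). It only illustrates that Python's default pickling already saves and restores the instance `__dict__`, which is what a `__getstate__` that returns `self.__dict__.copy()` would do anyway.

```python
import pickle


class MetricDistributorSketch:
    """Hypothetical stand-in with no __getstate__/__setstate__ defined."""

    def __init__(self, estimatorName):
        # plain attributes only; anything unpicklable would still need special care
        self.estimatorName = estimatorName
        self.canHandleDynamicData = True
        self.canHandlePairwiseData = False


original = MetricDistributorSketch("dtw")
# default pickling serializes the instance __dict__ and restores it on load
restored = pickle.loads(pickle.dumps(original))
print(restored.__dict__ == original.__dict__)
```

Custom state methods only become necessary when some attribute cannot be pickled and has to be dropped or rebuilt.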

alfoa · 2018-06-06T17:08:09Z

framework/MetricDistributor.py

+      @ Out, paramDict, dict, dictionary containing the parameter names as keys
+        and each parameter's initial value as the dictionary values
+    """
+    paramDict = {}


No parameters? E.g. `handleDynamic`, etc.?

alfoa · 2018-06-06T17:08:53Z

framework/MetricDistributor.py

+      @ In, pairedData, tuple, (featureValues, targetValues), both featureValues and targetValues
+        are 2D numpy array with the same number of columns. For example, featureValues with shape
+        (numRealizations1,numParameters), targetValues with shape (numRealizations2, numParameters)
+      @ Out, output, numpy.array, 2D array, with shape (numRealizations1,numRealization2)


alfoa · 2018-06-06T17:09:25Z

framework/MetricDistributor.py

+    assert(type(pairedData).__name__ == 'tuple', "The paired data is not a tuple!")
+    if not self.canHandlePairwiseData:
+      self.raiseAnError(IOError, "The metric", self.estimator.name, "can not handle pairwise data")
+    feat, targ = pairedData[0], pairedData[1]


feat, targ = pairedData
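The suggested unpacking can be sketched as below. The helper name `splitPairedData` is hypothetical and the explicit type check is an assumption about what the original assert intended. Unpacking also fails loudly with a ValueError if the tuple does not hold exactly two items, which indexing `[0]` and `[1]` would not catch.

```python
def splitPairedData(pairedData):
    """Hypothetical helper: validate and unpack a (featureValues, targetValues) pair."""
    if not isinstance(pairedData, tuple):
        raise TypeError("The paired data is not a tuple!")
    # raises ValueError unless pairedData has exactly two entries
    feat, targ = pairedData
    return feat, targ
```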

alfoa · 2018-06-06T18:34:06Z

framework/PostProcessors/Metric.py

-            outputDict[varName] = np.atleast_1d(output)
-    else:
-      self.raiseAnError(IOError, "Not implemented yet")
+        nodeName = (str(self.targets[cnt]) + '_' + str(self.features[cnt])).replace("|","_")


where is this documented?

The documentation has been updated.

alfoa · 2018-06-06T18:34:38Z

framework/unSupervisedLearning.py

+              inputI[ind] = valueI
+              inputJ[ind] = valueJ
+            pairedData = ((inputI,None), (inputJ,None))
+            # TODO: Using loops can be very slow for large number of realizations


yes. should be a FIXME

alfoa · 2018-06-06T18:36:29Z

tests/framework/PostProcessors/Metrics/DTW/lorentzAttractor_timeScale_I.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+''' from wikipedia: dx/dt = sigma*(y-x)  ; dy/dt = x*(rho-z)-y  dz/dt = x*y-beta*z  ; '''


Docstring: use """ instead of '''.

alfoa · 2018-06-06T18:36:45Z

tests/framework/PostProcessors/Metrics/DTW/lorentzAttractor_timeScale_II.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+''' from wikipedia: dx/dt = sigma*(y-x)  ; dy/dt = x*(rho-z)-y  dz/dt = x*y-beta*z  ; '''


alfoa · 2018-06-06T18:37:26Z

tests/framework/PostProcessors/Metrics/gold/mcMetricRun/output_metric.csv

@@ -0,0 +1,5 @@
+Metrics,ans-ans2
+euclidean2_paired_distance_euclidean,8.15320610438


it is not standard

Removed. This file is not needed; it uses the old output format.

alfoa · 2018-06-06T18:39:17Z

First round of review performed @wangcj05

wangcj05 · 2018-06-07T23:34:05Z

This PR for your review @alfoa

alfoa · 2018-06-07T23:58:37Z

framework/Metrics/DTW.py

    elif axis == 1:
-      assert x.shape[1] == y.shape[1], "The second dimension of first input is not the same as the second dimension of second input"


I would add the "error" message as a comment for future developers (just in case).

the syntax is invalid...everything is failing

Fixed. I found we should be careful with parentheses. An assert without parentheses is fine, but if we wrap the assert arguments in parentheses, the message string must not go inside them: the condition and the message then form a non-empty tuple, which always evaluates to True, so the assert can never fail.

In most of our code, we do not use parentheses in assert statements.

In my case, I always use parentheses, and never use a string statement, just for comparison. See examples in the data objects.

Since asserts are exclusively for developers, I will often put any comments that would usually go in a string, in a comment above the assert, assuming the developer will go check the assert.
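The parenthesis pitfall discussed in this thread can be reproduced with the short sketch below. The function names are hypothetical. The tuple is bound to a variable first only to avoid the SyntaxWarning that recent Python versions emit for a literal `assert (condition, "message")`.

```python
def brokenCheck(xCols, yCols):
    """Mimics `assert(xCols == yCols, "...")`: the whole tuple is what gets tested."""
    conditionAndMessage = (xCols == yCols, "second dimensions differ")
    # a non-empty tuple is always truthy, so this assert never fails
    assert conditionAndMessage
    return True


def fixedCheck(xCols, yCols):
    """Condition and message are separate operands of the assert statement."""
    assert xCols == yCols, "second dimensions differ"
    return True


print(brokenCheck(3, 4))  # passes silently although the shapes differ
try:
    fixedCheck(3, 4)
except AssertionError as err:
    print("caught:", err)
```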

moosebuild · 2018-06-08T00:37:48Z

Job Mingw Test on 6aa47e6 : invalidated by @alfoa

alfoa · 2018-07-31T03:30:34Z

@wangcj05 the docstrings have not been fixed everywhere.
Please take the time to review all of them (especially in the places where there are copy/pastes).

wangcj05 · 2018-08-03T18:46:10Z

@alfoa The linux test failed due to a timeout in the distribution test.
I do not know why the paralROM test failed in the qsub tests. I have tested on Falcon, and all tests pass:

Return code: 0
PASS adaptiveSobol

ALL PASSED

wangcj05 · 2018-08-03T18:47:59Z

According to Test qsubs:

remoteRunCommand {u'args': [u'qsub', u'-N', 'test_qsub', u'-W', u'block=true', u'-P', u'moose', u'-l', u'select=3:ncpus=1:mpiprocs=1', u'-l', u'walltime=00:10:00', u'-l', u'place=free', u'-v', u'COMMAND="/home/moosetest/civet/build_0/raven/raven_framework /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/test_internal_parallel_ROM_scikit.xml /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/../pbspro_mpi.xml /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/../cluster_runinfo_legacy.xml"', '/home/moosetest/civet/build_0/raven/framework/raven_qsub_command_legacy.sh'], u'cwd': '/home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel'}
3865879.service2
Return code: 0
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
Lines not here yet, waiting longer.
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
Lines not here yet, waiting even longer.
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
FAIL paralROM

moosebuild · 2018-08-03T18:48:12Z

Job Test linux on de8412f : invalidated by @wangcj05

time out

moosebuild · 2018-08-03T18:48:27Z

Job Test qsubs on de8412f : invalidated by @wangcj05

moosebuild · 2018-08-03T19:12:35Z

Job Test qsubs on de8412f : invalidated by @wangcj05

alfoa · 2018-08-03T22:10:49Z

doc/user_manual/metrics.tex

+    </SKL>
+    <SKL name="mean_squared_error">
+        <metricType>regression|mean_squared_error</metricType>
+        <sample_weight>[0.1,0.1,0.1,0.05,0.05]</sample_weight>


why not just a comma separated expression?
Why are you forcing the user to know the syntax of a Python List?

I can change that. I think the main reason is to keep consistent with SciKitLearn. As I observed in several other places, such as rom and data mining, we do follow the SciKitLearn format of input.

Can you point me to the other places where we do this? This should be changed. Let's begin with changing it here.

For example, in the Lasso linear model, the node <alphas> accepts a numpy array, as mentioned in our manual. You can also check the DataMining manual: we do not convert the format of the input to the ScikitLearn functions, we just pass the variables to the functions. I see several advantages in this:

We do not need to change the format of the input; otherwise we would need to add a lot of if conditions in our code.

Only the manual needs to be updated when there is a change in ScikitLearn.

Personally, I think we may need to keep the same format as ScikitLearn.

Congjian, your concern would have made sense before the InputData was introduced.
In the InputData we have a data type called FloatListType (useful in this particular case) that is meant to perform the conversion automatically, without any if statement, provided the variable linked to it is correctly defined.
Since we introduced this system, we have to exploit it to make the user's life easier and our input checking more robust.

Andrea

I got it. I think I misunderstood your question at the beginning. It seems I do use the InputData with FloatListType in the code. I will update the test and the manual.

An update on this issue: 1) in the metric PP, I do use the FloatListType; 2) in the Metric system, the InputData is not used, and the old way of reading the XML is still used.

This PR now needs a little rework to employ InputData to parse the input XML file.
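To make the "comma separated floats" request concrete, here is a minimal sketch of the kind of conversion being discussed. The function `parseFloatList` is hypothetical and is not RAVEN's FloatListType implementation; it only shows that the user can write `0.1,0.1,0.05` without Python list syntax while the parser does the conversion and the checking.

```python
def parseFloatList(text):
    """Hypothetical parser: turn '0.1, 0.1,0.05' into [0.1, 0.1, 0.05]."""
    values = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            raise ValueError("empty entry in float list: %r" % text)
        # float() raises ValueError for anything that is not a number,
        # including leftover list brackets such as '[0.1'
        values.append(float(token))
    return values


weights = parseFloatList("0.1,0.1,0.1,0.05,0.05")
print(weights)
```

With this approach the node content would read `0.1,0.1,0.1,0.05,0.05` rather than `[0.1,0.1,0.1,0.05,0.05]`.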

alfoa · 2018-08-03T22:14:40Z

framework/unSupervisedLearning.py

-          for j in range(i+1,cardinality):
-            self.normValues[i][j] = metric.distance(tdictNorm[keys[i]],tdictNorm[keys[j]])
-            self.normValues[j][i] = self.normValues[i][j]
+          for j in range(i,cardinality):


These loops will be super slow. Don't you have another way?

I added a FIXME at line 238. This is a current issue in our devel branch, and I do not have a good solution for it now. I would like to explore this issue in a future PR, when I start to add more metrics. This may change, since I plan to use the data object directly instead of a dictionary here.
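For reference, one possible direction for the slow double loop is sketched below, under strong assumptions: it covers only the Euclidean distance on a plain 2D numpy array, not the generic `metric.distance` call on dictionary entries used in the PR, and both function names are hypothetical. It compares the symmetric-fill loop with a broadcast version.

```python
import numpy as np


def pairwiseEuclideanLoop(data):
    """Reference version: fill the symmetric distance matrix pair by pair."""
    n = data.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = np.sqrt(np.sum((data[i] - data[j]) ** 2))
            dist[j, i] = dist[i, j]
    return dist


def pairwiseEuclideanVectorized(data):
    """Broadcast version: all pairwise differences at once, no Python loops."""
    # diff has shape (n, n, numParameters)
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


rng = np.random.RandomState(0)
samples = rng.rand(6, 3)  # 6 realizations, 3 parameters
loopResult = pairwiseEuclideanLoop(samples)
vectorizedResult = pairwiseEuclideanVectorized(samples)
print(np.allclose(loopResult, vectorizedResult))
```

The broadcast version trades memory for speed, since the intermediate array grows with the square of the number of realizations, so it would not suit very large sets without chunking.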

alfoa · 2018-08-03T22:15:30Z

framework/Metrics/SklMetric.py

  """
-  def initialize(self,inputDict):
+  availMetrics ={}
+  # regression metrics


and what about the classification ones?

This will be added after this PR.

wangcj05 · 2018-08-09T22:15:05Z

@alfoa The InputData is now used in the Metric system, and I think I have resolved all your comments. Please let me know if you have any more comments on the modifications.

alfoa · 2018-08-22T15:04:54Z

framework/Metrics/DTW.py

+    for child in paramInput.subparts:
+      if child.getName() == "order":
+        self.order = child.value
+        if self.order not in [0,1]:


You do not need this check.
Use the enumeration type instead of the IntegerType: no check is required, since it is done directly in the InputData.

alfoa · 2018-08-22T15:07:36Z

doc/user_manual/metrics.tex

@@ -20,7 +20,7 @@ \section{Metrics}

  In addition to this XML subnode, the users can also specify the weights for given metric:
  \begin{itemize}
-    \item \xmlNode{w}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
+    \item \xmlNode{w}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}


comma separated floats or comma separated float values

alfoa · 2018-08-22T15:07:44Z

doc/user_manual/metrics.tex

@@ -31,7 +31,7 @@ \section{Metrics}

  In addition to this XML subnode, the users can also specify the weights for given metric:
  \begin{itemize}
-    \item \xmlNode{sample\_weight}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
+    \item \xmlNode{sample\_weight}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}


alfoa · 2018-08-22T15:09:44Z

doc/user_manual/metrics.tex

@@ -411,7 +411,7 @@ \subsection{Dynamic Time Warping}
 \begin{itemize}
  \item \xmlNode{order},          \xmlDesc{int, required field},    order of the DTW calculation: $0$ specifices a classical DTW caluclation and $1$ specifies


I honestly do not get why we need an integer here to specify an enumeration list.
Can we change this to classical and derivative?
@mandd, in addition, can we add a bit more info regarding what this option is supposed to trigger?
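A small sketch of the enumeration idea raised here, using Python's standard `enum` module rather than RAVEN's InputData types (which are not reproduced). The names `DtwOrder` and `parseOrder` are hypothetical, and mapping `derivative` to order 1 is an assumption based on the reviewer's proposed wording.

```python
from enum import Enum


class DtwOrder(Enum):
    """Hypothetical named choices replacing the integer values 0 and 1."""
    classical = 0
    derivative = 1


def parseOrder(text):
    """Map user text to a DtwOrder member; unknown names are rejected here."""
    try:
        return DtwOrder[text.strip()]
    except KeyError:
        allowed = ", ".join(member.name for member in DtwOrder)
        raise ValueError("order must be one of: " + allowed)


order = parseOrder("derivative")
print(order.name, order.value)
```

Because validation happens at parse time, the metric code would no longer need its own `if self.order not in [0,1]` check.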

alfoa · 2018-08-22T15:10:14Z

doc/user_manual/postprocessor.tex

@@ -1803,6 +1803,8 @@ \subsubsection{Metric}
    \textbf{mean, max, min, raw\_values} over the time. For example, when `mean' is used, the metrics' calculations
    will be averaged over the time. When `raw\_values' is used, the full set of  metrics' calculations will be dumped.
    \default{raw\_values}
+  \item \xmlNode{weight}, \xmlDesc{comma separated float, optional field}, when `mean' is provided for \xmlNode{multiOutput},


comma separated floats or comma separated float values

alfoa · 2018-08-22T15:10:44Z

@wangcj05 a few other changes

@mandd there is a question for you as well

alfoa · 2018-08-26T01:57:09Z

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

1. Review all computer code. DONE
2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts). OK
3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details. DONE
4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass. THEY PASS
5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True. TEST ADDED
6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync. N/A
7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done. improve the metric system and metric post processor to accept HistorySet #333
8. If an analytic test is changed/added, is the analytic documentation updated/added? N/A

moosebuild · 2018-08-27T14:53:50Z

Job Mingw Test on a41e20d : invalidated by @wangcj05

wangcj05 · 2018-08-27T18:09:59Z

@alfoa Tests are green now.

wangcj05 · 2018-09-04T16:40:45Z

@alfoa Can you take a look at this PR? Based on my understanding, you have already reviewed it, and the tests are green. Do you have additional comments?

alfoa · 2018-09-04T16:48:10Z

Tests pass...

Checklist passed...

Issue closure list passed...

Merging...

wangcj05 changed the title ~~WIP: time dependent metric system~~ Time dependent metric system Feb 22, 2018

wangcj05 added the Do Not Merge label Feb 22, 2018

wangcj05 force-pushed the wangc/td_metrics branch from 9c5f090 to 2dc413e Compare May 9, 2018 18:52

alfoa changed the title ~~Time dependent metric system~~ WIP: Time dependent metric system May 10, 2018

wangcj05 force-pushed the wangc/td_metrics branch from b36d4fe to 65c6b3f Compare June 5, 2018 22:46

wangcj05 removed the Do Not Merge label Jun 5, 2018

wangcj05 changed the title ~~WIP: Time dependent metric system~~ Time dependent metric system Jun 5, 2018

wangcj05 requested a review from aalfonsi June 5, 2018 22:56

alfoa requested changes Jun 6, 2018

alfoa reviewed Jun 7, 2018

wangcj05 added 14 commits July 17, 2018 11:05

Initial Implementation for Time-dependent metrics system

00c13d8

add and modify tests

2af3297

update tests

21ab138

add comments, and add axis option for the metric evaluation

2b5a8bd

fix comparison statistics metrics

42f1929

update docstrings

be7cc75

implement the interface to handle the scipy metrics

4a7240a

add tests to test scipy metric with booleans

5ffb8ff

remove Minkowski, since we add it in the ScipyMetric

adc3200

update tests

34dfe2f

update manual for metrics

fab449f

restructure metrics

5434421

update tests

2041e84

using the new interface for DTW, and add test for DTW metric

44d75e2

update docstring

de8412f

Merge remote-tracking branch 'upstream/devel' into wangc/td_metrics

6a39aa1

alfoa requested changes Aug 3, 2018

wangcj05 added 2 commits August 9, 2018 14:12

Merge remote-tracking branch 'upstream/devel' into wangc/td_metrics

764c0ca

change xml reader to InputData parser

bded40f

wangcj05 added the priority_normal, task, comments addressed, and Ready To Review labels Aug 9, 2018

alfoa requested changes Aug 22, 2018

alfoa reviewed Aug 22, 2018

resolve comments

a41e20d

alfoa approved these changes Sep 4, 2018

alfoa merged commit 09e47ca into idaholab:devel Sep 4, 2018
