Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time dependent metric system #578

Merged
merged 54 commits into from
Sep 4, 2018
Merged

Conversation

wangcj05
Copy link
Collaborator

@wangcj05 wangcj05 commented Feb 15, 2018


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

close #333

What are the significant changes in functionality due to this change request?

Time-dependent metrics system is implemented.


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added is the the analytic documentation updated/added?

@wangcj05 wangcj05 changed the title WIP: time dependent metric system Time dependent metric system Feb 22, 2018
@moosebuild
Copy link

Job Test qsubs on 9c5f090 : invalidated by @wangcj05

@moosebuild
Copy link

Job Test linux on 9c5f090 : invalidated by @wangcj05

@alfoa alfoa changed the title Time dependent metric system WIP: Time dependent metric system May 10, 2018
@wangcj05 wangcj05 changed the title WIP: Time dependent metric system Time dependent metric system Jun 5, 2018
@wangcj05 wangcj05 requested a review from aalfonsi June 5, 2018 22:56
@moosebuild
Copy link

Job Test mac on b73550b : invalidated by @wangcj05

env error

self.messageHandler = messageHandler
self.estimator = estimator
self.canHandleDynamicData = self.estimator.isDynamic()
self.canHandlePairwiseData = self.estimator.isPairwise()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the variables in the constructor
e.g.

  # pointer to the estimator to be used
  self.estimator                = estimator
  # can handle dynamic data? True if HistorySet are accepted
  self.canHandleDynamicData = self.estimator.isDynamic()
  (etc.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@ Out, state, dict, it contains all the information needed by the class to be initialized
"""
state = self.__dict__.copy()
return state
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since no particular action is performed in these methods, why do you define them? they should be pickable without these methods to be defined

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@ Out, paramDict, dict, dictionary containing the parameter names as keys
and each parameter's initial value as the dictionary values
"""
paramDict = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no parameter? e.g. ```handleDynamic``, etc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@ In, pairedData, tuple, (featureValues, targetValues), both featureValues and targetValues
are 2D numpy array with the same number of columns. For example, featureValues with shape
(numRealizations1,numParameters), targetValues with shape (numRealizations2, numParameters)
@ Out, output, numpy.array, 2D array, with shape (numRealizations1,numRealization2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndarray

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

assert(type(pairedData).__name__ == 'tuple', "The paired data is not a tuple!")
if not self.canHandlePairwiseData:
self.raiseAnError(IOError, "The metric", self.estimator.name, "can not handle pairwise data")
feat, targ = pairedData[0], pairedData[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feat, targ = pairedData

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

outputDict[varName] = np.atleast_1d(output)
else:
self.raiseAnError(IOError, "Not implemented yet")
nodeName = (str(self.targets[cnt]) + '_' + str(self.features[cnt])).replace("|","_")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this documented?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document is updated.

inputI[ind] = valueI
inputJ[ind] = valueJ
pairedData = ((inputI,None), (inputJ,None))
# TODO: Using loops can be very slow for large number of realizations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. should be a FIXME

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

# See the License for the specific language governing permissions and
# limitations under the License.

''' from wikipedia: dx/dt = sigma*(y-x) ; dy/dt = x*(rho-z)-y dz/dt = x*y-beta*z ; '''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring.... """

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# See the License for the specific language governing permissions and
# limitations under the License.

''' from wikipedia: dx/dt = sigma*(y-x) ; dy/dt = x*(rho-z)-y dz/dt = x*y-beta*z ; '''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,5 @@
Metrics,ans-ans2
euclidean2_paired_distance_euclidean,8.15320610438
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not standard

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, this is not needed, and this is the old output format.

@alfoa
Copy link
Collaborator

alfoa commented Jun 6, 2018

First round of review performed @wangcj05

@wangcj05
Copy link
Collaborator Author

wangcj05 commented Jun 7, 2018

This PR for your review @alfoa

elif axis == 1:
assert x.shape[1] == y.shape[1], "The second dimension of first input is not the same as the second dimension of second input"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add the "error" message as comment for future developers (in case)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the syntax is invalid...everything is failing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, I found we should be careful with the parenthesis. If we do not use parenthesis in assert, which should be fine, but if we add parenthesis in assert, we should avoid string message inside parenthesis, which will always be True for the assert evaluation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our code, in most cases, we do not use parenthesis in assert statement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my case, I always use parentheses, and never use a string statement, just for comparison. See examples in the data objects.

Since asserts are exclusively for developers, I will often put any comments that would usually go in a string, in a comment above the assert, assuming the developer will go check the assert.

@moosebuild
Copy link

Job Mingw Test on 6aa47e6 : invalidated by @alfoa

@alfoa
Copy link
Collaborator

alfoa commented Jul 31, 2018

@wangcj05 the docstring has not been fixed all over the place.
Please take time to review all of them (overall in the places where there are copy/pastes)

@wangcj05
Copy link
Collaborator Author

wangcj05 commented Aug 3, 2018

@alfoa The linux test failed due to time out for the distribution test.
The failed test in the qsub paralRom, I do not know the reason. I have tested on the Falcon, and all are passing:

Return code: 0
PASS adaptiveSobol

ALL PASSED

@wangcj05
Copy link
Collaborator Author

wangcj05 commented Aug 3, 2018

According to Test qsubs:

remoteRunCommand {u'args': [u'qsub', u'-N', 'test_qsub', u'-W', u'block=true', u'-P', u'moose', u'-l', u'select=3:ncpus=1:mpiprocs=1', u'-l', u'walltime=00:10:00', u'-l', u'place=free', u'-v', u'COMMAND="/home/moosetest/civet/build_0/raven/raven_framework /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/test_internal_parallel_ROM_scikit.xml /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/../pbspro_mpi.xml /home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel/../cluster_runinfo_legacy.xml"', '/home/moosetest/civet/build_0/raven/framework/raven_qsub_command_legacy.sh'], u'cwd': '/home/moosetest/civet/build_0/raven/tests/cluster_tests/InternalParallel'}
3865879.service2
Return code: 0
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
Lines not here yet, waiting longer.
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
Lines not here yet, waiting even longer.
ls: cannot access InternalParallelScikit/*.csv: No such file or directory
FAIL paralROM

@moosebuild
Copy link

Job Test linux on de8412f : invalidated by @wangcj05

time out

@moosebuild
Copy link

Job Test qsubs on de8412f : invalidated by @wangcj05

1 similar comment
@moosebuild
Copy link

Job Test qsubs on de8412f : invalidated by @wangcj05

</SKL>
<SKL name="mean_squared_error">
<metricType>regression|mean_squared_error</metricType>
<sample_weight>[0.1,0.1,0.1,0.05,0.05]</sample_weight>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just a comma separated expression?
Why are you forcing the user to know the syntax of a Python List?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that. I think the main reason is to keep consistent with SciKitLearn. As I observed in several other places, such as rom and data mining, we do follow the SciKitLearn format of input.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the other places where we do this? This should be changed. Let's begin with changing it here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in Lasso linear model, the node <alphas> accepts numpy array as mentioned in our manual. You can also check the DataMining manual, we do not convert the format of the input to the Sckitlearn Functions, we just pass the variables to the functions. I see there are several advantages for this:

  1. we do not need to change format of input, otherwise we need to add a lot of if conditions in our code
  2. Only manual need to be updated, when there is a change in the ScikitLearn.

Personally, I think we may need to keep the same format of ScikitLearn.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congjian your concern could have sense before introducing the InputData.
In the InputData we have a data type called FloatListType (useful in this particular case) that is aimed to perform the conversion automatically without any if statement if the variable linked to it is correctly defined.
Since we introduced this system, we have to exploit it to make the life of the user easier and our input checking more robust.

Andrea

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, I think I misunderstand your question at the beginning. It seems I do use the InputData with FloatListType in the code. I will update the test and manual.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some update on this issue. 1) In the metric PP, I do use the FloatListType, 2) In the Metric system, the InputData is not used, and the old way to read more xml is used.

Now this PR need to rework a little bit to employ InputData to parse the input xml file.

for j in range(i+1,cardinality):
self.normValues[i][j] = metric.distance(tdictNorm[keys[i]],tdictNorm[keys[j]])
self.normValues[j][i] = self.normValues[i][j]
for j in range(i,cardinality):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these loops will be super slow. You do not have another way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do add FIXME in line 238. This is an current issue in our devel branch. I do not have a good solution for this now. I would like to explore this issue in the future PR when I started to add more metrics. This may be changed since I plan to use the data object directly instead of dictionary here.

"""
def initialize(self,inputDict):
availMetrics ={}
# regression metrics
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what about the classification ones?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be added after this PR.

@wangcj05
Copy link
Collaborator Author

wangcj05 commented Aug 9, 2018

@alfoa Now the InputData is available in Metric system, and I think I have resolved all you comments. Please let me know if you have any more comments on the modifications.

@wangcj05 wangcj05 added priority_normal task This tag should be used for any new capability, improvement or enanchment comments addressed used to comunicate that the comments on PRs are addressed Ready To Review labels Aug 9, 2018
for child in paramInput.subparts:
if child.getName() == "order":
self.order = child.value
if self.order not in [0,1]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do not need this check.
You should use the enumerator type instead of the IntegerType => no check required since it is done directly in the InputData

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

@@ -20,7 +20,7 @@ \section{Metrics}

In addition to this XML subnode, the users can also specify the weights for given metric:
\begin{itemize}
\item \xmlNode{w}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
\item \xmlNode{w}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma separated floats or comma separated float values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -31,7 +31,7 @@ \section{Metrics}

In addition to this XML subnode, the users can also specify the weights for given metric:
\begin{itemize}
\item \xmlNode{sample\_weight}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
\item \xmlNode{sample\_weight}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -411,7 +411,7 @@ \subsection{Dynamic Time Warping}
\begin{itemize}
\item \xmlNode{order}, \xmlDesc{int, required field}, order of the DTW calculation: $0$ specifices a classical DTW caluclation and $1$ specifies
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly do not get why we need integer here to specify a enumeration list.
Can we change this in classical and derivative.
@mandd . In addition, can we add a bit more info regarding what this option is supposed to trigger?????

@@ -1803,6 +1803,8 @@ \subsubsection{Metric}
\textbf{mean, max, min, raw\_values} over the time. For example, when `mean' is used, the metrics' calculations
will be averaged over the time. When `raw\_values' is used, the full set of metrics' calculations will be dumped.
\default{raw\_values}
\item \xmlNode{weight}, \xmlDesc{comma separated float, optional field}, when `mean' is provided for \xmlNode{multiOutput},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma separated floats or comma separated float values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@alfoa
Copy link
Collaborator

alfoa commented Aug 22, 2018

@wangcj05 few other changes

@mandd there is a question for you as well

@alfoa
Copy link
Collaborator

alfoa commented Aug 26, 2018


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code. DONE
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts). OK
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details. DONE
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass. THEY PASS
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True. TEST ADDED
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync. N/A
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done. improve the metric system and metric post processor to accept HistorySet #333
  • 8. If an analytic test is changed/added is the the analytic documentation updated/added? N/A

@moosebuild
Copy link

Job Mingw Test on a41e20d : invalidated by @wangcj05

@wangcj05
Copy link
Collaborator Author

@alfoa Tests are green now.

@wangcj05
Copy link
Collaborator Author

wangcj05 commented Sep 4, 2018

@alfoa Can you take a look at this PR? Based on my understanding, you have already reviewed it, and the tests are green. Do you have additional comments?

@alfoa
Copy link
Collaborator

alfoa commented Sep 4, 2018

Tests pass...

Checklist passed...

Issue closure list passed...

Merging...

@alfoa alfoa merged commit 09e47ca into idaholab:devel Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comments addressed used to comunicate that the comments on PRs are addressed priority_normal Ready To Review task This tag should be used for any new capability, improvement or enanchment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve the metric system and metric post processor to accept HistorySet
5 participants