Time dependent metric system #578
Job Test qsubs on 9c5f090 : invalidated by @wangcj05
Job Test linux on 9c5f090 : invalidated by @wangcj05
b36d4fe
to
65c6b3f
Compare
framework/MetricDistributor.py
Outdated
self.messageHandler = messageHandler
self.estimator = estimator
self.canHandleDynamicData = self.estimator.isDynamic()
self.canHandlePairwiseData = self.estimator.isPairwise()
Comment the variables in the constructor
e.g.
# pointer to the estimator to be used
self.estimator = estimator
# can handle dynamic data? True if HistorySet are accepted
self.canHandleDynamicData = self.estimator.isDynamic()
(etc.)
added
framework/MetricDistributor.py
Outdated
@ Out, state, dict, it contains all the information needed by the class to be initialized
"""
state = self.__dict__.copy()
return state
Since no particular action is performed in these methods, why define them? The object should be picklable without them.
removed
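A minimal sketch of the reviewer's point, assuming a plain class whose attributes are themselves picklable. The class `MetricHolder` and its attributes are hypothetical stand-ins, not the actual MetricDistributor. Default pickling already serializes the instance `__dict__`, so a `__getstate__` that only returns a copy of `__dict__` adds nothing:

```python
import pickle


class MetricHolder:
    """Hypothetical class with no custom __getstate__/__setstate__."""

    def __init__(self, name, dynamic):
        self.name = name
        self.canHandleDynamicData = dynamic


original = MetricHolder("euclidean", True)
# Round trip through pickle using only the default behaviour
restored = pickle.loads(pickle.dumps(original))
```

Custom state methods become necessary only when an attribute cannot be pickled as-is (for example an open file handle) and has to be dropped or rebuilt.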
@ Out, paramDict, dict, dictionary containing the parameter names as keys
and each parameter's initial value as the dictionary values
"""
paramDict = {}
No parameters? E.g. `handleDynamic`, etc.?
updated.
framework/MetricDistributor.py
Outdated
@ In, pairedData, tuple, (featureValues, targetValues), both featureValues and targetValues
are 2D numpy array with the same number of columns. For example, featureValues with shape
(numRealizations1,numParameters), targetValues with shape (numRealizations2, numParameters)
@ Out, output, numpy.array, 2D array, with shape (numRealizations1,numRealization2)
The type should be numpy.ndarray.
changed
framework/MetricDistributor.py
Outdated
assert(type(pairedData).__name__ == 'tuple', "The paired data is not a tuple!")
if not self.canHandlePairwiseData:
  self.raiseAnError(IOError, "The metric", self.estimator.name, "can not handle pairwise data")
feat, targ = pairedData[0], pairedData[1]
Use tuple unpacking instead: feat, targ = pairedData
changed
  outputDict[varName] = np.atleast_1d(output)
else:
  self.raiseAnError(IOError, "Not implemented yet")
nodeName = (str(self.targets[cnt]) + '_' + str(self.features[cnt])).replace("|","_")
Where is this documented?
The documentation is updated.
framework/unSupervisedLearning.py
Outdated
inputI[ind] = valueI
inputJ[ind] = valueJ
pairedData = ((inputI,None), (inputJ,None))
# TODO: Using loops can be very slow for large number of realizations
Yes, this should be a FIXME.
changed.
# See the License for the specific language governing permissions and
# limitations under the License.

''' from wikipedia: dx/dt = sigma*(y-x) ; dy/dt = x*(rho-z)-y dz/dt = x*y-beta*z ; '''
This should be a docstring: use """.
done
# See the License for the specific language governing permissions and
# limitations under the License.

''' from wikipedia: dx/dt = sigma*(y-x) ; dy/dt = x*(rho-z)-y dz/dt = x*y-beta*z ; '''
Same as above.
done
@@ -0,0 +1,5 @@
Metrics,ans-ans2
euclidean2_paired_distance_euclidean,8.15320610438
This is not standard.
Removed. It is not needed, and it uses the old output format.
First round of review performed @wangcj05
This PR is ready for your review @alfoa
framework/Metrics/DTW.py
Outdated
elif axis == 1:
  assert x.shape[1] == y.shape[1], "The second dimension of first input is not the same as the second dimension of second input"
I would add the "error" message as a comment for future developers (just in case).
added
The syntax is invalid... everything is failing.
Fixed. I found that we should be careful with parentheses. An assert without parentheses is fine, but if we wrap the assert in parentheses, the message string must not go inside them: the condition and the message then form a non-empty tuple, which always evaluates to True.
In our code, in most cases, we do not use parentheses in assert statements.
In my case, I always use parentheses and never use a message string, just the comparison. See the examples in the data objects.
Since asserts are exclusively for developers, I often put any comments that would usually go in a string in a comment above the assert, assuming the developer will go check the assert.
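The pitfall discussed above can be reproduced in a few lines. This is an illustrative sketch only; the function names are hypothetical and not taken from the PR. The tuple is built on a separate line because writing `assert (cond, "msg")` directly makes recent CPython versions emit a SyntaxWarning about an assertion that is always true:

```python
def tuple_form(x):
    # Equivalent to `assert (x > 0, "x must be positive")`:
    # the parentheses build a two-element tuple (condition, message).
    condition_and_message = (x > 0, "x must be positive")
    # A non-empty tuple is always truthy, so this assert can never fail.
    assert condition_and_message
    return True


def statement_form(x):
    # Correct form: condition and message are separated by a comma
    # with no enclosing parentheses around the pair.
    assert x > 0, "x must be positive"
    return True
```

Calling `tuple_form(-1)` returns normally, while `statement_form(-1)` raises an AssertionError carrying the message.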
Job Mingw Test on 6aa47e6 : invalidated by @alfoa
@wangcj05 the docstrings have not been fixed everywhere.
@alfoa The linux test failed due to a timeout in the distribution test.
According to Test qsubs:
Job Test linux on de8412f : invalidated by @wangcj05 (timeout)
Job Test qsubs on de8412f : invalidated by @wangcj05
doc/user_manual/metrics.tex
Outdated
</SKL>
<SKL name="mean_squared_error">
<metricType>regression|mean_squared_error</metricType>
<sample_weight>[0.1,0.1,0.1,0.05,0.05]</sample_weight>
Why not just a comma-separated expression?
Why are you forcing the user to know the syntax of a Python list?
I can change that. The main reason is to stay consistent with scikit-learn. As I observed in several other places, such as ROM and data mining, we follow the scikit-learn input format.
Can you point me to the other places where we do this? This should be changed. Let's begin by changing it here.
For example, in the Lasso linear model, the node <alphas> accepts a numpy array, as mentioned in our manual. You can also check the DataMining manual: we do not convert the format of the input to the scikit-learn functions, we just pass the variables to the functions. I see several advantages to this:
- We do not need to change the format of the input; otherwise we would need to add a lot of if conditions in our code.
- Only the manual needs to be updated when there is a change in scikit-learn.
Personally, I think we may need to keep the same format as scikit-learn.
Congjian, your concern would have made sense before the introduction of InputData.
In InputData we have a data type called FloatListType (useful in this particular case) that performs the conversion automatically, without any if statement, provided the variable linked to it is correctly defined.
Since we introduced this system, we have to exploit it to make the user's life easier and our input checking more robust.
Andrea
I got it; I think I misunderstood your question at the beginning. It seems I do use InputData with FloatListType in the code. I will update the test and manual.
Some updates on this issue: 1) in the metric PP, I do use FloatListType; 2) in the Metric system, InputData is not used, and the old way of reading the XML is used.
Now this PR needs a little rework to employ InputData to parse the input XML file.
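To illustrate the kind of conversion a float-list input type performs, here is a minimal sketch. It is not RAVEN's FloatListType implementation; the helper name `parse_float_list` is hypothetical and only mirrors the behaviour described in the thread (a comma-separated string becomes a list of floats, with no Python list syntax required from the user):

```python
def parse_float_list(text):
    """Hypothetical helper: convert 'a, b, c' into a list of floats."""
    # float() tolerates surrounding whitespace; empty items are skipped
    return [float(item) for item in text.split(",") if item.strip()]


# The user writes plain comma-separated values in the XML node
weights = parse_float_list("0.1, 0.1, 0.1, 0.05, 0.05")
```

With this approach a value written as a Python list, such as `[0.1,0.2]`, is rejected with a ValueError, so malformed input is caught at parsing time rather than inside the metric.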
for j in range(i+1,cardinality):
  self.normValues[i][j] = metric.distance(tdictNorm[keys[i]],tdictNorm[keys[j]])
  self.normValues[j][i] = self.normValues[i][j]
for j in range(i,cardinality):
These loops will be super slow. Don't you have another way?
I did add a FIXME at line 238. This is a current issue in our devel branch. I do not have a good solution for it now. I would like to explore this issue in a future PR, when I start adding more metrics. This may change, since I plan to use the data object directly instead of a dictionary here.
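One possible direction, sketched under the assumption that the metric is a plain Euclidean distance on 2D arrays (a generic `metric.distance` callable cannot be vectorized this way). The function names are hypothetical and this is not the code from the PR; it compares the double loop with NumPy broadcasting:

```python
import numpy as np


def pairwise_loop(data):
    """Symmetric distance matrix filled with an explicit double loop."""
    n = data.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = np.sqrt(np.sum((data[i] - data[j]) ** 2))
            dist[j, i] = dist[i, j]
    return dist


def pairwise_vectorized(data):
    """Same matrix computed with broadcasting, without Python loops."""
    # diff[i, j, :] = data[i] - data[j]
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


rng = np.random.default_rng(0)
samples = rng.random((50, 3))  # 50 realizations, 3 parameters
```

The broadcast version trades memory for speed: the intermediate array has shape (n, n, numParameters), so it may need to be processed in chunks for a large number of realizations.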
"""
def initialize(self,inputDict):
  availMetrics ={}
  # regression metrics
And what about the classification ones?
This will be added after this PR.
@alfoa InputData is now used in the Metric system, and I think I have resolved all your comments. Please let me know if you have any more comments on the modifications.
framework/Metrics/DTW.py
Outdated
for child in paramInput.subparts:
  if child.getName() == "order":
    self.order = child.value
    if self.order not in [0,1]:
You do not need this check.
You should use the enumerator type instead of the IntegerType, so no check is required: it is done directly in InputData.
changed.
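The idea of letting the input type reject invalid values can be sketched with Python's standard `enum` module. This is not RAVEN's InputData enumerator type; the class `DTWOrder`, the helper `parse_order` and the labels are hypothetical and chosen only for illustration:

```python
from enum import Enum


class DTWOrder(Enum):
    """Hypothetical enumeration of the allowed DTW orders."""
    CLASSICAL = "classical"
    DERIVATIVE = "derivative"


def parse_order(text):
    # Enum lookup by value raises ValueError for anything not listed,
    # so no separate `if order not in [...]` check is needed afterwards.
    return DTWOrder(text.strip().lower())


order = parse_order("Classical")
```

Validation then lives in one place (the type definition), and the code that consumes `order` can assume it is one of the listed members.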
doc/user_manual/metrics.tex
Outdated
@@ -20,7 +20,7 @@ \section{Metrics}

In addition to this XML subnode, the users can also specify the weights for given metric:
\begin{itemize}
- \item \xmlNode{w}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
+ \item \xmlNode{w}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}
comma separated floats or comma separated float values
fixed
doc/user_manual/metrics.tex
Outdated
@@ -31,7 +31,7 @@ \section{Metrics}

In addition to this XML subnode, the users can also specify the weights for given metric:
\begin{itemize}
- \item \xmlNode{sample\_weight}, \xmlDesc{python list, optional parameter}, the weights for each value in \textit{u}
+ \item \xmlNode{sample\_weight}, \xmlDesc{comma separated float, optional parameter}, the weights for each value in \textit{u}
Same as above.
fixed
doc/user_manual/metrics.tex
Outdated
@@ -411,7 +411,7 @@ \subsection{Dynamic Time Warping}
\begin{itemize}
\item \xmlNode{order}, \xmlDesc{int, required field}, order of the DTW calculation: $0$ specifices a classical DTW caluclation and $1$ specifies
I honestly do not get why we need an integer here to specify an enumeration list.
Can we change this to classical and derivative?
@mandd, in addition, can we add a bit more info about what this option is supposed to trigger?
doc/user_manual/postprocessor.tex
Outdated
@@ -1803,6 +1803,8 @@ \subsubsection{Metric}
\textbf{mean, max, min, raw\_values} over the time. For example, when `mean' is used, the metrics' calculations
will be averaged over the time. When `raw\_values' is used, the full set of metrics' calculations will be dumped.
\default{raw\_values}
\item \xmlNode{weight}, \xmlDesc{comma separated float, optional field}, when `mean' is provided for \xmlNode{multiOutput},
comma separated floats or comma separated float values
fixed.
For Change Control Board: Change Request Review
The following review must be completed by an authorized member of the Change Control Board.
Job Mingw Test on a41e20d : invalidated by @wangcj05
@alfoa Tests are green now.
@alfoa Can you take a look at this PR? Based on my understanding, you have already reviewed it, and the tests are green. Do you have additional comments?
Tests pass... Checklist passed... Issue closure list passed... Merging...
Pull Request Description
What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)
close #333
What are the significant changes in functionality due to this change request?
Time-dependent metrics system is implemented.
For Change Control Board: Change Request Review
The following review must be completed by an authorized member of the Change Control Board.
<internalParallel>
to True.