MergeRuns sum option and error control #19076

Merged
merged 22 commits into from Mar 29, 2017

Conversation

Projects
None yet
3 participants
@ianbush
Member

ianbush commented Mar 7, 2017

Adds the following changes:

  1. Sum merge type - this allows a sample log to be simply summed
  2. Sample logs are now overwritten, and not renamed
  3. Option to skip a file if binning is different
  4. Option to throw an error instead of skipping

To test:

Load 2 data files that make sense to merge.

  1. Try setting a sum property in MergeRuns, either via the algorithm or instrument parameters file, check the output is as expected
  2. Sample logs should be overwitten with new type, should be noticable for TimeSeries and List options. Check a TimeSeries/List is created instead of any existing numeric type. Check merging two workspaces that were already merged once also works.
  3. Rebin one workspace, check that workspace is skipped in the merge
  4. Check the algorithm throws when using RebinBehaviour=Skip and FailBehaviour=Stop. Also check for the SampleLogsFail option and FailBehaviour=Stop.

Also try a few unscripted tests, and check code looks good.

Fixes #19042
Fixes #17520


Reviewer

Please comment on the following (full description):

Code Review
  • Is the code of an acceptable quality?
  • Does the code conform to the coding standards? Is it well structured with small focussed classes/methods/functions?
  • Are there unit/system tests in place? Are the unit tests small and test the a class in isolation?
  • If there are changes in the release notes then do they describe the changes appropriately?
Functional Tests
  • Do changes function as described? Add comments below that describe the tests performed?

  • How do the changes handle unexpected situations, e.g. bad input?

  • Has the relevant documentation been added/updated?

  • Is user-facing documentation written in a user-friendly manner?

  • Has developer documentation been updated if required?

  • Does everything look good? Comment with the ship it emoji but don't merge. A member of @mantidproject/gatekeepers will take care of it.

@ianbush ianbush added this to the Release 3.10 milestone Mar 7, 2017

@ianbush ianbush requested a review from soininen Mar 7, 2017

};
SampleLogsBehaviour(API::MatrixWorkspace &ws, Kernel::Logger &logger,
+ const std::string sampleLogsSum,

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Is there a reason why not to pass the strings as references here (and in a few places below)?

@soininen

soininen Mar 7, 2017

Contributor

Is there a reason why not to pass the strings as references here (and in a few places below)?

This comment has been minimized.

@ianbush

ianbush Mar 8, 2017

Member

This is now fixed.

@ianbush

ianbush Mar 8, 2017

Member

This is now fixed.

Framework/Algorithms/src/MergeRuns.cpp
@@ -40,6 +46,9 @@ void MergeRuns::init() {
declareProperty(make_unique<WorkspaceProperty<Workspace>>(
"OutputWorkspace", "", Direction::Output),
"Name of the output workspace");
+ declareProperty(
+ "SampleLogsSum", "",
+ "A comma separated list of the sample logs to sum into a single entry.");

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Adding this property here will break the algorithm's Python signature, no?

@soininen

soininen Mar 7, 2017

Contributor

Adding this property here will break the algorithm's Python signature, no?

This comment has been minimized.

@ianbush

ianbush Mar 8, 2017

Member

Indeed it would, I have moved this so it should not break the signature now.

@ianbush

ianbush Mar 8, 2017

Member

Indeed it would, I have moved this so it should not break the signature now.

Framework/Algorithms/src/MergeRuns.cpp
+ g_log.error() << "Could not merge run: " << it->get()->getName()
+ << ". Binning is different from first workspace. "
+ "MergeRuns will continue but this run will be "
+ "skipped.";

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Missing \n at the end of the error.

@soininen

soininen Mar 7, 2017

Contributor

Missing \n at the end of the error.

Framework/Algorithms/src/MergeRuns.cpp
+ g_log.error()
+ << "Could not merge run: " << it->get()->getName()
+ << ". Reason: \"" << e.what()
+ << "\". MergeRuns will continue but this run will be skipped.";

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Missing \n.

@soininen

soininen Mar 7, 2017

Contributor

Missing \n.

@@ -181,12 +181,13 @@ void SampleLogsBehaviour::setSampleMap(SampleLogsMap &map,
continue;
}
- // Check 4: Can the property can be converted to a double? If not, and time
- // series case, log an error but continue.
+ // Check 4: Can the property can be converted to a double? If not, and sum

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Extra 'can'?

@soininen

soininen Mar 7, 2017

Contributor

Extra 'can'?

timeSeriesProp->addValue(startTime, value);
+ ws.mutableRun().removeLogData(item);

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

I wonder if you could just

ws.mutableRun().addProperty(std::move(timeSeriesProp), true); // True for overwrite.

and be done with removeLogData/addLogData.

@soininen

soininen Mar 7, 2017

Contributor

I wonder if you could just

ws.mutableRun().addProperty(std::move(timeSeriesProp), true); // True for overwrite.

and be done with removeLogData/addLogData.

+
+ if (returnProp->type().compare("string") != 0) {
+ ws.mutableRun().removeLogData(item);
+ ws.mutableRun().addProperty(item, value);

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

I think you can set the 'overwrite' parameter of addProperty() to true to get rid of the removeLogData() call above.

@soininen

soininen Mar 7, 2017

Contributor

I think you can set the 'overwrite' parameter of addProperty() to true to get rid of the removeLogData() call above.

+ // See if property exists as a TimeSeriesLog already - merging an output of
+ // MergeRuns
+ ws.run().getTimeSeriesProperty<double>(item);
+ returnProp = std::shared_ptr<Property>(ws.getLog(item)->clone());

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Would

returnProp.reset(ws.getLog(item)->clone());

save a construction and an assignment?

@soininen

soininen Mar 7, 2017

Contributor

Would

returnProp.reset(ws.getLog(item)->clone());

save a construction and an assignment?

This comment has been minimized.

@ianbush

ianbush Mar 8, 2017

Member

Yes, I have changed this elsewhere where the same pattern occurs too.

@ianbush

ianbush Mar 8, 2017

Member

Yes, I have changed this elsewhere where the same pattern occurs too.

+ * @param name the name of the property
+ */
+void SampleLogsBehaviour::updateSumProperty(double addeeWSNumber,
+ double outWSNumber,

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

The argument names bring to my mind runNumbers. Would valueInAddeeWS or something similar be better?

@soininen

soininen Mar 7, 2017

Contributor

The argument names bring to my mind runNumbers. Would valueInAddeeWS or something similar be better?

This comment has been minimized.

@ianbush

ianbush Mar 8, 2017

Member

I wanted to convey it is a number not, say, a string value. I have gone for addeeWSNumericValue.

@ianbush

ianbush Mar 8, 2017

Member

I wanted to convey it is a number not, say, a string value. I have gone for addeeWSNumericValue.

+ double outWSNumber,
+ MatrixWorkspace &outWS,
+ const std::string name) {
+ outWS.mutableRun().addProperty(name, addeeWSNumber + outWSNumber, name, true);

This comment has been minimized.

@soininen

soininen Mar 7, 2017

Contributor

Why name is here twice? Isn't this the

addProperty(const std::string &name, const TYPE &value,
                   const std::string &units, bool overwrite = false)

signature, in which case you are giving quite a strange units for the property.

@soininen

soininen Mar 7, 2017

Contributor

Why name is here twice? Isn't this the

addProperty(const std::string &name, const TYPE &value,
                   const std::string &units, bool overwrite = false)

signature, in which case you are giving quite a strange units for the property.

This comment has been minimized.

@ianbush

ianbush Mar 8, 2017

Member

Whoops, yes, that would have some strange units!

@ianbush

ianbush Mar 8, 2017

Member

Whoops, yes, that would have some strange units!

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 7, 2017

Contributor

An example on how to use the sum functionality in instrument parameter files could be added to the documentation, as well.

Contributor

soininen commented Mar 7, 2017

An example on how to use the sum functionality in instrument parameter files could be added to the documentation, as well.

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 8, 2017

Member

I have pushed the code changes, and also added a doc test for the sum option.

Member

ianbush commented Mar 8, 2017

I have pushed the code changes, and also added a doc test for the sum option.

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 8, 2017

Member

@soininen I realise I have missed the point on the doc changes - I will add the instrument parameter file bits for the sum option too.

Member

ianbush commented Mar 8, 2017

@soininen I realise I have missed the point on the doc changes - I will add the instrument parameter file bits for the sum option too.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 10, 2017

Contributor

The algorithm does a bit too strict checking: if I list a sample log entry such as 'sample.temperature' in SampleLogsTimeSeries and SampleLogsWarn at the same time, I get:

Error when making list of merge items, sample log "sample.temperature" defined more than once!

Here, I would like to merge the temperatures as time series but at the same time make sure the values do not differ too much.

Contributor

soininen commented Mar 10, 2017

The algorithm does a bit too strict checking: if I list a sample log entry such as 'sample.temperature' in SampleLogsTimeSeries and SampleLogsWarn at the same time, I get:

Error when making list of merge items, sample log "sample.temperature" defined more than once!

Here, I would like to merge the temperatures as time series but at the same time make sure the values do not differ too much.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 10, 2017

Contributor

Why running this:

mantid.config.appendDataSearchDir('/net4/serdon/illdata/162/in4/exp_4-01-1529/rawdata/')

ws1 = Load('086490')
ws2 = Load('086491')

merged = MergeRuns('ws1, ws2', SampleLogsTimeSeries='sample.temperature')

gives this warning:

TimeSeriesProperty sample.temperature could not be added to another property of the same name but incompatible type.

?

Contributor

soininen commented Mar 10, 2017

Why running this:

mantid.config.appendDataSearchDir('/net4/serdon/illdata/162/in4/exp_4-01-1529/rawdata/')

ws1 = Load('086490')
ws2 = Load('086491')

merged = MergeRuns('ws1, ws2', SampleLogsTimeSeries='sample.temperature')

gives this warning:

TimeSeriesProperty sample.temperature could not be added to another property of the same name but incompatible type.

?

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 10, 2017

Member

The new restriction is there because sum, time series and list now can not be used together. It does make sense to allow fail/warn in combination with these so I try and loosen that restriction.

I'll take a look at that warning.

Member

ianbush commented Mar 10, 2017

The new restriction is there because sum, time series and list now can not be used together. It does make sense to allow fail/warn in combination with these so I try and loosen that restriction.

I'll take a look at that warning.

ianbush added some commits Mar 10, 2017

Refs #19042 Add tests for merging options
Also fixed an issue with removing and re-adding the properties for a time series merge.
@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 14, 2017

Member

@soininen The changes we discussed are here now, you can use sum, time series and list merge in combination with warn/fail options. The warning is gone too. Please try testing lots of permutations again and let me know if things look good now.

Member

ianbush commented Mar 14, 2017

@soininen The changes we discussed are here now, you can use sum, time series and list merge in combination with warn/fail options. The warning is gone too. Please try testing lots of permutations again and let me know if things look good now.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 15, 2017

Contributor

Where does the Duration info in time series come from? When running this:

wss = Load(Filename = '/net4/serdon/illdata/162/in4/exp_4-01-1529/rawdata/086490:086491')
merged = MergeRuns(InputWorkspaces='086490, 086491', SampleLogsTimeSeries='sample.temperature')

I get Duration = 1800s for the 'sample.temperature' time series, but neither workspace has such duration. In fact, both input workspaces have 'duration' = 0, and the real duration is in the actual_time log.

There is also a strange warning when highlighting the 'sample.temperature' log in the Sample log window for the merged workspace:
Could not find log value running in workspace

Contributor

soininen commented Mar 15, 2017

Where does the Duration info in time series come from? When running this:

wss = Load(Filename = '/net4/serdon/illdata/162/in4/exp_4-01-1529/rawdata/086490:086491')
merged = MergeRuns(InputWorkspaces='086490, 086491', SampleLogsTimeSeries='sample.temperature')

I get Duration = 1800s for the 'sample.temperature' time series, but neither workspace has such duration. In fact, both input workspaces have 'duration' = 0, and the real duration is in the actual_time log.

There is also a strange warning when highlighting the 'sample.temperature' log in the Sample log window for the merged workspace:
Could not find log value running in workspace

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 15, 2017

Contributor

The algorithm gives an error, if I want to merge nonexistent logs as time series when giving the logs as input properties. This is all fine, but it seems that the algorithm silently ignores nonexistent logs when they are specified in the IPF. Maybe a warning should be shown in this case?

Contributor

soininen commented Mar 15, 2017

The algorithm gives an error, if I want to merge nonexistent logs as time series when giving the logs as input properties. This is all fine, but it seems that the algorithm silently ignores nonexistent logs when they are specified in the IPF. Maybe a warning should be shown in this case?

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 15, 2017

Contributor

In my IPF I have this:

<parameter name="sample_logs_warn_tolerances" type="string">
	<value val="kaljaa" />
</parameter>

This kills the algorithm, since "kaljaa" is not a number, but the error message is cryptic:

stod
Contributor

soininen commented Mar 15, 2017

In my IPF I have this:

<parameter name="sample_logs_warn_tolerances" type="string">
	<value val="kaljaa" />
</parameter>

This kills the algorithm, since "kaljaa" is not a number, but the error message is cryptic:

stod
@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 15, 2017

Member

In the time series logs actual date-time is used, taken from start_time in the workspace. The files seem to be 30 minutes apart so the result is as I would expect.

Member

ianbush commented Mar 15, 2017

In the time series logs actual date-time is used, taken from start_time in the workspace. The files seem to be 30 minutes apart so the result is as I would expect.

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 15, 2017

Member

I have just pushed changes to give better error messages when tolerances can not be converted to doubles, and supressed the warning when looking at time series logs (the running log is something specific to ISIS).

When trying to merge a non-existent time series (or other) log in the IPF you should see Could not merge sample log "kaljaa", does not exist in workspace! This sample log will be ignored. Do you not get that?

Member

ianbush commented Mar 15, 2017

I have just pushed changes to give better error messages when tolerances can not be converted to doubles, and supressed the warning when looking at time series logs (the running log is something specific to ISIS).

When trying to merge a non-existent time series (or other) log in the IPF you should see Could not merge sample log "kaljaa", does not exist in workspace! This sample log will be ignored. Do you not get that?

ianbush added some commits Mar 15, 2017

@soininen soininen self-assigned this Mar 17, 2017

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 17, 2017

Contributor

I do get the error message when I mention a non-existent log for MergeRuns.

Contributor

soininen commented Mar 17, 2017

I do get the error message when I mention a non-existent log for MergeRuns.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 17, 2017

Contributor

This might be something to consider for another issue/PR, but I'll leave it here anyway:

I'm wondering if we could handle the 'start_time' and 'end_time' logs in a graceful way, as well. Preferably one would choose the minimum (earliest date/time) of 'start_time's and the maximum (last date/time) of 'end_time's for the merged workspace. These are string logs, and though they seems to be in some standard format, I don't know if their values can be parsed reliably.

Contributor

soininen commented Mar 17, 2017

This might be something to consider for another issue/PR, but I'll leave it here anyway:

I'm wondering if we could handle the 'start_time' and 'end_time' logs in a graceful way, as well. Preferably one would choose the minimum (earliest date/time) of 'start_time's and the maximum (last date/time) of 'end_time's for the merged workspace. These are string logs, and though they seems to be in some standard format, I don't know if their values can be parsed reliably.

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 20, 2017

Member

Yes, let's leave the start and end times for a separate PR, that will need some thought. Parsing the values is not an issue, that is done to construct the time series. Depending on how you treat other information you might really want a vector of start/end times to account for all the data.

Member

ianbush commented Mar 20, 2017

Yes, let's leave the start and end times for a separate PR, that will need some thought. Parsing the values is not an issue, that is done to construct the time series. Depending on how you treat other information you might really want a vector of start/end times to account for all the data.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 23, 2017

Contributor

@ianbush The algorithm seem to happily accept negative fail and warn tolerances.

Contributor

soininen commented Mar 23, 2017

@ianbush The algorithm seem to happily accept negative fail and warn tolerances.

@ianbush

This comment has been minimized.

Show comment
Hide comment
@ianbush

ianbush Mar 23, 2017

Member

@soininen - an error is now thrown if a negative tolerance is given.

Member

ianbush commented Mar 23, 2017

@soininen - an error is now thrown if a negative tolerance is given.

@soininen

This comment has been minimized.

Show comment
Hide comment
@soininen

soininen Mar 28, 2017

Contributor

Tested this with different combinations of sums, time_series, warnings, failings and tolerances. Everything seems to work as expected. The code changes look good to me, as well. Nice work! :shipit:

Contributor

soininen commented Mar 28, 2017

Tested this with different combinations of sums, time_series, warnings, failings and tolerances. Everything seems to work as expected. The code changes look good to me, as well. Nice work! :shipit:

@peterfpeterson peterfpeterson merged commit 434df37 into master Mar 29, 2017

9 checks passed

ClangFormat Jenkins build pull_requests-clang-format 12274 has succeeded
Details
Doxygen Jenkins build pull_requests-doxygen 11660 has succeeded
Details
Flake8 Jenkins build pull_requests-flake8 2976 has succeeded
Details
OSX Jenkins build pull_requests-osx 12885 has succeeded
Details
RHEL7 + System Tests Jenkins build pull_requests-rhel7 12713 has succeeded
Details
Ubuntu + Doc Tests Jenkins build pull_requests-ubuntu 13321 has succeeded
Details
Ubuntu Python 3 Jenkins build pull_requests-ubuntu-python3 863 has succeeded
Details
Windows Jenkins build pull_requests-win7 13582 has succeeded
Details
cppcheck Jenkins build pull_requests-cppcheck 13272 has succeeded
Details

@peterfpeterson peterfpeterson deleted the 19042_MergeRuns_sum_option branch Mar 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment