257 Adding averaging time using timeit #285
Conversation
The problem with where the default for `num_runs` is defined comes back to our discussions around where default options should be defined in general - it is not necessarily a big concern for this PR, but if there are any other PRs where we might be changing the default options, it would be a good idea to refactor the default definitions first.
docs/source/users/options.rst

``num_runs``
-------------------

Number of runs is used to define how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Suggested change:
Number of runs defines how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Corrected
""" | ||
Fit benchmark one problem, with one function definition and all | ||
the selected minimizers, using the chosen fitting software. | ||
|
||
@param controller :: The software controller for the fitting | ||
@param minimizers :: array of minimizers used in fitting | ||
@param num_runs :: number of times controller.fit() is run to |
I think the docstring standardisation should include types for parameters (see #236), particularly if functions are part of the API (which this might be). Probably a good opportunity to add types for existing parameters as well.
I have gone through fitbenchmarking/fitbenchmark_one_problem.py to standardise the docstrings. I was not sure what to do with multiple return arguments, so let me know what you think of how I did it.
It's a bit of a tricky one using the sphinx docstring style, as it doesn't deal with multiple return types as well as the numpy style. However I would be inclined to either put `:rtype :: tuple` and then define the individual types within `:return ::` (as is already the case), or at least put brackets around what you currently have (so `:rtype :: (list of FittingResult, plot_helper.data instance)`) - the reason for this is to differentiate it from the situation where you actually have a single object returned but the type can vary (e.g. if you had a conditional which returned `list` and the `else` branch returned `str`). What do you think?
I think `:rtype :: (list of FittingResult, plot_helper.data instance)` looks the best option for this; it makes it clearer.
Sounds good to me.
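For illustration, a docstring in the agreed style might look like the sketch below. The function name, signature and wording here are hypothetical stand-ins rather than the actual code in the PR; only the `:return ::`/`:rtype ::` form follows the discussion above.

```python
def fit_one_problem(controller, minimizers, num_runs):
    """
    Fit one problem with each of the selected minimizers.

    @param controller :: the software controller for the fitting
    @param minimizers :: array of minimizers used in fitting
    @param num_runs :: number of times controller.fit() is run to
                       calculate an average runtime

    :return :: the fitting results for each minimizer and the data
               needed to plot the best fit
    :rtype :: (list of FittingResult, plot_helper.data instance)
    """
```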
""" | ||
Sets up the controller for a particular problem and fits the models | ||
provided in the problem object. The best fit, along with the data and a | ||
starting guess is then plotted on a visual display page. | ||
|
||
@param user_input :: all the information specified by the user | ||
@param problem :: a problem object containing information used in fitting | ||
@param num_runs :: number of times controller.fit() is run to |
Same as comment about docstrings below.
See above
start_time = time.time()
controller.fit()
end_time = time.time()
runtime = timeit.timeit(controller.fit, number=num_runs) / num_runs
Returning from `timeit.timeit` here doesn't reveal if any of the calls to `controller.fit` have been cached. While this may not be likely, it is possible that we could inadvertently cache the result from the controllers we have implemented, and definitely possible that a contributor might make this mistake - this would result in significantly reduced average runtimes. I tested this with a toy problem to confirm.

`timeit.repeat` returns a list of n repeats (each of which is the sum of `number` executions, so you might want to set this to `1`, rather than the default of `1000000`). This could then be used to suggest if caching has occurred, which could result in a warning. It would also allow the uncertainty on the runtime to be estimated, which might be a useful additional property for a user to have access to (although probably not by default) - not for this PR but this would at least make it easier to implement this in the future.
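A minimal sketch of what that could look like, reusing the `controller` and `num_runs` names from the diff above (the wrapper function `benchmark_fit` is hypothetical, added only to make the snippet self-contained):

```python
import timeit


def benchmark_fit(controller, num_runs):
    """Return the per-run times and their average for controller.fit()."""
    # repeat=num_runs gives one entry per run; number=1 means each entry
    # is the elapsed time of a single call rather than the sum of the
    # default 1000000 executions.
    runtime_list = timeit.repeat(controller.fit, number=1, repeat=num_runs)
    return runtime_list, sum(runtime_list) / num_runs
```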
Good point, I checked the timings using `timeit.timeit` and `timeit.repeat` and they seem to be pretty much the same. However I have changed the code to use `timeit.repeat`.
Yeah they should be exactly the same without caching - this just allows us to catch this. So maybe we should provide a warning if the stdev is 'large'? I don't really mind what is determined to be 'large'.
I've added a check to see if the standard deviation is high:

fitbenchmarking/fitbenchmarking/fitbenchmark_one_problem.py, lines 146 to 150 in 1c31a9d:
cv = np.std(runtime_list) / np.mean(runtime_list)
if cv > 0.2:
    raise Warning('The ratio of the standard deviation and mean'
                  ' is larger than {}, this may indicate caching'
                  ' is used in the timing results.'.format())
Is 0.2 an OK measure, do you think?
start_time = time.time()
controller.fit()
end_time = time.time()
runtime = timeit.timeit(controller.fit, number=num_runs) / num_runs
except Exception as e:
    print(e.message)
Will all `e` have `message` attributes? Otherwise this could raise an `AttributeError`. Do we want to create an issue for this? Does it have associated tests for exceptional behaviour?
I thought it did, but I'm not 100% sure. At the moment there are no specific tests for this file other than testing that the example scripts run.
Ah ok, I've found the problem - `Exception.message` was deprecated in Python 2.6, so while it still works in 2.7 it is undocumented, and in 3+ it is unsupported. The new form is that casting the `Exception` to `str` returns the message, so this should be changed to `print(str(e))` - do you want to do that here or would you prefer me to create a separate issue?
I think it's best if we do that here, otherwise it might be put at the bottom of the issue pile in the future - do you mind changing it?
No problems, I've done it and made it PEP 8 compliant (albeit in one case by disabling a pylint warning, although I think it is probably justified here).
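For reference, a minimal standalone demonstration of the change (not the PR code itself), showing that casting to `str` recovers the message without touching the deprecated attribute:

```python
# Raise and catch a throwaway exception purely to illustrate the point.
try:
    raise ValueError('fit failed')
except Exception as e:
    # Portable across Python 2.7 and 3+, unlike the deprecated e.message
    print(str(e))  # -> fit failed
```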
""" | ||
Gather the user input and list of paths. Call benchmarking on these. | ||
|
||
@param group_name :: is the name (label) for a group. E.g. the name for the group of problems in |
If it's not too irritating, I'd suggest adding types for these parameters and the return.
done
minimizers, software = misc.get_minimizers(software_options)
num_runs = software_options.get('num_runs', None)

if num_runs is None:
This seems like a lot of work just to set a single variable - I could be wrong but it appears that some of this is redundant. Could you please explain the process you are following to set num_runs?

Also it appears that the default of `5` for num_runs gets set in two places: here and in the default options json. This probably shouldn't be the case, as it might produce unexpected behaviour for a user if they created their own default without `num_runs` (or with `num_runs=None`).
At the moment, users can define a `software_options` as a dictionary. If `num_runs` is not defined in the dictionary then `fitbenchmarking.utils.options` is used to figure out the default option from the json file. This is similar to how the comparison_mode is set:

fitbenchmarking/fitbenchmarking/results_output.py, lines 38 to 50 in 3685404:
minimizers, software = misc.get_minimizers(software_options)
comparison_mode = software_options.get('comparison_mode', None)
if comparison_mode is None:
    if 'options_file' in software_options:
        options_file = software_options['options_file']
        comparison_mode = options.get_option(options_file=options_file,
                                             option='comparison_mode')
    else:
        comparison_mode = options.get_option(option='comparison_mode')
    if comparison_mode is None:
        comparison_mode = 'both'
I have removed the second default option of 5 for num_runs.
Ok, so I think I've figured out why this is confusing me: line 42 can only be `True` if `software_options['num_runs'] is None`. It seems like this would be a strange situation to occur. However if it did occur, then line 51 always sets `options_file` to `None` (based on the excerpt above I think your addition should be `options_file = software_options['options_file']`, rather than `software_options['num_runs']`).
Ah, and if you follow the excerpt above, then your line 50 needs to be `if 'options_file'...` rather than `if 'num_runs'...`
Done
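Following the comparison_mode excerpt quoted above, the corrected handling of `num_runs` presumably ends up along these lines. This is a sketch of the pattern rather than the exact diff; the wrapper function `get_num_runs` is hypothetical and is only there to make the snippet self-contained.

```python
from fitbenchmarking.utils import options


def get_num_runs(software_options):
    """Resolve num_runs, falling back to the default options json."""
    num_runs = software_options.get('num_runs', None)
    if num_runs is None:
        if 'options_file' in software_options:
            options_file = software_options['options_file']
            num_runs = options.get_option(options_file=options_file,
                                          option='num_runs')
        else:
            num_runs = options.get_option(option='num_runs')
    return num_runs
```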
return prob_results, results_dir


def _benchmark(user_input, problem_group, num_runs=5):
Is this another place where the default for `num_runs` is specified? This one should probably go as it is a 'private' function.
`num_runs` is defined in the json file, so I have now removed this from this function.
@param user_input :: all the information specified by the user
Again, types if possible
I've updated the docstrings.
parsed_problems = [parse.parse_problem_file(p) for p in problem_group]
if not isinstance(user_input, list):
Not a problem with this PR, but this code could really do with a refactor - do you know if there is an issue for this?
I don't think there is an issue specifically for refactoring the code; as Andrew cleans things up (such as the fitting and parsing) he is refactoring those places.

Just before you make any changes: I've done some refactoring for PEP 8 compliance which I need to commit - it's not letting me push at the moment so I just need to work out why.
if not controller.success:
    chi_sq = np.nan
    status = 'failed'
else:
    cv = np.std(runtime_list) / np.mean(runtime_list)
Apologies, as my original suggestion was a bad metric for determining whether caching has occurred. It would be better to test the ratio of the max to the min (as, if a large number of runs were cached, the stdev could be low despite a single outlier). I've had a look at the timeit source code for CPython and the ratio they use is 4, so we could pick this to be consistent.

Also I think rather than raising the Warning maybe we should use `warnings.warn`, so just:

warnings.warn('The ratio of the max time to the min is larger than {}, which may indicate that caching has occurred in the timing results'.format(tolerance))
Updated
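The updated check presumably ends up along the lines of the sketch below, using the suggested max/min ratio with a tolerance of 4. This is an illustration rather than the exact diff; the function name `average_runtime` is hypothetical and is only there to keep the example self-contained.

```python
import warnings

import numpy as np

TOLERANCE = 4  # max/min ratio suggested above, matching CPython's timeit CLI


def average_runtime(runtime_list):
    """Return the mean runtime, warning if the timings look cached."""
    if np.max(runtime_list) > TOLERANCE * np.min(runtime_list):
        warnings.warn('The ratio of the max time to the min is larger than'
                      ' {}, which may indicate that caching has occurred in'
                      ' the timing results'.format(TOLERANCE))
    return np.mean(runtime_list)
```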
@@ -89,7 +88,15 @@ def main(argv):

# Processes software_options dictionary into Fitbenchmarking format
minimizers, software = misc.get_minimizers(software_options)

num_runs = software_options.get('num_runs', None)
Does this pattern need to be included here as well as in `fitbenchmark_group`? Is it not dealt with there if the user doesn't define it correctly here? If it is required then maybe it would be worth adding a comment or two explaining what it does?
From our meeting on Friday I thought the idea was to remove the expert script, so this would not be a problem?
Yes, that's fine.
I discovered these warnings when running the problems. While @wathen did not experience the same warnings, @AndrewLister-STFC had the same while running on a clean cloud instance. The origin of this behaviour (max run time > 4 x min run time) is not obvious, and for the vast majority of problem/minimizer combinations the ratio is < 2. We decided that the current determination of anomalous run times (max/min ratio) is appropriate, and this warning should occur. We recommend that the html cell for each timing result (which will be the average) is hyperlinked to the individual results. This has the added benefit that the user can calculate the uncertainty on the runtimes. NB: For context, the max/min ratio is the same criterion that the timeit source itself uses.
Check the accuracy tables are still the same
Accuracies are consistent with master.
New timing results are consistent between runs
Other than the timing variations discussed in the previous comment, the timings are consistent.
Function: Does the change do what it's supposed to?
Yes, tabulated runtimes are now the average of n runs (default 5). Rather than discarding the highest and lowest (as suggested in #257), the ratio of these is calculated and the user is warned if this is > some tolerance (set to 4).
Tests: Does it pass? Is there adequate coverage for new code?
There are currently no tests associated with either of the Python modules modified, and so no new coverage has been added. This will be addressed in #290.
Style: Is the coding style consistent? Is anything overly confusing?
Coding style is fine, other than the problem with how options are set, which is consistent with existing code - this will be rectified in #260.
Documentation: Is there a suitable change to documentation for this change?
Yes.
Description of Work
Fixes #257
Allows FitBenchmarking to call a minimizer multiple times and thus calculate an average elapsed time using `timeit`.

Testing Instructions
Function: Does the change do what it's supposed to?
Tests: Does it pass? Is there adequate coverage for new code?
Style: Is the coding style consistent? Is anything overly confusing?
Documentation: Is there a suitable change to documentation for this change?