257 Adding averaging time using timeit #285

Merged
merged 19 commits into from Nov 12, 2019

Changes from 1 commit
16 changes: 12 additions & 4 deletions docs/source/users/options.rst
@@ -118,7 +118,7 @@ Scipy:
- ``"dogbox"``
- ``"lm"``
- and ``"trf"``


Information about these can be found on the
`Scipy documentation
@@ -128,9 +128,9 @@ DFO-GN:
- ``"dfogn"``
Information about this can be found on the
`DFO-GN documentation
<http://people.maths.ox.ac.uk/robertsl/dfogn/>`__
<http://people.maths.ox.ac.uk/robertsl/dfogn/>`__


``comparison_mode``
-------------------
The comparison mode is used when displaying results to select the value
@@ -152,3 +152,11 @@ Available options are ``"abs"``, ``"rel"``, or ``"both"``.
Return both absolute and relative values.
Values will be shown as an absolute value followed by a relative value in
parentheses.


``num_runs``
-------------------

Number of runs is used to define how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Contributor

Suggested change
Number of runs is used to define how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Number of runs defines how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.

Collaborator Author

Corrected


Default set as `5`.
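
For illustration only (this is not the project's exact code), the averaging that num_runs controls boils down to timing a callable with timeit and dividing by the number of calls; fit below is a stand-in for controller.fit():

import timeit

def fit():
    # stand-in for a single call to controller.fit()
    sum(i * i for i in range(100000))

num_runs = 5  # value of the num_runs option
# timeit.timeit returns the total elapsed time for `number` calls,
# so dividing by num_runs gives the average runtime of one fit
runtime = timeit.timeit(fit, number=num_runs) / num_runs
print(runtime)
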
21 changes: 11 additions & 10 deletions fitbenchmarking/fitbenchmark_one_problem.py
@@ -4,7 +4,7 @@

from __future__ import absolute_import, division, print_function

import time
import timeit

import numpy as np

@@ -28,14 +28,16 @@
ScipyController = None


def fitbm_one_prob(user_input, problem):
def fitbm_one_prob(user_input, problem, num_runs):
"""
Sets up the controller for a particular problem and fits the models
provided in the problem object. The best fit, along with the data and a
starting guess is then plotted on a visual display page.

@param user_input :: all the information specified by the user
@param problem :: a problem object containing information used in fitting
@param num_runs :: number of times controller.fit() is run to
Contributor

Same as comment about docstrings below.

Collaborator Author

See above

generate an average runtime

@returns :: nested array of result objects, per function definition
containing the fit information
@@ -66,7 +68,8 @@ def fitbm_one_prob(user_input, problem):
controller.function_id = i

results_problem, best_fit = benchmark(controller=controller,
minimizers=user_input.minimizers)
minimizers=user_input.minimizers,
num_runs=num_runs)

if best_fit is not None:
# Make the plot of the best fit
@@ -80,13 +83,15 @@ def fitbm_one_prob(user_input, problem):
return results_fit_problem


def benchmark(controller, minimizers):
def benchmark(controller, minimizers, num_runs):
"""
Fit benchmark one problem, with one function definition and all
the selected minimizers, using the chosen fitting software.

@param controller :: The software controller for the fitting
@param minimizers :: array of minimizers used in fitting
@param num_runs :: number of times controller.fit() is run to
Contributor

I think the docstring standardisation should include types for parameters (see #236), particularly if functions are part of the API (which this might be). Probably a good opportunity to add types for existing parameters as well.

Collaborator Author

I have gone through fitbenchmarking/fitbenchmark_one_problem.py to standardise the docstrings. I was not sure what to do with multiple return values, so let me know what you think of how I handled it.

Contributor

It's a bit of a tricky one using the Sphinx docstring style, as it doesn't deal with multiple return types as well as the numpy style. However, I would be inclined to either put :rtype :: tuple and then define the individual types within :return :: (as is already the case), or at least put brackets around what you currently have (so :rtype :: (list of FittingResult, plot_helper.data instance)) - the reason for this is to differentiate it from the situation where you actually have a single object returned but the type can vary (e.g. if one branch of a conditional returned a list and the other returned a str). What do you think?

Collaborator Author

I think :rtype :: (list of FittingResult, plot_helper.data instance) looks like the best option for this - it makes it clearer.

Contributor

Sounds good to me.
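
A sketch of the docstring format being agreed here, with types added in the project's @param style and the bracketed :rtype (the wording and type names are illustrative, not the committed text):

def benchmark(controller, minimizers, num_runs):
    """
    Fit benchmark one problem with one function definition and all
    the selected minimizers.

    @param controller :: controller for the chosen fitting software
    @param minimizers :: list of str, minimizers used in fitting
    @param num_runs :: int, number of times controller.fit() is run to
                       generate an average runtime

    @returns :: results per minimizer and the data object for the best fit
    :rtype :: (list of FittingResult, plot_helper.data instance)
    """
    raise NotImplementedError  # docstring illustration only
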

generate an average runtime

@returns :: nested array of result objects, per minimizer
and data object for the best fit data
@@ -102,15 +107,11 @@ def benchmark(controller, minimizers):
init_function_def = controller.problem.get_function_def(params=controller.initial_params,
function_id=controller.function_id)
try:
start_time = time.time()
controller.fit()
end_time = time.time()
runtime = timeit.timeit(controller.fit, number=num_runs) / num_runs
Contributor

Returning from timeit.timeit here does not reveal if any of the calls to controller.fit have been cached. While this may not be likely, it is possible that we could inadvertently cache the result from the controllers we have implemented, and definitely possible that a contributor might make this mistake - this would result in significantly reduced average runtimes. I tested this with a toy problem to confirm.

timeit.repeat returns a list of n repeats (each of which is the sum of number executions, so you might want to set this to 1, rather than the default of 1000000). This could then be used to suggest if caching has occurred, which could result in a warning. It would also allow the uncertainty on the runtime to be estimated, which might be a useful additional property for a user to have access to (although probably not by default) - not for this PR but this would at least make it easier to implement this in the future.

Collaborator Author

Good point. I checked the timings using timeit.timeit and timeit.repeat and they seem to be pretty much the same; however, I have changed the code to use timeit.repeat.

Contributor

Yeah they should be exactly the same without caching - this just allows us to catch this. So maybe we should provide a warning if the stdev is 'large'? I don't really mind what is determined to be 'large'.

Collaborator Author

I've added a check to see if the standard deviation is high:

cv = np.std(runtime_list) / np.mean(runtime_list)
if cv > 0.2:
    raise Warning('The ratio of the standard deviation and mean'
                  ' is larger than {}, this may indicate caching'
                  ' is used in the timing results.'.format(0.2))

Is 0.2 an OK threshold, do you think?
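
Putting this thread together, a sketch of the repeat-based timing with the spread check (illustrative only; fake_fit stands in for controller.fit and the 0.2 threshold is the one discussed above):

import timeit
import numpy as np

def fake_fit():
    # stand-in for controller.fit()
    sum(i * i for i in range(100000))

num_runs = 5
# time num_runs separate calls, one execution each (number=1), so a cached
# result shows up as an unusually fast entry rather than being hidden
# inside a single total
runtime_list = timeit.repeat(fake_fit, number=1, repeat=num_runs)
runtime = np.mean(runtime_list)

# flag a suspiciously large spread relative to the mean
cv = np.std(runtime_list) / np.mean(runtime_list)
if cv > 0.2:
    print('Warning: run times vary a lot between runs; '
          'the fit function may be caching results.')
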

except Exception as e:
print(e.message)
Contributor

Will every e have a message attribute? Otherwise this could raise an AttributeError. Do we want to create an issue for this? Are there associated tests for this exceptional behaviour?

Collaborator Author

I thought it did, but I'm not 100% sure. At the moment there are no specific tests for this file other than checking that the example scripts run.

Contributor

Ah, OK, I've found the problem - Exception.message was deprecated as of Python 2.6, so while it still works in 2.7 it is undocumented, and in 3+ it is unsupported. The new form is that casting the Exception to str returns the message, so this should be changed to print(str(e)) - do you want to do that here, or would you prefer me to create a separate issue?

Collaborator Author

I think it's best if we do that here, otherwise it might end up at the bottom of the issue pile in the future - do you mind changing it?

Contributor

No problems, I've done it and made it PEP 8 compliant (albeit in one case by disabling a pylint warning, although I think it is probably justified here).
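
In isolation, the portable pattern agreed on here looks like this (the raise stands in for a failing controller.fit(); the pylint exemption mentioned above is not shown):

try:
    raise ValueError('fit failed')  # stand-in for a failing controller.fit()
except Exception as e:
    # str(e) returns the message on both Python 2 and 3; e.message does not
    print(str(e))
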

controller.success = False
end_time = np.inf

runtime = end_time - start_time
runtime = np.inf

controller.cleanup()

4 changes: 3 additions & 1 deletion fitbenchmarking/fitbenchmarking_default_options.json
@@ -11,5 +11,7 @@
"dfogn" : ["dfogn"]
},

"comparison_mode" : "both"
"comparison_mode" : "both",

"num_runs" : 5
}
131 changes: 73 additions & 58 deletions fitbenchmarking/fitting_benchmarking.py
@@ -11,82 +11,97 @@
from fitbenchmarking.utils.logging_setup import logger

from fitbenchmarking.parsing import parse
from fitbenchmarking.utils import create_dirs, misc
from fitbenchmarking.utils import create_dirs, misc, options
from fitbenchmarking.fitbenchmark_one_problem import fitbm_one_prob


def fitbenchmark_group(group_name, software_options, data_dir,
use_errors=True, results_dir=None):
"""
Gather the user input and list of paths. Call benchmarking on these.
"""
Gather the user input and list of paths. Call benchmarking on these.

@param group_name :: is the name (label) for a group. E.g. the name for the group of problems in
Contributor

If it's not too irritating, I'd suggest adding types for these parameters and the return.

Collaborator Author

done

"NIST/low_difficulty" may be picked to be NIST_low_difficulty
@param software_options :: dictionary containing software used in fitting the problem, list of minimizers and
location of json file contain minimizers
@param data_dir :: full path of a directory that holds a group of problem definition files
@param use_errors :: whether to use errors on the data or not
@param results_dir :: directory in which to put the results. None
means results directory is created for you

@returns :: array of fitting results for the problem group and
the path to the results directory
"""

logger.info("Loading minimizers from {0}".format(
software_options['software']))
minimizers, software = misc.get_minimizers(software_options)
num_runs = software_options.get('num_runs', None)

if num_runs is None:
Contributor

This seems like a lot of work just to set a single variable - I could be wrong but it appears that some of this is redundant. Could you please explain the process you are following to set num_runs?

Also it appears that the default of 5 for num_runs gets set in two places: here and in the default options json. This probably shouldn't be the case, as it might produce unexpected behaviour for a user if they created their own default without num_runs (or with num_runs=None).

Collaborator Author

At the moment, users can define software_options as a dictionary. If num_runs is not defined in the dictionary, then fitbenchmarking.utils.options is used to look up the default option from the json file. This is similar to how comparison_mode is set:

minimizers, software = misc.get_minimizers(software_options)
comparison_mode = software_options.get('comparison_mode', None)
if comparison_mode is None:
    if 'options_file' in software_options:
        options_file = software_options['options_file']
        comparison_mode = options.get_option(options_file=options_file,
                                             option='comparison_mode')
    else:
        comparison_mode = options.get_option(option='comparison_mode')
    if comparison_mode is None:
        comparison_mode = 'both'

I have removed the second default option of 5 for num_runs.

Contributor

Ok, so I think I've figured out why this is confusing me:

line 42 can only be True if software_options['num_runs'] is None. It seems like this would be a strange situation to occur. However, if it did occur, then line 51 always sets options_file to None (based on the excerpt above, I think your addition should be options_file = software_options['options_file'] rather than software_options['num_runs']).

Contributor

Ah, and if you follow the excerpt above, then your line 50 needs to be if 'options_file'... rather than if 'num_runs'...

Collaborator Author

Done
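
A sketch of the corrected lookup being described, mirroring the comparison_mode excerpt above (it assumes the same software_options dictionary and the fitbenchmarking.utils.options helper; the final committed code may differ):

num_runs = software_options.get('num_runs', None)

if num_runs is None:
    if 'options_file' in software_options:
        options_file = software_options['options_file']
        num_runs = options.get_option(options_file=options_file,
                                      option='num_runs')
    else:
        num_runs = options.get_option(option='num_runs')

    if num_runs is None:
        num_runs = 5
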

if 'num_runs' in software_options:
options_file = software_options['num_runs']
num_runs = options.get_option(options_file=options_file,
option='num_runs')
else:
num_runs = options.get_option(option='num_runs')

@param group_name :: is the name (label) for a group. E.g. the name for the group of problems in
"NIST/low_difficulty" may be picked to be NIST_low_difficulty
@param software_options :: dictionary containing software used in fitting the problem, list of minimizers and
location of json file contain minimizers
@param data_dir :: full path of a directory that holds a group of problem definition files
@param use_errors :: whether to use errors on the data or not
@param results_dir :: directory in which to put the results. None
means results directory is created for you
if num_runs is None:
num_runs = 5

@returns :: array of fitting results for the problem group and
the path to the results directory
"""
# create list of paths to all problem definitions in data_dir
problem_group = misc.get_problem_files(data_dir)

logger.info("Loading minimizers from {0}".format(
software_options['software']))
minimizers, software = misc.get_minimizers(software_options)
results_dir = create_dirs.results(results_dir)
group_results_dir = create_dirs.group_results(results_dir, group_name)

# create list of paths to all problem definitions in data_dir
problem_group = misc.get_problem_files(data_dir)
user_input = misc.save_user_input(software, minimizers, group_name,
group_results_dir, use_errors)

results_dir = create_dirs.results(results_dir)
group_results_dir = create_dirs.group_results(results_dir, group_name)
prob_results = _benchmark(user_input, problem_group, num_runs)

user_input = misc.save_user_input(software, minimizers, group_name,
group_results_dir, use_errors)
return prob_results, results_dir

prob_results = _benchmark(user_input, problem_group)

return prob_results, results_dir
def _benchmark(user_input, problem_group, num_runs=5):
Contributor

Is this another place where the default for num_runs is specified? This one should probably go as it is a 'private' function.

Collaborator Author

num_runs is defined in the json file, so I have now removed this from this function.

"""
Loops through software and benchmarks each problem within the problem
group.

@param user_input :: all the information specified by the user
Contributor

Again, types if possible

Collaborator Author

I've updated the doc strings

@param problem_group :: list of paths to problem files in the group
e.g. ['NIST/low_difficulty/file1.dat',
'NIST/low_difficulty/file2.dat',
...]
@param num_runs :: number of times controller.fit() is run to
generate an average runtime

def _benchmark(user_input, problem_group):
"""
Loops through software and benchmarks each problem within the problem
group.

@param user_input :: all the information specified by the user
@param problem_group :: list of paths to problem files in the group
e.g. ['NIST/low_difficulty/file1.dat',
'NIST/low_difficulty/file2.dat',
...]
@returns :: array of result objects, per problem per user_input
"""

@returns :: array of result objects, per problem per user_input
"""
parsed_problems = [parse.parse_problem_file(p) for p in problem_group]

parsed_problems = [parse.parse_problem_file(p) for p in problem_group]
if not isinstance(user_input, list):
Contributor

Not a problem with this PR, but this code could really do with a refactor - do you know if there is an issue for this?

Collaborator Author

I don't think there is an existing issue for refactoring this code; as Andrew cleans things up (such as the fitting and parsing) he is refactoring those areas.

list_prob_results = [per_func
for p in parsed_problems
for per_func in fitbm_one_prob(user_input,
p, num_runs)]

if not isinstance(user_input, list):
list_prob_results = [per_func
for p in parsed_problems
for per_func in fitbm_one_prob(user_input, p)]
else:
list_prob_results = [[fitbm_one_prob(u, p, num_runs)
for u in user_input]
for p in parsed_problems]

else:
list_prob_results = [[fitbm_one_prob(u, p)
for u in user_input]
for p in parsed_problems]

# reorganise loop structure from:
# [[[val per minimizer] per function] per input] per problem]
# to:
# [[val per input per minimizer] per function per problem]
list_prob_results = \
[[list_prob_results[prob_idx][input_idx][func_idx][minim_idx]
for input_idx in range(len(user_input))
for minim_idx in range(len(list_prob_results[prob_idx][input_idx][func_idx]))]
for prob_idx in range(len(parsed_problems))
for func_idx in range(len(list_prob_results[prob_idx][0]))]

return list_prob_results
# reorganise loop structure from:
# [[[val per minimizer] per function] per input] per problem]
# to:
# [[val per input per minimizer] per function per problem]
list_prob_results = \
[[list_prob_results[prob_idx][input_idx][func_idx][minim_idx]
for input_idx in range(len(user_input))
for minim_idx in range(len(list_prob_results[prob_idx][input_idx][func_idx]))]
for prob_idx in range(len(parsed_problems))
for func_idx in range(len(list_prob_results[prob_idx][0]))]

return list_prob_results
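
To make the reorganisation comment concrete, here is a toy example with one problem, two user inputs, one function and two minimizers (the string values are placeholders for result objects):

# [[[val per minimizer] per function] per input] per problem
list_prob_results = [                       # problem 0
    [                                       # input 0
        ['p0_i0_f0_m0', 'p0_i0_f0_m1'],     # function 0
    ],
    [                                       # input 1
        ['p0_i1_f0_m0', 'p0_i1_f0_m1'],     # function 0
    ],
]
user_input = ['input0', 'input1']
parsed_problems = ['problem0']

# [[val per input per minimizer] per function per problem]
reorganised = \
    [[list_prob_results[prob_idx][input_idx][func_idx][minim_idx]
      for input_idx in range(len(user_input))
      for minim_idx in range(len(list_prob_results[prob_idx][input_idx][func_idx]))]
     for prob_idx in range(len(parsed_problems))
     for func_idx in range(len(list_prob_results[prob_idx][0]))]

print(reorganised)
# [['p0_i0_f0_m0', 'p0_i0_f0_m1', 'p0_i1_f0_m0', 'p0_i1_f0_m1']]
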