
257 Adding averaging time using timeit #285

Merged
merged 19 commits from 257_timing into master on Nov 12, 2019

Conversation

wathen
Collaborator

@wathen wathen commented Nov 5, 2019

Description of Work

Fixes #257

Allows FitBenchmarking to call a minimizer multiple times and thus calculate an average elapsed time using timeit.
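
For context, a minimal standalone sketch of the averaging approach (fit() below is a toy stand-in for controller.fit(); the actual change is shown in the diff excerpts later in this thread):

import timeit

def fit():
    # toy stand-in for controller.fit()
    sum(range(10000))

num_runs = 5
# average elapsed time per call, in seconds
runtime = timeit.timeit(fit, number=num_runs) / num_runs
print(runtime)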

Testing Instructions

  1. Check the accuracy tables are still the same
  2. New timing results are consistent between runs

Function: Does the change do what it's supposed to?

Tests: Does it pass? Is there adequate coverage for new code?

Style: Is the coding style consistent? Is anything overly confusing?

Documentation: Is there a suitable change to documentation for this change?

@wathen wathen added the Enhancement (New feature or request) label Nov 5, 2019
@wathen wathen requested a review from TOFarmer November 5, 2019 10:48
Contributor

@TOFarmer TOFarmer left a comment

The problem with where the default for num_runs is defined comes back to our discussions around where default options should be defined in general. It is not necessarily a big concern for this PR, but if there are any other PRs where we might be changing the default options, it would be a good idea to refactor the default definitions first.

``num_runs``
-------------------

Number of runs is used to define how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Contributor

Suggested change
Number of runs is used to define how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.
Number of runs defines how many times FitBenchmarking calls a minimizer and thus calculates an average elapsed time using `timeit`.

Collaborator Author

Corrected

"""
Fit benchmark one problem, with one function definition and all
the selected minimizers, using the chosen fitting software.

@param controller :: The software controller for the fitting
@param minimizers :: array of minimizers used in fitting
@param num_runs :: number of times controller.fit() is run to
Contributor

I think the docstring standardisation should include types for parameters (see #236), particularly if functions are part of the API (which this might be). Probably a good opportunity to add types for existing parameters as well.

Collaborator Author

I have gone through fitbenchmarking/fitbenchmark_one_problem.py to standardise the docstrings. I was not sure what to do with multiple return arguments, so let me know what you think of how I did it.

Contributor

It's a bit of a tricky one using the sphinx docstring style, as it doesn't deal with multiple return types as well as the numpy style does. However, I would be inclined either to put :rtype :: tuple and then define the individual types within :return :: (as is already the case), or at least to put brackets around what you currently have (so :rtype :: (list of FittingResult, plot_helper.data instance)). The reason for this is to differentiate it from the situation where you actually have a single object returned but the type can vary (e.g. if you had a conditional where the if branch returned list and the else branch returned str). What do you think?

Collaborator Author

I think :rtype :: (list of FittingResult, plot_helper.data instance) is the best option for this; it makes it clearer.

Contributor

Sounds good to me.
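
For illustration, a docstring in the agreed style on a hypothetical function might read as follows (the function, parameter names and exact field syntax here are illustrative only, not the PR's code):

def fit_one_problem(controller, minimizers, num_runs):
    """
    Fit one problem with all the selected minimizers.

    :param controller: the software controller for the fitting
    :type controller: Controller
    :param minimizers: minimizers used in fitting
    :type minimizers: list of str
    :param num_runs: number of times controller.fit() is run
    :type num_runs: int

    :return: the fitting results and the data used for plotting
    :rtype: (list of FittingResult, plot_helper.data instance)
    """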

"""
Sets up the controller for a particular problem and fits the models
provided in the problem object. The best fit, along with the data and a
starting guess is then plotted on a visual display page.

@param user_input :: all the information specified by the user
@param problem :: a problem object containing information used in fitting
@param num_runs :: number of times controller.fit() is run to
Contributor

Same as comment about docstrings below.

Collaborator Author

See above

start_time = time.time()
controller.fit()
end_time = time.time()
runtime = timeit.timeit(controller.fit, number=num_runs) / num_runs
Contributor

Returning from timeit.timeit here does not reveal whether any of the calls to controller.fit have been cached. While this may not be likely, it is possible that we could inadvertently cache the result from the controllers we have implemented, and it is definitely possible that a contributor might make this mistake; this would result in significantly reduced average runtimes. I tested this with a toy problem to confirm.

timeit.repeat returns a list of n repeats (each of which is the total time for number executions, so you might want to set number to 1 rather than the default of 1000000). This could then be used to suggest whether caching has occurred, which could result in a warning. It would also allow the uncertainty on the runtime to be estimated, which might be a useful additional property for a user to have access to (although probably not by default). That is not for this PR, but it would at least make it easier to implement in the future.
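
A minimal standalone sketch of that suggestion (fit() is a toy stand-in for controller.fit(); num_runs plays the role of repeat):

import timeit

def fit():
    # toy stand-in for controller.fit()
    sum(range(10000))

num_runs = 5
# number=1 means each entry in the list is the time for a single call
runtime_list = timeit.repeat(fit, number=1, repeat=num_runs)
runtime = sum(runtime_list) / num_runs  # average runtime
spread = max(runtime_list) / min(runtime_list)  # a large spread may indicate caching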

Collaborator Author

Good point. I checked the timings using timeit.timeit and timeit.repeat and they seem to be pretty much the same. However, I have changed the code to use timeit.repeat.

Contributor

Yeah, they should be exactly the same without caching; this just allows us to catch it. So maybe we should provide a warning if the stdev is 'large'? I don't really mind what is determined to be 'large'.

Collaborator Author

I've added a check to see if the standard deviation is high:

cv = np.std(runtime_list) / np.mean(runtime_list)
if cv > 0.2:
    raise Warning('The ratio of the standard deviation to the mean'
                  ' is larger than {}, which may indicate that caching'
                  ' is used in the timing results.'.format(0.2))

Do you think 0.2 is an OK threshold?

start_time = time.time()
controller.fit()
end_time = time.time()
runtime = timeit.timeit(controller.fit, number=num_runs) / num_runs
except Exception as e:
print(e.message)
Contributor

Will every e have a message attribute? Otherwise this could raise an AttributeError. Do we want to create an issue for this? Are there associated tests for the exceptional behaviour?

Collaborator Author

I thought it did, but I'm not 100% sure. At the moment there are no specific tests for this file other than checking that the example scripts run.

Contributor

Ah ok, I've found the problem: Exception.message was deprecated in Python 2.6, so while it still works in 2.7 it is undocumented, and in 3+ it is unsupported. The new form is that casting the Exception to str returns the message, so this should be changed to print(str(e)). Do you want to do that here, or would you prefer me to create a separate issue?

Collaborator Author

I think it's best if we do that here, otherwise it might be put at the bottom of the issue pile in the future. Do you mind changing it?

Contributor

No problem, I've done it and made it PEP 8 compliant (albeit in one case by disabling a pylint warning, although I think that is probably justified here).
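
For reference, the change amounts to the pattern below (a sketch; fit() stands in for the controller.fit() call in the excerpt above):

def fit():
    # stand-in for controller.fit(); raise to exercise the except branch
    raise RuntimeError('minimizer failed')

try:
    fit()
except Exception as e:
    # str(e) returns the message in both Python 2.7 and 3+,
    # unlike the deprecated e.message
    print(str(e))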

"""
Gather the user input and list of paths. Call benchmarking on these.

@param group_name :: is the name (label) for a group. E.g. the name for the group of problems in
Contributor

If it's not too irritating, I'd suggest adding types for these parameters and the return.

Collaborator Author

done

minimizers, software = misc.get_minimizers(software_options)
num_runs = software_options.get('num_runs', None)

if num_runs is None:
Contributor

This seems like a lot of work just to set a single variable. I could be wrong, but it appears that some of this is redundant. Could you please explain the process you are following to set num_runs?

Also it appears that the default of 5 for num_runs gets set in two places: here and in the default options json. This probably shouldn't be the case, as it might produce unexpected behaviour for a user if they created their own default without num_runs (or with num_runs=None).

Collaborator Author

At the moment, users can define software_options as a dictionary. If num_runs is not defined in the dictionary, then fitbenchmarking.utils.options is used to look up the default from the json file. This is similar to how comparison_mode is set:

minimizers, software = misc.get_minimizers(software_options)
comparison_mode = software_options.get('comparison_mode', None)
if comparison_mode is None:
    if 'options_file' in software_options:
        options_file = software_options['options_file']
        comparison_mode = options.get_option(options_file=options_file,
                                             option='comparison_mode')
    else:
        comparison_mode = options.get_option(option='comparison_mode')
    if comparison_mode is None:
        comparison_mode = 'both'

I have removed the second default option of 5 for num_runs.
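
For illustration, the equivalent num_runs lookup following that pattern would look roughly like this (a sketch mirroring the excerpt above, assuming the same software_options dictionary and options module, and not necessarily the exact merged code):

num_runs = software_options.get('num_runs', None)
if num_runs is None:
    if 'options_file' in software_options:
        options_file = software_options['options_file']
        num_runs = options.get_option(options_file=options_file,
                                      option='num_runs')
    else:
        num_runs = options.get_option(option='num_runs')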

Contributor

Ok, so I think I've figured out why this is confusing me:

Line 42 can only be True if software_options['num_runs'] is None, which seems like a strange situation to occur. However, if it did occur, then line 51 always sets options_file to None (based on the excerpt above, I think your addition should be options_file = software_options['options_file'] rather than software_options['num_runs']).

Contributor

Ah, and if you follow the excerpt above, then your line 50 needs to be if 'options_file'... rather than if 'num_runs'...

Collaborator Author

Done


return prob_results, results_dir
def _benchmark(user_input, problem_group, num_runs=5):
Contributor

Is this another place where the default for num_runs is specified? This one should probably go as it is a 'private' function.

Collaborator Author

num_runs is defined in the json file, so I have now removed the default from this function.
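
So the private signature simply becomes (sketch):

def _benchmark(user_input, problem_group, num_runs):
    ...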


@param user_input :: all the information specified by the user
Contributor

Again, types if possible

Collaborator Author

I've updated the docstrings.


parsed_problems = [parse.parse_problem_file(p) for p in problem_group]
if not isinstance(user_input, list):
Contributor

Not a problem with this PR, but this code could really do with a refactor - do you know if there is an issue for this?

Collaborator Author

I don't think there is an issue specifically for refactoring this code. As Andrew cleans things up (such as the fitting and parsing), he is refactoring those areas.

@TOFarmer
Contributor

TOFarmer commented Nov 5, 2019

Just before you make any changes, I've done some refactoring for PEP 8 compliance which I need to commit - it's not letting me push at the moment so I just need to work out why.


if not controller.success:
    chi_sq = np.nan
    status = 'failed'
else:
    cv = np.std(runtime_list) / np.mean(runtime_list)
Contributor

Apologies, my original suggestion was a bad metric for determining whether caching has occurred. It would be better to test the ratio of the max to the min (as, if a large number of runs were cached, the stdev could be low despite a single outlier). I've had a look at the timeit source code in CPython and the ratio they use is 4, so we could pick this to be consistent.

Also, rather than raising the Warning, I think maybe we should use warnings.warn, so just:

warnings.warn('The ratio of the max time to the min is larger than {}, which may indicate that caching has occurred in the timing results.'.format(tolerance))

Collaborator Author

Updated
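
For reference, the resulting timing and caching check might look roughly like this (a sketch assembled from the excerpts in this thread, with fit() as a toy stand-in for controller.fit(); the exact merged code may differ):

import timeit
import warnings

def fit():
    # toy stand-in for controller.fit()
    sum(range(10000))

num_runs = 5
tolerance = 4
runtime_list = timeit.repeat(fit, number=1, repeat=num_runs)
runtime = sum(runtime_list) / num_runs
if max(runtime_list) / min(runtime_list) > tolerance:
    warnings.warn('The ratio of the max time to the min is larger than {},'
                  ' which may indicate that caching has occurred in the'
                  ' timing results.'.format(tolerance))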

@@ -89,7 +88,15 @@ def main(argv):

# Processes software_options dictionary into Fitbenchmarking format
minimizers, software = misc.get_minimizers(software_options)

num_runs = software_options.get('num_runs', None)
Contributor

Does this pattern need to be included here as well as in fitbenchmark_group? Is it not dealt with there if the user doesn't define it correctly here? If it is required then maybe it would be worth adding a comment or two explaining what it does?

Collaborator Author

From our meeting on Friday I thought the idea was to remove the expert script, so this would not be a problem?

Contributor

Yes, that's fine.

@tyronerees tyronerees moved this from To do to In progress in ALC Maintenance Nov 2019 Nov 11, 2019
@tyronerees tyronerees removed this from In progress in ALC Maintenance Nov 2019 Nov 11, 2019
@TOFarmer TOFarmer self-requested a review November 11, 2019 14:44
@TOFarmer
Contributor

I discovered that when running the example_runScripts.py, on some occasions (maybe 40% of the time) I got a UserWarning about the ratio of the maximum to minimum runtimes (see line 149 of fitbenchmark_one_problem.py). These were commonly associated with the cubic problem being run on scipy, which is the first problem run.

While @wathen did not experience the same warnings, @AndrewLister-STFC saw the same behaviour while running on a clean cloud instance. The origin of this behaviour (max runtime > 4 x min runtime) is not obvious, and for the vast majority of problem/minimizer combinations the ratio is < 2.

We decided that the current determination of anomalous run times (max/min ratio) is appropriate, and this warning should occur. We recommend that the html cell for each timing result (which will be the average) is hyperlinked to the individual results. This has the added benefit that the user can calculate the uncertainty on the runtimes.

NB: For context, the max/min ratio is the same criterion that timeit uses for warning the user.

Contributor

@TOFarmer TOFarmer left a comment

Check the accuracy tables are still the same

Accuracies are consistent with master.

New timing results are consistent between runs

Other than the timing variations discussed in the previous comment, timings are consistent.

Function: Does the change do what it's supposed to?

Yes, tabulated runtimes are now the average of n runs (default 5). Rather than discarding the highest and lowest (as suggested in #257), the ratio of these is calculated and the user is warned if it is greater than some tolerance (set to 4).

Tests: Does it pass? Is there adequate coverage for new code?

There are currently no tests associated with either of the python modules modified, and so no new coverage has been added. This will be addressed in #290.

Style: Is the coding style consistent? Is anything overly confusing?

Coding style is fine, other than a problem with how options are set, which is consistent with existing code; this will be rectified in #260.

Documentation: Is there a suitable change to documentation for this change?

Yes.

@TOFarmer TOFarmer merged commit bec98d5 into master Nov 12, 2019
@TOFarmer TOFarmer deleted the 257_timing branch November 12, 2019 11:40
Labels
Enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Inconsistent timing results