Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation: timeit: "lower bound" should read "upper bound" #47568

Closed
unutbu mannequin opened this issue Jul 8, 2008 · 6 comments
Closed

Documentation: timeit: "lower bound" should read "upper bound" #47568

unutbu mannequin opened this issue Jul 8, 2008 · 6 comments
Assignees
Labels
docs Documentation in the Doc dir

Comments

@unutbu
Copy link
Mannequin

unutbu mannequin commented Jul 8, 2008

BPO 3318
Nosy @birkenfeld

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/birkenfeld'
closed_at = <Date 2008-07-16.22:14:03.659>
created_at = <Date 2008-07-08.01:43:12.065>
labels = ['docs']
title = 'Documentation: timeit: "lower bound" should read "upper bound"'
updated_at = <Date 2008-09-13.14:07:49.630>
user = 'https://bugs.python.org/unutbu'

bugs.python.org fields:

activity = <Date 2008-09-13.14:07:49.630>
actor = 'unutbu'
assignee = 'georg.brandl'
closed = True
closed_date = <Date 2008-07-16.22:14:03.659>
closer = 'georg.brandl'
components = ['Documentation']
creation = <Date 2008-07-08.01:43:12.065>
creator = 'unutbu'
dependencies = []
files = []
hgrepos = []
issue_num = 3318
keywords = []
message_count = 6.0
messages = ['69405', '69848', '71499', '72908', '73143', '73187']
nosy_count = 2.0
nosy_names = ['georg.brandl', 'unutbu']
pr_nums = []
priority = 'normal'
resolution = 'works for me'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue3318'
versions = ['Python 2.5']

@unutbu
Copy link
Mannequin Author

unutbu mannequin commented Jul 8, 2008

Re: http://docs.python.org/lib/module-timeit.html
Where the documentation says "In a typical case, the lowest value gives
a lower bound for how fast your machine can run the given code snippet",
it should read instead,
"In a typical case, the lowest value gives an upper bound for how fast
your machine can run the given code snippet".

Clearly, if a machine can run a code snippet in x seconds with
background processes, then the fastest the machine can run the code
snippet (without background processes) should be <= x seconds. Therefore
x is an upper bound rather than a lower bound.

@unutbu unutbu mannequin assigned birkenfeld Jul 8, 2008
@unutbu unutbu mannequin added the docs Documentation in the Doc dir label Jul 8, 2008
@birkenfeld
Copy link
Member

I disagree. An ideal machine is not useful in practice, so any assertion
about it isn't helpful.

In that light, the snippet is correct in saying that if execution of a
snippet is done enough times, the lowest value is a lower bound for
execution speed.

@unutbu
Copy link
Mannequin Author

unutbu mannequin commented Aug 19, 2008

Dear Georg,
Would you please reconsider this issue (http://bugs.python.org/issue3318) for a moment?

The term "lower bound" as it is used in the timeit documentation is either misleading or mathematically incorrect. A lower bound is a number which is less than or equal to every member of a set. What set is the timeit documentation referring to? Here are two possibilities:

A = the set of times recorded from three runs
B = the set of all possible times a particular machine could return

The term "lower bound" is technically correct if the documentation means "lower bound of set A", but I think it would be misleading to use the term "lower bound" in this way, since the documentation would then be asserting: "the minimum time of three runs is the lower bound of three runs". It's not very exciting, and moreover, it's obvious.

The term "lower bound" is simply incorrect if the documentation meant "lower bound of set B". I explained the reason why in my first post.

I appreciate your point when you said, "An ideal machine is not useful in practice".
However, the documentation opens up this can of worms by its own use of the term lower bound. If ideal machines is not what it wants to discuss, then maybe the documentation needs to be changed in some other way to avoid the mathematically charged term, "lower bound".

I think you would agree that good documentation needs to use language accurately. The documentation as it stands is either fallacious (if it is talking about set B), trivial (if it is talking about set A), or possibly correct if it is talking about some set C which I have not imagined.

If it is the latter case, please update the documentation to make clear what set C is.

@birkenfeld
Copy link
Member

Sadly, this is not mathematics. Else, I'd concur that the designation
"lower bound" should be accurate with respect to a certain set.

I fail to see what is missing in the explanation "the lowest value is a
lower bound *in a typical case*". It allows for lower execution times.

Also, your suggested wording "the lowest value is an upper bound" is
certainly not more accurate, if not an oxymoron in itself. I repeat,
what you usually want when timing code snippets is the execution time
under realistic conditions, because you'd have a hard time defining what
"the fastest machine" is.

@unutbu
Copy link
Mannequin Author

unutbu mannequin commented Sep 12, 2008

Let B = the set of all possible times on a particular machine (the machine on which the timeit script is run).
Let x = the minimum of B.
Then "the lowest value is an upper bound for x".
This is correct, accurate, not an oxymoron.
The above does not refer to an ideal machine, nor does it refer to the fastest machine.

Let's try to agree on a principle: Every statement made in documentation should be correct and have meaningful substance.

If we can agree on this principle, then I think we will have to agree that the paragraph:

"""
Note: It's tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python's speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
"""

can not stay the way it is.

Here is why:

The correctness of the sentence, "In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet" relies on the presence of the word "typical". But what does typical mean? Here we come to the second half of the principle: the sentence must have meaning and substance.

By relying on the word typical, we reduce the sentence to meaninglessness, because there is no way to endow the word "typical" with meaning without also making the sentence incorrect. For example, if we tried to make the sentence

"In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet"

mean

"If you run the snippet 100 times, the lowest value would be less than the time in 50 cases"

then you run the risk of making a claim that may not be true.

The critical reader will be forced to simply throw away the entire paragraph as nonsense.

The uncritical reader may believe the output of timeit.repeat is less than or equal to x, which is simply not true. The output of timeit.repeat might not even be near x (whatever near means!) if for example, the script were being run on a server while lots of other processes were being run, or if there was a process with nice priority -19 running simultaneously.

What do you think we should do?

@unutbu
Copy link
Mannequin Author

unutbu mannequin commented Sep 13, 2008

Georg, please forgive me. I thought a sample size of 3 was much too small to make a claim about the typical case, but it appears after doing a computer experiment that I was wrong:

#!/usr/bin/env python
from __future__ import division
import timeit
import random

repeat=100
num=100

def test_func():
    l=1
    for idx in range(10000):
        l=l*idx

timer=timeit.Timer('test_func()','from __main__ import test_func')

data=timer.repeat(repeat=repeat,number=num)

def test_timer():
    sample=random.sample(data,3)
    minval=min(sample)
    onerun=random.choice(data)
    return 1 if minval<onerun else 0

successes=[test_timer() for idx in range(repeat)]
print "Those runs for which the claim in the documentation is true:",successes
s=sum(successes)
l=len(successes)
print "probability that the claim is true: %s/%s = %s"%(s,l,s/l)

Returns:

% timeit-statistics.py
Those runs for which the claim in the documentation is true: [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1]
probability that the claim is true: 74/100 = 0.74

I'm satisfied the documentation is correct, and I'm really sorry for wasting your time.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

1 participant