Documentation: timeit: "lower bound" should read "upper bound" #47568
Comments
Re: http://docs.python.org/lib/module-timeit.html Clearly, if a machine can run a code snippet in x seconds with |
I disagree. An ideal machine is not useful in practice, so any assertion In that light, the snippet is correct in saying that if execution of a |
Dear Georg,

The term "lower bound" as it is used in the timeit documentation is either misleading or mathematically incorrect.

A lower bound is a number which is less than or equal to every member of a set. What set is the timeit documentation referring to? Here are two possibilities:

A = the set of times recorded from three runs
B = the set of all possible times a particular machine could return

The term "lower bound" is technically correct if the documentation means "lower bound of set A", but I think it would be misleading to use the term "lower bound" in this way, since the documentation would then be asserting: "the minimum time of three runs is the lower bound of three runs". It's not very exciting, and moreover, it's obvious.

The term "lower bound" is simply incorrect if the documentation meant "lower bound of set B". I explained the reason why in my first post.

I appreciate your point when you said, "An ideal machine is not useful in practice". I think you would agree that good documentation needs to use language accurately. The documentation as it stands is either fallacious (if it is talking about set B), trivial (if it is talking about set A), or possibly correct if it is talking about some set C which I have not imagined. If it is the latter case, please update the documentation to make clear what set C is.
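To make the distinction between set A and set B concrete, here is a minimal sketch of the definition of a lower bound. The helper name `is_lower_bound` and all of the timings are invented for illustration; they are not measurements from this thread.

```python
def is_lower_bound(value, values):
    """Return True if value is less than or equal to every member of values."""
    return all(value <= v for v in values)


# Hypothetical timings in seconds, invented for illustration only.
set_a = [0.52, 0.49, 0.55]          # set A: times recorded from three runs
set_b = set_a + [0.47, 0.61, 0.50]  # stand-in for set B: more times the machine could return

lowest_of_three = min(set_a)

# Trivially true: the minimum of a set bounds that same set from below.
print(is_lower_bound(lowest_of_three, set_a))  # True

# Not guaranteed: a later run (0.47 here) can be faster than the lowest of three.
print(is_lower_bound(lowest_of_three, set_b))  # False
```

With these made-up numbers the lowest of three runs is a lower bound of A but not of the larger pool, which is the distinction the comment above draws.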
Sadly, this is not mathematics. Else, I'd concur that the designation I fail to see what is missing in the explanation "the lowest value is a Also, your suggested wording "the lowest value is an upper bound" is |
Let B = the set of all possible times on a particular machine (the machine on which the timeit script is run).

Let's try to agree on a principle: every statement made in documentation should be correct and have meaningful substance. If we can agree on this principle, then I think we will have to agree that the paragraph: """ cannot stay the way it is. Here is why:

The correctness of the sentence, "In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet" relies on the presence of the word "typical". But what does typical mean?

Here we come to the second half of the principle: the sentence must have meaning and substance. By relying on the word typical, we reduce the sentence to meaninglessness, because there is no way to endow the word "typical" with meaning without also making the sentence incorrect. For example, if we tried to make the sentence "In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet" mean "If you run the snippet 100 times, the lowest value would be less than the time in 50 cases", then we run the risk of making a claim that may not be true.

The critical reader will be forced to simply throw away the entire paragraph as nonsense. The uncritical reader may believe the output of timeit.repeat is less than or equal to x, which is simply not true. The output of timeit.repeat might not even be near x (whatever near means!) if, for example, the script were being run on a server while lots of other processes were being run, or if there was a process with nice priority -19 running simultaneously.

What do you think we should do?
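The claim that the reported minimum need not be less than or equal to x can be illustrated with a toy simulation. This is only a sketch under a stated assumption: each measured time is modelled as the fastest possible time x plus a non-negative random delay from other activity on the machine. The function names, the exponential delay, and its 0.05 s mean are all hypothetical choices, not anything taken from timeit.

```python
import random


def simulated_run(fastest, rng):
    """One hypothetical timing: the fastest possible time plus a non-negative delay."""
    # expovariate never returns a negative number; the mean delay of 0.05 s is assumed.
    return fastest + rng.expovariate(1.0 / 0.05)


def min_of_runs(fastest, runs, rng):
    """Lowest of several simulated timings, analogous to min(timer.repeat(...))."""
    return min(simulated_run(fastest, rng) for _ in range(runs))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = 1.0                 # assumed fastest possible time, in seconds
observed_min = min_of_runs(x, 3, rng)

# Under this model the lowest of three runs can never fall below x.
print(observed_min >= x)  # True
```

Under this model the lowest observed value sits at or above x, and a heavier load (a larger mean delay) pushes it further away. Whether real timings behave like this model is exactly what the thread is debating.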
Georg, please forgive me. I thought a sample size of 3 was much too small to make a claim about the typical case, but after running a computer experiment it appears that I was wrong:

#!/usr/bin/env python
from __future__ import division
import timeit
import random

repeat = 100
num = 100

def test_func():
    l = 1
    for idx in range(10000):
        l = l * idx

timer = timeit.Timer('test_func()', 'from __main__ import test_func')
data = timer.repeat(repeat=repeat, number=num)

def test_timer():
    # Compare the lowest of three randomly chosen timings with one more random timing.
    sample = random.sample(data, 3)
    minval = min(sample)
    onerun = random.choice(data)
    return 1 if minval < onerun else 0

successes = [test_timer() for idx in range(repeat)]
print "Those runs for which the claim in the documentation is true:", successes
s = sum(successes)
l = len(successes)
print "probability that the claim is true: %s/%s = %s" % (s, l, s / l)

Returns:

% timeit-statistics.py

I'm satisfied the documentation is correct, and I'm really sorry for wasting your time.
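The script above uses Python 2 print statements. For readers on Python 3, the following is a rough port of the same experiment, offered as a sketch rather than as the author's code. The helper name `fraction_min_below` and its `rng` parameter are introduced here for illustration, and `number` is reduced from 100 to 10 only to keep the run short.

```python
import random
import timeit


def test_func():
    """Same workload as the script above: a loop of integer multiplications."""
    product = 1
    for idx in range(10000):
        product = product * idx
    return product


def fraction_min_below(data, trials, sample_size=3, rng=random):
    """Fraction of trials in which the lowest of a random sample is below one random run."""
    successes = 0
    for _ in range(trials):
        sample = rng.sample(data, sample_size)  # drawn without replacement
        one_run = rng.choice(data)              # may coincide with a sampled value
        if min(sample) < one_run:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    # timeit.Timer accepts a callable, so no setup import string is needed.
    timer = timeit.Timer(test_func)
    data = timer.repeat(repeat=100, number=10)
    print("fraction of trials where min(sample) < one run:",
          fraction_min_below(data, trials=100))
```

The printed fraction varies from run to run and from machine to machine, so no expected value is shown here.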