
Improve pybench #51278

Closed
kristjanvalur mannequin opened this issue Oct 1, 2009 · 8 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

kristjanvalur mannequin commented Oct 1, 2009

BPO 7029
Nosy @malemburg, @kristjanvalur
Files
  • pybench.patch
  • pybench.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-10-07.14:16:37.115>
    created_at = <Date 2009-10-01.17:16:42.544>
    labels = ['type-bug']
    title = 'Improve pybench'
    updated_at = <Date 2009-10-09.14:32:55.506>
    user = 'https://github.com/kristjanvalur'

    bugs.python.org fields:

    activity = <Date 2009-10-09.14:32:55.506>
    actor = 'kristjan.jonsson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-10-07.14:16:37.115>
    closer = 'kristjan.jonsson'
    components = ['Demos and Tools']
    creation = <Date 2009-10-01.17:16:42.544>
    creator = 'kristjan.jonsson'
    dependencies = []
    files = ['15016', '15056']
    hgrepos = []
    issue_num = 7029
    keywords = ['patch']
    message_count = 8.0
    messages = ['93416', '93417', '93422', '93655', '93656', '93661', '93699', '93796']
    nosy_count = 2.0
    nosy_names = ['lemburg', 'kristjan.jonsson']
    pr_nums = []
    priority = 'normal'
    resolution = 'later'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7029'
    versions = ['Python 2.7']

    kristjanvalur mannequin commented Oct 1, 2009

    The attached patch contains suggested fixes to pybench.py:

    1. add a processtime timer for windows
    2. fix a bug in timer selection: timer wasn't put in 'self'
    3. Collect 'gross' times for each round
    4. show statistics for per-round sums, both for sum of "adjusted" times
      for each round, and 'gross' time per round.

    The last one is important. Traditionally, the focus would be on
    individual tests: the series of individual timings for each test
    would be examined, and the minimum and average found. These minima
    and averages would then be summed up to get a total time. These
    results are very noisy. In addition, the sum of individual minima
    is not a meaningful quantity in itself (summing minima is not a
    linear operation) and is thus extremely noisy.

    Looking at the minimum and average of the sum is a much more stable
    indication of total benchmark performance.

    Another thing that I found when working with this, is that using
    "calibration" significantly adds to noise and can produce incorrect
    results. It adds a level of non-linearity to the results that can
    appear very strange. Typically the 'overhead' is so low that we should
    consider skipping the calibration. The purpose of the benchmark must be
    to measure the performance of python in context. The meaning of
    individual operations in isolation are pretty meaningless. Although I
    didn't change the default number of calibration runs from 10, I would
    suggest documenting that it may be useful to turn it to 0 to increase
    predictability and get numbers indicative of real performance.
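The statistical point above can be illustrated with a small simulation (illustrative numbers only, not pybench's actual code): summing per-test minima produces a total that no real round ever achieved, while the minimum of the per-round sums is a time that actually occurred.

```python
import random

random.seed(2)

# Simulated timings: 3 tests, 5 rounds each, with additive noise.
ROUNDS = 5
true_times = [0.010, 0.020, 0.030]
timings = [[t + random.uniform(0.0, 0.005) for _ in range(ROUNDS)]
           for t in true_times]

# Traditional approach: take each test's minimum, then sum them.
sum_of_minima = sum(min(series) for series in timings)

# Per-round approach: sum each round first, then take the minimum.
round_sums = [sum(test[r] for test in timings) for r in range(ROUNDS)]
min_of_sums = min(round_sums)

# The sum of minima mixes measurements from different rounds, so it
# can only ever be <= the best real round total.
assert sum_of_minima <= min_of_sums
print(f"sum of minima: {sum_of_minima:.4f}s  min of sums: {min_of_sums:.4f}s")
```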

    kristjanvalur mannequin added the type-bug (An unexpected behavior, bug, or error) label Oct 1, 2009
    malemburg (Member) commented:

    Kristján Valur Jónsson wrote:

    Thanks for the patch. Here's a quick review (a bit terse, but
    hope you don't mind)...

    The attached patch contains suggested fixes to pybench.py:

    1. add a processtime timer for windows

    I'm not sure why you added this: the systimes.py module
    already has a ctypes wrapper for kernel32.GetProcessTimes() ?!

    All you have to do is use this command line option to
    enable it: --timer systimes.processtime
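For illustration, resolving such a dotted timer name amounts to an import plus an attribute lookup. This sketch uses the stdlib `time.process_time` as a stand-in for `systimes.processtime`; `resolve_timer` is a hypothetical helper, not pybench's actual code:

```python
import importlib

def resolve_timer(dotted_name):
    # Split "module.attr", import the module, return the timer callable.
    mod_name, _, attr = dotted_name.rpartition(".")
    module = importlib.import_module(mod_name)
    return getattr(module, attr)

# Stand-in for "--timer systimes.processtime" on a modern Python:
timer = resolve_timer("time.process_time")
print(timer())  # CPU time consumed by this process, in seconds
```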

    2. fix a bug in timer selection: timer wasn't put in 'self'

    Thanks for spotting this one.

    3. Collect 'gross' times for each round

    I'm assuming that "gross" = test time + overhead. Is that right?

    I like the idea, but not the implementation :-) Don't like my own
    pybench statistics implementation much either. I plan to
    rewrite it for Python 2.7.

    4. show statistics for per-round sums, both for sum of "adjusted" times
      for each round, and 'gross' time per round.

    The last one is important. Traditionally, the focus would be on
    individual tests. The series of individual timings for each test would
    be examined, and the minimum and average found. These minima and averages would
    then be summed up to get a total time. These results are very noisy.
    In addition, the sum of individual minima is not a linear operator that
    has any meaning and is thus extremely noisy.

    Looking at the minimum and average of the sum is a much more stable
    indication of total benchmark performance.

    I agree. For the minimum times, the minimum over all tests
    should be used.

    Another thing that I found when working with this, is that using
    "calibration" significantly adds to noise and can produce incorrect
    results. It adds a level of non-linearity to the results that can
    appear very strange. Typically the 'overhead' is so low that we should
    consider skipping the calibration. The purpose of the benchmark must be
    to measure the performance of python in context. The meaning of
    individual operations in isolation are pretty meaningless. Although I
    didn't change the default number of calibration runs from 10, I would
    suggest documenting that it may be useful to turn it to 0 to increase
    predictability and get numbers indicative of real performance.

    The idea behind the calibration is to hide away the overhead
    of setting up the test. You're right that nowadays the tests
    run so fast that the calibration causes more noise than it
    does good.

    This is due to the fact that the number of iteration rounds
    and test "packs" were chosen in 2006. Back then, both Python
    and the CPUs were a lot slower.

    OTOH, timers have not gotten a lot more accurate.

    As a result, the measurements of pybench on today's machines
    have far more noise than they did in 2006.

    The only way to resolve this is to adjust all tests - something
    which I also plan to do in time for Python 2.7.

    kristjanvalur mannequin commented Oct 1, 2009

    Hello Marc-Andre.

    1. yes, this is nonsense. I thought that "systimes" was some archaic
      module that had been discontinued, since I didn't find it in the
      standard modules (where "time" is). I didn't realize that it was a
      special helper sibling-module to pybench.

    2. 'gross', yes, as opposed to 'net'. The total time of each round.
      For timers which don't update frequently, such as GetProcessTimes(),
      summing up many small contributions compounds the error (in this case,
      +- 10ms per measurement). Can you be more specific about what it is
      that you don't like?
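The error-compounding point can be sketched with a simplified model of a coarse timer (a hypothetical 10 ms tick standing in for GetProcessTimes(); not pybench code). Summing many individually quantized readings accumulates up to half a tick of error per reading, while one reading around the whole round carries at most one tick of error:

```python
RESOLUTION = 0.010  # model a timer that only advances in 10 ms ticks

def quantize(duration):
    # A coarse timer reports durations rounded to whole ticks.
    return round(duration / RESOLUTION) * RESOLUTION

true_durations = [0.0123] * 100  # 100 short measurements of 12.3 ms each
true_total = sum(true_durations)

# 'Net': time each piece separately and sum the readings; the
# per-reading quantization errors accumulate.
net_total = sum(quantize(d) for d in true_durations)

# 'Gross': one reading around the whole round.
gross_total = quantize(true_total)

print(f"net error:   {abs(net_total - true_total) * 1000:.1f} ms")
print(f"gross error: {abs(gross_total - true_total) * 1000:.1f} ms")
```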

    About the calibration: I ran into problems where I was tuning the
    memory allocator. I put HeapAlloc in place of malloc() and found a
    consistent 10% performance degradation, even when using the LFH. I was
    puzzled by this until I turned off calibration and the effect went away.

    The calibration technique is not a bad idea, but it would be helpful,
    when one is comparing runs, to be able to keep the calibration times of
    the previous run, so as not to introduce a random element into the
    comparison. Alternatively, explain that when doing successive tests,
    the lowest variance in the result is to be expected with -C 0.

    So, how can I improve this?

    kristjanvalur mannequin commented Oct 6, 2009

    Here is an updated patch with the superfluous timing function removed.
    Please advise me on how you "don't like the implementation".

    I'm also considering adding a '--quiet' flag that causes it to only emit
    the 'total' line. This is useful for automated testing of builds. Any
    thoughts about that?

    kristjanvalur mannequin commented Oct 6, 2009

    And here is the actual patch.

    malemburg (Member) commented:

    Kristján Valur Jónsson wrote:

    Here is an updated patch with the superfluous timing function removed.
    Please advise me on how you "don't like the implementation"

    The reporting methods are a complete mess (not your patch, my code).
    They need a rewrite and then I can apply the extra gross output.

    Something that would need to change is the .stat() patch: why
    didn't you just add 3 new fields to the tuple instead of using
    a list with 2 tuples?

    In any case, please don't invest too much time into this. I'm going
    to rewrite the statistics part anyway and can then add the new
    gross values while I'm at it.

    I'm also considering adding a '--quiet' flag that causes it to only emit
    the 'total' line. This is useful for automated testing of builds. Any
    thoughts about that?

    Could be useful, yes.

    kristjanvalur mannequin commented Oct 7, 2009

    Ok, thanks for the info.
    We are currently using the modified version in house for monitoring our
    builds and that's ok for my purposes. I have set the -C 0 and --timer
    systimes.processtime as default for us.
    I'll close this now and leave you to do the rewrite.

    kristjanvalur mannequin closed this as completed Oct 7, 2009
    kristjanvalur mannequin commented Oct 9, 2009

    Fixed the bug with "timer = timer" in trunk in revision 75293

    ezio-melotti transferred this issue from another repository Apr 10, 2022