Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Same constants in tuples are not merged while compile() #78281

Closed
rotu mannequin opened this issue Jul 11, 2018 · 24 comments
Closed

Same constants in tuples are not merged while compile() #78281

rotu mannequin opened this issue Jul 11, 2018 · 24 comments
Assignees
Labels
3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@rotu
Copy link
Mannequin

rotu mannequin commented Jul 11, 2018

BPO 34100
Nosy @tim-one, @brettcannon, @terryjreedy, @mdickinson, @vstinner, @methane, @serhiy-storchaka, @miss-islington, @rotu
PRs
  • bpo-34100: Merge constants recursively #8341
  • bpo-34100: Partially revert merge_consts_recursive() #10743
  • bpo-34100: compile: Re-enable frozenset merging #10760
  • bpo-34100: merge_consts_recursive() merges frozenset #10762
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/methane'
    closed_at = <Date 2018-11-26.12:24:08.823>
    created_at = <Date 2018-07-11.19:08:31.044>
    labels = ['interpreter-core', '3.8', 'performance']
    title = 'Same constants in tuples are not merged while compile()'
    updated_at = <Date 2018-11-28.16:05:45.684>
    user = 'https://github.com/rotu'

    bugs.python.org fields:

    activity = <Date 2018-11-28.16:05:45.684>
    actor = 'vstinner'
    assignee = 'methane'
    closed = True
    closed_date = <Date 2018-11-26.12:24:08.823>
    closer = 'methane'
    components = ['Interpreter Core']
    creation = <Date 2018-07-11.19:08:31.044>
    creator = 'Dan Rose'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34100
    keywords = ['needs review']
    message_count = 24.0
    messages = ['321495', '321497', '321498', '321499', '321513', '321515', '321516', '321517', '321518', '321519', '321520', '321526', '321576', '321633', '321943', '321989', '330428', '330519', '330521', '330522', '330572', '330582', '330605', '330607']
    nosy_count = 9.0
    nosy_names = ['tim.peters', 'brett.cannon', 'terry.reedy', 'mark.dickinson', 'vstinner', 'methane', 'serhiy.storchaka', 'miss-islington', 'Dan Rose']
    pr_nums = ['8341', '10743', '10760', '10762']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue34100'
    versions = ['Python 3.8']

    @rotu
    Copy link
    Mannequin Author

    rotu mannequin commented Jul 11, 2018

    In the Python 3.7.0 interpreter, the following evaluates to False. In 3.6.4, it was True:

    c,d = 500,500
    c is d
    

    This seems to be because, in some cases, Python 3.7 fails to intern integers inside tuples:

        a = (500,500)
        print(a[0] is a[1]) # False
        
        a = (500,500,42)
        print(a[0] is a[1]) # False
        
        a = (500,500,'42')
        print(a[0] is a[1]) # False
        
        answer = 42
        a = (500,500,answer)
        print(a[0] is a[1]) # True
        
        a = (500,500,[42])
        print(a[0] is a[1]) # True
    
        a = [500,500]
        print(a[0] is a[1]) # True

    I believe the above should all return True.

    @rotu rotu mannequin added type-bug An unexpected behavior, bug, or error 3.7 (EOL) end of life labels Jul 11, 2018
    @rotu
    Copy link
    Mannequin Author

    rotu mannequin commented Jul 11, 2018

    Another curious case:

        a = (500,500); b = (500,500)
        print(a[0] is b[0]) # True
        print(a[0] is b[1]) # False
        print(a[1] is b[0]) # False
        print(a[1] is b[1]) # True

    @mdickinson
    Copy link
    Member

    This was a side-effect of bpo-29469.

    @vstinner
    Copy link
    Member

    Should the AST optimizer "interns" constants?

    @methane
    Copy link
    Member

    methane commented Jul 12, 2018

    Thank you for finding it.

    I had worked on constant merging. 1
    It didn't fix this case for now. I'll continue the suspended work.

    But it may be a too big to fix only this regression.
    How is this regression critical?
    If it's important, I'll try to create minimal version of the patch to backport.

    @methane methane self-assigned this Jul 12, 2018
    @methane methane added 3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jul 12, 2018
    @methane methane changed the title Python doesn't intern integers in a tuple of constant literals Same integers in a tuple of constant literals are not merged Jul 12, 2018
    @methane methane added performance Performance or resource usage and removed type-bug An unexpected behavior, bug, or error labels Jul 12, 2018
    @serhiy-storchaka
    Copy link
    Member

    This is not only with integers.

    >>> a = ((1, 2), (1, 2))
    >>> a[0] is a[1]
    False
    >>> a = ('@#$', '@#$')
    >>> a[0] is a[1]
    False
    >>> a = (1.0, 1.0)
    >>> a[0] is a[1]
    False

    The only exception is short ASCII identifier-like strings (as a side effect of interning them):

    >>> a = ('foo', 'foo')
    >>> a[0] is a[1]
    True

    I'm not sure this is a problem which should be resolved.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2018

    The language doesn't define anything about this - any program relying on accidental identity is in error itself.

    Still, it's nice if a code object's co_consts vector is as short as reasonably possible. That's a matter of pragmatics, not of correctness.

    @methane
    Copy link
    Member

    methane commented Jul 12, 2018

    OK, unless example code this regression affects seriously is shown, I target only 3.8.

    @methane methane removed the 3.7 (EOL) end of life label Jul 12, 2018
    @serhiy-storchaka
    Copy link
    Member

    The co_consts vector is already as short as possible, except cases when tuples are created at code generation time, but this is not related to this issue (see bpo-33318 and bpo-33325).

    >>> def f():
    ...     a = (1.0, 1.0)
    ...     b = (1.0, 1.0)
    ... 
    >>> f.__code__.co_consts
    (None, (1.0, 1.0))
    >>> f.__code__.co_consts[1][0] is f.__code__.co_consts[1][1]
    False

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2018

    Fine, Serhiy, so reword it a tiny bit: it's nice if a code object's co_consts vector references as few distinct objects as possible. Still a matter of pragmatics, not of correctness.

    @methane
    Copy link
    Member

    methane commented Jul 12, 2018

    FYI, same constants are shared even when they are in different tuples on 3.6.

    For modules having large constant table containing large integer or floats, non name-like string, and bytes are affected significantly.

    @vstinner
    Copy link
    Member

    Honestly, I don't think that it's a bug. We might enhance Python 3.8 to reduce constants duplication, but there is no need to "fix" Python 3.7.

    @methane methane changed the title Same integers in a tuple of constant literals are not merged Same constants in tuples are not merged while compile() Jul 12, 2018
    @brettcannon
    Copy link
    Member

    I also agree that there's no bug here, so I'm marking this an enhancement and removing the regression label.

    @brettcannon brettcannon added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Jul 12, 2018
    @terryjreedy
    Copy link
    Member

    I think (space) 'performance' would be a better label, as this is strictly an implementation improvement, not a language change. But we often (usually? sometimes?) limit performance improvements to the 'next version' so we have the alpha/beta/candidate releases to discover possible regressions.

    I think this is worth considering just because the pattern is so odd. But if this is ironed out as part of a broader and better patch, great.

    @methane
    Copy link
    Member

    methane commented Jul 19, 2018

    Counting object types in logging/pycache/init.cpython-38.pyc:

    master:
    [('r', 1815), (')', 467), ('Z', 339), ('s', 314), ('z', 273), ('c', 157), ('N', 154), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

    #52588:
    [('r', 1737), (')', 375), ('Z', 339), ('s', 314), ('z', 264), ('c', 157), ('N', 138), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

    It reduced about 5% objects.

    I chose logging/init because it's large except tests, and it's written in OO-based. (Large application has many OO-based code).

    @methane methane added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Jul 19, 2018
    @methane
    Copy link
    Member

    methane commented Jul 20, 2018

    This is another memory overhead comparison.
    It seems merging constants reduces 2~3% memory usage.

    import sys, django, flask
    sys._debugmallocstats()

    ### master branch

    class size num pools blocks in use avail blocks
    ----- ---- --------- ------------- ------------
    0 8 1 312 194
    1 16 1 135 118
    2 24 2 198 138
    3 32 22 2690 82
    4 40 62 6214 48
    5 48 149 12465 51
    6 56 254 18260 28
    7 64 354 22300 2
    8 72 208 11589 59
    9 80 124 6173 27
    10 88 59 2708 6
    ...
    # bytes in allocated blocks = 9,261,952

    ### merge-const branch

    class size num pools blocks in use avail blocks
    ----- ---- --------- ------------- ------------
    0 8 1 312 194
    1 16 1 135 118
    2 24 1 161 7
    3 32 22 2657 115
    4 40 62 6202 60
    5 48 102 8501 67
    6 56 235 16879 41
    7 64 346 21751 47
    8 72 204 11387 37
    9 80 122 6083 17
    10 88 58 2656 12
    ...
    # bytes in allocated blocks = 8,919,824

    @miss-islington
    Copy link
    Contributor

    New changeset c2e1607 by Miss Islington (bot) (INADA Naoki) in branch 'master':
    bpo-34100: Merge constants recursively (GH-8341)
    c2e1607

    @methane methane closed this as completed Nov 26, 2018
    @vstinner
    Copy link
    Member

    New changeset c2e1607 by Miss Islington (bot) (INADA Naoki) in branch 'master'

    This change introduced a lot of memory leaks (reference leaks):
    https://buildbot.python.org/all/#/builders/1/builds/422

    The following change fix "make && ./python -m test -R 3:3 test_code":

    diff --git a/Python/compile.c b/Python/compile.c
    index acb5cfe29b..cb3e73740d 100644
    --- a/Python/compile.c
    +++ b/Python/compile.c
    @@ -1277,6 +1277,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
                 else {
                     u = k;
                 }
    +            Py_DECREF(k);
                 Py_INCREF(u);
                 PyTuple_SET_ITEM(tuple, i, u);
                 i++;
    @@ -1288,6 +1289,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
                 Py_DECREF(key);
                 return NULL;
             }
    +        Py_DECREF(PyTuple_GET_ITEM(key, 1));
             PyTuple_SET_ITEM(key, 1, new);
         }
     

    But I dislike the frozenset branch of merge_consts_recursive(): modifying a tuple is a bad idea. For example, if you replace PyTuple_SET_ITEM() with PyTuple_SetItem(), you get a PyErr_BadInternalCall() because the reference count is greater than 1.

    I don't see how to rewrote properly the code, so I reverted (removed) it to let someone else fix the code: PR 10743.

    @vstinner
    Copy link
    Member

    New changeset 1005c84 by Victor Stinner in branch 'master':
    bpo-34100: Partially revert merge_consts_recursive() (GH-10743)
    1005c84

    @vstinner
    Copy link
    Member

    merge_consts_recursive(): rather than modifying the 'key' tuple, would it make sense to *add* the new key to the cache?

    @methane
    Copy link
    Member

    methane commented Nov 28, 2018

    I agree that modifying tuple is bad idea in general case.

    But in case of constants, in-place modify is the easiest approach.

    Our purpose is "merging" constant. On the other hand, "build new tuple and
    replace old tuple" approach makes two same constant tuples.
    If old tuple is referenced from somewhere, we fail to "merging".

    So I don't think we need to follow the general principle unless it reduces code.

    FYI, we have similar in-place editing for interning already.
    https://github.com/python/cpython/blob/master/Objects/codeobject.c#L47-L95

    @vstinner
    Copy link
    Member

    Well, it's up to you. I will review your PR ;-)

    @vstinner
    Copy link
    Member

    New changeset f7e4d36 by Victor Stinner (INADA Naoki) in branch 'master':
    bpo-34100: compile: Re-enable frozenset merging (GH-10760)
    f7e4d36

    @vstinner
    Copy link
    Member

    Thanks INADA Naoki! Thanks a simple and cool optimization!

    Maybe you might want to document it at https://docs.python.org/dev/whatsnew/3.8.html#optimizations ?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants