Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast-path for interned string compares #35491

Closed
malemburg opened this issue Nov 8, 2001 · 8 comments
Closed

Fast-path for interned string compares #35491

malemburg opened this issue Nov 8, 2001 · 8 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@malemburg
Copy link
Member

BPO 479615
Nosy @malemburg, @tim-one, @loewis, @arigo, @nascheme
Files
  • str-eq-cmp.patch: Patch
  • str-eq-cmp2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/malemburg'
    closed_at = <Date 2002-11-14.17:17:13.000>
    created_at = <Date 2001-11-08.15:19:38.000>
    labels = ['interpreter-core']
    title = 'Fast-path for interned string compares'
    updated_at = <Date 2002-11-14.17:17:13.000>
    user = 'https://github.com/malemburg'

    bugs.python.org fields:

    activity = <Date 2002-11-14.17:17:13.000>
    actor = 'lemburg'
    assignee = 'lemburg'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2001-11-08.15:19:38.000>
    creator = 'lemburg'
    dependencies = []
    files = ['3740', '3741']
    hgrepos = []
    issue_num = 479615
    keywords = ['patch']
    message_count = 8.0
    messages = ['38120', '38121', '38122', '38123', '38124', '38125', '38126', '38127']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'tim.peters', 'loewis', 'arigo', 'nascheme']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue479615'
    versions = []

    @malemburg
    Copy link
    Member Author

    This patch adds a fast-path for comparing equality of interned strings.

    The patch boosts performance for comparing identical string objects
    by some 20% on my machine while not causing any noticable slow-down
    for other operations (according to tests done with pybench).

    More infos and benchmarks later...

    @malemburg malemburg self-assigned this Nov 8, 2001
    @malemburg malemburg added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 8, 2001
    @malemburg malemburg self-assigned this Nov 8, 2001
    @malemburg malemburg added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 8, 2001
    @malemburg
    Copy link
    Member Author

    Logged In: YES
    user_id=38388

    Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

    PYBENCH 1.0

    Benchmark: eqpython.bench (rounds=10, warp=20)

    Tests: per run per oper. diff *)
    ------------------------------------------------------------------------
    BuiltinFunctionCalls: 125.55 ms 0.98 us -1.68%
    BuiltinMethodLookup: 180.10 ms 0.34 us +1.75%
    CompareFloats: 107.30 ms 0.24 us +2.04%
    CompareFloatsIntegers: 185.15 ms 0.41 us -0.05%
    CompareIntegers: 163.50 ms 0.18 us -1.77%

        CompareInternedStrings:      79.50 ms    0.16 us   -20.78%
    

    ^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^

                  CompareLongs:     110.25 ms    0.24 us    +0.09%
                CompareStrings:     143.40 ms    0.29 us    +2.14%
                CompareUnicode:     118.00 ms    0.31 us    +1.68%
                 ConcatStrings:     189.55 ms    1.26 us    -1.61%
                 ConcatUnicode:     226.55 ms    1.51 us    +1.34%
               CreateInstances:     202.35 ms    4.82 us    -1.87%
       CreateStringsWithConcat:     221.00 ms    1.11 us    +0.45%
       CreateUnicodeWithConcat:     240.00 ms    1.20 us    +1.27%
                  DictCreation:     213.25 ms    1.42 us    +0.47%
             DictWithFloatKeys:     263.50 ms    0.44 us    +1.15%
           DictWithIntegerKeys:     158.50 ms    0.26 us    -1.86%
            DictWithStringKeys:     147.60 ms    0.25 us    +0.75%
                      ForLoops:     144.90 ms   14.49 us    -4.64%
                    IfThenElse:     174.15 ms    0.26 us    -0.00%
                   ListSlicing:      88.80 ms   25.37 us    -1.11%
                NestedForLoops:     136.95 ms    0.39 us    +3.01%
          NormalClassAttribute:     177.80 ms    0.30 us    -2.68%
       NormalInstanceAttribute:     166.85 ms    0.28 us    -0.54%
           PythonFunctionCalls:     152.20 ms    0.92 us    +1.40%
             PythonMethodCalls:     133.70 ms    1.78 us    +1.60%
                     Recursion:     119.45 ms    9.56 us    +0.04%
                  SecondImport:     124.65 ms    4.99 us    -6.03%
           SecondPackageImport:     130.70 ms    5.23 us    -5.73%
         SecondSubmoduleImport:     161.65 ms    6.47 us    -5.88%
       SimpleComplexArithmetic:     245.50 ms    1.12 us    +2.08%
        SimpleDictManipulation:     108.50 ms    0.36 us    +0.05%
         SimpleFloatArithmetic:     125.80 ms    0.23 us    +0.84%
      SimpleIntFloatArithmetic:     128.50 ms    0.19 us    -1.46%
       SimpleIntegerArithmetic:     128.45 ms    0.19 us    -0.77%
        SimpleListManipulation:     159.15 ms    0.59 us    -5.32%
          SimpleLongArithmetic:     189.55 ms    1.15 us    +2.65%
                    SmallLists:     293.70 ms    1.15 us    -5.26%
                   SmallTuples:     230.00 ms    0.96 us    +0.44%
         SpecialClassAttribute:     175.70 ms    0.29 us    -2.79%
      SpecialInstanceAttribute:     199.70 ms    0.33 us    -1.55%
                StringMappings:     196.85 ms    1.56 us    -2.48%
              StringPredicates:     133.00 ms    0.48 us    -8.28%
                 StringSlicing:     165.45 ms    0.95 us    -3.47%
                     TryExcept:     193.60 ms    0.13 us    +0.57%
                TryRaiseExcept:     175.40 ms   11.69 us    +0.69%
                  TupleSlicing:     156.85 ms    1.49 us    -0.00%
               UnicodeMappings:     175.90 ms    9.77 us    +1.76%
             UnicodePredicates:     141.35 ms    0.63 us    +0.78%
             UnicodeProperties:     184.35 ms    0.92 us    -2.10%
                UnicodeSlicing:     179.45 ms    1.03 us    -1.10%
    

    ------------------------------------------------------------------------
    Average round time: 9855.00 ms -1.13%

    *) measured against: stdpython.bench (rounds=10, warp=20)

    As you can see, the rest of the results don't change much and the
    ones that do indicate some additional benefit gained by the patch.
    All slow-downs are way below the noise limit of around 5-10% (depending
    the platforms/machine/compiler).

    @nascheme
    Copy link
    Member

    Logged In: YES
    user_id=35752

    Attached is an updated version of this patch. I'm -0 on it
    since it doesn't seem to help much except for artificial
    benchmarks.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Aug 8, 2002

    Logged In: YES
    user_id=21627

    Is there any progress on this patch, or should it be
    considered withdrawn?

    @malemburg
    Copy link
    Member Author

    Logged In: YES
    user_id=38388

    I still consider the patch worth adding. The application space
    where it helps may be small, but also important: it can
    massively
    speed up parsers which use interned strings as tokens.

    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Nov 12, 2002

    Logged In: YES
    user_id=4771

    It seems to me that the whole status of interned strings is
    not clear from the user's perspective. Maybe we should
    avoid putting more emphasis on it. Deprecating intern() in
    favor of sys.intern() would even look like a good thing to
    do.

    Besides, in the use case you describe, you can compare
    tokens with "is" instead of "==" as you know for sure that
    you are comparing two explicitely interned strings. That's
    a hack, but calling intern() in the first place already
    looks like a hack.

    I'd vote against it, but if the patch is accepted don't
    forget to change the constants EQ, LE,... into PyCmp_EQ,
    PyCmp_LE,...

    @tim-one
    Copy link
    Member

    tim-one commented Nov 14, 2002

    Logged In: YES
    user_id=31435

    Marc, as Armin suggests, why is it that you can't use "is" in
    your code when you know you've got this special case?

    I'm inclined to reject this patch. While it gives a significant
    speedup in the specific CompareInternedStrings benchmark,
    by eyeball it adds Yet Antoher test and branch to all other
    non special-case compares, and I'm not inclined to believe
    that comparing interned strings should be favored at the
    expense of, e.g., slowing float compares, or compares of non-
    interned strings, or etc etc.

    I have to note too that the measured 21% speedup on a
    benchmark that does nothing else doesn't support a claim
    of "massive speedups". At best it looks like a small win for a
    small class of apps, at the expense of smaller losses for
    much larger classes of apps.

    @malemburg
    Copy link
    Member Author

    Logged In: YES
    user_id=38388

    I could use "is" in my code (and in fact, I am currently), but
    consider this a hack.

    Anyway, the PEP-0275 has a much wider scope, so I'll
    close this patch as rejected in the hope that PEP-275
    will make it into the core.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants