Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stdlib] Speedup Dict (changing modulus to bitshifting) #3071

Closed
wants to merge 2 commits into from

Conversation

rd4com
Copy link
Contributor

@rd4com rd4com commented Jun 17, 2024

Hello,

it could be a nice improvement, around +80% here (Ubuntu);

Hard to tell without feedbacks, here is the benchmark used:

from time import now
from random import *
from sys.param_env import is_defined
from __dict_simd import Dict2

alias iteration_size = 128
def main():
    var result: Int=0
    var start = now()
    var stop = now()

    
    small = Dict2[Int,Int]()
    start = now()
    for x in range(100):
        for i in range(iteration_size):
            small[i]=i
        for i in range(iteration_size): 
            result += small[i]
    stop = now()
    print(stop-start, result)

    result = 0

    small2 = Dict[Int,Int]()
    start = now()
    for x in range(100):
        for i in range(iteration_size):
            small2[i]=i
        for i in range(iteration_size):
            result += small2[i]
    stop = now()
    print(stop-start, result)

486225 812800
807652 812800


Because the dict always augment the reserved size by <<=1,

it is possible to use &=(self.reserved-1) instead of modulus

🥳 hope you got the same improvements as here

@rd4com rd4com requested a review from a team as a code owner June 17, 2024 21:35
Signed-off-by: rd4com <144297616+rd4com@users.noreply.github.com>
Signed-off-by: rd4com <144297616+rd4com@users.noreply.github.com>
Copy link
Contributor

@bethebunny bethebunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The fact that this wasn't done automatically makes me wonder if the subtraction can also be optimized out. I'll follow up with some of the compiler folks to understand why this case isn't optimized or if we can somehow tell the compiler that reserved is guaranteed to be a multiple of 2. As far as I'm concerned, % is the correct way to spell this and reads much more clearly as to the intent, but obviously this optimization is the core intention behind always keeping a power-of-2 reserved size ;)

@JoeLoser
Copy link
Collaborator

!sync

@bethebunny
Copy link
Contributor

Following up, the failure to optimize here is because we used a signed int for reserved rather than an unsigned type. That's also a reasonable followup.

@bethebunny
Copy link
Contributor

Consensus seems to be "relying on specific codegen optimizations for performance critical things is an antipattern", so +1 on merging this and we don't need any followups!

@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jun 18, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Jun 18, 2024
@gabrieldemarmiesse
Copy link
Contributor

gabrieldemarmiesse commented Jun 18, 2024

Hi, a warning here, a PR made previously make the capacity of the dict being decided by the underlying List, see #2905. While List always grows by a power a two, that doesn't cause any issue. But if in the future, we change the capacity growing logic in List, the capacity of the Dict might not be a power of two anymore.

Should we remove the optimization where the dict uses all the capacity of the list? That would make sure we always work with powers of two. It's the approach I recommend.

EDIT: I wonder if it would make sense to have a private type PowerOfTwo, which would be a wrapper around mlir.index to represent integers that are know, at compile-time, to be power of two. By adding a debug_assert in the constructor, this can help us debug when numbers that are not powers of two are misused as powers of two. This has also some value as it adds another information when reading the type in the struct. We can then implement the __div__, __mul__, __mod__ to use bit operations, and we're garanteed that optimizations will be used everywhere one of those ops are called. What do you think? Usually, having bit operations throughout of codebase makes it hard to debug if something goes wrong. This approach could help by grouping some of those bits ops in a type.

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Jun 19, 2024
@modularbot
Copy link
Collaborator

Landed in 504ed85! Thank you for your contribution 🎉

modularbot pushed a commit that referenced this pull request Jun 19, 2024
…(#41913)

[External] [Stdlib] Speedup `Dict` (changing modulus to bitshifting)

Hello,

it could be a nice improvement, around +80% here (Ubuntu);

Hard to tell without feedbacks, here is the benchmark used:

```mojo
from time import now
from random import *
from sys.param_env import is_defined
from __dict_simd import Dict2

alias iteration_size = 128
def main():
    var result: Int=0
    var start = now()
    var stop = now()

    small = Dict2[Int,Int]()
    start = now()
    for x in range(100):
        for i in range(iteration_size):
            small[i]=i
        for i in range(iteration_size):
            result += small[i]
    stop = now()
    print(stop-start, result)

    result = 0

    small2 = Dict[Int,Int]()
    start = now()
    for x in range(100):
        for i in range(iteration_size):
            small2[i]=i
        for i in range(iteration_size):
            result += small2[i]
    stop = now()
    print(stop-start, result)
```
486225 812800
807652 812800

---

Because the dict always augment the reserved size by `<<=1`,

it is possible to use `&=(self.reserved-1)` instead of modulus

:partying_face: hope you got the same improvements as here

Co-authored-by: rd4com <144297616+rd4com@users.noreply.github.com>
Closes #3071
MODULAR_ORIG_COMMIT_REV_ID: 522c3ccd50ac350ea321951ac526bdf137deaac7
@modularbot modularbot closed this Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants