Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: 10x String allocations moving from 0.4.0 to 0.5.0 in recursive loop #1216

Open
manatlan opened this issue Nov 2, 2023 · 16 comments
Open
Labels
bug Something isn't working mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@manatlan
Copy link

manatlan commented Nov 2, 2023

Bug description

mojo 0.5.0 (& 0.5.1 too) / ubu23.10

This mojo code is a lot slower (10x) with the new 0.5.0, than the good old 0.4.0.

  • With mojo 0.4.0 : it was 6.65 seconds
  • With mojo 0.5.0 : it takes 60 seconds !
@manatlan manatlan added bug Something isn't working mojo Issues that are related to mojo labels Nov 2, 2023
@jackos
Copy link
Collaborator

jackos commented Nov 2, 2023

Do you have the grids.txt file to test with?

@ematejska ematejska added the mojo-lang Tag for all issues related to language. label Nov 2, 2023
@hoxha-saber
Copy link

Do you have the grids.txt file to test with?

@jackos I think the one here works
https://github.com/manatlan/sudoku_resolver/blob/master/grids.txt

@Sharktheone
Copy link

I think this is basically #1030 but with 0.5.0 instead of 0.4.0. To test this it would be nice to have #1032 fixed.

@jackos
Copy link
Collaborator

jackos commented Nov 3, 2023

I ran that file on 0.4.0 and 0.5.0 and both took 66 seconds

@jackos jackos closed this as completed Nov 3, 2023
@manatlan
Copy link
Author

manatlan commented Nov 3, 2023

@jackos ... No ... I have proofs that 0.4.0 take less than 10s...
(I saved the outputs , and calculate the median with those outputs)

The goal of my repo, is to test the same algo with différents languages.
https://github.com/manatlan/sudoku_resolver/blob/master/README.md
And current mojo results are with 0.4.0.

With 0.5.0, the simplest version is a lot slower, while the others versions are a bit slower (+0.5s)

I know that the sudoku.mojo is not optimized or specialized (because it works with strings)
But previous results (0.4) was really good.

With 0.5.0 ... It's a lot slower than python 3...
That's strange

@manatlan
Copy link
Author

manatlan commented Nov 4, 2023

more tests here :
#1225

@Sharktheone
Copy link

In #1225 you can find the code and grids.txt. I also tested this with 0.4.0 and 0.5.0 and I can proof, 0.5.0 is about 10x slower than 0.4.0 (my results)

@jackos
Copy link
Collaborator

jackos commented Nov 4, 2023

Thanks @manatlan on my machine I posted results in that ticket which is the same, but @Sharktheone reported results similar to yours. I'll test on Linux VM

@jackos jackos reopened this Nov 4, 2023
@jackos
Copy link
Collaborator

jackos commented Nov 5, 2023

@soraros I've been profiling both versions, and it's doing millions of String allocations even when I reduce it to just one 9x9 grid, suduko.mojo can be optimized a lot. If you want to improve performance significantly, I recommend building a struct named Grid containing a DTypePointer[int8] and converting each number in the String to int8 after you open the file.

Having said that, the allocations were close to 1/10th in 0.4.0 compared to 0.5.0, I've been through the involved types and there weren't changes that increase allocations, so there must be some compiler optimization that was lost. We pin to the latest llvm, and there were multiple patches between 0.4.0 and 0.5.0, I can't pin down exactly what caused it, but have raised with the mojo team.

Thanks for raising this is a good test for allocation performance, I'll keep this open until we get performance back to 0.4.0 speed. There is a lot of room for optimizations in our heap allocated String type.

@jackos jackos changed the title [BUG]: code runs slower since the "0.5.0" release ... 10 x slower [BUG]: 10x String allocations moving from 0.4.0 to 0.5.0 in recursive loop Nov 5, 2023
@soraros
Copy link
Contributor

soraros commented Nov 5, 2023

@jackos: thank you for the detailed explanation!
I'm not concerned about the suboptimal performance of this particular Sudoku implementation per se since many improvements can be made. I also agree that it would serve as a good test for allocation performance. However, I am curious about the tenfold performance difference between Mac and Linux in version 0.4.0. I think this might even reveal some platform-dependent lost compiler optimisations in LLVM.

@manatlan
Copy link
Author

manatlan commented Nov 6, 2023

Fyi ... I've got the same algo but specialized to use optimal mojo types:
https://github.com/manatlan/sudoku_resolver/blob/master/sudoku_specialized.mojo (2.12s)
To try to reach the speed of the rust(specialized: <1s ) or c(strings: 2.5s) version...

But as you said... The strings version is a good test, in that case, because it's the one which have got the worst perf downgrade from mojo 0.4 to 0.5. (as said, it's now slower than py3)

@manatlan
Copy link
Author

manatlan commented Nov 15, 2023

FYI

... I will update my repo, to use mojo0.5.0 version (which now use string.find() ...)

So the 0.4.0 "sudoku.mojo" (which was a lot faster with mojo0.4 than mojo0.5) will be available here :

https://github.com/manatlan/sudoku_resolver/blob/mojo_0.4.0/sudoku.mojo

EDIT : I have updated the link in original post to link to this one ^^

@manatlan
Copy link
Author

manatlan commented Dec 3, 2023

BTW, just updated my resuts with mojo 0.5.1 ...

Mojo 0.5.1 is still the slower ;-(
https://github.com/manatlan/sudoku_resolver

It's EXACTLY the same algorithm, using strings ... and every implementations (java/js/pypy/rust/nim) beats mojo, by far ;-(
(with 0.4.0, mojo was in pair with rust version ). Currently it's just in pair with python3 ;-(

Same results on github vms ...

@ematejska ematejska added mojo-stdlib Tag for issues related to standard library and removed mojo-lang Tag for all issues related to language. labels Dec 6, 2023
@manatlan
Copy link
Author

just a ping ...

mojo 0.6.1 is still the slowest, in comparison with rust, nim, go, js, java, pypy and codon (and only 2x faster than py3)
https://github.com/manatlan/sudoku_resolver/blob/master/RESULTS.md

(Currently, for a python guy ... codon seems to be a better option than mojo)

@gryznar
Copy link
Contributor

gryznar commented Jan 10, 2024

Mojo lacks small string optimization. This may be direct cause of these allocations. As jackos states, they are working on fix. @jackos is this indeed related?

@soraros
Copy link
Contributor

soraros commented Jan 10, 2024

Mojo lacks small string optimization. This may be direct cause of these allocations.

I think yes.

(Currently, for a python guy ... codon seems to be a better option than mojo)

Also, yes (currently; for a python guy), even if this issue is fixed.

@ematejska ematejska added the mojo-repo Tag all issues with this label label Apr 29, 2024
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 3, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 3, 2024 — with Linear
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 6, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 6, 2024 — with Linear
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library
Projects
None yet
Development

No branches or pull requests

7 participants