New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash.pick is very slow on big hashes. #2586
Comments
|
Just in case: not a regression, it was even slower some time ago. |
|
This is a simple matter of algorithm. Each time you call .roll / .pick without any parameters, it needs to walk half of the keys in the hash (on average). Is there a particular reason why you do not use the parameter to .pick / .roll to indicate how many items you want from the hash? The performance difference is staggering: Of course, one could create a list of keys in the hash the first time a .roll or .pick is done. But then this would need to get invalidated / updated each time a key was added or deleted. Adding overhead that is not needed for normal operation. FWIW, I've already spent considerable time optimizing .roll / .pick on Maps/Hashes. I don't think there's a lot to be gained there (but you're very welcome to prove me wrong!) Closing this ticket as I think this is in the category DIHWIDT. |
|
I created a doc ticket in case it's a common issue: Raku/doc#2537 |
|
I guess if one has a hash backed by an array, then perhaps it'd be possible to do something more efficient, but even then there's difficulties: some slots will be empty, while others might be in collision and so a list chained off that slot. So I don't immediately know if you can do better and get the probability distribution right, rather than having it biased in various odd ways. And there's no certainty that a hash will be backed by an array; the current MoarVM one is a linked list, for example, which enforces O(n) anyway. |
|
I think we could implement that in MoarVM. We would have to look at how many keys are in each bucket to make sure each key has the same probability, but I think it's quite do-able. |
|
FWIW, I think most of the algorithm exists in Map.roll(N) already. |
|
I trust @samcv when she says it's doable in the VM. The question remains: should we do it? That's again something every VM for Rakudo would have to provide. It's also something MoarVM will have to provide, which needs tests, JIT support, etc. Even though I understand the fun in getting such things to work, there's clearly some cost there. This raises the question of benefit which must outweigh the cost. I'm not convinced it does. @lizmat already pointed out that |
|
The use case was randomly evicting a key from a large hash. |
|
Perhaps we need to implement |
The Problem
The code above takes 30 seconds on i7-4770K. That's barely 170 picks per second.
(To put this into perspective array of this size can do 50_000 picks per second.)
The text was updated successfully, but these errors were encountered: