dataframe-split is slow with many groups #5
Comments
I think the performance bottleneck is |
Issue might have been running out of stack in |
Switched to using a hashtable to store and look up keys in |
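For readers unfamiliar with the idea, here is a minimal sketch of hashtable-based grouping, written in Python rather than the project's own language. It is not the project's code: `split_rows_by_key`, the row-as-dict representation, and the column names are all hypothetical, chosen only to show how a single pass with a hash lookup per row avoids rescanning the data for every group.

```python
def split_rows_by_key(rows, key_columns):
    """Group rows by the values found in key_columns, in one pass.

    Hypothetical helper for illustration only. `rows` is a list of dicts;
    the result maps each key tuple to the list of rows that share it.
    """
    groups = {}  # a Python dict is a hashtable and keeps insertion order
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # One hash lookup per row, independent of how many groups exist.
        groups.setdefault(key, []).append(row)
    return groups


rows = [
    {"grp": "a", "val": 1},
    {"grp": "b", "val": 2},
    {"grp": "a", "val": 3},
]
groups = split_rows_by_key(rows, ["grp"])
```

With this shape the work is proportional to the number of rows, so adding more groups should not by itself multiply the cost.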
Performance is now acceptable for about 100,000 rows and 300 groups (i.e., split-example code above), but 50,000 rows and 1,000 groups (not shown here) is still frustratingly slow. I was perhaps overly optimistic that improving |
These tests show that run time increases roughly linearly with group size (when controlling for total dataframe size).
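One pattern that would produce this kind of scaling is a split that rescans every row once per group. Whether that is what happens here is an assumption, not something confirmed in this thread. The sketch below uses a hypothetical function, `naive_split_comparisons`, that counts comparisons instead of measuring time so the result is deterministic.

```python
def naive_split_comparisons(n_rows, n_groups):
    """Count key comparisons made by a filter-per-group split.

    Hypothetical illustration only: rows are assigned to groups
    round-robin, and each group is built by scanning all rows.
    """
    keys = [i % n_groups for i in range(n_rows)]
    comparisons = 0
    for group in range(n_groups):
        members = []
        for key in keys:
            comparisons += 1  # one comparison per row, per group
            if key == group:
                members.append(key)
    return comparisons


# Hold the row count fixed and vary only the number of groups.
counts = {g: naive_split_comparisons(1000, g) for g in (10, 20, 40)}
```

In this model the comparison count is `n_rows * n_groups`, so doubling the number of groups at a fixed row count doubles the work.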
Went back to square one with |