DataFrames.jl 1.0.1 is out, benchmarks are outdated (regarding Julia) #195
Comments
Hi Pali, [table of groupby and join results not preserved]
@PallHaraldsson - also, the benchmark was run on an old setup of DataFrames.jl. The PR fixing this has now been merged, and I hope the next run will show improvements, especially in join operations. @jangorecki - your work is great. It allows us to pinpoint the performance choke points we have in DataFrames.jl!
Thanks. I'm pretty sure I was looking at numbers from after the update to 1.0.1 earlier today (with a good speedup, I guess as in the table above), but since then I see the results were updated again and DF.jl got a lot slower, e.g. 2s vs 7s. I do not remember the other numbers, and it's not simply that I misremember, as I had calculated:
Now all queries are much slower, except Query 4 and Query 5, which are exactly the same (as above) or, for the latter, slightly faster on the second run.
@PallHaraldsson there was another run using a different Julia config; see #194 (comment) for details.
Yes - we are past the 1.0 release, and it introduced significant changes. We therefore still need to learn how to properly tune the whole ecosystem, and the H2O benchmarks are a great way to learn where we have problems (in short: the only change we made was a different setting for the CSV reader - @quinnj is aware of this issue and I know he is working on improving things here).
Hi,
Since "innerjoin, leftjoin, rightjoin, outerjoin, semijoin, and antijoin are now much faster" in DataFrames.jl 1.0, and you benchmarked an older version, it would be nice if you could rerun the benchmarks. Also, Julia 1.6.1 is out; while I'm not sure it will be faster for this, it's best to use it so people are not in doubt.
I'm also curious whether out-of-core processing just works; I understand it's there in the package (maybe only for Arrow files?).