DataFrames.jl 1.0.1 is out, benchmarks are outdated (regarding Julia) #195

Closed
PallHaraldsson opened this issue Apr 26, 2021 · 5 comments

@PallHaraldsson
Contributor

PallHaraldsson commented Apr 26, 2021

Hi,

Since "innerjoin, leftjoin, rightjoin, outerjoin, semijoin, and antijoin are now much faster" in DataFrames.jl 1.0, and you benchmarked an older version, it would be nice if you could rerun the benchmarks. Also, Julia 1.6.1 is out; I'm not sure it will be faster for this, but it's best to use it so people are not in doubt.

I'm also curious whether out-of-core processing just works; I understand it's available in the package (maybe only for Arrow files?).

@PallHaraldsson PallHaraldsson changed the title from "DataFrames.jl 1.0.1 is out, benchmark may be outdated" to "DataFrames.jl 1.0.1 is out, benchmarks are outdated (regarding Julia)" Apr 26, 2021
@jangorecki
Contributor

jangorecki commented Apr 27, 2021

Hi Pali,
When you asked, the benchmark for DataFrames.jl 1.0.1 was already running. It has since finished, and the results you asked for are already published in the report. Below is a comparison of the previously tested version (0.22.7) against 1.0.1. As we can see, there is a big speedup.
As for out-of-core processing in Julia, I am not aware of it. If it just worked, we would see the "join" task on 1e9 rows giving some timings. It doesn't, so I assume it is not yet supported in Julia.

groupby

|in_rows |knasorted                                       |question_group |question                    | 20210408_0.22.7| 20210426_1.0.1| new2old|
|:-------|:-----------------------------------------------|:--------------|:---------------------------|---------------:|--------------:|-------:|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.299|          0.304|    0.23|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.268|          0.084|    0.31|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.684|          0.456|    0.67|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.462|          0.185|    0.40|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           0.888|          0.346|    0.39|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           2.264|          1.861|    0.82|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           1.231|          1.400|    1.14|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |           2.345|          2.096|    0.89|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           2.012|          1.266|    0.63|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           1.921|          2.484|    1.29|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.407|          0.322|    0.23|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.268|          0.073|    0.27|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.958|          0.610|    0.64|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.484|          0.206|    0.43|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           1.298|          0.627|    0.48|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           1.767|          1.390|    0.79|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           2.352|          2.263|    0.96|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |           5.184|          4.423|    0.85|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           1.534|          0.930|    0.61|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.517|          2.485|    0.99|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.297|          0.305|    0.24|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.256|          0.072|    0.28|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           1.670|          0.755|    0.45|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.452|          0.191|    0.42|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           1.473|          1.142|    0.78|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           1.383|          1.126|    0.81|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           4.059|          3.156|    0.78|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          10.977|          8.847|    0.81|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           1.350|          0.809|    0.60|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.670|          2.479|    0.93|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |           1.314|          0.328|    0.25|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |           0.273|          0.090|    0.33|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |           0.680|          0.457|    0.67|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |           0.454|          0.182|    0.40|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |           0.888|          0.383|    0.43|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |           2.310|          1.898|    0.82|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |           1.230|          1.334|    1.08|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |           2.357|          1.997|    0.85|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |           1.431|          0.857|    0.60|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |           1.905|          2.460|    1.29|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |           1.430|          0.343|    0.24|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.285|          0.084|    0.29|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.711|          0.486|    0.68|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.553|          0.325|    0.59|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           0.985|          0.400|    0.41|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           2.576|          2.042|    0.79|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           1.698|          1.562|    0.92|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |           2.661|          2.338|    0.88|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           2.972|          2.575|    0.87|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.164|          2.776|    1.28|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           2.213|          0.942|    0.43|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           1.026|          0.771|    0.75|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           5.225|          3.755|    0.72|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.769|          1.301|    0.47|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           9.921|          3.928|    0.40|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          17.910|         13.667|    0.76|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          16.864|         16.306|    0.97|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          32.412|         25.756|    0.79|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          15.148|         10.469|    0.69|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          27.944|         18.766|    0.67|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           2.563|          1.000|    0.39|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.921|          0.710|    0.77|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |          10.768|          9.344|    0.87|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.706|          1.319|    0.49|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          15.515|         12.834|    0.83|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          12.004|          9.662|    0.80|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          28.399|         25.419|    0.90|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          64.707|         56.707|    0.88|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          10.848|          8.683|    0.80|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          36.952|         21.002|    0.57|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           3.981|          1.156|    0.29|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.936|          0.698|    0.75|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |          25.464|         12.803|    0.50|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.951|          1.678|    0.57|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          19.090|         25.847|    1.35|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          12.868|          9.075|    0.71|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          53.671|         40.396|    0.75|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |         124.747|        126.466|    1.01|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          12.218|          7.443|    0.61|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          38.693|         26.716|    0.69|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |           2.399|          1.157|    0.48|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |           1.155|          0.886|    0.77|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |           4.580|          3.597|    0.79|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |           2.723|          1.198|    0.44|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |           9.897|          3.997|    0.40|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |          17.775|         14.362|    0.81|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |          16.593|         16.873|    1.02|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |          32.623|         26.762|    0.82|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |           8.233|          5.496|    0.67|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |          26.806|         18.907|    0.71|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |           2.463|          1.000|    0.41|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           1.040|          0.809|    0.78|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           4.551|          3.727|    0.82|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           3.313|          1.733|    0.52|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          11.843|          4.255|    0.36|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          20.167|         14.674|    0.73|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          19.426|         17.914|    0.92|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |          34.922|         26.448|    0.76|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          20.954|         15.541|    0.74|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          27.872|         19.549|    0.70|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |          12.678|         15.705|    1.24|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           9.801|          9.075|    0.93|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |         123.048|         89.388|    0.73|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |          32.543|         23.274|    0.72|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |         224.940|        120.389|    0.54|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |         224.752|        195.448|    0.87|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |         334.029|        357.317|    1.07|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |          21.852|          9.445|    0.43|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |          12.664|          8.576|    0.68|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |         107.168|         54.936|    0.51|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |          39.450|         11.435|    0.29|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |         200.221|         79.478|    0.40|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |         241.610|        193.551|    0.80|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |         301.821|        244.512|    0.81|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|

join

|in_rows |knasorted               |question               | 20210408_0.22.7| 20210426_1.0.1| new2old|
|:-------|:-----------------------|:----------------------|---------------:|--------------:|-------:|
|1e7     |0% NAs, unsorted data   |small inner on int     |           2.635|          0.890|    0.34|
|1e7     |0% NAs, unsorted data   |medium inner on int    |           2.470|          0.808|    0.33|
|1e7     |0% NAs, unsorted data   |medium outer on int    |           7.924|          3.030|    0.38|
|1e7     |0% NAs, unsorted data   |medium inner on factor |           3.427|          0.959|    0.28|
|1e7     |0% NAs, unsorted data   |big inner on int       |           7.428|          2.387|    0.32|
|1e7     |5% NAs, unsorted data   |small inner on int     |           3.006|          0.999|    0.33|
|1e7     |5% NAs, unsorted data   |medium inner on int    |           2.415|          0.863|    0.36|
|1e7     |5% NAs, unsorted data   |medium outer on int    |           7.753|          3.154|    0.41|
|1e7     |5% NAs, unsorted data   |medium inner on factor |           3.521|          1.084|    0.31|
|1e7     |5% NAs, unsorted data   |big inner on int       |           7.548|          3.804|    0.50|
|1e7     |0% NAs, pre-sorted data |small inner on int     |           2.463|          0.720|    0.29|
|1e7     |0% NAs, pre-sorted data |medium inner on int    |           1.931|          0.742|    0.38|
|1e7     |0% NAs, pre-sorted data |medium outer on int    |           6.941|          2.449|    0.35|
|1e7     |0% NAs, pre-sorted data |medium inner on factor |           2.455|          0.840|    0.34|
|1e7     |0% NAs, pre-sorted data |big inner on int       |           7.530|          1.452|    0.19|
|1e8     |0% NAs, unsorted data   |small inner on int     |         122.010|         82.456|    0.68|
|1e8     |0% NAs, unsorted data   |medium inner on int    |         135.868|         94.706|    0.70|
|1e8     |0% NAs, unsorted data   |medium outer on int    |         217.156|        112.529|    0.52|
|1e8     |0% NAs, unsorted data   |medium inner on factor |         146.366|         96.223|    0.66|
|1e8     |0% NAs, unsorted data   |big inner on int       |         255.316|         91.470|    0.36|
|1e8     |5% NAs, unsorted data   |small inner on int     |         118.436|         92.100|    0.78|
|1e8     |5% NAs, unsorted data   |medium inner on int    |         131.347|         93.817|    0.71|
|1e8     |5% NAs, unsorted data   |medium outer on int    |         221.642|        110.364|    0.50|
|1e8     |5% NAs, unsorted data   |medium inner on factor |         145.716|         97.051|    0.67|
|1e8     |5% NAs, unsorted data   |big inner on int       |         259.400|        130.767|    0.50|
|1e8     |0% NAs, pre-sorted data |small inner on int     |         119.080|        100.430|    0.84|
|1e8     |0% NAs, pre-sorted data |medium inner on int    |         123.884|         90.533|    0.73|
|1e8     |0% NAs, pre-sorted data |medium outer on int    |         223.578|        104.135|    0.47|
|1e8     |0% NAs, pre-sorted data |medium inner on factor |         127.576|         93.672|    0.73|
|1e8     |0% NAs, pre-sorted data |big inner on int       |         253.908|         83.728|    0.33|
|1e9     |0% NAs, unsorted data   |small inner on int     |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium inner on int    |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium outer on int    |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium inner on factor |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |big inner on int       |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |small inner on int     |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium inner on int    |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium outer on int    |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium inner on factor |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |big inner on int       |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |small inner on int     |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium inner on int    |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium outer on int    |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium inner on factor |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |big inner on int       |              NA|             NA|      NA|
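
The `new2old` column in both tables appears to be the new timing divided by the old one, rounded to two decimals, so values below 1 mean 1.0.1 is faster. A minimal sketch that checks this reading against the first five join rows (the variable names are illustrative, and the definition of the ratio is an assumption inferred from the numbers, not taken from the benchmark scripts):

```julia
# Timings copied from the join table above (1e7 rows, 0% NAs, unsorted data).
# They are assumed to be in seconds.
old = [2.635, 2.470, 7.924, 3.427, 7.428]   # column 20210408_0.22.7
new = [0.890, 0.808, 3.030, 0.959, 2.387]   # column 20210426_1.0.1

# Assumed definition: new2old = new / old, rounded to two decimals.
new2old = round.(new ./ old; digits=2)

println(new2old)
```

Under that assumption the result agrees with the `new2old` values reported for those rows (0.34, 0.33, 0.38, 0.28, 0.32).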

@bkamins
Contributor

bkamins commented Apr 27, 2021

@PallHaraldsson - also, the benchmark was run on an old setup of DataFrames.jl. The PR fixing this has now been merged (and I hope the next run will show improvements, especially in join operations).

@jangorecki - your work is great. It allows us to pinpoint the performance choke points we have in DataFrames.jl!

@PallHaraldsson
Contributor Author

PallHaraldsson commented Apr 27, 2021

Thanks, I'm pretty sure I was looking at the numbers after the update to 1.0.1 earlier today (with a good speedup, I guess as in the table above), but since then I see they were updated again and DF.jl got a lot slower, e.g. 2s vs 7s. I do not remember the other numbers, and it's not simply that I misremember, as I had calculated:

julia> 0.3+0.08+0.46+0.18+0.35 # First run
1.37

julia> 0.06+0.08+0.24+0.10+0.27 # Second run
0.75

julia> 1.37+0.75 # What you reported, then, but rounded to 2s
2.12
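
The first-run figures in the calculation above appear to match the 1.0.1 column of the groupby table for the five basic questions at 1e7 rows (1e2 cardinality factor, 0% NAs, unsorted data), rounded to two decimals. A small sketch summing the unrounded table values instead (the variable names are illustrative, and the correspondence to the report's first run is an assumption):

```julia
# 1.0.1 timings for the five basic groupby questions at 1e7 rows,
# 1e2 cardinality factor, 0% NAs, unsorted data, copied from the table above.
# Assumed to be the same "first run" figures used in the REPL session.
first_run = [0.304, 0.084, 0.456, 0.185, 0.346]

# Total first-run time across the five questions.
total = sum(first_run)

println(total)
```

The total comes to roughly 1.375, in line with the 1.37 obtained from the rounded figures.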

Now all queries are much slower, except Query 4 and Query 5, which are exactly the same (as above) or, for the latter, slightly faster in the second run.

@jangorecki
Contributor

@PallHaraldsson there was another run using a different Julia config, see #194 (comment) for details

@bkamins
Contributor

bkamins commented Apr 27, 2021

Yes - we are past the 1.0 release, which introduced significant changes. We therefore still need to learn how to properly tune the whole ecosystem, and the H2O benchmarks are great for learning where we have problems (in short: the only change we made was a different setting of the CSV reader - @quinnj is aware of this issue and I know he is working on improving things here).
