Benchmarks #6

ritchie46 · 2021-11-28T10:32:48Z

As we compare different tools here, it would be cool to run benchmarks from this repo.

Maybe in CI, and later maybe even a dedicated runner.

These could then be shown on the website. I am already assuming here that polars does great. 😄

koaning · 2021-11-28T10:41:41Z

Benchmarks are most certainly the plan! Any preference on how though? Part of me likes the idea of running it on GitHub Actions, but I'm wondering if they provide consistent hardware. There's also "what datasets shall we use" and "where to host those". I can also imagine that we may want to consider multiple versions of functions. After all, there may be multiple ways to implement "sessionize".
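
A minimal sketch of how several versions of one function could sit side by side for benchmarking. Everything here is an assumption for illustration: the `VARIANTS` registry, the `register` decorator, the gap-based definition of a session, and the pure-Python implementations, which stand in for real dataframe code.

```python
from itertools import accumulate

# Hypothetical registry: variant name -> implementation.
VARIANTS = {}


def register(name):
    """Decorator that stores an implementation under a variant name."""
    def decorator(func):
        VARIANTS[name] = func
        return func
    return decorator


@register("loop")
def sessionize_loop(timestamps, threshold):
    """Give each sorted timestamp a session id; a gap above threshold starts a new session."""
    session_ids = []
    current = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > threshold:
            current += 1
        session_ids.append(current)
        previous = ts
    return session_ids


@register("accumulate")
def sessionize_accumulate(timestamps, threshold):
    """Same result, computed as a cumulative sum over 'new session' flags."""
    if not timestamps:
        return []
    breaks = [0] + [int(b - a > threshold) for a, b in zip(timestamps, timestamps[1:])]
    return list(accumulate(breaks))


timestamps = [0, 5, 8, 100, 103, 500]
results = {name: func(timestamps, threshold=30) for name, func in VARIANTS.items()}
```

Checking that all variants return the same answer before timing them keeps the benchmark from comparing implementations that do different work.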

koaning · 2021-11-28T10:47:11Z

Come to think of it, do we really want to download large datasets and run potentially long-running benchmarks in GitHub Actions?

ritchie46 · 2021-11-28T12:13:49Z

We could create a CI job that only runs on manual triggers. In the polars repo, we generate the datasets instead of downloading them.

The VMs are shared, but I do think that within a pipeline we have the same compute (not really sure though). This would still make relative comparisons sensible within one run.
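
A rough sketch of what "relative comparisons within one run" could mean in practice: time every candidate in the same process and report each time as a multiple of the fastest one. The helper names (`best_time`, `relative_timings`) and the stand-in workloads are assumptions, not code from this repo.

```python
import time


def best_time(func, repeats=5):
    """Return the fastest wall-clock time of several calls, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def relative_timings(candidates, repeats=5):
    """Map each candidate name to its time divided by the fastest candidate's time."""
    absolute = {name: best_time(func, repeats) for name, func in candidates.items()}
    fastest = min(absolute.values())
    return {name: seconds / fastest for name, seconds in absolute.items()}


# Stand-in workloads; real candidates would be each tool's version of the same task.
candidates = {
    "sum_generator": lambda: sum(range(200_000)),
    "sum_list": lambda: sum([i for i in range(200_000)]),
}
ratios = relative_timings(candidates)
```

Ratios only stay meaningful if all candidates really do run on the same machine during the run, which is the part nobody in this thread is sure about yet.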

koaning · 2021-11-30T10:30:01Z

Fair enough. Let's try and start with GitHub CI just to keep things simple.

Where would you want to store the data from the benchmark results? Do we want to store the results of the runs in git?

ritchie46 · 2021-11-30T18:43:39Z

Hmm.. that's maybe a good idea yes. We could store it in a separate clean branch. The whole benchmarking is a large todo still.
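
One possible shape for results that live in git, sketched under assumptions: an append-only JSON Lines file with one record per measurement, which diffs cleanly between commits. The file name, the record fields, and the `tool_a`/`tool_b` values are all placeholders, not real measurements.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_result(path, tool, task, seconds, commit="unknown"):
    """Append one benchmark record as a single JSON line and return it."""
    record = {
        "tool": tool,
        "task": task,
        "seconds": seconds,
        "commit": commit,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_results(path):
    """Read all records back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Placeholder values only; a temporary directory stands in for the results branch.
results_file = Path(tempfile.mkdtemp()) / "results.jsonl"
append_result(results_file, tool="tool_a", task="sessionize", seconds=1.0)
append_result(results_file, tool="tool_b", task="sessionize", seconds=2.0)
rows = load_results(results_file)
```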

I also want to run TPC-H benchmarks in the polars repo, which would need dedicated compute. I can imagine eventually setting up a database etc.

koaning · 2021-11-30T21:08:17Z

I just got a base thing goin' on my local multiple dispatch branch.

What does TPC-H stand for?

Also, simulating some of these datasets is tricky. How might we properly simulate a session dataset?
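
One way this might be approached, as a sketch only: generate events per user where gaps inside a session are always shorter than a threshold and gaps between sessions are always longer, so the true session id is known and any sessionize implementation can be checked against it. The function name, the gap distributions, and the default numbers are assumptions chosen for illustration.

```python
import random


def simulate_sessions(n_users=3, sessions_per_user=2, events_per_session=4,
                      threshold=1800.0, seed=0):
    """Return rows of (user_id, timestamp_seconds, true_session_id), seeded for reproducibility."""
    rng = random.Random(seed)
    rows = []
    for user in range(n_users):
        ts = rng.uniform(0, 86_400)  # random start within one day
        for session in range(sessions_per_user):
            for _ in range(events_per_session):
                rows.append((user, ts, session))
                # Gap inside a session: at most half the threshold.
                ts += rng.uniform(1.0, threshold / 2)
            # Gap between sessions: always longer than the threshold.
            ts += threshold + rng.expovariate(1 / 3600.0)
    return rows


rows = simulate_sessions()
```

Because the generator is seeded, every CI run can rebuild the same dataset instead of downloading it. How realistic the gap distributions need to be is an open question.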

koaning · 2021-11-30T21:12:15Z

I was thinking about building a memo script, but I'm open to other ideas too. It kind of depends on how accurate you'd like these numbers to be. There's also stuff like measuring Parquet vs. CSV and/or the number of CPUs.
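
Whatever script ends up driving the runs, the settings mentioned above form a grid. A minimal standard-library sketch of enumerating it, where the `settings_grid` helper and the option names `file_format` and `n_cpus` are hypothetical:

```python
from itertools import product


def settings_grid(**options):
    """Return one dict per combination of the given option values."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


# Hypothetical option names; each combination would be one benchmark run.
grid = settings_grid(file_format=["csv", "parquet"], n_cpus=[1, 2, 4])
```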
