Polars build times #847

Closed
sophiajt opened this issue Jun 21, 2021 · 4 comments

Comments

@sophiajt

Are you using Python or Rust?

Rust

Which feature gates did you use?

Default and also ["serde", "rows"]

What version of polars are you using?

git = "https://github.com/pola-rs/polars"
rev = "f60d86bc0921bd42635e8a33e7aad28ebe62dc3e"
version = "0.14.2"

What operating system are you using polars on?

Linux

Describe your bug.

Build times both standalone and as part of Nushell are quite high. I did a cargo bloat run of a default build of polars: https://gist.github.com/jonathandturner/82cb8304996cc6c9ebc912b193eacce7.

When we enable dataframe support in Nushell, which is built on polars and arrow, release build times increase 3x. On my machine, build times go from 10 minutes without dataframe support to 30 minutes with it.

I'm hoping we can work together to figure out how we can improve build times.

What are the steps to reproduce the behavior?

In polars:
cargo bloat

In Nushell:
cargo build --release --all --features=extra,dataframe

What is the expected behavior?

While we know that polars will add some build time, we're seeing heavy memory usage during builds and hope we can work together to reduce it. In theory, lower memory usage should cause less memory thrashing during the build, yielding faster build times.

cc @elferherrera

@ritchie46
Member

> I'm hoping we can work together to figure out how we can improve build times.

Me too! My two cents:

The Series has no generic type information. It's a trait object that's implemented for all possible polars data types. This means that, unlike generic code, you also compile what you don't use.
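To illustrate the difference, here is a simplified sketch. The names SeriesTrait, DType and make_series are hypothetical and are not the actual polars API. A generic function is only compiled for the concrete types it is called with (monomorphization), whereas a constructor that picks the concrete column type from a runtime DType value forces every implementation behind the trait object to be compiled.

```rust
use std::ops::Add;

/// Generic code: machine code is only emitted for the `T`s this
/// function is actually instantiated with.
fn total<T: Copy + Default + Add<Output = T>>(xs: &[T]) -> T {
    xs.iter().fold(T::default(), |acc, &x| acc + x)
}

/// Hypothetical type-erased column; callers only ever see `dyn SeriesTrait`.
trait SeriesTrait {
    fn len(&self) -> usize;
    fn sum_f64(&self) -> f64;
}

struct Int32Column(Vec<i32>);
struct Float64Column(Vec<f64>);
struct BoolColumn(Vec<bool>);

impl SeriesTrait for Int32Column {
    fn len(&self) -> usize { self.0.len() }
    fn sum_f64(&self) -> f64 { self.0.iter().map(|&x| x as f64).sum() }
}

impl SeriesTrait for Float64Column {
    fn len(&self) -> usize { self.0.len() }
    fn sum_f64(&self) -> f64 { self.0.iter().sum() }
}

impl SeriesTrait for BoolColumn {
    fn len(&self) -> usize { self.0.len() }
    // Counts `true` values as 1.0.
    fn sum_f64(&self) -> f64 { self.0.iter().filter(|&&b| b).count() as f64 }
}

/// Hypothetical runtime data type tag.
#[allow(dead_code)]
enum DType {
    Int32,
    Float64,
    Bool,
}

/// In a library, `dtype` comes from the caller at runtime, so the compiler
/// cannot know which arm is taken: every arm, and every impl and vtable it
/// references, has to be compiled, even if a program never builds a Bool column.
fn make_series(dtype: DType, raw: &[f64]) -> Box<dyn SeriesTrait> {
    match dtype {
        DType::Int32 => Box::new(Int32Column(raw.iter().map(|&x| x as i32).collect())),
        DType::Float64 => Box::new(Float64Column(raw.to_vec())),
        DType::Bool => Box::new(BoolColumn(raw.iter().map(|&x| x != 0.0).collect())),
    }
}

fn main() {
    // Only `total::<i64>` is instantiated here.
    println!("{}", total(&[1i64, 2, 3]));
    let s = make_series(DType::Float64, &[1.5, 2.5]);
    println!("{} {}", s.len(), s.sum_f64());
}
```

The trade-off is that the trait object keeps the public type simple (one Series type instead of a Series<T> per dtype), at the cost of compiling every supported dtype.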

To mitigate compile times, I am actively introducing feature gates, both for the data types and for the operations that should be compiled.
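A minimal sketch of how a data type can be gated behind a Cargo feature, assuming a hypothetical feature named dtype-bool (the names here are illustrative, not the actual polars feature list or internals). For the feature to be enabled with --features, it would also have to be declared in the crate's [features] table in Cargo.toml. When the feature is off, the gated struct, impl, enum variant and match arm are stripped before type checking, so the compiler does no work for them.

```rust
trait SeriesTrait {
    fn len(&self) -> usize;
}

// Always compiled.
struct Float64Column(Vec<f64>);

impl SeriesTrait for Float64Column {
    fn len(&self) -> usize { self.0.len() }
}

// Only compiled when the hypothetical `dtype-bool` feature is enabled.
#[cfg(feature = "dtype-bool")]
struct BoolColumn(Vec<bool>);

#[cfg(feature = "dtype-bool")]
impl SeriesTrait for BoolColumn {
    fn len(&self) -> usize { self.0.len() }
}

enum DType {
    Float64,
    #[cfg(feature = "dtype-bool")]
    Bool,
}

fn make_series(dtype: DType, raw: &[f64]) -> Box<dyn SeriesTrait> {
    match dtype {
        DType::Float64 => Box::new(Float64Column(raw.to_vec())),
        // This arm disappears entirely when the feature is off.
        #[cfg(feature = "dtype-bool")]
        DType::Bool => Box::new(BoolColumn(raw.iter().map(|&x| x != 0.0).collect())),
    }
}

/// Reports which optional data types were compiled into this build.
fn compiled_dtypes() -> Vec<&'static str> {
    let mut dtypes = vec!["f64"];
    if cfg!(feature = "dtype-bool") {
        dtypes.push("bool");
    }
    dtypes
}

fn main() {
    let s = make_series(DType::Float64, &[1.0, 2.0, 3.0]);
    println!("len = {}, dtypes = {:?}", s.len(), compiled_dtypes());
}
```

Downstream crates would then opt in from their own Cargo.toml, the same way this issue enables ["serde", "rows"]; anything left disabled adds no compile time.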

> On my machine, the build times go from 10mins pre-dataframe to 30mins with dataframe

I am curious how this differs so much between machines. I can compile py-polars with all features in about 6-7 minutes if I use all virtual cores (12) and lld linker.

Anyway, I will try to do a bloat scan in the coming weeks and see if I can find some low-hanging fruit.

@ritchie46
Member

> I am curious how this differs so much between machines. I can compile py-polars with all features in about 6-7 minutes if I use all virtual cores (12) and lld linker.

For context: the compile times mentioned above are with fat LTO and single-threaded compilation. That explains the difference between our compile times.

https://github.com/nushell/nushell/blob/55cab9eb4ff4ee3ee73efc6f8973901b2a91c921/Cargo.toml#L162

algitbot pushed a commit to alpinelinux/aports that referenced this issue May 7, 2022
The dataframe feature depends on polars, which consumes a lot of memory
and time to build. Enabling both LTO and dataframe feature may cause CI
build to fail due to out of memory.

pola-rs/polars#847

Signed-off-by: nibon7 <nibon7@163.com>
@ghuls
Collaborator

ghuls commented Oct 21, 2022

Build times improved recently due to compiler improvements in Rust nightly. Compile time for optimized Polars (Python) went down from 30 minutes to 15 minutes.

@ritchie46
Member

Yeah, we can close this. It is something we constantly evaluate.
