
Is sccache applicable/useful for evcxr due to the cdylib crate type? #184

Open
eldipa opened this issue Jun 25, 2021 · 10 comments


eldipa commented Jun 25, 2021

Summary

sccache doesn't support cdylib crate types so evcxr cannot get any speed up from using it.

If that is correct,

  • would it be useful to document this sccache limitation in evcxr's documentation?
  • would it make sense to implement a simple cache in evcxr, such as caching the generated .so files? (I would like to hear comments about this)

Long story

I installed sccache and, while evcxr seems to send requests to it, sccache keeps saying that it cannot cache them.

After some trial-and-error experimentation, I found that changing the crate type generated by evcxr from cdylib to rlib and disabling incremental building are required for sccache to start working (though I'm not claiming this would be of any use for evcxr).

A comment in the sccache code suggests that cdylib is not supported at all:
https://github.com/mozilla/sccache/blob/3f318a8675e4c3de4f5e8ab2d086189f2ae5f5cf/src/compiler/rust.rs#L1057-L1058

@davidlattimore
Collaborator

The cdylib is the final target that gets built, but I would assume rlibs get produced for all the crates that get linked together to produce that cdylib. I just did a test compiling the regex crate and did see a speedup on subsequent runs.

My methodology was as follows:

  • Run evcxr
  • Enable timing with :timing
  • Run a simple println line just to make sure that evcxr has finished starting up.
  • Run :dep regex = "1.5.4"

I did this 3 times with sccache disabled and 3 times with it enabled and discarded the first run of each.

With sccache disabled, I got 12128ms and 12169ms.

With sccache enabled, I got 3474ms and 4139ms.


bjorn3 commented Jun 26, 2021

One thing you could try would be to make a dylib containing all dependencies. You can then depend on this dylib from all compiled user code dylibs. This should save linking time and memory.
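As a rough sketch of this idea (the crate name `evcxr_deps` and the dependency list are purely illustrative, not anything evcxr actually generates), the shared dependency dylib could be declared like this:

```toml
# Hypothetical Cargo.toml for a crate that bundles a session's
# dependencies into one dylib; each user-code dylib would then link
# against it instead of re-linking every dependency on every run.
[package]
name = "evcxr_deps"
version = "0.1.0"
edition = "2018"

[lib]
crate-type = ["dylib"]

[dependencies]
regex = "1.5.4"
```

Its lib.rs would just re-export the dependencies (e.g. `pub use regex;`) so that rustc actually pulls them into the dylib, per the note below about unreferenced dependencies.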


bjorn3 commented Jun 26, 2021

Note that dependencies are not loaded by rustc unless you reference them. use dep_name; is enough.


eldipa commented Jun 26, 2021

@davidlattimore I did the same experiment that you did and you are right. I got between 52 and 59 secs without the cache and around 9 secs with the cache enabled.

So sccache really speeds up the execution but only for the :dep instructions and not for the snippets coded by the user.

Let me explain what I tested:

  • Start evcxr
  • Enable :timing
  • In a shell run sccache -s and take note of the Compile requests, Cache hits and Non-cacheable calls
  • Execute 1 + 2 or other simple Rust expression
  • Run sccache -s again

I repeated the test a few times with sccache enabled, and in all cases sccache reported an increasing number of Compile requests (so evcxr is using it), but the Cache hits remained constant while the Non-cacheable calls increased as well.

From that I deduce that sccache is not able to cache what the user types in evcxr (the 1+2 in this case), and my understanding is that evcxr compiles the user code as a cdylib, which is not supported by sccache.

And the 1+2 executes in about 2 secs (on my machine), which is quite slow, so it is a shame not to speed that up too.

My intuition says that most of the time is spent in evcxr writing the Rust snippet to disk (to :last_compile_dir) and compiling it with cargo. I assume that loading the dynamic library and calling it is negligible.

I think that evcxr users would benefit if evcxr could cache the dynamic library, avoiding both the write and the compilation phase.

In particular:

  • people using jupyter who run the same cells over and over
  • people using byexample to run regression tests over their documentation (very similar to Rust's doctests)

I could do a proof of concept and get concrete numbers; if they are good, would the project be willing to accept such a feature (caching dynamic libraries)?
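As a minimal sketch of what such a cache might look like (this assumes nothing about evcxr's internals; `cache_key`, `cached_lib_path` and the cache directory layout are invented for illustration), the compiled .so could be keyed by a hash of the generated source:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

// Hash the generated source to use as a cache key. DefaultHasher is
// only stable within one process, so a real cache would want a stable
// hash, and should also key on the compiler version and dependency set.
fn cache_key(source: &str) -> u64 {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);
    h.finish()
}

fn cached_lib_path(cache_dir: &Path, source: &str) -> PathBuf {
    cache_dir.join(format!("{:016x}.so", cache_key(source)))
}

fn main() {
    let dir = std::env::temp_dir().join("evcxr-lib-cache");
    fs::create_dir_all(&dir).unwrap();
    let snippet = "1 + 2";
    let path = cached_lib_path(&dir, snippet);
    if path.exists() {
        // Cache hit: load the .so directly, skipping write + compile.
        println!("hit: {}", path.display());
    } else {
        // Cache miss: compile as usual (elided in this sketch), then
        // store the resulting artifact under the key.
        fs::write(&path, b"compiled artifact would go here").unwrap();
        println!("miss, stored: {}", path.display());
    }
}
```

Identical snippets map to the same path, so repeated cells (the jupyter and byexample cases above) would hit the cache after the first compilation.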


bjorn3 commented Jun 26, 2021

> And the 1+2 executes in about 2 secs (on my machine), which is quite slow, so it is a shame not to speed that up too.

Does using lld as linker help?


eldipa commented Jun 26, 2021

I don't know, but I would guess that writing to disk plus running the compilation (cargo, rustc and all of that) takes more time than linking.

Having said that, lld could improve the runtime if the dynamic libraries were cached beforehand (so you don't pay for the write+compile).

But I'm just speculating here.


bjorn3 commented Jun 26, 2021

For me linking takes more than half the time of compiling hello world with rustc. Same for compiling a dynamic library with -Cprefer-dynamic like evcxr uses.


eldipa commented Jun 26, 2021

@bjorn3 I did a quick test and I'm seeing the same results. Using lld as the linker didn't show a significant difference.

I executed 1 + 2 several times with both linkers: the slowest result with the system linker was 2149 ms and the fastest result with lld was 1965 ms. That is a difference of 184 ms, a reduction of approx 8.6% (best case).
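For anyone wanting to reproduce this, one common way to select lld (assuming lld is installed and rustc is using a gcc/clang linker driver; the target triple below is just an example) is through Cargo's configuration:

```toml
# .cargo/config.toml — ask the linker driver to use lld
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```

The same flags can be passed ad hoc via the RUSTFLAGS environment variable instead of a config file.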

@davidlattimore
Collaborator

I'm happy to accept a PR, if this is something you'd like to work on.

Probably any such cache should be behind a config option that's off by default: otherwise the disk usage might surprise people, and since running the same commands repeatedly is unlikely to be a super-common use-case, people can turn it on if they find they need it.


eldipa commented Jun 27, 2021

Yes, I agree that this should be optional and turned off by default.

I will code a proof of concept in the next few days and see how far we can go.

TL;DR

My use-case is to use evcxr to execute snippets of Rust code in a Markdown file.

These Rust examples contain not only the code but also the expected output. evcxr executes them one by one and byexample compares each obtained output with the expected one.

Basically it turns documentation into a regression test suite, and in this scenario executing the same Rust snippet repeatedly is the most common case.
