
Speed feels slow #52

Closed · briantliao opened this issue Aug 9, 2019 · 30 comments

@briantliao

Compared to a Python notebook, the speed feels slow. With :timing on, each block execution takes 5 to 10 seconds even for simple code, and extern crate is really slow.
I was hoping for notebook-driven development like you can do in Python, but the speed doesn't seem to allow it.

@davidlattimore
Collaborator

Thanks for the feedback. You're right, the speed is not ideal. I doubt it can ever match Python without many person-years of work improving compilation speeds in rustc, but it should be better than it currently is. I think compilation speed regressed somewhat with the change that made evcxr work on Mac and Windows: on Linux I used to see timings of ~600ms, but now I'm seeing around 2 seconds on the same machine. I've got some ideas for changes that might improve the speed, hopefully on all platforms, not just Linux.

Out of curiosity, what platform are you on? I've had another report of ~10 second times on Windows, so I'm wondering if something odd is happening there. Unfortunately I don't have Windows to test on.

@davidlattimore
Collaborator

Oh, actually, one big change since it used to take ~600ms is that optimization got turned on by default. If you turn it off with :opt, compilation gets a bit faster. Code will then obviously run slower, but if you're not doing significant computation that may not matter.
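
For reference, the relevant commands look like this (levels as used later in this thread):

:opt 0    (optimization off: faster compiles, slower code)
:opt 2    (optimization on: slower compiles, faster code)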

@briantliao
Author

[screenshot: :timing output showing ~1500 ms for a trivial cell]

I am running on a Mac. 1500 ms seems slow for just let x = 5; is that expected?

Another question: is there a way to avoid downloading extern crates each time Jupyter is run, something like cargo's caching? I did see :dep, but I don't think that's what I want.

Thanks!

@davidlattimore
Collaborator

There's quite a lot of fixed overhead, so it probably doesn't make much difference whether you're doing let a = 5 or writing 20 lines of code. 1500ms is certainly slow, but not entirely unexpected. I'm hopeful I can get it a little faster with the changes I'm planning, but we'll see how we go.

Regarding downloading of crates, extern crate and :dep do more or less the same thing. In both cases it's cargo that does the downloading. Cargo caches the downloads, so it shouldn't download a crate again if you've already downloaded it. It will, however, have to recompile it after you've restarted the Jupyter kernel. One workaround is to set the environment variable EVCXR_TMPDIR to some fixed directory; evcxr will then reuse the same cargo target directory each time you run.
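
For example, from a shell (the cache path here is just an illustration, any fixed absolute path works):

EVCXR_TMPDIR=$HOME/.cache/evcxr evcxr

The same variable applies to the Jupyter kernel if it's exported before the notebook server starts.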

@axil

axil commented Aug 13, 2019

Yes, I can confirm that :opt matters! This is on Ubuntu:

Optimization: 1
[screenshot: cell timings with optimization on]

Optimization: 0
[screenshot: noticeably faster cell timings with optimization off]

A huge leap forward! :)

But can you recall a revision or tag which had ~600ms?

On the same machine,

hi.rs:
fn main(){
    println!("{}", 2+2);
}

$ time cargo script hi.rs
4
cargo script hi.rs  0.26s user 0.16s system 102% cpu 0.417 total

@axil

axil commented Aug 15, 2019

Finally got it working on Windows - it is a bit worse there:
[screenshot: Windows cell timings]
(Optimization=0)

@axil

axil commented Aug 15, 2019

I've written line/cell magics for Rust which run at full available speed. They don't store state; each cell runs independently.

@davidlattimore
Collaborator

Nice work :)

On the evcxr front, I've pushed a few commits that might help speed a bit. For one, I've disabled optimization by default. I've also changed the execution model. Currently, for running things like println, I'm getting times of about ~415ms, whereas the current release (0.3.5) is more like ~530ms. So probably about 100ms faster.

Note that 2+2 will be slightly slower because evcxr needs to make two attempts at compiling in order to determine how to display the result. The first attempt tries to call evcxr_display on the result; when that fails, it falls back to the Display trait to format the value as plain text. That said, even 2+2 only takes 500-600ms on my laptop.
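
As an aside, the evcxr_display hook looks roughly like this. This is a minimal sketch assuming the kernel's EVCXR_BEGIN_CONTENT / EVCXR_END_CONTENT marker convention; Fraction is just an illustrative type:

// evcxr looks for a method named evcxr_display on the final expression's value.
pub struct Fraction { num: i32, den: i32 }

impl Fraction {
    pub fn evcxr_display(&self) {
        // Everything printed between the markers is handed to Jupyter
        // as the declared MIME type (here, HTML).
        println!(
            "EVCXR_BEGIN_CONTENT text/html\n<sup>{}</sup>&frasl;<sub>{}</sub>\nEVCXR_END_CONTENT",
            self.num, self.den
        );
    }
}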

It's also worth noting that the first thing run tends to be a little slower than subsequent lines. All my times are taken after running a few similar but slightly different bits of code, e.g. println with different strings, or adding different numbers together.

I haven't pushed a new version to crates.io yet. Will wait a few days and see if I decide to make any more changes first.

@axil

axil commented Aug 20, 2019

Thanks :)
I'm most certainly doing something wrong, but:
[screenshot: unexpectedly slow timings]
How do I debug that?

@axil

axil commented Aug 20, 2019

I'd expect timings such as this:
[screenshot: a 604 ms cell timing]

@davidlattimore
Collaborator

I've just released 0.4.0, so you could try installing that. It'll probably be slightly easier to debug if nothing else. Previous versions wrote a separate crate for each bit of code, with each crate depending on the previous crates. With 0.4.0, it now writes a single crate and just updates it with each bit of new code.

If you do :last_compile_dir, it'll print the directory in which compilation is happening. Then you can try building there yourself. The command evcxr runs to compile the crate is something like:
cargo rustc -- -C prefer-dynamic

If you run println!("foo=123") then you can do something like the following to test compilation speeds repeatedly:

perl -pi -e "s/foo=\\d+/foo=$RANDOM/" src/lib.rs; time cargo rustc -- -C prefer-dynamic

That way the source won't be identical to the last time it compiled, forcing the compiler to do some work.

If compilation is slow there, then you could try editing Cargo.toml or adjusting cargo options to see if any make a big difference. If you have the nightly compiler installed, you can try running:

perl -pi -e "s/foo=\\d+/foo=$RANDOM/" src/lib.rs; time cargo +nightly rustc -- -C prefer-dynamic -Ztime-passes

That will tell you how long it's spending in different phases of the compilation. Again, it's probably good to run it a couple of times to see the results with incremental compilation after a small edit.

I'll be very interested in what you find.

@axil

axil commented Aug 20, 2019

Yes, v0.4.0 feels faster (on the same Ubuntu machine):
[screenshot: improved timings]

On Windows something seems to have broken:
[screenshot: compile errors in the cell output]

The error messages in the latter cell only appear in :timing mode.

@axil

axil commented Aug 20, 2019

Here's what I get when I run "cargo rustc -- -C prefer-dynamic" in the last_compile_dir on Windows:

C:\Users\ASUS\AppData\Local\Temp\.tmp794iwy>cargo rustc -- -C prefer-dynamic
   Compiling ctx v1.0.0 (C:\Users\ASUS\AppData\Local\Temp\.tmp794iwy)
warning: variable does not need to be mutable
   --> src\lib.rs:117:5
    |
117 | let mut evcxr_variable_store = evcxr_internal_runtime::VariableStore::new();
    |     ----^^^^^^^^^^^^^^^^^^^^
    |     |
    |     help: remove this `mut`
    |
    = note: #[warn(unused_mut)] on by default

warning: variable does not need to be mutable
   --> src\lib.rs:113:6
    |
113 | {let mut vars_ok = true;
    |      ----^^^^^^^
    |      |
    |      help: remove this `mut`

    Finished dev [unoptimized] target(s) in 4.00s

I also found that deleting the 'target' dir is sufficient to force the compiler to recompile the project.

@axil

axil commented Aug 20, 2019

Added an empty main function to lib.rs and compiled it with 'rustc lib.rs': fast.

Fixed the mut warnings and removed all sections from Cargo.toml except [package]: still slow:

   Compiling ctx v1.0.0 (C:\Users\ASUS\AppData\Local\Temp\.tmp794iwy)
    Finished dev [unoptimized + debuginfo] target(s) in 4.30s

@davidlattimore
Collaborator

0.4.1 fixed the unused mut and dead code warnings (by ignoring them).

It's true that deleting the target dir forces a recompile; however, it forces a full recompile, which is not what evcxr does with each bit of code you run. Only the first bit of code evaluated triggers a full compile; the rest are incremental. That's why making a small edit to the code is more representative of what actually happens.

So compiling the crate takes 4.3s, but rust_magic, which is presumably also compiling a crate, takes 0.6s. Something odd is happening. If you create a new Rust crate with cargo new and compile that, how long does it take?

@axil

axil commented Aug 22, 2019

Testing v0.4.2:
[screenshot: v0.4.2 cell timings]
There's still a noticeable time difference against the 'cargo script' approach I used in 'rust_magic':
[screenshot: faster 'cargo script' timings]

This is on Debian. I ran the benchmarks several times; the timings are roughly the same across consecutive runs.

@davidlattimore
Collaborator

Thanks, that's helpful! I've managed to track it down to the call to std::panic::catch_unwind. If I don't emit that call, compilation speeds up a good bit. I'm going to make it so that if :preserve_vars_on_panic is turned off, evcxr doesn't emit the catch_unwind. Given the difference it makes, I'll probably also change preserve_vars_on_panic to default to false.
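
For reference, that setting is toggled like any other evcxr command:

:preserve_vars_on_panic 0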

Interestingly with catch_unwind gone, opt level 0 vs 2 appears to make no difference, so I'm thinking of turning optimisation back on by default.

In my local timings, evcxr is still a tiny bit slower. That appears to be due to the code for loading and storing variables. I'm going to change it so that that code is only emitted when you actually have stored variables, so that a simple print or eval doesn't pay the cost.

@davidB
Contributor

davidB commented Aug 23, 2019

Another idea to explore: avoid invoking the EvalContext when running a command (like :dep, ...) and pay the price at the next cell containing a statement/expression. That way the :dep call itself would take ~zero time.

@davidlattimore
Collaborator

My main concern with doing that would be if the user makes a mistake in a :dep instruction. They wouldn't get immediate feedback that it was wrong, and worse, once the :dep is added there's currently no way to remove it, so every compile after that would fail. But it's a good point that having to wait separately for two relatively long compiles is annoying. It probably wouldn't take much to allow multiple :dep commands together with code to be executed; then you could put it all in one cell.

@davidlattimore
Collaborator

I've pushed 0.4.4, which allows putting one or more :dep commands at the start of a block of code. It also adds support for using sccache.
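
For example, a single cell can now contain a :dep followed by code that uses it (regex here is just an illustrative crate, any dependency works):

:dep regex = "1.0"
let re = regex::Regex::new(r"^\d{4}$").unwrap();
re.is_match("2019")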

@davidB
Contributor

davidB commented Aug 25, 2019 via email

@davidlattimore
Collaborator

Huh, I wonder if I just duplicated functionality...

@davidlattimore
Collaborator

You're quite correct. Sorry, I forgot about that. Here's what happened: I went to add support for init.evcxr and thought that CommandContext::execute supported mixed commands and code, but then found that it didn't. When I went to implement it, I didn't think to actually try it in the Jupyter kernel first; I just started by writing a test for mixed commands and code in CommandContext::execute, made that test pass, then confirmed that it worked in the Jupyter kernel.

So I guess at the moment my splitting code isn't getting run, because the Jupyter kernel pre-splits it.

@davidlattimore
Collaborator

I finally got around to trying lld as the linker with evcxr. I'd tried once before but ran into problems; this time it worked. It does appear to make a difference, although how much depends on things like the optimization level and what crates (if any) you're using.

Ran with:

RUSTFLAGS="-C link-arg=-fuse-ld=lld" evcxr

With no crates, compilation is already relatively fast, although lld did make it marginally faster. More interesting was pulling in the regex crate with optimization disabled (:opt 0). The following line took 978ms with the regular system linker and only 417ms with lld:

let re : regex::Regex = regex::Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();

At :opt 1 the same line went from 1328ms with the system linker to 862ms with lld.

At :opt 2 it went from 1401ms with the system linker to 1176ms with lld.

It might be worthwhile adding built-in support to make this easier to enable without needing to set the environment variable.
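
In the meantime, an alternative to setting RUSTFLAGS per invocation is a cargo config file. A sketch, assuming a Linux target (adjust the target triple to match your platform), in ~/.cargo/config:

[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]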

@alsuren

alsuren commented Aug 2, 2020

Experience report:

I'm currently on a mission to create a version of your mybinder configs that pre-caches a reasonably sized dependency (I went for arrow). I think this is a reasonable thing to optimise because it affects people's first impressions.

I'm tracking my progress on this branch: master...alsuren:nightly-2020-07, which can be built at https://mybinder.org/v2/gh/alsuren/evcxr/nightly-2020-07

Currently I feel like I'm going backwards, but I will report the numbers anyway:

On the host (macOS), straight after deleting EVCXR_TMPDIR, :dep arrow = "1.0" takes 34 seconds, then 2 seconds if I re-run it in the same evcxr instance:

alsuren@Davids-MacBook-Pro:~/src/evcxr$ time echo ':timing
:dep arrow = "1.0"
:dep arrow = "1.0"' | EVCXR_TMPDIR=$HOME/tmp/evcxr_build evcxr
Welcome to evcxr. For help, type :help
Timing: true

Took 205ms
Took 34086ms
Took 2365ms

real	0m37.241s
user	0m51.485s
sys	0m6.967s

If I kill the evcxr instance and rerun the test with EVCXR_TMPDIR prepopulated, it takes 5 seconds then 2 seconds. This feels like a promising start:

alsuren@Davids-MacBook-Pro:~/src/evcxr$ time echo ':timing
:dep arrow = "1.0"                                            
:dep arrow = "1.0"' | EVCXR_TMPDIR=$HOME/tmp/evcxr_build evcxr
Welcome to evcxr. For help, type :help
Timing: true

Took 323ms
Took 4985ms
Took 2025ms

real	0m9.700s
user	0m0.889s
sys	0m1.063s

If I do jupyter-repo2docker . and then docker exec -it $(docker ps --last 1 --quiet) bash, I end up in a shell that should be able to do the same thing (I made the postBuild script set EVCXR_TMPDIR and pre-run :dep arrow = "1.0").

This now takes 46 seconds followed by 50 seconds:

alsuren@71e8bb11a621:~$ time echo ':timing
> :dep arrow = "1.0"                                            
> :dep arrow = "1.0"' | evcxr
Welcome to evcxr. For help, type :help
Timing: true

Took 121ms
Took 46467ms
Took 50557ms

real	1m37.874s
user	4m57.773s
sys	0m13.092s

I'm not sure why this is so slow, but my money is on the filesystem being slow (docker is using overlayfs for /). This is still quicker than with EVCXR_TMPDIR unset (3m44s followed by 44s):

alsuren@71e8bb11a621:~$ time echo ':timing
> :dep arrow = "1.0"
> :dep arrow = "1.0"' | env -u EVCXR_TMPDIR evcxr
Welcome to evcxr. For help, type :help
Timing: true

Took 125ms
Took 224473ms
Took 43732ms

real	4m28.954s
user	16m3.864s
sys	0m48.214s

Note that my docker image has sccache: true and linker: lld.

I'm wondering if there's any way to set EVCXR_TMPDIR to point at a sensible filesystem and then copy the cache from the dockerfile. Does anyone else have experience with evcxr in docker?

@bjorn3

bjorn3 commented Aug 2, 2020

I am curious how much difference -Cprefer-dynamic makes. That dynamically links libstd instead of linking it statically.

@davidlattimore
Collaborator

Evcxr already passes -Cprefer-dynamic to rustc. I'm afraid I haven't tried evcxr in docker, so probably can't comment. Good luck though.

@MartinKavik

> Regarding downloading of crates, extern crate and :dep do more or less the same thing. In both cases it's cargo that does the downloading. Cargo caches the downloads, so it shouldn't download a crate again if you've already downloaded it. It will, however, have to recompile it after you've restarted the Jupyter kernel. One workaround is to set the environment variable EVCXR_TMPDIR to some fixed directory; evcxr will then reuse the same cargo target directory each time you run.

Hi, I'm writing an open-source project and want to use Evcxr for both the REPL and Jupyter Notebook. The project has one large crate whose build time with :dep is over 2 minutes even with :opt 0, and users need the library in almost all cells / REPL commands, so it becomes quite painful when the kernel / REPL has to be restarted often. Hence a few questions, please:

  1. I'm not sure how to set env variables when I use Jupyter inside VS Code (a native extension) - I hoped a cell with
    std::env::set_var("EVCXR_TMPDIR", "./jupyter_test_target");
    would work, but it doesn't. Is there something like :env to set such things?
  2. An attempt to run REPL with EVCXR_TMPDIR="./jupyter_test_target" evcxr fails because it needs an absolute path. What's the recommended way to store built/cached/target code in the (sub)folder where the notebook or REPL has been started?
  3. Can I somehow pre-compile that large library/dependency? Then, the app/notebook/repl users would be able to automatically download it instead of downloading & rebuilding.

Thank you

@bjorn3

bjorn3 commented Jan 26, 2022

> I'm not sure how to set env variables when I use Jupyter inside VS Code (a native extension) - I hoped a cell with

EVCXR_TMPDIR is read before any code executes, I think, so setting it from inside a cell is too late.

> An attempt to run REPL with EVCXR_TMPDIR="./jupyter_test_target" evcxr fails because it needs an absolute path. What's the recommended way to store built/cached/target code in the (sub)folder where the notebook or REPL has been started?

You could use EVCXR_TMPDIR="$(pwd)/jupyter_test_target" evcxr I think.

> Can I somehow pre-compile that large library/dependency? Then, the app/notebook/repl users would be able to automatically download it instead of downloading & rebuilding.

Rustc requires that the compiler version used to compile dependencies matches the current version. In addition, convincing cargo not to rebuild even with a matching compiler version may be hard: at the very least you have to restore the mtimes of all files in the target directory, and possibly of the dependency source files in ~/.cargo/registry/src.
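
One possible workaround for the mtime problem (a sketch; the paths and file name are illustrative, and the rustc version and absolute paths still have to match): tar preserves file mtimes on extraction, so a prebuilt target directory could be shipped as a tarball:

tar -C "$EVCXR_TMPDIR" -czf evcxr-target.tgz target
# later, on the user's machine, with an identical toolchain:
tar -C "$EVCXR_TMPDIR" -xzf evcxr-target.tgz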

@bjorn3

bjorn3 commented Jan 26, 2022

You may be able to use sccache with cloud storage, though I think that requires giving your users write access to said storage. It also requires the absolute path of the target dir and of all dependency sources (~/.cargo/registry/src) to match.
