[Feature Request]: Persistent caches #4712

montekki · 2020-06-02T19:19:43Z

One of the latest posts states:

One of the current peculiarities of rust-analyzer is that it doesn’t persist caches to disk. Opening project in rust-analyzer means waiting a dozen seconds while we process standard library and dependencies.

And while that may be tolerable when working with IDEs like VS Code that are launched once and used for a long period of time, it is quite painful in other workflows: with editors like vim, where one opens and closes lots of windows that don't share rust-analyzer's state with each other, each new window means sitting and waiting, and for large repos that is quite a long time.

P.S. Sorry if it's a duplicate


bjorn3 · 2020-06-02T19:26:25Z

As far as I understand @matklad doesn't want to do this yet as it would reduce the necessity of optimizing the initial analysis, thus reducing the likelihood that people will work on reducing the initial analysis time. There has been some discussion about the specific use case of closing and re-opening vim often, but so far nothing has changed.

tjkirch · 2020-10-13T21:36:27Z

It could be almost as good if rust-analyzer could be left running and shared between editing sessions.

bjorn3 · 2020-10-14T05:39:02Z

I think that would be something the client needs to do. There have also already been enough complaints about rust-analyzer keeping running after the client exits because of bugs.

tjkirch · 2020-10-14T14:54:44Z

I think that would be something the client needs to do.

Perhaps not necessarily. The current lifetime of the rust-analyzer process is tied to an editing session. Instead, I could imagine the analysis being split off and done in a longer-lived process that the session process communicates with. The longer-lived process would need to handle concurrent access and perhaps purge data after a time, but it would remove the startup cost for later editing sessions and reduce the memory usage for multiple editors.

bjorn3 · 2020-10-14T16:14:34Z

Who would manage that longer-lived process if the client doesn't? If nobody does, it will keep running forever, which is bad. If the language server started by the client does, it would exit as soon as all language servers would exit, which makes it useless for closing and re-opening vim.

flodiebold · 2020-10-14T16:19:03Z

One could imagine a scheme where rust-analyzer checks for a running server and forks and daemonizes if one isn't running yet, maybe shutting itself down automatically if there aren't any clients after a while. flow does something similar, for example. But I feel that's far too much complexity to fix a problem that basically only exists for vim users used to a certain workflow, to be honest.
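
To make the "shut itself down if there aren't any clients after a while" part concrete, here is a minimal sketch of the bookkeeping such a shared server would need. Everything in it (`ClientTracker`, its methods, the timeout) is a hypothetical name invented for illustration; nothing like this exists in rust-analyzer, and the process management itself (fork, daemonize, socket discovery) is left out.

```rust
use std::time::{Duration, Instant};

/// Tracks connected clients for a hypothetical shared server process.
/// None of these names come from rust-analyzer; they are illustrative only.
struct ClientTracker {
    clients: usize,
    /// When the server last became idle; `None` while any client is connected.
    idle_since: Option<Instant>,
    idle_timeout: Duration,
}

impl ClientTracker {
    fn new(idle_timeout: Duration, now: Instant) -> Self {
        // A freshly started server has no clients yet, so the idle clock starts now.
        ClientTracker { clients: 0, idle_since: Some(now), idle_timeout }
    }

    fn client_connected(&mut self) {
        self.clients += 1;
        self.idle_since = None;
    }

    fn client_disconnected(&mut self, now: Instant) {
        self.clients = self.clients.saturating_sub(1);
        if self.clients == 0 {
            // Last client left: start counting idle time from here.
            self.idle_since = Some(now);
        }
    }

    /// True once there have been no clients for at least `idle_timeout`.
    fn should_shut_down(&self, now: Instant) -> bool {
        match self.idle_since {
            Some(since) => now.duration_since(since) >= self.idle_timeout,
            None => false,
        }
    }
}
```

The server's main loop would call `should_shut_down(Instant::now())` periodically. Passing `now` in explicitly keeps the logic testable without sleeping.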

matklad · 2020-10-14T16:40:08Z

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate rust-binary, rust-analyzer-supervisor, which would cache the connection to the ra.

rwols · 2021-06-26T18:38:37Z

Instead of a daemon process, wouldn't it be simpler to cache index results on disk? I know clangd does this.

bjorn3 · 2021-06-26T18:55:25Z

Persistent caches will require changes to salsa to be able to serialize its cache. In addition, it has the disadvantage that it makes optimizing the initial analysis less important, which may over time result in regressions not just of the initial analysis time, but also of the time it takes to process a change.

Salsa's issue for serializable caches: salsa-rs/salsa#10

Also an observation from rust-lang/rfcs#1317 (comment):

My tips:

[...]

Don't store anything to disk. It's likely the oracle can be fast enough without doing this; and unnecessary complexity creates bugs. "Have you tried deleting the .ncb file?" (I remember having to do this a couple times per day when using VS, ca. 2005)

[...]

Note: I am not saying that persistent caches shouldn't ever be implemented. I just think that it shouldn't be implemented yet.

lnicola · 2021-06-29T07:27:23Z

I remember having to do this a couple times per day when using VS, ca. 2005

To be fair, I think they got their stuff together when they moved to a database in... some previous decade.

matklad · 2021-08-16T10:46:19Z

Thinking about this, there seem to be the following options here:

implement on-disk persistence as a memory usage optimization: spill large data to disk
implement on-disk persistence as startup-time optimization: save salsa query graph to disk when exiting, and reuse it upon restart (ideally, validation would be equivalent to calling set and figuring out that nothing has changed. That is, we could use the old crate graph even if we haven't completely validated it)
implement on-disk persistence as a build-system-aware way to use precompiled libraries. That is, in the distributed build scenario, rust-analyzer would ask the build-system to provide precompiled artifacts for dependencies.

The last case is the hardest, and the most interesting one.

It is hard because it makes persistence a public API: on-disk data is no longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

It is the most interesting, because it makes rust-analyzer scale: it becomes possible to distribute the computation of such pre-analyzed libraries across several machines and to put the results into a distributed cache, re-used by many instances of rust-analyzer.

It seems we want the following litmus test for implementing persistence: the on-disk cache can be computed by a different machine (which runs a different OS) and be used locally.

Implementation wise, it's pretty clear that the cache should be computed on per-crate granularity. Some less-obvious questions:

should rust-analyzer use the cache as is (mmap it, basically), or should it parse it into salsa's internal data structures?
should the cache be a separate flavor of input, or a way to cache existing inputs? Would we have code paths like if has_cache { from_cache } else { from_source }, or would that be a unified code path?
can we just store everything in the cache? We can store, e.g., the original source files, which makes the same code-path logic work.
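
As an illustration of the second question, here is a small sketch of what a "separate flavor of input" could look like when the branch is confined to a single loader function, so downstream code never asks `has_cache` itself. All names (`CrateSummary`, `CrateInput`, `load_summary`, `summarize_source`) are invented for this sketch and do not correspond to rust-analyzer or salsa types, and the "analysis" is a toy that only collects function names.

```rust
/// Hypothetical per-crate summary that later analysis would consume.
#[derive(Debug, Clone, PartialEq)]
struct CrateSummary {
    name: String,
    item_names: Vec<String>,
}

/// Where the data for a crate comes from. Both variants are assumptions
/// made for this sketch, not rust-analyzer types.
enum CrateInput {
    Source(String),
    Cached(CrateSummary),
}

/// Single entry point: callers never branch on "has cache" themselves.
fn load_summary(name: &str, input: &CrateInput) -> CrateSummary {
    match input {
        CrateInput::Cached(summary) => summary.clone(),
        CrateInput::Source(text) => summarize_source(name, text),
    }
}

/// Toy "analysis": collect the identifier following `fn ` at the start of a line.
fn summarize_source(name: &str, text: &str) -> CrateSummary {
    let item_names = text
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("fn "))
        .map(|rest| {
            rest.split(|c: char| !(c.is_alphanumeric() || c == '_'))
                .next()
                .unwrap_or("")
                .to_string()
        })
        .filter(|ident| !ident.is_empty())
        .collect();
    CrateSummary { name: name.to_string(), item_names }
}
```

The third question (store the original sources in the cache too) would collapse the two variants into one: the cache would always carry enough to re-run `summarize_source`, so the `Cached` arm becomes an optimization rather than a separate input.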

flodiebold · 2021-08-16T10:56:21Z

I'd note that the file format for case 3 doesn't need to be the same as our internal cache format for cases 1 and 2 -- we could have e.g. rlibs as a possible input while caching the salsa database in a different format.

flodiebold · 2021-08-16T11:17:27Z

should rust-analyzer use the cache as is (mmap it, basically), or should it parse it into salsa's internal data structures?

I think it'd be super interesting to use rkyv and mmap it, but maybe it's overengineering 😅

lnicola · 2021-08-16T11:46:09Z

Some additional questions:

how nested is the data we're now storing in salsa? Table/relational data is easier to work with, but e.g. syntax trees will pose a problem.
how many salsa queries are we doing during a request, as an order of magnitude? Tens of thousands might require some smart caching.
would it be feasible (long-term) to hook into the salsa storage mechanism?
can we replace it completely, or do we store the same data both in salsa and on disk?

deontologician · 2021-08-25T16:46:50Z

It is hard because it makes persistence a public API: on-disk data is no longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

There are lots of different compatibility contracts, such as "cache inputs are best effort. If rust-analyzer can't use it, it will recompute everything from scratch". That would also heavily imply not just doing mmap and trusting it to be correct, but validating the cache and falling back to the cold-cache path if it isn't compatible.

So concretely, in the build-system scenario, the inputs to each crate would be like:

rust-analyzer cache crate_b/src --existing-cache caches/crate_b_cache --is-lib=true > caches/crate_b_cache
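
A minimal sketch of the "best effort" contract described above: check a small header before trusting the cache, and fall back to recomputing if anything doesn't match. The header layout (magic bytes `RACH`, a version number, a fingerprint of the source) is purely an assumption for illustration and not an actual or proposed rust-analyzer format; `analyze` is a stand-in for the real cold-cache work. Integers are written little-endian so the bytes mean the same thing on a different machine.

```rust
/// Hypothetical header constants; not a real rust-analyzer format.
const MAGIC: &[u8; 4] = b"RACH";
const FORMAT_VERSION: u32 = 1;
const HEADER_LEN: usize = 16;

/// FNV-1a, used here only as a simple, dependency-free fingerprint.
fn fingerprint(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Layout: 4 bytes magic, 4 bytes version, 8 bytes source fingerprint, payload.
fn encode_cache(source: &str, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&fingerprint(source.as_bytes()).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns the payload only if magic, version and source fingerprint all match.
fn validate_cache<'a>(cache: &'a [u8], source: &str) -> Option<&'a [u8]> {
    if cache.len() < HEADER_LEN || cache[0..4] != MAGIC[..] {
        return None;
    }
    let mut version = [0u8; 4];
    version.copy_from_slice(&cache[4..8]);
    let mut stored = [0u8; 8];
    stored.copy_from_slice(&cache[8..16]);
    if u32::from_le_bytes(version) != FORMAT_VERSION
        || u64::from_le_bytes(stored) != fingerprint(source.as_bytes())
    {
        return None;
    }
    Some(&cache[HEADER_LEN..])
}

/// Stand-in for the expensive cold-cache analysis.
fn analyze(source: &str) -> Vec<u8> {
    source.len().to_string().into_bytes()
}

/// Best-effort load: the bool reports whether the cache was actually used.
fn load_or_recompute(cache: Option<&[u8]>, source: &str) -> (Vec<u8>, bool) {
    match cache.and_then(|c| validate_cache(c, source)) {
        Some(payload) => (payload.to_vec(), true),
        None => (analyze(source), false),
    }
}
```

A stale, truncated, or foreign file simply takes the `None` arm, so a bad cache costs time but never correctness, at least as far as the header checks can tell.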

ram19890 · 2021-09-06T13:05:26Z

Can we have an option to toggle sync (ON/OFF), to turn off "fetching & caching" when an editor is opened, so that the cached data can be used from memory or disk once it has been cached for the first time? If the user wants the toggle enabled, let them set a threshold on the number of .rs files they have opened; once that is exceeded (more than three, or any other number), the sync would run automatically! (Setting a limit for the number of open files to trigger the sync! Default: "nolimit")

There might be no need for a daemon running all the time!

Or something similar to "Android Project Treble"! (For stability + consistency)

lnicola · 2021-09-06T13:10:02Z

@ram19890 there's no persistent caching at all right now. If and when it is implemented, it's going to be possible to delete the persistent cache, but it's too early to tell if the cache is going to be optional or not.

There's also no daemon at all. That was a suggestion for Vim users who keep closing their editor and don't want to change their workflow. A daemon like that could probably be implemented outside of RA, but it's not the real solution.

matklad · 2021-09-14T09:01:45Z

Couple of thoughts here:

one unusual use-case here is that some people use .rlib as a way to distribute proprietary code. Such use-cases currently can't benefit from rust-analyzer (no source code available), but they could in theory use our own index format (if we actually erase method bodies)
there's a certain charm in just using rlibs -- that makes plugging rust-analyzer into an existing build system easier. It's also true that rlibs are an end-game here -- it would be silly if compiler and IDE needed two separate "compiled library" formats. But using rlibs makes a deliberately unstable part of rust somewhat more stable, and there will be extra uncertainty as to who should be the emitter of rlibs -- compiler or rustc. We'll also want to put extra things in rlibs (parameter names), so 🤷

bjorn3 · 2021-09-14T09:09:52Z

one unusual use-case here is that some people use .rlib as a way to distribute proprietary code.

rlibs leak filenames, doc comments for private items, the position of every item in the source file, the name of every function and type even if private and much more. I wouldn't be surprised if you could decompile them to something reasonably resembling the original source without a terribly huge amount of effort.

bjorn3 · 2021-09-14T09:15:10Z

parameter names

-Zalways-encode-mir and the MIR local debuginfo got you covered.

lnicola · 2021-09-14T09:18:17Z

I think that people who want to distribute closed-source libraries would be better served by going through a C API. It's more work and it's boring, but you get interop with every other language under the Sun.

pr2502 · 2022-01-29T18:23:03Z

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate rust-binary, rust-analyzer-supervisor, which would cache the connection to the ra.

I've written something like this: it's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local TCP socket to a server. The server persists one rust-analyzer instance per workspace and works around LSP limitations to keep the important functionality while supporting multiple clients (vim editor instances) on a single rust-analyzer instance. It also keeps the rust-analyzer process alive for a while after all clients are closed, until a timeout runs out.

Repo is here https://github.com/pr2502/ra-multiplex, it's still work in progress but it is usable for me with neovim and coc-rust-analyzer.
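
For readers wondering what "pipes the input/output" involves at the lowest level, here is a sketch of forwarding messages framed as in the LSP base protocol (a `Content-Length` header, a blank line, then that many bytes of body). This is not code from ra-multiplex; the function names are made up for illustration, and a real multiplexer has to do considerably more than forward bytes (the "works around LSP limitations" part above).

```rust
use std::io::{self, BufRead, Write};

/// Reads one message: header lines ending in \r\n, a blank line, then
/// `Content-Length` bytes of body. Returns `Ok(None)` when the stream ends
/// before a complete header block has been read.
fn read_message<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let line = line.trim_end();
        if line.is_empty() {
            break; // blank line: end of headers
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.eq_ignore_ascii_case("Content-Length") {
                let len = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                content_length = Some(len);
            }
            // Other headers (e.g. Content-Type) are ignored in this sketch.
        }
    }
    let len = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}

/// Writes one message with a fresh `Content-Length` header.
fn write_message<W: Write>(writer: &mut W, body: &[u8]) -> io::Result<()> {
    write!(writer, "Content-Length: {}\r\n\r\n", body.len())?;
    writer.write_all(body)?;
    writer.flush()
}

/// Forwards messages one at a time until the input ends; returns how many were sent.
fn forward_all<R: BufRead, W: Write>(from: &mut R, to: &mut W) -> io::Result<usize> {
    let mut count = 0;
    while let Some(body) = read_message(from)? {
        write_message(to, &body)?;
        count += 1;
    }
    Ok(count)
}
```

In a proxy, `from` would be a buffered stdin and `to` a `TcpStream` (plus the reverse direction on another thread). Working on whole messages rather than raw bytes is what makes it possible to inspect or rewrite them on the way through.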

Pinging users who asked for a feature like this, sorry for spam if you don't need it anymore @montekki @tjkirch @flodiebold

jackos · 2022-06-02T08:27:19Z

@pr2502 Can't tell you how much I appreciate https://github.com/pr2502/ra-multiplex, it works great, and having a dedicated terminal with rust-analyzer debug messages is an added bonus.

Just to confirm for anyone using helix who stumbles on this thread, you can put this in your ~/.config/helix/languages.toml:

[[language]]
name = "rust"
...
language-server = { command = "ra-multiplex" }

This is a really straightforward and fantastic feature, worth considering as a subcommand of rust-analyzer imo.

melMass · 2023-03-07T18:52:24Z

I've written something like this: it's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local TCP socket

Thanks a lot, it seems to also work well with the VSCode extension:

{
"rust-analyzer.server.path": "/Users/user/.cargo/bin/ra-multiplex"
}

I don't want to use it until [this issue](rust-lang/rust-analyzer#4712) is fixed and a persistent cache is implemented. With an LSP this slow to startup, why am I even using vim?

akurniawan · 2024-01-18T14:39:35Z

Hi, are we still looking into this, or are we using ra-multiplex to wrap RA now?

davidbarsky · 2024-01-19T21:28:26Z

I think persistent caches for rust-analyzer are still a nice-to-have that require a lot of design work before they're implemented. In the meantime, I recommend using ra-multiplex.

pandres95 · 2024-02-01T16:40:41Z

@pr2502, do you think ra-multiplex would help me with keeping the indexing cache of large repositories that hold thousands of crates (i.e. projects that use polkadot-sdk) on VS Code?

I know it's a weird question, but I've been looking for a solution for weeks as the number of deps in the project just keeps increasing, and it's hard enough not having a good solution to keep the cache running longer, especially when sometimes I'm opening multiple editor windows at the same time and multiple files in the same editor window.

pr2502 · 2024-02-02T12:47:13Z

it does work with vscode but it'll also make your life harder in other ways, the file watch events from clients (editors) are not propagated (yet), which means you have to manually reload the workspace (ra-multiplex reload) when adding/removing files or when Cargo.lock changes. you might end up with even more load if you change your project structure often.

bjorn3 mentioned this issue Jun 28, 2020

startup analysis process took 5 ~ 10 seconds #5109

Closed

matklad added the E-hard, fun (A technically challenging issue with high impact), and S-unactionable (Issue requires feedback, design decisions or is blocked on other work) labels Oct 15, 2020

This was referenced Jun 9, 2021

Any plan about caching on disk to speed up the loading upon initialization? #9188

Closed

rust-analyzer takes too much time to load a project #9271

Closed

rwols mentioned this issue Jun 26, 2021

Is it possible to cache the index results? sublimelsp/LSP-rust-analyzer#22

Open

bjorn3 mentioned this issue Jul 27, 2021

Rust analyzer is extremely heavyweight by default #9704

Closed

lnicola mentioned this issue Aug 22, 2021

Indexing on every startup #9991

Closed

lnicola mentioned this issue Oct 1, 2021

Manually trigger indexing #10409

Closed

lnicola mentioned this issue Jan 21, 2022

Why does Rust Analyzer use so much RAM and CPU? #11325

Closed

bjorn3 mentioned this issue Mar 27, 2022

Tips for fast startup evcxr/evcxr#218

Open

bjorn3 mentioned this issue Mar 6, 2023

rust analyzer take too long to lunch #14258

Open

lnicola mentioned this issue Apr 14, 2023

Build index from command line? #14566

Closed

Veykril added the A-perf performance issues label Aug 29, 2023

rwols mentioned this issue Oct 10, 2023

Keep server running without a related source file open? sublimelsp/LSP#2387

Open
