
Investigate caching of static strings #1386

Closed
RReverser opened this issue Mar 22, 2019 · 18 comments
@RReverser
Member

Motivation

A while ago I noticed that even constant strings currently go through the entire encoding/decoding process on each pass between Rust and JS, becoming an unnecessarily expensive part of the resulting code.

Now I've noticed that this affects even internal functions that need intermediate strings, hitting even more potentially hot paths.

E.g. this is a profile from my WASM library that heavily relies on JS iteration - js_sys::try_iter - which, in turn, recreates the same constant string for the iteration protocol over and over.

[profiler screenshot]

We can fix this particular function but, ideally, we should have a more generic solution for constant strings.

Proposed Solution

Either:

  1. Add a step to the postprocessor to walk over the generated WASM, detect calls to JsValue::from_str with pointers to a data section and statically extract these into JS.
  2. Do the same in runtime by using a cache backed by lazy_static! (but then we need to somehow detect whether a string is in the data section, which is a bit harder in runtime but not impossible).
  3. Add some sort of a proc-macro js_const_str!("...") that would extract these statically like (1), although it could make them easier to detect by being more explicit.

All of the proposed solutions require choosing the right balance between the complexity of the solution itself and the complexity of usage from an API point of view (we don't want to add a function that most users would forget to use).

@RReverser
Member Author

I specialised try_iter in my code to try Array::is_array + unchecked_ref::<Array>().values().into_iter() first, and the whole library benchmark got 5x faster 😱

I wonder how many more cases are affected by these hidden costs.

@alexcrichton
Contributor

Thanks for the report! Could you expand a bit on how try_iter is connected to static strings in this case? Those seem naively unrelated, but hey, 5x improvements are always nice.

@RReverser
Member Author

RReverser commented Mar 22, 2019

I was surprised about that connection showing up in the profile as well, but from looking at the source it might be related to JsValue::from("next") being created each time you iterate over a new value?

Note that I'm mostly iterating over small tuple-like arrays (but need this code in a generic context to accept arbitrary iterables), so this overhead could well be the reason.

@alexcrichton
Contributor

Oh I think that's fixable with a change like:

diff --git a/crates/js-sys/src/lib.rs b/crates/js-sys/src/lib.rs
index 0a51ec7d..05a5a656 100644
--- a/crates/js-sys/src/lib.rs
+++ b/crates/js-sys/src/lib.rs
@@ -1107,14 +1107,20 @@ pub fn try_iter(val: &JsValue) -> Result<Option<IntoIter>, JsValue> {
         return Ok(None);
     }
 
+    #[wasm_bindgen]
+    extern "C" {
+        type MaybeIterator;
+        #[wasm_bindgen(getter, method)]
+        fn next(this: &MaybeIterator) -> JsValue;
+    }
+
     let iter_fn: Function = iter_fn.unchecked_into();
     let it = iter_fn.call0(val)?;
     if !it.is_object() {
         return Ok(None);
     }
 
-    let next = JsValue::from("next");
-    let next = Reflect::get(&it, &next)?;
+    let next = it.unchecked_ref::<MaybeIterator>().next();
 
     Ok(if next.is_function() {

Want to test that out and see if it works for you?

@RReverser
Member Author

Haha, that's literally what I did locally too, even the name of MaybeIterator matches :)

But, as mentioned above, I'm rather interested in a more generic solution to needless copies of constant strings, and decided to open this issue for that discussion/tracking.

@RReverser
Member Author

RReverser commented Mar 26, 2019

Btw, strangely, replacing that doesn't make nearly as big a difference as checking for is_array and using .values(). I wonder if there's another hidden string somewhere...

~~UPD: __wbindgen_string_new still shows up taking ~60% of total time on microbenchmarks involving small arrays, so apparently there is.~~

UPD: Ah no, sorry, I just didn't add [patch.crates-io] to this mini benchmark to point to local wasm-bindgen.

RReverser added a commit to RReverser/wasm-bindgen that referenced this issue Mar 26, 2019
This allows significantly speeding up iteration over small collections, where string encoding is the primary overhead.

Related to rustwasm#1386, but only works around this particular case.
@fitzgen
Member

fitzgen commented Mar 29, 2019

I think it is worth providing string interning infrastructure.

This shouldn't require anything special from wasm-bindgen, such as JS glue codegen support.

Therefore, the question is whether it should be

  • part of wasm-bindgen proper (which would allow adding inherent methods on JsValue),
  • part of a wasm-bindgen-strings crate,
  • or a separate third-party crate (part of Gloo? just a standalone crate that anyone could use?)

I'm not convinced that we need to do the interning at compile time with proc-macros; I think we can dynamically intern at runtime via fn intern(s: &'static str) -> &'static JsString or something like that. I don't think the overhead will be noticeable (although adding compile-time interning with proc-macros post facto should also be semver compatible, if we find that we do actually need it).

@RReverser
Member Author

I tried several approaches so far:

  • Interning constant strings by finding them with walrus - this would need to be part of cli-support, and is more complicated, but also more beneficial for end users.

  • Interning by using a thread-local HashMap<&'static str, JsValue> - this is feasible today, but has a relatively high runtime cost even for repeated lookups.

  • Interning by using a macro like

    macro_rules! js_string {
      ($s:literal) => {{
        thread_local! {
          static VALUE: JsValue = JsValue::from_str($s);
        }
        VALUE.with(Clone::clone)
      }};
    }

    this is the simplest to implement and has a relatively low runtime cost compared to a HashMap, but requires explicit opt-in from the user side and only works for literals, not for strings built at runtime.

@alexcrichton
Contributor

I think I agree with @fitzgen here that this is best done in a separate crate for now rather than in wasm-bindgen itself, so I'm going to go ahead and close this.

@RReverser
Member Author

that this is best done in a separate crate for now rather than in wasm-bindgen itself

Hmm, this can be hard to do in a separate crate in the way suggested above, because it needs to rewrite the WASM itself in a wasm-bindgen-compatible manner...

But this is not an easy or short-term task, so I guess it's fine to have it closed.

My only beef with closing long-standing issues is that it usually increases the likelihood of someone not finding the closed issue and raising exactly the same suggestion.

@RReverser
Member Author

because it needs to rewrite WASM itself in wasm-bindgen compatible manner...

@alexcrichton Do you think this issue could be reopened in light of these details? I can still work on a PR, but it's usually much better to be able to link to an existing issue.

@alexcrichton
Contributor

Unless we have compelling data that we should do this, I'd rather not accumulate a large list of "wish to have in the far and distant future" issues if we can avoid it.

@Pauan
Contributor

Pauan commented Jun 21, 2019

I've been investigating (and improving) the performance of dominator, and I ran into this issue. It turns out, strings are really slow. They are orders of magnitude slower than everything else.

They are slow because of all the encoding and copying, and also because they add a lot of extra garbage collection pressure.

The end result is that right now, Rust apps are doomed to be slower than JS apps, simply because JS apps avoid the huge costs of encoding + GC.

Unfortunately, strings are unavoidable, because so many native web APIs use strings. And WebIDL probably won't help much (at least not in the short term).

I re-implemented over a dozen of the web-sys bindings so that they accept &JsString instead of &str, and then I implemented a very simple intern function which does runtime interning with a BTreeMap. This resulted in dramatically faster performance.

We can do a lot better than that, though. Nobody should need to re-implement all of web-sys just to get acceptable string performance. It gets worse when you consider that libraries like gloo use web-sys, and therefore I had to re-implement gloo as well.

I think we should do all of @RReverser's ideas: have very fast automatic static caching for &'static str, have automatic runtime caching for &'a str (probably using an LRU cache or similar), and also provide a manual form of caching (such as an intern function).

That will give very good performance for static strings, good performance for non-static strings, and predictable performance for manually interned strings.

wasm-bindgen should be able to do the caching automatically; it just has to add a call to the intern function for #[wasm_bindgen] functions which accept &str or String.

I'm willing to make a PR for this, if the idea is accepted. Or if you prefer we can go through the RFC process (I can write the RFC).

@alexcrichton
Contributor

That, I think, is some very compelling data and I'd be inclined to agree!

I think one alternative is to add bindings for both &str and &JsString, but I don't think that's really tenable since we don't want to duplicate everything everywhere. Having a global cache is probably a better option here.

One thing I'd ideally like to see is a global cache that's tunable and optional, so we might not have it on by default for the wasm-bindgen crate (via a feature) but then it could be enabled for extra performance if desired (and tuned if needed). @Pauan if you're willing to make a PR I'd be willing to review it :)

@Pauan
Contributor

Pauan commented Jun 21, 2019

I think one alternative is to add bindings for both &str and &JsString, but I don't think that's tenable really since we don't want to duplicate everything everywhere.

That was going to be my first suggestion, but I decided against it since, as you said, it's untenable: gloo would need to provide &JsString bindings as well, and so would any other library that depends on web-sys.

The other thing I thought about was to create a new trait, something like AsStr, and then implement it for &str, String, and JsString. Then all of the APIs would take <A: AsStr> instead of &str.

However, that won't work for #[wasm_bindgen] bindings, and it's still very inconvenient for the user, since the user now has to make a choice: 1) they can pass a &str and get bad performance, or 2) they can use intern to manually create a JsString and then pass that. Both options are bad.

One thing I'd ideally like to see is a global cache that's tunable and optional

I absolutely agree.

if you're willing to make a PR I'd be willing to review it :)

Great, I'll get started on that, then!

@RReverser
Member Author

Oh, I hadn't noticed that @Pauan has since implemented this in #1612, this is pretty cool!

One nitpick is that there doesn't seem to be any scoping for interned strings, nor any way to free them once they're no longer necessary?

My use case is any Wasm module that creates classes. In this case it needs to intern strings that are not alive for the entire duration of the module, but rather become unnecessary as soon as the class is freed from the JS side. If my understanding is correct, these currently would never get evicted, but would instead keep leaking memory for each class instance.

Would it be worth exposing the wasm_bindgen::InternCache structure (already available internally) for this use case, so that the entire cache would get freed once it's dropped?

@Pauan
Contributor

Pauan commented Oct 21, 2019

Would it be worth exposing the wasm_bindgen::InternCache structure (already available internally) for this use case, so that the entire cache would get freed once it's dropped?

I don't like the idea of exposing internal details, but I definitely agree that we should have an API for clearing out items from the cache.

It was omitted from the initial implementation to keep things as simple as possible, but it was always planned to add it later.

@Pauan
Contributor

Pauan commented Oct 22, 2019

@RReverser PR is up: #1828
