
syntax: Optimize some literal parsing #53521

Merged: 1 commit, merged Aug 21, 2018


alexcrichton (Member)

Currently in the `wasm-bindgen` project we have a very large, procedurally
generated crate, `web-sys`. To generate this crate we parse all of a browser's
WebIDL and then generate bindings for all of the APIs it contains.

The resulting Rust file is 18 MB (wow!) and currently takes a very long time to
compile in debug mode: on the nightly compiler a *debug* build of the crate
takes 90s. I was curious what was taking so long, and upon investigating I found
that a *massive* portion of the time was spent in the compiler's `lit_token`
method, primarily formatting strings via `format!`.

After some more investigation it looks like `byte_str_lit` was allocating an
error message once per byte, causing a very large number of allocations for
large literals, of which `wasm-bindgen` generates quite a few (some are several
MB in size).

This commit fixes the issue by allocating the error message lazily, only when it
is actually needed (which should be never in practice). As a result, the
debug-mode compilation time for our `web-sys` crate decreased from 90s to 20s,
a very nice improvement (although we've still got some work to do)!
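To illustrate the difference, here is a minimal sketch of eager versus lazy error-message construction. It is not the compiler's actual `byte_str_lit`: `unescape_byte`, `byte_str_eager`, `byte_str_lazy` and the `built` counter are hypothetical names introduced only for this example, and the "validation" is just an ASCII check standing in for real escape handling.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a per-byte unescape step: Some(byte) for valid
// input, None when the byte is rejected (here: any non-ASCII byte).
pub fn unescape_byte(b: u8) -> Option<u8> {
    if b.is_ascii() { Some(b) } else { None }
}

// Eager: the error message is formatted for every byte, even though it is
// only used on the (rare) failure path. `built` counts formatted messages.
pub fn byte_str_eager(input: &[u8], built: &Cell<usize>) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        let msg = format!("invalid byte {:#04x} at offset {}", b, i);
        built.set(built.get() + 1);
        match unescape_byte(b) {
            Some(v) => out.push(v),
            None => return Err(msg),
        }
    }
    Ok(out)
}

// Lazy: the message is produced inside a closure that only runs when
// unescaping actually fails, so valid input never calls format!.
pub fn byte_str_lazy(input: &[u8], built: &Cell<usize>) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        let v = unescape_byte(b).ok_or_else(|| {
            built.set(built.get() + 1);
            format!("invalid byte {:#04x} at offset {}", b, i)
        })?;
        out.push(v);
    }
    Ok(out)
}
```

`Option::ok_or_else` takes a closure, so the `format!` cost is paid only on the error path; with a message built up front (or passed to `ok_or`), a literal of N bytes costs N string allocations even when every byte is valid.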

@rust-highfive (Collaborator)

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 20, 2018
@alexcrichton (Member, Author)

cc @fitzgen (you're probably interested in this)
cc @nnethercote (and so are you!)

```diff
@@ -430,7 +430,7 @@ crate fn lit_token(lit: token::Lit, suf: Option<Symbol>, diag: Option<(Span, &Ha
 // There are some valid suffixes for integer and float literals,
 // so all the handling is done internally.
 token::Integer(s) => (false, integer_lit(&s.as_str(), suf, diag)),
-token::Float(s) => (false, float_lit(&s.as_str(), suf, diag)),
+token::Float(s) => (false, float_lit(s, suf, diag)),
```
Contributor

Is this change necessary? It would be nice to keep the arguments of `integer_lit` and `float_lit` consistent.

Member

`integer_lit` could be changed to also take a `Symbol` instead of a `&str`.

```rust
-> Option<ast::LitKind> {
    debug!("float_lit: {:?}, {:?}", s, suffix);
    let sym = if s.as_str().contains('_') {
        Symbol::intern(&s.as_str().chars().filter(|&c| c != '_').collect::<String>())
```
Contributor

This means that float literals lacking a `'_'` will not be interned. This is a significant change in behaviour, and one that seems unintentional?

Also, the `float_lit` and `integer_lit` changes in this patch are orthogonal to the `byte_str_lit` changes and are not mentioned in the commit message. Put them in a separate commit?

Member

> This means that float literals lacking a '_' will not be interned.

No: with the patch, such a literal is simply not *re*-interned. If stripping underscores would leave it unchanged, we can keep the original `Symbol` value.
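A rough sketch of the idea behind the `float_lit` hunk: only pay for a new `String` and a new interning lookup when there is actually an underscore to strip, and otherwise return the symbol you were given. The `Interner` and `Symbol` types below are a hypothetical toy, not rustc's symbol types, and this part of the patch was ultimately backed out of the PR.

```rust
use std::collections::HashMap;

// Hypothetical toy symbol: an index into the interner's string table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(u32);

#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    pub fn as_str(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

// Returns a symbol for the literal with underscores removed. When there is
// nothing to strip, the original symbol is returned unchanged: no String is
// allocated and the interner's table is not consulted again.
pub fn strip_underscores(interner: &mut Interner, sym: Symbol) -> Symbol {
    if !interner.as_str(sym).contains('_') {
        return sym;
    }
    let stripped: String = interner.as_str(sym).chars().filter(|&c| c != '_').collect();
    interner.intern(&stripped)
}
```

Literals are still interned exactly once by whoever created the original symbol; the fast path just avoids interning the same text a second time.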

@nnethercote (Contributor)

Heh, I did the "don't allocate when stripping out underscores" thing for `\u{...}` chars in `char_lit` in #50052 :)
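The same "don't allocate just to drop underscores" idea can also be applied during parsing itself. The following is a hypothetical sketch, not the code from #50052: `parse_unicode_escape` decodes the body of a `\u{...}` escape by skipping `_` characters while accumulating hex digits, so no intermediate `String` is built. It assumes the lexer has already checked the braces and underscore placement.

```rust
// Decodes the hex body of a `\u{...}` escape (e.g. "1F_600") into a char.
// Underscores are skipped in place rather than filtered into a new String.
pub fn parse_unicode_escape(body: &str) -> Option<char> {
    let mut value: u32 = 0;
    let mut digits = 0;
    for c in body.chars() {
        if c == '_' {
            continue;
        }
        let d = c.to_digit(16)?; // reject non-hex characters
        digits += 1;
        if digits > 6 {
            return None; // Rust allows at most six hex digits
        }
        value = value * 16 + d; // cannot overflow: at most 0xFFFFFF
    }
    if digits == 0 {
        return None;
    }
    // Rejects surrogates and values above 0x10FFFF.
    char::from_u32(value)
}
```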

@michaelwoerister (Member)

I agree that the other changes should be mentioned in the commit message or moved to their own commit. Looks great otherwise! r=me with the comments addressed.

@alexcrichton (Member, Author)

OK, I've backed out the integer/float literal changes, which I wasn't specifically measuring, so now it's just the one change I know for sure benefits quite a lot!

@bors: r=michaelwoerister

@bors (Contributor) commented Aug 20, 2018

📌 Commit 5bf2ad3 has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 20, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Aug 21, 2018
…michaelwoerister

syntax: Optimize some literal parsing

bors added a commit that referenced this pull request Aug 21, 2018
Rollup of 17 pull requests

Successful merges:

 - #53030 (Updated RELEASES.md for 1.29.0)
 - #53104 (expand the documentation on the `Unpin` trait)
 - #53213 (Stabilize IP associated constants)
 - #53296 (When closure with no arguments was expected, suggest wrapping)
 - #53329 (Replace usages of ptr::offset with ptr::{add,sub}.)
 - #53363 (add individual docs to `core::num::NonZero*`)
 - #53370 (Stabilize macro_vis_matcher)
 - #53393 (Mark libserialize functions as inline)
 - #53405 (restore the page title after escaping out of a search)
 - #53452 (Change target triple used to check for lldb in build-manifest)
 - #53462 (Document Box::into_raw returns non-null ptr)
 - #53465 (Remove LinkMeta struct)
 - #53492 (update lld submodule to include RISCV patch)
 - #53496 (Fix typos found by codespell.)
 - #53521 (syntax: Optimize some literal parsing)
 - #53540 (Moved issue-53157.rs into src/test/ui/consts/const-eval/)
 - #53551 (Avoid some Place clones.)

Failed merges:

r? @ghost
@bors bors merged commit 5bf2ad3 into rust-lang:master Aug 21, 2018
@alexcrichton alexcrichton deleted the optimize-lit-token branch August 29, 2018 16:34
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
5 participants