Intern more kinds of things #6207

Open
Eh2406 opened this Issue Oct 22, 2018 · 2 comments

@Eh2406
Contributor

Eh2406 commented Oct 22, 2018

Cargo sometimes makes a lot of copies of data structures. For example, the string "serde" was copied for the name of each dependency on that crate, for the name of each version of that package, and for each feature that enables that package. This was fixed by adding an InternedString data type that deduplicates the data and leaks it into a &'static reference. We have other data structures that use Arc/Rc to make cloning cheaper.
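
For readers unfamiliar with the pattern, here is a minimal sketch of the deduplicate-and-leak idea. It is not Cargo's actual InternedString code; the global cache, the `cache()` helper, and the method names are assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, OnceLock};

/// Illustrative stand-in for an interned string (not Cargo's real type).
/// It is just a pointer to leaked, deduplicated data, so it is `Copy`.
#[derive(Clone, Copy, Debug)]
pub struct InternedString {
    inner: &'static str,
}

/// Hypothetical global cache holding every string interned so far.
fn cache() -> &'static Mutex<HashSet<&'static str>> {
    static CACHE: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashSet::new()))
}

impl InternedString {
    pub fn new(s: &str) -> InternedString {
        let mut cache = cache().lock().unwrap();
        // Reuse the existing allocation if this string was seen before.
        if let Some(&existing) = cache.get(s) {
            return InternedString { inner: existing };
        }
        // Otherwise allocate once and leak it, so it lives for 'static.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        cache.insert(leaked);
        InternedString { inner: leaked }
    }

    pub fn as_str(&self) -> &'static str {
        self.inner
    }
}

// Equal contents always share one allocation, so equality and hashing
// can look at the pointer instead of walking the bytes.
impl PartialEq for InternedString {
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.inner, other.inner)
    }
}

impl Eq for InternedString {}

impl Hash for InternedString {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.inner.as_ptr().hash(state);
    }
}
```

With this sketch, two calls to `InternedString::new("serde")` return values that point at the same leaked allocation, so copying is a pointer copy and comparison is a pointer comparison. The cost is that interned data is never freed.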

We should experiment with interning them to see whether it is a speed, memory, or ergonomics win.

Off the top of my head:

  • PackageId already uses Arc, and is copied, hashed, and compared all over the place in hot code.
  • SourceId already uses Arc, has a manual cache for crates.io, and will probably be leaked anyway by caching PackageId.
  • semver::Version/semver::VersionReq are used a lot, probably often with the same value, and are bigger structures than they seem.
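
To make the proposal concrete, here is a rough sketch of what interning a whole struct (rather than a string) could look like. The `PackageIdInner` fields are simplified, hypothetical placeholders (plain strings instead of Cargo's real name, version, and source types), and the `cache()` helper is likewise an assumption.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, OnceLock};

/// Hypothetical, simplified payload; the real PackageId holds richer types.
#[derive(PartialEq, Eq, Hash, Debug)]
struct PackageIdInner {
    name: String,
    version: String,
    source: String,
}

/// The handle is a single `&'static` pointer instead of an `Arc`:
/// it is `Copy`, with no reference counting on clone or drop.
#[derive(Clone, Copy, Debug)]
pub struct PackageId {
    inner: &'static PackageIdInner,
}

/// Hypothetical global cache of every PackageIdInner created so far.
fn cache() -> &'static Mutex<HashSet<&'static PackageIdInner>> {
    static CACHE: OnceLock<Mutex<HashSet<&'static PackageIdInner>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashSet::new()))
}

impl PackageId {
    pub fn new(name: &str, version: &str, source: &str) -> PackageId {
        let candidate = PackageIdInner {
            name: name.to_owned(),
            version: version.to_owned(),
            source: source.to_owned(),
        };
        let mut cache = cache().lock().unwrap();
        // Look up by value; on a hit, reuse the leaked copy.
        if let Some(&existing) = cache.get(&candidate) {
            return PackageId { inner: existing };
        }
        // On a miss, leak the new value so it can be shared for 'static.
        let leaked: &'static PackageIdInner = Box::leak(Box::new(candidate));
        cache.insert(leaked);
        PackageId { inner: leaked }
    }

    pub fn name(&self) -> &'static str {
        self.inner.name.as_str()
    }
}

// Structural equality is paid once, at construction time; afterwards
// comparison and hashing only touch the pointer.
impl PartialEq for PackageId {
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.inner, other.inner)
    }
}

impl Eq for PackageId {}

impl Hash for PackageId {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (self.inner as *const PackageIdInner).hash(state);
    }
}
```

The trade-off in this sketch is that construction takes a lock and a hash lookup, and nothing is ever freed, in exchange for pointer-sized handles with cheap copy, hash, and compare in hot code. Whether that is a net win is exactly what the experiment would need to measure.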
@alexcrichton

Member

alexcrichton commented Oct 23, 2018

Note that {Package,Source}Id using Arc internally was already a "poor man's" form of interning early on. I definitely agree we should switch to an InternedString-style approach, and in doing so it should be fine to remove the internal Arc.

@derekdreery

Contributor

derekdreery commented Oct 31, 2018

There is interning infrastructure used in the html5ever crate. It might be useful here: you can pre-intern anything you want at compile time.

EDIT: the crates are string_cache and string_cache_codegen.