Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookups into HashMap<Atom, ...> have to create needless Atoms #25699

Open
pshaughn opened this issue Feb 6, 2020 · 7 comments
Open

Lookups into HashMap<Atom, ...> have to create needless Atoms #25699

pshaughn opened this issue Feb 6, 2020 · 7 comments

Comments

@pshaughn
Copy link
Member

@pshaughn pshaughn commented Feb 6, 2020

There's a comment pointing out this matter in:

impl StylePropertyMapReadOnlyMethods for StylePropertyMapReadOnly {
/// <https://drafts.css-houdini.org/css-typed-om-1/#dom-stylepropertymapreadonly-get>
fn Get(&self, property: DOMString) -> Option<DomRoot<CSSStyleValue>> {
// TODO: avoid constructing an Atom
self.entries
.get(&Atom::from(property))
.map(|value| DomRoot::from_ref(&**value))
}
/// <https://drafts.css-houdini.org/css-typed-om-1/#dom-stylepropertymapreadonly-has>
fn Has(&self, property: DOMString) -> bool {
// TODO: avoid constructing an Atom
self.entries.contains_key(&Atom::from(property))
}

but it's true in most or all places we have a HashMap with Atom keys.

What we should do instead is first ask "Is this string in the atom cache?", since if it's not then there's obviously no way for it to be a key of the HashMap.

Does string_cache already support asking that question?

@pshaughn
Copy link
Member Author

@pshaughn pshaughn commented Feb 6, 2020

If there's not already a way to do it, I'd suggest something like Atom::from_if_exists(&str) -> Option<Atom>.

@jdm jdm added the I-perf-slow label Feb 10, 2020
@jdm
Copy link
Member

@jdm jdm commented Feb 10, 2020

There is no way to determine if there's a string in the atom cache, as far as I can tell. The question is also not quite as clear to answer because of the presence of inline atoms, as shown in https://github.com/servo/string-cache/blob/af2c7707e797768660b3db90066b80218dbca6f7/src/atom.rs#L171-L205.

@jdm jdm added the I-papercut label Feb 10, 2020
@pshaughn
Copy link
Member Author

@pshaughn pshaughn commented Feb 10, 2020

I see. Very short strings get packed verbatim instead of converted to hashes, so there's no actual cache data structure including them. There's no static allocation there, so always creating those short atoms isn't a problem the way creating long ones could be. I could see optimizing with a method that returns Some(atom) if it's either inline or in-cache and None if it's neither, but I can't think of what to name that method or how to document its semantics in a way that isn't exposing too much internal state.

@pshaughn
Copy link
Member Author

@pshaughn pshaughn commented Feb 12, 2020

A way to encapsulate this idea without exposing which strings are inlined could be to make string-cache itself define a struct like AtomHashMap<T> that wraps an internal HashMap<Atom, T> but also has the traits for HashMap<String, T> and does this existence check in that getter.

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Feb 12, 2020

We could add Atom::something(&str) -> Result<Atom, ()> whose semantics would be: if Err is returned, there was at least one point in time during the duration of the call where there was no other Atom alive in the process with that string value. It may have been created since.

(The implementation would be the same as Atom::from(&str) for inline and static atoms, and for dynamic atoms it would do a hash map lookup without inserting. This still involves acquiring the global Mutex, though in principle we could make it a RwLock.)

These semantics are weak, but sufficient to skip a lookup in HashMap<Atom, _> if the map is known not to have concurrent mutation from other threads. (std::collections::HashMap does not have interior mutability, so holding either &mut HashMap or &HashMap is enough to ensure this.)

However this feels very much like a micro-optimization. Do the creation and destruction of dynamic atoms show up while profiling?

@pshaughn
Copy link
Member Author

@pshaughn pshaughn commented Feb 13, 2020

I doubt it; the thing that I'm thinking about is that someone might be running some script in a loop that does a large number of strings and then we're carrying those interned atoms around in RAM forever.

I don't disagree with the papercut label.

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Feb 13, 2020

Dynamic atoms are reference counted, so they won’t stay in RAM forever unless something keeps using them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.