Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atoms: hash table lookup without creating an Atom? #17375

Closed
asajeffrey opened this issue Jun 16, 2017 · 11 comments
Closed

Atoms: hash table lookup without creating an Atom? #17375

asajeffrey opened this issue Jun 16, 2017 · 11 comments
Assignees
Labels

Comments

@asajeffrey
Copy link
Member

@asajeffrey asajeffrey commented Jun 16, 2017

If we have a HashMap<Atom, Foo> it would be nice if we could look up a string without first converting it to an Atom. The best way to do this would be to implement Borrow<str> for Atom, but this requires the hash functions for Atom and &strto agree.

@asajeffrey asajeffrey self-assigned this Jun 16, 2017
@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey commented Jun 16, 2017

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Jun 16, 2017

The Hash impls for Atom and &str are not compatible. In string-cache the former hashes the u64 representation of Atom, which might be a pointer, an index in a static set, or inline data. In Stylo a u32 (which is already the result of a C++ hashing function) is hashed.

With Stylo we could add FFI bindings to the hashing function (though taking &[u16], not `&str), but string-cache’s current hashing pretty much requires atomizing.

@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey commented Jun 16, 2017

Is there any way to get them to agree? e.g. by storing the hash value in with the string content, and returning that as the hash? (This does mean an extra pointer indirection when hashing atoms, so may not be worth it.)

If we can't get Atom to implement Borrow<str> can we at least get it to do the lookup without allocating in the case that the atom doesn't exist? A primitive something like Atom::if_exists(&str) -> Option<Atom> which returns the Atom if it exists, and otherwise returns None would do.

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Jun 17, 2017

Sorry, my previous answer was outdated. In the case of static and dynamic atoms we do store a hash based on the string’s bytes since servo/string-cache#183. However for inline atoms there is by definition no data structure on the side where we could store it.

As to if_exists, we can make it not insert in the process-global hash table of dynamic atoms, but for length <= 7 there is no way to know if another inline Atom already exists somewhere with the same value. Similarly we can check if something is in the static table, but not if some runtime value is actually using it at a given moment. So at best I’d rather name it something else.

(This was all about https://github.com/servo/string-cache, I’m less familiar with Gecko atoms.)

@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey commented Jun 17, 2017

Ah, if static and dynamic atoms are already storing the hash, then we could just hash inline atoms based on their string content and get the Atom hash to agree with the &str hash? This would allow Atom to implement Borrow<str>.

@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey commented Jun 17, 2017

Yes, gecko atoms might be another story :(

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Jun 17, 2017

Given that the hasher is a type parameter of the hash method, I don’t know if it’s possible to implement Hash with a pre-computed hash (which is necessarily for a given hasher) and have something compatible with <str as Hash>::hash.

Also, the exact behavior of <str as Hash>::hash has changed several times in the history of Rust. (It’s now hashing the UTF-8 bytes, then 0xFF. At some point it was hashing the length as usize, then the UTF-8 bytes.) I don’t know if we can count on it to be stable going forward.

@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey commented Jun 17, 2017

Oh good point, there are multiple hash functions. We'd have to implement hashing an Atom by hashing its string content, which isn't very efficient.

Not sure what to name the function which returns an Atom. Perhaps Atom::maybe_from(&str)? Or Atom:: from_without_alloc(&str)?

@nox
Copy link
Member

@nox nox commented Oct 9, 2017

@SimonSapin Is this still something we want to do?

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Oct 9, 2017

I believe making hashing of Atom and str compatible is not possible without sacrificing other nice properties of Atom, so we’re likely not going to.

@SimonSapin SimonSapin closed this Oct 9, 2017
@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Oct 9, 2017

Somewhat relatedly, we also have PrecomputedHashMap that uses a specific Hasher to make hashing cheaper with Atom keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.