Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Atom generic over the set of static strings #178

Merged
merged 19 commits into from Nov 2, 2016
Merged

Make Atom generic over the set of static strings #178

merged 19 commits into from Nov 2, 2016

Conversation

@SimonSapin
Copy link
Member

SimonSapin commented Oct 27, 2016

The new string_cache_codegen crate is intended to be used by downstream users in their build scripts. Provided a list of static strings, it generates code that defines:

  • A type alias for Atom<_> with the type parameter set,
  • A macro like the former atom! macro.

Based on #136, original work by @aidanhs.

Fixes #22, fixes #136, closes #83.

r? @nox


This change is Reviewable

@SimonSapin SimonSapin force-pushed the static-sets branch from fe9d8f7 to d5b1001 Oct 27, 2016
@aidanhs
Copy link
Contributor

aidanhs commented Oct 27, 2016

Oh boy, I'm excited! No worries about taking so long, I understand it can take a while for the stars to align and get time. Some thoughts:

  1. the readme for string-cache-codegen is no longer present. Although there are docs, I tend to think it's nice to have something in a github readme as well (potentially even at the top level)
  2. the 'test' crate in a subfolder of string-cache-codegen is no longer present. I can't see string-cache-codegen ever breaking, but the test seems like a useful thing to have
  3. good catch on the 'hashing twice' thing. It's been a while since I wrote the code, but I think the trade-off is that you can either hash once and end up storing the strings for identical dynamic atoms twice (because the hash is based off the static atom set hash), or you can hash twice (the second time with the fixed 'dynamic set' key) and be guaranteed to only store each string once. On reflection, given that applications are likely only going to have one kind of atom the latter is not really a big win.

I'll actually try it out a bit later.

@SimonSapin
Copy link
Member Author

SimonSapin commented Oct 27, 2016

Oh, good point about using different hash functions in the one dynamic atom hash table. I had not thought of that.

So now I'm not sure it's the right thing to do :) I'll think on it a bit more.

@SimonSapin SimonSapin force-pushed the static-sets branch from b54d9d9 to 0a81c4d Oct 27, 2016
@SimonSapin
Copy link
Member Author

SimonSapin commented Oct 27, 2016

After chatting with @nox I think we’re gonna keep things as-is now regarding hashing. We can always change it later (without even breaking any public API).

Regarding testing, the string_cache crate’s own build.rs uses string_cache_codegen to generate a TestAtom, which is used in unit tests. Since this matches the expected usage of the codegen crate, I believe a separate testing crate is not needed.

I’ve pushed another commit that extends the README with some usage examples.

@aidanhs
Copy link
Contributor

aidanhs commented Oct 28, 2016

Ok, I've tried it out and it seems to work fine for me and the examples look good to me. It's a little sad it can't use phf generated sets directly, but I can see that's tricky given the desire to want to have direct access to the hash value without recomputing. Maybe someday.

Nothing else from me I don't think 👍

@SimonSapin
Copy link
Member Author

SimonSapin commented Oct 28, 2016

string_cache::PhfSet is a lot like phf::Set<&'static str>, just with more stable internals. (See e.g. this warning.)

I didn’t make it expose any public functionality to keep small the set of API we’ll need to maintain and because I didn’t see a use for it, but it could. What would you like to do with the set, and why?

@aidanhs
Copy link
Contributor

aidanhs commented Oct 31, 2016

It's more a general feeling that it'd be nice to use phf_codegen (and therefore phf_sets) directly rather than recreating them. As you note, given that parts of the api are not considered public it'd be a bit risky to depend on them for functionality. Perhaps the long term solution is to negotiate with phf to expose the bits that would be useful for string-cache, but I wouldn't want that to come before merging this PR.

Copy link
Member

Manishearth left a comment

Overall lgtm. What's the plan for namespaces?

build.rs Outdated
slice
string_cache_codegen::AtomType::new("atom::tests::TestAtom", "test_atom!")
.atoms(&[
"a", "b", "address", "area", "body", "font-weight", "br","html", "head", "id",

This comment has been minimized.

@Manishearth

Manishearth Nov 1, 2016

Member

nit: space after br

This comment has been minimized.

@SimonSapin

SimonSapin Nov 1, 2016

Author Member

Fixed.

#[allow(dead_code)]
mod shared;

use string_cache::Atom;
use string_cache::DefaultAtom;

This comment has been minimized.

@Manishearth

Manishearth Nov 1, 2016

Member

we should probably use DefaultAtom as Atom consistently across the examples

This comment has been minimized.

@SimonSapin

SimonSapin Nov 1, 2016

Author Member

Done.

}
fn from(string_to_add: Cow<'a, str>) -> Self {
let static_set = Static::get();
let hash = phf_shared::hash(&*string_to_add, static_set.key);

This comment has been minimized.

@Manishearth

Manishearth Nov 1, 2016

Member

why not continue to use get_index_or_hash()?

This comment has been minimized.

@SimonSapin

SimonSapin Nov 1, 2016

Author Member

It would have to be a method of a public trait, and thus changing it (if we need to in the future) would be a breaking change.

@SimonSapin SimonSapin force-pushed the static-sets branch from ce286ac to 4528d77 Nov 1, 2016
@SimonSapin
Copy link
Member Author

SimonSapin commented Nov 1, 2016

The plan for namespaces is to have them in a new html5ever-atoms crate in the html5ever repo: https://github.com/servo/html5ever/compare/string-cache-up#diff-04339865c75d55a398b31d0a84d21c56R42

Namespace and Prefix there are both type aliases of Atom<SomeStaticSet> generated with string_cache_codegen. The ns!() macro (which expands to a Namespace value based on the prefix) is generated separately by the build script.

@Manishearth
Copy link
Member

Manishearth commented Nov 1, 2016

r=me , unless @nox wants to have a look

@SimonSapin
Copy link
Member Author

SimonSapin commented Nov 2, 2016

@bors-servo r=Manishearth

@bors-servo
Copy link
Contributor

bors-servo commented Nov 2, 2016

📌 Commit 4528d77 has been approved by Manishearth

@bors-servo
Copy link
Contributor

bors-servo commented Nov 2, 2016

Test exempted - status

@bors-servo bors-servo merged commit 4528d77 into master Nov 2, 2016
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
bors-servo added a commit that referenced this pull request Nov 2, 2016
Make Atom generic over the set of static strings

The new `string_cache_codegen` crate is intended to be used by downstream users in their build scripts. Provided a list of static strings, it generates code that defines:
- A type alias for `Atom<_>` with the type parameter set,
- A macro like the former `atom!` macro.

Based on #136, original work by @aidanhs.

Fixes #22, fixes #136, closes #83.

r? @nox

<!-- Reviewable:start -->
---

This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/178)

<!-- Reviewable:end -->
@SimonSapin SimonSapin deleted the static-sets branch Nov 2, 2016
@aidanhs
Copy link
Contributor

aidanhs commented Nov 2, 2016

🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.