Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose the precomputed hash using a trait so that I can use it from rust-selectors. #183

Merged
merged 1 commit into from
Apr 6, 2017

Conversation

emilio
Copy link
Member

@emilio emilio commented Mar 21, 2017

This allows us to get rid of the extra hashing overhead every time we check the
bloom filter.


This change is Reviewable

@emilio
Copy link
Member Author

emilio commented Mar 21, 2017

r? @nox

I'm not super-happy about this, but it's the best way I found to keep selectors away from depending on string-cache.

@SimonSapin
Copy link
Member

This is not a pre-computed hash. get_hash returns ((self.unsafe_data >> 32) ^ self.unsafe_data) as u32.

@SimonSapin
Copy link
Member

unsafe_data is a u64 that contains either:

  • A pointer address, for dynamic atoms
  • 1_u8 in the lowest byte and 7 bytes of zero-padded inline string, for inline atoms
  • 2_u32 in the lower half and an array index (so typically small, with many zero upper bits) in the upper half, for static atoms

In each case, neither unsafe_data itself nor both its halves XORed together is a particularly good hash.

Creating an atom involves hashing the string with SipHasher. For static and dynamic atoms we could keep that hash in respective tables and make get_hash return it. For inline atoms however there is no such table (which is the point).

@emilio
Copy link
Member Author

emilio commented Mar 22, 2017

How would you feel about using the upper bytes for the inline atoms, and storing the hash for the non-inline ones?

@bzbarsky
Copy link

bzbarsky commented Apr 4, 2017

@SimonSapin Do you mind answering @emilio's question? This is blocking servo/servo#16070 which is blocking the investigation in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935 so it would be good to get some motion here...

@SimonSapin
Copy link
Member

Ah, sorry I let this slip through.

Yes, go ahead. For inline atoms that doesn’t sound like a great hash, but maybe that’s ok. (Is even XORing the two halves too costly?)

@emilio
Copy link
Member Author

emilio commented Apr 6, 2017

Ammended that way, and yeah, for inline atoms xoring should be fine.

r? @SimonSapin

@emilio
Copy link
Member Author

emilio commented Apr 6, 2017

Oh, and if the question re. XORing was wrt the current bloom filter. The answer is that right now we're sip-hashing the hash all the time, which is what this PR is trying to avoid.

@SimonSapin
Copy link
Member

Looks good overall. Some details:


Reviewed 5 of 5 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/atom.rs, line 180 at r1 (raw file):

    pub disps: &'static [(u32, u32)],
    pub atoms: &'static [&'static str],
    pub hashes: &'static [u64],

Maybe make this [u32], pre-computing the XOR at compile-time and keeping the table smaller?


src/atom.rs, line 193 at r1 (raw file):

            disps: &[(0, 0)],
            atoms: &[""],
            hashes: &[0],

Maybe make this &[0x20dd4eb33d9590f2]? (This is what SipHasher::new() returns for me on the empty string.) Or 0x3ddddef3 pre-XORed to 32-bit. Though maybe it doesn’t matter…


src/atom.rs, line 271 at r1 (raw file):

    #[inline]
    fn hash<H>(&self, state: &mut H) where H: Hasher {
        state.write_u32(self.get_hash())

I’m not sure about this change. If this is going through another hasher anyway there is not much use re-using the pre-computed hash. And giving 32 bits to that hasher instead of 64 bits seems slightly worse, but maybe that also doesn’t matter?


Comments from Reviewable

…ust-selectors.

This allows us to get rid of the extra hashing overhead every time we check the
bloom filter.
@emilio
Copy link
Member Author

emilio commented Apr 6, 2017

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/atom.rs, line 180 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Maybe make this [u32], pre-computing the XOR at compile-time and keeping the table smaller?

Done.


src/atom.rs, line 193 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Maybe make this &[0x20dd4eb33d9590f2]? (This is what SipHasher::new() returns for me on the empty string.) Or 0x3ddddef3 pre-XORed to 32-bit. Though maybe it doesn’t matter…

Sounds good, I did that. I don't think it matters fwiw, but...


src/atom.rs, line 271 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

I’m not sure about this change. If this is going through another hasher anyway there is not much use re-using the pre-computed hash. And giving 32 bits to that hasher instead of 64 bits seems slightly worse, but maybe that also doesn’t matter?

Ok, reverted. Also it's unclear to me whether it really matters, but sounds fine.


Comments from Reviewable

@SimonSapin
Copy link
Member

Thanks!

@bors-servo r+


Reviewed 2 of 2 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@emilio
Copy link
Member Author

emilio commented Apr 6, 2017

@bors-servo r=SimonSapin

@emilio
Copy link
Member Author

emilio commented Apr 6, 2017

Oh, great, bors is dead?

@SimonSapin
Copy link
Member

@bors-servo r+

Clicked "Synchronize" on http://build.servo.org/homu/queue/string-cache

@bors-servo
Copy link
Contributor

📌 Commit f7ce843 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

⌛ Testing commit f7ce843 with merge a35578c...

bors-servo pushed a commit that referenced this pull request Apr 6, 2017
Expose the precomputed hash using a trait so that I can use it from rust-selectors.

This allows us to get rid of the extra hashing overhead every time we check the
bloom filter.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/183)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-travis
Approved by: SimonSapin
Pushing a35578c to master...

@bors-servo bors-servo merged commit f7ce843 into servo:master Apr 6, 2017
@emilio emilio deleted the precomputed-hash branch April 6, 2017 20:33
@emilio emilio mentioned this pull request Apr 6, 2017
bors-servo pushed a commit to servo/servo that referenced this pull request Apr 7, 2017
selectors: Get rid of hashing overhead using the precomputed hash atoms have.

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/16070)
<!-- Reviewable:end -->
bors-servo pushed a commit to servo/servo that referenced this pull request Apr 7, 2017
selectors: Get rid of hashing overhead using the precomputed hash atoms have.

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/16070)
<!-- Reviewable:end -->
bors-servo pushed a commit to servo/servo that referenced this pull request Apr 8, 2017
selectors: Get rid of hashing overhead using the precomputed hash atoms have.

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/16070)
<!-- Reviewable:end -->
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Apr 8, 2017
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925

--HG--
extra : subtree_source : https%3A//hg.mozilla.org/projects/converted-servo-linear
extra : subtree_revision : a84fb1b7a5ccab19ecfa8bb57499719f881fd1c1
Manishearth pushed a commit to Manishearth/gecko-dev that referenced this pull request Apr 11, 2017
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925
JerryShih pushed a commit to JerryShih/gecko-dev that referenced this pull request Apr 12, 2017
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when @bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Oct 1, 2019
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925

UltraBlame original commit: 61af2690f34f238d0ce4f0b2a808f5d530581f76
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Oct 1, 2019
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925

UltraBlame original commit: 61af2690f34f238d0ce4f0b2a808f5d530581f76
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Oct 1, 2019
…e precomputed hash atoms have (from emilio:selectors-bloom-hash-less); r=bholley

I realized of this when bzbarsky  mentioned the bloom filter in https://bugzilla.mozilla.org/show_bug.cgi?id=1348935#c7.

Right now we hash (the hash) all the time, when we can do better.

This requires a change in string-cache, which is at servo/string-cache#183.

Source-Repo: https://github.com/servo/servo
Source-Revision: a811776df478a009ac6a7785ad13684e3a8e0925

UltraBlame original commit: 61af2690f34f238d0ce4f0b2a808f5d530581f76
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants