Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaces rust-wapcaplet with a new interning module in Rust, and supports interning on CSS selector matching #1135

Closed

Conversation

ryanhc
Copy link
Contributor

@ryanhc ryanhc commented Oct 25, 2013

Replaces current rust-wapcaplet with a new interning module written in Rust.
Supports interning on tag, id, class, attr_exists, attr_equal, attr_dash_match.

…ash_match using new interning library written in Rust
@metajack
Copy link
Contributor

I only briefly looked at this, but let's hold off on merging until the new rendering lands.

@SimonSapin
Copy link
Member

Also only a brief look, but here are a few comments:

  • Sorry to bikeshed, but Int in a type name really sounds like "integer" to me, making IntString sound very weird. If you want to keep the name short, I believe Gecko calls this "Atom".
  • Is it possible to have static (compile-time) interned strings? intern_string("foo") seems like it’s doing redundant work at runtime.
  • What’s the task-safety story? The interner seems to be using static mut and unsafe blocks.
  • The interner seems to implement a custom hashing algorithm. Is std::hash not appropriate for this? If custom hashing code is kept, it would be nicer to have it in a separate file/module.

@metajack
Copy link
Contributor

I also had the same reaction to IntString, but we can probably come up with some better naming candidates.

@kmcallister
Copy link
Contributor

The interner seems to implement a custom hashing algorithm... If custom hashing code is kept, it would be nicer to have it in a separate file/module.

Ditto for the hash table itself.

What’s the task-safety story? The interner seems to be using static mut and unsafe blocks.

I want to hear more about this too. I think it could be done safely with the right atomic intrinsics, but that will be tricky, and also suggests further that this should become a generic extra::atomic_hashmap or something.

Relatedly I'm not sure about the design of having each script task invoke interning::init(). It would be better if all the public functions ensure the table is initialized, and do so in a way that can't race and create two tables. I think the overhead should be minimal.

Also interning.rs needs to be commented. There's a lot going on (tables of magic numbers, fail!("internal logic error"), etc.) but nothing to guide understanding the code.

@ryanhc
Copy link
Contributor Author

ryanhc commented Oct 28, 2013

Thanks everyone for the comments.

  • I totally agree that I should've had a better name. I will certainly change it.
  • It currently does not support static interned strings. It won't be too difficult to implement it and I will do that.
  • I will add comments.
  • The magic numbers are the seeds for the hash table. These are to be generated randomly.

The interner is using a global variable with unsafe blocks. There are some design issues that I need to decide before we could implement string interning. To implement interning properly, hubbub and css parser need to share the interning table, but with the current design, this is not easy. For that, I would like to have your opinions on how to do this properly.

  • Global variable with locks is one option, but not sure whether this is the right approach.
  • Singleton pattern is another option, but is singleton working properly in Rust?
  • Can we have a "common task" between hubbub and css parser, and put the interning table in there?
  • Do we need to implement a thread-safe hashmap first?
  • other approaches?
    I believe that also solve whether to have interning::init() api or not.

To guarantee the probability of hash collision, I implemented a universal hash for the interning, and customized it for u8 string hash. From my toy experiment, that boosted the hash performance by about 3 times on average.

@sanxiyn
Copy link
Contributor

sanxiyn commented Oct 28, 2013

In my opinion, a new hash table implementation should be a separate patch from string interning.

@SimonSapin
Copy link
Member

@sanxiyn, I think it’s fine to have the hashing algorithm, hash table and string interning in the same patch, especially if the former is only used by the latter. I’d just rather have them in different source files. However, is it really useful to implement a new hashing algorithm rather than using std::hash and std::hashmap?

@ryanhc, I think we can have useful string interning without integrating with hubbub. We can intern stuff when creating the DOM nodes, hubbub doesn’t need to know about it.

In the current patch, the interner is in a single static mut which is effectively a global variable, but I don’t see anything to make it task-safe / thead-safe. Is there any reason this wouldn’t cause race conditions? I believe this is the largest issue with this patch.

@@ -210,7 +210,7 @@ impl Document {
}

pub fn GetElementsByTagName(&self, tag: &DOMString) -> @mut HTMLCollection {
self.createHTMLCollection(|elem| eq_slice(elem.tag_name, null_str_as_empty(tag)))
self.createHTMLCollection(|elem| elem.tag_name.eq(&intern_string(null_str_as_empty(tag))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably should not intern tag. elem.tag_name can be converted to string slice instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that createHTMLCollection loops over the entire tree, I think interning probably makes a lot of sense, but not every time the callback is called.

@SimonSapin SimonSapin mentioned this pull request Oct 30, 2013
@pcwalton
Copy link
Contributor

I think this will need the concurrent hash map, which I have written but is blocked on a Rust upgrade.

@ryanhc ryanhc closed this Feb 13, 2014
ChrisParis pushed a commit to ChrisParis/servo that referenced this pull request Sep 7, 2014
Expand the test for document.embeds/plugins; r=Manishearth
@ryanhc ryanhc deleted the interning_on_css_selector_matching branch August 30, 2017 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants