Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upServo's async Tokenizer: New! Improved! Now with multithreading! #17565
Conversation
highfive
commented
Jun 29, 2017
highfive
commented
Jun 29, 2017
|
cc @nox |
| } | ||
|
|
||
| #[derive(HeapSizeOf)] | ||
| enum ToSinkMsg { |
This comment has been minimized.
This comment has been minimized.
cynicaldevil
Jun 29, 2017
Author
Contributor
The messages sent from the main thread Tokenizer to the Sink are the results of the synchronous operations we perform. Only two sync operations exist.
This comment has been minimized.
This comment has been minimized.
| HtmlTokenizer(FromHtmlTokenizerMsg) | ||
| } | ||
|
|
||
| let mut send_tendrils = Vec::new(); |
This comment has been minimized.
This comment has been minimized.
cynicaldevil
Jun 29, 2017
Author
Contributor
Instead of making the BufferQueue implement Send, I now extract the StrTendrils from the BufferQueue and convert them to SendTendrils, which are then stored in a mutex. After we get a result from the feed function, we update input based on the contents of the mutex. The mutex is updated after HtmlTokenizer's feed method returns.
|
|
Yes, it requires the new tendril version, which implements clone for Sendtendril. It works, I have compiled this locally using that. |
| @@ -4,12 +4,13 @@ | |||
|
|
|||
| #![allow(unrooted_must_root)] | |||
|
|
|||
| use core::iter::FromIterator; | |||
| use core::str::FromStr; | |||
This comment has been minimized.
This comment has been minimized.
nox
Jul 3, 2017
Member
Aren't these already imported? At the very least, you shouldn't depend on them from core.
| } | ||
| } | ||
| #[derive(HeapSizeOf, JSTraceable)] | ||
| struct Attribute { |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
cynicaldevil
Jul 5, 2017
Author
Contributor
The attributes defined in h5e can't be sent between threads, so I convert them to this type before sending them.
| enum ParseOperation { | ||
| GetTemplateContents(ParseNodeId, ParseNodeId), | ||
| CreateElement(ParseNodeId, QualName, Vec<Attribute>), | ||
| CreateComment(StrTendril, ParseNodeId), | ||
| CreateElement(ParseNodeId, QualName, Vec<Attribute>, u64), |
This comment has been minimized.
This comment has been minimized.
nox
Jul 3, 2017
Member
Please make these have named fields, as I mentioned in the first PR. What even is that u64?
| Pop(ParseNodeId), | ||
| SetQuirksMode( | ||
| #[ignore_heap_size_of = "Defined in style"] | ||
| ServoQuirksMode |
This comment has been minimized.
This comment has been minimized.
| } | ||
|
|
||
| #[derive(JSTraceable, HeapSizeOf)] | ||
| #[derive(HeapSizeOf)] | ||
| enum FromHtmlTokenizerMsg { |
This comment has been minimized.
This comment has been minimized.
| } | ||
|
|
||
| #[derive(HeapSizeOf)] | ||
| enum ToSinkMsg { |
This comment has been minimized.
This comment has been minimized.
| let send_tendrils = Arc::new(Mutex::new(send_tendrils)); | ||
| // Send message to parser thread, asking it to started reading from the shared input. | ||
| // Parser operation messages will be sent to main thread as they are evaluated. | ||
| self.html_tokenizer_sender.send(ToHtmlTokenizerMsg::Feed(send_tendrils.clone())).unwrap(); |
This comment has been minimized.
This comment has been minimized.
nox
Jul 3, 2017
Member
Why do we need to put the tendrils in a Arc<Mutex<T>> if we just send them afterwards?
This comment has been minimized.
This comment has been minimized.
cynicaldevil
Jul 5, 2017
Author
Contributor
I need to access send_tendrils again a few lines below....but this seems confusing to others. I think I'll remove the mutex and just pass it in the message itself.
This comment has been minimized.
This comment has been minimized.
cynicaldevil
Jul 6, 2017
Author
Contributor
Hmm...yeah... passing them via messages is not possible, needs HeapSizeOf impl for Tendril. Mutexes will have to do for now.
This comment has been minimized.
This comment has been minimized.
nox
Jul 18, 2017
Member
HeapSizeOf can be ignored, I think it's less important than avoiding mutices were we can.
| match receiver.recv().expect("Unexpected sink channel panic in Html parser thread") { | ||
| ToHtmlTokenizerMsg::Feed(data) => { | ||
| let mut data = data.lock().unwrap(); | ||
| let input = (*data).iter(). |
This comment has been minimized.
This comment has been minimized.
| ToHtmlTokenizerMsg::Feed(data) => { | ||
| let mut data = data.lock().unwrap(); | ||
| let input = (*data).iter(). | ||
| map(|st| StrTendril::from(st.clone())).collect::<VecDeque<StrTendril>>(); |
This comment has been minimized.
This comment has been minimized.
| is_integration_point: bool, | ||
| } | ||
|
|
||
| impl Default for ParseNodeData { |
This comment has been minimized.
This comment has been minimized.
Use tendril 0.3.1. |
|
Currently, tendril's types are imported via h5e, for example, StrTendril is imported as |
e42747d
to
ef283e5
|
@nox I've updated the PR with the changes. I also added some comments to better explain the workings. Now, if we ask bors to try this PR, it will test it using the synchronous tokenizer. Should I add an additional commit which enables the async tokenizer in preferences, just for testing purposes? |
|
|
ef283e5
to
bb05fd0
|
I rebased the PR, but don't know why the needs-rebase label wasn't removed. |
|
|
bb05fd0
to
5fae497
|
@nox ready for try, then? |
|
@bors-servo try |
Servo's async Tokenizer: New! Improved! Now with multithreading! <!-- Please describe your changes on the following line: --> **Note: Not to be merged until tendril v0.3.1 lands in Servo.** The Tokenizer, defined in the file `async_html.rs` will run on the main thread, while h5e's `Tokenizer`(along with its Sink) lives on the parser thread. Both h5e's `Tokenizer` and `Sink` communicate with the main thread Tokenizer via their own mpsc channels. The Sink keeps sending parser operations to the main thread, to be executed. For some operations, it waits for a message from the main thread before returning. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [ ] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [ ] These changes fix #__ (github issue number if applicable). <!-- Either: --> - [ ] There are tests for these changes OR - [ ] These changes do not require tests because _____ <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17565) <!-- Reviewable:end -->
|
|
5fae497
to
aebd6d4
|
@nox Removed a sender-receiver pair and merged it with another one. Now @bors-servo try |
|
@bors-servo try |
Servo's async Tokenizer: New! Improved! Now with multithreading! <!-- Please describe your changes on the following line: --> The Tokenizer, defined in the file `async_html.rs` will run on the main thread, while h5e's `Tokenizer`(along with its Sink) lives on the parser thread. Both h5e's `Tokenizer` and `Sink` communicate with the main thread Tokenizer via their own mpsc channels. The Sink keeps sending parser operations to the main thread, to be executed. For some operations, it waits for a message from the main thread before returning. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [ ] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [ ] These changes fix #__ (github issue number if applicable). <!-- Either: --> - [ ] There are tests for these changes OR - [ ] These changes do not require tests because _____ <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17565) <!-- Reviewable:end -->
|
|
highfive
commented
Jul 28, 2017
|
b33aef2
to
0750fdc
|
Forgot to add changes before committing, trying again: |
Servo's async Tokenizer: New! Improved! Now with multithreading! <!-- Please describe your changes on the following line: --> The Tokenizer, defined in the file `async_html.rs` will run on the main thread, while h5e's `Tokenizer`(along with its Sink) lives on the parser thread. Both h5e's `Tokenizer` and `Sink` communicate with the main thread Tokenizer via their own mpsc channels. The Sink keeps sending parser operations to the main thread, to be executed. For some operations, it waits for a message from the main thread before returning. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [ ] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [ ] These changes fix #__ (github issue number if applicable). <!-- Either: --> - [ ] There are tests for these changes OR - [ ] These changes do not require tests because _____ <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17565) <!-- Reviewable:end -->
|
|
|
@bors-servo: r=nox,jdm |
|
|
Servo's async Tokenizer: New! Improved! Now with multithreading! <!-- Please describe your changes on the following line: --> The Tokenizer, defined in the file `async_html.rs` will run on the main thread, while h5e's `Tokenizer`(along with its Sink) lives on the parser thread. Both h5e's `Tokenizer` and `Sink` communicate with the main thread Tokenizer via their own mpsc channels. The Sink keeps sending parser operations to the main thread, to be executed. For some operations, it waits for a message from the main thread before returning. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [ ] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [ ] These changes fix #__ (github issue number if applicable). <!-- Either: --> - [ ] There are tests for these changes OR - [ ] These changes do not require tests because _____ <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17565) <!-- Reviewable:end -->
|
|
… a CLOSED TREE Backs out servo#17565
Multiple gecko backouts Backed out changeset a417b9d7712d for vendoring bustage. r=backout on a CLOSED TREE Backs out #17565 <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17910) <!-- Reviewable:end --> --- Backed out changeset c424ad1c5f94 for build failures a=backout CLOSED TREE Backs out #17892
|
This was reverted by Gecko and needs to be merged again. However, to avoid the problem that triggered the revert, we probably need to conditionally add HeapSizeOf in selectors if a "servo" feature is enabled for the crate. |
Run the async HTML Tokenizer on a new thread <!-- Please describe your changes on the following line: --> --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: --> - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors Follow up for #17565 <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/17914) <!-- Reviewable:end -->
cynicaldevil commentedJun 29, 2017
•
edited
The Tokenizer, defined in the file
async_html.rswill run on the main thread, while h5e'sTokenizer(along with its Sink) lives on the parser thread. Both h5e'sTokenizerandSinkcommunicate with the main thread Tokenizer via their own mpsc channels. The Sink keeps sending parser operations to the main thread, to be executed. For some operations, it waits for a message from the main thread before returning../mach build -ddoes not report any errors./mach test-tidydoes not report any errorsThis change is