Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[idna] Update data to Unicode 10.0 and fix logic #351

Merged
merged 7 commits into from Jun 21, 2017
Merged

Conversation

@behnam
Copy link
Member

behnam commented May 29, 2017

  • Change the behavior of ToASCII step to match the spec and prevent failures on some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt on #337 and #171)

  • Update IdnaTest.txt file to UCD 10.0 and fix Validation Rules, specially Bidi Rules, for the tests to pass.

  • Add TODO marks for new flags introduced in Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)

  • Add integration test for rust-url crate for the new behavior.

Fix #166


This change is Reviewable

@behnam
Copy link
Member Author

behnam commented May 29, 2017

Filed unicode-rs/unicode-normalization#16 for is_combining_mark() failure for U+11C3A.

@behnam behnam force-pushed the behnam:ucd10 branch 5 times, most recently from a3ea472 to a48ab35 May 29, 2017
@behnam behnam requested a review from SimonSapin May 29, 2017
@behnam behnam self-assigned this May 29, 2017
@bors-servo
Copy link
Contributor

bors-servo commented Jun 13, 2017

The latest upstream changes (presumably #360) made this pull request unmergeable. Please resolve the merge conflicts.

@behnam behnam force-pushed the behnam:ucd10 branch from a48ab35 to e1a2736 Jun 15, 2017
@behnam
Copy link
Member Author

behnam commented Jun 16, 2017

Now that unicode-normalization is fixed, I dropped the workaround the workaround in latest update: https://github.com/behnam/rust-url/tree/ucd10

@behnam behnam force-pushed the behnam:ucd10 branch 3 times, most recently to d0eb5bf Jun 16, 2017
@bors-servo
Copy link
Contributor

bors-servo commented Jun 17, 2017

The latest upstream changes (presumably #364) made this pull request unmergeable. Please resolve the merge conflicts.

@SimonSapin
Copy link
Member

SimonSapin commented Jun 20, 2017

Sorry for the delays and repeated conflicts. If you’d prefer I can take over and do the minor changes I requested below.


Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Cargo.toml, line 5 at r5 (raw file):

name = "url"
# When updating version, also modify html_root_url in the lib.rs
version = "1.5.1"

This change is not necessary, please remove it. Since the new version of idna is semver-compatible, end-users can update to it independently of url.


Cargo.toml, line 45 at r5 (raw file):

encoding = {version = "0.2", optional = true}
heapsize = {version = ">=0.1.1, <0.5", optional = true}
idna = { version = "0.1.0", path = "./idna" }

This is also unnecessary. "0.1.0" means ">=0.1.0,<0.2.0", same as "0.1".


idna/src/uts46.rs, line 416 at r4 (raw file):

#[cfg(test)]
mod tests {

Cargo.toml specifies [lib] test = false to avoid compiling the library code twice, so these tests are never run. Please move them to a new idna/tests/unit.rs file and add a [[test]] name = "unit" section in Cargo.toml.


Comments from Reviewable

behnam added 4 commits May 12, 2017
A retry of #171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS #46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS #46, and to some level, anticipated.

As mentioned in #166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix #166
* The code was disabled to allow tests pass. Now that `IdnaTest.txt` is
fixed for this failure, we can re-enable the code.
* As the first paragraph of The Bidi Rules section explains, the rules
need to be ignored if there are no Bidi labels present in the domain
name. So, add `is_bidi_domain` evaluation to `processing()`, and pass it
down to `passes_bidi()` to act on.

* Add unit tests for the bidi rules, making it faster and easier to
maintain the feature.
@behnam behnam force-pushed the behnam:ucd10 branch from d0eb5bf to a8a997a Jun 20, 2017
@behnam
Copy link
Member Author

behnam commented Jun 20, 2017

Thanks for the review, @SimonSapin. I've addressed all the comments and rebased, so should be good to land.


Review status: 1 of 5 files reviewed at latest revision, 3 unresolved discussions.


idna/src/uts46.rs, line 416 at r4 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Cargo.toml specifies [lib] test = false to avoid compiling the library code twice, so these tests are never run. Please move them to a new idna/tests/unit.rs file and add a [[test]] name = "unit" section in Cargo.toml.

Done.


Cargo.toml, line 5 at r5 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This change is not necessary, please remove it. Since the new version of idna is semver-compatible, end-users can update to it independently of url.

Done.


Cargo.toml, line 45 at r5 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is also unnecessary. "0.1.0" means ">=0.1.0,<0.2.0", same as "0.1".

Done.


Comments from Reviewable

@SimonSapin
Copy link
Member

SimonSapin commented Jun 21, 2017

Reviewed 1 of 2 files at r7, 2 of 3 files at r10, 1 of 1 files at r11.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


idna/src/uts46.rs, line 346 at r10 (raw file):

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub enum Error {

Can we not make this public? That would mean that adding a new variant (like you did in another commit of this PR) would be a breaking change. It looks like unit tests don’t need it.


Comments from Reviewable

@behnam behnam force-pushed the behnam:ucd10 branch from a8a997a to f220486 Jun 21, 2017
@behnam
Copy link
Member Author

behnam commented Jun 21, 2017

Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.


idna/src/uts46.rs, line 346 at r10 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Can we not make this public? That would mean that adding a new variant (like you did in another commit of this PR) would be a breaking change. It looks like unit tests don’t need it.

Right, there was no need for it anymore. Reverted it.


Comments from Reviewable

@behnam behnam force-pushed the behnam:ucd10 branch from f220486 to 3c7da07 Jun 21, 2017
@behnam
Copy link
Member Author

behnam commented Jun 21, 2017

Also updated the IdnaTest file to final release, which came out yesterday. So, there's nothing left from the draft versions and we're all up to date.

@behnam
Copy link
Member Author

behnam commented Jun 21, 2017

Btw, updating uts46_mapping_table.rs to Unicode 10.0.0 should be done in a separate diff. Let me know if you like me to submit a PR.

@SimonSapin
Copy link
Member

SimonSapin commented Jun 21, 2017

Looks great, thanks!

@bors-servo r+

Btw, updating uts46_mapping_table.rs to Unicode 10.0.0 should be done in a separate diff. Let me know if you like me to submit a PR.

Please do. (I’m a bit surprise the new tests pass without the new data. Maybe the tests don’t cover the normative differences.)


Reviewed 1 of 4 files at r6, 1 of 2 files at r12, 1 of 1 files at r13, 1 of 1 files at r14.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@bors-servo
Copy link
Contributor

bors-servo commented Jun 21, 2017

📌 Commit 3c7da07 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

bors-servo commented Jun 21, 2017

Testing commit 3c7da07 with merge 40aa05a...

bors-servo added a commit that referenced this pull request Jun 21, 2017
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of ToASCII step to match the spec and prevent failures on some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt on #337 and #171)

* Update `IdnaTest.txt` file to UCD 10.0 and fix Validation Rules, specially Bidi Rules, for the tests to pass.

* Add TODO marks for new flags introduced in Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)

* Add integration test for `rust-url` crate for the new behavior.

Fix #166

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/351)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Jun 21, 2017

☀️ Test successful - status-travis
Approved by: SimonSapin
Pushing 40aa05a to master...

@bors-servo bors-servo merged commit 3c7da07 into servo:master Jun 21, 2017
4 checks passed
4 checks passed
code-review/reviewable 5 files reviewed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.