Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDNA support #119

Merged
merged 12 commits into from Jan 18, 2016
Merged

IDNA support #119

merged 12 commits into from Jan 18, 2016

Conversation

@valenting
Copy link
Collaborator

valenting commented Jul 29, 2015

I wrote a UTS 46 implementation based on http://intertwingly.net/projects/pegurl/idna.js
It needs a fast path for ascii codepoints, but I wanted to get some feedback on it first.

Review on Reviewable

# except according to those terms.


# Run as: python make_idna_table.py idna_table.txt > src/idna_table.rs

This comment has been minimized.

@SimonSapin

SimonSapin Jul 29, 2015

Member

Is idna_table.txt provided by Unicode? Where can it be found?

This comment has been minimized.

@SimonSapin

SimonSapin Jul 30, 2015

Member

Please add this URL in a code comment.

src/idna.rs Outdated
use std::char;

fn idna_mapped(mapping: &'static [u32]) -> Result<String, &'static str> {
let mut ret = "".to_string();

This comment has been minimized.

@SimonSapin

SimonSapin Jul 30, 2015

Member

Please use four space indents, for consistency with other files.

src/host.rs Outdated
if !domain.is_ascii() {
Err(ParseError::NonAsciiDomainsNotSupportedYet)
} else if domain.find(&[
domain = match punycode::encode_str(&domain) {

This comment has been minimized.

@SimonSapin

SimonSapin Jul 30, 2015

Member

Converting to punycode should be part of domain_to_ascii per http://www.unicode.org/reports/tr46/#ToASCII , and should happen on individual dot-separated labels, not the entire domain.

src/host.rs Outdated
Err(ParseError::NonAsciiDomainsNotSupportedYet)
} else if domain.find(&[
domain = match punycode::encode_str(&domain) {
Some(dom) => "xn--".to_string() + &dom,

This comment has been minimized.

@SimonSapin

SimonSapin Jul 30, 2015

Member

I think that adding the xn-- prefix should also be part of domain_to_ascii, but UTS#46 has a bug: whatwg/url#53

@SimonSapin
Copy link
Member

SimonSapin commented Jul 30, 2015

Could you also integrate the tests form http://www.unicode.org/Public/idna/latest/IdnaTest.txt ?


I’m considering rewriting the Python scripts to Rust and running them as Cargo build scripts to avoid having generated files in the repository. However in Gecko it’s not as easy if you’re not using Cargo. It would be a separate .rs file to be compiled for the host (does Gecko do cross-compilation?), run with an OUT_DIR environment variable, and have that run be a dependency of building the url crate, which should be done with the same environment variable.

How painful is it to do in Gecko? If it would make your like more difficult I can stick to Python and generated files in the repository.

@valenting
Copy link
Collaborator Author

valenting commented Jul 30, 2015

It would be a bit harder to have the Gecko build system run the rust files at build time, but at the moment that's not a blocker. We can include the the generated rust-generated files just as we do with the python generated ones at the moment.

@bors-servo
Copy link
Contributor

bors-servo commented Aug 11, 2015

The latest upstream changes (presumably #125) made this pull request unmergeable. Please resolve the merge conflicts.

@bors-servo
Copy link
Contributor

bors-servo commented Sep 3, 2015

The latest upstream changes (presumably #129) made this pull request unmergeable. Please resolve the merge conflicts.

@SimonSapin SimonSapin mentioned this pull request Dec 5, 2015
@SimonSapin
Copy link
Member

SimonSapin commented Dec 5, 2015

See #152

@valenting
Copy link
Collaborator Author

valenting commented Dec 15, 2015

I've implemented the BIDI checks, and now all of the tests pass (except the contextJ tests, which client software isn't required to pass).
For combining_mark I call into unicode_normalization::char::canonical_combining_class - which I fear might not map correctly to General_Category=Mark. The unicode-normalization table generator has this data, but doesn't use it:
https://github.com/unicode-rs/unicode-normalization/blob/master/scripts/unicode.py#L92

@SimonSapin
Copy link
Member

SimonSapin commented Jan 15, 2016

Other than a few nits I think this is ready to land. Thanks again!

@SimonSapin
Copy link
Member

SimonSapin commented Jan 18, 2016

@SimonSapin
Copy link
Member

SimonSapin commented Jan 18, 2016

Not sure what’s happening with homu, but the Travis build for the PR push is green.

SimonSapin added a commit that referenced this pull request Jan 18, 2016
IDNA support
@SimonSapin SimonSapin merged commit d6e1ac5 into servo:master Jan 18, 2016
2 of 3 checks passed
2 of 3 checks passed
code-review/reviewable Review in progress: 0 of 11 files reviewed, all discussions resolved
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.