New IntoUrl trait #177

Closed
40 commits
918352b
Make it possible to define new encode sets in other crates.
SimonSapin Dec 4, 2015
db9de70
Define encode sets based on another set.
SimonSapin Dec 4, 2015
691aec2
Remove the HTTP_VALUE encode set. It can be defined in another crate.
SimonSapin Dec 4, 2015
d140dc8
Rewrite ALL THE THINGS!
SimonSapin Dec 9, 2015
9edff44
Remove the dependency on uuid.
SimonSapin Feb 8, 2016
576bd2a
Add URL slicing/indexing by component.
SimonSapin Feb 8, 2016
7b11445
Add stubs with partial implementation for the WebIDL API.
SimonSapin Feb 8, 2016
c617ed1
Shorter Cargo.toml syntax.
SimonSapin Feb 8, 2016
22cf104
serde_serialization -> serde
SimonSapin Feb 8, 2016
0cb3f2b
Make rustc-serialize an optional dependency.
SimonSapin Feb 8, 2016
61a8185
Rename *{Start,End} positions to {Before,After}*
SimonSapin Feb 9, 2016
813d270
Replace from_hex() with char::to_digit(16)
SimonSapin Feb 9, 2016
0b5ffb4
Make percent-decoding an iterator.
SimonSapin Feb 9, 2016
244d999
Make percent-encoding an iterator.
SimonSapin Feb 9, 2016
7b33b33
Add percent-encoding convenience wrappers.
SimonSapin Feb 9, 2016
ca9f87d
Update tests from https://github.com/w3c/web-platform-tests/blob/mast…
SimonSapin Feb 10, 2016
7a0e467
Remove Url::has_host
SimonSapin Feb 11, 2016
9a8d394
Remove unused ParseError variants
SimonSapin Feb 12, 2016
903f1d2
Make context a field of Parser.
SimonSapin Feb 11, 2016
a9b4e71
Remove the redundant is_relative field.
SimonSapin Feb 15, 2016
ded48a2
Add Url::domain and Url::ip_address
SimonSapin Feb 15, 2016
d3dba86
Implement ToSocketAddrs
SimonSapin Feb 15, 2016
088c3ed
Remove Url::ip_address for now
SimonSapin Feb 15, 2016
641f940
Add Unicode and ASCII serializations of origins
SimonSapin Feb 16, 2016
946d950
Test WebIdl::origin
SimonSapin Feb 16, 2016
4dff876
Add a fragment setter
SimonSapin Feb 11, 2016
0ae07ed
Add a query setter.
SimonSapin Feb 12, 2016
542feb0
Make Url::parse_with usable. (EncodingOverride is private.)
SimonSapin Feb 19, 2016
dd0436a
Add Origin::is_tuple
SimonSapin Feb 19, 2016
f7e0d7c
More consistent checks for URL with authority or path-only.
SimonSapin Feb 19, 2016
fd16b74
Re-export OpaqueOrigin. It is exposed publicly through Origin::Opaque
SimonSapin Feb 19, 2016
f1bdaa6
Add a scheme setter
SimonSapin Feb 19, 2016
158145f
Add host setters.
SimonSapin Feb 19, 2016
b1b0916
More setters
SimonSapin Feb 23, 2016
47e31ef
Add a path setter
SimonSapin Feb 26, 2016
e7a4dc0
Username and password setters
SimonSapin Feb 26, 2016
5b26c89
More WebIDL implementations.
SimonSapin Feb 26, 2016
bf0f670
Port setters
SimonSapin Mar 1, 2016
b89d7d7
All setters.
SimonSapin Mar 1, 2016
3f9dcd4
New IntoUrl trait
cmbrandenburg Mar 4, 2016

Rewrite ALL THE THINGS!

This changes the data structure for `Url`:

Rather than having multiple `String` (or `Vec<String>`) components,
this uses a single `String` that contains the serialization of an URL
and some indices into it to access components in O(1) time.

This saves on memory allocations and makes serialization and some other
methods very cheap, as they return `&str` rather than building a new `String`.

As a consequence, most of `src/lib.rs` and `src/parser.rs` had to be rewritten.
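
A minimal sketch of the layout the commit message describes, not the crate's actual definition. The name `SketchUrl`, its fields, and the toy parser are hypothetical; the parser only handles `scheme://host/path?query` input and does no validation. It shows why component accessors can return `&str` slices without allocating:

```rust
/// Hypothetical illustration: one owned string plus byte offsets into it.
struct SketchUrl {
    serialization: String,
    scheme_end: usize,          // index of the ':' that follows the scheme
    host_start: usize,
    host_end: usize,
    query_start: Option<usize>, // index of '?', if any
}

impl SketchUrl {
    /// Toy parser (assumption): accepts only `scheme://host[/path][?query]`.
    fn parse(input: &str) -> Option<SketchUrl> {
        let scheme_end = input.find("://")?;
        let host_start = scheme_end + 3;
        let host_end = input[host_start..]
            .find(|c: char| c == '/' || c == '?')
            .map_or(input.len(), |i| host_start + i);
        let query_start = input[host_end..].find('?').map(|i| host_end + i);
        Some(SketchUrl {
            serialization: input.to_owned(),
            scheme_end,
            host_start,
            host_end,
            query_start,
        })
    }

    /// Serialization is free: the string is already stored.
    fn as_str(&self) -> &str {
        &self.serialization
    }

    /// Each accessor is a constant-time slice of the stored string.
    fn scheme(&self) -> &str {
        &self.serialization[..self.scheme_end]
    }

    fn host_str(&self) -> &str {
        &self.serialization[self.host_start..self.host_end]
    }

    fn path(&self) -> &str {
        let end = self.query_start.unwrap_or(self.serialization.len());
        &self.serialization[self.host_end..end]
    }

    fn query(&self) -> Option<&str> {
        self.query_start.map(|i| &self.serialization[i + 1..])
    }
}
```

With this layout, `SketchUrl::parse("https://example.com/a/b?x=1")` allocates once for the whole URL, and `scheme()`, `host_str()`, `path()`, and `query()` borrow from that single allocation.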
SimonSapin committed Mar 3, 2016
commit d140dc82a789ff7e875ad609045904b039216963
@@ -11,7 +11,6 @@ readme = "README.md"
keywords = ["url", "parser"]
license = "MIT/Apache-2.0"

[[test]] name = "format"
[[test]] name = "form_urlencoded"
[[test]] name = "idna"
[[test]] name = "punycode"
@@ -37,6 +37,7 @@ impl EncodingOverride {
}
}

#[inline]
pub fn utf8() -> EncodingOverride {
EncodingOverride { encoding: None }
}
@@ -75,6 +76,7 @@ pub struct EncodingOverride;

#[cfg(not(feature = "query_encoding"))]
impl EncodingOverride {
#[inline]
pub fn utf8() -> EncodingOverride {
EncodingOverride
}

This file was deleted.

@@ -6,39 +6,58 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use std::ascii::AsciiExt;
use std::cmp;
use std::fmt::{self, Formatter};
use std::fmt::{self, Formatter, Write};
use std::net::{Ipv4Addr, Ipv6Addr};
use parser::{ParseResult, ParseError};
use percent_encoding::{from_hex, percent_decode};
use idna;

#[derive(Copy, Clone, Debug)]
#[cfg_attr(feature="heap_size", derive(HeapSizeOf))]
pub enum HostInternal {
None,
Domain,
Ipv4(Ipv4Addr),
Ipv6(Ipv6Addr),
}

/// The host name of an URL.
#[derive(PartialEq, Eq, Clone, Debug, Hash, PartialOrd, Ord)]
#[derive(Clone, Debug, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[cfg_attr(feature="heap_size", derive(HeapSizeOf))]
pub enum Host {
/// A (DNS) domain name.
Domain(String),
/// A IPv4 address, represented by four sequences of up to three ASCII digits.
pub enum Host<S=String> {
/// A DNS domain name, as '.' dot-separated labels.
/// Non-ASCII labels are encoded in punycode per IDNA.
Domain(S),

/// An IPv4 address.
/// `Url::host_str` returns the serialization of this address,
/// as four decimal integers separated by `.` dots.
Ipv4(Ipv4Addr),
/// An IPv6 address, represented inside `[...]` square brackets
/// so that `:` colon characters in the address are not ambiguous
/// with the port number delimiter.

/// An IPv6 address.
/// `Url::host_str` returns the serialization of that address between `[` and `]` brackets,
/// in the format per [RFC 5952 *A Recommendation
/// for IPv6 Address Text Representation*](https://tools.ietf.org/html/rfc5952):
/// lowercase hexadecimal with maximal `::` compression.
Ipv6(Ipv6Addr),
}

impl<'a> Host<&'a str> {
pub fn to_owned(&self) -> Host<String> {
match *self {
Host::Domain(domain) => Host::Domain(domain.to_owned()),
Host::Ipv4(address) => Host::Ipv4(address),
Host::Ipv6(address) => Host::Ipv6(address),
}
}
}
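
For illustration only: the `S=String` default type parameter lets the same enum hold either an owned domain or one borrowed from an existing string. The sketch below mirrors the `to_owned` conversion shown in the diff; `HostSketch` and `borrowed_host` are hypothetical names, and the byte offsets passed to `borrowed_host` are assumed to come from whoever parsed the string.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical stand-in for the generic host enum.
#[derive(Debug, PartialEq)]
enum HostSketch<S = String> {
    Domain(S),
    Ipv4(Ipv4Addr),
    Ipv6(Ipv6Addr),
}

impl<'a> HostSketch<&'a str> {
    /// Copy a borrowed host into an owned one; only `Domain` allocates.
    fn to_owned(&self) -> HostSketch<String> {
        match *self {
            HostSketch::Domain(domain) => HostSketch::Domain(domain.to_owned()),
            HostSketch::Ipv4(address) => HostSketch::Ipv4(address),
            HostSketch::Ipv6(address) => HostSketch::Ipv6(address),
        }
    }
}

/// Hypothetical accessor: view a domain inside an already-serialized URL
/// without allocating. `start..end` is assumed to delimit the host.
fn borrowed_host(serialization: &str, start: usize, end: usize) -> HostSketch<&str> {
    HostSketch::Domain(&serialization[start..end])
}
```

A caller that only inspects the host can keep the `HostSketch<&str>`; one that needs to store it past the lifetime of the source string calls `to_owned()`.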

impl Host {
impl Host<String> {
/// Parse a host: either an IPv6 address in [] square brackets, or a domain.
///
/// Returns `Err` for an empty host, an invalid IPv6 address,
/// or a or invalid non-ASCII domain.
pub fn parse(input: &str) -> ParseResult<Host> {
if input.len() == 0 {
return Err(ParseError::EmptyHost)
}
/// https://url.spec.whatwg.org/#host-parsing
pub fn parse(input: &str) -> Result<Self, ParseError> {
if input.starts_with("[") {
if !input.ends_with("]") {
return Err(ParseError::InvalidIpv6Address)
@@ -47,37 +66,24 @@ impl Host {
}
let decoded = percent_decode(input.as_bytes());
let domain = String::from_utf8_lossy(&decoded);

let domain = match idna::domain_to_ascii(&domain) {
Ok(s) => s,
Err(_) => return Err(ParseError::InvalidDomainCharacter)
};

if domain.find(&[
'\0', '\t', '\n', '\r', ' ', '#', '%', '/', ':', '?', '@', '[', '\\', ']'
][..]).is_some() {
let domain = try!(idna::domain_to_ascii(&domain));
if domain.find(|c| matches!(c,
'\0' | '\t' | '\n' | '\r' | ' ' | '#' | '%' | '/' | ':' | '?' | '@' | '[' | '\\' | ']'
)).is_some() {
return Err(ParseError::InvalidDomainCharacter)
}
match parse_ipv4addr(&domain[..]) {
Ok(Some(ipv4addr)) => Ok(Host::Ipv4(ipv4addr)),
Ok(None) => Ok(Host::Domain(domain.to_ascii_lowercase())),
Err(e) => Err(e),
if let Some(address) = try!(parse_ipv4addr(&domain)) {
Ok(Host::Ipv4(address))
} else {
Ok(Host::Domain(domain.into()))
}
}

/// Serialize the host as a string.
///
/// A domain a returned as-is, an IPv6 address between [] square brackets.
pub fn serialize(&self) -> String {
self.to_string()
}
}


impl fmt::Display for Host {
impl<S: AsRef<str>> fmt::Display for Host<S> {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
match *self {
Host::Domain(ref domain) => domain.fmt(f),
Host::Domain(ref domain) => domain.as_ref().fmt(f),
Host::Ipv4(ref addr) => addr.fmt(f),
Host::Ipv6(ref addr) => {
try!(f.write_str("["));
@@ -88,6 +94,19 @@ impl fmt::Display for Host {
}
}

/// Parse `input` as a host.
/// If successful, write its serialization to `serialization`
/// and return the internal representation for `Url`.
pub fn parse(input: &str, serialization: &mut String) -> ParseResult<HostInternal> {
let host = try!(Host::parse(input));
write!(serialization, "{}", host).unwrap();
match host {
Host::Domain(_) => Ok(HostInternal::Domain),
Host::Ipv4(address) => Ok(HostInternal::Ipv4(address)),
Host::Ipv6(address) => Ok(HostInternal::Ipv6(address)),
}
}
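
A rough sketch of the pattern in the function above: the host's text is appended to a shared serialization buffer, and only a small `Copy` value is returned for the URL to keep. All names here (`HostSketch`, `HostInternalSketch`, `parse_host_into`) are hypothetical. The sketch also assumes the standard library's `Ipv4Addr` and `Ipv6Addr` parsers and `Display` output, which are not the same as the crate's own parsing and `write_ipv6` code, and it skips percent-decoding and IDNA entirely.

```rust
use std::fmt::{self, Write};
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical stand-in for the parsed host.
#[derive(Debug, PartialEq)]
enum HostSketch<S = String> {
    Domain(S),
    Ipv4(Ipv4Addr),
    Ipv6(Ipv6Addr),
}

/// Hypothetical stand-in for the compact value kept alongside the offsets.
#[derive(Debug, PartialEq, Clone, Copy)]
enum HostInternalSketch {
    Domain,
    Ipv4(Ipv4Addr),
    Ipv6(Ipv6Addr),
}

impl<S: AsRef<str>> fmt::Display for HostSketch<S> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            HostSketch::Domain(domain) => f.write_str(domain.as_ref()),
            HostSketch::Ipv4(address) => write!(f, "{}", address),
            HostSketch::Ipv6(address) => write!(f, "[{}]", address),
        }
    }
}

/// Parse `input`, append its serialization to `serialization`,
/// and return the compact representation. Nothing is written on error.
fn parse_host_into(
    input: &str,
    serialization: &mut String,
) -> Result<HostInternalSketch, String> {
    let bracketed = input.strip_prefix('[').and_then(|s| s.strip_suffix(']'));
    let host: HostSketch<&str> = if let Some(inner) = bracketed {
        HostSketch::Ipv6(inner.parse::<Ipv6Addr>().map_err(|e| e.to_string())?)
    } else if let Ok(address) = input.parse::<Ipv4Addr>() {
        HostSketch::Ipv4(address)
    } else {
        HostSketch::Domain(input)
    };
    // Writing to a `String` cannot fail, so `unwrap` is safe here.
    write!(serialization, "{}", host).unwrap();
    Ok(match host {
        HostSketch::Domain(_) => HostInternalSketch::Domain,
        HostSketch::Ipv4(address) => HostInternalSketch::Ipv4(address),
        HostSketch::Ipv6(address) => HostInternalSketch::Ipv6(address),
    })
}
```

The domain text lives only in the buffer, so the returned value needs no `String` of its own; the IP variants carry the parsed address so callers do not have to re-parse the text.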

fn write_ipv6(addr: &Ipv6Addr, f: &mut Formatter) -> fmt::Result {
let segments = addr.segments();
let (compress_start, compress_end) = longest_zero_sequence(&segments);
@@ -165,6 +184,9 @@ fn parse_ipv4number(mut input: &str) -> ParseResult<u32> {
}

fn parse_ipv4addr(input: &str) -> ParseResult<Option<Ipv4Addr>> {
if input.is_empty() {
return Ok(None)
}
let mut parts: Vec<&str> = input.split('.').collect();
if parts.last() == Some(&"") {
parts.pop();
@@ -3,6 +3,7 @@
//! https://url.spec.whatwg.org/#idna

use self::Mapping::*;
use parser::ParseError;
use punycode;
use std::ascii::AsciiExt;
use unicode_normalization::UnicodeNormalization;
@@ -257,6 +258,10 @@ pub enum Error {
TooLongForDns,
}

impl From<Vec<Error>> for ParseError {
fn from(_: Vec<Error>) -> ParseError { ParseError::IdnaError }
}
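
As a hedged illustration of what this `From` impl enables: with the conversion in place, `try!` (or `?` in current Rust) can propagate an IDNA failure out of a function that returns the parser's error type, which is how the `try!(idna::domain_to_ascii(&domain))` line in the host parser can replace the explicit `match`. The types and functions below are hypothetical stand-ins, and `to_ascii_sketch` is a toy that only lowercases and checks length.

```rust
/// Hypothetical stand-in for the IDNA error type.
#[derive(Debug, PartialEq)]
enum IdnaErrorSketch {
    TooLongForDns,
}

/// Hypothetical stand-in for the parser's error type.
#[derive(Debug, PartialEq)]
enum ParseErrorSketch {
    IdnaError,
    InvalidDomainCharacter,
}

/// Collapse the detailed error list into a single parser error.
impl From<Vec<IdnaErrorSketch>> for ParseErrorSketch {
    fn from(_: Vec<IdnaErrorSketch>) -> ParseErrorSketch {
        ParseErrorSketch::IdnaError
    }
}

/// Toy stand-in for the real ToASCII step (assumption: lowercase + length check).
fn to_ascii_sketch(domain: &str) -> Result<String, Vec<IdnaErrorSketch>> {
    if domain.len() > 253 {
        Err(vec![IdnaErrorSketch::TooLongForDns])
    } else {
        Ok(domain.to_ascii_lowercase())
    }
}

fn parse_domain_sketch(input: &str) -> Result<String, ParseErrorSketch> {
    // `?` calls `From::from` on the error, so no explicit `match` is needed.
    let domain = to_ascii_sketch(input)?;
    // Same forbidden-character set as the host parser in the diff.
    if domain.contains(|c: char| matches!(c,
        '\0' | '\t' | '\n' | '\r' | ' ' | '#' | '%' | '/' | ':' | '?' | '@' | '[' | '\\' | ']'
    )) {
        return Err(ParseErrorSketch::InvalidDomainCharacter);
    }
    Ok(domain)
}
```

The trade-off is that the individual IDNA errors are discarded; callers of the parser see only that IDNA processing failed.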

/// http://www.unicode.org/reports/tr46/#ToASCII
pub fn uts46_to_ascii(domain: &str, flags: Uts46Flags) -> Result<String, Vec<Error>> {
let mut errors = Vec::new();