Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a representation for relative URLs. #10

Closed
andreubotella opened this issue Oct 28, 2019 · 8 comments
Closed

Find a representation for relative URLs. #10

andreubotella opened this issue Oct 28, 2019 · 8 comments

Comments

@andreubotella
Copy link

andreubotella commented Oct 28, 2019

While working on hyperlink references as part of issue #7, I realized that we need a way to represent relative URLs in the RST doctree. The servo/rust-url crate we're using can parse relative URL strings with a base, but can only store absolute URLs. Unfortunately, looking through the first pages of crates.io, nobody seems to have tried to reinvent the wheel in that matter.

For the time being, I'm leaving relative URL references unimplemented. In the meantime I'll fork and play around with the url crate to try and make a version that can represent relative URLs.

@flying-sheep
Copy link
Owner

flying-sheep commented Oct 28, 2019

It’s been addressed in servo/rust-url#487. URLs can’t be relative, URIs can. http::uri::Uri exists! Our alternatives:

  1. Each reference knows which path the document it comes from has.
  2. Resolve relative URIs into URLs while parsing. This is a destructive operation.

Question: How do URIs interact with substitutions and .. include ::? Will the source document be relevant for relative path resolution or the document it ends in after transclusion?

@andreubotella
Copy link
Author

andreubotella commented Oct 28, 2019

Thanks for bringing that to my attention, I wasn't aware that that was brought up in servo/rust-url.

About URLs being relative, as a web dev by day I prefer taking my definitions from the WHATWG URL standard, which effectively deprecates the term URI. Sure, that standard doesn't give a way for browsers to represent relative URLs as a data structure, but it does define them as being URLs. But I'm fine with following the URI RFCs.

About the include directive, it seems to simply take the tree parsed by the included file and copy it into the document as is, without bothering to resolve paths. But if the included files in turn have include directives, those are resolved and replaced before the tree is included in the parent document.

Now, I haven't dug into the docutils code for this, but it seems that after validating a URL they simply store it as text in the refuri field, rather than keeping a URI data structure and serializing that on the other end. We could consider doing the same. The URL grammar I've written in the parser as part of my upcoming PR is too broad to identify a valid absolute URL, and I'm using Url::parse in the conversion stage and returning a string rather than a reference if it fails. We could even check for valid relative URLs that way by parsing with a dummy base URL.

@flying-sheep
Copy link
Owner

You’re right, it’s probably easiest to simply create a wrapper type that just validates if we want to construct it with a valid URL: What we need is the ability to represent links. Any link can have a scheme, host, port, path, query and fragment. Neither crate of http::uri and url seems to support this, as what http::uri calls “relative URIs” still contain absolute paths (“relative” here means “same host and scheme”):

use http::Uri;

fn main() {
    println!("{:?}", "/foo/bar?baz".parse::<Uri>());
    // Ok(/foo/bar?baz)
    println!("{:?}", "./foo/bar?baz".parse::<Uri>());
    // Err(InvalidUri(InvalidFormat))
}

@andreubotella
Copy link
Author

andreubotella commented Nov 5, 2019

I put this issue on hold while I tried to figure how to parse standalone hyperlinks (which can't have punctuation at the end) on pest, but now that I'm back to it, I see that the crate::target::Target enum seems to be meant to represent URLs that could be absolute or relative. However, the relative URL variant uses PathBuf, which is meant for storing a filesystem path. That means that the internal representation of the path is whatever the compile target platform uses for representing filesystem names.

Now, Target is used for marking URLs / URIs in the extra attributes of different elements, but it's also used as the source common attribute. In docutils, source is a filesystem path (apparently never an absolute URL) indicating the source file in the root document and in system messages. For that it would make sense to use simply PathBuf.

For every other use, we could keep Target as a type (perhaps renamed) but turn it into a tuple struct that contains a String and validates the URL on construction as you suggest, something like:

#[derive(Debug,PartialEq,Serialize,Clone)]
#[serde(transparent)]
pub struct Target(String);
impl Target {
    pub fn from_absolute_url(input: &str) -> Result<Self, url::ParseError> {
        let parse = Url::parse(input)?;
        Ok(Target(parse.into_string()))
    }
    pub fn from_relative_url(input: &str) -> Result<Self, url::ParseError> {
        unimplemented!()
    }
}

@flying-sheep
Copy link
Owner

flying-sheep commented Nov 5, 2019

Yes, you’re right, I didn’t think about compile location specificity of PathBuf or the fact that it could include a query or fragment part. Why would you like to rename it though?

@andreubotella
Copy link
Author

Well, the name Target suggests any kind of reference target – not just a URL but an internal target reference. Something like GenericUrl or UrlString would be more appropriate.

@flying-sheep
Copy link
Owner

Sure, let’s just call it Url then!

@andreubotella
Copy link
Author

Well, we're already using url::Url, but it seems all the cases where it was used can be replaced by Target, so I guess we can treat the former Target as a wrapper around it.

andreubotella pushed a commit to andreubotella/rust-rst that referenced this issue Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants