Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document whether path! and param (and param2) de-urlencode strings before conversion #242

Open
droundy opened this issue Jun 23, 2019 · 9 comments

Comments

@droundy
Copy link

droundy commented Jun 23, 2019

It looks (from experiment) like path!(String/String) gives the URL-encoded paths, rather than decoding. This feels wrong to me, since FromStr is expecting a canonical representation, which is never URL-encoded. e.g. FromStr for an IpAddr will happily parse "::1" according to its documentation, which could never show up in a segment of an URL.

If giving url-encoded strings is intended, then it would also be convenient to give an example of how to get an actual normal string out of this.

@droundy
Copy link
Author

droundy commented Jun 23, 2019

Here is how I figured out to work around this issue:

let list_of_lists = path!(String).map(|name: String| {
       let name = percent_encoding::percent_decode(name.as_bytes()).decode_utf8().unwrap();
       ...
});

It feels, however, like this should be done in the path! processing code.

droundy added a commit to droundy/lists that referenced this issue Jun 23, 2019
@seanmonstar
Copy link
Owner

I agree on both counts, it should be documented for sure, and probably should be decided automatically. First one is easier, the latter needs some thought to not affect performance too much.

@droundy
Copy link
Author

droundy commented Jun 25, 2019

I wonder for the next breaking change whether you'd want to switch to a new FromPercentEncoding trait, which for the standard types could be equivalent to FromStr for types for which no valid representation contains invalid HTML characters and does percent decoding for types where it could matter.

On the other hand, maybe the performance cost to percent decode is minimal and doesn't matter? It's hard to imagine a path becoming large...

@giacomocariello
Copy link

Bump

@bestia-dev
Copy link

Is this a good approach?

I crate a new type PercentDecoded that guarantees it has been decoded.
It has traits FromStr and ToString.
The standard String cannot guarantee if it has been decoded or not.

in warp I get the parameter from url directly in PercentDecoded type:

warp::path!("name" / PercentDecoded)
	.and_then(|name: PercentDecoded| async move {
		search(name);
		...

the function accepts PercentDecoded to be sure it has been decoded.
It cannot accept a String.

pub async fn search(name: PercentDecoded) {
	let name = name.to_string();
	dbg!(name);
	...
//! encode_decode_mod

// cargo.toml: 
// percent-encoding = "2.1.0"
// anyhow = "1.0.31"

use core::str::FromStr;
use anyhow::{Error};
use std::string::ToString;
use percent_encoding::{percent_decode_str, AsciiSet, CONTROLS};

/// https://url.spec.whatwg.org/#fragment-percent-encode-set
const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'"').add(b'<').add(b'>').add(b'`');

pub fn utf8_percent_encode(s:&str)->String{
    percent_encoding::utf8_percent_encode(s,FRAGMENT).to_string()
}

// region: type with guarantee that it has been decoded

#[derive(Clone, Debug)]
pub struct PercentDecoded {
    /// private inaccessible field
    s: String,
}
impl FromStr for PercentDecoded {
    type Err = Error;
    #[inline]
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = percent_decode_str(s).decode_utf8()?.to_string();
        Ok(PercentDecoded{s})
    }
}
impl ToString for PercentDecoded {
    #[inline]
    fn to_string(&self) -> String {
        self.s.clone()
    }
}

// endregion: type with guarantee that it has been decoded

@seanmonstar
Copy link
Owner

That's not bad at all...

@remram44
Copy link

remram44 commented Jun 3, 2020

You should probably reverse it, with a special type for encoded sequences, and just String once decoded. If you don't have strong compatibility requirements that is probably more sane from a user's point of view (you never really want those % in your strings).

@bestia-dev
Copy link

ok. I got it. I reverse it. Something like this:

let author_route = warp::path!("rust-reviews" / "author" / UrlPartUtf8String)
        .map(|author_id: UrlPartUtf8String | {
            let author_id = author_id.to_string();

Inside the project I can than use always plain strings. Only on the entry point in path! I must be careful to use a special type, that cannot mistakenly go unencoded in other parts of the project.
It is not possible to 100% enforce it, because of FromStr, but it is good enough to show intent :-)

the module:

//! url_part_utf8_string_mod

// cargo.toml:
// percent-encoding = "2.1.0"
// anyhow = "1.0.31"

use anyhow::Error;
use core::str::FromStr;
use percent_encoding::{percent_decode_str, AsciiSet, CONTROLS};
use std::string::ToString;

/// https://url.spec.whatwg.org/#fragment-percent-encode-set
const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'"').add(b'<').add(b'>').add(b'`');

/// the url must be utf 8. Only the 5 control characters are encoded.
/// url has parts or fragments or segments delimited mostly by slash /
/// every part must be encoded/decoded separately, 
/// to maintain the control character slash /
#[derive(Clone, Debug)]
pub struct UrlPartUtf8String {
    /// private inaccessible field - normal string - decoded
    s: String,
}

impl UrlPartUtf8String{
    /// constructor from decoded (normal) string
    pub fn new_from_decoded_string(s: &str) -> Self {
        UrlPartUtf8String { 
            s: s.to_string()
        }
    }
    /// get encoded string
    pub fn get_encoded_string(&self)->String{
        Self::encode_fragment(&self.s)
    }
    /// encode fragment / part - associated fn
    pub fn encode_fragment(s: &str) -> String {
        percent_encoding::utf8_percent_encode(s, FRAGMENT).to_string()
    }
}
/// implementing FromStr because of path! in warp web server router
/// it assumes that the original string is encoded
impl FromStr for UrlPartUtf8String {
    type Err = Error;
    #[inline]
    /// constructor, decodes the string from encoded str. 
    /// It can error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = percent_decode_str(s).decode_utf8()?.to_string();
        Ok(UrlPartUtf8String { 
            s 
        })
    }
}
impl ToString for UrlPartUtf8String {
    #[inline]
    /// returns decoded string (normal string)
    fn to_string(&self) -> String {
        // return
        self.s.clone()
    }
}

@sinking-point
Copy link

No progress on this? I think Warp should decode by default. That would be least surprise, at least to me.

This is an important omission in Warp and I think it should be addressed. I can do the work myself if @seanmonstar agrees to review and merge the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants