Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

Closed
SmaugPool opened this issue Sep 22, 2022 · 4 comments · Fixed by #797
Closed

Comments

@SmaugPool
Copy link
Contributor

Version:

data-url 0.2.0 & master

Description

The following data URL:

data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:%23000000;' width='100%' height='100%'/></svg>

produces a corrupted result with duplicate fields and unexpected values with DataUrl::process:

<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:#000000;' width='100000000;' width='100%' height='100000000;' width='100%' height='100%'/></svg>

Note that the data URL produces a full window black rectangle on Firefox and keeps the initial SVG intact.

SSCCE

use data_url::DataUrl;    
    
static SVG: &str = r#"data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:%23000000;' width='100%' height='100%'/></svg>"#;    
    
fn main() {    
    println!("{}", SVG);    
    let url = DataUrl::process(SVG).unwrap();    
    let (body, _) = url.decode_to_vec().unwrap();    
    let utf8 = std::str::from_utf8(&body).unwrap();    
    println!("{}", utf8);                                                                                                                                                     
} 
@tmccombs
Copy link
Contributor

I'm not sure if it is the only problem, but you need to escape the spaces and % characters in your data url.

@SmaugPool
Copy link
Contributor Author

I'm not sure if it is the only problem, but you need to escape the spaces and % characters in your data url.

It's a user submitted one based on browser support. It may not be correct, but this does not explain the result as far as I can tell reading the URL standard.

@SmaugPool
Copy link
Contributor Author

SmaugPool commented Oct 2, 2022

Spaces are not the issue here. The problem can be reproduced with the following even simpler example:

use data_url::DataUrl;    
    
static URL: &str = r#"data:text/plain;utf8,100%"#;    
    
fn main() {    
    println!("{}", URL);    
    let url = DataUrl::process(URL).unwrap();    
    let (body, _) = url.decode_to_vec().unwrap();    
    let utf8 = std::str::from_utf8(&body).unwrap();    
    println!("{}", utf8);    
}  

Which outputs:

data:text/plain;utf8,100%
100100%

So the % character is replaced by the whole input string (in the top example by a substring starting before the % character).

According to the URL standard:

https://url.spec.whatwg.org/#percent-encoded-bytes

  1. If byte is not 0x25 (%), then append byte to output.

  2. Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

Here for 100%, the 0x25 (%) character is not followed by 2 bytes in range [0-9A-Fa-f], therefore if I understand correctly the spec it should be included in the output without more processing.

@SmaugPool
Copy link
Contributor Author

SmaugPool commented Oct 2, 2022

The problem comes from the decode_without_base64 function.

When a % character is encountered, the function "Write everything (if anything) "non-special" we’ve accumulated before this special byte":

// Write everything (if anything) "non-special" we’ve accumulated
// before this special byte
if i > slice_start {
write_bytes(&bytes[slice_start..i])?;
}

But slice_start is not updated, and because the characters following % do not match hex digits, it is not updated either in the following else condition:

if let (Some(h), Some(l)) = (h, l) {
// '%' followed by two ASCII hex digits
let one_byte = h as u8 * 0x10 + l as u8;
write_bytes(&[one_byte])?;
slice_start = i + 3;
} else {
// Do nothing. Leave slice_start unchanged.
// The % sign will be part of the next slice.
}

Therefore when at the end, bytes from slice_start to the end are written:

write_bytes(&bytes[slice_start..])?;
Ok(None)
}

we write again all accumulated characters, which duplicates all non-special characters accumulated.

I believe that slice_start should be set to i after writing accumulated "non-special" characters to avoid issues in later conditionals not updating it like the empty else described above:

              if i > slice_start {    
                  write_bytes(&bytes[slice_start..i])?;    
+                 slice_start = i;    
              } 

An alternative would be to update it in the else condition when i > slice_start, but there is the risk of adding new conditionals not updating it later.

PS: Maybe some code could be common with PercentDecode in percent_encoding lib to avoid duplicate (and different) implementations.

SmaugPool added a commit to nftcdn/rust-url that referenced this issue Oct 2, 2022
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/patter matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Oct 2, 2022
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 11, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 11, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 11, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 11, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 12, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
SmaugPool added a commit to nftcdn/rust-url that referenced this issue Jul 12, 2023
When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes servo#795
github-merge-queue bot pushed a commit that referenced this issue Mar 4, 2024
…797)

When writing accumulated "non-special" characters, `slice_start` must be
updated as some later conditionals/pattern matches don't update it like
the case when `%` is not followed by 2 hex digits.

This fixes #795
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants