Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

SmaugPool · 2022-09-22T00:52:48Z

Version:

data-url 0.2.0 & master

Description

The following data URL:

data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:%23000000;' width='100%' height='100%'/></svg>

produces a corrupted result with duplicate fields and unexpected values with DataUrl::process:

<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:#000000;' width='100000000;' width='100%' height='100000000;' width='100%' height='100%'/></svg>

Note that the data URL produces a full window black rectangle on Firefox and keeps the initial SVG intact.

SSCCE

use data_url::DataUrl;    
    
static SVG: &str = r#"data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg'><rect style='fill:%23000000;' width='100%' height='100%'/></svg>"#;    
    
fn main() {    
    println!("{}", SVG);    
    let url = DataUrl::process(SVG).unwrap();    
    let (body, _) = url.decode_to_vec().unwrap();    
    let utf8 = std::str::from_utf8(&body).unwrap();    
    println!("{}", utf8);                                                                                                                                                     
}

The text was updated successfully, but these errors were encountered:

tmccombs · 2022-09-22T01:33:12Z

I'm not sure if it is the only problem, but you need to escape the spaces and % characters in your data url.

SmaugPool · 2022-09-22T08:07:42Z

I'm not sure if it is the only problem, but you need to escape the spaces and % characters in your data url.

It's a user submitted one based on browser support. It may not be correct, but this does not explain the result as far as I can tell reading the URL standard.

SmaugPool · 2022-10-02T12:18:22Z

Spaces are not the issue here. The problem can be reproduced with the following even simpler example:

use data_url::DataUrl;    
    
static URL: &str = r#"data:text/plain;utf8,100%"#;    
    
fn main() {    
    println!("{}", URL);    
    let url = DataUrl::process(URL).unwrap();    
    let (body, _) = url.decode_to_vec().unwrap();    
    let utf8 = std::str::from_utf8(&body).unwrap();    
    println!("{}", utf8);    
}

Which outputs:

data:text/plain;utf8,100%
100100%

So the % character is replaced by the whole input string (in the top example by a substring starting before the % character).

According to the URL standard:

https://url.spec.whatwg.org/#percent-encoded-bytes

If byte is not 0x25 (%), then append byte to output.

Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

Here for 100%, the 0x25 (%) character is not followed by 2 bytes in range [0-9A-Fa-f], therefore if I understand correctly the spec it should be included in the output without more processing.

SmaugPool · 2022-10-02T13:05:51Z

The problem comes from the decode_without_base64 function.

When a % character is encountered, the function "Write everything (if anything) "non-special" we’ve accumulated before this special byte":

rust-url/data-url/src/lib.rs

Lines 268 to 272 in 359bc90

    
           // Write everything (if anything) "non-special" we’ve accumulated 
        
           // before this special byte 
        
           if i > slice_start { 
        
               write_bytes(&bytes[slice_start..i])?; 
        
           }

But slice_start is not updated, and because the characters following % do not match hex digits, it is not updated either in the following else condition:

rust-url/data-url/src/lib.rs

Lines 278 to 286 in 359bc90

    
           if let (Some(h), Some(l)) = (h, l) { 
        
               // '%' followed by two ASCII hex digits 
        
               let one_byte = h as u8 * 0x10 + l as u8; 
        
               write_bytes(&[one_byte])?; 
        
               slice_start = i + 3; 
        
           } else { 
        
               // Do nothing. Leave slice_start unchanged. 
        
               // The % sign will be part of the next slice. 
        
           }

Therefore when at the end, bytes from slice_start to the end are written:

rust-url/data-url/src/lib.rs

Lines 300 to 302 in 359bc90

    
               write_bytes(&bytes[slice_start..])?; 
        
               Ok(None) 
        
           }

we write again all accumulated characters, which duplicates all non-special characters accumulated.

I believe that slice_start should be set to i after writing accumulated "non-special" characters to avoid issues in later conditionals not updating it like the empty else described above:

              if i > slice_start {    
                  write_bytes(&bytes[slice_start..i])?;    
+                 slice_start = i;    
              }

An alternative would be to update it in the else condition when i > slice_start, but there is the risk of adding new conditionals not updating it later.

PS: Maybe some code could be common with PercentDecode in percent_encoding lib to avoid duplicate (and different) implementations.

When writing accumulated "non-special" characters, `slice_start` must be updated as some later conditionals/patter matches don't update it like the case when `%` is not followed by 2 hex digits. This fixes servo#795

When writing accumulated "non-special" characters, `slice_start` must be updated as some later conditionals/pattern matches don't update it like the case when `%` is not followed by 2 hex digits. This fixes servo#795

…797) When writing accumulated "non-special" characters, `slice_start` must be updated as some later conditionals/pattern matches don't update it like the case when `%` is not followed by 2 hex digits. This fixes #795

SmaugPool mentioned this issue Oct 2, 2022

Fix non-base64 data URLs with % characters not followed by hex digits #797

Merged

valenting closed this as completed in #797 Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

SmaugPool commented Sep 22, 2022

tmccombs commented Sep 22, 2022

SmaugPool commented Sep 22, 2022

SmaugPool commented Oct 2, 2022 •

edited

Loading

SmaugPool commented Oct 2, 2022 •

edited

Loading

Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

Corrupted data URL output from <rect style='fill:%23000000;' width='100%' height='100%'/> #795

Comments

SmaugPool commented Sep 22, 2022

Version:

Description

SSCCE

tmccombs commented Sep 22, 2022

SmaugPool commented Sep 22, 2022

SmaugPool commented Oct 2, 2022 • edited Loading

SmaugPool commented Oct 2, 2022 • edited Loading

SmaugPool commented Oct 2, 2022 •

edited

Loading

SmaugPool commented Oct 2, 2022 •

edited

Loading