Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra Newline Added to Binary File Upload #2761

Closed
mohamedlmoutaouakil opened this issue Aug 10, 2023 · 3 comments · Fixed by #2763
Closed

Extra Newline Added to Binary File Upload #2761

mohamedlmoutaouakil opened this issue Aug 10, 2023 · 3 comments · Fixed by #2763

Comments

@mohamedlmoutaouakil
Copy link

Problem:

After uploading a specific binary file, an unexpected newline character (\n) is being added to the beginning of the file. This behavior is traced back to Werkzeug's multipart request body parsing during POST requests.

Steps to Reproduce:

  1. Create a binary file named one_newline_bigger_than_64k with the following content:
with open("one_newline_bigger_than_64k", "wb") as file:
    bytes_to_write = b'X' + b'\n' + 64 * 1024 * b'c'
    file.write(bytes_to_write)
  1. Upload the one_newline_bigger_than_64k file.
  2. Compare the binary content of the original and uploaded files.

Root Cause:

This issue arises due to Werkzeug's _parse_data method. Specifically, when a newline (\n or \r) is present at the second position (index 1) of the file's content and no additional newlines are within the first 64KB of the request body, an extra newline is inadvertently introduced during parsing.

Code Flow:

  1. Werkzeug's next_event method invokes _parse_data.
  2. In _parse_data, a data_start value is set based on the file's content type.
  3. If the boundary indicator is absent in the buffer, the last_newline method is employed to locate the latest newline (\n or \r) character in the file's content. However, this method operates on a modified data slice that starts at data_start.
  4. Since data is modified, any index from last_newline needs data_start adjustment.
  5. Incorrectly adjusted indices lead to an empty byte segment when slicing data[data_start:data_end], resulting in an empty data segment.
  6. This leads to an unintended addition of a newline character in the file's content.

Solution:

To resolve this, a correction in the _parse_data method is needed. The index returned by last_newline should be adjusted by adding the data_start before slicing data.

Proposed Fix:

Update _parse_data as follows:

# Inside _parse_data method
if self.buffer.find(b"--" + self.boundary) == -1:
    # No complete boundary in buffer, but a partial boundary might exist. 
    # Find the earliest newline (nl or cr) and return data up to that.
    last_newline_idx = self.last_newline(data[data_start:])
    data_end = del_index = last_newline_idx + data_start  # Adjusted index
    more_data = True

Environment:

Python: 3.10.12
Werkzeug: 2.3.6

Resolution:

A pull request will be created to implement this fix, preventing unintended newline addition in binary file uploads. This adjustment ensures proper index handling, preventing unintended manipulation of file content during parsing.

mohamedlmoutaouakil pushed a commit to mohamedlmoutaouakil/werkzeug that referenced this issue Aug 10, 2023
mohamedlmoutaouakil pushed a commit to mohamedlmoutaouakil/werkzeug that referenced this issue Aug 11, 2023
mohamedlmoutaouakil pushed a commit to mohamedlmoutaouakil/werkzeug that referenced this issue Aug 11, 2023
mohamedlmoutaouakil pushed a commit to mohamedlmoutaouakil/werkzeug that referenced this issue Aug 11, 2023
@raulpuric

This comment was marked as off-topic.

@davidism

This comment was marked as off-topic.

@pgjones
Copy link
Member

pgjones commented Aug 12, 2023

Closed by #2763

@pgjones pgjones closed this as completed Aug 12, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants