Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanitization to parseFilename function #215

Closed
wants to merge 5 commits into from
Closed

Add sanitization to parseFilename function #215

wants to merge 5 commits into from

Conversation

ozcodex
Copy link

@ozcodex ozcodex commented Sep 18, 2019

When a form is sent with a file whose name contains special characters, this name is not encoded, the match function is not able to resolve the header value and returns null, causing the file not to be interpreted correctly. By removing the special characters (sanitize the value) the function continues to work correctly.

When a form is sent with a file whose name contains special characters, this name is not encoded, the match function is not able to resolve the header value and returns null, causing the file not to be interpreted correctly. By removing the special characters (sanitize the value) the function continues to work correctly.
@ozcodex
Copy link
Author

ozcodex commented Sep 18, 2019

This also can be solved encoding the filename before runing the match function instead of removing the special characters.

Copy link
Contributor

@dougwilson dougwilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

I'm not sure I understand what you're saying is an issue here. Ideally this module is presenting you with the actual file name as specified by the client. Stripping characters the client supplied is not a typical design goal. Maybe it would help if you can explain what the issue is that you are encountering is and maybe an example of the issue we can run and see?

The PR itself would at minimum need at least one test to validate the behavior to land, of course. But I am just noting this, because I still don't understand the motivation of the change, as on the surface it seems incorrect, so probably start with the explanation of the issue you are having and code to demonstrate the issue to help drive the next step.

@ozcodex
Copy link
Author

ozcodex commented Sep 18, 2019

Yep, of course, here is a brief explanation:

  • I have a file, the name of this file contains a unicode character (in this case U+2028)
  • I have a form, this form is sent by POST using "multipart/form-data" to a backend in express.
  • When i send a regular file, the middleware puts my file in req.files and everything works well.
  • When i send my file with the unicode char, the middleware puts the content of the file (binary) as a input field in req.body

I was tracing the bug for at least 1 week, and discover this is what I discover:

  • The function onParseHeadersEnd (index.js:489) do a comparison against the property "filename" of the object self.destStream to determine if the current data is a file or a field.
  • The reason why the filename is null even if the file have a name is because the method match in the function parseFilename (index:771) is returning null
  • I tryed the regular expression outside (regex101.com), and recognize the filename in the header even with the special character, so the problem is not with the regular expression.
  • I tryed to sanitize (removing special characters) from the header before parse the file name and everyting worked as expected.

for that reason i send this PR 😄

@dougwilson
Copy link
Contributor

Gotcha, thanks for the explanation. Wouldn't this pull request then no longer have the U+2028 character in the file name? I am assuming that due to the change stripping bytes from the input. I would think we would what to preserve the actual filename and just fix the parsing rather than discard those characters.

@ozcodex
Copy link
Author

ozcodex commented Sep 18, 2019

ok, i will play a bit with that idea, and I'll share what happens to you

The 'u' flag indicate to regex to match unicode characters. The 's' flag allows .* (dot-star) regex to match line-breaker characters, like U+2028 or other special characters, allowing the regular expression to work with it.
@ozcodex
Copy link
Author

ozcodex commented Sep 18, 2019

Douglas, that was easier than i think, the special character that is causing troubles to me is "Line Separator", for that reason the match function took the input string as multiple lines, making 0 matches, and returning null. (whithout the 's' flag the regex .* (dot-star) dont match line-breaker characters)

@dougwilson
Copy link
Contributor

Awesome, that is great to hear! So now ideally we'd want to get a test added that would test the case you had so we can make sure this bug doesn't come back up again :) and your site will keep working with this fix.

You're welcome to add a test to the suite, or if you're not sure, you're also welcome to put together a step by step guide on how to test this manually, which I can help turn into an automated test to add 👍

@ozcodex
Copy link
Author

ozcodex commented Sep 18, 2019

Oh, the last changes have conflicts with other tests..... that is a problem, I guest.

The only thing that you have to take in consideration for test this problem, is add a file to the form with this header:
Content-Disposition form-data; name="resume"; filename="2028
.pdf"

the content really dont matters, and, probably this text editor will remove the special character. but the unicode char is between the 2028 and the .pdf

@dougwilson
Copy link
Contributor

Ok, I will try to check out why your PR causes existing tests to fail. For your file name, which web browser is creating the header like that? This way I can put the test in the appropriate place.

@ozcodex
Copy link
Author

ozcodex commented Sep 19, 2019

The browser that is creating that header is Chrome, latest version

@panzy
Copy link

panzy commented Nov 1, 2019

Hi guys, I have a similar problem so I'd like to post what I've found, in the hope it helps to make things more clear.

Suppose I'm uploading a file with a very long name.

multiparty will parse this part as a field, not a file:
(In this case, it's easy to encounter a "maxFieldsSize exceed" error)

--a700da2b-2a31-443b-8a8c-a8430e0dbad1
Content-Disposition: form-data; name="audio"; filename="rtc_cGFuenk=
_UVpsUHQxM2ZmMFF5QnhPeUFBQUY=
_7807752c-ffca-42a9-8bf2-4a3355827ead_270.audio"
Content-Type: application/octet-stream
Content-Length: 129509

...

This part will be a file:

--93755f8e-3b9b-4fa5-92fe-780d1f718f8f
Content-Disposition: form-data; name="video"; filename="rtc_cGFuenk=%0A_UVpsUHQxM2ZmMFF5QnhPeUFBQUY=%0A_7807752c-ffca-42a9-8bf2-4a3355827ead_270-0.dat"
Content-Type: multipart/form-data
Content-Length: 2032906

...

The key difference is that the line feeds in filename are encoded as %0A in the second request. If I understood correctly, RFC 822 allows line feeds in quoted string?

@ozcodex
Copy link
Author

ozcodex commented Apr 13, 2020

Im not longer working on this, sorry guys.

@ozcodex ozcodex closed this Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants