Add sanitization to parseFilename function #215

ozcodex · 2019-09-18T18:59:09Z

When a form is sent with a file whose name contains special characters, this name is not encoded, the match function is not able to resolve the header value and returns null, causing the file not to be interpreted correctly. By removing the special characters (sanitize the value) the function continues to work correctly.

ozcodex · 2019-09-18T19:09:26Z

This also can be solved encoding the filename before runing the match function instead of removing the special characters.

dougwilson

Thank you for your contribution!

I'm not sure I understand what you're saying is an issue here. Ideally this module is presenting you with the actual file name as specified by the client. Stripping characters the client supplied is not a typical design goal. Maybe it would help if you can explain what the issue is that you are encountering is and maybe an example of the issue we can run and see?

The PR itself would at minimum need at least one test to validate the behavior to land, of course. But I am just noting this, because I still don't understand the motivation of the change, as on the surface it seems incorrect, so probably start with the explanation of the issue you are having and code to demonstrate the issue to help drive the next step.

ozcodex · 2019-09-18T19:54:32Z

Yep, of course, here is a brief explanation:

I have a file, the name of this file contains a unicode character (in this case U+2028)
I have a form, this form is sent by POST using "multipart/form-data" to a backend in express.
When i send a regular file, the middleware puts my file in req.files and everything works well.
When i send my file with the unicode char, the middleware puts the content of the file (binary) as a input field in req.body

I was tracing the bug for at least 1 week, and discover this is what I discover:

The function onParseHeadersEnd (index.js:489) do a comparison against the property "filename" of the object self.destStream to determine if the current data is a file or a field.
The reason why the filename is null even if the file have a name is because the method match in the function parseFilename (index:771) is returning null
I tryed the regular expression outside (regex101.com), and recognize the filename in the header even with the special character, so the problem is not with the regular expression.
I tryed to sanitize (removing special characters) from the header before parse the file name and everyting worked as expected.

for that reason i send this PR 😄

dougwilson · 2019-09-18T19:59:29Z

Gotcha, thanks for the explanation. Wouldn't this pull request then no longer have the U+2028 character in the file name? I am assuming that due to the change stripping bytes from the input. I would think we would what to preserve the actual filename and just fix the parsing rather than discard those characters.

ozcodex · 2019-09-18T20:06:54Z

ok, i will play a bit with that idea, and I'll share what happens to you

The 'u' flag indicate to regex to match unicode characters. The 's' flag allows .* (dot-star) regex to match line-breaker characters, like U+2028 or other special characters, allowing the regular expression to work with it.

ozcodex · 2019-09-18T20:34:51Z

Douglas, that was easier than i think, the special character that is causing troubles to me is "Line Separator", for that reason the match function took the input string as multiple lines, making 0 matches, and returning null. (whithout the 's' flag the regex .* (dot-star) dont match line-breaker characters)

dougwilson · 2019-09-18T20:41:27Z

Awesome, that is great to hear! So now ideally we'd want to get a test added that would test the case you had so we can make sure this bug doesn't come back up again :) and your site will keep working with this fix.

You're welcome to add a test to the suite, or if you're not sure, you're also welcome to put together a step by step guide on how to test this manually, which I can help turn into an automated test to add 👍

ozcodex · 2019-09-18T20:57:31Z

Oh, the last changes have conflicts with other tests..... that is a problem, I guest.

The only thing that you have to take in consideration for test this problem, is add a file to the form with this header:
Content-Disposition form-data; name="resume"; filename="2028 .pdf"

the content really dont matters, and, probably this text editor will remove the special character. but the unicode char is between the 2028 and the .pdf

dougwilson · 2019-09-18T21:19:16Z

Ok, I will try to check out why your PR causes existing tests to fail. For your file name, which web browser is creating the header like that? This way I can put the test in the appropriate place.

ozcodex · 2019-09-19T16:54:23Z

The browser that is creating that header is Chrome, latest version

panzy · 2019-11-01T07:07:33Z

Hi guys, I have a similar problem so I'd like to post what I've found, in the hope it helps to make things more clear.

Suppose I'm uploading a file with a very long name.

multiparty will parse this part as a field, not a file:
(In this case, it's easy to encounter a "maxFieldsSize exceed" error)

--a700da2b-2a31-443b-8a8c-a8430e0dbad1
Content-Disposition: form-data; name="audio"; filename="rtc_cGFuenk=
_UVpsUHQxM2ZmMFF5QnhPeUFBQUY=
_7807752c-ffca-42a9-8bf2-4a3355827ead_270.audio"
Content-Type: application/octet-stream
Content-Length: 129509

...

This part will be a file:

--93755f8e-3b9b-4fa5-92fe-780d1f718f8f
Content-Disposition: form-data; name="video"; filename="rtc_cGFuenk=%0A_UVpsUHQxM2ZmMFF5QnhPeUFBQUY=%0A_7807752c-ffca-42a9-8bf2-4a3355827ead_270-0.dat"
Content-Type: multipart/form-data
Content-Length: 2032906

...

The key difference is that the line feeds in filename are encoded as %0A in the second request. If I understood correctly, RFC 822 allows line feeds in quoted string?

ozcodex · 2020-04-13T14:49:34Z

Im not longer working on this, sorry guys.

ozcodex added 2 commits September 18, 2019 13:57

Change Typo '.'->';' in function parseFilename

cda5c74

Add new var with the value of the sanitized param

992b7c8

dougwilson suggested changes Sep 18, 2019

View reviewed changes

dougwilson added awaiting more info needs tests pr labels Sep 18, 2019

Remove sanitization. Add 's' & 'u' flags to regex

0cc6122

The 'u' flag indicate to regex to match unicode characters. The 's' flag allows .* (dot-star) regex to match line-breaker characters, like U+2028 or other special characters, allowing the regular expression to work with it.

Update index.js

3637dcc

ozcodex closed this Apr 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sanitization to parseFilename function #215

Add sanitization to parseFilename function #215

ozcodex commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson left a comment

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 19, 2019

panzy commented Nov 1, 2019

ozcodex commented Apr 13, 2020

Add sanitization to parseFilename function #215

Add sanitization to parseFilename function #215

Conversation

ozcodex commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson left a comment

Choose a reason for hiding this comment

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 18, 2019

dougwilson commented Sep 18, 2019

ozcodex commented Sep 19, 2019

panzy commented Nov 1, 2019

ozcodex commented Apr 13, 2020