Add sanitization to parseFilename function #215
When a form is sent with a file whose name contains special characters, the name is not encoded, so the match function cannot resolve the header value and returns null, which causes the file not to be interpreted correctly. Removing the special characters (sanitizing the value) lets the function continue to work correctly.
This can also be solved by encoding the filename before running the match function instead of removing the special characters.
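The two options mentioned above (removing versus encoding the characters) could look roughly like the sketch below. The helper names and the exact set of characters treated as "special" are assumptions made for illustration; neither helper is part of multiparty.

```javascript
// Hypothetical helpers for illustration; neither is part of multiparty.
// Assumption: "special characters" means the JavaScript line terminators.
const LINE_TERMINATORS = /[\r\n\u2028\u2029]/g;

// Option 1: strip the characters (the filename the client sent is altered).
function stripLineTerminators(value) {
  return value.replace(LINE_TERMINATORS, '');
}

// Option 2: percent-encode them so they survive and can be decoded later.
function encodeLineTerminators(value) {
  return value.replace(LINE_TERMINATORS, (ch) => encodeURIComponent(ch));
}

const rawName = 'report\u2028final.txt';
const stripped = stripLineTerminators(rawName);
const encoded = encodeLineTerminators(rawName);
const restored = decodeURIComponent(encoded);

console.log(stripped);
console.log(encoded);
console.log(restored === rawName);
```

Stripping loses information about the original name, while encoding keeps it recoverable. Note that decoding afterwards is only safe if the filename contains no literal percent signs of its own.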
Thank you for your contribution!
I'm not sure I understand what you're saying the issue is here. Ideally this module presents you with the actual file name as specified by the client. Stripping characters the client supplied is not a typical design goal. Maybe it would help if you could explain the issue you are encountering, ideally with an example we can run and see?
The PR itself would of course need at least one test validating the behavior in order to land. But I am just noting this, because I still don't understand the motivation for the change; on the surface it seems incorrect. So it is probably best to start with an explanation of the issue you are having and code that demonstrates it, to help drive the next step.
Yep, of course, here is a brief explanation:
I was tracing the bug for at least 1 week, and this is what I discovered:
For that reason I sent this PR 😄
Gotcha, thanks for the explanation. Wouldn't the file name then no longer have the U+2028 character in it with this pull request? I am assuming that because the change strips bytes from the input. I would think we would want to preserve the actual filename and just fix the parsing rather than discard those characters.
OK, I will play a bit with that idea and share with you what happens.
The 'u' flag tells the regex engine to treat the pattern and the input as Unicode. The 's' flag (dotAll) allows .* (dot-star) to match line-terminator characters such as U+2028, which the dot otherwise does not match, so the regular expression keeps working when the filename contains them.
Douglas, that was easier than I thought. The special character that is causing me trouble is "Line Separator" (U+2028). Because of it, the match function treated the input string as multiple lines, found 0 matches, and returned null. (Without the 's' flag, .* (dot-star) does not match line-terminator characters.)
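The behavior described above can be reproduced with a small sketch. The pattern below is a simplified stand-in chosen for illustration, not necessarily the exact regex used by parseFilename, and the header value is made up.

```javascript
// Simplified stand-in pattern; an assumption, not necessarily multiparty's regex.
// The header value is a made-up example containing U+2028 (Line Separator).
const header = 'form-data; name="upload"; filename="report\u2028final.txt"';

// Without the 's' flag, '.' refuses to match U+2028, so the match fails.
const withoutDotAll = /\bfilename="(.*?)"($|; )/i;

// With the 's' (dotAll) flag, '.' also matches line terminators.
const withDotAll = /\bfilename="(.*?)"($|; )/isu;

const missed = header.match(withoutDotAll);
const found = header.match(withDotAll);

console.log(missed);                            // no match object
console.log(found && JSON.stringify(found[1])); // captured name, U+2028 preserved
```

With the flag added, the captured group still contains the U+2028 character, so the filename is preserved rather than sanitized.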
Awesome, that is great to hear! So now ideally we'd want to get a test added that would test the case you had so we can make sure this bug doesn't come back up again :) and your site will keep working with this fix. You're welcome to add a test to the suite, or if you're not sure, you're also welcome to put together a step by step guide on how to test this manually, which I can help turn into an automated test to add 👍
Oh, the latest changes conflict with other tests... that is a problem, I guess. The only thing you have to take into consideration to test this problem is to add a file to the form with this header: the content really doesn't matter, and this text editor will probably remove the special character, but the Unicode char is between the
OK, I will try to check out why your PR causes existing tests to fail. For your file name, which web browser is creating the header like that? This way I can put the test in the appropriate place.
The browser that is creating that header is Chrome, latest version.
Hi guys, I have a similar problem so I'd like to post what I've found, in the hope it helps to make things more clear. Suppose I'm uploading a file with a very long name. multiparty will parse this part as a field, not a file:
This part will be a file:
The key difference is that the line feeds in the filename are encoded as %0A in the second request. If I understood correctly, RFC 822 allows line feeds in a quoted string?
I'm no longer working on this, sorry guys.
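A rough sketch of the difference described above follows. It uses made-up header values and a simplified stand-in pattern (an assumption, not necessarily the regex multiparty uses); the decoding step is included only to show that the encoded form still carries the line feed.

```javascript
// Simplified stand-in pattern; an assumption, not necessarily multiparty's regex.
const filenamePattern = /\bfilename="(.*?)"($|; )/i;

// Made-up header values: one with a raw line feed, one with it encoded as %0A.
const rawHeader = 'form-data; name="upload"; filename="long\nname.txt"';
const encodedHeader = 'form-data; name="upload"; filename="long%0Aname.txt"';

// '.' does not match a raw line feed, so no filename is found.
const rawMatch = rawHeader.match(filenamePattern);

// The percent-encoded form contains no line terminator, so the match succeeds.
const encodedMatch = encodedHeader.match(filenamePattern);

// Decoding is shown only to illustrate that the line feed is still recoverable.
const decodedName = encodedMatch ? decodeURIComponent(encodedMatch[1]) : undefined;

console.log(rawMatch);
console.log(encodedMatch && encodedMatch[1]);
console.log(JSON.stringify(decodedName));
```

When no filename is found, a parser that relies on it to distinguish files from fields would plausibly treat the part as a field, which is consistent with the behavior reported above.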