Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPR-17459 Catering for comma being part of the media/mime type as double quoted #2010

Closed
wants to merge 8 commits into from

Conversation

dimitrisli
Copy link
Contributor

@dimitrisli dimitrisli commented Nov 6, 2018

Adding functionality to:

  1. detect if double quoted comma provided
  2. if not, then flow continues exactly as before (tokenization on commas)
  3. if yes, then calculate the indexes of of commas ignoring the double quoted ones
  4. tokenize on the above comma indexes and proceed as before

@rstoyanchev rstoyanchev self-assigned this Nov 7, 2018
@rstoyanchev
Copy link
Contributor

Parsing media types is a known hot spot and we need to be careful in how it's implemented for efficiency. The proposed solution produces more garbage than it needs to by creating a List<Integer> first, then a List<String>, and finally a String[]. It's also more complex than it needs to be.

Couldn't we simply always loop over each character from the input string along with a boolean for whether we're inside quotes? For every "," that's not inside quotes, take a substring and it to the List.

@dimitrisli
Copy link
Contributor Author

@rstoyanchev thanks for the review. Totally understand. Submitting now an update to this PR that is focused primarily on performance.

@Kuerten
Copy link

Kuerten commented Nov 8, 2018

Thank you for working on the issue.
According to my understanding of https://tools.ietf.org/html/rfc7230#section-3.2.6
there is one even stranger case, a quote inside a parameter which is possible if escaped.
Consider this: foo/bar;param="\"" (without Java Quotes) foo/bar;param=\"\\\"\" (with additional java escaping). While I don't think that is common it is a case that the old code supports so there might be someone using this. I suggest adding a test case for this.

Essentially it means that just toggling on quotes is not enough. Maybe add another boolean for escapeMode:

  • If quote is next and inside a quote but not escaped then turn quote mode off
  • If backslash is next and inside a quote and not in escape mode then toggle escape mode
  • If any char comes and in escape mode then toggle escape mode

@rstoyanchev
Copy link
Contributor

@dimitrisli this looks much better. I just noticed we have a similar top level method in MimeTypeUtils.

@Kuerten I can't find any evidence of the existing code supporting an escaped quote inside a quoted parameter. In fact I don't think it works like that now. Did you actually see that anywhere? In any case, it's worth adding such test cases while we are at it.

…oted

Applying the same logic on the MimeTypeUtils#parseMimeTypes() parsing
from a string a tokenized list of mime types
@dimitrisli
Copy link
Contributor Author

@Kuerten thanks for the detailed description of the additional case.

@rstoyanchev good spotting. I've amended accordingly also adding similar test cases.

Regarding the additional case of escaped quote within a quoted param - sorry I might have got this wrong but it's not clear to me how this can be covered simply with additional tests without also a code change on the parseMediaTypes()/parseMimeTypes() methods? Wouldn't we need to add the logic to ignore escaped quotes while keeping track of the isQuoted boolean?

@rstoyanchev
Copy link
Contributor

rstoyanchev commented Nov 9, 2018

I think the idea is to add the test case and to make the change so the tests will pass.

…ble quoted

Supporting escaped doube quotes within double quotes
@dimitrisli dimitrisli changed the title SPR-17459 Catering for comma being part of the media type as double quoted SPR-17459 Catering for comma being part of the media/mime type as double quoted Nov 11, 2018
…ble quoted

removing unecessary check of i > 0 since isQuoted cannot be true on
first iteration (i==0). Consequently on first iteration this if-branch
cannot be entered guarateeing i > 0. This way the i-1 check cannot be
out of bound.
@dimitrisli
Copy link
Contributor Author

@rstoyanchev @Kuerten I've added support for escaped double quotes within double quotes - please review.

Essentially instead of maintaining another boolean I'm checking for double quotes character while isQuoted is true (=being within double quotes) and in such case if the previous character is the escape character '\'. In this case I'm ignoring the escaped double quote (=not flipping the isQuoted flag).

One detail: the i-1 character check I'm doing looking for the escape character '\' doesn't happen on the first iteration since isQuoted is checked first and it cannot be true on first iteration (i==0).

@Kuerten
Copy link

Kuerten commented Nov 13, 2018

Thanks for adding the cases. I think there is (at least) one more case you could look at: Assume I want to have a literal backslash as the parameter value. Since the backslash is not a tchar I have to put it in quotes. As it is not a qdtext I have to escape it. Essentially I get foo/bar;param="\\" (or as Java: "foo/bar;param=\"\\\\\"".
I believe that this case will not work currently.

…ble quoted

Adding tests for escaped backslash
@dimitrisli
Copy link
Contributor Author

@Kuerten thanks for that. I've added tests to reflect this exact case and it seems to be passing without needing a change. The reason is that the backslash is considered only as a previous-character-check if the current character is a double quote. It would have been susceptible to this case if there was a boolean to toggle and keep track of the escaped backslash I believe.

Please review

@rstoyanchev
Copy link
Contributor

I've processed this PR with some further modifications, notably creating a MimeTypeUtils#tokenize() method that can be re-used for MimeType and MediaType, as well as changing the algorithm slightly to skip over escaped characters rather than look back.

@dimitrisli thank you for taking the time to submit this contribution and move the ticket forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants