Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse auth headers using pyparsing instead of regexp #182

Merged
merged 1 commit into from Feb 6, 2021
Merged

Conversation

@temoto
Copy link
Member

@temoto temoto commented Jan 9, 2021

Fixes CPU burn DoS by cubic complexity of whitespace matching in
WWW_AUTH_RELAXED (default) regexp.

@temoto temoto requested review from cdent and dims Jan 9, 2021
@temoto
Copy link
Member Author

@temoto temoto commented Jan 9, 2021

@b-c-ds please take a look too, I can't add you to reviewers.

@temoto temoto force-pushed the www-auth-pyparsing branch from b9a3c22 to 987e995 Jan 9, 2021
@temoto
Copy link
Member Author

@temoto temoto commented Jan 9, 2021

minor update in script/release to read __version__ in source without installing pyparsing

@codecov
Copy link

@codecov codecov bot commented Jan 9, 2021

Codecov Report

Merging #182 (bd9ee25) into master (33090ab) will increase coverage by 0.10%.
The diff coverage is 85.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   76.16%   76.26%   +0.10%     
==========================================
  Files           8       12       +4     
  Lines        2618     2667      +49     
==========================================
+ Hits         1994     2034      +40     
- Misses        624      633       +9     
Impacted Files Coverage Δ
python3/httplib2/__init__.py 83.66% <75.43%> (-0.84%) ⬇️
python2/httplib2/__init__.py 83.20% <78.57%> (-0.39%) ⬇️
python2/httplib2/error.py 88.00% <88.00%> (ø)
python2/httplib2/auth.py 91.66% <91.66%> (ø)
python3/httplib2/auth.py 91.66% <91.66%> (ø)
python3/httplib2/error.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33090ab...bd9ee25. Read the comment docs.

@b-c-ds
Copy link

@b-c-ds b-c-ds commented Jan 9, 2021

@temoto I'd never heard of the pyparsing module before... seems like a great alternative to regex. Looks good to me. 👍

Fixes CPU burn DoS by cubic complexity of whitespace matching in
WWW_AUTH_RELAXED (default) regexp.
@temoto temoto force-pushed the www-auth-pyparsing branch from 987e995 to bd9ee25 Jan 27, 2021
@b-c-ds
Copy link

@b-c-ds b-c-ds commented Feb 4, 2021

Hello, do you know when you are planning to merge this?

@temoto temoto merged commit bd9ee25 into master Feb 6, 2021
4 checks passed
@github-pages github-pages bot temporarily deployed to github-pages Feb 6, 2021 Inactive
@temoto temoto deleted the www-auth-pyparsing branch Feb 6, 2021
@temoto
Copy link
Member Author

@temoto temoto commented Feb 8, 2021

@b-c-ds hey, thanks for reminder, I was waiting for more reviews. This fix is released in 0.19.0.

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyparsing is not exactly fast. I believe you would get better performance using a pp.Word('=') in place of the pp.ZeroOrMore("=") . Note that this will use a regex internally.
Also, pyparsing's whitespace skipping will accept token68's that look like "0 = = =". If that is not permissable, then use leaveWhitespace() on the second term in this expression (either ZeroOrMore or Word).

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'^' operators can also be expensive, since they always check all alternatives, and then select the longest. I think you will get suitable and slightly better behavior if you use (params("params") | token68("token")) instead. (Depends on how often you get a params in the input string).

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyparsing 2.4.7 is Py2 and Py3 compatible. It may be worth splitting the parser out to a separate module that both your P2 and Py3 versions import.

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any ambiguity between token and quoted_string such that you need "^" operator. I think you can safely replace with "|", and gain some parse-time performance.

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad pyparsing was useful to you in resolving this. I had to fix similar regex parsing issues (runaway backtracking) in some of pyparsing's internal regexen, so I can definitely empathize!

temoto
Copy link
Member

temoto commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptmcg thanks. Does it need to be pp.Optional(pp.Word("=")) to accept string with zero length suffix?

temoto
Copy link
Member

temoto commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, improvement is 940us -> 730us (~20%) for params.parseString of a realistic digest auth params. Thank you.

temoto
Copy link
Member

temoto commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only with params first and successfully produces wrong results with token first. Maybe syntax is genuinely ambiguous or my implementation is bad. Improvement is 1220us -> 630us (~50%) for challenge.parseString on synthetic complex line. Thank you.

temoto
Copy link
Member

temoto commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mean improvement on realistic and synthetic complex headers is around 50%. Thank you very much.

temoto
Copy link
Member

temoto commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, we're committed to drop Python2 support altogether and then will use something like pyparsing>=3 to put lesser version constraint on projects.

ptmcg
Copy link
Contributor

ptmcg commented on bd9ee25 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants