Protection against ReDoS #6163

Merged
merged 4 commits into from Nov 7, 2019

stsewd commented Sep 10, 2019

The regex module is compatible with the re module when the VERSION0 flag is used.
It is also much faster on the pathological pattern below, and it accepts a timeout.

>>> import re
>>> import regex
>>> import timeit
>>> pattern = "(a+)+b"
>>> input = "a" * 25
>>> timeit.timeit(lambda: re.search(pattern, input), number=10)
32.332445038000515
>>> timeit.timeit(lambda: regex.search(pattern, input, flags=regex.VERSION0), number=10)
0.003861578001306043
>>> input = "a" * 10000
>>> regex.search(pattern, input, flags=regex.VERSION0, timeout=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stsewd/.pyenv/versions/readthedocs.org/lib/python3.6/site-packages/regex/regex.py", line 266, in search
    concurrent, partial, timeout)
TimeoutError: regex timed out
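
For anyone who wants to reproduce the blow-up without installing anything, here is a minimal sketch using only the standard library. It times a failing `re` search for the same pattern on progressively longer inputs. The helper name `time_failed_search` and the range of lengths are my own choices for illustration, and the absolute timings will differ by machine and Python version.

```python
import re
import time

# Same pathological pattern as in the session above: nested quantifiers
# followed by a character that never appears in the input.
PATTERN = re.compile(r"(a+)+b")


def time_failed_search(n):
    """Search PATTERN in a string of n 'a' characters; return (match, seconds)."""
    text = "a" * n
    start = time.perf_counter()
    match = PATTERN.search(text)
    elapsed = time.perf_counter() - start
    return match, elapsed


# Lengths are kept small on purpose so the whole loop finishes quickly.
results = {n: time_failed_search(n) for n in range(12, 21)}
for n, (match, elapsed) in results.items():
    print(f"n={n:2d}  match={match}  {elapsed:.4f}s")
```

With a backtracking engine, the time per search typically grows roughly geometrically as the input gets longer, which is why a short input of 25 characters is already enough to stall a request.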

I put the timeout to 15, maybe we can drop it to 5?

humitos commented Oct 2, 2019

This PR is related to #5996.

humitos commented Nov 4, 2019

We decided to ship with regex (#4641 (comment)) so we should merge this PR before that PR gets merged, or merge this PR into the other first.

humitos commented Nov 4, 2019

> I put the timeout to 15, maybe we can drop it to 5?

Even less would be better. Parsing a regex shouldn't take more than 1s.

stsewd force-pushed the stsewd:prevent-redos-attacks branch from 48e187f to 7cc0b47 Nov 6, 2019
stsewd added 3 commits Nov 6, 2019
stsewd commented Nov 6, 2019

Ok, I've decreased the timeout to 1 second. Another alternative is to use a regex engine based on a finite state machine, but I wasn't able to find a library for Python...
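
To illustrate what the finite state machine alternative means, here is a rough sketch that simulates a hand-built automaton for this one pattern. It is not a general regex engine and not a library API: the state numbers, the `TRANSITIONS` table, and the `nfa_search` name are all made up for this example. It relies on the fact that `(a+)+b` accepts the same strings as `a+b`.

```python
import re

# Hand-built automaton for the language of (a+)+b (same language as a+b).
# States: 0 = nothing matched yet, 1 = one or more 'a' seen, 2 = accepted.
TRANSITIONS = {
    (0, "a"): {1},
    (1, "a"): {1},
    (1, "b"): {2},
}
START = 0
ACCEPTING = {2}


def nfa_search(text):
    """Return True if some substring of text is accepted by the automaton.

    The set of active states is advanced once per character and can never
    hold more states than the automaton has, so the running time is linear
    in len(text) and there is no backtracking.
    """
    active = set()
    for ch in text:
        active.add(START)  # a match may begin at any position
        next_active = set()
        for state in active:
            next_active |= TRANSITIONS.get((state, ch), set())
        if next_active & ACCEPTING:
            return True
        active = next_active
    return False


# Cross-check against re on inputs short enough to be safe for backtracking.
for sample in ("aaab", "xab", "ba", ""):
    assert nfa_search(sample) == bool(re.search(r"(a+)+b", sample))

print(nfa_search("a" * 10000))  # no 'b' in the input, so there is no match
```

The trade-off is that automaton-based engines generally cannot support features such as backreferences, so the timeout is the less invasive change for user-supplied patterns.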

stsewd requested a review from readthedocs/core Nov 6, 2019
humitos approved these changes Nov 7, 2019
stsewd merged commit a8611aa into readthedocs:master Nov 7, 2019
2 checks passed
continuous-documentation/read-the-docs Read the Docs build succeeded!
continuous-integration/travis-ci/pr The Travis CI build passed
stsewd deleted the stsewd:prevent-redos-attacks branch Nov 7, 2019