Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DocTestFinder.find fails if module contains class objects with special characters #107715

Closed
2 tasks done
gertjanvanzwieten opened this issue Aug 7, 2023 · 7 comments · Fixed by #107716 or #107726
Closed
2 tasks done
Labels
type-bug An unexpected behavior, bug, or error

Comments

@gertjanvanzwieten
Copy link
Contributor

gertjanvanzwieten commented Aug 7, 2023

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker, and am confident this bug has not been reported before

A clear and concise description of the bug

Doctests initialization may fail with re.error in case a module contains (dynamically generated) class objects with special characters in their name. This is caused by the regular expression compiled in doctest.DocTestFinder._find_lineno failing to escape the class name before embedding it. If compilation does succeed, then invalid matches may result from interpreting the class name as a search pattern.

Your environment

  • CPython versions tested on: 3.9.2
  • Operating system and architecture: Debian linux amd64

Linked PRs

@serhiy-storchaka
Copy link
Member

Did this happen in the real world?

Your PR looks good to me, but there may be a better solution of this problem. If the class was dynamically generated, the class defining statement cannot be found in the source code. It is better to not even try to search.

BTW, re.escape() will fail if __name__ is not a string, so there is small chance of regression.

@gertjanvanzwieten
Copy link
Contributor Author

I ran into this for our physical units framework, in which dimensions are represented by classes and quantities by instances thereof, and operations on the quantities result in dynamically created classes representing the resulting dimension. The special characters in the automatically generated names represent the operation: e.g. L*M/T2 for force. While we could use any other naming scheme to avoid triggering this issue, we prefer to keep the current one for readability during live inspection.

It is true that the line number will not be found. This is fine, we just need to get rid of the exception. Avoiding the search altogether would be equally fine if there is a way to identify dynamically created classes. However, we though that embedding an unescaped string in a regular expression is considered bad practice in general, which would suggest that the proposed fix is an improvement regardless. I see the issue with __name__ potentially being not a string, but I am not aware of a mechanism for creating this situation while still testing true for inspect.isclass.

serhiy-storchaka pushed a commit that referenced this issue Aug 7, 2023
This patch escapes the class name before embedding it in the regular expression
for `pat` in `doctest.DocTestFinder._find_lineno`. While class names do not
ordinarily contain special characters, it is possible to encounter these when a
class is created dynamically. Escaping the name will correctly return `None` in
this scenario, rather than potentially matching a different class or raising
`re.error` depending on the symbols used.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 7, 2023
…7716)

This patch escapes the class name before embedding it in the regular expression
for `pat` in `doctest.DocTestFinder._find_lineno`. While class names do not
ordinarily contain special characters, it is possible to encounter these when a
class is created dynamically. Escaping the name will correctly return `None` in
this scenario, rather than potentially matching a different class or raising
`re.error` depending on the symbols used.
(cherry picked from commit 8579327)

Co-authored-by: Gertjan van Zwieten <git@gjvz.nl>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 7, 2023
…7716)

This patch escapes the class name before embedding it in the regular expression
for `pat` in `doctest.DocTestFinder._find_lineno`. While class names do not
ordinarily contain special characters, it is possible to encounter these when a
class is created dynamically. Escaping the name will correctly return `None` in
this scenario, rather than potentially matching a different class or raising
`re.error` depending on the symbols used.
(cherry picked from commit 8579327)

Co-authored-by: Gertjan van Zwieten <git@gjvz.nl>
@serhiy-storchaka
Copy link
Member

I agree that embedding an unescaped and not validated string in a regular expression is not a good practice. It could be validated with str.isidentifier(), but more complex code would need tests. Also the surrounding code cries for refactoring. I cannot require it from you.

Thank you for your contribution @gertjanvanzwieten.

@gertjanvanzwieten
Copy link
Contributor Author

Thanks for merging!

@terryjreedy
Copy link
Member

The danger of closing before the backports are merged is that the backports may fail for weird reasons. The CLA bot failed again on both backports. Gertjan, try reclicking as you did before. I added a note about the flakey bot to the coredev discord.

@gertjanvanzwieten
Copy link
Contributor Author

Done!

serhiy-storchaka pushed a commit that referenced this issue Aug 12, 2023
…GH-107727)

This patch escapes the class name before embedding it in the regular expression
for `pat` in `doctest.DocTestFinder._find_lineno`. While class names do not
ordinarily contain special characters, it is possible to encounter these when a
class is created dynamically. Escaping the name will correctly return `None` in
this scenario, rather than potentially matching a different class or raising
`re.error` depending on the symbols used.
(cherry picked from commit 8579327)

Co-authored-by: Gertjan van Zwieten <git@gjvz.nl>
Yhg1s pushed a commit that referenced this issue Aug 16, 2023
…#107726)

* gh-107715: Escape class name in regular expression (GH-107716)

This patch escapes the class name before embedding it in the regular expression
for `pat` in `doctest.DocTestFinder._find_lineno`. While class names do not
ordinarily contain special characters, it is possible to encounter these when a
class is created dynamically. Escaping the name will correctly return `None` in
this scenario, rather than potentially matching a different class or raising
`re.error` depending on the symbols used.
(cherry picked from commit 8579327)

Co-authored-by: Gertjan van Zwieten <git@gjvz.nl>

* Update 2023-08-07-14-12-07.gh-issue-107715.238r2f.rst

---------

Co-authored-by: Gertjan van Zwieten <git@gjvz.nl>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

Thanks!

@hugovk hugovk closed this as completed Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
4 participants