New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MNT] create build tool to check invalid backticks #6088
Changes from 2 commits
cf74136
039fe0c
657f7de
583b822
18740e4
2d67ce2
85b9c75
ac9bb24
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
#!/usr/bin/env python3 -u | ||
# copyright: sktime developers, BSD-3-Clause License (see LICENSE file) | ||
|
||
"""Test script to check for invalid use of single-backticks.""" | ||
|
||
__author__ = ["geetu040"] | ||
|
||
import ast | ||
import glob | ||
import re | ||
|
||
|
||
def find_py_files(folder_path): | ||
"""Find all Python files in a given folder path.""" | ||
return glob.glob(f"{folder_path}/**/*.py", recursive=True) | ||
|
||
|
||
def extract_docstrings(filename): | ||
"""Extract docstrings from a Python file.""" | ||
# create abstract syntax tree from the file | ||
with open(filename) as f: | ||
tree = ast.parse(f.read()) | ||
|
||
# walk through all nodes in the tree | ||
docstrings = {} | ||
for node in ast.walk(tree): | ||
if ( | ||
isinstance(node, ast.Expr) | ||
and isinstance(node.value, ast.Constant) | ||
and isinstance(node.value.value, str) | ||
): | ||
# if the node is an expression and | ||
# its value is a constant and | ||
# constant's value is a string | ||
# the node represents a docstring | ||
# See https://docs.python.org/3/library/ast.html#abstract-grammar | ||
docstring = node.value.value | ||
lineno = node.value.lineno | ||
docstrings[lineno] = docstring | ||
|
||
return docstrings | ||
|
||
|
||
def find_invalid_backtick_text(docstring): | ||
"""Find invalid backtick text in a docstring.""" | ||
# remove all the code blocks to avoid interference | ||
# they are allowed to use backticks in any way | ||
docstring = re.sub(r"```.*?```", "", docstring, flags=re.DOTALL) | ||
|
||
# remove all double-backticks to avoid interference | ||
# we are looking only for invalid single-backtick | ||
docstring = re.sub(r"``.*?``", "", docstring) | ||
|
||
all_backtick_text = re.findall(r"`.*?`", docstring) | ||
# expressions like :math:`d(x, y):= (x-y)^2` are valid cases | ||
valid_backtick_text = re.findall(r":.*?:(`.*?`)", docstring) | ||
|
||
# find all the invalid backtick code snippets | ||
invalid_backtick_text = [] | ||
for text in all_backtick_text: | ||
if text in valid_backtick_text: | ||
continue | ||
if text in invalid_backtick_text: | ||
continue | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is just for unique texts right? Let's skip this and make |
||
invalid_backtick_text.append(text) | ||
|
||
return invalid_backtick_text | ||
|
||
|
||
def main(): | ||
"""Execute the main function of the script.""" | ||
import sktime | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This assumes |
||
|
||
results = {} | ||
|
||
# list all the python files in the project | ||
py_files = find_py_files(sktime.__path__[0]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. What about accepting a path as CLI argument, and defaulting to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this actually improves the time from |
||
|
||
for file in py_files: | ||
docstrings = extract_docstrings(file) | ||
results_on_file = {} | ||
|
||
for lineno, docstring in docstrings.items(): | ||
invalid_backtick_text = find_invalid_backtick_text(docstring) | ||
|
||
if len(invalid_backtick_text) > 0: | ||
results_on_file[lineno] = invalid_backtick_text | ||
|
||
if len(results_on_file) > 0: | ||
results[file] = results_on_file | ||
yarnabrina marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# print the lines along with the invalid backticks text | ||
if len(results) > 0: | ||
print(f"Total Files with invalid backticks: {len(results)}") # noqa | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this |
||
for filename, result in results.items(): | ||
for lineno, errors in result.items(): | ||
print(f"{filename}:{lineno} {' '.join(errors)}") # noqa | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. |
||
else: | ||
print("No invalid backticks found") # noqa | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. |
||
|
||
|
||
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two probably can be merged together, if 2 or more backticks are acceptable. Something like
r"(`{2,}).+?\1"
, but it's not tested.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this for the sake of simplicity.
The regex expressions can be merged but they become complex and unreadable when we try to avoid this situation: "regex 2 basically covers all the cases of regex 1 because it is a sub-part of it and thus the complete code-block is not substituted"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understood what you meant, but if backreference of captured group seems complex, you can keep as it is. It was only a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not using the correct
flags
, so I figured that out