Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] create build tool to check invalid backticks #6088

Merged
merged 8 commits into from Mar 20, 2024
103 changes: 103 additions & 0 deletions build_tools/check_backticks.py
@@ -0,0 +1,103 @@
#!/usr/bin/env python3 -u
# copyright: sktime developers, BSD-3-Clause License (see LICENSE file)

"""Test script to check for invalid use of single-backticks."""

__author__ = ["geetu040"]

import ast
import glob
import re


def find_py_files(folder_path):
"""Find all Python files in a given folder path."""
return glob.glob(f"{folder_path}/**/*.py", recursive=True)


def extract_docstrings(filename):
"""Extract docstrings from a Python file."""
# create abstract syntax tree from the file
with open(filename) as f:
tree = ast.parse(f.read())

# walk through all nodes in the tree
docstrings = {}
for node in ast.walk(tree):
if (
isinstance(node, ast.Expr)
and isinstance(node.value, ast.Constant)
and isinstance(node.value.value, str)
):
# if the node is an expression and
# its value is a constant and
# constant's value is a string
# the node represents a docstring
# See https://docs.python.org/3/library/ast.html#abstract-grammar
docstring = node.value.value
lineno = node.value.lineno
docstrings[lineno] = docstring

return docstrings


def find_invalid_backtick_text(docstring):
"""Find invalid backtick text in a docstring."""
# remove all the code blocks to avoid interference
# they are allowed to use backticks in any way
docstring = re.sub(r"```.*?```", "", docstring, flags=re.DOTALL)

# remove all double-backticks to avoid interference
# we are looking only for invalid single-backtick
docstring = re.sub(r"``.*?``", "", docstring)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two probably can be merged together, if 2 or more backticks are acceptable. Something like r"(`{2,}).+?\1", but it's not tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for the sake of simplicity.

The regex expressions can be merged but they become complex and unreadable when we try to avoid this situation: "regex 2 basically covers all the cases of regex 1 because it is a sub-part of it and thus the complete code-block is not substituted"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understood what you meant, but if backreference of captured group seems complex, you can keep as it is. It was only a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not using the correct flags, so I figured that out


all_backtick_text = re.findall(r"`.*?`", docstring)
# expressions like :math:`d(x, y):= (x-y)^2` are valid cases
valid_backtick_text = re.findall(r":.*?:(`.*?`)", docstring)

# find all the invalid backtick code snippets
invalid_backtick_text = []
for text in all_backtick_text:
if text in valid_backtick_text:
continue
if text in invalid_backtick_text:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for unique texts right? Let's skip this and make invalid_backtick_text a set and use add below.

invalid_backtick_text.append(text)

return invalid_backtick_text


def main():
"""Execute the main function of the script."""
import sktime
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes sktime is already installed. I don't think we can assume that for pre-commit like cases.


results = {}

# list all the python files in the project
py_files = find_py_files(sktime.__path__[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. What about accepting a path as CLI argument, and defaulting to ./sktime by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually improves the time from 2~3 seconds to 1~2 seconds as we are saving time by not importing sktime


for file in py_files:
docstrings = extract_docstrings(file)
results_on_file = {}

for lineno, docstring in docstrings.items():
invalid_backtick_text = find_invalid_backtick_text(docstring)

if len(invalid_backtick_text) > 0:
results_on_file[lineno] = invalid_backtick_text

if len(results_on_file) > 0:
results[file] = results_on_file
yarnabrina marked this conversation as resolved.
Show resolved Hide resolved

# print the lines along with the invalid backticks text
if len(results) > 0:
print(f"Total Files with invalid backticks: {len(results)}") # noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this noqa for? Is flake8 or some other linter complaining about presence of print? In that case, skip a specific error code only instead of skipping all.

for filename, result in results.items():
for lineno, errors in result.items():
print(f"{filename}:{lineno} {' '.join(errors)}") # noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

else:
print("No invalid backticks found") # noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.



if __name__ == "__main__":
main()