Links with parens #39

Open
zmoon opened this issue Jul 21, 2022 · 1 comment

Comments


zmoon commented Jul 21, 2022

Parentheses are common in Wikipedia links.

Example: from https://en.wikipedia.org/wiki/Join_(SQL), the link is recognized only as https://en.wikipedia.org/wiki/Join_.
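A minimal reproduction of the truncation, assuming the URL pattern deadlink uses is the one quoted from `_main.py` in the next comment. Because `(` is not in the pattern's final character class, matching stops right before it:

```python
import re

# URL pattern as quoted from deadlink's _main.py in this thread.
url_regex = re.compile(
    r"http(?:s)?:\/\/.(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&/=]*)"
)

# The path part may only contain characters from the last class; "(" is not
# among them, so the match ends at "Join_".
print(url_regex.findall("https://en.wikipedia.org/wiki/Join_(SQL)"))
# ['https://en.wikipedia.org/wiki/Join_']
```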


grndng commented Jan 13, 2023

Maybe as additional info for @nschloe:

The regular expression in `_main.py`

```python
import re

url_regex = re.compile(
    r"http(?:s)?:\/\/.(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&/=]*)"
)
```

could be extended to accept parentheses simply by adding `(` and `)` after the last `=` (i.e. to the final character class). With that change, `_get_urls_from_file()`

```python
def _get_urls_from_file(path):
    try:
        with open(path) as f:
            content = f.read()
    except UnicodeDecodeError:
        # Files that cannot be decoded as text yield no URLs
        return []
    return url_regex.findall(content)
```

correctly finds links with parentheses. However, the links still break somewhere further along when deadlink checks them.
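One side effect of the simple extension is worth checking: a URL written inside parentheses in prose, such as `(see https://example.com/foo)`, then picks up the closing `)`. Below is a minimal sketch of one alternative, not deadlink's code: keep the scheme/host part of the regex unchanged and let the path contain either plain characters or single-level balanced `( ... )` groups. The names `NAIVE_URL_REGEX` and `BALANCED_URL_REGEX` are hypothetical, nested parentheses are not handled, and this only concerns extraction; it does not address the later breakage described above.

```python
import re

# Scheme and host part, copied unchanged from the regex in _main.py.
_HEAD = r"http(?:s)?:\/\/.(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b"

# Path characters from the original final class (no parentheses).
_TAIL_CHARS = r"[-a-zA-Z0-9@:%_\+.~#?&/=]"

# Simple extension: "(" and ")" added to the final character class.
NAIVE_URL_REGEX = re.compile(_HEAD + r"[-a-zA-Z0-9@:%_\+.~#?&/=()]*")

# Hypothetical alternative: the path is a sequence of plain characters or
# balanced "( ... )" groups (one level deep), so an unmatched ")" ends the URL.
BALANCED_URL_REGEX = re.compile(
    _HEAD + r"(?:" + _TAIL_CHARS + r"|\(" + _TAIL_CHARS + r"*\))*"
)

text = "https://en.wikipedia.org/wiki/Join_(SQL) and (see https://example.com/foo) too"

print(NAIVE_URL_REGEX.findall(text))
# ['https://en.wikipedia.org/wiki/Join_(SQL)', 'https://example.com/foo)']
print(BALANCED_URL_REGEX.findall(text))
# ['https://en.wikipedia.org/wiki/Join_(SQL)', 'https://example.com/foo']
```

Requiring balanced pairs is a heuristic, similar in spirit to the GitHub Flavored Markdown autolink rule that drops unmatched trailing `)` characters from a link.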
