comma extracted at the end if url ends with comma #123

Closed
amoldavsky opened this issue Mar 16, 2022 · 3 comments

Comments

@amoldavsky

This should not be the case:

>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("https://www.formpl.us/form/1653896001, work independently from home")
['https://www.formpl.us/form/1653896001,']

@controldev

The same happens with dots (i.e. '.'), which is a relatively frequent error, for example when sentences end with links.

@lipoja
Owner

lipoja commented May 17, 2022

@amoldavsky @controldev Hello. Thank you for reporting this issue. I agree that this is not ideal. I would like to ask for your help in the form of a discussion, because I do not see an easy general solution to this problem. My suggestion would be postprocessing.

The user (in this case, you) is the one using this tool and should know what kind of text is being processed. The user can therefore clean up the extracted URLs by removing the extra comma wherever one is expected. This can be done with a simple .rstrip(',').
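A minimal sketch of that postprocessing step, assuming trailing commas and dots are never a meaningful part of the URLs in your text. The helper name `strip_trailing_punctuation` and its `chars` parameter are made up for this illustration and are not part of urlextract; the input list is the result reported above, hard-coded so the snippet runs on its own.

```python
def strip_trailing_punctuation(urls, chars=",."):
    """Return the URLs with any trailing characters from `chars` removed.

    Hypothetical helper, not part of urlextract. Only use it when you know
    that none of your URLs legitimately end with one of these characters.
    """
    return [url.rstrip(chars) for url in urls]


# Result reported in this issue for the example sentence.
found = ["https://www.formpl.us/form/1653896001,"]

cleaned = strip_trailing_punctuation(found)
print(cleaned)  # ['https://www.formpl.us/form/1653896001']
```

In real use, `found` would be whatever `extractor.find_urls(text)` returns. Note that `str.rstrip` removes every trailing character in the given set, so a URL ending in `.,.` loses all three characters.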

Looking at this issue in general, I cannot simply remove every dot or comma at the end of a URL, because it might be part of the URL.

However, I am open to discussion. Maybe you have a solution in mind that we can agree on and implement.

@lipoja
Owner

lipoja commented Feb 27, 2024

Closing this issue since there has been no further discussion and a simple solution is recommended to the user.

@lipoja lipoja closed this as completed Feb 27, 2024