[Bug] IPv4Address object has no attribute split #41

Closed
supernothing opened this issue May 16, 2019 · 5 comments

@supernothing
Contributor

Issue

I ran into this issue on v0.10 while running the extractor over a large sample of strings. I believe it is triggered by a domain-like string nested in the URL path.

'IPv4Address' object has no attribute 'split'
Traceback (most recent call last):
...
  File "/usr/local/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 568, in find_urls
    return list(urls)
  File "/usr/local/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 542, in gen_urls
    tmp_url = self._complete_url(text, offset + tld_pos, tld)
  File "/usr/local/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 351, in _complete_url
    if not self._is_domain_valid(complete_url, tld):
  File "/usr/local/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 438, in _is_domain_valid
    host_parts = host.split('.')
AttributeError: 'IPv4Address' object has no attribute 'split'

Steps to reproduce

Below is a minimal test case:

import urlextract
urlextract.URLExtract().find_urls("http://0.0.0.0/a.io")
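
The failure mode in the traceback can be reproduced without urlextract. The following is a minimal sketch, not the library's actual code: it assumes the host has already been parsed into an `ipaddress.IPv4Address` object (as the error message indicates) and shows why calling `.split('.')` on it fails, along with one possible guard. `split_host_labels` is a hypothetical helper name used only for illustration.

```python
import ipaddress


def split_host_labels(host):
    """Return the dot-separated labels of a host, or None for IP addresses.

    Hypothetical helper: `host` may be a str or an ipaddress object.
    """
    if isinstance(host, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        # IP address objects have no .split(); there are no domain labels to check.
        return None
    return host.split('.')


host = ipaddress.ip_address("0.0.0.0")  # an IPv4Address, not a str

try:
    host.split('.')  # the same call that fails in _is_domain_valid
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

print(split_host_labels(host))    # IP address host: no labels
print(split_host_labels("a.io"))  # ordinary domain: list of labels
```

Checking the type before splitting is only one option; converting the host with `str(host)` first would also avoid the exception, at the cost of treating the IP address octets as domain labels.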
@lipoja
Owner

lipoja commented May 29, 2019

Thank you for the detailed report. I will have a look at it.

@lipoja
Owner

lipoja commented Aug 3, 2019

This issue is fixed in version 0.13.0.

@lipoja lipoja self-assigned this Aug 4, 2019
@warezers

warezers commented Aug 8, 2019

I updated to version 0.13.0, but I still get the same error:

from urlextract import URLExtract
extractor = URLExtract()
urls_unique = set()
files = "/Users/roc/Desktop/WEB_000.w3c"
with open(files, 'r', encoding='utf-8') as f:
    for line in f:
        urls = extractor.find_urls(line, only_unique=True)
        urls_unique |= set(urls)
print(urls_unique)
Traceback (most recent call last):
  File "/Users/roc/PycharmProjects/implace/testrun.py", line 657, in <module>
    for url in extractor.find_urls(line):
  File "/Users/roc/Library/Python/3.6/lib/python/site-packages/urlextract/urlextract_core.py", line 636, in find_urls
    return list(urls)
  File "/Users/roc/Library/Python/3.6/lib/python/site-packages/urlextract/urlextract_core.py", line 598, in gen_urls
    tmp_url = self._complete_url(text, offset + tld_pos, tld)
  File "/Users/roc/Library/Python/3.6/lib/python/site-packages/urlextract/urlextract_core.py", line 376, in _complete_url
    if not self._is_domain_valid(complete_url, tld):
  File "/Users/roc/Library/Python/3.6/lib/python/site-packages/urlextract/urlextract_core.py", line 483, in _is_domain_valid
    host_parts = host.split('.')
AttributeError: 'IPv4Address' object has no attribute 'split'

@lipoja
Owner

lipoja commented Aug 8, 2019

@warezers Could you double-check that you are running version 0.13.0? The traceback shows that it failed on line 483 at host.split('.'), but in the latest version line 483 is empty.
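
One way to confirm which version the interpreter actually sees is to query the installed package metadata. This is a rough sketch using the standard library; `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later. On the Python 3.6 shown in the traceback, running `pip show urlextract` with the same interpreter gives the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version visible to this interpreter, or None if not installed.
print(installed_version("urlextract"))
```

If several Python installations are present, the check should be run with the same interpreter that produced the traceback.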

@warezers

@lipoja Yes, the newly updated version is working.
Thank you.

@lipoja lipoja closed this as completed Aug 19, 2019

3 participants