
IndexError: list index out of range #7

Closed
stereonov opened this issue Dec 11, 2019 · 12 comments

Comments

@stereonov

stereonov commented Dec 11, 2019

Hi, I tried to use your Python script but I get the following error:

```
$ python3 ./lyricpass.py -a eminem
[+] Looking up artist eminem
[+] Found 1558 songs for artists eminem
Traceback (most recent call last):
  File "./lyricpass.py", line 239, in <module>
    main()
  File "./lyricpass.py", line 221, in main
    raw_words.update(scrape_lyrics(url_list))
  File "./lyricpass.py", line 190, in scrape_lyrics
    lyrics = re.findall(regex, html)[0]
IndexError: list index out of range
```

P.S.: I also tried 'lyricpass-2019-rewrite' but get exactly the same issue.

Is this still working?

Thanks in advance!

@initstring
Owner

Hi @stereonov
Thanks for reporting! Yeah, it was broken. This is the problem with building web scrapers: regexes break whenever the site makes changes.
I think I see what changed. I pushed a new update to master just now. Can you try again?
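The IndexError in the first traceback comes from indexing `re.findall(regex, html)[0]` when the pattern no longer matches the page, so `findall` returns an empty list. A minimal sketch of a guard is below; the `extract_lyrics` helper and the `<pre id="lyrics">` pattern are hypothetical and are not the tool's actual regex. It uses `re.search`, which returns `None` on no match, so a layout change skips the page instead of crashing the whole run.

```python
import re

# Hypothetical pattern for illustration only; the real lyricpass regex differs.
LYRICS_REGEX = re.compile(r'<pre id="lyrics"[^>]*>(.*?)</pre>', re.DOTALL)


def extract_lyrics(html):
    """Return the first lyrics block in html, or None if the pattern no longer matches."""
    match = LYRICS_REGEX.search(html)
    if match is None:
        # The site's markup probably changed: skip this page rather than raise IndexError.
        return None
    return match.group(1)
```

Returning `None` still means the regex has to be updated when the site changes, but it turns a crash into a missing page that the caller can count or report.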

@stereonov
Author

I replaced lyricpass.py with the updated version. This time it scanned almost 100 of the 642 songs, but I still get this:
```
$ python3 ./lyricpass.py -a Taylor+Swift
[+] Looking up artist Taylor+Swift
[+] Found 642 songs for artists Taylor+Swift
Traceback (most recent call last):
  File "/usr/lib/python3.4/urllib/request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1125, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
    self.send(msg)
  File "/usr/lib/python3.4/http/client.py", line 886, in send
    self.connect()
  File "/usr/lib/python3.4/http/client.py", line 1260, in connect
    super().connect()
  File "/usr/lib/python3.4/http/client.py", line 863, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python3.4/socket.py", line 494, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.4/socket.py", line 533, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./lyricpass.py", line 241, in <module>
    main()
  File "./lyricpass.py", line 223, in main
    raw_words.update(scrape_lyrics(url_list))
  File "./lyricpass.py", line 183, in scrape_lyrics
    with urllib.request.urlopen(url) as response:
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1225, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.4/urllib/request.py", line 1184, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
```

@initstring
Owner

It looks like a DNS lookup failed (socket.gaierror: [Errno -2] Name or service not known).
The tool currently has no handling for network errors. Using the requests library instead of urllib would probably help.
I pushed a new branch that uses requests. Can you see if it helps?
https://github.com/initstring/lyricpass/tree/requests-lib
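For comparison, a similar kind of resilience can be sketched with only the standard library. As the traceback above shows, `urllib` wraps a failed name lookup in `urllib.error.URLError` whose `reason` is the underlying `socket.gaierror`. The hypothetical `fetch_with_retries` helper below (not part of lyricpass) detects that, waits, and retries a few times before giving up; the `fetch` parameter exists only so the download function can be swapped out.

```python
import socket
import time
import urllib.error
import urllib.request


def is_dns_failure(err):
    """True if err is, or wraps, a failed name lookup (socket.gaierror)."""
    return isinstance(err, socket.gaierror) or isinstance(
        getattr(err, "reason", None), socket.gaierror
    )


def _urllib_fetch(url):
    """Download url with urllib and return the body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retries(url, fetch=_urllib_fetch, attempts=3, delay=2.0):
    """Return the page text for url, or None once every attempt has failed."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # The server answered (e.g. 404), so retrying will not help.
            print("[!] HTTP {} for {}".format(err.code, url))
            return None
        except OSError as err:
            # URLError, socket.gaierror and socket.timeout are all OSError subclasses.
            kind = "DNS lookup failed" if is_dns_failure(err) else "Network error"
            print("[!] {} for {} (attempt {}/{})".format(kind, url, attempt, attempts))
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer after each failure
    return None
```

Retrying can smooth over a resolver that fails intermittently, but it cannot fix a name that never resolves; in that case every attempt fails and the function returns `None`.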

@initstring
Owner

You might need to run `pip3 install requests` first.

@stereonov
Author

Here is what I did:
```
sudo apt-get install python3-pip
pip3 install -r requirements.txt
Requirement already satisfied (use --upgrade to upgrade): requests in /usr/lib/python3/dist-packages (from -r requirements.txt (line 1))
Cleaning up...
```

```
python3 ./lyricpass.py -a Taylor+Swift
[+] Looking up artist Taylor+Swift
[+] Found 642 songs for artists Taylor+Swift
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 562, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.4/http/client.py", line 1125, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
    self.send(msg)
  File "/usr/lib/python3.4/http/client.py", line 886, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 111, in connect
    timeout=self.timeout)
  File "/usr/lib/python3.4/socket.py", line 494, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.4/socket.py", line 533, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 612, in urlopen
    raise MaxRetryError(self, url, e)
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.lyrics.com', port=443): Max retries exceeded with url: /db-print.php?id=28163737 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./lyricpass.py", line 245, in <module>
    main()
  File "./lyricpass.py", line 227, in main
    raw_words.update(scrape_lyrics(url_list))
  File "./lyricpass.py", line 187, in scrape_lyrics
    response = requests.get(url)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 467, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 570, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.lyrics.com', port=443): Max retries exceeded with url: /db-print.php?id=28163737 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
```

Did I do something wrong?
Thanks again for helping out!

@stereonov
Author

Update: your latest update worked and successfully generated the raw and wordlist files for an artist with 150 songs. When an artist has more songs, like Taylor Swift (642), I get the error from my previous comment.

@initstring
Owner

Sorry you're still having issues. I can confirm the tool works for artists with more songs. Your error is caused by a failed DNS lookup, so the problem lies outside the Python script.
You can try a few things:

  • Create a hosts file entry for www.lyrics.com to skip DNS lookups
  • Wrap the "requests.get" lines in try/except, though this means you will miss the lyrics for any pages that fail
  • Try a different DNS server
  • Try connecting to a different network
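The second suggestion, wrapping the request in try/except, could look roughly like the sketch below. It assumes a hypothetical `scrape_all` function that receives the list of song URLs and a `fetch` callable (for example `lambda u: requests.get(u).text`). Failing pages are recorded and skipped, so a few DNS hiccups cost some lyrics rather than the whole run. Catching `OSError` covers `urllib.error.URLError` and `socket.gaierror`, and also requests' `RequestException` family, which derives from `IOError` (an alias of `OSError` in Python 3).

```python
def scrape_all(urls, fetch):
    """Fetch every URL, skipping (and recording) the ones that fail.

    Returns (pages, failed): pages maps url -> page text, failed lists skipped URLs.
    """
    pages = {}
    failed = []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError as err:
            # URLError, socket.gaierror and requests' RequestException all derive from OSError.
            print("[!] Skipping {}: {}".format(url, err))
            failed.append(url)
    print("[+] Fetched {} pages, skipped {}".format(len(pages), len(failed)))
    return pages, failed
```

Keeping the list of failed URLs makes it easy to see how much was lost and to retry just those pages later.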

@stereonov
Author

Thank you so much for help.
I really don't know how to do any of your suggestions except the last one, and I can't do that because I run it from home and only have one network. I'm a beginner with Ubuntu.
About the first suggestion, do you mean something like this? https://rimuhosting.com/knowledgebase/linux/misc/bypassing-dns-servers-using-etc-hosts

@initstring
Owner

If you are using Ubuntu, you can try this:

  • Edit the '/etc/hosts' file (you will need to do this as root)
  • Add this line anywhere in it, on its own line:

```
52.203.75.1 www.lyrics.com
```

Then try again. Don't forget to remove that line from the hosts file when you are done, as the website's address may change later.
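To check whether the hosts entry is being picked up, one option (a sketch, not part of lyricpass) is to ask the system resolver from Python; on a typical Ubuntu setup it consults `/etc/hosts` before DNS. The hypothetical `resolve` helper below returns the addresses a name resolves to, or an empty list when the lookup fails with the same `socket.gaierror` seen in the tracebacks.

```python
import socket


def resolve(host, port=443):
    """Return the sorted unique IP addresses host resolves to, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror as err:
        print("[!] Lookup for {} failed: {}".format(host, err))
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

With the hosts entry in place, `resolve("www.lyrics.com")` should include `52.203.75.1`; an empty list means the name still is not resolving.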

Hope this helps!

@stereonov
Author

Ok I will, thanks in advance!!!

@stereonov
Author

Yep, it worked! Thank you so much again, you are amazing!

@initstring
Owner

I'm glad I could help!
I recently read a book called "How Linux Works" by Brian Ward. If you're interested in learning more about Linux, I highly recommend it. I've been using Linux for many years but still learned a lot from it.
Good luck with your projects!


2 participants