Feedparser seems to occasionally hang and has no timeout #76

Closed
peterashwell opened this issue Jul 10, 2016 · 5 comments

Comments

@peterashwell

According to this, the default timeout in urllib2 is -1, i.e. None. So this is a problem for long-running programs: occasionally a connection hangs and blocks everything.

The solution is pretty simple: add a timeout to the 'open' call here:

f = opener.open(request)

I'll fork and try to make a fix.
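A minimal sketch of that change, assuming Python 3's urllib.request (urllib2's OpenerDirector.open accepts the same timeout argument). The helper name open_with_timeout and the 10-second default are hypothetical, not feedparser's actual code:

```python
import urllib.request

# Hypothetical default; choose a value that suits your polling interval.
DEFAULT_TIMEOUT = 10.0


def open_with_timeout(url, timeout=DEFAULT_TIMEOUT):
    """Open `url`, giving up if the socket stays idle for `timeout` seconds.

    Without an explicit timeout, OpenerDirector.open() falls back to the
    global socket default, which is None (block forever) unless
    socket.setdefaulttimeout() has been called.
    """
    opener = urllib.request.build_opener()
    request = urllib.request.Request(url)
    # The timeout is set on the underlying socket, so it bounds the connect
    # and each blocking read, not the total download time.
    return opener.open(request, timeout=timeout)
```

On timeout this raises an OSError subclass (socket.timeout, or urllib.error.URLError wrapping it if the timeout hits while connecting or sending the request), which a long-running poller can catch and retry later.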

@rigid

rigid commented Feb 19, 2017

This issue is a real problem, because there seems to be no clean workaround. I'm looking forward to the next release because of it.

@darklow

darklow commented Jul 4, 2017

If you want a quick workaround, you can monkey-patch feedparser to use the requests library instead, with a proper timeout. It also fixes HTTPS certificate issues I had with feedparser's default URL-opening implementation. This is how I do it:

import requests
import feedparser

feedparser._open_resource = lambda *args, **kwargs: feedparser._StringIO(requests.get(args[0], timeout=5).content)

Update: on version 6.x and later, use the following:

import requests
import feedparser

headers = {"User-Agent": "my-feed-reader/1.0"}  # hypothetical example headers; set your own

feedparser.api._open_resource = lambda *args, **kwargs: requests.get(args[0], headers=headers, timeout=5).content

@ghost

ghost commented Sep 11, 2019

The above did the job for my error:

I have a very simple polling app. Once in a while feedparser does not return and needs ^C twice to exit the script; it then prints:

^CTraceback (most recent call last):
  File "frontend/myfeed/src/main.py", line 48, in main
  File "/home/user/.local/share/virtualenvs/workspace_python-Cp_/lib/python3.7/site-packages/feedparser.py", line 3841, in parse
    data = f.read()
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 464, in read
    return self._readall_chunked()
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 574, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 620, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt

^CTraceback (most recent call last):
  File "frontend/myfeed/src/main.py", line 56, in <module>
  File "frontend/myfeed/src/main.py", line 53, in main

I'm not sure whether this is related or whether the above fixes it. I am using the latest version from pip.
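The traceback above stalls inside f.read() while http.client reads a chunked response body. Because a urlopen timeout is set on the socket itself, it also covers those body reads, not just the connect. The self-contained demo below is only an illustration: the local stalling server and both helper names are hypothetical. The server sends one chunk and then goes silent, and the client gives up after the timeout instead of hanging:

```python
import socket
import threading
import time
import urllib.request


def serve_one_stalled_response(srv: socket.socket, stall: float) -> None:
    """Accept one connection, send headers and a single chunk, then go silent."""
    conn, _ = srv.accept()
    with conn:
        conn.recv(65536)  # consume the request; its contents don't matter here
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"\r\n"
            b"5\r\nhello\r\n"  # one chunk, but never the terminating "0\r\n\r\n"
        )
        time.sleep(stall)


def read_with_timeout(url: str, timeout: float) -> bytes:
    # The timeout applies to every blocking recv on the socket, including
    # the ones http.client makes while reading the chunked body.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # any free local port
    srv.listen(1)
    port = srv.getsockname()[1]
    threading.Thread(
        target=serve_one_stalled_response, args=(srv, 5.0), daemon=True
    ).start()
    start = time.monotonic()
    try:
        read_with_timeout(f"http://127.0.0.1:{port}/", timeout=1.0)
    except socket.timeout:  # an alias of TimeoutError on Python 3.10+
        print(f"gave up after {time.monotonic() - start:.1f}s instead of hanging")
```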

@adbenitez

adbenitez commented Aug 15, 2021

If you want a quick workaround you can monkey patch and use requests lib instead with proper timeout

Having a broken implementation leads developers to adopt workarounds like this, which have their own issues, and other developers then copy-paste the flawed solutions. This one ignores the etag and modified arguments, so the whole feed is downloaded every time, which is inefficient and may get you blocked by servers. Simply saying that you should use a separate library to fetch the data and then pass it to feedparser is not that convenient either: developers then need to implement a function that sends the appropriate etag and modified headers, and the modified header needs proper formatting. It is not just a matter of importing requests and calling requests.get.
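As a sketch of that extra work, assuming the previous poll's etag and modified values are stored between runs, the hypothetical helper below builds the two conditional-request headers. feedparser's result typically exposes these as etag, modified (the raw string) and modified_parsed (a UTC struct_time); the helper accepts either the raw string or a struct_time and formats the latter as an HTTP-date:

```python
import calendar
import email.utils
import time
from typing import Dict, Optional, Union


def conditional_headers(
    etag: Optional[str] = None,
    modified: Union[str, time.struct_time, None] = None,
) -> Dict[str, str]:
    """Build If-None-Match / If-Modified-Since headers for a conditional GET."""
    headers: Dict[str, str] = {}
    if etag:
        # Echo the validator back exactly as the server sent it (quotes included).
        headers["If-None-Match"] = etag
    if modified:
        if isinstance(modified, time.struct_time):
            # Assumes the struct_time is UTC; calendar.timegm() interprets it that way.
            modified = email.utils.formatdate(calendar.timegm(modified), usegmt=True)
        # A string is assumed to already be an HTTP-date (e.g. a Last-Modified value).
        headers["If-Modified-Since"] = modified
    return headers
```

The resulting dict can be passed as headers= to requests.get(url, headers=..., timeout=...); a 304 Not Modified response means the feed is unchanged and does not need to be parsed again.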

@kurtmckee
Owner

feedparser has dropped all custom HTTP client code in favor of the requests package. (This change has not been released yet because I am still working on a significant effort to update the code and documentation.) At this time, the code sets a 10-second HTTP request timeout.

I'm closing this issue for this reason.
