We should have a sensible default timeout for downloads. We have examples of Mozmill CI builds that were aborted because they took > 60 minutes while apparently still running mozdownload.
Building on a recent conversation with whimboo, I gave this issue a go.
whimboo suggested using the timeout feature of the urllib2 library (more precisely urllib2.urlopen) to 1) add a timeout to the actual download and 2) add a timeout to the directory scraping.
Now, 1) is rather easy: I just added the timeout=3600 parameter to this line here:
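This isn't the actual line from the codebase, but a minimal sketch of what a urlopen call with timeout=3600 looks like (the download function and its names here are hypothetical stand-ins, and the try/except import keeps the snippet runnable on both Python 2 and 3):

```python
# Hedged sketch, not the real mozdownload source.
try:
    from urllib2 import urlopen  # Python 2, as used by mozdownload
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent

DOWNLOAD_TIMEOUT = 3600  # seconds

def download(url, target):
    """Hypothetical stand-in for Scraper.download with a timeout."""
    # The timeout applies to blocking socket operations (connect/read),
    # so a stalled server no longer hangs the build indefinitely.
    response = urlopen(url, timeout=DOWNLOAD_TIMEOUT)
    try:
        with open(target, 'wb') as f:
            f.write(response.read())
    finally:
        response.close()
```

Note that this timeout is per blocking socket operation, not for the whole transfer, so a slow but steadily responding server could still take longer than an hour in total.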
It's number 2 that I am unsure about. According to the urllib2 docs, the timeout parameter is available on urllib2.urlopen and on urllib2.OpenerDirector.open; however, urlopen is used exactly once in all the code (in the download function of the general Scraper class).
It is clear that adding a timeout to the directory-structure scraping would change the code more than I expected. My question: should I do this?
I can envision changing one of the urllib.urlopen calls to use urllib2 and implementing a timeout, like in here:
(If I interpret it correctly, this is where the directory structure is parsed.) Unfortunately I can't foresee whether this change has any major ramifications. I can't see any, but I may be wrong.
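For illustration, switching the scraping path from the timeout-less urllib.urlopen to a timeout-aware urlopen might look roughly like this (the function name and the 30-second value are assumptions, not mozdownload's actual code):

```python
# Hedged sketch of a timeout-aware directory fetch.
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent

SCRAPE_TIMEOUT = 30  # assumed value; directory listings should be quick

def read_directory(url):
    """Hypothetical helper: fetch a directory listing page with a
    timeout, instead of a plain urllib.urlopen(url) call."""
    response = urlopen(url, timeout=SCRAPE_TIMEOUT)
    try:
        return response.read()
    finally:
        response.close()
```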
Any comments on this are welcome.
Let's just apply this to the actual download, and we can open another issue to timeout during the scraping.
Implemented a download timeout (fixes issue #50)
Landed in cda3a68