Please delete the output directory #305

Open
KiwiTrue opened this issue Jan 16, 2022 · 4 comments

Comments

@KiwiTrue

[#] Crawler: error trying to retrieve this page: ch16.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch16.html
[+] Please delete the output directory '/Users/mac/safaribooks/Books/Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...
I tried deleting the output directory and restarting three times, and I got the same error every time.

@Korred

Korred commented Jan 21, 2022

@Wue9 - Thanks for creating the issue. I can confirm I have the same error on my end (on a different chapter though).

[-] Downloading book contents... (47 chapters)
[#] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[+] Please delete the output directory 'G:\Repos\safaribooks\Books\Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...

After checking the info log it looks like this is a backend issue:

[21/Jan/2022 16:09:15] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[21/Jan/2022 16:09:15] Last request done:
	URL: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
	DATA: None
	OTHERS: {}

	503
	Connection: keep-alive
	Content-Length: 449
	Server: Varnish
	Content-Type: text/html; charset=utf-8
	Accept-Ranges: bytes
	Date: Fri, 21 Jan 2022 15:09:16 GMT
	Via: 1.1 varnish
	X-Client-IP: 83.25.6.96
	X-Served-By: cache-hhn4020-HHN
	X-Cache: MISS
	X-Cache-Hits: 0
	X-Timer: S1642777756.144658,VS0,VE165
	Retry-After: 3600


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>503 backend read error</title>
  </head>
  <body>
    <h1>Error 503 backend read error</h1>
    <p>backend read error</p>
    <h3>Guru Mediation:</h3>
    <p>Details: cache-hhn11543-HHN 1642777756 2976959339</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>

@lorenzodifuccia : I think it would be a good idea to enable automatic retries for failed requests. Especially when a file cannot be fetched on the first try due to a backend error.

@lorenzodifuccia
Owner

@lorenzodifuccia : I think it would be a good idea to enable automatic retries for failed requests. Especially when a file cannot be fetched on the first try due to a backend error.

@Korred, we definitely will...
Add this issue to milestone and labels

@lorenzodifuccia lorenzodifuccia added this to the Improvement 0x01 milestone Jan 21, 2022
@EntrixIII

I wanted to reproduce it and try to fix it, but I couldn't reproduce it... :(
It took a considerable amount of time to download the 1746 image files, but in the end, it worked.

@Korred

Korred commented Feb 12, 2022

@EntrixIII - as mentioned before, the error was caused by a backend read error. That is not something we have control over.
To fix this, you could mount a transport adapter (HTTPAdapter) on the requests Session and ensure that the max_retries parameter is set.

https://docs.python-requests.org/en/latest/user/advanced/#transport-adapters
https://docs.python-requests.org/en/latest/api/#requests.adapters.HTTPAdapter
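A minimal sketch of that suggestion, assuming the crawler talks to the server through a `requests.Session`. The helper name `build_retrying_session` and the retry counts are illustrative, not taken from safaribooks. Note that the 503 in the log above carried `Retry-After: 3600`; urllib3's `Retry` honors that header by default for 503 responses, so this sketch turns that behavior off to avoid an hour-long sleep.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total_retries=5, backoff_factor=1.0):
    """Return a Session that retries transient 5xx responses (hypothetical helper)."""
    retry_policy = Retry(
        total=total_retries,                    # give up after this many retries
        backoff_factor=backoff_factor,          # growing pause between attempts
        status_forcelist=(500, 502, 503, 504),  # retry on these status codes
        respect_retry_after_header=False,       # do not sleep for Retry-After: 3600
    )
    adapter = HTTPAdapter(max_retries=retry_policy)

    session = requests.Session()
    # Every request whose URL starts with these prefixes goes through the adapter.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

With something like this in place, `session.get(url)` would only surface the 503 after the retries are exhausted, so a single backend read error should no longer abort the whole download.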

Side note: If you want to simulate a case where an asset is not reachable, you could probably kill your internet connection during file download, which should make the request fail (most likely with a connection error rather than an actual 5xx response).
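A more repeatable way to simulate the failure might be a throwaway local server that answers 503 a couple of times before serving the page. The sketch below uses only the standard library plus requests; `FlakyHandler` and `fetch_from_flaky_server` are hypothetical names, and the setup is an assumption about how one could test retries, not part of safaribooks.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class FlakyHandler(BaseHTTPRequestHandler):
    """Answers 503 for the first `failures_left` requests, then 200."""
    failures_left = 2

    def do_GET(self):
        cls = type(self)
        if cls.failures_left > 0:
            cls.failures_left -= 1
            self.send_response(503)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html>chapter</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_from_flaky_server():
    FlakyHandler.failures_left = 2
    server = HTTPServer(("127.0.0.1", 0), FlakyHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings for the local test
        retry = Retry(total=3, backoff_factor=0, status_forcelist=(503,),
                      respect_retry_after_header=False)
        session.mount("http://", HTTPAdapter(max_retries=retry))
        url = "http://127.0.0.1:%d/ch13.html" % server.server_port
        response = session.get(url, timeout=5)
        return response.status_code, FlakyHandler.failures_left
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_from_flaky_server()` issues one `session.get`; the adapter absorbs the two 503 responses and the third attempt returns the page.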
