Please delete the output directory #305

KiwiTrue · 2022-01-16T20:37:33Z

[#] Crawler: error trying to retrieve this page: ch16.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch16.html
[+] Please delete the output directory '/Users/mac/safaribooks/Books/Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...

I tried deleting the directory and restarting three times, and I get the same error every time.

Korred · 2022-01-21T15:13:02Z

@Wue9 - Thanks for creating the issue. I can confirm I have the same error on my end (on a different chapter though).

[-] Downloading book contents... (47 chapters)
[#] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[+] Please delete the output directory 'G:\Repos\safaribooks\Books\Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...

After checking the info log, it looks like this is a backend issue (the server answered with a 503):

[21/Jan/2022 16:09:15] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[21/Jan/2022 16:09:15] Last request done:
	URL: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
	DATA: None
	OTHERS: {}

	503
	Connection: keep-alive
	Content-Length: 449
	Server: Varnish
	Content-Type: text/html; charset=utf-8
	Accept-Ranges: bytes
	Date: Fri, 21 Jan 2022 15:09:16 GMT
	Via: 1.1 varnish
	X-Client-IP: 83.25.6.96
	X-Served-By: cache-hhn4020-HHN
	X-Cache: MISS
	X-Cache-Hits: 0
	X-Timer: S1642777756.144658,VS0,VE165
	Retry-After: 3600


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>503 backend read error</title>
  </head>
  <body>
    <h1>Error 503 backend read error</h1>
    <p>backend read error</p>
    <h3>Guru Mediation:</h3>
    <p>Details: cache-hhn11543-HHN 1642777756 2976959339</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>

@lorenzodifuccia : I think it would be a good idea to enable automatic retries for failed requests. Especially when a file cannot be fetched on the first try due to a backend error.

lorenzodifuccia · 2022-01-21T16:46:17Z

> @lorenzodifuccia : I think it would be a good idea to enable automatic retries for failed requests. Especially when a file cannot be fetched on the first try due to a backend error.

@Korred, we definitely will...
I added this issue to a milestone and labeled it.

EntrixIII · 2022-02-11T16:44:46Z

I wanted to reproduce it and try to fix it, but I couldn't reproduce it... :(
It took a considerable amount of time to download the 1746 image files, but in the end, it worked.

Korred · 2022-02-12T13:22:38Z

@EntrixIII - as mentioned before, the error was caused by a backend read error. That is not something we have control over.
To fix this, you could mount a transport adapter (HTTPAdapter) on the requests Session and ensure that the max_retries parameter is set.

https://docs.python-requests.org/en/latest/user/advanced/#transport-adapters
https://docs.python-requests.org/en/latest/api/#requests.adapters.HTTPAdapter
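As a rough illustration of the suggestion above, here is a minimal sketch of mounting an HTTPAdapter with max_retries on a requests Session. The helper name build_session, the retry count, the backoff factor, and the list of status codes are assumptions chosen for the example, not what safaribooks actually does. Note that the 503 in the log carried "Retry-After: 3600"; urllib3 honors that header by default, so the sketch turns that behavior off to avoid sleeping for an hour, which may or may not be the right call for this server.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=5, backoff_factor=1.0):
    """Return a requests Session that retries transient server errors.

    Hypothetical helper: the name and the default values are illustrative.
    """
    retry = Retry(
        total=total_retries,              # give up after this many retries
        backoff_factor=backoff_factor,    # growing pause between attempts
        status_forcelist=(500, 502, 503, 504),  # retry on these responses
        # The failing response advertised "Retry-After: 3600". Ignore it
        # here so a retry does not block for an hour (assumption).
        respect_retry_after_header=False,
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    # Mount the adapter for both schemes so every request goes through it.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A caller would then use session = build_session() and session.get(url) in place of a bare requests.get(url). Once the retries are exhausted, requests raises an exception instead of returning the 503, so the calling code still needs to handle that case.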

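Side note: If you want to simulate a case where an asset is not reachable, you could probably kill your internet connection during file download. Note that this should surface as a connection error on the client rather than a 5xx response, since a 5xx status has to be sent by a server.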
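Another way to reproduce the failure without depending on the real backend is a small local server that deliberately answers 503 a few times before succeeding. The sketch below uses only the Python standard library; FlakyHandler, fetch_with_retries, run_demo, and the ch13.html path are hypothetical names made up for this illustration and are not part of safaribooks.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FlakyHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers 503 twice, then 200."""

    failures_left = 2

    def do_GET(self):
        if FlakyHandler.failures_left > 0:
            FlakyHandler.failures_left -= 1
            self.send_response(503)
            self.end_headers()
            return
        body = b"<html>chapter</html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet: suppress per-request logging.
        pass


def fetch_with_retries(url, attempts=5):
    """Fetch url, retrying on 5xx; return (attempts_used, body)."""
    # Empty ProxyHandler: ignore proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for attempt in range(1, attempts + 1):
        try:
            with opener.open(url, timeout=5) as resp:
                return attempt, resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise client errors and the final failed attempt.
            if err.code < 500 or attempt == attempts:
                raise


def run_demo():
    """Start the flaky server on a free local port and fetch one page."""
    FlakyHandler.failures_left = 2
    server = HTTPServer(("127.0.0.1", 0), FlakyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/ch13.html" % server.server_address[1]
        return fetch_with_retries(url)
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() fetches the page on the third attempt, after the two simulated 503 responses. Pointing a retrying requests Session at the same local URL would be one way to check retry behavior without waiting for the real backend to fail.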

lorenzodifuccia added the feature request, help wanted, and partial download labels on Jan 21, 2022

lorenzodifuccia added this to the Improvement 0x01 milestone on Jan 21, 2022

lorenzodifuccia closed this as completed on Jan 4, 2023

lorenzodifuccia reopened this on Jan 4, 2023

lorenzodifuccia added the wontfix label and removed the help wanted, feature request, and partial download labels on Oct 30, 2024
