Scraps forever #280
Thanks @mirfan899. I tried those URLs and didn't get any issues. Only 3 were forbidden by robots.txt rules, plus a few JSON-LD issues here and there. Your logs show that 89 (out of 100) status codes are 200. Please check whether your code has special rules, or whether some domains blocked you (these URLs are from many different domains, and a few might have issues).
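To check status codes the way the maintainer suggests, you can load the crawl output (a JSON lines file) into pandas and count them. This is a sketch under assumptions: the sample rows below are fabricated stand-ins for the real `pages.jl`, and the `status` column name matches what advertools writes in its crawl output.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample rows standing in for the real pages.jl crawl output.
sample = [
    {"url": "https://example.com/a", "status": 200},
    {"url": "https://example.com/b", "status": 200},
    {"url": "https://example.com/c", "status": 403},
]
path = os.path.join(tempfile.mkdtemp(), "pages.jl")
with open(path, "w") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

# Load the JSON lines file and count crawled pages per status code.
crawl_df = pd.read_json(path, lines=True)
counts = crawl_df["status"].value_counts()
print(counts)
```

A high share of non-200 codes here would point to blocking or robots.txt rules rather than a bug in the crawl call itself.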
Here is my code:

```python
import advertools as adv
import pandas as pd

urls = open("urls.txt").readlines()
adv.crawl(urls, "pages.jl", follow_links=False)
```

Okay, here is the output of the code execution; it takes around 13 minutes to complete.
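One possible culprit in the snippet above (an assumption, not confirmed in the thread): `readlines()` keeps the trailing newline on every URL, and blank lines in `urls.txt` become empty entries, either of which can produce malformed requests. A minimal sketch of cleaning the list before passing it to the crawler (`raw_lines` below fakes the file's contents for illustration):

```python
# Hypothetical contents of urls.txt, as readlines() would return them:
# each line keeps its trailing "\n", and blank lines come through too.
raw_lines = [
    "https://example.com/page1\n",
    "https://example.com/page2\n",
    "\n",
]

# Strip whitespace and drop empty entries before crawling.
urls = [line.strip() for line in raw_lines if line.strip()]
print(urls)
```

With a real file, the same cleanup would be `[line.strip() for line in open("urls.txt") if line.strip()]`.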
Thanks. If you get a specific error, feel free to open another issue.
Here is the list of URLs I'm trying to scrape; the crawl gets stuck and never finishes.
Here is the log