Scraping-from-scratch

An attempt to build my own web scraper from scratch using only urllib. There are three Python files: one for housekeeping, one for finding links, and one for crawling them. The crawler is multithreaded, so many spiders work from a single queue of links and move each crawled link to a separate folder of scraped links. PS: The files are converted to sets to make the scraping process fast.
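The shared-queue idea described above can be sketched roughly as follows. This is only an illustration, not the code in this repository: the names `crawl`, `fetch_links`, and `FAKE_SITE` are hypothetical, and an in-memory dictionary stands in for pages that a real spider would download with urllib. It shows several threads pulling from one queue while sets keep the "already seen" check fast.

```python
import threading
from queue import Queue

# Hypothetical in-memory "site": page URL -> links found on that page.
# A real spider would fetch each page with urllib.request.urlopen instead.
FAKE_SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(start_url, fetch_links, num_spiders=4):
    """Crawl from start_url with several spider threads sharing one queue."""
    work = Queue()
    seen = {start_url}   # every URL ever queued; set membership is O(1)
    crawled = set()      # URLs that have been processed
    lock = threading.Lock()

    def spider():
        while True:
            url = work.get()
            try:
                links = fetch_links(url)
                with lock:
                    crawled.add(url)
                    for link in links:
                        if link not in seen:
                            seen.add(link)
                            work.put(link)
            finally:
                # Called after new links are queued, so join() cannot
                # return while there is still work pending.
                work.task_done()

    for _ in range(num_spiders):
        threading.Thread(target=spider, daemon=True).start()

    work.put(start_url)
    work.join()          # blocks until every queued URL has been processed
    return crawled


if __name__ == "__main__":
    pages = crawl("http://example.com/", lambda url: FAKE_SITE.get(url, []))
    print(sorted(pages))
```

Keeping the queued and crawled links in sets means each "have we seen this link?" check does not slow down as the crawl grows, which is presumably why the files are loaded into sets.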

Files (5 commits):
README.md
domain.py
general.py
link_finder.py
main.py
spider.py
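
As a rough idea of what the link-finding step could look like using only the standard library, here is a minimal sketch. The contents of link_finder.py are not shown here, so the class name `LinkFinder` and the helper `find_links` are assumptions made for illustration. It uses `html.parser.HTMLParser` to collect `href` values from anchor tags and `urllib.parse.urljoin` to turn relative links into absolute ones.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()  # a set drops duplicate links automatically

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value))


def find_links(base_url, html):
    """Return the set of links found in an HTML string (hypothetical helper)."""
    finder = LinkFinder(base_url)
    finder.feed(html)
    return finder.links


if __name__ == "__main__":
    sample = '<a href="page.html">one</a> <a href="/top">two</a>'
    print(sorted(find_links("http://example.com/dir/", sample)))
```

In a real crawl, the HTML string would come from a page fetched with urllib, and the returned set would be added to the queue of links waiting to be crawled.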

mish24/Scraping-from-scratch