lablnet/Web-Spider

Overview

A multi-threaded web crawler written in Python.
For now, the tool only gathers links; see the Todo section for planned features.

To run the crawler, type python3 index.py and enter a URL to crawl.
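
To illustrate what "gathering links with multiple threads" can look like, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the names LinkCollector, extract_links, and gather_links are hypothetical. To stay runnable offline, it parses HTML that is already in memory rather than fetching it over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration; not part of Web-Spider itself.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return all links found in one HTML document, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def gather_links(pages, max_workers=4):
    """Extract links from several pages in parallel.

    `pages` maps a page URL to its HTML text (assumed to be fetched already).
    Returns a dict mapping each page URL to the list of links it contains.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input items.
        results = pool.map(lambda item: extract_links(*item), pages.items())
        return dict(zip(pages, results))


if __name__ == "__main__":
    sample = {"https://example.com/": '<a href="/about">About</a>'}
    print(gather_links(sample))
```

In a real crawl, the HTML would first be downloaded (for example with urllib.request.urlopen), and the thread pool would mainly help by overlapping those network waits.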

Original Author

  1. Muhammad Umer Farooq (Core)

Contributors

  1. Christian Clauss

Todo

  • Gather Page links
  • Multi-threaded
  • Added Configuration support.
  • Crawl images with alt.
  • Benchmarking.
  • Get metadata (description, keywords)
  • Make the package flexible and easy to use without touching any core files
  • Components to extend project
  • Database layer
  • Analytics
  • Data harvesting
  • Searching algorithms
  • Add more tests

Contributions

There is still a lot of work to do, so feel free to contribute by opening a PR.

License

MIT

Support

Want to donate a coffee?
Here is the Bitcoin address:

37x6PA4qtPu2fQnYdW5U7jztYhbchASpBV

Thank you so much.

Disclaimer

I do not accept responsibility for any illegal use of this tool.
