smart-spider-project
The repository contains all my spider programs.
Recommendation
The image_miner works very well.
It should be used if you only want to extract images based on search queries.
The media_spider also works very well and is the latest version, but it still has a bug in the link extraction process.
You should tweak RELAX_TIME so that you are not kicked out for scraping abuse.
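As a rough illustration of what tuning RELAX_TIME means, here is a minimal sketch that assumes RELAX_TIME is a delay in seconds between two requests. The helpers relax and crawl, the jitter parameter, and the fetch callable are hypothetical and are not taken from the spiders in this repository.

```python
import random
import time

# Assumption: RELAX_TIME is a base delay in seconds between two requests.
RELAX_TIME = 2.0


def relax(base=RELAX_TIME, jitter=0.5, sleep=time.sleep):
    """Pause for `base` seconds plus a random extra of up to `jitter * base`."""
    delay = base + random.uniform(0, jitter * base)
    sleep(delay)
    return delay


def crawl(urls, fetch, base=RELAX_TIME):
    """Call `fetch` on each URL, relaxing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            relax(base)  # no pause before the very first request
        results.append(fetch(url))
    return results
```

A larger value makes the crawl slower but less likely to be blocked. The random jitter keeps the requests from arriving at a perfectly regular interval.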
LAST UPDATE
- media_spider.py is your best choice
I integrated fake_useragent and proxy functionality, including rotating proxies.
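The idea of rotating proxies with a changing user agent can be sketched as follows. This is not the code used in media_spider.py. The function make_rotation, the PROXIES list, and the user_agent_source parameter are hypothetical names, and the proxy addresses are placeholders.

```python
from itertools import cycle

# Placeholder addresses (documentation range); replace them with real proxies.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


def make_rotation(proxies, user_agent_source):
    """Return a function that yields request settings, one proxy per call.

    `user_agent_source` is any callable returning a User-Agent string.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxies)  # endless round-robin over the proxy list

    def next_settings():
        proxy = next(pool)
        return {
            "headers": {"User-Agent": user_agent_source()},
            "proxies": {"http": proxy, "https": proxy},
        }

    return next_settings
```

With the fake_useragent package, user_agent_source could be something like lambda: UserAgent().random. The returned dictionary uses the headers and proxies keyword names that requests.get accepts, so it can be passed along as keyword arguments.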
smart-spider
Smart spider will be less dumb. It's a promise.
The spider will be able to extract and evaluate website content in a more elegant way.
The program is too large and must be redesigned in a better way.
Memo Sim @ 2021
Project @ Paradoxe https://paradoxe-sim.weebly.com/