memosasoft/smart-spider-project

smart-spider-project

This repository contains all my spider programs.

Recommendation: image_miner works very well.

It should be used if you only want to extract images based on search queries.

media_spider also works very well and is the latest version, but it still has a bug in the link extraction process.

You should tweak RELAX_TIME so that you are not blocked for scraping abuse.
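
The idea behind a pause between requests can be sketched as follows. This is only an illustration, not the code from the spiders: it assumes RELAX_TIME is a number of seconds, and the `relax` helper and its `jitter` parameter are hypothetical names introduced here.

```python
import random
import time

# Assumption: RELAX_TIME is the pause between requests, in seconds.
RELAX_TIME = 2.0


def relax(base=RELAX_TIME, jitter=0.5, sleep=time.sleep):
    """Pause for `base` seconds, varied randomly by up to +/- `jitter` (a fraction).

    Returns the delay that was used. `sleep` can be swapped out for testing.
    """
    delay = base * (1 + random.uniform(-jitter, jitter))
    sleep(delay)
    return delay
```

Calling `relax()` after each page fetch spaces the requests out. A larger RELAX_TIME is slower but less likely to trigger rate limiting, and the random variation avoids a perfectly regular request pattern.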

LAST UPDATE

  • media_spider.py is your best choice

I integrated fake_useragent, along with proxy and rotating proxy functionality.
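
A rough sketch of what rotating proxies and user agents can look like is shown below. It is not taken from media_spider.py: it uses only the standard library, a hard-coded user agent list stands in for fake_useragent, and the proxy addresses, user agent strings, and the `build_opener_for_next_request` helper are all placeholders invented for the example.

```python
import itertools
import random
import urllib.request

# Placeholder values for illustration only; replace with real proxies.
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]

# Stand-in for the strings a library such as fake_useragent would supply.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
]

# Endless round-robin iterator over the proxy list.
_proxy_cycle = itertools.cycle(PROXIES)


def build_opener_for_next_request():
    """Return (opener, proxy, user_agent) for the next request.

    The proxy is taken in round-robin order and the user agent is
    picked at random. No network traffic happens here.
    """
    proxy = next(_proxy_cycle)
    user_agent = random.choice(USER_AGENTS)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, proxy, user_agent
```

Each request would then go through `opener.open(url)`, so consecutive requests leave from different proxies and present different User-Agent headers.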

smart-spider

Smart spider will be less dumb, it's a promise.

The spider will be able to extract and evaluate web site content in a more elegant way.

The program is too large and must be redesigned in a better way.

Memo Sim @ 2021

Project @ Paradoxe https://paradoxe-sim.weebly.com/
