What is Scrapoxy?
=================
Scrapoxy hides your scraper behind a cloud.
It starts a pool of proxies to relay your requests.
Now, you can crawl without thinking about blacklisting!
How does Scrapoxy work?
=======================
- When Scrapoxy starts, it creates and manages a pool of proxies.
- Your scraper uses Scrapoxy as a normal proxy.
- Scrapoxy routes all requests through a pool of proxies.
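The flow above means your scraper needs no special integration: it simply points its HTTP client at Scrapoxy. A minimal Python sketch, assuming Scrapoxy's proxy endpoint runs locally on its default port 8888 (adjust the address to your setup):

```python
import urllib.request

# Assumption: Scrapoxy's proxy endpoint runs locally on its default port 8888.
SCRAPOXY_PROXY = "http://127.0.0.1:8888"

def proxy_config(proxy_url=SCRAPOXY_PROXY):
    """Map both HTTP and HTTPS traffic to the Scrapoxy endpoint."""
    return {"http": proxy_url, "https": proxy_url}

if __name__ == "__main__":
    # The scraper talks to Scrapoxy like any normal forward proxy;
    # Scrapoxy relays the request through one of its cloud instances.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy_config()))
    with opener.open("http://example.com") as response:
        print(response.status)
```

Because Scrapoxy behaves as a standard forward proxy, the same idea applies to Scrapy's proxy middleware or Node.js clients.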
What does Scrapoxy do?
======================
- Create your own proxies
- Use multiple cloud providers (AWS, DigitalOcean, OVH, Vscale)
- Rotate IP addresses
- Impersonate known browsers
- Exclude blacklisted instances
- Monitor requests
- Detect bottlenecks
- Optimize scraping
Why doesn't Scrapoxy support anti-blacklisting?
===============================================
Anti-blacklisting is a job for the scraper.
When the scraper detects blacklisting, it asks Scrapoxy (through a REST API) to remove the proxy from the pool.
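For illustration, a hypothetical sketch of that REST call in Python. The port (8889), the `/api/instances/stop` route, the JSON body, and the base64-encoded password in the ``Authorization`` header are all assumptions here; check the API documentation for the exact route and authentication scheme:

```python
import base64
import json
import urllib.request

# Assumption (hypothetical values -- check the Scrapoxy API docs):
# the REST API listens on port 8889 and expects a base64-encoded
# password in the Authorization header.
COMMANDER_URL = "http://127.0.0.1:8889"

def build_stop_request(instance_name, password, base_url=COMMANDER_URL):
    """Build the HTTP request asking Scrapoxy to remove a blacklisted instance."""
    token = base64.b64encode(password.encode()).decode()
    body = json.dumps({"name": instance_name}).encode()
    return urllib.request.Request(
        base_url + "/api/instances/stop",  # hypothetical endpoint
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # When the scraper detects blacklisting (e.g. a CAPTCHA or an HTTP 403),
    # it asks Scrapoxy to stop the offending instance and start a fresh one.
    req = build_stop_request("instance-name", "CHANGE_THIS_PASSWORD")
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```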
What is the best scraper framework to use with Scrapoxy?
=========================================================

Any scraper that can route its requests through an HTTP proxy works with Scrapoxy. The :ref:`tutorials-docs` cover Scrapy (Python) and Request (Node.js).
Does Scrapoxy have a SaaS mode or a support plan?
=================================================
Scrapoxy is an open-source tool. The source code is actively maintained, and you are very welcome to open an issue for feature requests or bugs.
To go further, see the :ref:`tutorials-docs`.
.. toctree::
   :maxdepth: 1
   :caption: Get Started

   quick_start/index
   changelog
   Licence <license>
.. toctree::
   :maxdepth: 1
   :caption: Standard

   standard/config/index
   standard/providers/awsec2/index
   standard/providers/digitalocean/index
   standard/providers/ovhcloud/index
   standard/providers/vscale/index
   standard/gui/index
.. toctree::
   :maxdepth: 1
   :caption: Advanced

   advanced/understand/index
   advanced/api/index
   advanced/security/index
   advanced/startup/index
.. toctree::
   :maxdepth: 1
   :caption: Tutorials

   tutorials/python-scrapy/index
   tutorials/nodejs-request/index
   tutorials/python-scrapy-blacklisting/index
Scrapoxy requires Node.js 8.0.0 or later.
You can open an issue on this repository for any feedback (bug, question, feature request, pull request, etc.).
See the :doc:`License <license>`.
And don't forget to be POLITE when you write your scrapers!