
In this section we will cover how to use proxycrawler and the different commands and switches it has. Let's start off by covering each command and its purpose, as you can see in the following table:

| Command | Description |
|---|---|
| `scrap` | Scrape proxies from the supported services |
| `validate` | Validate a proxy list from a file. The proxies should be formatted like so: `protocol://ip:port` |
| `export-db` | Export the saved proxies from the database and validate them |
| `version` | Show proxycrawler's version |

As you can see, each command has its own purpose. We will now go over how to use each one of them.

  • scrap

The scrap command starts the crawler, which goes through the process of scraping proxies from the supported services, which as of v0.2.0 are: free-proxy-list.net and Geonode.net. The first service to be scraped is free-proxy-list.net, and this cannot be changed (in a future version you will be able to choose which service to use first).

Now let's see a basic usage of this command:

proxycrawler scrap

The output will be something like this:

                                                          __
    ____  _________  _  ____  ________________ __      __/ /__  _____
   / __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
  / /_/ / /  / /_/ />  </ /_/ / /__/ /  / /_/ /| |/ |/ / /  __/ /
 / .___/_/   \____/_/|_|\__, /\___/_/   \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/                    /____/
                            Made by `ramsy0dev`

[02:55:31] [INFO] Using service 'free_proxy_list' with url:'https://free-proxy-list.net'                                                                proxycrawler.py:64
[02:55:31] [INFO] Scrapping free-proxy-list at url:'https://free-proxy-list.net'                                                                       freeproxylist.py:26
[02:56:36] [INFO] Found a valid proxy: {'http': 'http://20.204.212.45:3129', 'socks4': 'socks4://20.204.212.45:3129', 'socks5':                        freeproxylist.py:63
           'socks5://20.204.212.45:3129'}
[02:57:27] [INFO] Found a valid proxy: {'http': 'http://20.44.188.17:3129', 'socks4': 'socks4://20.44.188.17:3129', 'socks5':                          freeproxylist.py:63
           'socks5://20.44.188.17:3129'}
[02:58:22] [INFO] Found a valid proxy: {'http': 'http://8.219.97.248:80', 'socks4': 'socks4://8.219.97.248:80', 'socks5': 'socks5://8.219.97.248:80'}  freeproxylist.py:63
[02:59:17] [INFO] Found a valid proxy: {'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5':                        freeproxylist.py:63
           'socks5://20.219.177.73:3129'}

As you can see, the message "Found a valid proxy" shows that a valid proxy was found. These proxies are automatically saved into the output file once scraping of that particular service has finished; this behavior is controlled by the --enable-save-on-run switch, which makes proxycrawler save each valid proxy while it is still running (see the switches table below).

You may notice that the ip and port of the proxy are repeated in a line like {'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5': 'socks5://20.219.177.73:3129'}. That is because proxycrawler found that the proxy works with multiple protocols: http, socks4 and socks5. This varies from one service to another, depending on whether the service states which protocols a proxy supports; if it doesn't, proxycrawler goes through all of them trying to validate the proxy.

Apart from saving to an output file, proxycrawler also saves into a local SQLite database located at ~/.config/proxycrawler/database.db. This is useful for keeping track of the proxies you previously found: you can later check whether they are still valid with the export-db command, which exports and validates them for you.
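To make the multi-protocol check concrete, here is a minimal sketch of how a single proxy can be tested against http, socks4 and socks5. This is not proxycrawler's actual code: the test URL and timeout are arbitrary placeholders, and it assumes `requests` is installed together with the socks extra (`pip install requests[socks]`):

```python
# Minimal sketch (NOT proxycrawler's actual implementation) of
# multi-protocol proxy validation. Requires `requests` plus `pysocks`
# for the socks4/socks5 schemes.
import requests

PROTOCOLS = ("http", "socks4", "socks5")
TEST_URL = "https://www.google.com"  # arbitrary placeholder endpoint

def working_protocols(ip: str, port: int) -> dict:
    """Return a {protocol: proxy_url} dict, like the log lines above."""
    valid = {}
    for protocol in PROTOCOLS:
        proxy_url = f"{protocol}://{ip}:{port}"
        # Route both http and https traffic through the candidate proxy
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            requests.get(TEST_URL, proxies=proxies, timeout=10)
        except requests.RequestException:
            continue  # this protocol didn't work, try the next one
        valid[protocol] = proxy_url
    return valid

print(working_protocols("20.219.177.73", 3129))
```
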

Switches for scrap:

| Switch | Description |
|---|---|
| `--enable-save-on-run` | Save valid proxies while proxycrawler is still running (useful in case of a bad internet connection) [default: True] |
| `--group-by-protocol` | Save proxies into separate files based on the supported protocols (http, https, socks4, socks5) |
| `--output-file-path` | Custom output file path to save the results (.txt) |
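
For example, to save the scraped proxies into separate files, one per supported protocol:

proxycrawler scrap --group-by-protocol
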
  • validate

The validate command can be useful if you found a proxy list on some other website and you want to quickly validate it. It doesn't matter if you don't know the proxy type, because proxycrawler can simply test all the protocols for you.

NOTE: The proxies should be formatted like so: protocol://ip:port
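
For example, a proxies.txt file could look like this (one proxy per line; the addresses here are taken from the log output above):

```
http://20.204.212.45:3129
socks4://35.236.207.242:33333
socks5://43.157.8.79:8888
```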

Basic usage of validate:

  • Validate with the original protocol:
proxycrawler validate --proxy-file ./proxies.txt
  • Validate with a specific protocol:
proxycrawler validate --proxy-file ./proxies.txt --protocol https
  • Validate with all the protocols:
proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols

The output will be something like this:

                                                          __
    ____  _________  _  ____  ________________ __      __/ /__  _____
   / __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
  / /_/ / /  / /_/ />  </ /_/ / /__/ /  / /_/ /| |/ |/ / /  __/ /
 / .___/_/   \____/_/|_|\__, /\___/_/   \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/                    /____/
                            Made by `ramsy0dev`

[03:28:43] [INFO] Found '9' proxies from './proxycrawler-proxies.txt'. Validating them...                                                                       cli.py:174
[03:29:40] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5':                            proxycrawler.py:274
           'socks5://43.157.8.79:8888'}
[03:30:26] [INFO] Found a valid proxy: {'http': 'http://35.236.207.242:33333', 'socks4': 'socks4://35.236.207.242:33333', 'socks5':                    proxycrawler.py:274
           'socks5://35.236.207.242:33333'}

Switches for validate:

| Switch | Description |
|---|---|
| `--proxy-file` | Path to the proxy file |
| `--protocol` | Set a specific protocol to test the proxies with |
| `--test-all-protocols` | Test all the protocols on each proxy |
| `--enable-save-on-run` | Save valid proxies while proxycrawler is still running (useful in case of a bad internet connection) [default: True] |
| `--group-by-protocol` | Save proxies into separate files based on the supported protocols (http, https, socks4, socks5) |
| `--output-file-path` | Custom output file path to save the results (.txt) |
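
For example, to try every protocol on each proxy and group the valid ones into per-protocol files:

proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols --group-by-protocol
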
  • export-db

The export-db command is used to export and validate the proxies saved in the database. You can control how many proxies to export with the --proxies-count switch, which takes the number of proxies you want. If that number is higher than what's in the database, or lower (basically a negative number), then all of the proxies are exported.

Basic usage:

proxycrawler export-db --proxies-count 21

The output will be something like this:

                                                          __
    ____  _________  _  ____  ________________ __      __/ /__  _____
   / __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
  / /_/ / /  / /_/ />  </ /_/ / /__/ /  / /_/ /| |/ |/ / /  __/ /
 / .___/_/   \____/_/|_|\__, /\___/_/   \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/                    /____/
                            Made by `ramsy0dev`

[03:22:51] [INFO] Fetching and validating proxies from the database                                                                                              cli.py:93
[03:22:51] [INFO] Fetched '15' proxies from the database. Validating them ...                                                                          proxycrawler.py:130
[03:23:19] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5':                            proxycrawler.py:147
           'socks5://43.157.8.79:8888'}
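
The database mentioned earlier (~/.config/proxycrawler/database.db) is a plain SQLite file, so you can also peek into it directly. Here is a minimal sketch that makes no assumptions about proxycrawler's table layout: it asks SQLite which tables exist and prints the first few rows of each:

```python
# Minimal sketch: inspect proxycrawler's local SQLite database.
# Uses only the standard library; the table names are discovered
# at runtime rather than assumed.
import sqlite3
from pathlib import Path

db_path = Path.home() / ".config" / "proxycrawler" / "database.db"
con = sqlite3.connect(db_path)

# List every table the database contains
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

for table in tables:
    print(f"-- {table}")
    for row in con.execute(f"SELECT * FROM {table} LIMIT 5"):
        print(row)

con.close()
```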

Switches for export-db:

| Switch | Description |
|---|---|
| `--proxies-count` | Number of proxies to export (exports all by default) |
| `--validate` | Validate the exported proxies [default: True] |
| `--enable-save-on-run` | Save valid proxies while proxycrawler is still running (useful in case of a bad internet connection) [default: True] |
| `--group-by-protocol` | Save proxies into separate files based on the supported protocols (http, https, socks4, socks5) |
| `--output-file-path` | Custom output file path to save the results (.txt) |
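
For example, to export and re-validate everything in the database (the default when --proxies-count is omitted) and write the still-valid proxies to a custom file (the file name here is just an example):

proxycrawler export-db --output-file-path ./revalidated.txt
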
  • version

Display proxycrawler's version
