Usage
In this section we will cover how to use proxycrawler and what commands and switches it provides. Let's start by going over each command and its purpose, as shown in the following table:
Command | Description |
---|---|
`scrap` | Scrape proxies from the supported services |
`validate` | Validate a proxy list from a file. The proxies should be formatted as `protocol://ip:port` |
`export-db` | Export the saved proxies from the database and validate them |
`version` | Display proxycrawler's version |
As you can see, each command has its own purpose. We will now go over how to use each one of them.
The `scrap` command starts the crawler, which scrapes proxies from the supported services, which as of v0.2.0 are free-proxy-list.net and Geonode.net. The first service to be scraped is free-proxy-list.net and this cannot be changed (in a future version you will be able to choose which service to scrape first).
Now let's see a basic usage of this command:
```
proxycrawler scrap
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[02:55:31] [INFO] Using service 'free_proxy_list' with url:'https://free-proxy-list.net' proxycrawler.py:64
[02:55:31] [INFO] Scrapping free-proxy-list at url:'https://free-proxy-list.net' freeproxylist.py:26
[02:56:36] [INFO] Found a valid proxy: {'http': 'http://20.204.212.45:3129', 'socks4': 'socks4://20.204.212.45:3129', 'socks5': 'socks5://20.204.212.45:3129'} freeproxylist.py:63
[02:57:27] [INFO] Found a valid proxy: {'http': 'http://20.44.188.17:3129', 'socks4': 'socks4://20.44.188.17:3129', 'socks5': 'socks5://20.44.188.17:3129'} freeproxylist.py:63
[02:58:22] [INFO] Found a valid proxy: {'http': 'http://8.219.97.248:80', 'socks4': 'socks4://8.219.97.248:80', 'socks5': 'socks5://8.219.97.248:80'} freeproxylist.py:63
[02:59:17] [INFO] Found a valid proxy: {'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5': 'socks5://20.219.177.73:3129'} freeproxylist.py:63
```
As you can see, the message "Found a valid proxy" shows that a valid proxy was found. These proxies are automatically saved into the output file once scraping of that particular service has finished; this can be disabled using the `--enable-save-on-run` switch. You may notice that the ip and port of the proxy are repeated in a line like `{'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5': 'socks5://20.219.177.73:3129'}`; that is because proxycrawler found that the proxy works with multiple protocols, namely http, socks4 and socks5. This will vary from one service to another depending on whether the service reports which protocols are supported; if it doesn't, proxycrawler will go through all of them trying to validate the proxy. Apart from saving to an output file, proxycrawler also saves into a local SQLite database located at `~/.config/proxycrawler/database.db`. This can be useful to keep track of the proxies you previously found; you can later check whether they are still valid using the `export-db` command, which exports and validates them for you.
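Since the database is a regular SQLite file, you can also inspect it yourself. Below is a quick sketch using the `sqlite3` command-line shell; the table and column names are not documented here, so list the schema first and adjust the query accordingly:

```
# List the tables and schema inside proxycrawler's database
sqlite3 ~/.config/proxycrawler/database.db ".tables"
sqlite3 ~/.config/proxycrawler/database.db ".schema"

# Example query -- the table name 'proxies' is an assumption, replace it
# with whatever ".tables" printed above
sqlite3 ~/.config/proxycrawler/database.db "SELECT * FROM proxies LIMIT 5;"
```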
Switches for `scrap`:
Switch | Description |
---|---|
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
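For example, here are two ways you might use these switches (illustrative only; the output path is a placeholder):

```
# Save valid proxies into separate files, one per supported protocol
proxycrawler scrap --group-by-protocol

# Or write all results to a custom output file
proxycrawler scrap --output-file-path ./my-proxies.txt
```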
The `validate` command can be useful if you found a proxy list online on some other website and you want to quickly validate it. It doesn't matter if you don't know the proxy type, because proxycrawler can simply test all the protocols for you.
NOTE: The proxies should be formatted like so:
protocol://ip:port
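For example, a `proxies.txt` file following that format might look like this (addresses taken from the sample output above):

```
http://20.204.212.45:3129
socks4://20.44.188.17:3129
socks5://8.219.97.248:80
```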
Basic usage of `validate`:
- Validate with the original protocol:

```
proxycrawler validate --proxy-file ./proxies.txt
```

- With a specific protocol:

```
proxycrawler validate --proxy-file ./proxies.txt --protocol https
```

- Validate with all the protocols:

```
proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[03:28:43] [INFO] Found '9' proxies from './proxycrawler-proxies.txt'. Validating them... cli.py:174
[03:29:40] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5': 'socks5://43.157.8.79:8888'} proxycrawler.py:274
[03:30:26] [INFO] Found a valid proxy: {'http': 'http://35.236.207.242:33333', 'socks4': 'socks4://35.236.207.242:33333', 'socks5': 'socks5://35.236.207.242:33333'} proxycrawler.py:274
```
Switches for `validate`:
Switch | Description |
---|---|
`--proxy-file` | Path to the proxy file |
`--protocol` | Set a specific protocol to test the proxies on |
`--test-all-protocols` | Test all the protocols on a proxy |
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
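As with `scrap`, these switches can be combined. For instance (illustrative only; whether a particular combination makes sense depends on your use case):

```
# Try every protocol on each proxy and group the valid ones into one file per protocol
proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols --group-by-protocol
```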
The `export-db` command is used to export and validate the proxies saved in the database. You can control how many proxies to export using the `--proxies-count` switch, which takes the number of proxies you want. If that number is higher than what's in the database, or is negative, then all proxies will be exported.
Basic usage:
```
proxycrawler export-db --proxies-count 21
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[03:22:51] [INFO] Fetching and validating proxies from the database cli.py:93
[03:22:51] [INFO] Fetched '15' proxies from the database. Validating them ... proxycrawler.py:130
[03:23:19] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5': 'socks5://43.157.8.79:8888'} proxycrawler.py:147
```
Switches for `export-db`:
Switch | Description |
---|---|
`--proxies-count` | Number of proxies to export (exports all by default) |
`--validate` | Validate proxies [default: True] |
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
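For example, to export everything from the database and write the still-valid proxies to a custom file (illustrative only; the path is a placeholder):

```
proxycrawler export-db --output-file-path ./revalidated-proxies.txt
```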
The `version` command displays proxycrawler's version.
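Basic usage:

```
proxycrawler version
```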