Usage
In this section we will cover how to use proxycrawler and what commands and switches it provides. Let's start by going over each command and its purpose, as shown in the following table:
Command | Description |
---|---|
`scrap` | Scrape proxies from the supported services |
`validate` | Validate a proxy list from a file. The proxies should be formatted as `protocol://ip:port` |
`export-db` | Export the saved proxies from the database and validate them |
`version` | Display proxycrawler's version |
As you can see, each command has its own purpose. We will now go over how to use each one of them.
The `scrap` command starts the crawler, which scrapes proxies from the supported services, which as of v0.2.0 are free-proxy-list.net and Geonode.net. The first service to be scraped is free-proxy-list.net and this cannot be changed (in a future version you will be able to choose which service to scrape first).
Now let's see a basic usage of this command:
```
proxycrawler scrap
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[02:55:31] [INFO] Using service 'free_proxy_list' with url:'https://free-proxy-list.net' proxycrawler.py:64
[02:55:31] [INFO] Scrapping free-proxy-list at url:'https://free-proxy-list.net' freeproxylist.py:26
[02:56:36] [INFO] Found a valid proxy: {'http': 'http://20.204.212.45:3129', 'socks4': 'socks4://20.204.212.45:3129', 'socks5': 'socks5://20.204.212.45:3129'} freeproxylist.py:63
[02:57:27] [INFO] Found a valid proxy: {'http': 'http://20.44.188.17:3129', 'socks4': 'socks4://20.44.188.17:3129', 'socks5': 'socks5://20.44.188.17:3129'} freeproxylist.py:63
[02:58:22] [INFO] Found a valid proxy: {'http': 'http://8.219.97.248:80', 'socks4': 'socks4://8.219.97.248:80', 'socks5': 'socks5://8.219.97.248:80'} freeproxylist.py:63
[02:59:17] [INFO] Found a valid proxy: {'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5': 'socks5://20.219.177.73:3129'} freeproxylist.py:63
```
As you can see, the message "Found a valid proxy" shows that a valid proxy was found. These proxies are automatically saved into the output file once scraping of that particular service has finished; this can be disabled using the `--enable-save-on-run` switch. You may notice that the ip and port of the proxy are repeated in a line like `{'http': 'http://20.219.177.73:3129', 'socks4': 'socks4://20.219.177.73:3129', 'socks5': 'socks5://20.219.177.73:3129'}`; that is because proxycrawler found that the proxy works with multiple protocols, namely http, socks4 and socks5. This will vary from one service to another depending on whether the service reports which protocols are supported; if it doesn't, proxycrawler will go through all of them trying to validate the proxy. Apart from saving to an output file, proxycrawler also saves into a local SQLite database located at `~/.config/proxycrawler/database.db`. This can be useful to keep track of the proxies you previously found; you can later check whether they are still valid using the `export-db` command, which exports and validates them for you.
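Since the database is a regular SQLite file, you can also inspect it yourself. Below is a quick sketch using the `sqlite3` command-line shell; the table and column names are not documented here, so list the schema first and adjust the query accordingly:

```
# List the tables and schema inside proxycrawler's database
sqlite3 ~/.config/proxycrawler/database.db ".tables"
sqlite3 ~/.config/proxycrawler/database.db ".schema"

# Example query -- the table name 'proxies' is an assumption, replace it
# with whatever ".tables" printed above
sqlite3 ~/.config/proxycrawler/database.db "SELECT * FROM proxies LIMIT 5;"
```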
Switches for `scrap`:
Switch | Description |
---|---|
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
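For example, here are two ways you might use these switches (illustrative only; the output path is a placeholder):

```
# Save valid proxies into separate files, one per supported protocol
proxycrawler scrap --group-by-protocol

# Or write all results to a custom output file
proxycrawler scrap --output-file-path ./my-proxies.txt
```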
The `validate` command can be useful if you found a proxy list online on some other website and you want to quickly validate it. It doesn't matter if you don't know the proxy type, because proxycrawler can simply test all the protocols for you.
NOTE: The proxies should be formatted like so:
protocol://ip:port
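For example, a `proxies.txt` file following that format might look like this (addresses taken from the sample output above):

```
http://20.204.212.45:3129
socks4://20.44.188.17:3129
socks5://8.219.97.248:80
```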
Basic usage of `validate`:
- Validate with the original protocol:

```
proxycrawler validate --proxy-file ./proxies.txt
```

- With a specific protocol:

```
proxycrawler validate --proxy-file ./proxies.txt --protocol https
```

- Validate with all the protocols:

```
proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[03:28:43] [INFO] Found '9' proxies from './proxycrawler-proxies.txt'. Validating them... cli.py:174
[03:29:40] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5': 'socks5://43.157.8.79:8888'} proxycrawler.py:274
[03:30:26] [INFO] Found a valid proxy: {'http': 'http://35.236.207.242:33333', 'socks4': 'socks4://35.236.207.242:33333', 'socks5': 'socks5://35.236.207.242:33333'} proxycrawler.py:274
```
Switches for `validate`:
Switch | Description |
---|---|
`--proxy-file` | Path to the proxy file |
`--protocol` | Set a specific protocol to test the proxies on |
`--test-all-protocols` | Test all the protocols on a proxy |
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
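As with `scrap`, these switches can be combined. For instance (illustrative only; whether a particular combination makes sense depends on your use case):

```
# Try every protocol on each proxy and group the valid ones into one file per protocol
proxycrawler validate --proxy-file ./proxies.txt --test-all-protocols --group-by-protocol
```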
The `export-db` command is used to export and validate the proxies saved in the database. You can control how many proxies to export using the `--proxies-count` switch, which takes the number of proxies you want. If that number is higher than what's in the database, or is negative, then all proxies will be exported.
Basic usage:
```
proxycrawler export-db --proxies-count 21
```
The output will be something like this:
```
__
____ _________ _ ____ ________________ __ __/ /__ _____
/ __ \/ ___/ __ \| |/_/ / / / ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / / / /_/ /> </ /_/ / /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/_/ \____/_/|_|\__, /\___/_/ \__,_/ |__/|__/_/\___/_/ Version 0.1.0
/_/ /____/

Made by `ramsy0dev`

[03:22:51] [INFO] Fetching and validating proxies from the database cli.py:93
[03:22:51] [INFO] Fetched '15' proxies from the database. Validating them ... proxycrawler.py:130
[03:23:19] [INFO] Found a valid proxy: {'http': 'http://43.157.8.79:8888', 'socks4': 'socks4://43.157.8.79:8888', 'socks5': 'socks5://43.157.8.79:8888'} proxycrawler.py:147
```
Switches for `export-db`:
Switch | Description |
---|---|
`--proxies-count` | Number of proxies to export (exports all by default) |
`--validate` | Validate proxies [default: True] |
`--enable-save-on-run` | Save valid proxies while proxycrawler is still running (can be useful in case of a bad internet connection) [default: True] |
`--group-by-protocol` | Save proxies into separate files based on the supported protocols [http, https, socks4, socks5] |
`--output-file-path` | Custom output file path to save results (.txt) |
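For example, to export everything from the database and write the still-valid proxies to a custom file (illustrative only; the path is a placeholder):

```
proxycrawler export-db --output-file-path ./revalidated-proxies.txt
```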
The `version` command displays proxycrawler's version.
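Basic usage:

```
proxycrawler version
```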