# Scrape Proxy Table with Node.js
May 21, 2022

In this guide I will demonstrate how to scrape a table of free proxies using Node.js. First, in order to use Node.js in this Jupyter notebook, we need to run the following command in its own cell to install <b>pixiedust</b> and <b>pixiedust_node</b> with the package manager pip.

In [4]:
!pip install pixiedust
!pip install pixiedust_node



[0m

Next, let's start up pixiedust_node by importing it in a new cell.

In [3]:
import pixiedust_node
npm.install(("table-scraper"))

Pixiedust database opened successfully


/usr/local/bin/npm install -s table-scraper
pixiedust_node 0.2.5 started. Cells starting '%%node' may contain Node.js code.


In [5]:
import pixiedust_node

Now we can install our required modules.

In [6]:
npm.install(("table-scraper"))

/usr/local/bin/npm install -s table-scraper


The `table-scraper` module enables us to extract all tables from the HTML of a webpage. The tables are stored as an array of JSON objects. From here we can extract the individual proxy IP addresses and port numbers to use for scraping.

In [72]:
%%node
var scraper = require("table-scraper");
var data;
scraper.get("https://www.proxynova.com/proxy-server-list/elite-proxies/").then(function(tableData) {
        data = tableData[0];
        console.log(data);
    });

... ... ...
[
{
'0': 'document.write("164" + ".15" + "5.1" + "50." + "31");',
'Proxy Port': '80',
'Last Check': '',
'Proxy Speed': '695 ms',
Uptime: '50%\n                \n                 (413)',
'Proxy Country': 'United States\n\n                                             - Chicago',
Anonymity: 'Elite'
},
{
'0': 'document.write("104" + ".16" + "0.1" + "89." + "3");',
'Proxy Port': '80',
'Last Check': '',
'Proxy Speed': '572 ms',
Uptime: '90%\n                \n                 (486)',
'Proxy Country': 'United States\n' +
'\n' +
'                                             - Los Angeles',
Anonymity: 'Elite'
},
{
'0': 'document.write("164" + ".15" + "5.1" + "45." + "0");',
'Proxy Port': '80',
'Last Check': '',
'Proxy Speed': '1038 ms',
Uptime: '58%\n                \n                 (86)',
'Proxy Country': 'United States\n\n                                             - Chicago',
Anonymity: 'Elite'
},
{
'0': 'document.write("164" + ".15" + "5.1" + "47." + "31");',
'Proxy Port': '8

We will clean up the IP address with regular expressions, as it has been rendered as a JavaScript operation string. Then we will remove the 13th element from the JSON array, since it points to an ad that is embedded in the table.

In [74]:
%%node
for (var i = 0; i < data.length; i++) {
    ip = data[i]['0'].replace(/[^0-9\.]+/g,"");
    console.log(ip);
    ip = ip.slice(1);
    data[i]['0'] = ip;
    //console.log(ip);
}
var removed = data.splice(12,1);
data = JSON.stringify(data);
console.log(data);

... ... ... ... ... ... 164.155.150.31
104.160.189.3
164.155.145.0
164.155.147.31
164.155.151.1
47.74.226.8
202.162.194.70
120.194.150.70
43.255.113.232
216.137.184.253
66.94.120.161
80.48.119.28
8.142.142.250
182.61.201.201
61.79.139.30
213.230.97.10
59.11.52.237
121.199.78.228
39.175.75.24
47.104.237.35
112.6.117.135
67.212.186.101
58.20.184.187
195.158.18.236
43.255.113.232
54.175.197.235
183.247.211.50
151.106.18.124
222.65.228.96
106.14.255.124
169.57.1.85
106.158.156.213
64.227.62.123
20.47.108.204
77.50.104.110
[{"0":"64.155.150.31","Proxy Port":"80","Last Check":"","Proxy Speed":"695 ms","Uptime":"50%\n                \n                 (413)","Proxy Country":"United States\n\n                                             - Chicago","Anonymity":"Elite"},{"0":"04.160.189.3","Proxy Port":"80","Last Check":"","Proxy Speed":"572 ms","Uptime":"90%\n                \n                 (486)","Proxy Country":"United States\n\n                                             - Los Angeles","

In [75]:
%%node
var proxies = [];
function addProxies(obj) {
    proxies.push(obj["0"] + ":" + obj["Proxy Port"]);
}
data.forEach(obj => addProxies(obj));
console.log(proxies);

... ...
[
'64.155.150.31:80',    '04.160.189.3:80',
'64.155.145.0:80',     '64.155.147.31:80',
'64.155.151.1:80',     '7.74.226.8:5001',
'02.162.194.70:41766', '20.194.150.70:9091',
'3.255.113.232:8084',  '16.137.184.253:80',
'6.94.120.161:443',    '0.48.119.28:8080',
'82.61.201.201:80',    '1.79.139.30:80',
'13.230.97.10:3128',   '9.11.52.237:80',
'21.199.78.228:8888',  '9.175.75.24:30001',
'7.104.237.35:81',     '12.6.117.135:8085',
'7.212.186.101:80',    '8.20.184.187:9091',
'95.158.18.236:3128',  '3.255.113.232:8081',
'4.175.197.235:80',    '83.247.211.50:30001',
'51.106.18.124:1080',  '22.65.228.96:8085',
'06.14.255.124:80',    '69.57.1.85:8123',
'06.158.156.213:80',   '4.227.62.123:80',
'0.47.108.204:8888',   '7.50.104.110:3128'
]


In [25]:
%%node
function renameKey(obj, oldKey, newKey) {
    console.log("Renaming key");
    obj[newKey] = obj[oldKey];
    delete obj[oldKey];
}

data.forEach(obj => renameKey(obj, "0", "Proxy IP"));
console.log(data);
//const newData = JSON.stringify(data);
//console.log(newData);

... ... ... ...
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
Renaming key
[
{
'Proxy Port': '8000',
'Last Check': '',
'Proxy Speed': '3891 ms',
Uptime: '4%\n                \n                 (68)',
'Proxy Country': 'India\n\n                                             - Mumbai',
Anonymity: 'Elite',
'Proxy IP': undefined
},
{
'Proxy Port': '80',
'Last Check': '',
'Proxy Speed': '1948 ms',
Uptime: '56%\n                \n                 (513)',
'Proxy Country': 'United States',
Anonymity: 'Elite',
'Proxy IP': undefined
},
{
'Proxy Port': '80',
'Last Check': '',
'Proxy Speed': '2344 