proxy_scraper

Scrapes "https://free-proxy-list.net/" for proxies and stores them in JSON, CSV, and TXT files for use in scraping projects that may require proxies.
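
As a rough illustration of the three output formats, here is a minimal sketch using only the standard library. The function name save_proxies, the file names, and the "ip"/"port" keys are assumptions for this example and are not taken from proxy_scrape.py, which may lay out its files differently.

```python
import csv
import json
from pathlib import Path


def save_proxies(proxies, out_dir="."):
    """Write proxies to proxies.json, proxies.csv and proxies.txt.

    `proxies` is a list of dicts with exactly the keys "ip" and "port".
    Both the file names and the keys are assumptions for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # JSON: the full records as a list of objects
    (out / "proxies.json").write_text(json.dumps(proxies, indent=2), encoding="utf-8")

    # CSV: a header row followed by one row per proxy
    with open(out / "proxies.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["ip", "port"])
        writer.writeheader()
        writer.writerows(proxies)

    # TXT: one "ip:port" entry per line
    lines = [f"{p['ip']}:{p['port']}" for p in proxies]
    (out / "proxies.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The TXT form is usually the easiest to feed into other tools, while the JSON form is convenient when loading the list back into Python.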

pandas is used because it was a quick way for me to parse the tables.

install libraries

Make sure you have the bs4, lxml, and pandas libraries for Python 3.
You can also run pip install -r requirements.txt or, if you use pipenv, pipenv install -r requirements.txt to get the dependencies.

rotation

Rotation is built in. It tries to make a connection through each proxy, and any proxy that fails to connect is removed.
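
The idea of dropping proxies that fail to connect can be sketched as follows. This is not the code from proxy_rotate.py: the names proxy_works and prune_proxies, the test URL, and the timeout are all assumptions, and the sketch uses urllib from the standard library so it runs without extra dependencies.

```python
import http.client
import urllib.request


def proxy_works(proxy, test_url="https://example.com", timeout=5):
    """Return True if a request through `proxy` ("ip:port") succeeds.

    The test URL and timeout are placeholder values for this sketch.
    """
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and socket errors are OSError subclasses;
        # HTTPException covers malformed replies from a broken proxy.
        return False


def prune_proxies(proxies, check=proxy_works):
    """Return only the proxies for which `check` succeeds, keeping their order."""
    return [p for p in proxies if check(p)]
```

Passing the check as a parameter keeps the pruning step easy to try out without network access, for example with a stub that rejects one known address.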

running

I use this in other projects, so you can just import the scrape() function from proxy_scrape and it will generate the local files. An example is shown in test.py.
proxy_rotate.py rotates proxies and headers for the URL you are interested in and returns a bs4 object. Import the function and run it the same way.
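
To show what rotating proxies and headers means in practice, here is a minimal sketch that picks a random proxy and a random User-Agent for each request. The function name build_request and the USER_AGENTS list are hypothetical and not taken from proxy_rotate.py, and the sketch stops before the fetch and bs4 parsing step.

```python
import random
import urllib.request

# Illustrative User-Agent strings; any realistic list would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]


def build_request(url, proxies, user_agents=USER_AGENTS, rng=random):
    """Pick a random proxy and User-Agent and return (opener, request, proxy).

    `proxies` is a list of "ip:port" strings. `rng` can be any object with
    a choice() method, for example random.Random(seed) for repeatable picks.
    """
    proxy = rng.choice(proxies)
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(
        url, headers={"User-Agent": rng.choice(user_agents)}
    )
    return opener, request, proxy
```

Calling opener.open(request, timeout=10).read() would fetch the page through the chosen proxy, and the resulting HTML could then be passed to BeautifulSoup(html, "lxml") to get a bs4 object.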

Docker

There is a Dockerfile, but I still need to test it.


epeenphrine/proxy_scraper

Folders and files

Latest commit

History

Repository files navigation

proxy_scraper

install libraries

rotation

running

Docker

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages