Bitcoin Core Web Scraper

A script for web scraping and downloading the Bitcoin Core bin directory.
Ideal for creating your own mirror!

Usage

Dependencies

Run-time dependencies:

Python3 + pip (python3 python3-dev python3-pip)
Additional libs for Scrapy (libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev)

Further packages are installed via pip; see the next section.

Prepare

I advise you to use a Python virtual environment. Create and activate one via:

python3 -m venv env
source env/bin/activate

Next, install the required packages via:

pip install -r requirements.txt

Run scraper

Run the scraper and start downloading:

scrapy crawl bitcoincore

Or by running: ./start_spider.py

Note: Files are stored in the bin sub-folder of this project's root folder.

Optionally, run the scraper and write the metadata to a "feed" file (e.g. a JSON file):

scrapy crawl bitcoincore -O bitcoincore.json
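
Scrapy's JSON feed export writes a single JSON array with one object per scraped item. The following is a minimal sketch for inspecting such a feed with the Python standard library. The helper name summarize_feed is hypothetical, and because the item fields produced by this spider are not documented here, the sketch only counts items and reports whichever field names it finds.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_feed(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON feed.

    Hypothetical helper: it assumes the feed is a JSON array of objects,
    which is what Scrapy's JSON exporter writes.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    fields = Counter()
    for item in items:
        if isinstance(item, dict):
            # Count in how many items each field name occurs.
            fields.update(item.keys())
    return len(items), fields


if __name__ == "__main__":
    feed = Path("bitcoincore.json")  # file name taken from the command above
    if feed.exists():
        count, fields = summarize_feed(feed)
        print(f"{count} items")
        for name, seen in fields.most_common():
            print(f"  {name}: present in {seen} items")
```

Run it from the project root after the crawl has finished. If the feed file does not exist yet, the script prints nothing.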

Docker Image

The Docker image is available on Docker Hub.

Note: The Docker image starts the crawler via a cron job, so the bitcoin spider runs automatically once a week.

A docker-compose file (bitcoinscraper-compose.yml) is provided for convenience.

Building Docker image

Create a Docker image locally using:

docker build -t danger89/bitcoinscraper .

Learn & Debug

You can use the Scrapy shell to help with debugging, or to learn how to extract data with Scrapy:

scrapy shell 'https://bitcoincore.org/bin/'

Inspect the response object for data, for example:

response.css('pre a')[3].get()
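
The selector 'pre a' matches anchor elements nested inside a pre element, which is how the directory listing is addressed in the example above. To experiment with that idea outside the Scrapy shell, here is a rough sketch that uses only the standard library's html.parser as a stand-in for Scrapy's selector. The class and function names and the sample HTML are made up for illustration and are not the real page content.

```python
from html.parser import HTMLParser


class PreLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <pre>, similar to 'pre a'."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0  # greater than zero while inside a <pre> element
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "a" and self.pre_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1


def extract_pre_links(html):
    """Return the href of every anchor found inside a <pre> block."""
    parser = PreLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample listing, only for demonstration.
SAMPLE_HTML = """
<html><body>
<a href="/outside">outside</a>
<pre><a href="../">../</a>
<a href="example-1.0/">example-1.0/</a>
<a href="example-2.0/">example-2.0/</a>
</pre>
</body></html>
"""

links = extract_pre_links(SAMPLE_HTML)
print(links)  # ['../', 'example-1.0/', 'example-2.0/']
```

Inside the Scrapy shell itself, a comparable result should be available with response.css('pre a::attr(href)').getall(), which returns the href values as a list of strings.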

External Links

More info:

Scrapy homepage
Scrapy Tutorial docs (ideal for beginners)
APScheduler Cron docs

