Bitcoin to Neo4j
See the cypher examples for cool screenshots.
This script runs through a bitcoin blockchain and inserts it into a Neo4j graph database.
I use this script to power my bitcoin blockchain browser: http://learnmeabitcoin.com/browser
- The resulting Neo4j database is roughly 6x the size of the blockchain. So if the blockchain is 100GB, your Neo4j database will be 600GB.
- It may take 60+ days to finish importing the entire blockchain. Instead of doing a bulk import of the entire blockchain, this script runs through each blk.dat file and inserts each block and transaction it encounters. So whilst it takes "a while" for an initial import, when it's complete it will continuously add new blocks as they arrive.
Nonetheless, you can still browse whatever is in the database whilst this script is running.
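To give a rough idea of what "running through a blk.dat file" involves, here is a minimal Python sketch. It is not the PHP code this script uses, and the function names `read_blocks` and `block_hash` are made up for illustration. It relies only on the on-disk layout Bitcoin Core uses: each record is 4 magic bytes, a 4-byte little-endian length, then the raw block, whose first 80 bytes are the header.

```python
import hashlib
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # magic bytes that prefix every mainnet block record


def read_blocks(stream, magic=MAINNET_MAGIC):
    """Yield (offset, raw_block) for each complete block record in a blk.dat stream."""
    while True:
        offset = stream.tell()
        prefix = stream.read(8)
        if len(prefix) < 8 or prefix[:4] != magic:
            return  # end of file, zero padding, or a record that is not written yet
        (size,) = struct.unpack("<I", prefix[4:])  # block length, little-endian
        raw_block = stream.read(size)
        if len(raw_block) < size:
            return  # block only partially written so far
        yield offset, raw_block


def block_hash(raw_block):
    """Return the block hash: double SHA-256 of the 80-byte header, byte-reversed, as hex."""
    header = raw_block[:80]
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


if __name__ == "__main__":
    # Tiny fake file: one record holding an all-zero 80-byte "header".
    fake = MAINNET_MAGIC + struct.pack("<I", 80) + bytes(80)
    for offset, raw in read_blocks(io.BytesIO(fake)):
        print(offset, len(raw), block_hash(raw))
```

A real importer would go on to decode the transactions that follow the header; the sketch stops at locating each block and computing its hash.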
I have only used this on Linux (Ubuntu).
It should work on OSX and Windows, but I haven't got installation instructions for those.
This script makes use of the following software:
    sudo add-apt-repository ppa:bitcoin/bitcoin
    sudo apt update
    sudo apt install bitcoind
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt update
    sudo apt install oracle-java8-installer

    wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
    echo 'deb http://debian.neo4j.org/repo stable/' >/tmp/neo4j.list
    sudo mv /tmp/neo4j.list /etc/apt/sources.list.d
    sudo apt update
    sudo apt install neo4j
- PHP 7.0+ - The main script and its library functions are written in PHP.
    # The extra php7.0-* libraries are needed for this script to run.
    sudo apt install php7.0 php7.0-dev php7.0-gmp php7.0-curl php7.0-bcmath php7.0-mbstring
- Redis 3.2+ - This is used for storing the state of the import, so that the script can be stopped and started at any time.
    sudo apt install build-essential
    cd /usr/local/share
    sudo wget http://download.redis.io/releases/redis-stable.tar.gz
    sudo tar -xvzf redis-stable.tar.gz
    sudo rm redis-stable.tar.gz
    cd redis-stable
    cd deps
    sudo make geohash-int jemalloc lua hiredis linenoise
    cd ..
    sudo make
    sudo make install
    cd utils
    sudo ./install_server.sh
This is the driver that allows PHP to connect to your Neo4j database. I have included a
composer.json file, so navigate to the project's home directory and install it with:
This allows PHP to connect to Redis. These instructions should install the version needed for PHP 7 (which is different to the default installation instructions that come with phpredis, which are aimed at PHP 5).
    # This is needed for phpize (used in a moment)
    sudo apt install php7.0-dev

    # Install phpredis
    cd /usr/local/share
    sudo wget https://github.com/phpredis/phpredis/archive/php7.zip
    sudo unzip php7.zip
    sudo rm php7.zip
    cd phpredis-php7/
    sudo phpize
    # phpize only generates the configure script; run it to create the Makefile
    sudo ./configure
    sudo make
    sudo make install

    # Install mod
    sudo touch /etc/php/7.0/mods-available/redis.ini
    sudo bash -c "echo extension=redis.so > /etc/php/7.0/mods-available/redis.ini"
    sudo ln -s /etc/php/7.0/mods-available/redis.ini /etc/php/7.0/cli/conf.d/20-redis.ini
The config.php file contains all the configuration settings. You probably only need to change:
- The location of your
- Your Neo4j username and password.
    define("BLOCKS", '/home/user/.bitcoin/blocks'); // the location of the blk.dat files you want to read
    define("TESTNET", false); // are you reading blk.dat files from Bitcoin's testnet?

    define("NEO4J_USER", 'neo4j');
    define("NEO4J_PASS", 'neo4j');
    define("NEO4J_IP", 'localhost');
    define("NEO4J_PORT", '7687'); // this is the port used for the bolt protocol

    define("REDIS_IP", 'localhost');
    define("REDIS_PORT", '6379');
Make sure Neo4j is running (sudo service neo4j start), then start running the script with:
This will start importing into Neo4j, printing out the results as it goes.
Here's an annotated explanation of the results
You can stop and restart the script at any time, as the script stores its position using Redis.
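The stop-and-restart behaviour boils down to saving a checkpoint after every block. The Python sketch below shows the idea only; it is not the script's PHP code. `DictStore` is an in-memory stand-in for a Redis hash, the field names `file` and `position` are hypothetical, and fixed-size records are used in place of real blocks to keep it short.

```python
import io

STATE_KEY = "bitcoin-to-neo4j"  # the Redis key this script uses for its position


class DictStore:
    """In-memory stand-in for a Redis hash, so the sketch runs without a server."""

    def __init__(self):
        self._hashes = {}

    def hset(self, key, mapping):
        self._hashes.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        return dict(self._hashes.get(key, {}))


def save_position(store, file_number, offset):
    # "file" and "position" are hypothetical field names chosen for this sketch.
    store.hset(STATE_KEY, {"file": file_number, "position": offset})


def load_position(store):
    state = store.hgetall(STATE_KEY)
    return int(state.get("file", 0)), int(state.get("position", 0))


def import_records(store, stream, record_size, handle):
    """Process records from the saved offset onwards, checkpointing after each one."""
    file_number, offset = load_position(store)
    stream.seek(offset)
    while True:
        record = stream.read(record_size)
        if len(record) < record_size:
            break  # nothing complete left to read
        handle(record)
        save_position(store, file_number, stream.tell())


if __name__ == "__main__":
    store = DictStore()
    import_records(store, io.BytesIO(b"aaaabbbb"), 4, print)
    print(load_position(store))
```

Because the position is only saved after a record has been handled, a crash in the middle of a record means that record is processed again on restart rather than skipped.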
The script sets the following keys in Redis:
- bitcoin-to-neo4j - This stores the number of the current blk.dat file, and its position in that file.
- bitcoin-to-neo4j:orphans - This stores the blockhashes of orphan blocks. You see, the blocks in the blk.dat files are not stored in order (based on their height), so by saving blocks that we cannot calculate a height for yet (because we haven't encountered the block it builds upon), we are able to set the height later on.
- bitcoin-to-neo4j:tip - This is the height of the current longest chain we have got in Neo4j. It's not needed for the script to work, but it's useful for seeing the progress of the script.
When Redis is installed, you can look at each of these with:
    redis-cli hgetall bitcoin-to-neo4j
    redis-cli hgetall bitcoin-to-neo4j:orphans
    redis-cli hgetall bitcoin-to-neo4j:tip
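The orphan handling described above can be sketched in a few lines of Python. This is an illustration of the general technique, not the script's actual PHP logic: `assign_heights` is a hypothetical helper, and blocks are reduced to (hash, previous hash) pairs. A block whose parent is unknown is parked; once the parent gets a height, every block waiting on it gets one too.

```python
def assign_heights(blocks, genesis_hash):
    """Assign heights to blocks that arrive out of order.

    blocks: iterable of (block_hash, prev_hash) pairs in the order they are read.
    Returns (heights, orphans): a dict of hash -> height, and the hashes still
    waiting for a parent that never showed up.
    """
    heights = {}
    waiting = {}  # prev_hash -> hashes of blocks waiting for that parent

    for block_hash, prev_hash in blocks:
        if block_hash == genesis_hash:
            heights[block_hash] = 0
        elif prev_hash in heights:
            heights[block_hash] = heights[prev_hash] + 1
        else:
            # Parent not seen yet: park the block as an orphan for now.
            waiting.setdefault(prev_hash, []).append(block_hash)
            continue

        # This block now has a height, so resolve any orphans built on top of it.
        stack = [block_hash]
        while stack:
            parent = stack.pop()
            for child in waiting.pop(parent, []):
                heights[child] = heights[parent] + 1
                stack.append(child)

    orphans = [h for children in waiting.values() for h in children]
    return heights, orphans


if __name__ == "__main__":
    order_on_disk = [("g", None), ("c", "b"), ("b", "a"), ("a", "g")]
    heights, orphans = assign_heights(order_on_disk, genesis_hash="g")
    print(heights, orphans, max(heights.values()))
```

The largest value in `heights` plays the role of the tip: the height of the longest chain seen so far.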
How can I query this database?
Here are some example cypher queries, including some screenshots.
What are the hardware requirements?
- A really ****ing big SSD.
Other than that, I run this on my Thinkpad X220 (8GB RAM, 4x2.60GHz CPU) without any problems. It took about 2 weeks to import the full testnet blockchain (50GB total), but my laptop didn't explode.
However, if you want to help things along:
- Make sure you're using an SSD for fast write speeds.
- Give as much RAM to Neo4j as possible. This helps with looking up existing nodes in the database, which this script does continually as it merges new transactions on to old ones.
- Heap Size: I think a minimum 4GB does the trick.
- Page Cache: Whatever RAM you have got left over.
CPU isn't much of a factor in comparison to RAM and a fast disk.
See Neo4j Performance for more details.
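As a rough way to turn the heap and page cache advice into numbers, here is a small Python sketch. The setting names are the memory settings found in Neo4j 3.x's neo4j.conf (check the names for your version), and the 1GB held back for the operating system is an assumption of this sketch, not a recommendation from Neo4j.

```python
def neo4j_memory_settings(total_ram_gb, heap_gb=4, os_reserve_gb=1):
    """Split RAM into a fixed heap and a page cache made of whatever is left over.

    os_reserve_gb is an assumed allowance for the operating system and other processes.
    """
    page_cache_gb = total_ram_gb - heap_gb - os_reserve_gb
    if page_cache_gb < 1:
        raise ValueError("not enough RAM left over for a page cache")
    return {
        "dbms.memory.heap.initial_size": f"{heap_gb}g",
        "dbms.memory.heap.max_size": f"{heap_gb}g",
        "dbms.memory.pagecache.size": f"{page_cache_gb}g",
    }


if __name__ == "__main__":
    # Print the lines in neo4j.conf format for a machine with 8GB of RAM.
    for name, value in neo4j_memory_settings(8).items():
        print(f"{name}={value}")
```

On an 8GB machine this leaves a 3GB page cache, which is small next to a database of several hundred GB, so more RAM helps a lot.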
How big is this graph database?
It's constantly growing, but as of 17 May 2017 (blockchain height: 466,874, blockchain size: 114GB):
- Nodes: 1,587,199,550
- Relationships: 2,503,359,310
- Size: 625 GB
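These figures can be used for a back-of-the-envelope estimate of how big your own database will get. The Python sketch below assumes the database-to-blockchain size ratio stays roughly constant as the chain grows, which is a simplification; `estimate_database_gb` is a hypothetical helper.

```python
# Figures from 17 May 2017, as listed above.
BLOCK_HEIGHT = 466_874
BLOCKCHAIN_GB = 114
DATABASE_GB = 625
NODES = 1_587_199_550
RELATIONSHIPS = 2_503_359_310

SIZE_RATIO = DATABASE_GB / BLOCKCHAIN_GB      # database size per GB of blockchain
NODES_PER_BLOCK = NODES / BLOCK_HEIGHT        # average nodes created per block
RELS_PER_NODE = RELATIONSHIPS / NODES         # average relationships per node


def estimate_database_gb(blockchain_gb, ratio=SIZE_RATIO):
    """Estimate the Neo4j database size for a given blockchain size (assumes a constant ratio)."""
    return blockchain_gb * ratio


if __name__ == "__main__":
    print(f"size ratio: {SIZE_RATIO:.2f}")
    print(f"nodes per block: {NODES_PER_BLOCK:.0f}")
    print(f"relationships per node: {RELS_PER_NODE:.2f}")
    print(f"estimate for a 200GB blockchain: {estimate_database_gb(200):.0f} GB")
```

The measured ratio works out to about 5.5, in line with the "roughly 6x" rule of thumb given earlier.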
Does this import the entire blockchain?
Yes, no data is left behind. If you really wanted to, you could convert the data back into binary as it is found in the raw
For example, the "serialized" transaction data on my explorer is actually data from the graph converted back into its original format: Transaction: be56667fed4336efc08c6a1addfba0008169861af906e7f436ffcc86935d2b2e (click on "serialized" in the top-right)
Why doesn't this use Neo4j's Bulk Import Tool?
Because I needed a script that would add blocks as they arrived.
It would involve writing another tool for a bulk import. I haven't tried.
Why is this written in PHP?
Because it's the language I knew best when I started this.
Or in other words, I'm not the king of programming, and PHP does the job.