# Bitcoin - Complex Graph Analysis

## Steps
1. [Parse blockchain](#1.-Parse-blockchain-using-btcgraph.py)
2. [Data preparation (BigQuery)](#2.-BigQuery)  
    2.1. [Upload data](#2.1.-Upload-data-to-Google-Big-Query)  
    2.2. [BigQuery](#2.2.-BigQuery)

3. [Build graph](#3.-Build-Graph-using-NetworKit)
4. [Analysis](#4.-Analysis)  
    4.1. The graph object  
    4.2. Connected Components  
    4.3. Degree distribution

***

## 1. Parse blockchain using btcgraph.py

In [5]:
%run run.py -loc data -raw 1 -wt 1

Starting btc graph version 0.1.0 with the following arguments:
current wd:       /home/nero/python/wu/btc/python-bitcoin-graph
startfile:        blk00000.dat 
endfile:          deactivated
starttx:          deactivated
endtx:            deactivated
endts:            deactivated
blklocation:      data         
format:           deactivated
rawedges:         activated
withts:           activated
googlebigquery:   deactivated

Initializing...
[########################################]

10:56:40  -  New BtcGraph initialized
10:56:41  -  Start building...
10:56:41  -  Block File # 0/4
10:56:41  -  Processing data/blk00000.dat
10:57:09  -  Graph has       18,216,888 bytes
10:57:09  -  Graph has               17 mb
10:57:09  -  ---  -------   ----------------------------
10:57:09  -  -->  23.2 GB   total memory
10:57:09  -  -->  16.6 GB   of memory available
10:57:09  -  -->   5.7 GB   memory used

***

## 2. BigQuery

### 2.1. Upload data to Google Big Query 

In [16]:
%run run.py -gbq 1

startfile:        deactivated
endfile:          deactivated
starttx:          deactivated
endtx:            deactivated
endts:            deactivated
blklocation:      deactivated
format:           deactivated
rawedges:         deactivated
withts:           deactivated
googlebigquery:   activated

Initializing...
[########################################]

raw_blk_0.csv      successfully uploaded   
raw_blk_1.csv      successfully uploaded   
raw_blk_2.csv      successfully uploaded   
[###]
-----------------------------------------


### 2.2. BigQuery
   * Fetch distinct addresses
   * Create index ID for each address
   * Create edge list from transactions with new indexes

### 2.3. Download edge list from Google Cloud Storage as CSV

In [11]:
import os
from google.cloud import storage

credentials_path = ".gcpkey/wu-btcgraph.json"
bucket_name      = "wu-bitcoin"
file_names       = ["btc.csv"]  # objects to download from the bucket
target_folder    = "graph"

# Create the local target folder if it does not exist yet
os.makedirs(target_folder, exist_ok=True)

# Point the client library at the service account key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credentials_path

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
for file_name in file_names:
    target_path = os.path.join(target_folder, file_name)
    blob = bucket.blob(file_name)
    blob.download_to_filename(target_path)
    print(f"{bucket_name}/{file_name} copied to {target_path}")


***

## 3. Build Graph using NetworKit

In [None]:
import networkit as nk

file_location = "test.csv"
# Comma-separated edge list, node IDs start at 0, edges are directed
G = nk.graphio.EdgeListReader(',', 0, directed=True).read(file_location)

***

## 4. Analysis

In [21]:
%matplotlib inline
import matplotlib.pyplot as plt
import networkit as nk

### 4.1. The graph object

In [4]:
nodes = G.numberOfNodes()
edges = G.numberOfEdges()
print(f"G has {nodes} nodes and {edges} edges")


### 4.2. Connected components

In [2]:
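# G is directed, so count weakly connected components
# (ConnectedComponents only accepts undirected graphs)
cc = nk.components.WeaklyConnectedComponents(G)
cc.run()
print("number of components:", cc.numberOfComponents())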
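
For a directed edge list like the one built in section 3, components are usually counted on the undirected view of the graph (weak connectivity). The sketch below shows the idea with a small union-find structure in plain Python. It is not how NetworKit implements it; `count_weak_components` and the sample edges are made up for illustration.

```python
def count_weak_components(num_nodes, edges):
    """Count weakly connected components with union-find.

    Edge direction is ignored: (u, v) and (v, u) both join u and v.
    Node IDs are assumed to be integers in range(num_nodes).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Walk up to the root, shortening the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes  # every node starts as its own component
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
            components -= 1
    return components


# Hypothetical graph: {0, 1, 2}, {3, 4} and the isolated node 5
sample_edges = [(0, 1), (1, 2), (3, 4)]
n_components = count_weak_components(6, sample_edges)
print("number of components:", n_components)
```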

### 4.3. Degree distribution

In [5]:
dd = sorted(nk.centrality.DegreeCentrality(G).run().scores(), reverse=True)
plt.xscale("log")
plt.xlabel("degree")
plt.yscale("log")
plt.ylabel("number of nodes")
plt.plot(dd)
plt.show()
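
The plot above draws the sorted degree sequence, so each point is one node. A degree distribution in the stricter sense counts how many nodes share each degree. A minimal sketch with the standard library follows; `degree_histogram` and the sample values are hypothetical, and in the notebook the input would be the `dd` list.

```python
from collections import Counter


def degree_histogram(degrees):
    """Return sorted (degree, number_of_nodes) pairs."""
    # Centrality scores come back as floats, so cast to int before counting
    counts = Counter(int(d) for d in degrees)
    return sorted(counts.items())


# Hypothetical degree sequence, not taken from the Bitcoin graph
sample_degrees = [3.0, 1.0, 1.0, 2.0, 1.0, 0.0, 2.0]
hist = degree_histogram(sample_degrees)
for degree, count in hist:
    print(f"degree {degree}: {count} node(s)")
```

To plot the result on log-log axes, drop the entry for degree 0 first, since it has no position on a logarithmic axis.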

In [17]:
try:
    import powerlaw
    # Fit a power law to the degree sequence computed above
    fit = powerlaw.Fit(dd)
except ImportError:
    print("Module powerlaw could not be loaded")
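
To make the fit less of a black box, the sketch below computes the standard maximum likelihood estimate of the exponent for continuous data, alpha = 1 + n / sum(ln(x_i / xmin)), over all values at or above a given `xmin`. It is a simplification: the `powerlaw` package also searches for `xmin` and treats discrete data separately, which this example does not attempt. The function name `estimate_alpha` is an assumption for illustration.

```python
import math


def estimate_alpha(values, xmin):
    """Continuous maximum likelihood estimate of a power-law exponent.

    Only values >= xmin enter the estimate. xmin must be positive and is
    supplied by the caller; it is not optimized here.
    """
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        # Empty tail, or every value equals xmin: exponent is undefined
        raise ValueError("need at least one value above xmin")
    return 1.0 + len(tail) / log_sum


# Hypothetical values chosen so the logarithms sum to exactly 3
sample = [1.0, math.e, math.e ** 2]
alpha = estimate_alpha(sample, xmin=1.0)
print(f"estimated alpha: {alpha:.3f}")
```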


In [6]:
bucket = storage_client.bucket("wu-bitcoin")

In [9]:
blob = bucket.blob("btc.csv")
blob.download_to_filename("btcttttttttt.csv")

In [20]:
from networkit import *

In [18]:
g = graph.Graph()

In [19]:
g.numberOfEdges()

0

In [22]:
import pandas as pd

In [25]:
pd.DataFrame([(1,2,3),(3,4,5)], columns=["ts", "from", "to"])

|   | ts | from | to |
|---|----|------|----|
| 0 | 1  | 2    | 3  |
| 1 | 3  | 4    | 5  |
