# Analyzing a Portion of BTC Data

The analysis uses a small sample (roughly 30%) of the first DAT file.

Nodes: 545,889
Relationships: 741,060
Labels: {block, coinbase, tx, output, address} (5 total)
Relationship types: {reward, seeds, includes, out, unlock, locked, chain} (7 total)
Properties: 18 total

## Setup

In [91]:
# Importing all necessary libraries
import json
from neo4j import GraphDatabase
from py2neo import Graph
from graphdatascience import GraphDataScience 
import pandas as pd

In [36]:
# graph = Graph(uri='neo4j://localhost:7687', user="neo4j", password="password")
# dbc = GraphDatabase.driver(uri = "bolt://localhost:7687", auth=("neo4j", "password"))
# sess = dbc.session(database="neo4j")

# Creating graph data science object
gds = GraphDataScience("bolt://localhost:7687" , auth=("neo4j", "password"))

In [38]:
# Select the database and check the version of the GDS library
gds.set_database("neo4j")
print(gds.version())

2.0.3


In [60]:
# Project the whole database (all labels, all relationship types, undirected)
# into an in-memory graph so we can work with it from Python
relProj = {
    "relType": {
        "type": '*',
        "orientation": 'UNDIRECTED',
        "properties": {}
    }
}

G, project_result = gds.graph.project("btc_graph", 
                                      "*", 
                                      relProj)

In [None]:
# Can also retrieve projected in-memory graphs with this command
# G = gds.graph.get("graphName")

In [55]:
# Graph metadata
project_result

nodeProjection                {'__ALL__': {'label': '*', 'properties': {}}}
relationshipProjection    {'relType': {'orientation': 'UNDIRECTED', 'agg...
graphName                                                         btc_graph
nodeCount                                                            545889
relationshipCount                                                   1482120
projectMillis                                                           684
Name: 0, dtype: object

In [56]:
# Other useful information
print(G.memory_usage())
print(G.density())

18 MiB
4.973652941144905e-06
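
The figures above can be cross-checked by hand. The sketch below rests on two assumptions inferred from the numbers shown rather than taken from the source: that an `UNDIRECTED` projection counts each stored relationship in both directions, and that density is the relationship count divided by n(n - 1). The variable names are illustrative only.

```python
# Counts taken from the dataset summary at the top of the notebook
node_count = 545_889
stored_relationships = 741_060

# Assumption: an UNDIRECTED projection counts every relationship in both directions
projected_relationships = 2 * stored_relationships

# Assumption: density = relationships / (n * (n - 1))
density = projected_relationships / (node_count * (node_count - 1))

print(projected_relationships)
print(density)
```

Under these assumptions the two printed values match `relationshipCount` in the projection result and, up to floating-point rounding, the value returned by `G.density()`.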


## Louvain CD Algorithm

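Louvain is one of the fastest modularity-based community detection (CD) algorithms, and it also reveals a hierarchy of communities at different scales. Here `includeIntermediateCommunities = False`, so only the final level of that hierarchy is returned.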


In [99]:
# Running the Louvain CD algorithm
louvain_df = gds.louvain.stream(G,  relationshipWeightProperty = None, includeIntermediateCommunities = False)

In [100]:
# Creating a dataframe with the size of each community
uniq_coms = sorted(list(set(louvain_df["communityId"])))

com_list = []
size_list = []

for each in uniq_coms:
    com = louvain_df[louvain_df.communityId == each]
    com_list.append(com['nodeId'])
    
for each in com_list:
    size_list.append(len(each))

com_size = pd.DataFrame({'communityId': uniq_coms, 'communitySize' : size_list}).sort_values("communitySize", ascending=False)
com_size.head(10)

      communityId  communitySize
5263       364460          15739
5371       388010          15602
4632       276310          13665
5659       494347          12865
5783       538533          12119
5517       439165          11898
5713       522408          11375
2281        92974           9513
5809       542184           9151
5171       343238           8835


In [102]:
totCom = len(com_size)
totCom

5856
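
The community-size bookkeeping is the same for every algorithm in this notebook, so it could be factored into a small helper. The following is a minimal sketch, not the code used to produce the results here; the function name `community_sizes` and the toy IDs are hypothetical. It relies only on `collections.Counter`, which accepts any iterable of community IDs, including a DataFrame column.

```python
from collections import Counter

def community_sizes(community_ids):
    """Return (communityId, size) pairs, largest community first."""
    return Counter(community_ids).most_common()

# Toy stand-in for the communityId column of a stream result (made-up values)
toy_ids = [7, 7, 7, 9, 9, 4]
sizes = community_sizes(toy_ids)

print(sizes)       # [(7, 3), (9, 2), (4, 1)]
print(len(sizes))  # 3 communities in total
```

In the notebook, passing `louvain_df["communityId"]` to the helper should give the same sizes as the loops above, and `len(...)` of the result the total number of communities.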

## Modularity Optimization CD Algorithm

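Modularity Optimization detects communities by assigning nodes to the communities that maximize modularity, a score that compares the density of relationships inside communities with what would be expected at random.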
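
To make the modularity score concrete, it can be computed by hand on a toy graph. The sketch below is not the GDS implementation; the function name `modularity`, the edge list, and both partitions are hypothetical. It uses the standard definition for an unweighted, undirected graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph: two triangles joined by a single bridge edge
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "a" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_single = modularity(edges, single)  # everything in one community
print(q_split, q_single)
```

Splitting the toy graph along the bridge scores higher than lumping all nodes together, which is the kind of preference a modularity-based algorithm acts on.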


In [103]:
# Running the Modularity Optimization CD algorithm
modOp_df = gds.beta.modularityOptimization.stream(G,  relationshipWeightProperty = None, maxIterations = 10, tolerance = .0001)

In [110]:
# Creating a dataframe with the size of each community
uniq_coms = sorted(list(set(modOp_df["communityId"])))

com_list = []
size_list = []

for each in uniq_coms:
    com = modOp_df[modOp_df.communityId == each]
    com_list.append(com['nodeId'])
    
for each in com_list:
    size_list.append(len(each))

com_size = pd.DataFrame({'communityId': uniq_coms, 'communitySize' : size_list}).sort_values("communitySize", ascending=False)
com_size.head(10)

        communityId  communitySize
87215        308161             54
59834        213874             40
70825        256188             38
98029        343238             32
97649        341994             29
69818        252434             18
149619       489837             17
117823       404936             16
146120       479094             14
69471        251087             14


In [113]:
totCom = len(com_size)
totCom

167296

## Label Propagation CD Algorithm

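Label Propagation is a fast algorithm for finding communities in a graph: each node repeatedly adopts the label most common among its neighbours until the labels stabilize.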
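
The idea behind label propagation can be sketched in a few lines of plain Python. This is an illustration only, not the GDS implementation: the function name `label_propagation`, the toy graph, the fixed node order, and the tie-breaking rule (largest label wins) are all assumptions made so that the result is reproducible. Implementations commonly break ties at random instead.

```python
def label_propagation(adjacency, max_iterations=10):
    """Each node repeatedly adopts the most frequent label among its neighbours."""
    labels = {node: node for node in adjacency}  # start with unique labels
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adjacency):  # fixed order for reproducibility
            counts = {}
            for neighbour in adjacency[node]:
                label = labels[neighbour]
                counts[label] = counts.get(label, 0) + 1
            if not counts:  # isolated node keeps its own label
                continue
            best = max(counts.values())
            # Assumption: ties are broken by taking the largest label
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged
            break
    return labels

# Toy graph: two 4-cliques (nodes 0-3 and 4-7) joined by the edge (3, 4)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
adjacency = {node: set() for node in range(8)}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

labels = label_propagation(adjacency)
print(labels)
```

On this toy graph each clique settles on a single label, so the two cliques come out as two communities.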


In [111]:
# Running the Label Propagation CD algorithm
labProp_df = gds.labelPropagation.stream(G,  relationshipWeightProperty = None)

In [114]:
# Creating a dataframe with the size of each community
uniq_coms = sorted(list(set(labProp_df["communityId"])))

com_list = []
size_list = []

for each in uniq_coms:
    com = labProp_df[labProp_df.communityId == each]
    com_list.append(com['nodeId'])
    
for each in com_list:
    size_list.append(len(each))

com_size = pd.DataFrame({'communityId': uniq_coms, 'communitySize' : size_list}).sort_values("communitySize", ascending=False)
com_size.head(10)

       communityId  communitySize
634          19334           6385
342           6917           4240
13990       253031           3631
404           9410           3272
14659       261580           2447
19874       354491           2407
30             386           2309
406           9448           2303
1034         31477           2248
14661       265619           1937


In [115]:
totCom = len(com_size)
totCom

28096
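
For a rough side-by-side view of the three runs, the community totals reported above can be turned into mean community sizes. The sketch below only reuses numbers already shown in this notebook; it assumes that each algorithm assigns every one of the 545,889 nodes to exactly one community, and the variable names are illustrative.

```python
# Totals reported by the three runs above
node_count = 545_889
community_counts = {
    "Louvain": 5_856,
    "Modularity Optimization": 167_296,
    "Label Propagation": 28_096,
}

# Assumption: every node belongs to exactly one community
mean_sizes = {name: node_count / count
              for name, count in community_counts.items()}

for name, mean in mean_sizes.items():
    print(f"{name}: {community_counts[name]} communities, mean size {mean:.1f}")
```

On these numbers Louvain produces the coarsest partition and Modularity Optimization the finest, with Label Propagation in between.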