# About bitcoins
The bitcoin blockchain stores every address ever used and every transaction between them. Despite the apparent anonymity of this cryptocurrency, there are well-known methods for linking many addresses together on the assumption that they belong to the same owner (a person or a company). Since all transactions between addresses are transparent, we can see how much money has gone from one owner to another (this is called cash flow).

***It is necessary to calculate cash flow between two clusters.***

***Important***

A bitcoin transaction is a set of input and output addresses, together with the number of bitcoins each address contributes to the transaction (for inputs) or takes from it (for outputs). Note that nothing is known about how many bitcoins a particular input address forwards to any particular output. Also, part of the input total goes to miners as a fee when the transaction is included in a new block. The fee (fee >= 0) is the difference between the sum of the transaction's inputs and the sum of its outputs.
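As a small illustration of the fee definition above, the sketch below computes the fee of a single transaction. The frame `tx` and its values are hypothetical sample data in satoshi, and the column names `sent` and `received` are assumed to mean inputs and outputs respectively.

```python
import pandas as pd

# Hypothetical single transaction; all amounts are in satoshi.
tx = pd.DataFrame({
    "address_id": [1, 2, 3, 4],
    "sent": [1_800_000, 1_071_052, 0, 0],        # inputs
    "received": [0, 0, 1_000_000, 1_821_052],    # outputs
})

# Fee = sum of inputs minus sum of outputs.
fee_satoshi = tx["sent"].sum() - tx["received"].sum()
fee_btc = fee_satoshi / 10**8  # 1 bitcoin = 10^8 satoshi

print(fee_satoshi, fee_btc)
```

The fee is a property of the whole transaction, so it is computed over all of its rows rather than per address.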

In [1]:
#library import
import pandas as pd
import numpy as np

#### Table with addresses and clusters

In [2]:
#check table with addresses and clusters
address_clust = pd.read_csv('address_clust.csv')
display(address_clust.head())

Unnamed: 0,address_id,cluster_id
0,71413451,1
1,71411914,1
2,71410369,1
3,71410100,1
4,71410040,1


In [3]:
address_clust.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6996 entries, 0 to 6995
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   address_id  6996 non-null   int64
 1   cluster_id  6996 non-null   int64
dtypes: int64(2)
memory usage: 109.4 KB


In [4]:
#check duplicates
address_clust.duplicated().sum()

0

In [5]:
#check nulls
address_clust.isnull().sum()

address_id    0
cluster_id    0
dtype: int64

#### Conclusion

We have 6996 entries and 2 columns:
- address_id - input and output addresses
- cluster_id - cluster (1 or 2)

No duplicates or nulls; data types are as expected.

#### Table with transactions

In [6]:
#check table with transactions
address_stats = pd.read_csv('address_stats.csv')
display(address_stats.head())

Unnamed: 0,id,address_id,transaction_id,received,sent
0,87134765,46402336,19162323,0.0,1800000.0
1,87134766,45919235,19162323,0.0,1071052.0
2,87134767,46529090,19162323,1000000.0,0.0
3,87134768,46529091,19162323,1821052.0,0.0
4,87154706,46529090,19166856,0.0,1000000.0


In [7]:
address_stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76724 entries, 0 to 76723
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   id              76724 non-null  int64  
 1   address_id      76724 non-null  int64  
 2   transaction_id  76724 non-null  int64  
 3   received        76724 non-null  float64
 4   sent            76724 non-null  float64
dtypes: float64(2), int64(3)
memory usage: 2.9 MB


In [8]:
#check duplicates
#address_stats.duplicated().sum()
address_stats = address_stats.drop_duplicates().reset_index(drop=True)
address_stats.duplicated().sum()

0

In [9]:
#check nulls
address_stats.isnull().sum()

id                0
address_id        0
transaction_id    0
received          0
sent              0
dtype: int64

#### Conclusion
The raw table has 76724 entries and 5 columns:

- id - id 
- address_id - input and output addresses
- transaction_id - transaction
- received - the amount in satoshi (1 bitcoin = 10 ^ 8 satoshi) received by this address in this transaction
- sent - the amount in satoshi sent by this address in this transaction

No nulls; data types are as expected.

address_id also contains addresses that belong to neither cluster 1 nor cluster 2; these are treated as cluster 0 from here on.

There were 22279 duplicate rows, 29% of the table. That is a lot, but each one is an exact copy of another row.
I removed them, because exact copies would inflate the totals.
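To make the effect of exact duplicates concrete, here is a minimal sketch on hypothetical rows (the frame `rows` and its values are made up for illustration). It shows that a fully duplicated row doubles that row's contribution to the `sent` total until it is dropped.

```python
import pandas as pd

# Hypothetical rows; the first two are exact copies of each other.
rows = pd.DataFrame({
    "id": [1, 1, 2],
    "address_id": [10, 10, 11],
    "transaction_id": [100, 100, 100],
    "received": [0.0, 0.0, 500.0],
    "sent": [600.0, 600.0, 0.0],
})

n_dupes = int(rows.duplicated().sum())          # rows that repeat an earlier row
deduped = rows.drop_duplicates().reset_index(drop=True)

sent_before = rows["sent"].sum()
sent_after = deduped["sent"].sum()
print(n_dupes, sent_before, sent_after)
```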

#### Merge tables

In [10]:
#merge tables
new_clusters = address_stats.merge(address_clust, on = 'address_id', how = 'outer')

In [11]:
new_clusters.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54445 entries, 0 to 54444
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   id              54445 non-null  int64  
 1   address_id      54445 non-null  int64  
 2   transaction_id  54445 non-null  int64  
 3   received        54445 non-null  float64
 4   sent            54445 non-null  float64
 5   cluster_id      37625 non-null  float64
dtypes: float64(3), int64(3)
memory usage: 2.9 MB


In [12]:
#assign cluster 0 to addresses without a cluster (NaN)
new_clusters['cluster_id'] = new_clusters['cluster_id'].fillna(0).astype(int)
display(new_clusters.head())

Unnamed: 0,id,address_id,transaction_id,received,sent,cluster_id
0,87134765,46402336,19162323,0.0,1800000.0,0
1,87154708,46402336,19166856,900000.0,0.0,0
2,87335816,46402336,19205422,0.0,900000.0,0
3,88124870,46402336,19374859,5500.0,0.0,0
4,95910795,46402336,21055878,0.0,5500.0,0


In [13]:
new_clusters.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54445 entries, 0 to 54444
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   id              54445 non-null  int64  
 1   address_id      54445 non-null  int64  
 2   transaction_id  54445 non-null  int64  
 3   received        54445 non-null  float64
 4   sent            54445 non-null  float64
 5   cluster_id      54445 non-null  int32  
dtypes: float64(2), int32(1), int64(3)
memory usage: 2.7 MB


#### Conclusion
Now we have 54445 entries and 6 columns:

- id - id
- address_id - input and output addresses
- transaction_id - transaction
- received - the amount in satoshi (1 bitcoin = 10 ^ 8 satoshi) received by this address in this transaction
- sent - the amount in satoshi sent by this address in this transaction
- cluster_id - cluster 0, 1 or 2

No duplicates or nulls; cluster_id is now an integer column.
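The labelling step can also be checked on a toy example. The sketch below is one possible way to do it, assuming a left merge is enough because every row comes from the transactions table; `stats` and `clust` are hypothetical stand-ins for the two real tables. The `indicator=True` option of `merge` adds a `_merge` column that shows which rows found no cluster.

```python
import pandas as pd

# Hypothetical stand-ins for the transactions table and the cluster table.
stats = pd.DataFrame({"address_id": [10, 11, 12], "sent": [5.0, 0.0, 7.0]})
clust = pd.DataFrame({"address_id": [10, 12], "cluster_id": [1, 2]})

# Keep every transaction row; addresses without a cluster get NaN.
merged = stats.merge(clust, on="address_id", how="left", indicator=True)

# Addresses outside clusters 1 and 2 are assigned to cluster 0.
merged["cluster_id"] = merged["cluster_id"].fillna(0).astype(int)

unmatched = int((merged["_merge"] == "left_only").sum())
print(merged[["address_id", "cluster_id"]], unmatched)
```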

#### Calculate bitcoins

In [14]:
#use groupby to get unique transaction_id values
tranzactions = new_clusters.groupby('transaction_id').agg({'id': 'count'}).reset_index()
tranzactions['transaction_id']=tranzactions['transaction_id'].astype(str)
tranzactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13753 entries, 0 to 13752
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   transaction_id  13753 non-null  object
 1   id              13753 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 215.0+ KB


In [24]:
def clusters(row):
    """Calculate how many bitcoins one cluster sent to another within a single transaction.

    Accepts a row with a transaction_id and sums `sent` and `received` per cluster for that
    transaction. The amount sent by the source cluster is weighted by its share of all sent
    funds and by the target cluster's share of all received funds, then rounded.
    Reads the global variables from_clust and to_clust.
    """

    transaction_id = row['transaction_id']
    tranz = new_clusters.query('transaction_id=='+transaction_id)
    s1= tranz.query('cluster_id==1')['sent'].sum() #total sent by cluster 1
    s2=tranz.query('cluster_id==2')['sent'].sum() #total sent by cluster 2
    s0=tranz.query('cluster_id==0')['sent'].sum() #total sent by cluster 0
    r1= tranz.query('cluster_id==1')['received'].sum() #total received by cluster 1
    r2= tranz.query('cluster_id==2')['received'].sum() #total received by cluster 2
    r0= tranz.query('cluster_id==0')['received'].sum() #total received by cluster 0
    
    if from_clust==1 and s1==0: #nothing sent, so the transaction is not counted
        return 0
    if from_clust==2 and s2==0: #nothing sent, so the transaction is not counted
        return 0
    if from_clust==0 and s0==0: #nothing sent, so the transaction is not counted
        return 0
    if to_clust==1 and r1==0: #nothing received, so the transaction is not counted
        return 0
    if to_clust==2 and r2==0: #nothing received, so the transaction is not counted
        return 0
    if to_clust==0 and r0==0: #nothing received, so the transaction is not counted
        return 0    
    
    if from_clust==1: #amount sent by cluster 1 multiplied by cluster 1's share of all sent funds
        result=s1*(s1/(s1+s2+s0))
    if from_clust==2: #amount sent by cluster 2 multiplied by cluster 2's share of all sent funds
        result=s2*(s2/(s1+s2+s0))
    if from_clust==0: #amount sent by cluster 0 multiplied by cluster 0's share of all sent funds
        result=s0*(s0/(s1+s2+s0))
    if to_clust==1: #cluster 1's share of all received funds
        k_rec=r1/(r1+r2+r0)
    if to_clust==2: #cluster 2's share of all received funds
        k_rec=r2/(r1+r2+r0)
    if to_clust==0: #cluster 0's share of all received funds
        k_rec=r0/(r1+r2+r0)
        
    return round(result*k_rec) #weighted sent amount times the received share
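The row-by-row function above filters the full table once per transaction. As a hedged alternative, the sketch below applies the same weighting formula to all transactions at once with `groupby` and `unstack`. The names `cluster_flow` and `toy` are hypothetical and the toy values are made up; this is not part of the original notebook and has not been run against the real data.

```python
import pandas as pd

def cluster_flow(df, from_clust, to_clust):
    """Per-transaction flow (satoshi) using the same weighting as clusters()."""
    sums = df.groupby(["transaction_id", "cluster_id"])[["sent", "received"]].sum()
    # Make sure both requested clusters exist as columns, even if absent from df.
    cols = sorted(set(df["cluster_id"]) | {from_clust, to_clust})
    sent = sums["sent"].unstack(fill_value=0).reindex(columns=cols, fill_value=0)
    received = sums["received"].unstack(fill_value=0).reindex(columns=cols, fill_value=0)

    s_from = sent[from_clust]
    r_to = received[to_clust]
    # sent amount * sender's share of sent funds * receiver's share of received funds
    flow = s_from * (s_from / sent.sum(axis=1)) * (r_to / received.sum(axis=1))
    # 0/0 yields NaN when nothing was sent or received; count those as zero.
    return flow.fillna(0).round()

# Hypothetical toy data: two transactions, amounts in satoshi.
toy = pd.DataFrame({
    "transaction_id": [100, 100, 100, 100, 101, 101],
    "cluster_id":     [1, 0, 2, 0, 0, 0],
    "sent":           [600.0, 400.0, 0.0, 0.0, 50.0, 0.0],
    "received":       [0.0, 0.0, 500.0, 500.0, 0.0, 50.0],
})

flows = cluster_flow(toy, from_clust=1, to_clust=2)
print(flows)
```

Passing the clusters as arguments instead of reading globals also makes it harder to run a cell with stale `from_clust` and `to_clust` values.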

In [16]:
from_clust=1
to_clust=2
tranzactions['column_one'] = tranzactions.apply(clusters, axis=1)
bitcoin_1 = tranzactions.column_one.sum()/ 10**8
print('The sum of bitcoins from cluster 1 to cluster 2: ',bitcoin_1)

The sum of bitcoins from cluster 1 to cluster 2:  64.73247918


In [17]:
from_clust=2
to_clust=1
tranzactions['column_two'] = tranzactions.apply(clusters, axis=1)
bitcoin_2 = tranzactions.column_two.sum()/ 10**8
print('The sum of bitcoins from cluster 2 to cluster 1: ',bitcoin_2)

The sum of bitcoins from cluster 2 to cluster 1:  68.06727618


In [18]:
from_clust=1
to_clust=0
tranzactions['column_three'] = tranzactions.apply(clusters, axis=1)
bitcoin_3 = tranzactions.column_three.sum()/ 10**8
print('The sum of bitcoins from cluster 1 to cluster 0: ',bitcoin_3)

The sum of bitcoins from cluster 1 to cluster 0:  291.67612951


In [19]:
from_clust=0
to_clust=1
tranzactions['column_four'] = tranzactions.apply(clusters, axis=1)
bitcoin_4 = tranzactions.column_four.sum()/ 10**8
print('The sum of bitcoins from cluster 0 to cluster 1: ',bitcoin_4)

The sum of bitcoins from cluster 0 to cluster 1:  289.16274208


In [20]:
from_clust=2
to_clust=0
tranzactions['column_five'] = tranzactions.apply(clusters, axis=1)
bitcoin_5 = tranzactions.column_five.sum()/ 10**8
print('The sum of bitcoins from cluster 2 to cluster 0: ',bitcoin_5)

The sum of bitcoins from cluster 2 to cluster 0:  2.43721255


In [21]:
from_clust=0
to_clust=2
tranzactions['column_six'] = tranzactions.apply(clusters, axis=1)
bitcoin_6 = tranzactions.column_six.sum()/ 10**8
print('The sum of bitcoins from cluster 0 to cluster 2: ',bitcoin_6)

The sum of bitcoins from cluster 0 to cluster 2:  6.10432863


#### Conclusion
The most bitcoins were sent ***from cluster 1 to cluster 0: 291.67612951.***

Close behind is the flow ***from cluster 0 to cluster 1: 289.16274208.***

Next come ***from cluster 2 to cluster 1 (68.06727618)*** and ***from cluster 1 to cluster 2 (64.73247918).***

The smallest flows are ***from cluster 0 to cluster 2 (6.10432863)*** and ***from cluster 2 to cluster 0 (2.43721255).***


Total bitcoins sent:

- from cluster 0 - 295.26707071 bitcoins.

- from cluster 1 - 356.40860869 bitcoins.

- from cluster 2 - 70.50448873 bitcoins.


Total amount: 722.18016813 bitcoins.
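The per-cluster totals above are sums of the six pairwise results. A short sketch of that arithmetic, using the figures printed earlier (the frame `flows` and its column names are introduced here only for illustration):

```python
import pandas as pd

# The six pairwise flows reported above, in bitcoins.
flows = pd.DataFrame(
    [(1, 2, 64.73247918), (2, 1, 68.06727618), (1, 0, 291.67612951),
     (0, 1, 289.16274208), (2, 0, 2.43721255), (0, 2, 6.10432863)],
    columns=["from_cluster", "to_cluster", "btc"],
)

# Total sent per source cluster and overall, rounded to satoshi precision.
sent_by_cluster = flows.groupby("from_cluster")["btc"].sum().round(8)
total = round(flows["btc"].sum(), 8)

print(sent_by_cluster)
print(total)
```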

Because the task is about how many bitcoins were sent, not how many arrived, fees are not taken into account here.

#### Calculate fee

In [22]:
#fee column: sent minus received per transaction and cluster, converted from satoshi to bitcoins
fee = new_clusters.groupby(['transaction_id','cluster_id']).agg({'received':'sum', 'sent':'sum'}).reset_index()
fee['fee'] = (round(fee['sent']-fee['received']))/ 10**8
display(fee.head())

Unnamed: 0,transaction_id,cluster_id,received,sent,fee
0,19162323,0,1821052.0,2871052.0,0.0105
1,19162323,1,1000000.0,0.0,-0.01
2,19166856,0,990000.0,0.0,-0.0099
3,19166856,1,0.0,1000000.0,0.01
4,19174421,0,4590000.0,4800000.0,0.0021


In [29]:
print('Cluster 1 spent on fee:', (fee.query("cluster_id==1 & fee >= 0")['fee']).sum(), 'bitcoins')
print('Cluster 2 spent on fee:', (fee.query("cluster_id==2 & fee >= 0")['fee']).sum(), 'bitcoins')

Cluster 1 spent on fee: 356.63479859999995 bitcoins
Cluster 2 spent on fee: 70.71802861 bitcoins


#### Conclusion
***Cluster 1 spent on fee: 356.63479860 bitcoins, Cluster 2 spent on fee: 70.71802861 bitcoins.***