# CoinMetrics Case Study

Objective - to evaluate skills and abilities in multiple ways:
1. importing data
2. wrangling data
3. exploring data
4. analysis
5. modelling
6. communicating results

Provide:
1. A written explanation of how to approach the problem
2. The beginning phases of an implementation using CoinMetrics data

Of the four options made available in the case study, option 3 was chosen.

### Advocating for CoinMetrics' data

Produce quality research that is of value to potential clients (doesn't have to be complete), with a particular focus on network data.

### Initial ideas

My first rough ideas were:
1. Comparing different Bitcoin-based chains (BTC, BCH, LTC, BSV) to test the influence of whales, and comparing this to each chain's claim to be a store of value (SoV) or an alternative to cash.
2. Developing some of the research by Willy Woo, which I find particularly interesting, specifically:
    1. days destroyed,
    2. hodl waves,
    3. thermo cap,
    4. average cap.
 
I think the following ideas are also interesting and worth investigating, but not possible within the scope of this exercise:
 
1. Tracking the number of Twitter followers of various crypto-Twitter thought leaders and celebrities to test the hypothesis that *"an increase in follower numbers shows that new retail investors are entering crypto-markets, and an increase in price is expected soon"*.
 
 Thought leaders / crypto celebrities could be further grouped by the types of coins they speak about most: SoV, smart contracts, DeFi, etc.
 
 Weibo could be analysed as well as Twitter to understand Chinese markets, Korean Twitter could be analysed for the Korean retail market, etc.

2. I have an existing side project which has the goal of using a recurrent neural net to predict BTC price movements. The app (model, stored data, data pipeline, visualization of results) will run autonomously on Google Cloud Platform. Candle data is consumed from CoinAPI.io and stored in BigQuery. 

 Technical indicators will be calculated and used as additional factors to the model. Sentiment analysis from news outlets (Bloomberg, FT) would be added later. 

 The model would be written using TensorFlow, and the BigQuery table names would use BQ's date format capabilities. This would make the project faster and cheaper.

### 1. Testing the influence of whales on BTC forks and comparing to each chain's claims, e.g. as a store of value or alternative to cash

If a country has a much lower median income than mean income, it probably has high income inequality. 

Similarly, if a chain has a much smaller median transaction size than mean transaction size, it is probably not used much by regular users and is controlled mostly by whales.

This would contradict any claims the fork makes to being a form of digital cash. 

We assume that:
1. If a blockchain is functioning as digital cash, then most of its transactions will be small. 


2. Conversely, if a blockchain has relatively little organic use by normal users then whales (users with large holdings) will make up a large proportion of on-chain activity and would have average transaction sizes much larger than normal users.


3. Where the ratio of mean to median transaction value is relatively high, the daily total value transacted is dominated by a few large transactions rather than many small ones. This would suggest that whales, rather than members of the general public or retail investors, dominate the blockchain (and likely market behaviour).
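
As a rough illustration of assumption 3, the sketch below uses made-up transfer values (not CoinMetrics data) to show how a single whale-sized transfer moves the mean-to-median ratio. The helper name `mean_median_ratio` and all numbers are hypothetical.

```python
import statistics


def mean_median_ratio(transfer_values):
    """Return mean / median for a list of transfer values in USD."""
    return statistics.mean(transfer_values) / statistics.median(transfer_values)


# Hypothetical day with 99 retail-sized transfers of 20 USD each
retail_only = [20.0] * 99

# The same day with one hypothetical whale transfer of 1,000,000 USD added
with_whale = retail_only + [1_000_000.0]

ratio_retail = mean_median_ratio(retail_only)
ratio_whale = mean_median_ratio(with_whale)

print(f"retail only: {ratio_retail:.2f}")
print(f"with whale:  {ratio_whale:.2f}")
```

The median stays at 20 USD in both cases, while the mean jumps from 20 USD to about 10,020 USD, so the ratio rises from 1 to roughly 500 on the strength of one transfer.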

#### Fields
1. _TxTfrValMeanUSD_ 

 The sum USD value of native units transferred divided by the count of transfers (i.e., the mean "size" in USD of a transfer) that interval.


2. *TxTfrValMedUSD*

 The median USD value transferred per transfer (i.e., the median "size" in USD of a transfer) that interval.


3. *TxTfrValUSD*

 The sum USD value of all native units transferred (i.e., the aggregate size in USD of all transfers) that interval.


#### Chains:
BTC, BCH, BSV, LTC, DOGE

In [28]:
# import and setup
import requests
import json

import cufflinks as cf
import pandas as pd
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py

init_notebook_mode(connected=True)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [4]:
def get_metricdata(asset_id, payload):
    """Fetch metric data for one asset from the CoinMetrics community API (v2).

    Returns the decoded JSON response as a dict, or None on a non-200 status.
    """
    url = f'https://community-api.coinmetrics.io/v2/assets/{asset_id}/metricdata'
    response = requests.get(url=url, params=payload)

    if response.status_code == 200:
        print(f'{asset_id} - success!')
        return response.json()

    print(f'{asset_id} - status_code: {response.status_code}')
    return None

In [45]:
payload = {
    'metrics':  'PriceUSD,'+
                'TxTfrValMeanUSD,'+
                'TxTfrValMedUSD,'+
                'TxTfrValUSD',
    'start': '2016-01-01',
}

asset_list = ['btc', 'ltc', 'bch', 'bsv', 'doge']
data = {}
for asset in asset_list:
    data[asset] = get_metricdata(asset, payload)

btc - success!
ltc - success!
bch - success!
bsv - success!
doge - success!


In [46]:
dataframes = {}
# Assumes values come back in the order the metrics were requested in `payload`
cols = ['PriceUSD', 'TxTfrValMeanUSD', 'TxTfrValMedUSD', 'TxTfrValUSD']
for asset in data.keys():
    series = data[asset]['metricData']['series']
    values = [each['values'] for each in series]
    index = [each['time'] for each in series]

    # Convert every metric column to float in one step
    df = pd.DataFrame.from_records(values, columns=cols).astype(float)
    df.index = pd.to_datetime(index).date

    # Implied number of transfers: total value / mean value per transfer
    df['TxCount'] = df.TxTfrValUSD / df.TxTfrValMeanUSD
    df['MeanMedianRatio'] = df.TxTfrValMeanUSD / df.TxTfrValMedUSD

    dataframes[asset] = df
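
To make the parsing step easier to follow without calling the API, here is a minimal sketch that runs the same transformation on a tiny hand-written payload. The payload only mirrors the keys the cell above reads (`metricData`, `series`, `time`, `values`); the timestamps, the string-typed values and the numbers themselves are assumptions for illustration, not real CoinMetrics output.

```python
import pandas as pd

# Hypothetical miniature response; only the keys used by the notebook are included
sample = {
    "metricData": {
        "series": [
            {"time": "2019-01-01T00:00:00.000Z",
             "values": ["100.0", "50.0", "2.0", "5000.0"]},
            {"time": "2019-01-02T00:00:00.000Z",
             "values": ["110.0", "80.0", "4.0", "4000.0"]},
        ]
    }
}
cols = ["PriceUSD", "TxTfrValMeanUSD", "TxTfrValMedUSD", "TxTfrValUSD"]

series = sample["metricData"]["series"]
demo_df = pd.DataFrame([row["values"] for row in series], columns=cols).astype(float)
demo_df.index = pd.to_datetime([row["time"] for row in series]).date

# Same derived columns as in the notebook
demo_df["TxCount"] = demo_df["TxTfrValUSD"] / demo_df["TxTfrValMeanUSD"]
demo_df["MeanMedianRatio"] = demo_df["TxTfrValMeanUSD"] / demo_df["TxTfrValMedUSD"]

print(demo_df)
```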

In [47]:
dataframes['bsv'].sample(5)
dataframes['btc'].sample(5)
dataframes['doge'].sample(5)

Sample rows, `dataframes['bsv']`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2019-01-08 | 87.827085 | 15293.281969 | 1.481913 | 102862600.0 | 6726.0 | 10319.962527 |
| 2019-05-04 | 52.940605 | 5391.805952 | 0.217886 | 108046400.0 | 20039.0 | 24746.048883 |
| 2019-03-13 | 64.398603 | 4734.273513 | 0.051519 | 41997740.0 | 8871.0 | 91893.948515 |
| 2019-07-25 | 161.297862 | 197.009298 | 0.747857 | 21825870.0 | 110785.999997 | 263.431873 |
| 2019-05-30 | 182.328941 | 17133.040429 | 80.586203 | 834721700.0 | 48720.0 | 212.605134 |


Sample rows, `dataframes['btc']`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2016-12-22 | 858.501434 | 2219.42861 | 27.059055 | 1471572000.0 | 663041.0 | 82.021659 |
| 2018-02-18 | 10439.446327 | 32676.089813 | 109.853981 | 14011250000.0 | 428792.0 | 297.450212 |
| 2018-10-26 | 6428.006649 | 6602.929445 | 64.192228 | 3881882000.0 | 587903.0 | 102.861821 |
| 2018-04-18 | 8165.786386 | 12411.456127 | 96.392454 | 6503963000.0 | 524029.0 | 128.759624 |
| 2018-08-01 | 7599.357486 | 7669.213932 | 77.581727 | 4432622000.0 | 577976.0 | 98.853355 |


Sample rows, `dataframes['doge']`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2018-08-20 | 0.00237 | 1150.942894 | 0.684651 | 84924620.0 | 73787.0 | 1681.06452 |
| 2017-06-12 | 0.00312 | 6763.308716 | 2.662201 | 217961100.0 | 32227.0 | 2540.495477 |
| 2018-02-01 | 0.004975 | 1633.490615 | 2.251489 | 119878600.0 | 73388.0 | 725.515611 |
| 2018-10-08 | 0.005577 | 7262.118596 | 1.544734 | 404420100.0 | 55689.0 | 4701.208926 |
| 2017-06-08 | 0.003404 | 5941.453002 | 5.1025 | 175201600.0 | 29488.0 | 1164.420047 |


In [48]:
btc_mean = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMeanUSD,
    name='BTC mean',
)
btc_median = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMedUSD,
    name='BTC median',
)

# Use a separate name so the raw API responses in `data` are not overwritten
traces = [btc_mean, btc_median]
layout = go.Layout(
    title="BTC median and mean transaction values by day",
    xaxis=dict(title='Date'),
    yaxis=dict(title='USD value'),
    yaxis_type='log'
)

fig = go.Figure(data=traces, layout=layout)

py.iplot(fig)

### Plot the ratio of daily mean to median transaction values for each asset 

In [49]:
def scatter_plot(asset, name):
    return go.Scatter(
        x=dataframes[asset].index,
        y=dataframes[asset].MeanMedianRatio,
        name=name
    )
        
# One trace per asset; kept separate from the raw API responses in `data`
traces = [scatter_plot(asset, asset) for asset in dataframes.keys()]
layout = go.Layout(
    title="Ratio of daily mean to median transaction value",
    xaxis=dict(title='Date'),
    yaxis=dict(title='Ratio'),
    yaxis_type='log'
)

fig = go.Figure(data=traces, layout=layout)

py.iplot(fig)

### Conclusions
The chart above shows that BTC has the lowest ratio of mean to median daily transaction value. This suggests that it has more organic use and is less influenced by whales than the other blockchains on the chart. 


Using this ratio as a proxy to measure organic use, the chain with the second most organic use is Litecoin.
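
The ranking above is read off the chart by eye. One way to check it numerically would be to compare the median of each chain's daily ratio, which is less sensitive to single-day spikes than the mean. The sketch below is only an illustration: the helper `rank_by_typical_ratio` is hypothetical and the frames contain placeholder values, not CoinMetrics data. In the notebook it could be called on `dataframes` instead.

```python
import pandas as pd


def rank_by_typical_ratio(frames):
    """Sort assets by the median of their daily MeanMedianRatio, lowest first."""
    typical = {asset: df["MeanMedianRatio"].median() for asset, df in frames.items()}
    return pd.Series(typical).sort_values()


# Placeholder frames with made-up ratios, NOT real chain data
demo_frames = {
    "chain_a": pd.DataFrame({"MeanMedianRatio": [100.0, 120.0, 90.0]}),
    "chain_b": pd.DataFrame({"MeanMedianRatio": [900.0, 15000.0, 1200.0]}),
    "chain_c": pd.DataFrame({"MeanMedianRatio": [300.0, 250.0, 400.0]}),
}

ranking = rank_by_typical_ratio(demo_frames)
print(ranking)
```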


Since the start of 2019, the influence of whales on Dogecoin has been decreasing.

Of the two contentious hard forks, Bitcoin Cash shows two distinct phases with different characteristics in each. From its inception in August 2017 to November 2018, the influence of whales and inorganic behaviour increased at a steady rate. Notably, at the coin's genesis there appears to have been a large organic user base transacting daily, keeping the mean daily transaction value to within 50-100x the median. This was lower than Bitcoin's ratio, which was much more consistent but higher, at 120-200.


After November 10 2018, the ratio increases from an average of approximately 500 to approximately 10,000. This is a stark and abrupt change in the daily ratio, and suggests that either organic use drastically decreased or it (very suddenly) started being used for very large value transfers by relatively few users.
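
A shift like this could be quantified by comparing the typical ratio on either side of a chosen date. The following is a minimal sketch under stated assumptions: `compare_regimes` is a hypothetical helper, and the four-day series is synthetic, with values chosen only to echo the magnitudes quoted above. In the notebook the input would be a column such as `dataframes['bch'].MeanMedianRatio`.

```python
import pandas as pd


def compare_regimes(ratio, split_date):
    """Return the median ratio before, and on or after, split_date.

    `ratio` is a Series indexed by dates; `split_date` is anything
    pd.Timestamp accepts, e.g. '2018-11-10'.
    """
    idx = pd.to_datetime(ratio.index)
    split = pd.Timestamp(split_date)
    before = ratio[idx < split].median()
    after = ratio[idx >= split].median()
    return before, after


# Synthetic series, NOT real data: two days before and two days after the split
dates = pd.date_range("2018-11-08", periods=4).date
synthetic_ratio = pd.Series([400.0, 600.0, 9000.0, 11000.0], index=dates)

before, after = compare_regimes(synthetic_ratio, "2018-11-10")
print(f"median before: {before:.0f}, median after: {after:.0f}")
```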


Dogecoin, despite its status as a "joke" blockchain, appears to have more widespread organic use than either BCH or BSV, but less than BTC or LTC. Additionally, even though Elon Musk has professed to be a [fan](https://twitter.com/elonmusk/status/1113009339743100929), it seems unlikely that he is using it to transmit a large fraction of his personal wealth.