# CoinMetrics Case Study

Objective - to evaluate skills and abilities in multiple ways:
1. importing data
2. wrangling data
3. exploring data
4. analysis
5. modelling
6. communicating results

Provide:
1. A written explanation of how to approach the problem
2. The beginning phases of an implementation using Coin Metrics data

Of the four options made available in the case study, option 3 was chosen.

### Advocating for CoinMetrics' data

Produce quality research, with a particular focus on network data, that is of value to potential clients (it doesn’t have to be complete).

### Initial ideas

My first rough ideas were: 
1. comparing different Bitcoin-based chains (BTC, BCH, LTC, BSV) to test the influence of whales, and comparing the results to each chain's claim to be a store of value (SoV) or an alternative to cash.
2. developing some of the research by Willy Woo, which I find particularly interesting. In particular:
    1. days destroyed, 
    2. hodl waves, 
    3. thermo cap, 
    4. average cap. 
 
I think the following ideas are also interesting and worth investigating, but not possible within the scope of this exercise:
 
1. Tracking the number of Twitter followers of various crypto-Twitter thought leaders and celebrities to test the hypothesis that *"an increase in follower numbers shows that new retail investors are entering crypto-markets, and an increase in price is expected soon"*.
 
 Thought leaders / crypto celebrities could be further grouped by the types of coins they speak about most: SoV, smart contracts, DeFi, etc. 
 
 Weibo could be analysed as well as Twitter to understand Chinese markets, Korean Twitter could be analysed for the Korean retail market, etc. 

2. I have an existing side project which has the goal of using a recurrent neural net to predict BTC price movements. The app (model, stored data, data pipeline, visualization of results) will run autonomously on Google Cloud Platform. Candle data is consumed from CoinAPI.io and stored in BigQuery. 

 Technical indicators will be calculated and used as additional factors to the model. Sentiment analysis from news outlets (Bloomberg, FT) would be added later. 

 The model would be written using TensorFlow, and the BigQuery table names would use BQ's date format capabilities. This would make the project faster and cheaper. 

### 1. Testing the influence of whales on BTC forks and comparing this to each chain's claims, e.g. as a store of value or alternative to cash

If a country has a much lower median income than mean income, it probably has high income inequality. 

Similarly, if a chain has a much smaller median transaction size than mean transaction size, it is probably not used much by regular users and is controlled mostly by whales. 

This would contradict any claims the fork makes to being a form of digital cash. 

We assume that:
1. If a blockchain is functioning as digital cash, then most of its transactions will be small. 


2. Conversely, if a blockchain has relatively little organic use by normal users, then whales (users with large holdings) will make up a large proportion of on-chain activity and will have average transaction sizes much larger than those of normal users.


3. Where the ratio of mean to median transaction value is relatively high, the mean value is much higher than the median value, which shows that the daily total value transacted is dominated by a few large transactions rather than many small-value transactions. This would suggest that whales dominate the blockchain (and likely market behaviour) rather than members of the general public or retail investors. 
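
As a rough illustration of assumption 3, the sketch below compares the mean-to-median ratio of a day of small transfers with the same day plus one very large transfer. All transfer values are hypothetical and chosen only to show the effect; `mean_median_ratio` is an illustrative helper, not part of the Coin Metrics API.

```python
from statistics import mean, median


def mean_median_ratio(values):
    """Return mean / median for a list of positive transfer values."""
    return mean(values) / median(values)


# Hypothetical day on a "cash-like" chain: only small transfers (USD).
cash_like = [5, 8, 10, 12, 15, 20, 25, 30, 40, 50]

# The same day with a single hypothetical whale transfer added.
whale_heavy = cash_like + [1_000_000]

cash_ratio = mean_median_ratio(cash_like)
whale_ratio = mean_median_ratio(whale_heavy)

print(f"cash-like ratio:   {cash_ratio:.2f}")
print(f"whale-heavy ratio: {whale_ratio:.2f}")
```

A single large transfer barely moves the median but pulls the mean up by several orders of magnitude, which is why the ratio is used here as a proxy for whale dominance.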

#### Fields
1. _TxTfrValMeanUSD_ 

 The sum USD value of native units transferred divided by the count of transfers (i.e., the mean "size" in USD of a transfer) that interval.


2. *TxTfrValMedUSD*

 The median USD value transferred per transfer (i.e., the median "size" in USD of a transfer) that interval.


3. *TxTfrValUSD*

 The sum USD value of all native units transferred (i.e., the aggregate size in USD of all transfers) that interval.
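
Because the mean is defined as the total value divided by the number of transfers, the three fields above also imply a transfer count and a mean-to-median ratio. The sketch below works this through for one BTC day (2018-11-04), using the rounded values displayed in this notebook's sample output; the helper name `transfer_count` is an assumption for illustration.

```python
def transfer_count(total_usd, mean_usd):
    """Implied number of transfers: total value divided by mean transfer size."""
    return total_usd / mean_usd


# BTC on 2018-11-04, as displayed (rounded) in this notebook's sample output.
tx_tfr_val_usd = 2613186000.0       # TxTfrValUSD
tx_tfr_val_mean_usd = 5160.339325   # TxTfrValMeanUSD
tx_tfr_val_med_usd = 61.057223      # TxTfrValMedUSD

count = transfer_count(tx_tfr_val_usd, tx_tfr_val_mean_usd)
ratio = tx_tfr_val_mean_usd / tx_tfr_val_med_usd

print(f"implied transfer count: {count:,.0f}")
print(f"mean/median ratio:      {ratio:.2f}")
```

The implied count is only approximate here because the displayed total is rounded.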


#### Chains:
BTC, BCH, BSV, LTC, DOGE

In [28]:
# import and setup
import requests
import json

import cufflinks as cf
import pandas as pd
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py

init_notebook_mode(connected=True)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [4]:
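def get_metricdata(asset_id, payload):
    """Fetch metric data for one asset from the Coin Metrics community API (v2)."""
    url = f'https://community-api.coinmetrics.io/v2/assets/{asset_id}/metricdata'
    response = requests.get(
        url=url,
        params=payload
    )
    
    if response.status_code == 200:
        print(f'{asset_id} - success!')
        # Parse the JSON body directly rather than decoding the bytes by hand
        return response.json()
    else:
        # Include the asset so a failed request can be traced back to it
        print(f'{asset_id} - status_code: {response.status_code}')
        return None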

In [20]:
payload = {
    # Comma-separated list of the metrics to request
    'metrics': ','.join([
        'PriceUSD',
        'TxTfrValMeanUSD',
        'TxTfrValMedUSD',
        'TxTfrValUSD',
    ]),
    'start': '2018-09-01',
}

asset_list = ['btc', 'ltc', 'bch', 'bsv', 'doge']
data = {}
for asset in asset_list:
    data[asset] = get_metricdata(asset, payload)

btc - success!
ltc - success!
bch - success!
bsv - success!
doge - success!


In [21]:
data.keys()

dict_keys(['btc', 'ltc', 'bch', 'bsv', 'doge'])

In [25]:
dataframes = {}
# Column order is assumed to match the order of the metrics requested in `payload`
cols = ['PriceUSD', 'TxTfrValMeanUSD', 'TxTfrValMedUSD', 'TxTfrValUSD']
for asset in data.keys():
    series = data[asset]['metricData']['series']
    values = [each['values'] for each in series]
    index = [each['time'] for each in series]
    
    df = pd.DataFrame.from_records(values, columns=cols)
    # Keep only the calendar date of each daily observation
    df.index = pd.to_datetime(index).date
    
    # Cast every column to float in one step
    df = df.astype(float)
    
    # Implied number of transfers: total value divided by mean transfer size
    df['TxCount'] = df.TxTfrValUSD / df.TxTfrValMeanUSD
    # Mean-to-median ratio: high values mean a few large transfers dominate
    df['MeanMedianRatio'] = df.TxTfrValMeanUSD / df.TxTfrValMedUSD
    
    dataframes[asset] = df
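
The parsing step above relies on a particular response shape: a `metricData` object holding a `series` list, where each entry has a `time` and a list of `values`. The sketch below shows that shape with a hand-written stand-in (not real API output). It reuses two BTC rows from this notebook, and the timestamp format and string-encoded values are assumptions. `series_to_rows` is a hypothetical helper that uses only the standard library.

```python
def series_to_rows(response, cols):
    """Flatten an assumed v2 `metricdata` response into a list of row dicts."""
    rows = []
    for point in response['metricData']['series']:
        # Missing observations are assumed to arrive as None; keep them as None.
        values = [None if v is None else float(v) for v in point['values']]
        row = dict(zip(cols, values))
        row['date'] = point['time'][:10]  # keep only YYYY-MM-DD
        rows.append(row)
    return rows


cols = ['PriceUSD', 'TxTfrValMeanUSD', 'TxTfrValMedUSD', 'TxTfrValUSD']

# Hand-written stand-in for a response (assumed shape, not real API output).
sample_response = {
    'metricData': {
        'series': [
            {'time': '2018-10-22T00:00:00.000Z',
             'values': ['6438.415255', '6417.348249', '64.384153', '3842054000.0']},
            {'time': '2018-11-04T00:00:00.000Z',
             'values': ['6443.326841', '5160.339325', '61.057223', '2613186000.0']},
        ]
    }
}

rows = series_to_rows(sample_response, cols)
for row in rows:
    print(row['date'], row['TxTfrValMeanUSD'], row['TxTfrValMedUSD'])
```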

In [31]:
dataframes['bsv'].sample(5)
dataframes['btc'].sample(5)
dataframes['doge'].sample(5)

`dataframes['bsv'].sample(5)`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2019-01-01 | 92.519183 | 10837.990111 | 2.219771 | 47567940.0 | 4389.0 | 4882.481 |
| 2019-03-17 | 67.834524 | 5991.813652 | 0.004839 | 40079240.0 | 6689.0 | 1238153.0 |
| 2019-05-23 | 101.948639 | 3136.756425 | 0.250598 | 107998500.0 | 34430.0 | 12517.09 |
| 2019-08-23 | 135.367138 | 210.543987 | 0.94757 | 26559490.0 | 126146.999808 | 222.1936 |
| 2019-04-15 | 58.787129 | 504.349049 | 0.000494 | 139736000.0 | 277062.0 | 1021338.0 |

`dataframes['btc'].sample(5)`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2018-11-04 | 6443.326841 | 5160.339325 | 61.057223 | 2613186000.0 | 506398.0 | 84.516443 |
| 2018-10-22 | 6438.415255 | 6417.348249 | 64.384153 | 3842054000.0 | 598698.0 | 99.672792 |
| 2019-03-11 | 3868.371172 | 4097.174848 | 63.441287 | 2796826000.0 | 682623.0 | 64.582152 |
| 2019-03-27 | 4027.007918 | 3823.455273 | 56.543581 | 2984046000.0 | 780458.0 | 67.619617 |
| 2019-04-22 | 5383.799975 | 8583.753811 | 64.941172 | 6028945000.0 | 702367.0 | 132.177378 |

`dataframes['doge'].sample(5)`:

| Date | PriceUSD | TxTfrValMeanUSD | TxTfrValMedUSD | TxTfrValUSD | TxCount | MeanMedianRatio |
|---|---|---|---|---|---|---|
| 2019-08-28 | 0.002614 | 185.470817 | 1.045685 | 9609058.0 | 51809.0 | 177.367702 |
| 2019-08-18 | 0.002589 | 259.118427 | 0.22858 | 18894920.0 | 72920.0 | 1133.599019 |
| 2019-08-04 | 0.002931 | 355.880275 | 0.277324 | 29748740.0 | 83592.0 | 1283.265524 |
| 2019-07-25 | 0.00293 | 290.932257 | 0.387732 | 20674230.0 | 71062.0 | 750.342938 |
| 2019-03-22 | 0.002017 | 200.227768 | 0.443784 | 14140690.0 | 70623.0 | 451.182435 |



In [35]:
# Daily mean and median transfer values for BTC
btc_mean = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMeanUSD,
    name='BTC mean',
)
btc_median = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMedUSD,
    name='BTC median',
)

# Use a separate name so the `data` dict of API responses is not overwritten
traces = [btc_mean, btc_median]
layout = go.Layout(
    title="BTC median and mean transaction values by day",
    xaxis=dict(title='Date'),
    # Log scale, because the mean is far larger than the median
    yaxis=dict(title='USD value', type='log'),
)

fig = go.Figure(data=traces, layout=layout)

py.iplot(fig)

### Plot the ratio of daily mean to median transaction values for each asset 

In [38]:
def scatter_plot(asset, name):
    return go.Scatter(
        x=dataframes[asset].index,
        y=dataframes[asset].MeanMedianRatio,
        name=name
    )
        
# Use a separate name so the `data` dict of API responses is not overwritten
traces = [scatter_plot(asset, asset) for asset in dataframes.keys()]
layout = go.Layout(
    title="Ratio of daily mean to median transaction value",
    xaxis=dict(title='Date'),
    yaxis=dict(title='Ratio'),
    yaxis_type='log'
)

fig = go.Figure(data=traces, layout=layout)

py.iplot(fig) 
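
One possible way to turn the ratio plot into a single comparable number per chain is to take the median of the daily ratios, since the ratios are heavy-tailed and a mean would be dominated by a few extreme days. The sketch below applies this only to the five sampled rows shown earlier for BSV, BTC and DOGE. Five random days are not representative, so the output illustrates the method and is not a result. `rank_by_typical_ratio` is a hypothetical helper.

```python
from statistics import median

# MeanMedianRatio values copied from the five sampled rows shown earlier.
sampled_ratios = {
    'bsv': [4882.481, 1238153.0, 12517.09, 222.1936, 1021338.0],
    'btc': [84.516443, 99.672792, 64.582152, 67.619617, 132.177378],
    'doge': [177.367702, 1133.599019, 1283.265524, 750.342938, 451.182435],
}


def rank_by_typical_ratio(ratios_by_asset):
    """Return (asset, median ratio) pairs, lowest ratio first."""
    typical = {asset: median(vals) for asset, vals in ratios_by_asset.items()}
    return sorted(typical.items(), key=lambda item: item[1])


ranking = rank_by_typical_ratio(sampled_ratios)
for asset, ratio in ranking:
    print(f'{asset}: {ratio:,.2f}')
```

With the full data, the same comparison could be made by passing `{asset: df.MeanMedianRatio.dropna().tolist() for asset, df in dataframes.items()}` to the helper.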