# Features Extraction & Clustering. Part II.
This part focuses on clustering inside the top-level clusters that were obtained in the previous part.

## Preparing

First, we have to re-run the code from Part I before continuing.

In [1]:
# Import of necessary libs and our classes
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.spatial.distance import squareform, pdist
from sklearn.cluster import DBSCAN
from cj_loader import Storer, Extractor, extract_features
import warnings
warnings.filterwarnings("ignore")

# Init the storer object with the given data and calculate the presence of data
storer = Storer()
precense = storer.applicability()

# Calculation of data completeness matrix and weights
data_compl = storer.data_completeness()
weights = data_compl.mean().sort_values()

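# Transform the boolean matrix into a numeric one by weighting each column
weighted = precense.astype(float)
for index, row in precense.iterrows():
    # .loc[index, :] writes in place; chained indexing (.loc[index][:]) may write to a temporary copy
    weighted.loc[index, :] = row * weights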

# calculate distances between pairs of coins
distances = pd.DataFrame(squareform(pdist(weighted)), index=weighted.index, columns=weighted.index)

# Top-level clustering with DBSCAN algorithm
clustering = DBSCAN(eps=0.3, min_samples=3).fit(distances)
labels = clustering.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)

# Clusters aggregation
weighted['label'] = labels
clusters = {}
for label in np.unique(labels):
    cl = weighted[weighted['label'] == label]
    clusters[label] = cl

# Extracting features
cl_coin_features = {}
unclustered_coins = set()
for label, cluster in clusters.items():
    if label != -1:
        cl_coin_features[label] = extract_features(storer, '2014-04-01', coins_set=cluster.index)
    else:
        unclustered_coins.update(cluster.index.values.tolist())

Estimated number of clusters: 4


## Adjusting clustering parameters

DBSCAN is one of the most common clustering algorithms and is also among the most cited in the scientific literature. It has many advantages and is widely recommended for various tasks. One disadvantage of DBSCAN is the routine of parameter selection, especially for the epsilon parameter. In the previous part DBSCAN was applied to data with all elements in the \[0;1\] range, which simplified the choice of epsilon. The current clustering task is significantly more complicated: the features are not normalized and each has its own scale. The number of features is fairly large, so the task can be considered high-dimensional clustering. The high number of dimensions also makes direct visualization of the clusters impossible.

The best approach to choosing a suitable value for epsilon is to examine the results obtained with different values. Fortunately, such a method has been developed and described in several publications.

[HDBSCAN](https://hdbscan.readthedocs.io/en/latest/index.html) - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and be more robust to parameter selection.
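To illustrate why a single epsilon is hard to pick, here is a minimal sketch that runs plain DBSCAN with several `eps` values on toy data. The two point grids, the helper name `count_clusters` and the chosen `eps` values are assumptions made for illustration only; they are not taken from the coin data.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def count_clusters(points, eps, min_samples=3):
    """Run DBSCAN and return the number of clusters, ignoring noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return len(set(labels)) - (1 if -1 in labels else 0)


# Toy data (an assumption for illustration): two 5x5 grids with spacing 0.1,
# the second one shifted far away from the first.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)])
points = np.vstack([grid, grid + 10.0])

# Sweep epsilon from "too small" through "reasonable" to "too large".
eps_sweep = {eps: count_clusters(points, eps) for eps in (0.05, 0.15, 20.0)}
for eps, n in eps_sweep.items():
    print("eps={:<5} -> {} cluster(s)".format(eps, n))
```

With the smallest value every point is treated as noise, the middle value separates the two grids, and the largest value merges everything into one cluster. HDBSCAN is meant to spare us this manual sweep.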

In [2]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;


In [3]:
import hdbscan

for top_level_cl in range(n_clusters_):
    print("Top-level cluster: {}".format(top_level_cl))
    tc = cl_coin_features[top_level_cl]
    tc = tc.replace([np.inf, -np.inf], np.nan)
    tc = tc.dropna(axis='columns', how='any')
    print("shape: {}".format(tc.shape))
    clusterer = hdbscan.HDBSCAN(min_cluster_size=2)
    ll_labels = clusterer.fit_predict(tc)
    tc['label'] = ll_labels
    for label in np.unique(ll_labels):
        if label != -1:
            print('label %d' %label)
            cl = tc[tc['label'] == label]
            display(cl)
            print('-'*40)
        else:
            noise_coins = tc[tc['label'] == label].index.values.tolist()
            unclustered_coins.update(noise_coins)
            
    print('='*40) 
    
print("Unclustered coins:")
for co in unclustered_coins:
    print(co)

Top-level cluster: 0
shape: (27, 95)
label 0


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
vechain,949.394509,699.5,1200.307321,9.380132,127.998701,2.590254,2.61,2.073741,0.348733,-0.83717,...,13806.638918,57591.330938,11.81781,172.859691,1.080835,1.081693,0.11419,-0.055933,0.398785,0
icon,1882.450161,1056.0,1919.075742,1.637988,1.567113,3.354226,2.67,2.305503,1.513255,1.962505,...,15439.828774,555552.458802,9.788784,101.536048,1.067163,1.020315,0.210151,0.692688,-0.153612,0


----------------------------------------
label 1


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
binance-coin,999.766234,217.0,7791.168444,14.382106,226.810603,8.084689,9.26,5.893013,0.043746,-1.360099,...,5051.448981,23890.993978,6.502648,49.615987,1.082846,1.062092,0.193961,0.827556,2.178119,1
omisego,3904.139896,2467.5,8247.695864,6.691323,49.809485,10.89489,9.82,4.869719,0.360876,0.131471,...,7600.454755,18506.548801,13.566305,214.684558,1.375349,1.256275,0.413903,4.556722,30.751261,1


----------------------------------------
label 2


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
power-ledger,1779.313842,450.0,3025.881014,2.281092,4.219666,0.557741,0.447777,0.340812,1.402221,1.785522,...,4488.152524,11348.520879,13.554229,202.976926,1.070835,1.074252,0.330385,0.906042,2.366292,2
bytom,657.467018,446.0,674.220442,3.584983,18.070853,0.362301,0.352443,0.247522,0.831446,0.245651,...,6253.210601,38877.076001,12.846367,184.142848,1.102018,1.089468,0.17148,0.201866,1.945265,2
status,987.235732,620.0,1383.10494,5.636902,39.888811,0.108137,0.075684,0.098066,2.256887,6.333733,...,5614.727056,11964.850609,8.577986,92.22326,1.408994,1.352601,0.354303,2.662188,13.743134,2
kyber-network,936.940252,516.0,1547.828189,4.638044,26.118587,1.784807,1.37,0.973657,1.425581,1.596338,...,5282.816318,7823.555497,6.173834,61.314781,1.205733,1.18219,0.199979,0.645657,0.061374,2
aion,350.647059,242.0,410.546863,3.701675,15.264639,2.691735,2.355,1.888573,1.611819,3.118996,...,8500.707163,155013.764539,13.618181,202.1781,1.163089,1.126638,0.201389,1.502889,4.640763,2
rchain,65.275304,47.5,74.429443,2.578367,13.607838,1.094281,1.05,0.678605,0.335606,-0.615536,...,5708.825768,249371.80879,11.218519,139.014991,0.897106,0.90295,0.215825,0.84883,3.865996,2
funfair,568.948052,297.0,630.972189,2.180213,4.649262,0.03946,0.029683,0.029096,2.533569,7.665132,...,3075.443915,86043.307517,14.154505,221.879295,1.449644,1.233333,0.659184,2.553176,7.034447,2
loopring,597.911111,295.5,1728.082395,12.963405,200.276697,0.442309,0.362005,0.320652,1.410576,2.860271,...,6045.900529,54846.934511,17.270638,305.274743,1.216407,1.154256,0.378954,3.585533,28.921042,2
waltonchain,315.145553,254.0,263.000883,3.737516,22.263249,11.364073,9.865,7.509946,1.30666,1.939306,...,5928.113479,36809.79508,9.292871,92.056137,1.058315,1.048571,0.187528,3.197617,25.541123,2
augur,456.766484,377.0,323.06407,3.191605,18.721484,20.261361,15.85,19.904376,1.715515,3.295499,...,5483.098352,15132.52055,6.322831,47.898602,1.385219,1.304811,0.451716,6.304069,48.630149,2


----------------------------------------
Top-level cluster: 1
shape: (61, 55)
label 0


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
bitcoin-private,23.240863,22.56,12.86043,1.26174,2.232808,25.634964,23.76,15.385629,1.464457,2.555959,...,0.002917,0.001316,1.137621,2.056149,0.001502,0.001397,0.000807,0.883645,0.207065,0
hshare,11.814399,9.74,7.306931,1.559535,2.078288,12.8239,10.49,8.210162,1.554403,1.961953,...,0.001251,0.001159,3.532915,14.703203,0.001635,0.001264,0.001003,1.514419,2.000851,0
bitcoin-diamond,14.807918,4.5,17.294688,1.481229,1.833662,17.13449,4.82,20.11647,1.460349,1.778262,...,0.000492,0.001572,2.310487,6.983425,0.001651,0.001442,0.000601,1.048382,-0.029951,0
maker,792.610902,781.93,322.325747,0.238778,-0.239704,837.310865,800.455,347.995799,0.361101,-0.123107,...,0.086341,0.027893,-0.549053,0.029306,0.001821,0.001697,0.000581,0.909274,0.563056,0
wanchain,4.423651,4.025,2.07861,0.705453,-0.634565,4.682857,4.34,2.143479,0.658691,-0.713747,...,0.000533,0.000191,0.341138,-1.026182,0.0017,0.001627,0.000809,0.505815,-0.966857,0


----------------------------------------
label 1


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
ardor,0.248999,0.146666,0.360959,2.863432,8.456138,0.269928,0.155014,0.404876,3.03744,9.818911,...,3.3e-05,2.5e-05,1.839779,3.890813,0.000892,0.000529,0.001279,2.871493,8.676248,1
steem,1.408093,1.07,1.385575,1.503642,2.264808,1.52034,1.15,1.49698,1.540028,2.42837,...,0.000316,0.000859,3.948504,16.988578,0.001109,0.000823,0.001216,1.600084,2.544892,1


----------------------------------------
label 2


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
bytecoin-bcn,0.001071,5e-05,0.002226,3.862114,26.67343,0.001157,5.4e-05,0.002402,3.689752,22.428601,...,9.970023e-08,2.660023e-07,2.3131,6.036797,0.000677,3e-05,0.001361,2.926192,10.582155,2
bitshares,0.068236,0.007727,0.123098,2.921102,10.656447,0.072932,0.008211,0.132705,2.94653,10.692251,...,1.797706e-05,2.180321e-05,1.976971,5.055244,0.000615,6.3e-05,0.001122,2.914992,10.565782,2
siacoin,0.006439,0.000607,0.011083,3.222797,14.875275,0.006968,0.000646,0.012276,3.443014,17.092187,...,7.107616e-07,1.341594e-06,1.911383,3.867497,0.000708,3.8e-05,0.001243,3.139076,14.080927,2


----------------------------------------
label 3


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
komodo,2.420748,1.89,2.23631,1.774483,3.604554,2.602976,2.0,2.443241,1.910811,4.456263,...,0.000341,0.000169,0.446671,-0.131254,0.000873,0.000672,0.000815,1.79986,3.736237,3
golem-network-tokens,0.298102,0.27422,0.226029,0.86406,0.87531,0.319904,0.292045,0.244917,0.91339,1.0237,...,5.1e-05,4.9e-05,1.739616,2.716876,0.000883,0.000811,0.000657,0.848335,0.866487,3
ark,2.492056,2.455,1.990124,1.088746,1.188183,2.684415,2.59,2.160226,1.150835,1.356311,...,0.000329,0.000176,0.976624,1.708981,0.000864,0.000858,0.000684,1.02922,1.032684,3


----------------------------------------
label 4


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
wax,0.401025,0.25352,0.476952,4.325298,28.728535,0.464423,0.270466,0.611431,4.229268,24.002165,...,2.754163e-05,3.106317e-05,3.446221,17.31567,0.0007,0.00049,0.000626,2.614212,8.014067,4
digixdao,85.084494,59.435,100.152757,1.760459,3.061513,91.100699,63.56,108.366901,1.796305,3.257511,...,0.01754502,0.01035381,1.91271,5.650685,0.000602,0.000429,0.000703,1.728822,2.928424,4
dentacoin,0.000574,0.000374,0.000725,3.900904,20.085688,0.00066,0.000398,0.000929,4.605233,27.334378,...,5.046969e-08,5.396153e-08,2.35934,8.467067,0.000626,0.000423,0.000832,3.695435,18.454049,4


----------------------------------------
label 5


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
cybermiles,0.209653,0.188085,0.098722,0.93793,0.839453,0.22671,0.197038,0.110596,1.017315,1.071018,...,2.210712e-05,1.016537e-05,0.654467,-0.231995,0.000416,0.000371,0.000202,0.562836,-0.763781,5
monaco,8.256865,7.5,4.213411,0.593459,0.441495,9.000823,8.13,4.740944,0.730917,0.649901,...,0.0009692494,0.000628968,2.325042,7.92749,0.000362,0.000329,0.000195,0.436356,-0.226983,5
zencash,22.862708,21.46,14.16501,0.390113,-0.73513,24.753895,22.99,15.265151,0.431416,-0.64701,...,0.00291958,0.0009794733,0.225385,-0.418846,0.000253,0.000254,0.000175,0.127273,-1.272639,5
decentraland,0.086779,0.09555,0.050646,-0.01283,-0.326775,0.094069,0.101427,0.055044,0.124299,0.007083,...,1.080277e-05,5.202075e-06,-0.290883,-1.107702,0.000376,0.000369,0.00021,0.644192,0.732772,5
dropil,0.005598,0.00572,0.000849,-0.516029,-0.232788,0.006113,0.006088,0.001064,0.264097,1.255825,...,7.808254e-07,1.437768e-07,-0.846285,0.018465,0.00037,0.000386,6e-05,-0.779544,-0.361698,5
bancor,3.660941,2.96,2.398626,4.381781,28.189283,3.89,3.055,2.683823,4.263081,25.135643,...,0.000506296,0.0007255991,8.139562,71.234417,0.000457,0.000397,0.000229,0.558214,-0.940856,5
paypex,1.05047,1.01,0.520715,0.350255,-0.444781,1.139421,1.095,0.604531,1.470352,7.028575,...,0.0001204651,6.944378e-05,0.19913,-0.663339,0.000258,0.000244,0.00012,0.40318,-0.388046,5
polymath-network,0.62313,0.523797,0.286755,0.890957,0.079759,0.674753,0.556775,0.319642,0.871556,-0.089713,...,6.51723e-05,2.517043e-05,0.595057,-1.033858,0.000539,0.000431,0.000243,0.899825,-0.084507,5
enigma-project,1.977663,1.72,1.39026,1.493452,2.804216,2.14362,1.86,1.555507,1.594281,2.992226,...,0.0002182171,0.0001071887,0.07945,-0.402444,0.000518,0.000448,0.000363,1.487507,2.761826,5
moac,9.079839,9.26,3.851052,-0.06142,-1.001265,9.649731,9.84,4.070448,-0.035481,-0.93643,...,0.001118302,0.0003698108,0.094831,-0.422601,0.000518,0.000524,0.000114,0.177664,-0.805532,5


----------------------------------------
label 6


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
neoscoin,1.241401,0.057776,2.31229,3.161752,13.667693,1.352071,0.061973,2.538932,3.18389,13.642464,...,8.058983e-05,0.000395699,2.43167,6.994469,1.6e-05,6.995261e-07,2.9e-05,3.216032,14.13288,6
crown,0.547453,0.011036,0.885053,1.936076,3.648528,0.597505,0.01356,0.962189,1.942499,3.727477,...,1.721539e-05,0.0001486377,1.497302,1.658155,3.1e-05,4.047599e-07,5.2e-05,2.009594,3.906457,6
korecoin,0.990036,0.029753,1.793782,1.97015,3.236238,1.096288,0.035417,1.994462,2.001036,3.376039,...,5.689136e-05,0.0002523717,2.585461,8.747095,7e-06,1.768558e-07,1.3e-05,1.967379,3.194516,6
maidsafecoin,0.170106,0.073602,0.20653,1.70473,3.147239,0.180155,0.076747,0.220148,1.735564,3.272201,...,8.949929e-05,5.166388e-05,0.571676,-0.485515,0.000268,0.0001158692,0.000326,1.713392,3.191458,6
zcoin,21.557094,11.16,25.684621,1.939066,3.948439,23.088672,12.08,27.740978,2.018183,4.440614,...,0.003109666,0.001836301,0.482048,-0.20417,0.000292,0.0001107759,0.000366,1.598898,2.31118,6
ion,1.12371,0.819653,1.119327,1.288495,1.640972,1.238601,0.891901,1.250366,1.479844,2.814114,...,0.000237284,0.0001089903,0.71075,1.881462,7.2e-05,4.54166e-05,8e-05,1.301249,1.369434,6
sibcoin,0.630847,0.161728,0.926206,2.157049,5.283161,0.689684,0.171437,1.031382,2.251245,5.821812,...,0.0001183304,0.0001199335,1.697724,4.689936,3.5e-05,7.673365e-06,5.2e-05,2.194836,5.47387,6
emercoin,0.818447,0.224321,1.319319,2.477042,7.231048,0.878846,0.240944,1.427773,2.584766,8.194776,...,0.0002491345,0.0002201666,0.715023,-0.003978,0.000117,2.966228e-05,0.000191,2.453898,6.97462,6
blocknet,6.493587,0.121778,11.061462,1.786041,2.636969,7.061386,0.135489,12.054517,1.802897,2.6883,...,0.0002827746,0.001407655,1.468569,1.458448,0.000117,2.418933e-06,0.000194,1.720409,2.371543,6
bitsend,0.210375,0.005222,0.361271,2.010765,3.583548,0.230115,0.006082,0.398735,2.134792,4.519553,...,8.728452e-06,4.264147e-05,0.853146,-0.775657,1.3e-05,2.135303e-07,2.3e-05,2.033453,3.677086,6


----------------------------------------
Top-level cluster: 2
shape: (3, 10)
Top-level cluster: 3
shape: (19, 106)
label 0


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
neo,6186.555556,5824.0,6611.747396,1.372165,3.604629,31.586522,20.12,39.027937,1.372236,1.261594,...,2.196671,0.46121,1.907444,6.457338,0.075625,0.0,0.924062,19.345173,409.596195,0
ethereum-classic,10809.951152,10763.0,8286.560932,0.689049,0.72534,12.379725,13.69,10.556761,0.660376,-0.27908,...,2.753299,1.736931,1.205032,0.656309,17.226962,12.174938,79.397441,25.285356,655.311219,0
bitcoin-gold,24770.355731,13296.0,32325.900838,2.885266,9.292075,129.117545,84.71,103.964299,1.048445,0.166518,...,0.225699,0.080849,-0.072862,3.205795,0.565094,0.502647,0.330306,1.074817,1.513131,0


----------------------------------------
label 1


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
lisk,1256.345088,845.5,1306.419742,2.170264,8.431071,5.084455,1.765,7.069427,1.84691,2.94081,...,1.780763,0.396637,1.179027,2.528794,234.195165,194.815368,189.347875,1.589927,4.444137,1
gas,4605.768519,4411.5,5576.554873,2.587282,15.329782,23.146736,20.59,13.790248,1.121104,1.914183,...,1.800429,0.809647,1.883398,3.804189,73.458813,0.013154,197.234739,3.986829,21.984833,1
digibyte,7653.289331,4285.0,7987.288277,2.910194,12.477327,0.007896,0.000279,0.016372,2.99155,11.571083,...,0.640099,0.136387,0.030436,2.4606,1334.301447,122.844788,2645.07516,4.81118,40.258149,1
verge,4775.91204,707.0,9105.175995,2.395383,4.998431,0.011389,2.5e-05,0.029746,3.755064,16.750922,...,0.701068,0.183204,-0.552401,-0.252065,950.513747,33.23857,7896.868751,31.924672,1099.745328,1
zcash,53671.197802,56198.0,28965.45871,0.018248,-1.350792,233.724953,220.8,184.993409,3.025848,20.671637,...,0.096124,0.051123,5.473586,41.712232,2.476253,1.655919,3.959993,4.433368,23.606822,1
waves,4530.174551,3220.0,4783.672582,2.243263,7.484182,3.155054,2.86,3.29103,1.33811,1.970991,...,2.522097,3.895967,2.982409,12.366264,42.177654,4.054797,159.940076,5.716705,32.118045,1
dogecoin,44725.500591,35972.0,30990.339987,2.850689,11.2763,0.001132,0.000243,0.001916,3.192562,13.44,...,0.44532,0.129465,0.957839,0.907888,37756.476612,24201.275609,332187.587638,40.333971,1632.795528,1
pivx,2817.088009,2696.0,1551.385042,1.526174,8.624566,2.08026,1.36,2.71116,1.75551,3.364241,...,0.187977,0.169446,1.370168,3.121039,41.565378,1.1469,1076.473628,29.339617,858.875856,1
decred,10833.712222,11097.5,3816.079987,0.222284,0.659588,29.186445,15.51,33.340191,0.980179,-0.305222,...,0.302313,0.07204,1.659293,9.949313,59.520629,8.81254,148.366603,5.126046,33.825512,1


----------------------------------------
label 2


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
nem,870.648026,356.0,1267.603028,4.200264,32.138797,0.131956,0.005698,0.247699,3.310615,13.703799,...,1.634948,20.710297,7.254711,70.183217,6197.273077,5089.033717,7475.500598,20.369103,578.018261,2
dash,18778.626667,10335.0,17916.087571,2.149134,12.087689,120.214314,7.775,230.905655,2.654101,7.803201,...,0.170657,0.158673,3.688645,26.213098,8.365254,3.932487,10.370287,7.242054,139.679172,2


----------------------------------------
Unclustered coins:
qtum
cryptonex
mixin
kin
stellar
tether
nano
ripple
huobi-token
smartcash
gifto
mithril
nxt
elastos
monero
factom
tron
ethereum
stratis
tezos
bitcoin
iota
veritaseum
ontology
litecoin
pundi-x
iostoken
bitcoin-cash
eos
nebulas-token
cardano
tenx


## Important notes
### Approach limitations
The implemented approach is very general. It does not detect mutual dependencies, trend following or other complex behavior, so it should be considered clustering by basic time series characteristics. In particular, the top-level clustering is a necessary measure rather than a customary step. The list of features describing each time series is quite short, but it can easily be extended with additional features.
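
The feature tables above use column suffixes such as `_mean`, `_median`, `_stddev`, `_skewns` and `_kurtos`. A minimal sketch of how such basic characteristics could be computed for one series is given below. The helper `basic_features` and the sample prices are hypothetical; the actual `extract_features` from `cj_loader` is not shown in this notebook and may differ (for example, in how the standard deviation is defined).

```python
import pandas as pd


def basic_features(series, name):
    """Summary statistics of one time series, keyed like the table columns."""
    return {
        name + "_mean": series.mean(),
        name + "_median": series.median(),
        name + "_stddev": series.std(),   # sample standard deviation (ddof=1)
        name + "_skewns": series.skew(),
        name + "_kurtos": series.kurt(),  # excess kurtosis
    }


# Hypothetical closing prices, for illustration only.
close = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
features = basic_features(close, "close")
print(features)
```

Extending the feature list then amounts to adding more entries to the returned dictionary.
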
### Label "-1"
HDBSCAN (as an extension of the DBSCAN method) groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away). This produces the section named "Unclustered coins", which contains the outliers labeled -1. The list consists of two types of coins: coins with a large amount of missing data and **real** outliers. Each coin in this list should be considered **special**. It is not surprising, therefore, that a couple of crypto headliners appear there: they really are outliers by many parameters.
### Metrics
HDBSCAN [can use different metrics][0] to calculate the distance between coins from their feature values. The case examined here uses the default Euclidean metric. The main disadvantage of this metric is the assumption that all features influence the distance equally, which is not actually true. The choice of metric for high-dimensional clustering is [very][1] [controversial][2]. In general, one of the better options is to define a weight for each feature and use a weighted Euclidean distance. This could be the subject of separate (and fairly complex) research.
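
As a rough sketch of the weighted Euclidean idea, the distance can be written as d(x, y) = sqrt(sum_i w_i * (x_i - y_i)^2). The function name `weighted_euclidean_matrix`, the sample rows and the weights below are assumptions for illustration; how to choose real weights is exactly the open question mentioned above.

```python
import numpy as np


def weighted_euclidean_matrix(features, weights):
    """Pairwise weighted Euclidean distances between the rows of `features`.

    d(x, y) = sqrt(sum_i w_i * (x_i - y_i) ** 2)
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    # Scaling each column by sqrt(w_i) turns the weighted distance into a plain one.
    scaled = features * np.sqrt(weights)
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical feature rows and weights, chosen only for illustration.
rows = np.array([[0.0, 0.0], [3.0, 4.0]])
equal = weighted_euclidean_matrix(rows, [1.0, 1.0])       # plain Euclidean
first_only = weighted_euclidean_matrix(rows, [4.0, 0.0])  # second feature ignored
print(equal[0, 1], first_only[0, 1])
```

A matrix built this way could then be passed to a clusterer that accepts precomputed distances (HDBSCAN documents a `metric='precomputed'` option). The broadcasting used here keeps an n × n × d array in memory, so it is only suitable for a modest number of coins and features.
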
### Methods
Of course, many other clustering algorithms could be applied to this task instead of (H)DBSCAN. Most probably, though, applying the described approach to data split into timeframes of different market states, together with widening the feature extraction method, would deliver more significant results. The next proposed step is switching to the tsfresh library for automated feature extraction.

[0]:https://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html#what-about-different-metrics
[1]:https://www.researchgate.net/post/What_is_the_best_distance_measure_for_high_dimensional_data
[2]:https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions

# Clustering using automated features extraction technique
The tsfresh module is used to extract characteristics from time series. The current version extracts ~800 features per series out of the box. Since our data consists of 22 series for each coin (15 from csv files and 7 additionally constructed), the total number of features describing each coin is 17468 (22 × 794). This time the dimensionality of the task becomes even higher.

## Preparing data
Tsfresh has [its own format][3] of pandas.DataFrame that should be used for feature extraction. Since tsfresh [can easily handle][4] time series of unequal length, we can use **Storer.mf**, which contains the data across all coins, without using the **Extractor** class for each coin separately. An additional method, **tsfresh_format**, was implemented as a @property of the **Storer** class. It converts the mainframe to a format compatible with tsfresh.

[3]:https://tsfresh.readthedocs.io/en/latest/text/data_formats.html
[4]:https://tsfresh.readthedocs.io/en/latest/text/faq.html
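
The implementation of the conversion is not shown here. As a rough sketch under assumptions, a wide table with one column per series can be turned into the long `id` / `kind` / `time` / `value` layout with `pandas.melt`. The helper `to_long_format`, the wide layout and the sample values are hypothetical and only illustrate the target format.

```python
import pandas as pd


def to_long_format(wide, id_col="id", time_col="time"):
    """Convert a wide frame (one column per series) to id/kind/time/value rows."""
    long_df = wide.melt(id_vars=[id_col, time_col],
                        var_name="kind", value_name="value")
    # Rows with missing values are dropped, so series may have unequal length.
    long_df = long_df.dropna(subset=["value"])
    long_df = long_df.sort_values([id_col, "kind", time_col])
    return long_df[[id_col, "kind", time_col, "value"]].reset_index(drop=True)


# Hypothetical sample: two coins, two series, one missing observation.
wide = pd.DataFrame({
    "id": ["coin_a", "coin_a", "coin_b"],
    "time": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-01"]),
    "close": [1.0, 1.5, 10.0],
    "volume": [100.0, None, 5.0],
})
long_df = to_long_format(wide)
print(long_df)
```

The resulting columns match the arguments that the extraction step expects to be named via `column_id`, `column_kind`, `column_sort` and `column_value`.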

In [4]:
# Import of tsfresh methods
from tsfresh import extract_features as tsf_ef
from tsfresh.utilities.dataframe_functions import impute

In [5]:
%%time

# Convert mainframe to format compatible with tsfresh
ts_df = storer.tsfresh_format

display(ts_df.head(10))
display(ts_df.tail(10))

Unnamed: 0,id,kind,time,value
1232,vechain,active_address,2017-08-15,2.0
1233,vechain,active_address,2017-08-16,159.25
1234,vechain,active_address,2017-08-17,316.5
1235,vechain,active_address,2017-08-18,473.75
1236,vechain,active_address,2017-08-19,631.0
1237,vechain,active_address,2017-08-20,205.0
1238,vechain,active_address,2017-08-21,393.0
1239,vechain,active_address,2017-08-22,296.0
1240,vechain,active_address,2017-08-23,470.0
1241,vechain,active_address,2017-08-24,456.0


Unnamed: 0,id,kind,time,value
3989174,aeternity,tx_per_address,2018-07-17,1.266904
3989175,aeternity,tx_per_address,2018-07-18,1.435262
3989176,aeternity,tx_per_address,2018-07-19,1.121864
3989177,aeternity,tx_per_address,2018-07-20,1.053191
3989178,aeternity,tx_per_address,2018-07-21,1.116022
3989179,aeternity,tx_per_address,2018-07-22,1.010638
3989180,aeternity,tx_per_address,2018-07-23,1.643068
3989181,aeternity,tx_per_address,2018-07-24,1.230769
3989182,aeternity,tx_per_address,2018-07-25,1.405858
3989183,aeternity,tx_per_address,2018-07-26,1.080169


Wall time: 3min 32s


## Features extraction
Run the automated feature extraction with tsfresh and save the result to a CSV file.

In [6]:
%%time

extracted_features = tsf_ef(ts_df, column_id="id", column_sort="time", column_kind="kind", column_value="value")
extracted_features = impute(extracted_features)
extracted_features.to_csv("ef.csv")  # Save it because it takes too long to calculate it each time

Feature Extraction: 100%|████████████████████████████████████████████████████████| 20/20 [26:14<00:00, 78.70s/it]


Wall time: 26min 30s
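
Since the extraction takes a long time, the saved CSV file can be reused on later runs. The sketch below shows one possible caching pattern. The helper `load_or_compute` and the stand-in `slow_extraction` are hypothetical; in the notebook, the expensive step would be the tsfresh call followed by `impute`.

```python
import os
import tempfile

import pandas as pd


def load_or_compute(path, compute):
    """Return the cached DataFrame at `path`, or compute and cache it."""
    if os.path.exists(path):
        # index_col=0 restores the row index written by to_csv
        return pd.read_csv(path, index_col=0)
    result = compute()
    result.to_csv(path)
    return result


calls = []


def slow_extraction():
    """Stand-in for the expensive extraction step (hypothetical data)."""
    calls.append(1)
    return pd.DataFrame({"f1": [1.0, 2.0]}, index=["coin_a", "coin_b"])


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "ef.csv")
    first = load_or_compute(cache_path, slow_extraction)   # computes and saves
    second = load_or_compute(cache_path, slow_extraction)  # reads the cache

print("extraction ran {} time(s)".format(len(calls)))
```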


In [7]:
# Show the structure of dataframe with extracted features
display(extracted_features.head(10))
display(extracted_features.tail(10))

variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_-inf,volume__value_count__value_0,volume__value_count__value_1,volume__value_count__value_inf,volume__value_count__value_nan,volume__variance,volume__variance_larger_than_standard_deviation
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0x,605519400.0,82731.0,0.117297,0.106349,0.027818,2365.457143,-0.336061,-47.954622,23.395524,5323.25,...,-1.052709e+21,-6.852882e+20,-2.590628e+19,0.0,0.0,0.0,0.0,0.0,356560700000000.0,1.0
aelf,308042200.0,73441.0,-0.004179,-0.004811,0.000163,3102.913043,-0.295909,-142.01581,100.039258,10451.8,...,-8.534993e+21,-2.320098e+22,-4.20012e+22,0.0,0.0,0.0,0.0,0.0,2931632000000000.0,1.0
aeternity,112810700.0,41060.0,0.037174,0.020251,0.009742,1155.857398,-0.189691,-22.828209,21.222047,3799.857143,...,2.524027e+20,-5.652238e+19,-7.643957e+20,0.0,0.0,0.0,0.0,0.0,133052200000000.0,1.0
aion,84244100.0,54263.0,0.065037,0.052556,0.012044,968.441379,0.002184,0.23202,20.441058,2087.619048,...,-2.047757e+20,-2.205417e+20,-1.225549e+20,0.0,0.0,0.0,0.0,0.0,61449180000000.0,1.0
ardor,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,-9.176908e+20,-2.808596e+21,-3.516748e+21,0.0,9.0,1.0,0.0,0.0,228964900000000.0,1.0
ark,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,-5.432228e+19,-7.823966e+18,6.011114e+19,0.0,0.0,0.0,0.0,0.0,44455060000000.0,1.0
augur,113934200.0,43618.0,0.350662,0.394984,0.021014,966.594595,-0.204435,-11.882883,9.61749,1601.666667,...,-2.457617e+21,-3.936387e+20,-1.350774e+21,0.0,1.0,2.0,0.0,0.0,151085200000000.0,1.0
bancor,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,1.982417e+20,1.819819e+20,-2.060632e+18,0.0,0.0,0.0,0.0,0.0,53264890000000.0,1.0
basic-attention-token,304296000.0,62326.0,0.530526,0.557664,0.029372,1202.082452,-0.136181,-6.645122,7.549721,1849.844444,...,-4.413956e+20,1.549729e+20,1.207532e+20,0.0,0.0,0.0,0.0,0.0,131869900000000.0,1.0
binance-coin,23755210000.0,317075.0,0.01568,-0.009919,0.011063,2513.285897,0.057883,108.372267,307.281956,13393.916667,...,4.497641e+21,-8.759933e+22,-6.003704e+22,0.0,0.0,0.0,0.0,0.0,4740660000000000.0,1.0


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_-inf,volume__value_count__value_0,volume__value_count__value_1,volume__value_count__value_inf,volume__value_count__value_nan,volume__variance,volume__variance_larger_than_standard_deviation
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
verge,146624700000.0,924126.0,0.859942,0.86019,0.005304,-5607.959918,0.606455,172.635328,19.337607,-6691.889163,...,-2.074784e+23,-3.804209e+23,1.980499e+22,0.0,0.0,0.0,0.0,0.0,8109743000000000.0,1.0
veritaseum,67301670.0,29287.0,0.476574,0.506777,0.031793,718.904718,-0.415149,-10.24021,3.383006,1151.327273,...,4.439179e+16,6.335611e+16,9247988000000000.0,0.0,0.0,0.0,0.0,0.0,323898300000.0,1.0
waltonchain,62508370.0,35426.0,0.215901,0.187632,0.009683,648.680162,-0.065799,-2.887406,7.297838,1380.833333,...,-9.191506e+20,-5.695716e+21,1.131337e+21,0.0,0.0,0.0,0.0,0.0,577032400000000.0,1.0
wanchain,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,-1.058998e+22,-3.770552e+21,-6.212197e+21,0.0,0.0,0.0,0.0,0.0,399704200000000.0,1.0
waves,26607880000.0,1686466.0,0.437526,0.433732,0.008667,559.067669,0.615942,305.572888,50.455748,5259.10989,...,3.321996e+19,-1.878305e+20,-2.44462e+20,0.0,0.0,0.0,0.0,0.0,330547400000000.0,1.0
wax,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,-2.832079e+21,-2.58193e+21,-1.161918e+21,0.0,0.0,0.0,0.0,0.0,131392500000000.0,1.0
zcash,2369382000000.0,1906523.0,0.927693,0.934783,0.001058,13040.014904,0.854419,1482.187523,114.470386,21783.802198,...,-2.296494e+21,-3.19616e+22,-6.077346e+22,0.0,0.0,0.0,0.0,0.0,4200767000000000.0,1.0
zcoin,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,5.917691e+19,-1.849818e+20,-1.092619e+20,0.0,0.0,0.0,0.0,0.0,37375110000000.0,1.0
zencash,2608398000.0,259974.0,0.340423,0.353796,0.010806,1238.273728,-0.057492,-6.645122,36.135563,3664.75,...,-9.596289e+19,-1.553048e+20,-1.018554e+20,0.0,0.0,0.0,0.0,0.0,17899210000000.0,1.0
zilliqa,235495200.0,53307.0,0.002418,-0.011676,0.008752,1843.564286,-0.057492,-18.814662,77.007548,4849.0,...,-1.15604e+23,-2.175072e+23,-1.562117e+23,0.0,0.0,0.0,0.0,0.0,3688235000000000.0,1.0


## Normalization
It's clear that each feature has its own scale, so the features dataframe should be rescaled; otherwise features with larger scales will dominate the distance measure. The **normalize** function below scales each feature to the \[0, 1\] range using min-max scaling.

In [8]:
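def normalize(df):
    """Min-max scale every column of df to the [0, 1] range."""
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        # A constant column has max == min, so this is 0 / 0 and the column becomes NaN
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result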

In [9]:
ef_norm = normalize(extracted_features)
display(ef_norm.head(10))

variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_-inf,volume__value_count__value_0,volume__value_count__value_1,volume__value_count__value_inf,volume__value_count__value_nan,volume__variance,volume__variance_larger_than_standard_deviation
0x,1.282667e-06,0.001078,0.130591,0.155094,0.548957,0.448015,0.241708,0.292583,0.014595,0.321105,...,0.117465,0.712655,0.897814,,0.0,0.0,,,3.3e-05,
aelf,6.474441e-07,0.000942,0.000268,0.041724,0.0,0.450795,0.266243,0.280244,0.064841,0.333218,...,0.11743,0.712579,0.897766,,0.0,0.0,,,0.000273,
aeternity,2.305537e-07,0.000469,0.044632,0.067284,0.19014,0.443457,0.33115,0.29588,0.01317,0.317506,...,0.117471,0.712657,0.897813,,0.0,0.0,,,1.2e-05,
aion,1.695535e-07,0.000662,0.074524,0.100232,0.235831,0.44275,0.448398,0.298905,0.012658,0.313462,...,0.117469,0.712656,0.897814,,0.0,0.0,,,6e-06,
ardor,5.559542e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.117465,0.712648,0.89781,,0.095745,0.012821,,,2.1e-05,
ark,5.559542e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.117469,0.712657,0.897814,,0.0,0.0,,,4e-06,
augur,2.329527e-07,0.000506,0.380949,0.449468,0.413903,0.442743,0.32214,0.297315,0.005563,0.312314,...,0.117458,0.712656,0.897813,,0.010638,0.025641,,,1.4e-05,
bancor,5.559542e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.117471,0.712658,0.897814,,0.0,0.0,,,5e-06,
basic-attention-token,6.394445e-07,0.00078,0.573911,0.615383,0.579813,0.443631,0.363848,0.298003,0.004207,0.312901,...,0.117468,0.712658,0.897815,,0.0,0.0,,,1.2e-05,
binance-coin,5.07157e-05,0.004501,0.021574,0.036515,0.216374,0.448573,0.482434,0.313091,0.200704,0.340167,...,0.117491,0.712363,0.897746,,0.0,0.0,,,0.000441,


Some features have the same value for all coins. For these features the max - min denominator in the normalization is zero, which may lead to **Inf**s and **NaN**s. To avoid this we drop such features. Dropping them does not affect the result because constant features carry no information.
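To make the effect concrete, here is a minimal sketch on hypothetical toy data (the coin names, feature names and the helper `min_max_normalize` are illustrative assumptions, not part of the report's dataset). It applies the same min-max formula and shows that a constant column turns into NaN and is then removed by `dropna`.

```python
import pandas as pd

def min_max_normalize(df):
    # Same min-max formula as above, vectorized; a constant column gives 0 / 0 = NaN
    return (df - df.min()) / (df.max() - df.min())

# Hypothetical toy data: three coins, one informative and one constant feature
toy = pd.DataFrame(
    {"feature_a": [1.0, 3.0, 5.0], "feature_const": [7.0, 7.0, 7.0]},
    index=["coin_x", "coin_y", "coin_z"],
)

toy_norm = min_max_normalize(toy)

# Columns that became entirely NaN are exactly the constant ones
constant_cols = toy_norm.columns[toy_norm.isna().all()].tolist()

# Drop every column that contains at least one NaN
toy_clean = toy_norm.dropna(axis="columns", how="any")
```

Listing `constant_cols` before dropping is a cheap way to check that only uninformative features are being removed.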

In [10]:
ef_norm = ef_norm.dropna(axis='columns', how='any')
ef_norm.shape

(115, 15634)

## Clustering
As before, we will use hdbscan to perform clustering. Since the number of features is much higher this time, we can expect most coins to be considered outliers.

In [11]:
tf_clusterer = hdbscan.HDBSCAN(min_cluster_size=2)
tf_labels = tf_clusterer.fit_predict(ef_norm)
ef_norm['label'] = tf_labels
for label in np.unique(tf_labels):
    if label != -1:
        print('label %d' %label)
        cl_tf = ef_norm[ef_norm['label'] == label]
        display(cl_tf)
        print('-'*40)

label 0


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
bitshares,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.006706,1.0,1.0,0.117462,0.712534,0.897765,0.0,0.0,0.0001005685,0
neoscoin,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,9.3e-05,1.0,1.0,0.11747,0.712657,0.897814,0.0,0.0,4.76223e-08,0
nxt,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.004531,1.0,1.0,0.117499,0.712691,0.897811,0.0,0.0,0.0001321656,0


----------------------------------------
label 1


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
bitsend,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,9.2e-05,1.0,1.0,0.11747,0.712657,0.897814,0.191489,0.448718,1.530308e-07,1
crown,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,2.4e-05,1.0,1.0,0.11747,0.712657,0.897814,1.0,0.679487,1.875903e-09,1
emercoin,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000265,1.0,1.0,0.117469,0.712657,0.897814,0.12766,0.076923,4.712445e-07,1
exclusivecoin,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,7.4e-05,1.0,1.0,0.11747,0.712657,0.897814,0.031915,0.141026,7.142969e-08,1
korecoin,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000122,1.0,1.0,0.11747,0.712657,0.897814,0.542553,1.0,1.493741e-07,1
monetaryunit,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000106,1.0,1.0,0.11747,0.712657,0.897814,0.234043,0.230769,8.80712e-08,1


----------------------------------------
label 2


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
diamond,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,2.5e-05,1.0,1.0,0.11747,0.712657,0.897814,0.0,0.012821,8.215973e-09,2
reddcoin,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001678,1.0,1.0,0.117467,0.712652,0.897813,0.0,0.0,1.419455e-05,2


----------------------------------------
label 3


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
bitcoin-diamond,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.00148,1.0,1.0,0.119593,0.712634,0.897787,0.0,0.0,0.000306,3
wax,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000628,1.0,1.0,0.117456,0.712648,0.897813,0.0,0.0,1.2e-05,3


----------------------------------------
label 4


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
ardor,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001493,1.0,1.0,0.117465,0.712648,0.89781,0.095745,0.012821,2.132189e-05,4
ark,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000871,1.0,1.0,0.117469,0.712657,0.897814,0.0,0.0,4.139766e-06,4
cryptonex,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,9.8e-05,0.0,0.0,0.11747,0.712657,0.897814,0.0,0.0,6.604458e-08,4
cybermiles,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.002618,0.0,1.0,0.117473,0.712654,0.897813,0.0,0.0,7.631944e-05,4
decentraland,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001773,0.0,1.0,0.117467,0.712653,0.897813,0.0,0.0,3.27281e-05,4
digixdao,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001666,1.0,1.0,0.11744,0.712639,0.897814,0.0,0.0,2.762521e-05,4
elastos,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001574,1.0,1.0,0.117469,0.712656,0.897814,0.0,0.0,2.666893e-05,4
enigma-project,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000957,1.0,1.0,0.117467,0.712653,0.897813,0.0,0.0,1.422224e-05,4
factom,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001574,1.0,1.0,0.117469,0.712656,0.897814,0.0,0.0,2.666893e-05,4
gifto,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.001574,1.0,1.0,0.117469,0.712656,0.897814,0.0,0.0,2.666893e-05,4


----------------------------------------
label 5


variable,active_address__abs_energy,active_address__absolute_sum_of_changes,"active_address__agg_autocorrelation__f_agg_""mean""","active_address__agg_autocorrelation__f_agg_""median""","active_address__agg_autocorrelation__f_agg_""var""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","active_address__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,volume__sum_values,volume__symmetry_looking__r_0.05,volume__symmetry_looking__r_0.1,volume__time_reversal_asymmetry_statistic__lag_1,volume__time_reversal_asymmetry_statistic__lag_2,volume__time_reversal_asymmetry_statistic__lag_3,volume__value_count__value_0,volume__value_count__value_1,volume__variance,label
bitcoin-private,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,3.5e-05,0.0,1.0,0.11747,0.712657,0.897814,0.0,0.0,2.214374e-08,5
hshare,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.008126,0.0,1.0,0.117411,0.712611,0.897814,0.0,0.0,0.0003221602,5
moac,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,6e-05,0.0,1.0,0.11747,0.712657,0.897814,0.0,0.0,2.658076e-07,5
polymath-network,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000306,1.0,1.0,0.11747,0.712657,0.897814,0.0,0.0,2.309527e-06,5
wanchain,6e-06,0.003667,0.369964,0.407461,0.211258,0.443767,0.411932,0.298003,0.022947,0.317187,...,0.000765,1.0,1.0,0.11742,0.712644,0.897807,0.0,0.0,3.722166e-05,5


----------------------------------------
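
The loop above skips label -1, which HDBSCAN uses for noise, so the printed tables do not show how many coins were left unclustered. The sketch below is one possible way to summarize a label array; the helper `summarize_labels` and the example labels are hypothetical and are not the labels produced in this report.

```python
import numpy as np
import pandas as pd

def summarize_labels(labels):
    """Return (cluster sizes, share of points labelled as noise)."""
    labels = pd.Series(np.asarray(labels))
    # -1 marks noise points in DBSCAN / HDBSCAN label arrays
    noise_share = float((labels == -1).mean())
    cluster_sizes = labels[labels != -1].value_counts().sort_index()
    return cluster_sizes, noise_share

# Hypothetical labels for ten points: two clusters and six noise points
example_labels = [-1, 0, 0, -1, 1, -1, -1, 1, -1, -1]
sizes, noise_share = summarize_labels(example_labels)
```

In the notebook, the same helper could be called as `summarize_labels(tf_labels)` to get the outlier share alongside the per-cluster tables.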


## Conclusions
As expected, most coins were considered outliers. The main reason is the very high dimensionality of the feature space: we have 115 points, each with its own value for each of ~15k coordinates, and our goal is to find points that are close enough in this sparse space to be considered a **cluster**. On the other hand, this means that the clusters found contain coins that are very similar.
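
The effect of dimensionality can be illustrated with a small sketch. It uses uniformly random points, which is an assumption made purely for illustration and not a model of the coin features; the function `relative_contrast` is likewise hypothetical. It measures how much farther the farthest neighbour of a point is than its nearest neighbour, for the same number of points in a low-dimensional and a high-dimensional space.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(farthest - nearest) / nearest distance from the first point to the others."""
    rng = np.random.default_rng(seed)
    # Assumption: points drawn uniformly from the unit hypercube
    points = rng.random((n_points, n_dims))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Same number of points, very different dimensionality
low_dim = relative_contrast(n_points=115, n_dims=2)
high_dim = relative_contrast(n_points=115, n_dims=15000)
```

With these settings `high_dim` comes out far smaller than `low_dim`: in the high-dimensional case all pairwise distances are nearly equal, so a density-based method has little contrast to separate dense regions from noise.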

Part II of the final report is complete. Two different approaches were demonstrated. Both approaches, manual feature extraction and automated feature extraction with the tsfresh library, gave different and interesting results, as initially expected.

The next step is to perform clustering that takes different market states into account.