# Features Extraction & Clustering. Part II.
This part focuses on clustering inside the top-level clusters obtained in the previous part.

## Preparing

First, we re-run the code from Part I.

In [1]:
# Import of necessary libs and our classes
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.spatial.distance import squareform, pdist
from sklearn.cluster import DBSCAN
from cj_loader import Storer, Extractor, extract_features
import warnings
warnings.filterwarnings("ignore")

# Init storer object with the given data and calculate the presence of data
storer = Storer()
precense = storer.applicability()

# Calculate the data completeness matrix and the per-column weights
data_compl = storer.data_completeness()
weights = data_compl.mean().sort_values()

# Transform the boolean matrix to numeric: multiply each column by its weight
# (label-aligned and vectorized; replaces the chained assignment .loc[index][:])
weighted = precense.astype(float).mul(weights.reindex(precense.columns), axis='columns')

# calculate distances between pairs of coins
distances = pd.DataFrame(squareform(pdist(weighted)), index=weighted.index, columns=weighted.index)

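# Top-level clustering with the DBSCAN algorithm.
# Note: with the default metric, each row of `distances` is treated as a feature vector;
# metric='precomputed' would cluster on the pairwise distances themselves.
clustering = DBSCAN(eps=0.3, min_samples=3).fit(distances)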
labels = clustering.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)

# Clusters aggregation
weighted['label'] = labels
clusters = {}
for label in np.unique(labels):
    cl = weighted[weighted['label'] == label]
    clusters[label] = cl

# Extracting features
cl_coin_features = {}
unclustered_coins = set()
for label, cluster in clusters.items():
    if label != -1:
        cl_coin_features[label] = extract_features(storer, '2014-04-01', coins_set=cluster.index)
    else:
        unclustered_coins.update(cluster.index.values.tolist())

Estimated number of clusters: 4


## Adjusting clustering parameters

DBSCAN is one of the most commonly used clustering algorithms and one of the most cited in the scientific literature. It has many advantages and is widely recommended for various tasks. One of its disadvantages is the routine of parameter selection, especially for the epsilon parameter. In the previous part, DBSCAN was applied to data whose elements all lie in the \[0;1\] range, which simplified the selection of epsilon. Our current clustering task is significantly more complex: the features (parameters) are not normalized, and each has its own scale. The number of features is also large, so the task can be considered high-dimensional clustering. The high number of dimensions also makes direct visualization of the clusters impossible.
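
To illustrate why unscaled features complicate the choice of epsilon, the sketch below compares Euclidean distances between three made-up feature vectors before and after z-score standardization. The numbers and the standardization step are assumptions for illustration only; the notebook itself clusters the raw features.

```python
import numpy as np

# Hypothetical feature vectors: [small-scale feature, large-scale feature]
X = np.array([
    [1.0, 1000.0],
    [3.0, 1010.0],
    [1.1, 5000.0],
])

def pairwise_euclidean(A):
    """Return the full matrix of Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

raw = pairwise_euclidean(X)

# Z-score standardization: every column gets zero mean and unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = pairwise_euclidean(Z)

print(np.round(raw, 2))
print(np.round(scaled, 2))
```

In the raw space the large-scale feature decides almost everything, so epsilon has to be chosen on its scale; after standardization both features contribute comparably.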

The best approach to choosing a suitable value for epsilon would be to examine the results obtained with different values. Fortunately, such a method has been developed and described in several publications.
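
The idea of examining results for different epsilon values can be sketched by hand with plain DBSCAN. The data below is synthetic and the helper `count_clusters` is a hypothetical name introduced here; it is not part of the notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic data (an assumption for illustration): two tight blobs far apart
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2)),
])

def count_clusters(X, eps, min_samples=3):
    """Run DBSCAN and return (number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return n_clusters, int(np.sum(labels == -1))

# Too small, reasonable, and too large values of epsilon
results = {eps: count_clusters(X, eps) for eps in (0.001, 0.5, 10.0)}
for eps, (n_clusters, n_noise) in results.items():
    print("eps=%g: %d clusters, %d noise points" % (eps, n_clusters, n_noise))
```

A too-small epsilon leaves every point as noise, while a too-large one merges everything into a single cluster; only the middle value recovers the two blobs.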

[HDBSCAN](https://hdbscan.readthedocs.io/en/latest/index.html) (Hierarchical Density-Based Spatial Clustering of Applications with Noise) performs DBSCAN over varying epsilon values and integrates the results to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN) and to be more robust to parameter selection.

In [2]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;

<IPython.core.display.Javascript object>

In [3]:
import hdbscan
from IPython.display import display

for top_level_cl in range(n_clusters_):
    print("Top-level cluster: {}".format(top_level_cl))
    tc = cl_coin_features[top_level_cl]
    # Drop every feature that is infinite or missing for at least one coin
    tc = tc.replace([np.inf, -np.inf], np.nan)
    tc = tc.dropna(axis='columns', how='any')
    print("shape: {}".format(tc.shape))
    # Low-level clustering; min_cluster_size=2 allows clusters of just two coins
    clusterer = hdbscan.HDBSCAN(min_cluster_size=2)
    ll_labels = clusterer.fit_predict(tc)
    tc['label'] = ll_labels
    for label in np.unique(ll_labels):
        if label != -1:
            print('label %d' % label)
            cl = tc[tc['label'] == label]
            display(cl)
            print('-' * 40)
        else:
            # Label -1 marks noise: add these coins to the unclustered set
            noise_coins = tc[tc['label'] == label].index.values.tolist()
            unclustered_coins.update(noise_coins)

    print('=' * 40)

print("Unclustered coins:")
for co in unclustered_coins:
    print(co)

Top-level cluster: 0
shape: (19, 106)
label 0


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
neo,6186.555556,5824.0,6611.747396,1.372165,3.604629,31.586522,20.12,39.027937,1.372236,1.261594,...,2.196671,0.46121,1.907444,6.457338,0.075625,0.0,0.924062,19.345173,409.596195,0
bitcoin-gold,24770.355731,13296.0,32325.900838,2.885266,9.292075,129.117545,84.71,103.964299,1.048445,0.166518,...,0.225699,0.080849,-0.072862,3.205795,0.565094,0.502647,0.330306,1.074817,1.513131,0
ethereum-classic,10809.951152,10763.0,8286.560932,0.689049,0.72534,12.379725,13.69,10.556761,0.660376,-0.27908,...,2.753299,1.736931,1.205032,0.656309,17.226962,12.174938,79.397441,25.285356,655.311219,0


----------------------------------------
label 1


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
lisk,1256.345088,845.5,1306.419742,2.170264,8.431071,5.084455,1.765,7.069427,1.84691,2.94081,...,1.780763,0.396637,1.179027,2.528794,234.195165,194.815368,189.347875,1.589927,4.444137,1
pivx,2817.088009,2696.0,1551.385042,1.526174,8.624566,2.08026,1.36,2.71116,1.75551,3.364241,...,0.187977,0.169446,1.370168,3.121039,41.565378,1.1469,1076.473628,29.339617,858.875856,1
verge,4775.91204,707.0,9105.175995,2.395383,4.998431,0.011389,2.5e-05,0.029746,3.755064,16.750922,...,0.701068,0.183204,-0.552401,-0.252065,950.513747,33.23857,7896.868751,31.924672,1099.745328,1
gas,4605.768519,4411.5,5576.554873,2.587282,15.329782,23.146736,20.59,13.790248,1.121104,1.914183,...,1.800429,0.809647,1.883398,3.804189,73.458813,0.013154,197.234739,3.986829,21.984833,1
digibyte,7653.289331,4285.0,7987.288277,2.910194,12.477327,0.007896,0.000279,0.016372,2.99155,11.571083,...,0.640099,0.136387,0.030436,2.4606,1334.301447,122.844788,2645.07516,4.81118,40.258149,1
zcash,53671.197802,56198.0,28965.45871,0.018248,-1.350792,233.724953,220.8,184.993409,3.025848,20.671637,...,0.096124,0.051123,5.473586,41.712232,2.476253,1.655919,3.959993,4.433368,23.606822,1
dogecoin,44725.500591,35972.0,30990.339987,2.850689,11.2763,0.001132,0.000243,0.001916,3.192562,13.44,...,0.44532,0.129465,0.957839,0.907888,37756.476612,24201.275609,332187.587638,40.333971,1632.795528,1
decred,10833.712222,11097.5,3816.079987,0.222284,0.659588,29.186445,15.51,33.340191,0.980179,-0.305222,...,0.302313,0.07204,1.659293,9.949313,59.520629,8.81254,148.366603,5.126046,33.825512,1
waves,4530.174551,3220.0,4783.672582,2.243263,7.484182,3.155054,2.86,3.29103,1.33811,1.970991,...,2.522097,3.895967,2.982409,12.366264,42.177654,4.054797,159.940076,5.716705,32.118045,1


----------------------------------------
label 2


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,mdiff_to_volatility_mean,mdiff_to_volatility_median,mdiff_to_volatility_stddev,mdiff_to_volatility_skewns,mdiff_to_volatility_kurtos,label
nem,870.648026,356.0,1267.603028,4.200264,32.138797,0.131956,0.005698,0.247699,3.310615,13.703799,...,1.634948,20.710297,7.254711,70.183217,6197.273077,5089.033717,7475.500598,20.369103,578.018261,2
dash,18778.626667,10335.0,17916.087571,2.149134,12.087689,120.214314,7.775,230.905655,2.654101,7.803201,...,0.170657,0.158673,3.688645,26.213098,8.365254,3.932487,10.370287,7.242054,139.679172,2


----------------------------------------
Top-level cluster: 1
shape: (61, 55)
label 0


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
wanchain,4.423651,4.025,2.07861,0.705453,-0.634565,4.682857,4.34,2.143479,0.658691,-0.713747,...,0.000533,0.000191,0.341138,-1.026182,0.0017,0.001627,0.000809,0.505815,-0.966857,0
maker,792.610902,781.93,322.325747,0.238778,-0.239704,837.310865,800.455,347.995799,0.361101,-0.123107,...,0.086341,0.027893,-0.549053,0.029306,0.001821,0.001697,0.000581,0.909274,0.563056,0
hshare,11.814399,9.74,7.306931,1.559535,2.078288,12.8239,10.49,8.210162,1.554403,1.961953,...,0.001251,0.001159,3.532915,14.703203,0.001635,0.001264,0.001003,1.514419,2.000851,0
bitcoin-diamond,14.807918,4.5,17.294688,1.481229,1.833662,17.13449,4.82,20.11647,1.460349,1.778262,...,0.000492,0.001572,2.310487,6.983425,0.001651,0.001442,0.000601,1.048382,-0.029951,0
bitcoin-private,23.240863,22.56,12.86043,1.26174,2.232808,25.634964,23.76,15.385629,1.464457,2.555959,...,0.002917,0.001316,1.137621,2.056149,0.001502,0.001397,0.000807,0.883645,0.207065,0


----------------------------------------
label 1


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
ardor,0.248999,0.146666,0.360959,2.863432,8.456138,0.269928,0.155014,0.404876,3.03744,9.818911,...,3.3e-05,2.5e-05,1.839779,3.890813,0.000892,0.000529,0.001279,2.871493,8.676248,1
steem,1.408093,1.07,1.385575,1.503642,2.264808,1.52034,1.15,1.49698,1.540028,2.42837,...,0.000316,0.000859,3.948504,16.988578,0.001109,0.000823,0.001216,1.600084,2.544892,1


----------------------------------------
label 2


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
bytecoin-bcn,0.001071,5e-05,0.002226,3.862114,26.67343,0.001157,5.4e-05,0.002402,3.689752,22.428601,...,9.970023e-08,2.660023e-07,2.3131,6.036797,0.000677,3e-05,0.001361,2.926192,10.582155,2
siacoin,0.006439,0.000607,0.011083,3.222797,14.875275,0.006968,0.000646,0.012276,3.443014,17.092187,...,7.107616e-07,1.341594e-06,1.911383,3.867497,0.000708,3.8e-05,0.001243,3.139076,14.080927,2
bitshares,0.068236,0.007727,0.123098,2.921102,10.656447,0.072932,0.008211,0.132705,2.94653,10.692251,...,1.797706e-05,2.180321e-05,1.976971,5.055244,0.000615,6.3e-05,0.001122,2.914992,10.565782,2


----------------------------------------
label 3


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
golem-network-tokens,0.298102,0.27422,0.226029,0.86406,0.87531,0.319904,0.292045,0.244917,0.91339,1.0237,...,5.1e-05,4.9e-05,1.739616,2.716876,0.000883,0.000811,0.000657,0.848335,0.866487,3
ark,2.492056,2.455,1.990124,1.088746,1.188183,2.684415,2.59,2.160226,1.150835,1.356311,...,0.000329,0.000176,0.976624,1.708981,0.000864,0.000858,0.000684,1.02922,1.032684,3
komodo,2.420748,1.89,2.23631,1.774483,3.604554,2.602976,2.0,2.443241,1.910811,4.456263,...,0.000341,0.000169,0.446671,-0.131254,0.000873,0.000672,0.000815,1.79986,3.736237,3


----------------------------------------
label 4


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
dentacoin,0.000574,0.000374,0.000725,3.900904,20.085688,0.00066,0.000398,0.000929,4.605233,27.334378,...,5.046969e-08,5.396153e-08,2.35934,8.467067,0.000626,0.000423,0.000832,3.695435,18.454049,4
wax,0.401025,0.25352,0.476952,4.325298,28.728535,0.464423,0.270466,0.611431,4.229268,24.002165,...,2.754163e-05,3.106317e-05,3.446221,17.31567,0.0007,0.00049,0.000626,2.614212,8.014067,4
digixdao,85.084494,59.435,100.152757,1.760459,3.061513,91.100699,63.56,108.366901,1.796305,3.257511,...,0.01754502,0.01035381,1.91271,5.650685,0.000602,0.000429,0.000703,1.728822,2.928424,4


----------------------------------------
label 5


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
crown,0.547453,0.011036,0.885053,1.936076,3.648528,0.597505,0.01356,0.962189,1.942499,3.727477,...,1.721539e-05,0.0001486377,1.497302,1.658155,3.1e-05,4.047599e-07,5.2e-05,2.009594,3.906457,5
diamond,2.356839,0.29339,4.632165,2.946563,9.965001,2.547863,0.313869,5.053897,3.092576,11.583995,...,0.0006355228,0.0004713165,1.508904,2.862232,2e-05,1.579299e-06,4.2e-05,2.943282,9.895606,5
transfercoin,0.502495,0.016246,0.857478,2.219602,4.770671,0.554953,0.017823,0.943963,2.192058,4.53786,...,3.742565e-05,8.104771e-05,0.994257,0.090049,1.1e-05,3.047554e-07,1.9e-05,2.195006,4.615749,5
reddcoin,0.00128,3.9e-05,0.003109,3.69866,17.764759,0.001403,4.3e-05,0.003454,3.82565,18.795357,...,7.314169e-08,2.923602e-07,1.982224,3.152333,0.000128,3.73251e-06,0.000311,3.679076,17.471926,5
korecoin,0.990036,0.029753,1.793782,1.97015,3.236238,1.096288,0.035417,1.994462,2.001036,3.376039,...,5.689136e-05,0.0002523717,2.585461,8.747095,7e-06,1.768558e-07,1.3e-05,1.967379,3.194516,5
neoscoin,1.241401,0.057776,2.31229,3.161752,13.667693,1.352071,0.061973,2.538932,3.18389,13.642464,...,8.058983e-05,0.000395699,2.43167,6.994469,1.6e-05,6.995261e-07,2.9e-05,3.216032,14.13288,5
blocknet,6.493587,0.121778,11.061462,1.786041,2.636969,7.061386,0.135489,12.054517,1.802897,2.6883,...,0.0002827746,0.001407655,1.468569,1.458448,0.000117,2.418933e-06,0.000194,1.720409,2.371543,5
monetaryunit,0.046694,0.00054,0.079685,2.313304,6.627993,0.051421,0.000615,0.088827,2.436268,7.53926,...,9.195411e-07,1.235955e-05,1.367737,0.752221,2e-05,1.869393e-07,3.4e-05,2.192247,5.862209,5
maidsafecoin,0.170106,0.073602,0.20653,1.70473,3.147239,0.180155,0.076747,0.220148,1.735564,3.272201,...,8.949929e-05,5.166388e-05,0.571676,-0.485515,0.000268,0.0001158692,0.000326,1.713392,3.191458,5
emercoin,0.818447,0.224321,1.319319,2.477042,7.231048,0.878846,0.240944,1.427773,2.584766,8.194776,...,0.0002491345,0.0002201666,0.715023,-0.003978,0.000117,2.966228e-05,0.000191,2.453898,6.97462,5


----------------------------------------
label 6


Unnamed: 0,close_mean,close_median,close_stddev,close_skewns,close_kurtos,high_mean,high_median,high_stddev,high_skewns,high_kurtos,...,rate_btc_median,rate_btc_stddev,rate_btc_skewns,rate_btc_kurtos,mcap_ratio_mean,mcap_ratio_median,mcap_ratio_stddev,mcap_ratio_skewns,mcap_ratio_kurtos,label
polymath-network,0.62313,0.523797,0.286755,0.890957,0.079759,0.674753,0.556775,0.319642,0.871556,-0.089713,...,6.51723e-05,2.517043e-05,0.595057,-1.033858,0.000539,0.000431,0.000243,0.899825,-0.084507,6
enigma-project,1.977663,1.72,1.39026,1.493452,2.804216,2.14362,1.86,1.555507,1.594281,2.992226,...,0.0002182171,0.0001071887,0.07945,-0.402444,0.000518,0.000448,0.000363,1.487507,2.761826,6
zencash,22.862708,21.46,14.16501,0.390113,-0.73513,24.753895,22.99,15.265151,0.431416,-0.64701,...,0.00291958,0.0009794733,0.225385,-0.418846,0.000253,0.000254,0.000175,0.127273,-1.272639,6
nuls,2.666116,2.59,1.481698,0.532731,0.79416,2.867961,2.73,1.607618,0.670237,1.117421,...,0.0003170639,0.000148568,-0.422798,-0.802625,0.000361,0.000346,0.000134,0.413765,-0.452485,6
decentraland,0.086779,0.09555,0.050646,-0.01283,-0.326775,0.094069,0.101427,0.055044,0.124299,0.007083,...,1.080277e-05,5.202075e-06,-0.290883,-1.107702,0.000376,0.000369,0.00021,0.644192,0.732772,6
dropil,0.005598,0.00572,0.000849,-0.516029,-0.232788,0.006113,0.006088,0.001064,0.264097,1.255825,...,7.808254e-07,1.437768e-07,-0.846285,0.018465,0.00037,0.000386,6e-05,-0.779544,-0.361698,6
cybermiles,0.209653,0.188085,0.098722,0.93793,0.839453,0.22671,0.197038,0.110596,1.017315,1.071018,...,2.210712e-05,1.016537e-05,0.654467,-0.231995,0.000416,0.000371,0.000202,0.562836,-0.763781,6
theta-token,0.164444,0.153641,0.043322,1.12532,0.825498,0.17606,0.163871,0.046669,1.019619,0.49356,...,1.807421e-05,5.205841e-06,0.911038,0.280574,0.000351,0.000334,9.5e-05,0.90955,0.278834,6
monaco,8.256865,7.5,4.213411,0.593459,0.441495,9.000823,8.13,4.740944,0.730917,0.649901,...,0.0009692494,0.000628968,2.325042,7.92749,0.000362,0.000329,0.000195,0.436356,-0.226983,6
moac,9.079839,9.26,3.851052,-0.06142,-1.001265,9.649731,9.84,4.070448,-0.035481,-0.93643,...,0.001118302,0.0003698108,0.094831,-0.422601,0.000518,0.000524,0.000114,0.177664,-0.805532,6


----------------------------------------
Top-level cluster: 2
shape: (3, 10)
Top-level cluster: 3
shape: (27, 95)
label 0


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
icon,1882.450161,1056.0,1919.075742,1.637988,1.567113,3.354226,2.67,2.305503,1.513255,1.962505,...,15439.828774,555552.458802,9.788784,101.536048,1.067163,1.020315,0.210151,0.692688,-0.153612,0
vechain,949.394509,699.5,1200.307321,9.380132,127.998701,2.590254,2.61,2.073741,0.348733,-0.83717,...,13806.638918,57591.330938,11.81781,172.859691,1.080835,1.081693,0.11419,-0.055933,0.398785,0


----------------------------------------
label 1


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
populous,449.096606,368.0,289.220522,2.592868,11.805822,16.097421,11.455,15.257558,1.439948,1.470089,...,5537.818441,6190.729083,1.694793,3.00697,1.216695,1.184783,0.237509,6.244834,68.613779,1
zilliqa,730.729592,596.5,817.031316,6.584975,52.120121,0.081106,0.071365,0.035365,0.969616,0.274079,...,16241.574272,26118.927198,7.647189,75.329836,1.400679,1.3722,0.324885,0.265146,-0.756347,1
rchain,65.275304,47.5,74.429443,2.578367,13.607838,1.094281,1.05,0.678605,0.335606,-0.615536,...,5708.825768,249371.80879,11.218519,139.014991,0.897106,0.90295,0.215825,0.84883,3.865996,1
aelf,410.696833,257.0,1106.880388,13.475649,190.262334,1.091667,1.05,0.451269,0.740229,-0.127404,...,15626.56323,42210.151964,6.545389,53.551399,1.499277,1.363958,0.440857,1.578175,3.187754,1
qash,259.127586,184.0,227.125618,2.312331,7.983496,0.764087,0.689685,0.423743,1.446614,2.546533,...,5886.808622,24812.639626,6.688334,49.751077,1.018055,1.016605,0.186535,0.067582,0.897683,1
ethos,323.564557,278.0,208.52125,2.368344,9.206586,2.256412,1.7,1.835292,1.707954,3.736363,...,4224.490677,5869.650188,5.926127,60.513508,1.043667,1.016506,0.153906,0.727335,1.03982,1
augur,456.766484,377.0,323.06407,3.191605,18.721484,20.261361,15.85,19.904376,1.715515,3.295499,...,5483.098352,15132.52055,6.322831,47.898602,1.385219,1.304811,0.451716,6.304069,48.630149,1
funfair,568.948052,297.0,630.972189,2.180213,4.649262,0.03946,0.029683,0.029096,2.533569,7.665132,...,3075.443915,86043.307517,14.154505,221.879295,1.449644,1.233333,0.659184,2.553176,7.034447,1
kucoin-shares,216.581121,15.0,1513.37328,11.114239,144.287287,3.680672,2.99,3.411707,2.522776,7.75043,...,1353.139363,964460.4913,12.316858,155.735274,0.926963,0.869565,0.341569,4.804123,34.968228,1
aion,350.647059,242.0,410.546863,3.701675,15.264639,2.691735,2.355,1.888573,1.611819,3.118996,...,8500.707163,155013.764539,13.618181,202.1781,1.163089,1.126638,0.201389,1.502889,4.640763,1


----------------------------------------
label 2


Unnamed: 0,active_address_mean,active_address_median,active_address_stddev,active_address_skewns,active_address_kurtos,close_mean,close_median,close_stddev,close_skewns,close_kurtos,...,trans_per_address_median,trans_per_address_stddev,trans_per_address_skewns,trans_per_address_kurtos,tx_per_address_mean,tx_per_address_median,tx_per_address_stddev,tx_per_address_skewns,tx_per_address_kurtos,label
binance-coin,999.766234,217.0,7791.168444,14.382106,226.810603,8.084689,9.26,5.893013,0.043746,-1.360099,...,5051.448981,23890.993978,6.502648,49.615987,1.082846,1.062092,0.193961,0.827556,2.178119,2
omisego,3904.139896,2467.5,8247.695864,6.691323,49.809485,10.89489,9.82,4.869719,0.360876,0.131471,...,7600.454755,18506.548801,13.566305,214.684558,1.375349,1.256275,0.413903,4.556722,30.751261,2


----------------------------------------
Unclustered coins:
factom
tron
mixin
bitcoin
elastos
veritaseum
ontology
gifto
iostoken
litecoin
bitcoin-cash
tether
huobi-token
stratis
cardano
kin
smartcash
monero
qtum
pundi-x
iota
nebulas-token
tezos
ripple
ethereum
cryptonex
mithril
nano
nxt
eos
stellar
tenx


## Important notes
### Approach limitations
The implemented approach is very general. It doesn't detect mutual dependencies, trend following, or other complex effects. Therefore, it should be regarded as clustering by basic time series characteristics. In particular, top-level clustering is a necessary measure rather than a customary step. The list of features describing each time series is quite short, but it can easily be extended with additional features.
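
Judging by the column names in the output above, each series is summarized by its mean, median, standard deviation, skewness, and kurtosis. The sketch below shows how such a feature set could be computed and extended. The function names, the two extra features, and the use of sample (not population) statistics are assumptions; this is not the `extract_features` implementation from `cj_loader`.

```python
import pandas as pd

def basic_features(series, name):
    """Summarize one time series with five distribution statistics."""
    return {
        name + "_mean": series.mean(),
        name + "_median": series.median(),
        name + "_stddev": series.std(),
        name + "_skewns": series.skew(),
        name + "_kurtos": series.kurt(),
    }

def extended_features(series, name):
    """Add two extra, purely illustrative features to the basic set."""
    feats = basic_features(series, name)
    # Lag-1 autocorrelation of the series
    feats[name + "_autocorr1"] = series.autocorr(lag=1)
    # Largest relative drop from a running maximum
    feats[name + "_max_drawdown"] = (series / series.cummax() - 1.0).min()
    return feats

# Made-up closing prices
close = pd.Series([10.0, 12.0, 9.0, 15.0, 14.0, 18.0])
feats = extended_features(close, "close")
print(feats)
```

Each added key becomes one more column in the per-coin feature table, so the clustering code does not need to change.
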
### Label "-1"
HDBSCAN (as an extension of the DBSCAN method) groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away). This is why a section named "Unclustered coins" appears: it contains the outliers assigned the -1 label. The list consists of two types of coins: coins with a large amount of missing data and **real** outliers. Each coin in this list should be considered **special**. It's not surprising, therefore, that a couple of crypto headliners appear there: they really are outliers by many of the parameters.
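
A small sketch of the -1 convention, using plain DBSCAN from scikit-learn (HDBSCAN labels noise the same way). The coin names and feature values are made up for illustration.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical coins with two made-up features each
features = pd.DataFrame(
    [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [5.0, 5.0]],
    index=["coin_a", "coin_b", "coin_c", "coin_d"],
    columns=["feature_1", "feature_2"],
)

labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(features)

# Points that belong to no dense region get the label -1
unclustered = set(features.index[labels == -1])
clustered = features[labels != -1]
print("labels:", list(labels))
print("unclustered:", unclustered)
```

The three nearby coins form one cluster, while the isolated one is reported as noise and would end up in the "Unclustered coins" list.
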
### Metrics
HDBSCAN [can use different metrics][0] to calculate the distance between coins from their feature values. The case examined here uses the default Euclidean metric. The main disadvantage of this metric is the assumption that all features influence the distance equally, which is not actually true. The choice of metric for high-dimensional clustering is [very][1] [controversial][2]. Generally, one of the best ways is to define a weight for each feature and use a weighted Euclidean distance. This could be the subject of separate (and fairly complex) research.
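
A minimal sketch of the weighted Euclidean distance, $d(a, b) = \sqrt{\sum_i w_i (a_i - b_i)^2}$. The weights and function names are hypothetical; how to choose the weights is exactly the open question mentioned above.

```python
import numpy as np

def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)."""
    a, b, w = (np.asarray(v, dtype=float) for v in (a, b, w))
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def apply_weights(X, w):
    """Scale columns by sqrt(w) so plain Euclidean distance becomes weighted."""
    return np.asarray(X, dtype=float) * np.sqrt(np.asarray(w, dtype=float))

a, b = [0.0, 0.0], [3.0, 4.0]
w = [1.0, 0.25]  # hypothetical feature weights

direct = weighted_euclidean(a, b, w)
Xw = apply_weights([a, b], w)
via_scaling = float(np.linalg.norm(Xw[0] - Xw[1]))
print(direct, via_scaling)
```

Because scaling each column by the square root of its weight gives the same distances, the rescaled feature matrix can be passed to a clusterer that uses the default Euclidean metric.
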
### Methods
Of course, there are many clustering algorithms that could be applied to this task instead of (H)DBSCAN. Most probably, however, applying the described approach to data split into timeframes of different market states, as well as widening the feature extraction method, can deliver more significant results. The next proposed step is switching to the tsfresh library for automated feature extraction.

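
Splitting by timeframes could look roughly like the sketch below, which computes a few statistics separately for each period. The price series is synthetic and the period boundaries are hypothetical placeholders, not actual market states; tsfresh is not used here.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (made-up numbers)
dates = pd.date_range("2017-01-01", "2018-12-31", freq="D")
close = pd.Series(np.linspace(1.0, 100.0, len(dates)), index=dates)

# Hypothetical boundaries between market states
timeframes = {
    "state_1": ("2017-01-01", "2017-12-31"),
    "state_2": ("2018-01-01", "2018-12-31"),
}

# One row of features per timeframe; each row could be clustered separately
per_state = pd.DataFrame({
    name: close.loc[start:end].agg(["mean", "median", "std"])
    for name, (start, end) in timeframes.items()
}).T
print(per_state)
```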
[0]:https://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html#what-about-different-metrics
[1]:https://www.researchgate.net/post/What_is_the_best_distance_measure_for_high_dimensional_data
[2]:https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions