We will try to cluster the heroes of Heroes of the Storm using the player stats from hotslogs.com.
First we will try k-means clustering alone, then k-means with the help of an autoencoder.

Let's begin with k-means clustering.
If the player base played each hero according to its optimal role, we would expect the following classes:
Front line/Main tank: takes the lead in hero-hero fights
Damage dealer: deals high hero damage
Side laner: absorbs minion lanes (high siege damage)
Support: Form factor in team fights and/or healer

Some heroes can perform more than one role.
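
As a reminder of what k-means does with such data, here is a minimal sketch of the Lloyd iteration it is based on: assign every point to its nearest centroid, then move each centroid to the mean of its members. This is not the scikit-learn implementation used below; the function name `kmeans_sketch`, the fixed starting centroids and the toy points are assumptions made for illustration only.

```python
def kmeans_sketch(points, initial_centroids, iterations=10):
    """Plain Lloyd iterations on lists of floats (illustrative, not sklearn's KMeans)."""
    centroids = [list(c) for c in initial_centroids]  # copy so the input is not modified
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up 2-D points forming two obvious groups (not hero stats).
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
toy_labels, toy_centroids = kmeans_sketch(toy_points, [[0.0, 0.0], [10.0, 10.0]])
print(toy_labels)
print(toy_centroids)
```

In the notebook each point is one hero's stat vector, and the number of centroids corresponds to `nclasses`.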

In [2]:
import numpy as np
import sklearn
from sklearn import preprocessing
from sklearn.cluster import KMeans

In [3]:
import itertools
from tabulate import tabulate

In [4]:
def kmeans_classify(X, exclusions, nclasses):
    
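    """Cluster the rows of X with k-means and print each class's heroes and stats.

    Reads the module-level arrays heroes, header and stats.
    """
    #Perform clustering and get the class index of each hero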
    kmeans = KMeans(n_clusters=nclasses, tol=1e-6).fit(X)
    indx = kmeans.labels_
    
    #Order classes by size (smallest first)
    #This way the class numbers don't bounce around between runs
    cls_size = np.unique(indx, return_counts=True)[1]
    sorted_indx = np.argsort(cls_size)

    l=[]
    for i in sorted_indx:
        ltemp = list(heroes[np.argwhere(indx==i)].flatten())
        ltemp.sort()
        l.append(ltemp)
    ltrans = list(map(list,itertools.zip_longest(*l,fillvalue=None)))
    print(tabulate(ltrans))

    #Remove excluded columns from header and table
    header2 = np.delete(header,exclusions,0)
    stats2 = np.delete(stats,exclusions,1)
    
    #Print the mean and coefficient of variation (std/mean) of the stats in each class
    cls_avg = [np.append('Metric - Class#',header2)]
    j=0
    for i in sorted_indx:
        avg = np.round(np.mean(stats2[np.argwhere(indx==i).flatten()],axis=0),1)
        std = np.std(stats2[np.argwhere(indx==i).flatten()],axis=0)
        cv = np.round(std/(avg+1e-6),2)
        avg = np.append('Avg. %d'%j,avg)
        std = np.append('CV. %d'%j,cv)
        j+=1

        cls_avg.append(avg)
        cls_avg.append(std)
    cls_avg = list(map(list,zip(*cls_avg)))
    print(tabulate(cls_avg))
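
The size-ordering step in the function above can be looked at in isolation. The sketch below is a standard-library stand-in for the `np.unique` / `np.argsort` pair, assuming made-up labels; the helper name `order_clusters_by_size` is hypothetical, and the tie-break by cluster id is an addition that the notebook code does not make.

```python
from collections import Counter


def order_clusters_by_size(labels):
    """Return the cluster ids sorted from the smallest to the largest cluster."""
    counts = Counter(labels)
    # Break ties by cluster id so that the order is deterministic.
    return sorted(counts, key=lambda c: (counts[c], c))


# Hypothetical k-means labels for seven heroes in three clusters.
example_labels = [2, 0, 2, 1, 2, 0, 2]
size_order = order_clusters_by_size(example_labels)

# Map each original id to its position in the size ordering.
rank = {cluster: position for position, cluster in enumerate(size_order)}
relabeled = [rank[c] for c in example_labels]
print(size_order)
print(relabeled)
```

Ordering by size only keeps the column order stable as long as the cluster sizes themselves do not change between runs.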

In [5]:
#Load data
d = np.load('hots_stats.npz')
header = d['header'][1:]
heroes = d['heroes']
stats = d['stats']

In [6]:
#Normalize data: scale each hero's stat vector (row) to unit L2 norm
normalizer = preprocessing.Normalizer()
norm_stats = normalizer.transform(stats)
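
With its default settings, `preprocessing.Normalizer()` rescales each sample (row) to unit L2 norm; it does not standardize the columns. The sketch below reproduces that row-wise scaling with the standard library on two made-up rows, to show the effect. The helper `l2_normalize_rows` and the numbers are illustrative assumptions, not part of the notebook or of scikit-learn.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / norm for v in row] if norm > 0 else list(row))
    return normalized


# Two made-up stat rows that differ only by an overall factor of 10.
example_rows = [[3.0, 4.0], [30.0, 40.0]]
unit_rows = l2_normalize_rows(example_rows)
print(unit_rows)
```

Rows that differ only in overall scale end up identical, so what remains is the proportion between a hero's stats. Because the norm is taken over all columns of a row, columns with large values contribute most to it.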

In [7]:
#Remove Games, Win rate, Avg. game duration (superfluous) and T/D (redundant)
exclusions = [0,1,2,3]
X = np.delete(norm_stats,exclusions,1)
#The same columns are also excluded from the printed class summary

In [8]:
kmeans_classify(X=X,exclusions=exclusions,nclasses=4)
#With 4 classes the healers are already isolated,
#but some damage dealers (e.g. Raynor, Valla) are grouped with the tanks

---------  -----------  ----------------  -----------
Anub'arak  Alexstrasza  Alarak            Abathur
Artanis    Ana          Arthas            Azmodan
Diablo     Anduin       Blaze             Cassia
E.T.C.     Auriel       Chen              Chromie
Garrosh    Brightwing   Cho               Deathwing
Johanna    Deckard      D.Va              Falstad
Li-Ming    Kharazim     Dehaka            Fenix
Mei        Li Li        Genji             Gall
Muradin    Lt. Morales  Hogger            Gazlowe
Raynor     Lúcio        Illidan           Greymane
Sonya      Malfurion    Imperius          Gul'dan
Stitches   Rehgar       Kerrigan          Hanzo
Tychus     Stukov       Leoric            Jaina
Valla      Tyrande      Maiev             Junkrat
Varian     Uther        Mal'Ganis         Kael'thas
           Whitemane    Malthael          Kel'Thuzad
                        Medivh            Lunara
                        Qhira             Mephisto
                        Rexxar            Murky


In [9]:
#Inspect stats of misplaced damage dealers
hero1=stats[np.argwhere(heroes==['Raynor'])].flatten()
hero2=stats[np.argwhere(heroes==['Valla'])].flatten()
hero3=stats[np.argwhere(heroes==['Dehaka'])].flatten()
print(tabulate(np.transpose(np.stack((header,hero1,hero2,hero3)))))
#They do seem different. Perhaps they were leftovers that did not fit anywhere else

----------  ---------  ---------  --------
Games       146356     216085     94831
Win %           49.4       50.8      49.6
Avg Length      18.93      18.95     19.13
T/D              3.4        3.9       4.2
Takedowns       11.3       13.1      11.3
Kills            3.9        5.3       2.6
Deaths           3.4        3.4       2.7
Hero Dmg     51078      63637     34286
Siege Dmg    65184      77053     94981
Healing          0          0         0
Self Heal    11314       4916     25718
Dmg Taken    38297      32673     76365
XP           11229      10963     15179
----------  ---------  ---------  --------


In [10]:
kmeans_classify(X=X,exclusions=exclusions,nclasses=5)
#Much better with 5 classes
#1: high tanked damage
#2: healers
#3: high hero and siege damage, low tank
#4: similar, but less siege damage and more takedowns; they engage in hero fights
#5: Versatile; mid hero damage, good siege damage, good tank

---------  -----------  -----------  ---------  ----------------
Anub'arak  Alexstrasza  Abathur      Azmodan    Alarak
Artanis    Ana          Chromie      Cassia     Arthas
Diablo     Anduin       Deathwing    Falstad    Blaze
E.T.C.     Auriel       Fenix        Gazlowe    Chen
Garrosh    Brightwing   Gall         Greymane   Cho
Johanna    Deckard      Gul'dan      Hanzo      D.Va
Mei        Kharazim     Junkrat      Jaina      Dehaka
Muradin    Li Li        Kel'Thuzad   Kael'thas  Genji
Sonya      Lt. Morales  Mephisto     Li-Ming    Hogger
Stitches   Lúcio        Murky        Lunara     Illidan
Varian     Malfurion    Probius      Nazeebo    Imperius
           Rehgar       Ragnaros     Nova       Kerrigan
           Stukov       Samuro       Orphea     Leoric
           Tyrande      Sgt. Hammer  Raynor     Maiev
           Uther        Xul          Sylvanas   Mal'Ganis
           Whitemane    Zagara       Tassadar   Malthael
                                     Tychus     Medivh


In [12]:
#You'd expect Varian and Sonya to be in a different group
#Varian can deal heavy hero damage, but his actual damage is modest
#Sonya deals high siege damage, but she did not fit into other classes
hero1=stats[np.argwhere(heroes==['Varian'])].flatten()
hero2=stats[np.argwhere(heroes==['Sonya'])].flatten()
print(tabulate(np.transpose(np.stack((header,hero1,hero2)))))

----------  ---------  ---------
Games       180456     149662
Win %           50         51.1
Avg Length      18.98      18.87
T/D              3.5        3
Takedowns       12.9       10.9
Kills            3.5        3.5
Deaths           3.6        3.6
Hero Dmg     38271      40934
Siege Dmg    44672      89210
Healing          0          0
Self Heal    33339      26521
Dmg Taken    81273      67623
XP            9777      13696
----------  ---------  ---------


In [345]:
kmeans_classify(X=X,exclusions=exclusions,nclasses=6)
#With 6 classes the large class (5 previously) splits into classes 0 and 5
#0 has higher hero damage, kills and takedowns (kill assists)
#5 is closer to the older class
#Sonya has moved to Azmodan's class which makes sense

-------  ---------  -----------  -----------  ---------  ----------------
Alarak   Anub'arak  Alexstrasza  Abathur      Azmodan    Arthas
Genji    Artanis    Ana          Chromie      Cassia     Blaze
Maiev    Diablo     Anduin       Deathwing    Falstad    Chen
Medivh   E.T.C.     Auriel       Fenix        Gazlowe    Cho
Nova     Garrosh    Brightwing   Gall         Greymane   D.Va
Qhira    Johanna    Deckard      Gul'dan      Hanzo      Dehaka
Tracer   Mei        Kharazim     Junkrat      Jaina      Hogger
Valeera  Muradin    Li Li        Kel'Thuzad   Kael'thas  Illidan
Zeratul  Stitches   Lt. Morales  Mephisto     Li-Ming    Imperius
         Varian     Lúcio        Murky        Lunara     Kerrigan
                    Malfurion    Probius      Nazeebo    Leoric
                    Rehgar       Ragnaros     Orphea     Mal'Ganis
                    Stukov       Samuro       Raynor     Malthael
                    Tyrande      Sgt. Hammer  Sonya      Rexxar
                    Uther   

In [287]:
hero1=stats[np.argwhere(heroes==['Medivh'])].flatten()
hero2=stats[np.argwhere(heroes==['Tracer'])].flatten()
print(tabulate(np.transpose(np.stack((header,hero1,hero2)))))

----------  --------  --------
Games       24043     57214
Win %          42.1      53.8
Avg Length     19.29     18.87
T/D             4.5       4.2
Takedowns      12.4      14.1
Kills           3         6.1
Deaths          2.7       3.4
Hero Dmg    46291     52842
Siege Dmg   50256     58983
Healing     20836      4502
Self Heal     598     12708
Dmg Taken   38792     44230
XP           9215     11141
----------  --------  --------


In [13]:
kmeans_classify(X=X,exclusions=exclusions,nclasses=7)


----------  -----------  ---------  ---------  -----------  ----------------  ---------
Anduin      Alexstrasza  Anub'arak  Alarak     Abathur      Arthas            Azmodan
Brightwing  Ana          Artanis    Deathwing  Chromie      Blaze             Cassia
Li Li       Auriel       Diablo     Genji      Fenix        Chen              Falstad
Lúcio       Deckard      E.T.C.     Hogger     Gall         Cho               Gazlowe
Malfurion   Kharazim     Garrosh    Maiev      Junkrat      D.Va              Greymane
Stukov      Lt. Morales  Johanna    Malthael   Kel'Thuzad   Dehaka            Gul'dan
Uther       Rehgar       Mei        Medivh     Murky        Illidan           Hanzo
            Tyrande      Muradin    Mephisto   Probius      Imperius          Jaina
            Whitemane    Stitches   Nova       Ragnaros     Kerrigan          Kael'thas
                         Varian     Qhira      Samuro       Leoric            Li-Ming
                                    Tracer     Sgt. Ha

The results already look fairly good, but let us see how dimensionality reduction can alter the classes. We'll focus on 5 and 6 classes from now on.