The idea here is to cluster Lux builds and see whether the clusters reveal any underlying structure.

In [None]:
%matplotlib inline
import numpy as np
from sklearn import svm, metrics, cluster
from sklearn import linear_model as lmod
import matplotlib.pyplot as plt

In [None]:
builds = np.load('../datasets/np/champ_99_items_feature_10000.npy')
patch = np.load('../datasets/np/champ_99_version_feature_10000.npy')

In [None]:
db = cluster.DBSCAN(eps=0.3, min_samples=10)
db.fit(builds)

In [None]:
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters_)

DBSCAN produces one giant cluster. DBSCAN works by identifying regions of high density, so this suggests that most Lux builds sit close to a single core build. It also suggests that, at least with this `eps` and `min_samples`, the pre-patch and post-patch builds are not separated by a low-density gap. A smaller `eps` might still split them, so this is not proof that the two segments are identical.
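Since the result above depends heavily on `eps`, one common heuristic is to look at each point's distance to its k-th nearest neighbour and pick `eps` near the point where the sorted curve bends sharply. The sketch below is only an illustration: it uses synthetic data as a stand-in for `builds`, the helper `k_distance` is hypothetical, and `k=10` is chosen only to mirror `min_samples` above.

```python
import numpy as np

def k_distance(X, k):
    """Return each point's distance to its k-th nearest neighbour, sorted ascending."""
    # Pairwise Euclidean distances (O(n^2) memory, so subsample large arrays first).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbour.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)

# Synthetic stand-in for `builds` (an assumption, not the real data):
# one tight blob plus a few scattered points.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.05, size=(200, 2))
sparse = rng.uniform(-3.0, 3.0, size=(20, 2))
X = np.vstack([dense, sparse])

kd = k_distance(X, k=10)
# A large jump between the lower and upper percentiles marks the range
# in which `eps` starts merging everything into one cluster.
print(np.percentile(kd, [50, 90, 99]))
```

On the real data, plotting `kd` with `plt.plot(kd)` would show whether `eps=0.3` sits above or below the bend.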

In [None]:
km = cluster.KMeans(n_clusters=10)
km.fit(builds)

In [None]:
print(km.labels_)
km.cluster_centers_.round()

In [None]:
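import json
# Map feature column index -> item name.
# (Assumes the feature columns follow the key order of item.json.)
with open('../datasets/static/item.json') as f:
    items = json.load(f)
item_name_map = {i: items['data'][k]['name'] for i, k in enumerate(items['data'])}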
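The rounded cluster centres are hard to read as raw arrays, so one way to use `item_name_map` is to list the highest-weighted items of each centre. This is a minimal sketch: the helper `top_items` is hypothetical, the toy centres and names are made up, and it assumes that column `i` of `builds` corresponds to entry `i` of the map.

```python
import numpy as np

def top_items(centers, name_map, top_n=6):
    """For each cluster centre, list the top_n highest-weighted item names."""
    summaries = []
    for center in centers:
        # Column indices sorted by descending weight, truncated to top_n.
        order = np.argsort(center)[::-1][:top_n]
        # Keep only items that actually appear in the centre.
        summaries.append(
            [(name_map[int(i)], float(center[i])) for i in order if center[i] > 0]
        )
    return summaries

# Toy stand-ins (hypothetical names, not real item data).
toy_centers = np.array([[0.9, 0.0, 0.8],
                        [0.0, 1.0, 0.1]])
toy_names = {0: "Item A", 1: "Item B", 2: "Item C"}

summary = top_items(toy_centers, toy_names, top_n=2)
for k, cluster_items in enumerate(summary):
    print(k, cluster_items)
```

With the notebook's own objects this would be called as `top_items(km.cluster_centers_, item_name_map)`.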

In [None]:
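# Binary target: 0 for patch 5.11, 1 for every other patch.
L = lmod.LogisticRegression()
output = np.ones(patch.shape)
output[patch == '5.11'] = 0
# Cluster ids are used as a single numeric feature, although they have no natural order.
X_km = km.labels_.reshape((-1, 1))
L.fit(X_km, output)
# Accuracy suits a classifier; R^2 is a regression metric.
print(metrics.accuracy_score(output, L.predict(X_km)))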

KMeans -> logit performs poorly (to put it lightly). Let's try a different clustering method.
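To check whether the cluster labels carry any patch signal at all, independent of the model fitted on top, one option is to cross-tabulate clusters against the patch and compute the accuracy of predicting each cluster's majority class. No classifier that sees only the cluster label can beat this number on the same data. The sketch below uses toy arrays in place of `km.labels_` and `output`, and both helper functions are hypothetical.

```python
import numpy as np

def cluster_patch_table(labels, target):
    """Count rows per (cluster, class) pair; target must contain only 0 and 1."""
    clusters = np.unique(labels)
    table = np.zeros((len(clusters), 2), dtype=int)
    for row, c in enumerate(clusters):
        in_cluster = labels == c
        table[row, 0] = np.sum(in_cluster & (target == 0))
        table[row, 1] = np.sum(in_cluster & (target == 1))
    return clusters, table

def majority_vote_accuracy(table):
    """Accuracy obtained by predicting each cluster's most common class."""
    return table.max(axis=1).sum() / table.sum()

# Toy stand-ins for km.labels_ and output (made-up values).
toy_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])
toy_target = np.array([0, 0, 1, 1, 1, 0, 0, 1])

clusters, table = cluster_patch_table(toy_labels, toy_target)
acc = majority_vote_accuracy(table)
# Baseline: always predict the overall majority class.
baseline = max(np.mean(toy_target == 0), np.mean(toy_target == 1))
print(table)
print(acc, baseline)
```

If `acc` is barely above `baseline` on the real labels, the clustering itself does not separate the patches, and swapping the downstream model will not help.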

In [None]:
AP = cluster.AffinityPropagation()
AP.fit(builds)

In [None]:
indices = AP.cluster_centers_indices_

In [None]:
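# Same target as above, this time with an SVM on the Affinity Propagation labels.
AP_M = svm.SVC(kernel='poly')
X_ap = AP.labels_.reshape((-1, 1))
AP_M.fit(X_ap, output)
# Use continuous decision scores: hard 0/1 predictions give a single-point ROC curve.
scores = AP_M.decision_function(X_ap)
fpr, tpr, _ = metrics.roc_curve(output, scores)
print(metrics.auc(fpr, tpr))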
