## Using the Mapper method to evaluate features for classification

The data generator and the classifier are borrowed from scikit-learn.  The cell that generates the data creates a synthetic data set with two classes, labeled 0 and 1, and fits a random forest to it.  Change the random_state=0 option of make_classification to a value greater than 0 to produce your own unique data set.  Then use the options in that cell, or implement a different method, to save the unique X and y that result.  My example used 1000 points and 10 features, 3 of them informative.  You will use 4000 points and 20 features, 6 of them informative.

I use the numpy function savetxt to save X and y, and loadtxt to read them back.
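As a rough illustration of the save-and-reload step, here is a minimal sketch. The seed 7 and the temporary directory are arbitrary choices made for this example, not part of the assignment; the file names mirror the ones used in the notebook.

```python
import os
import tempfile

import numpy
from sklearn.datasets import make_classification

# Any seed other than 0 gives a different data set; 7 is an arbitrary choice.
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=6, n_redundant=0,
                           random_state=7, shuffle=False)

# Write into a temporary directory so this sketch cannot overwrite real files.
out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, "my-data-new.csv")
labels_path = os.path.join(out_dir, "my-labels-new.csv")
numpy.savetxt(data_path, X, delimiter=",")
numpy.savetxt(labels_path, y, fmt="%d", delimiter=",")  # store labels as integers

# Reload and check that the round trip preserved the data.
X_loaded = numpy.loadtxt(data_path, delimiter=",")
y_loaded = numpy.loadtxt(labels_path, delimiter=",", dtype=int)
print(X_loaded.shape)
print(numpy.allclose(X, X_loaded) and numpy.array_equal(y, y_loaded))
```

Saving the labels with fmt="%d" keeps them as integers on disk; without it savetxt writes them in floating-point notation, which still loads correctly but is harder to read.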

At the very end, I have alternative lines that color the Mapper graph by the labels and by the projection onto the 6th coordinate.  You will need to do this, or something similar, many times and inspect the results.

Where is your graph?  It is in the same directory as your notebook, saved as an HTML file under the name given by path_html in the visualization cell.  If you want to keep all or some of the graphs for the final report, change that file name before each run; otherwise you will overwrite the HTML file.
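One way to avoid overwriting earlier graphs is to build each file name from a tag that describes the coloring. The helper below is a hypothetical sketch (the name graph_filename and the naming scheme are assumptions for illustration); its result would be passed as path_html.

```python
def graph_filename(prefix, tag, ext="html"):
    """Build a distinct output name for one coloring of the graph."""
    return "{}_{}.{}".format(prefix, tag, ext)

# One name for the label coloring, then one per coordinate (zero-padded
# so that the names sort in coordinate order).
names = [graph_filename("classifier", "label")]
names += [graph_filename("classifier", "x{:02d}".format(j + 1))
          for j in range(20)]

print(names[:3])
# ['classifier_label.html', 'classifier_x01.html', 'classifier_x02.html']
```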


In [None]:
pip install kmapper

Collecting kmapper
  Downloading https://files.pythonhosted.org/packages/ef/2f/ccfde8ee5b1411608e1bc0a9e1655089cb6202637e8977fb7f5e9a19a8dc/kmapper-1.2.0.tar.gz (97kB)
Building wheels for collected packages: kmapper
  Building wheel for kmapper (setup.py) ... done
  Created wheel for kmapper: filename=kmapper

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

import numpy

In [None]:
make_new = True
if make_new:
    X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=6, n_redundant=0,
                           random_state=0, shuffle=False)
    numpy.savetxt("my-data-new.csv", X, delimiter=",")
    numpy.savetxt("my-labels-new.csv", y, delimiter=",")
else:
    X = numpy.loadtxt("my-data.csv", delimiter=",") # to reuse saved data, rename my-data-new.csv and my-labels-new.csv to these names
    y = numpy.loadtxt("my-labels.csv", delimiter=",")
clf = RandomForestClassifier(n_estimators=100, max_depth=2,
                             random_state=0)
clf.fit(X, y) 
print(clf.feature_importances_)
print(clf.predict(numpy.zeros((1, 20))))  # predicted class of the all-zeros point

[0.21389544 0.34383208 0.03015637 0.14851396 0.13285238 0.02581967
 0.00144233 0.0013772  0.00056222 0.00283706 0.00349898 0.00250407
 0.00786237 0.02210289 0.01667139 0.00608484 0.00363643 0.00858977
 0.00742539 0.02033513]
[0]
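The importances printed above are easier to read once ranked. The sketch below copies the printed values into an array and sorts them. Because the data was generated with shuffle=False, scikit-learn places the informative features in the first columns, so the six largest importances would be expected to fall on columns 0 to 5.

```python
import numpy

# Values copied from the feature_importances_ output above.
importances = numpy.array([
    0.21389544, 0.34383208, 0.03015637, 0.14851396, 0.13285238, 0.02581967,
    0.00144233, 0.0013772, 0.00056222, 0.00283706, 0.00349898, 0.00250407,
    0.00786237, 0.02210289, 0.01667139, 0.00608484, 0.00363643, 0.00858977,
    0.00742539, 0.02033513])

# Column indices sorted from most to least important.
order = numpy.argsort(importances)[::-1]
top6 = order[:6]

for idx in top6:
    print("column {:2d}: importance {:.4f}".format(idx, importances[idx]))
```

For this run the top six are columns 1, 0, 3, 4, 2 and 5, which are exactly the six informative columns.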


In [None]:

#print(len(X[:,0]))
#print(X[:,0])


from scipy.stats import norm

In [None]:
print(X.shape)
#print(X)
from scipy import stats
print(stats.describe(X))

(4000, 20)
DescribeResult(nobs=4000, minmax=(array([-4.98158292, -5.83352889, -4.35897215, -5.68139708, -5.39659656,
       -4.34807089, -3.84367787, -3.51628451, -3.30227934, -3.81973252,
       -3.60741055, -3.37706575, -3.30938283, -3.31314784, -3.14096384,
       -2.87696485, -3.93749333, -3.77362746, -4.33369486, -3.07102877]), array([6.1756907 , 6.17617618, 6.31327805, 6.69662267, 5.71490953,
       6.96542896, 3.61933715, 3.97538855, 4.02938042, 3.74870469,
       4.09438398, 3.58692147, 3.78752433, 3.43969127, 3.38805998,
       3.4563141 , 4.04626168, 3.7123124 , 3.8325801 , 3.29429136])), mean=array([ 4.96416142e-01,  1.32022555e-02,  9.97777658e-01,  4.99276483e-01,
       -5.14969209e-01,  1.02994584e+00, -6.21588576e-03, -1.53859247e-02,
        1.93307164e-02,  1.80533568e-02, -1.60577425e-03, -3.40246509e-02,
       -1.62477322e-03, -1.28265362e-02,  2.37286522e-02, -6.61120750e-03,
       -4.00996515e-04,  2.37448373e-02, -1.77781686e-02,  2.28501388e-02]), variance=arr

In [None]:
print(y)

[0 0 0 ... 1 1 1]


In [None]:
# matplotlib inline
import matplotlib

# Import the class
import kmapper as km
import sklearn

# Some sample data
from sklearn import datasets
#data, labels = datasets.make_circles(n_samples=5000, noise=0.03, factor=0.3)

# Initialize
mapper = km.KeplerMapper(verbose=1)

# Fit to and transform the data
projected_data = mapper.fit_transform(X, projection=[0]) # lens: the first coordinate of X, min-max scaled to [0, 1]

print("***")
print(stats.describe(projected_data))
print("***")



# Create dictionary called 'graph' with nodes, edges and meta-information
graph = mapper.map(projected_data, X, 
                   clusterer=sklearn.cluster.DBSCAN(algorithm='auto', eps=4.5, leaf_size=200, 
                                                    metric='euclidean', metric_params=None, min_samples=3, 
                                                    n_jobs=None, p=None), 
                   cover=km.Cover(n_cubes=10, perc_overlap=0.4, limits=None, verbose=0), 
                   nerve=km.GraphNerve(min_intersection=1), precomputed=False, remove_duplicate_nodes=False, 
                   overlap_perc=None, nr_cubes=None)

print(graph)

# Visualize it
mapper.visualize(graph, path_html="classifier_y2.html",
                 title="classifier on y",
                 color_function=y # y is the classification label light=0 dark=1
                 #color_function = norm.mean(X[:,15])
                 #color_function=X[:,2] # coloring by the 3rd coordinate projection
                )

KeplerMapper()
..Composing projection pipeline of length 1:
	Projections: [0]
	Distance matrices: False
	Scalers: MinMaxScaler(copy=True, feature_range=(0, 1))
..Projecting on data shaped (4000, 20)

..Projecting data using: [0]

..Scaling with: MinMaxScaler(copy=True, feature_range=(0, 1))

***
DescribeResult(nobs=4000, minmax=(array([0.]), array([1.])), mean=array([0.49098008]), variance=array([0.02268879]), skewness=array([-0.13318316]), kurtosis=array([-0.13387529]))
***
Mapping on data shaped (4000, 20) using lens shaped (4000, 1)

Creating 10 hypercubes.

Created 12 edges and 13 nodes in 0:00:00.253385.
{'meta_nodes': defaultdict(<type 'list'>, {}), 'nodes': defaultdict(<type 'list'>, {'cube2_cluster0': [2, 42, 108, 156, 196, 205, 211, 218, 219, 270, 274, 284, 287, 412, 471, 496, 530, 586, 622, 639, 693, 705, 709, 733, 764, 772, 784, 790, 805, 844, 865, 877, 983, 1000, 1002, 1004, 1005, 1011, 1013, 1014, 1016, 1022, 1026, 1027, 1028, 1032, 1034, 1038, 1040, 1043, 1044, 1048, 1053
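To make the lens and cover settings of this cell more concrete, here is a small sketch in plain Python. The first part reproduces the min-max scaling of column 0 from the values printed by stats.describe(X). The second part, simple_cover, is a simplified model of ten intervals with 40% overlap on [0, 1]; it is an assumption made for illustration and not kmapper's own interval computation, which may place the intervals differently.

```python
# Minimum, maximum and mean of column 0, copied from the describe output.
col_min, col_max, col_mean = -4.98158292, 6.1756907, 0.496416142

def min_max(value, lo, hi):
    """Scale value linearly so that lo maps to 0 and hi maps to 1."""
    return (value - lo) / (hi - lo)

# Scaling is linear, so the scaled mean is the mean of the scaled lens.
scaled_mean = min_max(col_mean, col_min, col_max)
print(scaled_mean)  # close to the 0.49098008 printed for projected_data

def simple_cover(n_cubes, perc_overlap, lo=0.0, hi=1.0):
    """Equal intervals covering [lo, hi]; neighbours share perc_overlap
    of their length. A simplified model, not kmapper's implementation."""
    length = (hi - lo) / (n_cubes - (n_cubes - 1) * perc_overlap)
    step = length * (1.0 - perc_overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_cubes)]

intervals = simple_cover(10, 0.4)
for start, end in intervals[:3]:
    print("[{:.5f}, {:.5f}]".format(start, end))
```

A point whose lens value falls in the overlap of two intervals is clustered in both, which is what produces edges between nodes of neighbouring cubes.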


In [None]:
# Visualize it
mapper.visualize(graph, path_html="classifier_y2X1.html",
                 title="classifier on mean of the 3rd coordinate projection",
                 #color_function=y # y is the classification label light=0 dark=1
                 color_function=X[:,2] # coloring by the 3rd coordinate; norm.mean(X[:,2]) only returned these same values
                 #color_function=X[:,5] # coloring by the 6th coordinate projection
                )

Wrote visualization to: classifier_y2X1.html
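To inspect a coloring numerically rather than by eye, the per-node average of any per-point quantity can be computed directly from the graph. The sketch below assumes that graph['nodes'] maps node names to lists of row indices, as the printed graph suggests; node_means and the toy data are hypothetical names and values introduced for illustration.

```python
import numpy

def node_means(nodes, values):
    """Average a per-point quantity over the members of each node."""
    values = numpy.asarray(values, dtype=float)
    return {name: float(values[members].mean())
            for name, members in nodes.items()}

# Toy stand-in for graph['nodes']: two nodes sharing point 2.
toy_nodes = {"cube0_cluster0": [0, 1, 2], "cube1_cluster0": [2, 3]}
toy_labels = [0, 0, 1, 1]

means = node_means(toy_nodes, toy_labels)
for name in sorted(means):
    print("{}: {:.3f}".format(name, means[name]))
```

With the real data, node_means(graph['nodes'], y) would give the fraction of class-1 points in each node, and node_means(graph['nodes'], X[:, j]) the average of coordinate j, which is a quick way to compare many colorings without opening every HTML file.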



In [None]:
# Visualize it
mapper.visualize(graph, path_html="classifier_y2X2.html",
                 title="classifier on mean of the 5th coordinate projection",
                 #color_function=y # y is the classification label light=0 dark=1
                 color_function=X[:,4] # coloring by the 5th coordinate; norm.mean(X[:,4]) only returned these same values
                 #color_function=X[:,5] # coloring by the 6th coordinate projection
                )

Wrote visualization to: classifier_y2X2.html



In [None]:
# Visualize it: write one HTML file per coordinate.
# A single pass over the 20 columns is enough; wrapping it in an outer
# loop would only rewrite the same 20 files again and again.
for j in range(0,20):
    mapper.visualize(graph, path_html="classifier_y2X3"+str(j+1)+".html",
                 title="classifier on "+str(j+1)+" coordinate projection",
                 #color_function=y # y is the classification label light=0 dark=1
                 color_function=X[:,j] # coloring by the (j+1)-th coordinate projection
                )

Wrote visualization to: classifier_y2X31.html
Wrote visualization to: classifier_y2X32.html
Wrote visualization to: classifier_y2X33.html
Wrote visualization to: classifier_y2X34.html
Wrote visualization to: classifier_y2X35.html
Wrote visualization to: classifier_y2X36.html
Wrote visualization to: classifier_y2X37.html
Wrote visualization to: classifier_y2X38.html
Wrote visualization to: classifier_y2X39.html
Wrote visualization to: classifier_y2X310.html
Wrote visualization to: classifier_y2X311.html
Wrote visualization to: classifier_y2X312.html
Wrote visualization to: classifier_y2X313.html
Wrote visualization to: classifier_y2X314.html
Wrote visualization to: classifier_y2X315.html
Wrote visualization to: classifier_y2X316.html
Wrote visualization to: classifier_y2X317.html
Wrote visualization to: classifier_y2X318.html
Wrote visualization to: classifier_y2X319.html
Wrote visualization to: classifier_y2X320.html

In [None]:
# Visualize it
mapper.visualize(graph, path_html="classifier_y2X5.html",
                 title="classifier on mean of the 2nd coordinate projection",
                 #color_function=y # y is the classification label light=0 dark=1
                 color_function=X[:,1] # coloring by the 2nd coordinate; norm.mean(X[:,1]) only returned these same values
                 #color_function=X[:,5] # coloring by the 6th coordinate projection
                )

Wrote visualization to: classifier_y2X5.html