# Diabetes

## Import pandas

In [30]:
import pandas as pd

### Load the dataset:

The following data were collected by Miller and Reaven in 1976 from 145 patients and are published in:

* Andrews, D. F. and Herzberg, A. M. *Data: A Collection of Problems from Many Fields for the Student and Research Worker*. Springer-Verlag, New York, 1985.

The results were presented in:

* Miller, R. J. Discussion - projection pursuit. Ann. Statist. 13, 2 (1985), 510-513. With discussion.

In [31]:
data = pd.read_csv('Diabetes.csv')

#### Let's take a look:

In [32]:
data.head(20)

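| | Patient number | Relative weight | Fasting plasma glucose | Glucose area | Insulin area | SSPG | Clinical Classification |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.81 | 80 | 356 | 124 | 55 | 3 |
| 1 | 2 | 0.95 | 97 | 289 | 117 | 76 | 3 |
| 2 | 3 | 0.94 | 105 | 319 | 143 | 105 | 3 |
| 3 | 4 | 1.04 | 90 | 356 | 199 | 108 | 3 |
| 4 | 5 | 1.0 | 90 | 323 | 240 | 143 | 3 |
| 5 | 6 | 0.76 | 86 | 381 | 157 | 165 | 3 |
| 6 | 7 | 0.91 | 100 | 350 | 221 | 119 | 3 |
| 7 | 8 | 1.1 | 85 | 301 | 186 | 105 | 3 |
| 8 | 9 | 0.99 | 97 | 379 | 142 | 98 | 3 |
| 9 | 10 | 0.78 | 97 | 296 | 131 | 94 | 3 |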


In [33]:
data.shape

(145, 7)

In [34]:
data.describe()

| | Patient number | Relative weight | Fasting plasma glucose | Glucose area | Insulin area | SSPG | Clinical Classification |
|---|---|---|---|---|---|---|---|
| count | 145.0 | 145.0 | 145.0 | 145.0 | 145.0 | 145.0 | 145.0 |
| mean | 73.0 | 0.97731 | 121.717241 | 542.8 | 185.455172 | 183.117241 | 2.296552 |
| std | 42.001984 | 0.129235 | 63.723982 | 315.993354 | 121.343405 | 105.348133 | 0.817552 |
| min | 1.0 | 0.71 | 70.0 | 269.0 | 10.0 | 29.0 | 1.0 |
| 25% | 37.0 | 0.88 | 90.0 | 352.0 | 118.0 | 100.0 | 2.0 |
| 50% | 73.0 | 0.98 | 97.0 | 413.0 | 155.0 | 159.0 | 3.0 |
| 75% | 109.0 | 1.08 | 112.0 | 558.0 | 221.0 | 257.0 | 3.0 |
| max | 145.0 | 1.2 | 353.0 | 1568.0 | 748.0 | 480.0 | 3.0 |


## Import the libraries:

In [35]:
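import numpy as np

# np.asscalar was removed in NumPy 1.23; older code (here, presumably kmapper)
# may still call it, so restore it as a thin wrapper around ndarray.item().
def patch_asscalar(a):
    return a.item()

# Only patch when the attribute is actually missing.
if not hasattr(np, "asscalar"):
    setattr(np, "asscalar", patch_asscalar)

import kmapper as km
import sklearn.cluster  # explicit submodule import for sklearn.cluster.DBSCAN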

## Clean up the columns

We build a list with the names of the relevant features.

In [36]:
feature_names=[c for c in data.columns if c not in ["Patient number","Clinical Classification"]]

In [37]:
feature_names

['Relative weight',
 'Fasting plasma glucose',
 'Glucose area',
 'Insulin area',
 'SSPG']

We store the features in a new array:

In [38]:
X = np.array(data[feature_names].fillna(0))

In [39]:
X.shape

(145, 5)
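
Because `fillna(0)` silently replaces missing values with zeros, it can be worth checking whether anything is actually missing before relying on it. The following is a minimal sketch on a small hypothetical DataFrame: the `toy` frame and its values are made up for illustration, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative, not taken from Diabetes.csv.
toy = pd.DataFrame({
    "Patient number": [1, 2, 3],
    "Relative weight": [0.81, np.nan, 0.94],
    "SSPG": [55, 76, 105],
    "Clinical Classification": [3, 3, 3],
})

# Same selection rule as above: drop the identifier and the label columns.
toy_features = [c for c in toy.columns
                if c not in ["Patient number", "Clinical Classification"]]

# Count missing values per feature before filling them.
missing_per_column = toy[toy_features].isna().sum()
print(missing_per_column)

# fillna(0) replaces each NaN with 0 before the conversion to a NumPy array.
toy_X = np.array(toy[toy_features].fillna(0))
print(toy_X.shape)
```

In the `describe()` output above every column has a count of 145, which matches the number of rows, so for this dataset `fillna(0)` should leave the values unchanged.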

We also store each patient's classification:

In [40]:
y=np.array(data["Clinical Classification"])

In [41]:
y

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 2, 2, 3, 2, 2,
       3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 2, 3, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int64)
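
To see how many patients fall in each class, the labels can be tallied. The following is a minimal sketch using a made-up label vector (`toy_y` is hypothetical and much shorter than `y`); the same call should work on `y` directly.

```python
import numpy as np

# Hypothetical label vector using the same coding (1, 2, 3) as `y`.
toy_y = np.array([3, 3, 2, 3, 1, 1, 2, 3])

# np.unique with return_counts=True returns each distinct label and its frequency.
labels, counts = np.unique(toy_y, return_counts=True)

# Pair labels with counts in a plain dictionary for easy reading.
class_counts = dict(zip(labels.tolist(), counts.tolist()))
print(class_counts)
```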

## Run the algorithm

We will use a kernel density estimator as the filter (lens).

In [43]:
from sklearn.neighbors import KernelDensity

mapper=km.KeplerMapper(verbose=1)
lens = mapper.fit_transform(X, projection=KernelDensity(kernel='gaussian',bandwidth=0.9))

graph = mapper.map(
    lens,
    X,
    clusterer=sklearn.cluster.DBSCAN(eps=350, min_samples=3),
    cover=km.Cover(n_cubes=3, perc_overlap=0.3),
)

mapper.visualize(graph, path_html="Diabetes KernelDensity.html",custom_tooltips=y,
                 title="Diabetes KernelDensity")

KeplerMapper(verbose=1)
..Composing projection pipeline of length 1:
	Projections: KernelDensity(bandwidth=0.9)
	Distance matrices: False
	Scalers: MinMaxScaler()
..Projecting on data shaped (145, 5)

..Projecting data using: 
	KernelDensity(bandwidth=0.9)


..Scaling with: MinMaxScaler()

Mapping on data shaped (145, 5) using lens shaped (145, 5)

Creating 243 hypercubes.

Created 100 edges and 27 nodes in 0:00:00.106514.
Wrote visualization to: Diabetes KernelDensity.html
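
The log above reports a lens of shape (145, 5) and 243 = 3^5 hypercubes, so the lens used in this run has five columns rather than a single density value per patient. `KernelDensity` does not define `fit_transform`, and it appears that KeplerMapper then leaves the data unprojected and only rescales it; this reading is an assumption based on the log, not on the library's documentation. If a one-dimensional density filter is what is wanted, one option is to compute it explicitly with `score_samples` and pass the result to `mapper.map` as the lens. A minimal sketch on synthetic data follows (`X_demo` and `density_lens` are illustrative names, not part of the notebook):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for X: 50 points with 5 features (not the diabetes data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))

# Fit a Gaussian kernel density estimator with the same settings as the cell above.
kde = KernelDensity(kernel="gaussian", bandwidth=0.9).fit(X_demo)

# score_samples returns the log-density at each point, with shape (n_samples,).
log_density = kde.score_samples(X_demo)

# Reshape to a single column so it can serve as a one-dimensional lens.
density_lens = log_density.reshape(-1, 1)
print(density_lens.shape)
```

With the real data, a `density_lens` computed from `X` could be passed as the first argument of `mapper.map`. Since the features are on very different scales (compare `Relative weight` with `Glucose area` in the summary table), the bandwidth would probably need retuning, or the features scaling, before the densities become informative.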



In [52]:
lens = mapper.fit_transform(X)

graph = mapper.map(
    lens,
    X,
    clusterer=sklearn.cluster.DBSCAN(eps=40, min_samples=3),
    cover=km.Cover(n_cubes=15, perc_overlap=0.4),
)

mapper.visualize(graph, path_html="Diabetes Sum.html",
                 custom_tooltips=y,
                title="Diabetes Sum")

..Composing projection pipeline of length 1:
	Projections: sum
	Distance matrices: False
	Scalers: MinMaxScaler()
..Projecting on data shaped (145, 5)

..Projecting data using: sum

..Scaling with: MinMaxScaler()

Mapping on data shaped (145, 5) using lens shaped (145, 1)

Creating 15 hypercubes.

Created 4 edges and 10 nodes in 0:00:00.057509.
Wrote visualization to: Diabetes Sum.html
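
When no projection is given, the log shows that the lens is `sum` followed by `MinMaxScaler()`. The following is a minimal sketch of that computation in plain NumPy, assuming the sum is taken across the features of each row and that the scaling maps the result to [0, 1] (the array `X_demo` is made up for illustration):

```python
import numpy as np

# Hypothetical data: 3 samples with 3 features each.
X_demo = np.array([
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 1.0],
    [4.0, 4.0, 2.0],
])

# Sum the features of each row, keeping a column shape of (n_samples, 1).
row_sums = X_demo.sum(axis=1, keepdims=True)

# Min-max scaling: map the smallest sum to 0 and the largest to 1.
lo, hi = row_sums.min(), row_sums.max()
sum_lens = (row_sums - lo) / (hi - lo)
print(sum_lens.ravel())
```

Because the raw features are summed before scaling, columns with large values such as `Glucose area` dominate this lens, while `Relative weight` contributes very little.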

