# **Bayesian Classifier with a Breast Cancer Prediction Dataset**

Breast Cancer Wisconsin (Diagnostic) dataset. Link to the dataset on Kaggle:

https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

# **Environment setup**

In [17]:
import numpy as np
import pandas as pd

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

In [23]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **Reading and viewing the data**

Before running the next cell, download the dataset from the link above and upload it to Google Colab (here, to the mounted Google Drive folder used in the path below).

In [24]:
data_frame = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Clasificador Bayesiano/data.csv", header = 0)

In [25]:
data_frame.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
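
The preview above ends with a column named `Unnamed: 32` that holds no values. A likely cause, assumed here rather than verified against the file, is a trailing comma on each CSV line, which pandas reads as one extra empty column. The sketch below reproduces that behavior on a tiny in-memory CSV with made-up rows and shows one way to drop such a column together with the `id` identifier; the names `csv_text`, `sample`, and `cleaned` are hypothetical.

```python
import io

import pandas as pd

# Made-up three-row excerpt; the trailing commas mimic the assumed file layout.
csv_text = (
    "id,diagnosis,radius_mean,\n"
    "101,M,17.99,\n"
    "102,M,20.57,\n"
    "103,B,13.54,\n"
)
sample = pd.read_csv(io.StringIO(csv_text), header=0)
print(sample.columns.tolist())  # the last column gets an auto-generated name

# Drop columns in which every value is missing, then the identifier column.
cleaned = sample.dropna(axis=1, how="all").drop(columns=["id"])
print(cleaned.columns.tolist())
```

The notebook selects its feature columns explicitly in the next section, so the empty column never reaches the model; a cleanup like this mainly matters if all columns are used as inputs.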


# **Feature selection**

**Selection of the features used as inputs (X) and the expected output (y).**

In [None]:
X = data_frame[['radius_mean', 'area_mean', 'perimeter_mean', 'concavity_mean', 'concave points_mean']]
y = data_frame['diagnosis']

**Splitting the dataset: 70% training (Train), 30% test (Test)**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
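
The call above draws a different random split on every run, so the accuracy figures reported later will vary between executions. A minimal sketch of a reproducible, class-balanced split follows, assuming a fixed `random_state` and `stratify` are acceptable for this analysis. The data here is synthetic and the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr`, and `y_te` are hypothetical stand-ins for the notebook's variables.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's X and y (all values are made up).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"radius_mean": rng.normal(14.0, 3.5, size=100)})
y_demo = pd.Series(["M"] * 40 + ["B"] * 60, name="diagnosis")

# random_state fixes the shuffle; stratify keeps the M/B ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print(len(X_tr), len(X_te))
print(y_te.value_counts().to_dict())
```

With stratification, the 40/60 class ratio of the synthetic labels is preserved in the 30-row test part, which makes train and test scores easier to compare.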

**Viewing the resulting training and test datasets.**

In [None]:
print(X_train, y_train)
print(X_test, y_test)

     radius_mean  area_mean  perimeter_mean  concavity_mean  \
96         12.18      451.1           77.79         0.02490   
462        14.40      646.1           92.25         0.03476   
436        12.87      509.2           82.67         0.01797   
95         20.26     1264.0          132.40         0.14650   
445        11.99      441.3           77.61         0.05441   
..           ...        ...             ...             ...   
345        10.26      321.6           66.20         0.03581   
344        11.71      420.3           75.03         0.04006   
34         16.13      807.2          107.00         0.13540   
184        15.28      710.6           98.92         0.05375   
159        10.90      366.8           68.69         0.00309   

     concave points_mean  
96              0.029410  
462             0.017370  
436             0.020900  
95              0.086830  
445             0.042740  
..                   ...  
345             0.020370  
344             0.032500  


# **Creating and training the Bayesian classifier**

In [None]:
naive_bayes = GaussianNB().fit(X_train, y_train)
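
To make the model behind `GaussianNB` more concrete, here is a simplified NumPy sketch of Gaussian naive Bayes. It treats each feature as an independent normal distribution per class and predicts the class with the largest log prior plus summed log likelihood. This is an illustration of the technique, not scikit-learn's implementation: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the fixed `1e-9` variance offset, and the toy data are all assumptions made for the example.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances strictly positive (illustrative choice).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Pick, for each row, the class with the highest log posterior."""
    classes, priors, means, variances = model
    # Shape (n_samples, n_classes, n_features): deviation from each class mean.
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    log_posterior = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_posterior, axis=1)]


# Made-up, well-separated two-feature data (not taken from the dataset).
rng = np.random.default_rng(0)
X_b = rng.normal([12.0, 0.03], [1.0, 0.01], size=(50, 2))
X_m = rng.normal([18.0, 0.09], [1.0, 0.01], size=(50, 2))
X_toy = np.vstack([X_b, X_m])
y_toy = np.array(["B"] * 50 + ["M"] * 50)

model = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(model, np.array([[11.5, 0.02], [19.0, 0.10]]))
print(pred)
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.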

# **Prediction on the training and test datasets**

In [None]:
print("Predictions on training data")
print(naive_bayes.predict(X_train))
print("Training labels")
print(y_train.values)
print("Predictions on test data")
print(naive_bayes.predict(X_test))
print("Test labels")
print(y_test.values)

Predictions on training data
['B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'M' 'M' 'B'
 'B' 'B' 'M' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'M' 'M' 'B'
 'M' 'M' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'M' 'B' 'B' 'B'
 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'B' 'M' 'B' 'B'
 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'M' 'M' 'B' 'B' 'B' 'B' 'M'
 'B' 'B' 'B' 'M' 'B' 'M' 'M' 'M' 'B' 'M' 'M' 'M' 'B' 'B' 'B' 'M' 'M' 'M'
 'M' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'M' 'M' 'B' 'B' 'B' 'B' 'M'
 'M' 'B' 'B' 'M' 'M' 'B' 'B' 'M' 'M' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B'
 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'B'
 'B' 'B' 'M' 'B' 'M' 'B' 'B' 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'M' 'M'
 'B' 'M' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'B'
 'B' 'B' 'B' 'B' 'B' 'B' 'M' 'M' 'M' 'B' 'M' 'B' 'B' 'B' 'B' 'M' 'M' 'B'
 'B' 'M' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'M' 'M' 'B' 'B' 'B' 'M' 'M' 'M' 'B'
 'B' 'M' 'B

# **Measuring accuracy on the training and test datasets**

In [None]:
naive_bayes.score(X_train, y_train)

0.9296482412060302

In [None]:
naive_bayes.score(X_test, y_test)

0.8713450292397661
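
Accuracy alone does not show which class the errors fall on. As a hedged sketch, a confusion matrix and the recall for the malignant class can be computed with `sklearn.metrics`. The labels below are made up for illustration; in the notebook they would presumably be `y_test` and `naive_bayes.predict(X_test)`. The names `y_true`, `y_pred`, `cm`, and `recall_m` are hypothetical.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration only (not results from the notebook).
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]

# Rows are true classes, columns are predicted classes, in the order B, M.
cm = confusion_matrix(y_true, y_pred, labels=["B", "M"])
print(cm)

# Share of truly malignant cases that were predicted as malignant.
recall_m = recall_score(y_true, y_pred, pos_label="M")
print(round(recall_m, 3))
```

In this toy example one of the three malignant cases is predicted as benign, which an overall accuracy of 6 out of 8 would not reveal on its own.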