# SVM for asteroid classification



**Objective:** Classify asteroids detected by NASA as hazardous (Hazardous) or not hazardous (Not Hazardous).

**Dataset information**

NeoWs (Near Earth Object Web Service) is a RESTful web service for near-Earth asteroid information. With NeoWs a user can search for asteroids based on their closest approach date to Earth, look up a specific asteroid by its NASA JPL small body ID, and browse the overall dataset.


https://www.kaggle.com/shrutimehta/nasa-asteroids-classification



**Number of instances:** 4687

# 1. Accessing Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# 2. Importing libraries

In [None]:
import ____ as pd
from ____ import train_test_split
from ____ import metrics
import ___ as sns
from ____ import StandardScaler
import _____ as np
from sklearn.____ import SVC
import ____ as plt
import os
import itertools
from _____ import confusion_matrix
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler

# 3. Reading the asteroid dataset feature file

In [None]:
path = ________
file = 'nasa.csv'

In [None]:
nasadf = ______ # Read the file
_____.head()

# 4. Data exploration and preparation

In [None]:
_____.shape

**Values taken by the dependent variable**

In [None]:
clases = ______.iloc[:,-1].unique()
n_clases = len(clases)
print(clases)

**Column names of the dataframe**

In [None]:
_____.columns

**Information about the data type of each column**

In [None]:
_____.info()

**Dropping the columns 'Neo Reference ID' and 'Name', and any column of type object**

In [None]:
nasadf = _______
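
One way to carry out this step can be sketched on a toy frame. The values below are made up, and the column names other than 'Neo Reference ID', 'Name' and 'Hazardous' are assumptions chosen only for illustration; the sketch is not the notebook's intended solution.

```python
import pandas as pd

# Toy frame with hypothetical values (not taken from nasa.csv).
toy = pd.DataFrame({
    "Neo Reference ID": [3703080, 3723955, 2446862],
    "Name": [3703080, 3723955, 2446862],
    "Absolute Magnitude": [21.6, 21.3, 20.3],
    # Force object dtype so the example behaves the same across pandas versions.
    "Orbiting Body": pd.Series(["Earth", "Earth", "Earth"], dtype=object),
    "Hazardous": [True, False, True],
})

# Drop the two identifier columns, then every column stored as object.
cleaned = toy.drop(columns=["Neo Reference ID", "Name"]).select_dtypes(exclude="object")
print(list(cleaned.columns))
```

Chaining `drop` and `select_dtypes` keeps the numeric and boolean columns, so the target column survives the cleanup.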

In [None]:
nasadf

# 5. Scaling and encoding

In [None]:
scaler = StandardScaler()
nasadf.loc[:, nasadf.columns != 'Hazardous'] = ______(nasadf.loc[:, nasadf.columns != 'Hazardous'])
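
`StandardScaler` standardizes each feature column to zero mean and unit variance. The sketch below, on a small made-up array, checks that its output matches the manual z-score formula; the array values are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
features = np.array([[1.0, 200.0],
                     [2.0, 300.0],
                     [3.0, 400.0]])

# Fit the scaler and transform in one call.
scaled = StandardScaler().fit_transform(features)

# Manual z-score: subtract the column mean, divide by the population std (ddof=0).
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))
```

Scaling matters for SVMs because the kernel depends on distances between samples, so a feature with a large numeric range would otherwise dominate.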

# Class distribution

In [None]:
f,ax=plt.subplots(figsize=(8,5))
sns.countplot(x=____['Hazardous'], ax=ax)
ax.set_title('Distribution of hazardous asteroids')
plt.show()

# Undersampling

In [None]:
X = _____.drop(['Hazardous'],axis=1)
y = _____['Hazardous'].values

In [None]:
# sampling strategy
sampling = _______(sampling_strategy=_____)
# fit and apply the sampling
X, y = sampling.fit_resample(X, y)
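
To make the idea behind random undersampling concrete, here is a minimal NumPy sketch that keeps a random subset of each class, sized to the rarest class. The helper `random_undersample` and the toy arrays are hypothetical; this illustrates the technique and is not imblearn's implementation.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep a random subset of every class, sized to the rarest class."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    # Draw n_keep row indices per class without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == label), size=n_keep, replace=False)
        for label in labels
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Toy imbalanced data: 8 negatives and 2 positives.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([False] * 8 + [True] * 2)

X_bal, y_bal = random_undersample(X_toy, y_toy)
print(np.unique(y_bal, return_counts=True))
```

Undersampling discards majority-class rows, which balances the classes at the cost of training on less data.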

In [None]:
fig = plt.figure(figsize = (8,5))
p = pd.Series(y).value_counts(normalize = False).plot(kind='bar', color= ['hotpink','teal'])
p.set_xticklabels(nasadf['Hazardous'].unique())
plt.title('Asteroids after sampling (balanced dataset)')
plt.show()

# 6. Validation

In [None]:
seed = 40

In [None]:
X_train, X_test, y_train, y_test = train_test_split(___, ____, test_size=____, random_state=____, shuffle=____)
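
A hold-out split of this kind can be sketched on toy arrays. The 25% test size and the `stratify` argument are assumptions made for the example, not values taken from the notebook; `random_state=40` simply mirrors the seed defined above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, 2 features, perfectly balanced labels.
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0, 1] * 10)

# Shuffle and split; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=40, shuffle=True, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the split reproducible, so repeated runs evaluate the model on the same test rows.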

# 7. Support vector machine for classification

**Creating the svm object from the SVC class**

In [None]:
svm =  ____(random_state=seed)

**Training**

In [None]:
svm = _____.____(____, _____)

**Training accuracy score**

In [None]:
____.____(____, ____)

0.966142107773009

**Classifying the test data**

In [None]:
y_pred = ___.____(___)

**Test data accuracy**

In [None]:
metrics.accuracy_score(_____, ____)
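
The create, train, score, and predict sequence above can be sketched end to end on synthetic data. Everything about the data here is an assumption (`make_classification` with 200 samples and 5 features), and the default RBF kernel is used only because no kernel is specified; the resulting accuracy says nothing about the asteroid dataset.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification problem (hypothetical stand-in data).
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=40)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.25, random_state=40)

clf = SVC(random_state=40)   # default RBF kernel
clf.fit(X_tr, y_tr)          # training

train_acc = clf.score(X_tr, y_tr)               # mean accuracy on training data
y_hat = clf.predict(X_te)                       # predicted labels for test data
test_acc = metrics.accuracy_score(y_te, y_hat)  # accuracy on held-out data

print(train_acc, test_acc)
```

Comparing `train_acc` with `test_acc` gives a quick indication of overfitting: a large gap suggests the model does not generalize well.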

In [None]:
def plot_confusion_matrix(cm, classes, tit, normalize=False):
    # Plot a confusion matrix; with normalize=True each row is scaled to sum to 1.
    if normalize:
        # Reshape the row sums to a column so each row is divided by its own total.
        cm = cm.astype('float')/cm.sum(axis=1)[:, np.newaxis]
        title, fmt = 'Normalized confusion matrix', '.2f'
    else:
        title, fmt = tit, 'd'
    plt.figure(figsize=(10,8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title(title)
    plt.colorbar(pad=0.05)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=40)
    plt.yticks(tick_marks, classes)
    # Use white text on dark cells and black text on light cells.
    thresh = cm.max()/2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt), horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.grid(False)
    # Save after all styling so the file matches what is displayed.
    plt.savefig(title+'.png')
    plt.show()

In [None]:
cnf_matrix = confusion_matrix(_____, _____, labels=range(n_clases))
tit = 'SVM confusion matrix'
plot_confusion_matrix(cnf_matrix,['False','True'], tit, normalize=False)
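
Row normalization of a confusion matrix divides each row by the number of true samples in that class, so every row sums to 1. The sketch below uses a made-up 2x2 matrix to show why the row sums must be reshaped into a column before dividing; the counts are assumptions for illustration.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[50, 10],
               [ 5, 35]])

# Correct: row sums as a column vector, so row i is divided by its own total.
row_norm = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]

# Pitfall: a flat vector of row sums broadcasts across columns instead.
wrong = cm.astype(float) / cm.sum(axis=1)

print(row_norm.sum(axis=1))
print(wrong.sum(axis=1))
```

With equal class sizes the two versions coincide, which is why the pitfall can go unnoticed on a balanced test set.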

In [None]:
sensitivity = []
specificity = []
acc=[]
# One-vs-rest counts, treating each class value in turn as the positive class
for i,name in enumerate(nasadf.Hazardous.unique()):
  TP = np.sum((y_test==name) & (y_pred==name))
  TN = np.sum((y_test!=name) & (y_pred!=name))
  FP = np.sum((y_test!=name) & (y_pred==name))
  FN = np.sum((y_test==name) & (y_pred!=name))
  sensitivity.append(TP/(TP+FN))     # true positive rate
  specificity.append(TN/(TN+FP))     # true negative rate
  acc.append((TP+TN)/(TP+TN+FP+FN))  # fraction of correct predictions
# Unweighted average over the two classes
sensitivity.append(sum([x*y for x,y in zip(sensitivity,[1/2]*2)]))
specificity.append(sum([x*y for x,y in zip(specificity,[1/2]*2)]))
acc.append(sum([x*y for x,y in zip(acc,[1/2]*2)]))
d = {'Sensitivity':sensitivity, 'Specificity':specificity, 'Accuracy':acc}
ind = list(clases)+['Average']
df = pd.DataFrame(d, index=ind)
index = df.index
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.heatmap(df, annot=True)
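
As a sanity check on the metric definitions, the following sketch computes sensitivity, specificity, and accuracy by hand for a small made-up pair of label vectors. The arrays `y_true_toy` and `y_pred_toy` are hypothetical, and True is treated as the positive class.

```python
import numpy as np

# Hypothetical ground truth and predictions (True = hazardous).
y_true_toy = np.array([True, True, True, False, False, False, False, False])
y_pred_toy = np.array([True, True, False, False, False, False, True, False])

TP = np.sum(y_true_toy & y_pred_toy)     # hazardous, predicted hazardous
TN = np.sum(~y_true_toy & ~y_pred_toy)   # safe, predicted safe
FP = np.sum(~y_true_toy & y_pred_toy)    # safe, predicted hazardous
FN = np.sum(y_true_toy & ~y_pred_toy)    # hazardous, predicted safe

sensitivity_toy = TP / (TP + FN)                # share of hazardous asteroids caught
specificity_toy = TN / (TN + FP)                # share of safe asteroids cleared
accuracy_toy = (TP + TN) / (TP + TN + FP + FN)  # share of all predictions correct

print(sensitivity_toy, specificity_toy, accuracy_toy)
```

For a hazard detector, sensitivity is often the metric to watch, since a false negative means a hazardous asteroid labeled as safe.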