<a href="https://colab.research.google.com/github/mastanca/75.70_TP1/blob/master/7570_tp1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 75.70 Practical Assignment 1

## Imports and random seed setup

numpy, tensorflow, keras and sklearn are imported.

The numpy and tensorflow random seeds are set to get consistent results across runs.

A constant is defined with the location of the .csv file.

In [0]:
import numpy as np
np.random.seed(91218) # Set np seed for consistent results across runs

import tensorflow as tf
tf.set_random_seed(91218)

from tensorflow import keras
from sklearn.model_selection import train_test_split


#CSVPATH = '/content/Admission_norm.csv'
CSVPATH = '/content/Admission_Predict_Ver1.1.csv'
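
As a minimal sketch of what seeding buys: re-seeding NumPy's global generator with the same value reproduces the same draws. The helper `draw` and the second seed are illustrative and not part of the notebook. Note that `tf.set_random_seed` is the TensorFlow 1.x name; in TensorFlow 2.x the equivalent call is `tf.random.set_seed`.

```python
import numpy as np

def draw(seed, n=3):
    # Re-seed the global NumPy generator, then draw n uniform samples
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(91218)
second = draw(91218)  # same seed as the notebook, drawn again
other = draw(12345)   # hypothetical different seed, for contrast

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```

Seeding only fixes the sequence of random numbers; results can still vary if the order of operations changes between runs.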

## Data extraction and processing

The functions for extracting and processing the dataset are defined.

The ID column is discarded.

A function is also defined to categorize each row according to its output value (for the second network).

In [0]:
def categorize_admition(chance):
  if chance < 0.5:
    return 0 #'very_bad'
  elif chance < 0.65:
    return 1 #'bad'
  elif chance < 0.8:
    return 2 #'medium'
  elif chance < 0.9:
    return 3 #'good'
  else:
    return 4 #'very_good'

def process_row(row, categorize=False):
  return [
      float(row[1])/340.0,
      float(row[2])/120.0,
      float(row[3])/5.0,
      float(row[4])/5.0,
      float(row[5])/5.0,
      float(row[6])/10.0,
      float(row[7]),
      categorize_admition(float(row[8])) if categorize else float(row[8])
  ]
  
def extract_data(csvfile, categorize):
  rows = csvfile.read().splitlines()
  
  # Remove header and parse rows
  rows = rows[1:]
  data = [row.split(',') for row in rows]
  data = [process_row(row, categorize) for row in data]
  
  return data
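
The threshold chain in `categorize_admition` can also be written as a lookup in a sorted list. The sketch below is an alternative formulation, not the notebook's code: `THRESHOLDS`, `categorize` and the sample values are illustrative names and data. It uses `bisect_right` so that a value equal to a threshold falls into the higher class, matching the strict `<` comparisons.

```python
from bisect import bisect_right

# Exclusive upper bounds of classes 0..3; anything >= 0.9 is class 4
THRESHOLDS = [0.5, 0.65, 0.8, 0.9]

def categorize(chance):
    # Number of thresholds that are <= chance, which is the class index
    return bisect_right(THRESHOLDS, chance)

samples = [0.34, 0.5, 0.64, 0.8, 0.92]  # illustrative values only
classes = [categorize(c) for c in samples]
print(classes)
```

Keeping the thresholds in one list makes it easier to change the class boundaries without touching the control flow.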


## Training modes

We define the functions that split the data for training. The split can take the first n elements of the set, randomly chosen elements, or the first n elements with categorical (one-hot) labels.

In [0]:
def fixed_train(x, y):
  max_x = int(len(x) * 0.67) # 33% used for test
  max_y = int(len(y) * 0.67) # 33% used for test
  return x[:max_x], x[max_x:], y[:max_y], y[max_y:]


def random_train(x, y):
  return train_test_split(x, y, test_size=0.33, random_state=43)


def categorical_train(x, y):
  x_train, x_test, y_train, y_test = fixed_train(x, y)
  y_train = keras.utils.to_categorical(y_train, num_classes=5)
  y_test = keras.utils.to_categorical(y_test, num_classes=5)
  return x_train, x_test, y_train, y_test
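
To make the split sizes and the label encoding concrete, here is a small NumPy-only sketch. `fixed_split` and `one_hot` are illustrative stand-ins for `fixed_train` and `keras.utils.to_categorical`, and the 500-row array is synthetic data, not the real dataset.

```python
import numpy as np

def fixed_split(x, y, train_fraction=0.67):
    # First 67% of the rows for training, the rest for testing (no shuffling)
    cut = int(len(x) * train_fraction)
    return x[:cut], x[cut:], y[:cut], y[cut:]

def one_hot(labels, num_classes=5):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[np.asarray(labels, dtype=int)]

# Synthetic stand-in data: 500 rows, 7 features, labels cycling through 0..4
x = np.arange(500 * 7).reshape(500, 7)
y = np.arange(500) % 5

x_train, x_test, y_train, y_test = fixed_split(x, y)
print(len(x_train), len(x_test))
print(one_hot([0, 4]))
```

Because the fixed split does not shuffle, any ordering in the source file carries over into the train and test sets; the random split avoids that.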

## Data loading

The data is loaded twice: once with the raw admission chance as output, and once with the output mapped to one of five classes.

The first 7 columns are taken as features and the last one as the output.

In [0]:
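with open(CSVPATH, 'r') as csvfile:
  normal_data = extract_data(csvfile, False)
  csvfile.seek(0) # Rewind: the first read() left the cursor at end of file
  categorized_data = extract_data(csvfile, True)

# Both networks end in a 5-way softmax, so they are fed the class labels
dataset = np.array(categorized_data)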

x = dataset[:, 0:7]
y = dataset[:, 7]
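
Since `extract_data` calls `read()` on the handle it receives, the position of the file cursor matters whenever one handle is passed to it more than once. The sketch below shows the behaviour with an in-memory file; the sample contents are illustrative only.

```python
import io

# In-memory stand-in for an open CSV file
f = io.StringIO("header\n1,2\n3,4\n")

first = f.read().splitlines()   # reads everything, cursor ends at EOF
second = f.read().splitlines()  # nothing left to read
f.seek(0)                       # rewind to the start of the file
third = f.read().splitlines()   # full contents again

print(len(first), len(second), len(third))
```

Without the rewind, a second extraction from the same handle silently yields an empty list rather than raising an error.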

## Training the first network

The first network is created without categorical (one-hot) encoding of the labels; `sparse_categorical_crossentropy` expects them as integer class indices.

In [15]:
#X_train, X_test, y_train, y_test = random_train(x, y)
X_train, X_test, y_train, y_test = fixed_train(x, y)
#X_train, X_test, y_train, y_test = categorical_train(x, y)


model_1 = keras.Sequential([
    keras.layers.Dense(1, input_dim=7, activation='relu'),
    #keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
])

model_1.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model_1.fit(X_train, y_train, epochs=5, batch_size=50, shuffle=False)
print(model_1.summary())

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 1)                 8         
_________________________________________________________________
dense_9 (Dense)              (None, 5)                 10        
Total params: 18
Trainable params: 18
Non-trainable params: 0
_________________________________________________________________
None
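
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is illustrative, not a Keras function.

```python
def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit
    return inputs * units + units

hidden = dense_params(7, 1)  # 7 features into a single hidden unit
output = dense_params(1, 5)  # 1 hidden value into 5 class scores
total = hidden + output

print(hidden, output, total)  # 8 10 18
```

With a single hidden unit, all five class scores are functions of one scalar, which may limit what the network can separate.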


## Running the first network

We evaluate the first network on the test split, with labels that are not one-hot encoded.

In [16]:
test_loss, test_acc = model_1.evaluate(X_test, y_test)

print('Test accuracy:', round(test_acc*100, 2), '%')

Test accuracy: 37.58 %


## Training the second network

We train the second network using the categorical (one-hot encoded) labels.

In [22]:
#X_train, X_test, y_train, y_test = random_train(x, y)
#X_train, X_test, y_train, y_test = fixed_train(x, y)
X_train, X_test, y_train, y_test = categorical_train(x, y)


model_2 = keras.Sequential([
    keras.layers.Dense(1, input_dim=7, activation='relu'),
    #keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
])

model_2.compile(optimizer='adam', 
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model_2.fit(X_train, y_train, epochs=5, batch_size=50, shuffle=False)
print(model_2.summary())

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_14 (Dense)             (None, 1)                 8         
_________________________________________________________________
dense_15 (Dense)             (None, 5)                 10        
Total params: 18
Trainable params: 18
Non-trainable params: 0
_________________________________________________________________
None
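
`sparse_categorical_crossentropy` and `categorical_crossentropy` compute the same quantity and differ only in how the labels are passed: integer class indices versus one-hot vectors. The NumPy sketch below illustrates this with hand-written versions of both losses; the function names and the sample probabilities are illustrative, not Keras internals.

```python
import numpy as np

def sparse_cce(class_indices, probs):
    # Mean negative log-probability of the true class, labels as integers
    rows = np.arange(len(class_indices))
    return -np.mean(np.log(probs[rows, class_indices]))

def cce(one_hot, probs):
    # Same quantity, with labels given as one-hot vectors
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

# Illustrative softmax outputs for two samples over five classes
probs = np.array([[0.10, 0.20, 0.40, 0.20, 0.10],
                  [0.05, 0.05, 0.10, 0.70, 0.10]])
labels = np.array([2, 3])
one_hot = np.eye(5)[labels]

print(sparse_cce(labels, probs), cce(one_hot, probs))
```

Two networks with the same architecture trained on the same class labels therefore optimize the same objective under either loss.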


## Running the second network

We evaluate the second network on the test split, using the one-hot encoded labels.

In [23]:
test_loss, test_acc = model_2.evaluate(X_test, y_test)

print('Test accuracy:', round(test_acc*100, 2), '%')

Test accuracy: 37.58 %
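
Both networks report the same test accuracy. One possible explanation, not verified here, is that each network predicts a single class for every row, in which case accuracy equals the share of that class in the test split. Assuming the file has 500 rows, the fixed split leaves 165 for testing, and 62 correct predictions out of 165 round to 37.58 %. The sketch below computes that baseline for a hypothetical label vector; `majority_baseline_accuracy` and the sample labels are illustrative only.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    # Accuracy obtained by always predicting the most frequent class
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels, for illustration only (not the real test set)
labels = [0, 2, 2, 1, 2, 3, 4, 2]
baseline = majority_baseline_accuracy(labels)

print(baseline)
print(round(62 / 165 * 100, 2))
```

Comparing a model's test accuracy against this baseline on the real `y_test` is a quick way to check whether it has learned anything beyond the class frequencies.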
