# Supervised Learning on a Weather Dataset

This notebook benchmarks several supervised learning methods on a weather dataset from Australia.

The dataset can be downloaded from the following link:

https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

We start by loading the required libraries.

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn import preprocessing
%matplotlib inline

## Preprocessing

This section preprocesses the dataset. The preprocessing steps come from a previous study, so their rationale is not explained here.

In [None]:
# Load the dataset and encode the Yes/No columns (RainToday, RainTomorrow) as 1/0
weather_results = pd.read_csv("weatherAUS.csv")
weather_results.replace(to_replace='Yes', value=1, inplace=True)
weather_results.replace(to_replace='No', value=0, inplace=True)

# The 16 compass points, 22.5 degrees apart, starting at North
directions = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def compassToDeg(compass_direction):
    """Convert a compass point (e.g. 'NE') to an angle in degrees."""
    index = directions.index(compass_direction)
    return index * 22.5

def compassToRad(compass_direction):
    """Convert a compass point to radians; missing or unknown values give NaN."""
    if compass_direction in directions:
        return np.deg2rad(compassToDeg(compass_direction))
    return np.nan

# Replace each wind-direction column with the cosine and sine of its angle.
# np.cos and np.sin expect radians, so the angle is converted from degrees first.
for col, prefix in [('WindGustDir', 'windGustDir'),
                    ('WindDir9am', 'windDir9am'),
                    ('WindDir3pm', 'windDir3pm')]:
    angle = weather_results[col].map(compassToRad)
    weather_results[prefix + 'AngleCos'] = np.cos(angle)
    weather_results[prefix + 'AngleSin'] = np.sin(angle)
    del weather_results[col]

# Fill missing values with the column mean. numeric_only=True skips text
# columns such as Date and Location (pandas >= 2.0 raises an error otherwise)
weather_results.fillna(weather_results.mean(numeric_only=True), inplace=True)

weather_results.sample(10)
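
The wind directions are encoded as the cosine and sine of their angle so that neighbouring compass points stay close together: in plain degrees, N (0°) and NNW (337.5°) look far apart even though they differ by only 22.5°. The following is a minimal sketch of that idea, assuming the same 16 points spaced 22.5° apart as `directions`. Because `np.cos` and `np.sin` expect radians, the angle goes through `np.deg2rad` first. The helper `encode_direction` is hypothetical and only illustrates the encoding.

```python
import numpy as np

# 16 compass points, 22.5 degrees apart, starting at North (same order as `directions`)
DIRECTIONS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def encode_direction(point):
    """Hypothetical helper: map a compass point to (cos, sin) of its angle."""
    angle_deg = DIRECTIONS.index(point) * 22.5
    angle_rad = np.deg2rad(angle_deg)  # np.cos/np.sin work in radians
    return np.array([np.cos(angle_rad), np.sin(angle_rad)])

north = encode_direction("N")
nnw = encode_direction("NNW")
south = encode_direction("S")

# Euclidean distance in the (cos, sin) plane reflects angular proximity
print(np.linalg.norm(north - nnw))    # about 0.39: adjacent points
print(np.linalg.norm(north - south))  # about 2.0: opposite points
```

Every direction lands on the unit circle, so the largest possible distance is between opposite points, and the jump between 337.5° and 0° disappears.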

Normalization is applied by type of measurement: all temperature columns are min-max scaled using the global minimum and maximum of all temperature data, and the same is done for wind speed, pressure, humidity and cloud cover. The remaining columns are scaled individually.
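
Normalizing by type of measurement requires taking the minimum and maximum over all columns of a group together. Computing `df.min()` on a DataFrame returns one value per column, which would rescale each column independently. Below is a minimal sketch on a toy DataFrame showing the difference. The values and the helper name `normalize_group` are hypothetical.

```python
import pandas as pd

def normalize_group(df, regex):
    """Hypothetical helper: min-max scale every column whose name matches
    `regex`, using a single min and max shared by the whole group."""
    cols = df.filter(regex=regex).columns
    group_min = df[cols].min().min()   # global minimum over the group
    group_max = df[cols].max().max()   # global maximum over the group
    out = df.copy()
    out[cols] = (df[cols] - group_min) / (group_max - group_min)
    return out

# Toy data, for illustration only
toy = pd.DataFrame({"MinTemp": [10.0, 15.0, 20.0],
                    "MaxTemp": [20.0, 30.0, 40.0],
                    "Rainfall": [0.0, 5.0, 1.0]})

# Shared scale: MinTemp stays below MaxTemp after normalization
shared = normalize_group(toy, r"Temp")
print(shared)

# Per-column scaling maps both MinTemp and MaxTemp to [0, 0.5, 1]
per_column = (toy - toy.min()) / (toy.max() - toy.min())
print(per_column)
```

With the shared scale, a normalized `MinTemp` and `MaxTemp` can still be compared directly, which per-column scaling does not allow.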

In [None]:
# Normalize each group of related columns with a shared (global) min and max,
# so that all columns of the same kind stay on a common scale
group_patterns = [r'Temp',      # MinTemp, MaxTemp, Temp9am, Temp3pm
                  r'Speed',     # WindGustSpeed, WindSpeed9am, WindSpeed3pm
                  r'Pressure',  # Pressure9am, Pressure3pm
                  r'Humidity',  # Humidity9am, Humidity3pm
                  r'Cloud']     # Cloud9am, Cloud3pm
for pattern in group_patterns:
    cols = weather_results.filter(regex=pattern).columns
    group_min = weather_results[cols].min().min()
    group_max = weather_results[cols].max().max()
    weather_results[cols] = (weather_results[cols] - group_min) / (group_max - group_min)

# Normalize the remaining columns individually (each with its own min and max)
cols_to_norm = ['Rainfall', 'Evaporation', 'Sunshine',
                'RainToday', 'RISK_MM', 'RainTomorrow']
weather_results[cols_to_norm] = weather_results[cols_to_norm].apply(
    lambda x: (x - x.min()) / (x.max() - x.min()))

weather_results.sample(10)

## Supervised Methods

The dataset consists of various meteorological measurements, together with information on whether or not it rained the following day.

This can be modeled as a classification problem, in which each set of measurements is classified by whether it rains the following day (`RainTomorrow`). The amount of rain (`RISK_MM`) could instead be used as a regression target.
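
Below is a minimal sketch of how the classification problem could be set up with scikit-learn, assuming `RainTomorrow` (already encoded as 0/1) is the target. The helper `split_features_target` is hypothetical. It drops the identifier columns (`Date`, `Location`) and also `RISK_MM`, which measures the next day's rainfall and would therefore leak the answer into the features. A small synthetic DataFrame stands in for `weather_results` so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def split_features_target(df, target="RainTomorrow"):
    """Hypothetical helper: separate numeric features from the target column."""
    # Date/Location are identifiers; RISK_MM is next-day rainfall (target leakage)
    drop = [c for c in ["Date", "Location", "RISK_MM", target] if c in df.columns]
    X = df.drop(columns=drop).select_dtypes(include="number")
    y = df[target]
    return X, y

# Synthetic stand-in for `weather_results`, for illustration only
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Date": pd.date_range("2010-01-01", periods=n).astype(str),
    "Location": "Sydney",
    "MinTemp": rng.random(n),
    "Humidity3pm": rng.random(n),
    "RISK_MM": rng.random(n),
    "RainTomorrow": rng.integers(0, 2, n),
})

X, y = split_features_target(demo)

# Stratify so that train and test keep the same proportion of rainy days
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
print(X.columns.tolist(), X_train.shape, X_test.shape)
```

Holding out a stratified test set lets every method in the list below be evaluated on exactly the same unseen days.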

In [None]:
weather_results.sample(10)

% Class Methods
### Linear Regression? (Classification) or SVM (Regression)
### kNN
### Naive Bayes
### Decision Tree Regression
### Random Forest 
### Kernel SVM


% New Methods
### Stochastic Gradient Descent?
### Gaussian Process Regression/Classification
### Cross Decomposition?
### AdaBoost
### Forest of randomized trees
### Isotonic Regression

## Comparison of Results
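
One way the methods listed above could be compared is with k-fold cross-validation on the same features and target. The sketch below rests on several assumptions and is not the notebook's final setup. It uses synthetic data from `make_classification` in place of the weather features, default hyperparameters, and only a subset of the listed methods (kNN, Naive Bayes, decision tree, random forest, kernel SVM and AdaBoost), each through its scikit-learn classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed weather features (illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Kernel SVM": SVC(kernel="rbf"),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each model on the same data
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:15s} {scores.mean():.3f} +/- {scores.std():.3f}")
```

On the real data, rainy days appear to be the minority class. A metric such as `scoring="balanced_accuracy"` or F1 may therefore be more informative than plain accuracy.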