# Task for Today  

***

## Visibility Prediction  

Given *data about weather in Szeged, Hungary from 2006 to 2016*, let's try to predict the **visibility** on a given day at a given time.

We will use linear regression, decision tree regression, and K-nearest neighbors regression to make our predictions. 

# Getting Started

In [None]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

In [None]:
data = pd.read_csv('../input/szeged-weather/weatherHistory.csv')

In [None]:
data

In [None]:
data.info()

# Preprocessing

In [None]:
def preprocess_inputs(df):
    df = df.copy()
    
    # Drop Summary, Loud Cover (the column name is spelled this way in the dataset), and Daily Summary columns
    df = df.drop(['Summary', 'Loud Cover', 'Daily Summary'], axis=1)
    
    # Fill missing values in Precip Type column
    df['Precip Type'] = df['Precip Type'].fillna(df['Precip Type'].mode()[0])
    
    # Extract date/time features from Formatted Date column
    df['Formatted Date'] = pd.to_datetime(df['Formatted Date'], format='%Y-%m-%d %H:%M:%S.%f %z')
    
    df['Year'] = df['Formatted Date'].apply(lambda x: x.year)
    df['Month'] = df['Formatted Date'].apply(lambda x: x.month)
    df['Day'] = df['Formatted Date'].apply(lambda x: x.day)
    df['Hour'] = df['Formatted Date'].apply(lambda x: x.hour)
    
    df = df.drop('Formatted Date', axis=1)
    
    # Binary encode Precip Type column
    df['Precip Type'] = df['Precip Type'].apply(lambda x: 1 if x == 'snow' else 0)
    
    # Split df into X and y
    y = df['Visibility (km)'].copy()
    X = df.drop('Visibility (km)', axis=1).copy()
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)
    
    # Scale X with a standard scaler
    scaler = StandardScaler()
    scaler.fit(X_train)
    
    # Keep the original row index so X stays aligned with y
    X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)
    
    return X_train, X_test, y_train, y_test
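
The date handling above can also be sketched with vectorized `.dt` accessors. This is a minimal illustration on two made-up timestamps, not the notebook's own method. It assumes the `Formatted Date` strings may carry different UTC offsets across the year and that `Europe/Budapest` is the appropriate local time zone. Passing `utc=True` gives a single timezone-aware column regardless of the offsets in the strings.

```python
import pandas as pd

# Two made-up timestamps in the same layout as Formatted Date:
# one with a summer offset (+0200), one with a winter offset (+0100).
sample = pd.Series([
    '2006-04-01 00:00:00.000 +0200',
    '2006-01-01 00:00:00.000 +0100',
])

# utc=True normalizes mixed offsets into one tz-aware datetime column
parsed = pd.to_datetime(sample, format='%Y-%m-%d %H:%M:%S.%f %z', utc=True)

# Convert back to local time so Hour reflects the local clock
# (Europe/Budapest is an assumption for this sketch)
local = parsed.dt.tz_convert('Europe/Budapest')

# Vectorized feature extraction, equivalent in spirit to the .apply calls
features = pd.DataFrame({
    'Year': local.dt.year,
    'Month': local.dt.month,
    'Day': local.dt.day,
    'Hour': local.dt.hour,
})
print(features)
```

Whether to keep hours in local time or in UTC is a modeling choice; local time keeps `Hour` tied to the daily cycle as people in Szeged would experience it.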

In [None]:
X_train, X_test, y_train, y_test = preprocess_inputs(data)

In [None]:
X_train

In [None]:
y_train

# Training/Results

In [None]:
models = {
    "  Linear Regression": LinearRegression(),
    "      Decision Tree": DecisionTreeRegressor(),
    "K-Nearest Neighbors": KNeighborsRegressor()
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

In [None]:
for name, model in models.items():
    print(name + " R^2: {:.5f}".format(model.score(X_test, y_test)))
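
For regressors, `model.score` returns the coefficient of determination (R^2), which is unitless. As a rough sketch of how it relates to other metrics, the example below uses synthetic data (not the weather data) to show that `score` matches `r2_score` on the predictions, and computes an RMSE alongside it. The names `X_demo`, `y_demo`, and `demo_model` are placeholders for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic regression problem: a linear signal plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Simple holdout: first 150 rows for training, last 50 for testing
X_tr, X_te = X_demo[:150], X_demo[150:]
y_tr, y_te = y_demo[:150], y_demo[150:]

demo_model = LinearRegression().fit(X_tr, y_tr)
y_pred = demo_model.predict(X_te)

# score() on a regressor is R^2 computed from its own predictions
r2_from_score = demo_model.score(X_te, y_te)
r2_from_metric = r2_score(y_te, y_pred)

# RMSE is expressed in the units of the target
rmse = np.sqrt(mean_squared_error(y_te, y_pred))

print("R^2 (score):    {:.5f}".format(r2_from_score))
print("R^2 (r2_score): {:.5f}".format(r2_from_metric))
print("RMSE:           {:.5f}".format(rmse))
```

Applied to the models above, an RMSE computed this way would be in kilometers, which can be easier to interpret than R^2 alone.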

# Data Every Day  

This notebook is featured on Data Every Day, a YouTube series where I train models on a new dataset each day.  

***

Check it out!  
https://youtu.be/2O68xxOO5zc