# Intro to Machine Learning

This notebook introduces machine learning. After introducing the concept, we will pose a problem to solve with machine learning, implement a simple solution, and then increase the complexity of that solution while letting you choose which data it can access. Finally, we will explore the concept of neural networks and then open up to questions.

## Retrieving the Data
The question we will try to answer with machine learning is: what will the temperature be tomorrow? The weather data is retrieved with the `wwo_hist` package, as described in [this article on obtaining historical weather data](https://towardsdatascience.com/obtain-historical-weather-forecast-data-in-csv-format-using-python-5a6c090fc828).

In [None]:
# Retrieve the historical weather data used throughout the notebook
# and keep it as a DataFrame in hist_weather_data
import pandas as pd
import numpy as np
from wwo_hist import retrieve_hist_data
import matplotlib.pyplot as plt
import csv
from sklearn.linear_model import LinearRegression

frequency = 24
start_date = '25-DEC-2018'
end_date = '25-DEC-2019'
api_key = '080bb880d8ba4f43ad5231331211703'
location_list = ['43210']
hist_weather_data = retrieve_hist_data(api_key,
                                location_list,
                                start_date,
                                end_date,
                                frequency,
                                location_label = False,
                                export_csv = True,
                                store_df = True)[0]
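
Before modelling, it helps to look at a few rows and check the column types. The sketch below uses a small hand-written stand-in frame (the values are hypothetical, not output from the API) and assumes the columns may arrive as strings, which the later `int` conversions in this notebook suggest but do not confirm.

```python
import pandas as pd

# Hypothetical stand-in rows; the real frame comes from retrieve_hist_data
sample = pd.DataFrame({
    'tempC': ['3', '4', '6'],
    'pressure': ['1021', '1018', '1015'],
    'humidity': ['81', '77', '69'],
})

# Convert the columns used later in the notebook to numbers
numeric = sample[['tempC', 'pressure', 'humidity']].apply(pd.to_numeric)

print(numeric.head())   # first rows of the data
print(numeric.dtypes)   # confirm the columns are numeric
```

The same two `print` calls can be pointed at `hist_weather_data` once it has been retrieved.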


## Simple Linear Regression
Implement a simple linear regression of one variable as a model for predicting tomorrow's temperature. The single input variable is today's temperature, which the model uses to predict tomorrow's.

The weather data covers only one year (365 days).

The expected outcome is that the linear regression learns approximately:
Tomorrow's Temp = 1 * Today's Temp + 0 degrees

This makes sense because the temperature usually changes little from one day to the next, so today's temperature is already a reasonable guess for tomorrow's.
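
The pairing of each day with the next can be sketched on a small made-up series. The temperatures below are illustrative values rather than real weather data, and the names `temps`, `today`, and `tomorrow` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up daily temperatures in degrees C (illustrative only, not real data)
temps = [3, 4, 6, 5, 7, 8, 8, 6]

# Day i is paired with day i + 1, so the last day has no target
# and the first day is never a target
today = np.array(temps[:-1]).reshape(-1, 1)   # features: shape (n - 1, 1)
tomorrow = np.array(temps[1:])                # targets: shape (n - 1,)

model = LinearRegression().fit(today, tomorrow)
print('slope:', model.coef_[0])
print('intercept:', model.intercept_)
```

Dropping the last element for the inputs and the first element for the targets is the same alignment the `pop()` and `pop(0)` calls in the next cell perform.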

In [None]:
# code to set up the 1 variable simple linear regression

today_temp = list(map(int, hist_weather_data["tempC"].tolist()))
tomorrow_temp = list(map(int, hist_weather_data["tempC"].tolist()))

tomorrow_temp.pop(0)
today_temp.pop()

x = np.array(today_temp).reshape(-1,1)
y = tomorrow_temp

reg = LinearRegression().fit(x, y)

# Points on the regression line
xplt = np.array([min(x) - 2, max(x) + 2])          
yplt = reg.predict(xplt)

# Prints regression equation
coefficients = reg.coef_
coefficient = coefficients[0]
intercept = reg.intercept_
r_squared = reg.score(x,y)   # score() returns R^2, not the correlation coefficient
print('slope is %f' %(coefficient))
print('intercept is %f' %(intercept))
print('R^2 is %f' %(r_squared))
print('y-hat = %fx + %f explains about %f%% of variation' %(coefficient, intercept, r_squared*100))



## Testing
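This measures how well the model fits by computing the R squared value: 1 minus the ratio of the model's squared prediction error to the total variation in tomorrow's temperature. An R squared of 1 means the predictions match the data exactly, while 0 means the model does no better than always predicting the average temperature. Note that the score here is computed on the same data the model was trained on.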

In [None]:
r_squared=reg.score(x, y)
print('R^2 value is %f' %(r_squared))
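
To make the definition concrete, R squared can be computed by hand and compared with `score()`. This is a minimal sketch on made-up temperature pairs; `x_demo` and `y_demo` are hypothetical names and values, not the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up (today, tomorrow) temperature pairs, for illustration only
x_demo = np.array([3, 4, 6, 5, 7, 8, 8]).reshape(-1, 1)
y_demo = np.array([4, 6, 5, 7, 8, 8, 6])

model = LinearRegression().fit(x_demo, y_demo)
y_pred = model.predict(x_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print('manual R^2: %f' % r2_manual)
print('score() R^2: %f' % model.score(x_demo, y_demo))
```

The two printed values should agree, since `score()` on a `LinearRegression` uses this same formula.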

## Graph
Plot the regression with today's temperature versus tomorrow's temperature.

In [None]:
plt.plot(x,y,'o')                    # Plot the data points
plt.plot(xplt,yplt,'-',linewidth=3)  # Plot the regression line
plt.xlabel('Today\'s temperature')
plt.ylabel('Tomorrow\'s temperature')
plt.suptitle('Today\'s Temperature vs Tomorrow\'s Temperature in degrees C')
plt.grid(True)
plt.savefig('linear_reg.png')
plt.show()

In [None]:
xplt = np.array(today_temp).reshape(-1,1)
yplt = reg.predict(xplt)
day = range(len(y))                    # one point per (today, tomorrow) pair
dayplt = range(len(yplt))

plt.plot(day,y,'o')                    # Plot the data points
plt.plot(dayplt,yplt,'-',linewidth=3)  # Plot the model's predictions
plt.xlabel('Day')
plt.ylabel('Temperature')
plt.suptitle('Predicting Temperature in Degrees C')
plt.grid(True)
plt.savefig('daily_linear_reg.png')
plt.show()

## Multiple Linear Regression
Multiple linear regression of three variables (today's temperature, pressure, and humidity) to improve the prediction of tomorrow's temperature. The variables other than today's temperature may be swapped for other columns in the data.

In [None]:
# independent variables
numVar = 3
var1 = list(map(int, hist_weather_data["tempC"].tolist()))
var1.pop()
var2 = list(map(int, hist_weather_data["pressure"].tolist()))
var2.pop()
var3 = list(map(int, hist_weather_data["humidity"].tolist()))
var3.pop()   # drop the last day so var3 lines up with var1 and var2

# tomorrow temp
tomorrow_temp = list(map(int, hist_weather_data["tempC"].tolist()))
tomorrow_temp.pop(0)

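# transformed variables (not used in the fit below)
# Celsius temperatures can be zero or negative, where the reciprocal and
# logarithms are undefined, so those entries come out as inf or nan
var1Float = np.array(var1, dtype=float)
sqVar1 = var1Float ** 2
with np.errstate(divide='ignore', invalid='ignore'):
    reciprocalVar1 = np.reciprocal(var1Float)
    log10Var1 = np.log10(var1Float)
    lnVar1 = np.log(var1Float)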

# create x and y arrays
x1 = np.array(var1)
x2 = np.array(var2)
x3 = np.array(var3)
y = tomorrow_temp

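# Build the feature matrix: one row per day, one column per variable
# (x3 is trimmed to the length of x1 in case it still holds the final day)
xplt = np.column_stack((x1, x2, x3[:len(x1)]))
reg = LinearRegression().fit(xplt, y)

# Predictions for every day in the data
yplt = reg.predict(xplt)

day = range(len(y))
dayplt = range(len(yplt))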

# Prints information
coefficients = reg.coef_
intercept = reg.intercept_
r_squared = reg.score(xplt,y)   # score() returns R^2, not the correlation coefficient
for j in range(numVar):
    print('slope of x%d is %f' %(j+1,coefficients[j]))
print('intercept is %f' %(intercept))
print('R^2 is %f' %(r_squared))
print('y-hat explains about %f%% of variation' %(r_squared*100))

Plot the model's predictions and the data

In [None]:
plt.plot(day,y,'o')                    # Plot the data points
plt.plot(dayplt,yplt,'-',linewidth=3)  # Plot the model's predictions
plt.xlabel('Day')
plt.ylabel('Temperature')
plt.suptitle('Predicting Tomorrow\'s Temperature in Degrees C')
plt.grid(True)
plt.savefig('multi_linear_reg.png')
plt.show()

In this section you may select as much data as you want to be in the domain of the linear regression. This will serve to demonstrate overfitting and underfitting (overtraining and undertraining), as well as how some data is more beneficial to solving a problem while other data is unrelated to the solution.

In [None]:
# code here sets up a linear regression of n variables
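
One possible way to set this up is sketched below. It is not the notebook's implementation: the helper `fit_selected`, the chronological 80/20 split, and the synthetic `demo` frame (random values standing in for `hist_weather_data`) are all assumptions made for illustration. Only the column names `tempC`, `pressure`, and `humidity` come from the cells above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_selected(df, feature_columns, target_column='tempC', train_fraction=0.8):
    """Fit tomorrow's target on today's selected columns; return model, train R^2, test R^2."""
    features = df[feature_columns].astype(float).to_numpy()[:-1]   # today's values
    target = df[target_column].astype(float).to_numpy()[1:]        # tomorrow's target
    split = int(len(target) * train_fraction)                      # chronological split
    model = LinearRegression().fit(features[:split], target[:split])
    train_r2 = model.score(features[:split], target[:split])
    test_r2 = model.score(features[split:], target[split:])
    return model, train_r2, test_r2

# Synthetic stand-in for hist_weather_data (random values, not real weather)
rng = np.random.default_rng(0)
n_days = 200
demo = pd.DataFrame({
    'tempC': 10 + 8 * np.sin(np.arange(n_days) * 2 * np.pi / 365) + rng.normal(0, 2, n_days),
    'pressure': rng.normal(1013, 5, n_days),
    'humidity': rng.normal(70, 10, n_days),
})

# Try progressively larger sets of input columns
for columns in (['tempC'], ['tempC', 'pressure'], ['tempC', 'pressure', 'humidity']):
    model, train_r2, test_r2 = fit_selected(demo, columns)
    print(columns, 'train R^2 = %.3f, test R^2 = %.3f' % (train_r2, test_r2))
```

Training R^2 can only stay the same or rise as columns are added, so it is the held-out test score that shows whether an extra column helps. A training score that keeps climbing while the test score stalls or drops is the overfitting pattern this section is meant to demonstrate.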