# Should You Play Golf Today?

Suppose we wish to build a simple linear model to help you decide whether to play golf today given the weather conditions.

<img src="golfing.jpg" width="300" />

## Import Data

Begin by reading in data from the file called "PlayGolfData.csv" using the [read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function.

In [1]:
# Import the pandas library
import pandas as pd

# Load the data into a DataFrame
play_data_df = pd.read_csv("PlayGolfData.csv")
play_data_df

Unnamed: 0,ID_Num,Outlook,Temperature,Humidity,Windy,Play
0,1,sunny,85,85,False,no
1,2,sunny,80,90,True,no
2,3,overcast,83,86,False,yes
3,4,rainy,70,96,False,yes
4,5,rainy,68,80,False,yes
5,6,rainy,65,70,True,no
6,7,overcast,64,65,True,yes
7,8,sunny,72,95,False,no
8,9,sunny,69,70,False,yes
9,10,rainy,75,80,False,yes


## Data Preparation

Let's start by encoding the categorical features `Outlook`, `Windy`, and `Play` using one-hot encoding.
We will use the [pandas.get_dummies()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html) function to do this.  Set the `drop_first` parameter to `True` so that the encoded binary columns are not redundant: if every category kept its own column, any one of them could be derived from the others.  Your goal here is to get the final feature matrix and target vector; call these "X" and "y", respectively.

In [3]:
# get_dummies() takes every value that a categorical feature can assume
# and creates a new binary (0/1) column for it; for example, Outlook_rainy is 1 for the rows where the outlook was rainy and 0 otherwise
# This is an example of one-hot encoding
# The drop_first parameter drops the column for the first category of each encoded feature
X = pd.get_dummies(play_data_df, columns=['Outlook', 'Windy', 'Play'], drop_first=True)
X

Unnamed: 0,ID_Num,Temperature,Humidity,Outlook_rainy,Outlook_sunny,Windy_True,Play_yes
0,1,85,85,0,1,0,0
1,2,80,90,0,1,1,0
2,3,83,86,0,0,0,1
3,4,70,96,1,0,0,1
4,5,68,80,1,0,0,1
5,6,65,70,1,0,1,0
6,7,64,65,0,0,1,1
7,8,72,95,0,1,0,0
8,9,69,70,0,1,0,1
9,10,75,80,1,0,0,1
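
To make the effect of `drop_first` concrete, here is a minimal sketch on a hypothetical three-row frame (`toy` is not part of the notebook's data). It passes `dtype=int` as an assumption so that the output is 0/1 regardless of pandas version, since newer pandas releases return booleans by default.

```python
import pandas as pd

# Hypothetical three-row frame reusing the Outlook categories from the data above.
toy = pd.DataFrame({"Outlook": ["sunny", "overcast", "rainy"]})

# Full one-hot encoding: one column per category, so each row sums to 1.
full = pd.get_dummies(toy, columns=["Outlook"], dtype=int)

# drop_first=True removes the first category in sorted order ("overcast"),
# which is then represented implicitly by zeros in the remaining columns.
reduced = pd.get_dummies(toy, columns=["Outlook"], drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))
print(reduced)
```

This is why the encoded matrix above has `Outlook_rainy` and `Outlook_sunny` but no `Outlook_overcast` column: an overcast day is the row where both remaining columns are 0.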


Create a DataFrame called `y` that contains the labels.

In [4]:
# Create a DataFrame containing only the Play_yes column
y = X[['Play_yes']]

# Display the DataFrame
y

Unnamed: 0,Play_yes
0,0
1,0
2,1
3,1
4,1
5,0
6,1
7,0
8,1
9,1


Drop the `ID_Num` attribute, since it offers no information, and the `Play_yes` attribute, because we cannot use our labels as one of the features when training the model!

In [5]:
# Drop both the ID_Num and Play_yes attributes
X = X.drop(['ID_Num', 'Play_yes'], axis=1)
X

Unnamed: 0,Temperature,Humidity,Outlook_rainy,Outlook_sunny,Windy_True
0,85,85,0,1,0
1,80,90,0,1,1
2,83,86,0,0,0
3,70,96,1,0,0
4,68,80,1,0,0
5,65,70,1,0,1
6,64,65,0,0,1
7,72,95,0,1,0
8,69,70,0,1,0
9,75,80,1,0,0


## Train a Linear Regression Model

From the `sklearn.linear_model` module, import the `LinearRegression` class and fit it to your feature matrix, `X`, and your labels, `y`.

In [11]:
# Import numpy (used in later cells) and the LinearRegression class
import numpy as np
from sklearn.linear_model import LinearRegression

# Instantiate the object reg from the LinearRegression class
reg = LinearRegression()

# Train the model
reg.fit(X, y)

# Display the fitted linear regression model
reg

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Display the model coefficients:

In [7]:
# Display the linear regression model coefficients
reg.coef_

array([[-0.01434714, -0.01416828, -0.45334004, -0.58135391, -0.41075329]])
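
The nested brackets in the coefficient array arise because `y` was passed as a one-column DataFrame, which scikit-learn treats as a 2-D target. The sketch below illustrates the difference on hypothetical toy data (`X_toy`, `y_1d`, and `y_2d` are made up for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: 4 samples, 2 features.
X_toy = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y_1d = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,)
y_2d = y_1d.reshape(-1, 1)             # shape (4, 1), like a one-column DataFrame

reg_1d = LinearRegression().fit(X_toy, y_1d)
reg_2d = LinearRegression().fit(X_toy, y_2d)

# A 1-D target gives 1-D coefficients and predictions;
# a 2-D target gives 2-D coefficients and predictions.
print(reg_1d.coef_.shape, reg_1d.predict(X_toy).shape)
print(reg_2d.coef_.shape, reg_2d.predict(X_toy).shape)
```

Either form fits the same model; only the shapes of `coef_`, `intercept_`, and the predictions differ.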

Display the model intercept:

In [8]:
# Display the linear regression model intercept
reg.intercept_

array([3.40070674])
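
A linear regression prediction is the dot product of the coefficients with a feature row, plus the intercept. As a rough check, the sketch below recomputes the prediction for the first training row by hand, using the rounded coefficient and intercept values printed above, so the result only agrees with the model's own output to several decimal places.

```python
import numpy as np

# Coefficients and intercept as printed above (rounded to 8 decimals).
coef = np.array([-0.01434714, -0.01416828, -0.45334004, -0.58135391, -0.41075329])
intercept = 3.40070674

# First training row, in the column order of X:
# Temperature=85, Humidity=85, Outlook_rainy=0, Outlook_sunny=1, Windy_True=0.
row = np.array([85, 85, 0, 1, 0])

# prediction = coef . row + intercept
prediction = float(np.dot(coef, row) + intercept)
print(round(prediction, 4))
```

The value comes out at about 0.3955, which corresponds to the first entry of the `reg.predict(X)` output in the evaluation section.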

## Evaluate on the Training Data

Use your model to make a prediction on the training set:

In [9]:
# The predict function makes a prediction for each row of the training set
reg.predict(X)

array([[ 0.3955416 ],
       [-0.01431738],
       [ 0.99142152],
       [ 0.58291153],
       [ 0.83829833],
       [ 0.6122693 ],
       [ 1.15079789],
       [ 0.44037165],
       [ 0.83762014],
       [ 0.73786832],
       [ 0.34078399],
       [ 0.68181369],
       [ 1.17596691],
       [ 0.22865251]])

## Quick Function Links

Function Name | Computation
--- | ---
[copy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html) | Copies the object
[reshape()](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape) | Changes the shape of an array
[itertuples()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html) | Iterates over the rows of a DataFrame
[assign()]() | Assigns a new column to a DataFrame

In [15]:
# numpy was imported earlier as np, which is why np is used in this code segment

# Make a copy of the y DataFrame
y_outputs = y.copy()

# The following is a list comprehension
# itertuples() lets you iterate over DataFrame rows as namedtuples
# .item() extracts the single predicted value as a Python float
reg_outputs = [reg.predict(np.reshape(row, (1, -1))).item() for row in X.itertuples(index=False)]
# 1 if the regression output is at least 0.5, otherwise 0
predicted = np.array([1 if reg_output >= 0.5 else 0 for reg_output in reg_outputs])
y_outputs = y_outputs.assign(regression_predicted=reg_outputs)
y_outputs = y_outputs.assign(predicted=predicted)
y_outputs

Unnamed: 0,Play_yes,regression_predicted,predicted
0,0,0.395542,0
1,0,-0.014317,0
2,1,0.991422,1
3,1,0.582912,1
4,1,0.838298,1
5,0,0.612269,1
6,1,1.150798,1
7,0,0.440372,0
8,1,0.83762,1
9,1,0.737868,1
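
The thresholding step can also be written without an explicit loop. The following sketch applies the same 0.5 cutoff to the ten regression outputs displayed in the table above, using a NumPy boolean comparison; it is an alternative formulation, not the code the notebook ran.

```python
import numpy as np

# Regression outputs for the ten rows displayed in the table above.
reg_outputs = np.array([0.395542, -0.014317, 0.991422, 0.582912, 0.838298,
                        0.612269, 1.150798, 0.440372, 0.837620, 0.737868])

# The comparison yields a boolean array; astype(int) maps True/False to 1/0.
predicted = (reg_outputs >= 0.5).astype(int)
print(predicted.tolist())
```

The result reproduces the `predicted` column shown above, including row 5, where an output of 0.612269 is classified as 1 even though the true label is 0.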


Eliminate the `regression_predicted` column from the `y_outputs` DataFrame:

In [16]:
# Make a fresh copy of the y DataFrame and add only the predicted column
y_outputs = y.copy()
y_outputs = y_outputs.assign(predicted=predicted)
y_outputs

Unnamed: 0,Play_yes,predicted
0,0,0
1,0,0
2,1,1
3,1,1
4,1,1
5,0,1
6,1,1
7,0,0
8,1,1
9,1,1


Calculate the accuracy score using the [accuracy_score()]() function and the true and predicted values contained in the `y_outputs` DataFrame:

In [21]:
# Import the accuracy_score function from sklearn.metrics
from sklearn.metrics import accuracy_score

# Print the accuracy score
print("accuracy score: ", accuracy_score(y_outputs['Play_yes'], y_outputs['predicted']))

accuracy score:  0.8571428571428571
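
Accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on hypothetical labels (`y_true` and `y_pred` are invented for illustration and are not the notebook's data).

```python
import numpy as np

# Hypothetical labels for illustration only (not the notebook's data).
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1])

# Accuracy = number of matching positions / number of samples.
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)
```

Here four of the five predictions match, so the accuracy is 0.8.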
