# Creating a Model using a Dataset

This notebook uses the data produced from Disease Propagation to fit a linear regression model with scikit-learn (sklearn).

Reference: [https://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html](https://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html)

## Steps to create a model from a given dataset:
1. Install the scikit-learn Python library
2. Import the dataset from CSV
3. Create training and test samples from the dataset
4. Perform a linear regression
5. Output the model coefficients for the formula
6. Score the model's ability to describe the dataset

## 1. Install the scikit-learn Python library to create the model:
**[Note: This only needs to be done once and it will require you to "Restart the Kernel"]**

In [None]:
!pip install scikit-learn --user

In [None]:
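# scikit-learn's linear models (provides LinearRegression)
from sklearn import linear_model

# pandas loads the CSV into a DataFrame; NumPy is used for the random split and the error metric
import pandas as pd
import numpy as np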

## 2. Import the dataset from CSV

In [None]:
# Replace "mycsvfile.csv" with the name of your input file
diseaseData = pd.read_csv("mycsvfile.csv")

In [None]:
# Preview the first five rows of the imported dataset
diseaseData.head()

In [None]:
# Summary statistics for each numeric column of the imported dataset
diseaseData.describe()
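
`LinearRegression` expects numeric input without missing values, so it can help to check for both before splitting the data. The following is a minimal sketch on a made-up DataFrame: the columns `population` and `region` are hypothetical stand-ins and are not taken from the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for diseaseData; only max_sick_mask_wearers is a real column name
demo = pd.DataFrame({
    "population": [100, 200, np.nan],
    "region": ["north", "south", "east"],
    "max_sick_mask_wearers": [10, 25, 40],
})

# Count missing values per column
missing_per_column = demo.isna().sum()
print(missing_per_column)

# List columns that are not numeric and would need encoding or dropping
non_numeric = demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

The same two checks can be run on `diseaseData` once the CSV is loaded.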

## 3. Create training and test samples from the datasets

In [None]:
# Split the rows at random into a training set (used to fit the model)
# and a test set (held out to score the model)
train_percentage = 0.8  # train on roughly 80% of the dataset

# Boolean mask: True sends a row to the training set, False to the test set
msk = np.random.rand(len(diseaseData)) < train_percentage
print(msk)
train = diseaseData[msk]
test = diseaseData[~msk]

print("Number of Training Samples:",len(train))
print("Number of Test Samples:",len(test))

# Create X and Y datasets for both train and test, where
# X contains the input variables and Y the result to predict

# Target column the model should predict
#test_variable = "max_sick"
test_variable = "max_sick_mask_wearers"


train_X = train.drop(columns=[test_variable])
train_Y = train[test_variable]
test_X = test.drop(columns=[test_variable])
test_Y = test[test_variable]
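
The random mask gives a split whose size changes from run to run. As a possible alternative, scikit-learn's `train_test_split` produces a fixed-size, reproducible split. This is only a sketch: the toy DataFrame and the columns `population` and `mask_rate` are invented for illustration, and the seed value is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for diseaseData (50 rows of made-up values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "population": rng.integers(100, 1000, size=50),
    "mask_rate": rng.random(50),
    "max_sick_mask_wearers": rng.integers(0, 100, size=50),
})

target = "max_sick_mask_wearers"
X = demo.drop(columns=[target])  # input variables
y = demo[target]                 # result to predict

# train_size=0.8 keeps 80% of the rows for training;
# random_state makes the split repeatable across runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
print("Number of Training Samples:", len(X_train))
print("Number of Test Samples:", len(X_test))
```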



## 4. Perform a linear regression

In [None]:
regr = linear_model.LinearRegression()

In [None]:
regr.fit(train_X, train_Y)

## 5. Output the model coefficients for the formula

In [None]:
print(regr.coef_) 
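
`coef_` is a bare array, ordered like the columns of the training data, and the fitted formula also needs the intercept: `y = intercept_ + coef_[0]*x0 + coef_[1]*x1 + ...`. The sketch below pairs each coefficient with its column name. It uses made-up, noise-free data with hypothetical columns `a` and `b`, so the recovered values are easy to check.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical noise-free data generated from y = 3*a - 2*b + 5
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({"a": rng.random(30), "b": rng.random(30)})
demo_y = 3 * demo_X["a"] - 2 * demo_X["b"] + 5

demo_regr = linear_model.LinearRegression().fit(demo_X, demo_y)

# Label each coefficient with the column it belongs to
coef_by_column = pd.Series(demo_regr.coef_, index=demo_X.columns)
print(coef_by_column)
print("intercept:", demo_regr.intercept_)
```

For the notebook's own model, the equivalent would be `pd.Series(regr.coef_, index=train_X.columns)` together with `regr.intercept_`.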

## 6. Score the model's ability to describe the dataset

In [None]:
# The mean squared error on the test set
np.mean((regr.predict(test_X) - test_Y)**2)
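
The same quantity is available as `sklearn.metrics.mean_squared_error`. The sketch below compares the two on a few made-up values (they are not results from this dataset) and also takes the square root, which expresses the error in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

manual_mse = np.mean((y_pred - y_true) ** 2)      # same formula as the cell above
library_mse = mean_squared_error(y_true, y_pred)  # scikit-learn's implementation
rmse = np.sqrt(library_mse)                       # root mean squared error

print(manual_mse, library_mse, rmse)
```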

In [None]:
# R^2 (coefficient of determination): 1 is a perfect prediction,
# 0 is no better than always predicting the mean of test_Y,
# and negative values are worse than that.

regr.score(test_X, test_Y)
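
For a regressor, `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below recomputes it by hand on synthetic data (the data and the 30/10 split are assumptions made for illustration) and compares it with `score`.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data: a linear signal plus a little noise
rng = np.random.default_rng(1)
X = rng.random((40, 2))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=40)

# Fit on the first 30 rows, evaluate on the last 10
model = linear_model.LinearRegression().fit(X[:30], y[:30])
X_eval, y_eval = X[30:], y[30:]
pred = model.predict(X_eval)

ss_res = np.sum((y_eval - pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2, model.score(X_eval, y_eval))
```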