# Group 9: Provisional Neural Network Model
## Week 1

**COVID-19 Model**
1. Supervised Regresson to predict CoVID new_cases and new_deaths
2. (Potential second model) Unsupervised to determine predictive factors for number of cases and deaths.  

**Potential Limitations:**


Data Source: Our World in Data
https://github.com/owid/covid-19-data/tree/master/public/data



In [17]:
# Import  Dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,OneHotEncoder
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

from datetime import datetime
from sqlalchemy import create_engine

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# import psycopg2


# import seaborn as sns (pip install)  -- https://www.tensorflow.org/tutorials/keras/regression


### Connect to Provisional pgAdmin Database

In [18]:
#Create Connection String to SQL 
password = "BoomerSooner2!"

db_string = f"postgres://postgres:{password}@127.0.0.1:5432/Final_Project"
## Make above a config.py for password

engine = create_engine(db_string)
    

# Connect to PostgreSQL server

dbConnection = engine.connect();

# Read data from PostgreSQL database table and load into a DataFrame instance

usa_covid_sql_df = pd.read_sql("select * from \"usa_covid\"", dbConnection)

pd.set_option('display.expand_frame_repr', False)


# Close the database connection

dbConnection.close();

# Proof of Connection:
usa_covid_sql_df.head()



#___________________________________________________________________


# Import Dataset
covid_data_raw_df = pd.read_csv('owid-covid-data.csv')





## 1. Preprocess Data

•	Provisionally, select desired columns

•	Clean data set


In [19]:
# Scale down data set to minimal data for model proof-of-concept (poc)
covid_data_poc_df = covid_data_raw_df[["location", "date", "new_cases", "new_deaths", "population"]]

# For proof-of-concept, focus on 1 country
covid_data_poc_df = covid_data_poc_df[covid_data_poc_df["location"] == "United States"]

covid_data_poc_df.head()

Unnamed: 0,location,date,new_cases,new_deaths,population
71614,United States,2020-01-22,,,331002647.0
71615,United States,2020-01-23,0.0,,331002647.0
71616,United States,2020-01-24,1.0,,331002647.0
71617,United States,2020-01-25,0.0,,331002647.0
71618,United States,2020-01-26,3.0,,331002647.0


In [20]:
# covid_data_poc_df shape
print('shape of array :', covid_data_poc_df.shape)

shape of array : (421, 5)


In [21]:
# Convert NAN to 0
covid_data_poc_df = covid_data_poc_df.fillna(0)
covid_data_poc_df.head()

Unnamed: 0,location,date,new_cases,new_deaths,population
71614,United States,2020-01-22,0.0,0.0,331002647.0
71615,United States,2020-01-23,0.0,0.0,331002647.0
71616,United States,2020-01-24,1.0,0.0,331002647.0
71617,United States,2020-01-25,0.0,0.0,331002647.0
71618,United States,2020-01-26,3.0,0.0,331002647.0


In [22]:
covid_data_poc_df.dtypes

location       object
date           object
new_cases     float64
new_deaths    float64
population    float64
dtype: object

In [23]:
# Convert date column to dtype datetime
covid_data_poc_df['date'] = pd.to_datetime(covid_data_poc_df['date'])
covid_data_poc_df.dtypes






location              object
date          datetime64[ns]
new_cases            float64
new_deaths           float64
population           float64
dtype: object

### Note to Graders:  We are attempting a model type not covered in our modules.  

### Rather than attempting to solve a classification problem, we are incorporating a supervised regression ML model.  This has proved to have some unforeseen difficulties.  

### Deep learning methods are trained using supervised learning and expect data in the form of samples with inputs and outputs.

### We plan to use an ARIMA approach (below) to predict both CoVID deaths and cases over time.  

### We are currently working through some issues in order to complete this model.  



Begin model with a **single variable** to verify proof-of-concept

In [24]:
df1 = covid_data_poc_df

def convert2matrix(data_arr, look_back):
   X, Y =[], []
   for i in range(len(data_arr)-look_back):
       d=i+look_back  
   X.append(data_arr[i:d,])
   Y.append(data_arr[d,])
   return np.array(X), np.array(Y)

In [25]:
train_size = 200
train,test = df1.values[0:train_size,:], df1.values[train_size:len(df1.values),:]
look_back = 30 #create window size as look_back=30
test = np.append(test,np.repeat(test[-1,], look_back))
train = np.append(train,np.repeat(train[-1,],look_back))
trainX,trainY =convert2matrix(train,look_back)
testX,testY =convert2matrix(test, look_back)
# reshape input to be [samples, window size, features]
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

In [30]:
trainX

array([[[1075.0, 331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0, 331002647.0, 331002647.0, 331002647.0,
         331002647.0]]], dtype=object)

In [27]:
np.array(trainX, dtype=np.float)

array([[[1.07500000e+03, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08, 3.31002647e+08, 3.31002647e+08,
         3.31002647e+08, 3.31002647e+08]]])

In [28]:
# from keras.models import Sequential
# from keras.layers import Dense, SimpleRNN
# from keras.callbacks import EarlyStopping

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN
from tensorflow.keras.callbacks import EarlyStopping


def model_rnn(look_back):
  model=Sequential()
  model.add(SimpleRNN(units=32, input_shape=(1,look_back), activation="relu"))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1))
  model.compile(loss='mean_squared_error',  optimizer='adam',metrics = ['mse', 'mae'])
  return model

In [29]:
model=model_rnn(look_back)


history=model.fit(trainX,trainY, epochs=100, batch_size=30, verbose=1, validation_data=(testX,testY),callbacks=[EarlyStopping(monitor='val_loss', patience=10)],shuffle=False)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

#### Notes: PREPROCESS DATE SERIES DATA FOR SUPERVISED LEARNING MODEL 
___________________________________________________________________________

!!!!https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/!!!!!

 Deep learning methods are trained using supervised learning and expect data in the form of samples with inputs and outputs. 
Time series are long sequences of numbers.
 ????How to transform a time series into a form suitable for supervised learning????



 https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/



https://towardsdatascience.com/preprocessing-time-series-data-for-supervised-learning-2e27493f44ae
    



https://towardsdatascience.com/a-quick-deep-learning-recipe-time-series-forecasting-with-keras-in-python-f759923ba64

In [None]:
# Split preprocessed data into feature(s) and target
X = covid_data_poc_df['date']
y = covid_data_poc_df['new_deaths']


# Plot new-deaths data
plt.scatter(X, y)
plt.plot (X, y)
plt.show()

In [None]:
# Split the preprocessed data into a training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)


print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

## 3. Normalize/Standardize Numerical Features

**Using StandardScaler**

### No need to Normalize for a single feature proof-of-concept model

In [None]:
# Reshape!!!
X_train = X_train.to_numpy(dtype=None, copy=False).reshape(-1, 1)
X_test = X_test.to_numpy(dtype=None, copy=False).reshape(-1, 1)


# Create standard scaler instance
X_scaler = StandardScaler()

# Fit the scaler
X_scaler.fit(X_train)

# Scale both the training and testing data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)



## 4. Build the Model

a. **Pick Model**: Deep Neural Net, Sequential



f. **Compile the Model and Define Loss / Accuracy Metrics**: Use  model as a binary classifier, use the **binary_crossentropy loss function**, **adam optimizer**, and **accuracy metrics** --same parameters used for our basic neural network. 

In [None]:
# Define the model - deep neural net
number_input_features = 1
hidden_nodes_layer1 =  80
hidden_nodes_layer2 = 30

nn_provisional = tf.keras.models.Sequential()

# First hidden layer
nn_provisional.add(
    tf.keras.layers.Dense(units=hidden_nodes_layer1, input_dim=number_input_features, activation="relu")
)

# c.Second hidden layer
nn_provisional.add(tf.keras.layers.Dense(units=hidden_nodes_layer2, activation="relu"))

# Output layer
nn_provisional.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

# Check the structure of the model
nn_provisional.summary()

In [None]:
# f. Compile the model
nn_provisional.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])



## 5. Fit/Train the Model

a. Train model on training data

b. Evaluate model using test data (Loss and Accuracy.  

In [None]:
# a Train the model on training data
fit_model = nn_provisional.fit(X_train,y_train,epochs=10)

In [None]:
# b. Evaluate the model using the test data
model_loss, model_accuracy = nn_provisional.evaluate(X_test,y_test,verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

In [None]:
# visualize model's loss over the full 100 epochs

# Create a DataFrame containing training history
history_df = pd.DataFrame(fit_model.history, index=range(1,len(fit_model.history["loss"])+1))

# Plot the loss
history_df.plot(y="loss")

In [None]:
# Plot model's accuracy over all epochs
history_df.plot(y="accuracy")