Covid-19 has hit everyone very hard. There is a lot of uncertainty about the symptoms of Covid-19 and how to treat it. This has created a shortage of resources at hospitals. Even the best hospitals in New York suffered from limited resources.

What if we could use ML to predict the approximate use of resources for the next day? This would help hospital management and doctors plan resources and schedules to care for patients effectively. Our work follows this line and tries to solve the problem stated above.

**Load Libraries**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from datetime import datetime, timedelta
from sklearn import preprocessing
import xgboost as xgb
import matplotlib.pyplot as plt
from xgboost import plot_importance
from sklearn.metrics import mean_squared_error
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import pickle

**Read Data File**

In [None]:
file = "../input/uncover/RDSC-07-30-Update/RDSC-07-30-Update/coronadatascraper/coronadatascraper-timeseries.csv"
covid_stats = pd.read_csv(file)

**Preprocessing**

In [None]:
X_columns = ['state', 'country',
       'population', 'lat', 'long', 'cases',
       'deaths', 'recovered', 'active', 'tested',
       'hospitalized_current', 'icu', 'icu_current',
       'growthfactor', 'date']

covid_stats['date'] = pd.to_datetime(covid_stats['date'])

covid_stats_X = covid_stats[X_columns]

**Remove rows where hospitalized_current is null**

In [None]:
df1 = covid_stats_X.loc[
    (covid_stats_X['hospitalized_current'].notnull())
]

**Helper Function for creating Lags**

In [None]:
id_cols = ['date','country','state','population']

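def lag_feature(df, lags, col):
    # For each lag i, add a column <col>_lag_<i> holding the value of col from i days earlier.
    tmp = df[id_cols + [col]]
    for i in lags:
        shifted = tmp.copy()
        shifted.columns = id_cols + [(col+'_lag_'+str(i))]
        # Moving the date forward by i days aligns each value with the row i days later.
        shifted['date'] += timedelta(days=i)
        df = pd.merge(df, shifted, on=id_cols, how='left')
    return df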
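
To make the date-shift trick concrete, here is a minimal sketch on toy data. The values, the `toy` frame, and the reduced key list `keys` are made up for illustration and are not part of the dataset.

```python
from datetime import timedelta

import pandas as pd

# Toy data (hypothetical values): three consecutive days for one state.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-03']),
    'state': ['NY', 'NY', 'NY'],
    'hospitalized_current': [100, 120, 150],
})

keys = ['date', 'state']

# Move each observation one day forward, so the value recorded on day d
# is looked up under day d + 1.
shifted = toy[keys + ['hospitalized_current']].copy()
shifted.columns = keys + ['hospitalized_current_lag_1']
shifted['date'] += timedelta(days=1)

# A left merge keeps every original row; a day with no earlier observation gets NaN.
lagged = pd.merge(toy, shifted, on=keys, how='left')
print(lagged)
```

The first day has no previous observation, so its lag is missing. XGBoost can handle missing feature values natively, which is presumably why the notebook does not fill them.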

**Create Lags for Selected Columns**

In [None]:
# Lag offsets (in days) to build for each column.
lags_by_column = {
    'hospitalized_current': [1, 3, 7],
    'deaths': [1],
    'tested': [7],
}

for c, lags in lags_by_column.items():
    df1 = lag_feature(df1, lags, c)

**New Data with Lags**

In [None]:
df1.head()

**Label Encoding for columns to support XGBoost**

In [None]:
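# Cast to str so missing values and names share a single type.
df1['state'] = df1['state'].astype(str)
df1['country'] = df1['country'].astype(str)

# Use one encoder per column so each mapping can be inverted later.
state_encoder = preprocessing.LabelEncoder()
country_encoder = preprocessing.LabelEncoder()

df1['state'] = state_encoder.fit_transform(df1['state'])
df1['country'] = country_encoder.fit_transform(df1['country'])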
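
As an illustration of what label encoding produces, the sketch below encodes two toy columns and then recovers the original names. The `toy` frame and the `encoders` dictionary are hypothetical helpers for this example. Keeping one fitted encoder per column is what makes the reverse lookup possible.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data (hypothetical values).
toy = pd.DataFrame({
    'state': ['New York', 'California', 'New York', 'Texas'],
    'country': ['United States', 'United States', 'United States', 'United States'],
})

# Fit a separate encoder for each column and keep it for later decoding.
encoders = {}
for col in ['state', 'country']:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col].astype(str))
    encoders[col] = enc

# Classes are sorted alphabetically, so the integer codes follow that order.
print(toy)

# Map the integer codes back to the original state names.
original_states = encoders['state'].inverse_transform(toy['state'])
print(list(original_states))
```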


**XGB Initialization**

In [None]:
model = xgb.XGBRegressor(max_depth=8,n_estimators=1000,
                     min_child_weight=300,colsample_bytree=0.8,
                     subsample=0.8,eta=0.3,seed=42)

**Train Validation Test Split**

In [None]:
# Shuffle rows with a fixed seed so the split below is reproducible.
X = shuffle(df1, random_state=42)

Y = X['hospitalized_current']

# Drop the target, the same-day counts, and the raw date.
X = X.drop(columns=['hospitalized_current', 'cases', 'deaths',
                    'recovered', 'tested', 'date'])

X_train, X_valid, y_train, y_valid = train_test_split(X, Y, test_size=0.30, random_state=42)

X_valid_a, X_test, y_valid_a, y_test = train_test_split(X_valid,y_valid,test_size = .10,random_state=42)
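
The two chained calls leave roughly 70% of the rows for training, 27% for validation, and 3% for testing. A quick sketch on synthetic data (the `_demo` names are made up so they do not overwrite the notebook's variables) shows the resulting sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows with a single feature.
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.arange(1000)

# First split: hold out 30% of the rows.
X_train_demo, X_hold_demo, y_train_demo, y_hold_demo = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42)

# Second split: 10% of the held-out rows become the test set,
# the remaining 90% are used for validation.
X_valid_demo, X_test_demo, y_valid_demo, y_test_demo = train_test_split(
    X_hold_demo, y_hold_demo, test_size=0.10, random_state=42)

print(len(X_train_demo), len(X_valid_demo), len(X_test_demo))
```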

**Fit Model (this may take about 4 hours)**

In [None]:
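# Recent XGBoost releases expect eval_metric and early_stopping_rounds on the
# estimator rather than as fit() arguments.
model.set_params(eval_metric="rmse", early_stopping_rounds=10)
model.fit(X_train, y_train,
          eval_set=[(X_train, y_train), (X_valid_a, y_valid_a)],
          verbose=True)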

**Helper function for plotting feature importance after XGBoost**

In [None]:
def plot_features(booster, figsize):    
    fig, ax = plt.subplots(1,1,figsize=figsize)
    return plot_importance(booster=booster, ax=ax)

plot_features(model, (10,14))

**TESTING**

In [None]:
y_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_test_pred)
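
The training log reports RMSE, while `mean_squared_error` returns the squared error. Taking the square root puts the test score on the same scale as the training log and in the same units as the target (patients). A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted hospitalization counts.
y_true_demo = np.array([100.0, 120.0, 150.0])
y_pred_demo = np.array([110.0, 115.0, 140.0])

# Mean of the squared errors: (10^2 + 5^2 + 10^2) / 3.
mse_demo = mean_squared_error(y_true_demo, y_pred_demo)

# Square root of the MSE, expressed in the same units as the target.
rmse_demo = np.sqrt(mse_demo)

print(mse_demo, rmse_demo)
```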

**SAVE MODEL :)**

In [None]:
# The with block closes the file even if pickling fails.
with open('object_model_1.md', 'wb') as filehandler:
    pickle.dump(model, filehandler)
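
To reuse a saved model later, it can be read back with `pickle.load`. The sketch below shows the round trip. To stay self-contained it uses a plain dictionary as a stand-in for the trained model and a temporary file with a hypothetical name.

```python
import os
import pickle
import tempfile

# Stand-in object (hypothetical); in the notebook this would be the trained model.
stand_in_model = {'name': 'demo', 'params': {'max_depth': 8}}

# Write to a temporary location so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), 'demo_model.pkl')

with open(path, 'wb') as f:
    pickle.dump(stand_in_model, f)

# Read the object back; it should equal what was saved.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == stand_in_model)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.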