# TEAM ID : PTID-CDS-DEC-23-1738

# BATCH : 14 TH AUG BATCH

# PROJECT NAME : PRCP-1023-JohnsHopkinsCovid19


# Problem Statement

## Task 1:- Prepare a complete data analysis report on the given data.

## Task 2:- Fix a period for prediction of confirmed cases/deaths. Create a predictive model to forecast the Covid19 cases based on past cases for a specific country or region.

## Task 3:- Make suggestions to the government health department of the country/region for preparation based on your predictions.



# Domain Expertise
## Domain: Healthcare
## 3 Datasets for Analysis
## 1.time_series_covid19_confirmed_global
- Contains the cumulative count of confirmed cases on each day, along with the country, state, etc.
## 2.time_series_covid19_deaths_global
- Contains the cumulative count of deaths on each day, along with the country, state, etc.
## 3.time_series_covid19_recovered_global
- Contains the cumulative count of recovered patients on each day, along with the country, state, etc.
## This is a daily-updating version of the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). The data is refreshed every day at 6am UTC, just after the raw JHU data typically updates.
## Attributes:
### 1.Province/State
- The province or state within a country, where available.
### 2.Country/Region
- The country or region affected by Covid-19.
### 3.Lat
- Latitude of the country or province.
### 4.Long
- Longitude of the country or province.
### 5. 1/22/20 to 9/21/20
- The daily confirmed, death and recovered counts from 1/22/20 to 9/21/20.



In [1]:
import matplotlib.pyplot as plt
img = plt.imread('Covidp')
plt.imshow(img)
plt.axis('off')
plt.show()

FileNotFoundError: [Errno 2] No such file or directory: 'Covidp'

# Importing Necessary Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
import plotly.express as px
import plotly.graph_objects as go

# Getting the Dataset

In [None]:
confirmed=pd.read_csv('time_series_covid19_confirmed_global.csv')
deaths=pd.read_csv('time_series_covid19_deaths_global.csv')
recovered=pd.read_csv('time_series_covid19_recovered_global.csv')

# Basic Checks

In [None]:
recovered.head()

In [None]:
confirmed.head()

In [None]:
deaths.head()

In [None]:
confirmed.tail()

In [None]:
deaths.tail()

In [None]:
recovered.tail()

In [None]:
confirmed.shape,deaths.shape,recovered.shape 

### Both the confirmed and deaths datasets have 265 rows and 248 columns, whereas the recovered dataset has 253 rows and 248 columns.

In [None]:
confirmed.info()

In [None]:
deaths.info()

In [None]:
recovered.info()

### The datasets contain 3 datatypes - float64, int64 and object.

In [None]:
recovered.describe().T

In [None]:
confirmed.describe().T

In [None]:
deaths.describe().T

# DATA PREPROCESSING

### Melting the 3 datasets: converting all the date columns from 1/22/20 to 9/21/20 into rows under a column named Date, with the values stored as Confirmed, Deaths and Recovered

In [None]:
dates = confirmed.columns[4:]
confirmed_= confirmed.melt(
    id_vars = ['Province/State', 'Country/Region', 'Lat','Long'],
    value_vars = dates,
    var_name ='Date',
    value_name = 'Confirmed'
)
deaths_ = deaths.melt(
        id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
        value_vars=dates, 
        var_name='Date', 
        value_name='Deaths'
)

recovered_ = recovered.melt(
        id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
        value_vars=dates, 
        var_name='Date', 
        value_name='Recovered'
    )
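The wide-to-long reshaping above can be illustrated on a tiny table. This is only a sketch: the two countries and their counts are made up, and `wide` and `long_df` are hypothetical names that do not appear elsewhere in the notebook.

```python
import pandas as pd

# Toy wide table in the same layout as the JHU files (values are made up).
wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["A", "B"],
    "Lat": [10.0, 20.0],
    "Long": [30.0, 40.0],
    "1/22/20": [0, 1],
    "1/23/20": [2, 3],
})

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
date_cols = wide.columns[4:]  # every column after the four identifier columns

# Each date column becomes a set of rows: one row per (input row, date) pair.
long_df = wide.melt(id_vars=id_cols, value_vars=date_cols,
                    var_name="Date", value_name="Confirmed")

print(long_df.shape)
print(long_df[["Country/Region", "Date", "Confirmed"]])
```

With 2 input rows and 2 date columns the result has 4 rows, which is why the melted real data grows to (rows x date columns) records.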

In [None]:
confirmed_

In [None]:
deaths_

In [None]:
recovered_

### Merging the 3 dataset to one single dataset

In [None]:
df = confirmed_.merge(
  right=deaths_, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

df = df.merge(
  right=recovered_, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

In [None]:
df

In [None]:
df.Date=pd.to_datetime(df['Date'])#Changing the Date's datatype into datetime

In [None]:
df

In [None]:
df.describe().T

In [None]:
df.shape#After merging we have 64904 rows and 8 attributes.

In [None]:
df.isnull().sum()

### The Recovered attribute contains 4636 null values.
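A left merge keeps every row of the left table and leaves NaN wherever the right table has no matching key, which is one way such nulls can arise. The sketch below uses made-up data and hypothetical names (`confirmed_long`, `recovered_long`); `indicator=True` is a standard `merge` option that labels unmatched rows.

```python
import pandas as pd

keys = ["Country/Region", "Date"]

confirmed_long = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "B"],
    "Date": ["1/22/20", "1/23/20", "1/22/20", "1/23/20"],
    "Confirmed": [1, 2, 3, 4],
})

# Country B has no rows at all in the toy recovered table.
recovered_long = pd.DataFrame({
    "Country/Region": ["A", "A"],
    "Date": ["1/22/20", "1/23/20"],
    "Recovered": [0, 1],
})

merged = confirmed_long.merge(recovered_long, how="left", on=keys, indicator=True)

# Rows present only in the left table are the ones that end up with NaN.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["Country/Region"].unique())
print(merged["Recovered"].isna().sum())
```

Inspecting the unmatched keys before filling with 0 helps to tell "no recoveries reported" apart from "rows that failed to match".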

In [None]:
df['Recovered'] = df['Recovered'].fillna(0)#Replacing the null values with 0 

In [None]:
df.isnull().sum()

In [None]:
df.dtypes

In [None]:
df['Recovered'] = df['Recovered'].fillna(0).astype(int)#Converting the datatype of Recovered to integer

In [None]:
df.dtypes

In [None]:
df=df.rename(columns={"Country/Region":"Country"})#Renaming the Country/Region attribute to Country

In [None]:
df.info()

In [None]:
df.drop(['Province/State',"Long","Lat"],axis=1,inplace=True)#Dropping the unnecessary attributes

In [None]:
df

In [None]:
df.describe()

## Getting Active cases by subtracting Deaths and Recovered from Confirmed.

In [None]:
df['Active']= df['Confirmed']-df['Deaths']-df['Recovered']

In [None]:
df

In [None]:
# Grouping the values by Date and Country (a list is required to select several columns)
full_grouped = df.groupby(['Date', 'Country'])[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()

In [None]:
# Getting the daily new cases by differencing the cumulative counts.
temp = full_grouped.groupby(['Country', 'Date'])[['Confirmed', 'Deaths', 'Recovered']]
temp = temp.sum().diff().reset_index()

# The first row of each country was differenced against the previous country, so reset it to NaN
mask = temp['Country'] != temp['Country'].shift(1)

temp.loc[mask, 'Confirmed'] = np.nan
temp.loc[mask, 'Deaths'] = np.nan
temp.loc[mask, 'Recovered'] = np.nan

# Renaming the new cases per day as 'New cases', 'New deaths' and 'New recovered'
temp.columns = ['Country', 'Date', 'New cases', 'New deaths', 'New recovered']

# Merging the new columns into full_grouped on Country and Date
full_grouped = pd.merge(full_grouped, temp, on=['Country', 'Date'])

# Replacing all the NaN with 0
full_grouped = full_grouped.fillna(0)

# Fixing data types as int
cols = ['New cases', 'New deaths', 'New recovered']
full_grouped[cols] = full_grouped[cols].astype('int')
# Negative differences (downward corrections in the source data) are set to 0
full_grouped['New cases'] = full_grouped['New cases'].apply(lambda x: 0 if x<0 else x)
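The same idea can be sketched more compactly with a per-group `diff()`, which never differences across two countries and so needs no mask. The data below is made up and `cum` is a hypothetical name; this is an alternative sketch, not the cell above.

```python
import pandas as pd

cum = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"] * 2),
    # Cumulative counts; B's last value is a downward correction.
    "Confirmed": [1, 3, 6, 10, 10, 9],
})
cum = cum.sort_values(["Country", "Date"])

cum["New cases"] = (
    cum.groupby("Country")["Confirmed"]
       .diff()          # difference within each country only
       .fillna(0)       # the first day of a country has no previous value
       .clip(lower=0)   # treat negative corrections as zero new cases
       .astype(int)
)

print(cum)
```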

In [None]:
full_grouped

# EDA (Exploratory Data Analysis)

In [None]:
temp = full_grouped.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
temp = temp.melt(id_vars="Date", value_vars=['Confirmed', 'Deaths', 'Recovered', 'Active'],
                 var_name='Case', value_name='Count')
color_map = {
    'Confirmed': 'blue',
    'Deaths': 'red',
    'Recovered': 'green',
    'Active': 'orange'
}
fig = px.line(temp, x="Date", y="Count", color='Case',
             title='Cases over time',color_discrete_map=color_map)
fig

In [None]:
fig = px.bar(temp, x="Date", y="Count", color='Case',
             title='Cases over time')
fig

# Insight
### Cases began to rise in February 2020 and grew sharply from mid-April. By late September 2020 there were about 31.02 million confirmed cases, about 21.26 million people had recovered, and about 960.7 thousand people had died. The recovery rate was low early in the outbreak and rose over time as people and doctors took more precautions. About 9.02 million cases were still active at the end of September.

In [None]:
world_cases=temp.to_numpy()

In [None]:
# Plotting the daily new cases, deaths and recoveries
temp = full_grouped.groupby('Date')[['New cases', 'New deaths', 'New recovered']].sum().reset_index()
temp = temp.melt(id_vars="Date", value_vars=['New cases', 'New deaths', 'New recovered'],
                 var_name='Case', value_name='Count')
temp.head()

fig = px.line(temp, x="Date", y="Count", color='Case',
             title='Daily Cases')
fig

# Insight
- August shows a high number of daily recoveries and also the highest number of daily deaths.
- The graph shows a sudden spike in daily recoveries in July.

In [None]:
D1 = df.groupby('Country')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()


In [None]:
D1['Active'] = D1['Active'].astype(int)


In [None]:
D1 = D1.sort_values('Confirmed', ascending=False)


In [None]:
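# Collapse every country outside the top 10 into a single 'Rest of the World' row
others = D1.iloc[10:][['Confirmed', 'Deaths', 'Recovered', 'Active']].sum()
others['Country'] = 'Rest of the World'
D1_others = pd.concat([D1.iloc[:10], others.to_frame().T], ignore_index=True)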

In [None]:
fig = px.bar(D1.head(10).sort_values('Confirmed', ascending=True),
             x="Confirmed", y="Country",title='Top 10 Countries with confirmed cases',
             text='Confirmed', orientation='h',
             width=700, height=700)
fig.update_traces(opacity=0.6)
fig

# Insight
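- From the bar graph above, the US ranks first with a summed value of about 544,027,944, and Italy is the lowest of the top 10 countries. Note that `D1` sums the cumulative daily counts over every date, so these totals are far larger than the actual number of confirmed cases and are meaningful for ranking only.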
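Because the `Confirmed` column is cumulative, summing it over all dates adds every day's running total together. A sketch of the difference between that sum and a latest-date snapshot, on made-up data with the hypothetical names `toy`, `summed` and `latest`:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-09-19", "2020-09-20", "2020-09-21"] * 2),
    "Confirmed": [100, 110, 120, 50, 50, 55],  # cumulative counts
})

# Summing over all dates adds the running totals together.
summed = toy.groupby("Country")["Confirmed"].sum()

# Taking only the last date gives the total to date per country.
last_day = toy[toy["Date"] == toy["Date"].max()]
latest = last_day.groupby("Country")["Confirmed"].sum()

print(summed.to_dict())
print(latest.to_dict())
```

The ranking of countries is often similar under both approaches, but only the snapshot can be read as a case count.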

In [None]:
fig = px.bar(D1.sort_values('Deaths', ascending=False).head(10).sort_values('Deaths', ascending=True),
             x="Deaths", y="Country", title=' Top 10 Countries with Death Cases', text='Deaths', orientation='h',
             width=700, height=700)
fig.update_traces(opacity=0.6)
fig

# Insight
- Italy is near the bottom of the top 10 by confirmed cases but ranks 4th by deaths, whereas Peru has comparatively few deaths.


In [None]:
fig = px.bar(D1.sort_values('Recovered', ascending=False).head(10).sort_values('Recovered', ascending=True),
             x="Recovered", y="Country", title='Top 10 Countries with Recovered Cases', text='Recovered', orientation='h',
             width=700, height=700)
fig.update_traces(opacity=0.6)
fig

# Insight
- The US has the highest confirmed and death counts but also shows good recovery numbers. Brazil ranks 2nd in confirmed cases and deaths and shows better recovery numbers than the US.

In [None]:
fig = px.bar(D1.sort_values('Active', ascending=False).head(10).sort_values('Active', ascending=True),
             x="Active", y="Country", title='Top 10 Countries with Active Cases', text='Active', orientation='h',
             width=700, height=700)
fig.update_traces(opacity=0.6)
fig

# Insight
- The countries with the highest active case counts are the US, Brazil and India.

In [None]:
import plotly.express as px

# Convert the 'Date' column to string type
df["Date"] = df["Date"].astype(str)

# Create the choropleth map
choro_map = px.choropleth(
    df, 
    locations="Country", 
    locationmode="country names",
    color="Confirmed", 
    hover_name="Country", 
    animation_frame="Date"
)

# Update the layout
choro_map.update_layout(
    title_text='Global Spread of Coronavirus Confirmed cases',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
    )
)

choro_map.show()


# Insight
- The animated map above shows the global spread of confirmed COVID-19 cases over time.
- Each country is shown on the map, and its color intensity corresponds to its number of confirmed cases.
- The animation steps through the 'Date' column, so we can follow the progression of confirmed cases from one date to the next.
- Countries with more cases are shaded towards yellow and countries with fewer cases towards blue.
- From February to March, confirmed cases in the US increase gradually, after which the increase becomes steady.
- From March onwards, confirmed cases in Brazil and India increase gradually.
- Pressing the play button steps through every date automatically up to the last one.

In [None]:
df["Date"] = df["Date"].astype(str)

# Create the choropleth map
choro_map = px.choropleth(
    df, 
    locations="Country", 
    locationmode="country names",
    color="Recovered", 
    hover_name="Country", 
    animation_frame="Date"
)

# Update the layout
choro_map.update_layout(
    title_text='Global Spread of Coronavirus Recovered cases',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
    )
)

# Show the map
choro_map.show()


# Insights
- The animated map above shows the global spread of recovered COVID-19 cases over time.
- Over the period, recoveries in India increased gradually.
- The US leads in recoveries up to the end of June.

In [None]:
df["Date"] = df["Date"].astype(str)

# Create the choropleth map
choro_map = px.choropleth(
    df, 
    locations="Country", 
    locationmode="country names",
    color="Deaths", 
    hover_name="Country", 
    animation_frame="Date"
)

# Update the layout
choro_map.update_layout(
    title_text='Global Spread of Coronavirus Deaths cases',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
    )
)

# Show the map
choro_map.show()


# Insights
- The animated map above shows the global spread of COVID-19 deaths over time.
- After February, deaths increased gradually in the US, Italy, France, Spain, the United Kingdom and Brazil. India showed few deaths until 2020-07-14, and from 2020-07-14 to 2020-09-10 deaths in India increased gradually.
- From the map above, the US shows the highest number of deaths of any country.


In [None]:
df["Date"] = df["Date"].astype(str)

# Create the choropleth map
choro_map = px.choropleth(
    df, 
    locations="Country", 
    locationmode="country names",
    color="Active", 
    hover_name="Country", 
    animation_frame="Date"
)

# Update the layout
choro_map.update_layout(
    title_text='Global Spread of Coronavirus Active cases',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
    )
)

# Show the map
choro_map.show()


# Insights
- The animated map above shows the global spread of active COVID-19 cases over time.
- At the beginning of March, Italy shows the most active cases; after some time its active cases gradually decrease.
- From 2020-03-10 to 2020-03-22, active cases in the US increase gradually, after which the increase becomes steady.

# FEATURE ENGINEERING

In [None]:
full_grouped.corr(numeric_only=True)  # skip the non-numeric Date and Country columns


In [None]:
plt.figure(figsize=(10,5))  # Checking correlation
sns.heatmap(full_grouped.corr(numeric_only=True),annot=True,linecolor='pink',cmap='CMRmap')

# Taking Country India for Predictive Modelling 

In [None]:
India_confirmed=confirmed[confirmed['Country/Region']=="India"].T
India_deaths=deaths[deaths['Country/Region']=="India"].T
India_recovered=recovered[recovered['Country/Region']=="India"].T
print(India_confirmed)
print(India_deaths)
print(India_recovered)

In [None]:
India_join=India_confirmed.join(India_deaths,how='left',lsuffix='_confirmed',rsuffix='_deaths')
India=India_join.join(India_recovered,how='left',lsuffix='_',rsuffix='_recovered')


In [None]:
India

In [None]:
India=India.rename(columns={"143_confirmed":"confirmed","143_deaths":"deaths",130:"recovered"})

In [None]:
India

In [None]:
# Excluding the first four rows (Province/State, Country/Region, Lat, Long), which are not daily counts
India=India[4:]

In [None]:
India

In [None]:
India.index=pd.to_datetime(India.index)# Converting the Index into datetime format


In [None]:
India

In [None]:
split_point=int(0.8*len(India))#Taking the split point 
print(split_point)
train=India[0:split_point]#Training data
test=India[split_point:]#Testing data
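For a time series the split is chronological: the first 80% of dates train the model and the last 20% test it, with no shuffling. A sketch of the arithmetic on a daily index covering 1/22/20 to 9/21/20; the values are placeholders and `series`, `train_part` and `test_part` are hypothetical names.

```python
import pandas as pd

# Daily index spanning the same period as the date columns.
idx = pd.date_range("2020-01-22", "2020-09-21")
series = pd.Series(range(len(idx)), index=idx)  # placeholder values

split_point = int(0.8 * len(series))
train_part = series.iloc[:split_point]   # earliest 80% of days
test_part = series.iloc[split_point:]    # most recent 20% of days

print(len(series), split_point, len(test_part))
print(train_part.index.max(), test_part.index.min())
```

The size of the test part is the number of steps any forecast has to cover in order to be compared with the test data.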

In [None]:
train

In [None]:
test

In [None]:
#Taking the confirmed cases for further processes
#Graphical Representation of Confirmed Cases
plt.figure(num=None,figsize=(8,6),dpi=80)
plt.plot(train.confirmed,label='confirmed Cases Train Data')
plt.plot(test.confirmed,label='confirmed Cases Test Data')
plt.legend()
plt.title("Covid-19 confirmed Cases in India")
plt.show()

In [None]:
# Resampling the data to weekly interval
resample=train.resample('7D')
weekly=resample.sum()
print(weekly)

In [None]:
# Resampling the data to monthly interval
resample=train.resample('1M')
monthly=resample.sum()
print(monthly)

In [None]:
# Decompose the monthly time series into its components using the statsmodels library.
from statsmodels.tsa.seasonal import seasonal_decompose

# The model name is passed in lowercase, as listed in the statsmodels documentation
result = seasonal_decompose(monthly.confirmed, period=1, model="multiplicative")
result.plot()
plt.show();

In [None]:
# Convert the dataframe columns to numeric dtypes to eliminate any other datatypes
train.confirmed=pd.to_numeric(train.confirmed)
train.deaths=pd.to_numeric(train.deaths)
train.recovered=pd.to_numeric(train.recovered)
test.confirmed=pd.to_numeric(test.confirmed)
test.deaths=pd.to_numeric(test.deaths)  # was assigned to train.deaths by mistake
test.recovered=pd.to_numeric(test.recovered)

## Checking the Stationarity

In [None]:
plt.figure(figsize=(10,5))
sns.lineplot(data=India,x=India.index,y=India.confirmed)

In [None]:
India['rollmean']=India.confirmed.rolling(window=12).mean()

In [None]:
India['rollStd']=India.confirmed.rolling(window=12).std()

In [None]:
plt.figure(figsize=(10,5))
sns.lineplot(data=India,x=India.index,y=India.confirmed,label='Confirmed')
sns.lineplot(data=India,x=India.index,y=India.rollmean,label='Rollmean')
sns.lineplot(data=India,x=India.index,y=India.rollStd,label='Rollstd')

In [None]:
# The rolling mean and standard deviation are not constant over time, so the series is not stationary

# ADF TEST

In [None]:
from statsmodels.tsa.stattools import adfuller

In [None]:
adf=adfuller(India['confirmed'],autolag='AIC')

In [None]:
adf

In [None]:
adf[0:4]

In [None]:
stats=pd.Series(adf[0:4],index=['Test Statistic','p-value',"#lags used",'number of observation used'])
stats

In [None]:
for key,values in adf[4].items():
    print("criticality",key,":",values)

In [None]:
# Condition for stationarity: test statistic < critical value and p-value < 0.05
# Here the test statistic (-2.306830) is greater than the critical values,
# so the null hypothesis of a unit root cannot be rejected.
# Therefore the series is non-stationary.
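The decision rule in the comments above can be written as a small helper. This is a sketch only: `is_stationary` is a hypothetical function, -2.306830 is the statistic quoted above, and the p-value and critical values below are illustrative placeholders rather than output from this notebook.

```python
def is_stationary(test_statistic, p_value, critical_values, level="5%", alpha=0.05):
    """Return True when both conditions of the ADF decision rule hold."""
    below_critical = test_statistic < critical_values[level]
    significant = p_value < alpha
    return below_critical and significant


# Placeholder critical values in the dict layout that adfuller returns (keys '1%', '5%', '10%').
critical_values = {"1%": -3.46, "5%": -2.87, "10%": -2.57}

# Statistic quoted above, with a placeholder p-value.
print(is_stationary(-2.306830, 0.17, critical_values))

# A clearly stationary case for contrast (made-up numbers).
print(is_stationary(-5.0, 0.001, critical_values))
```

In practice the function would receive `adf[0]`, `adf[1]` and `adf[4]` from the `adfuller` result.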

In [None]:
# Ways to make the series stationary:
# 1. Shift (differencing)
# 2. Log
# 3. Square root, cube, ...


In [None]:
def test_stationarity(dataframe,var):
    # Rolling statistics over a 12-observation window
    dataframe['rollmean']=dataframe[var].rolling(window=12).mean()
    dataframe['rollStd']=dataframe[var].rolling(window=12).std()

    # Augmented Dickey-Fuller test
    from statsmodels.tsa.stattools import adfuller
    adf=adfuller(dataframe[var],autolag='AIC')
    stats=pd.Series(adf[0:4],index=['Test Statistic','p-value',"#lags used",'number of observation used'])
    print(stats)

    # Critical values at each significance level
    for key,values in adf[4].items():
        print("criticality",key,":",values)


## Method 1- Shift Method

In [None]:
dt=India[['confirmed']].copy()  # copy so that the new columns do not touch India
dt['shift']=dt.confirmed.shift()
dt.head()

In [None]:
dt['shiftdiff']=dt.confirmed-dt['shift']

In [None]:
dt


In [None]:
test_stationarity(dt.dropna(),'shiftdiff')

#### Here the test statistic is less than the critical value, which satisfies the condition.
#### Therefore we can say that the differenced data is stationary.
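The shift-and-subtract step is first-order differencing, which pandas also exposes as `Series.diff()`. A short sketch on made-up numbers showing that the two forms agree:

```python
import pandas as pd

s = pd.Series([1, 3, 6, 10, 15], name="confirmed")  # made-up cumulative counts

shift_diff = s - s.shift()   # shift-and-subtract form
first_diff = s.diff()        # pandas shortcut for the same operation

print(first_diff.tolist())
print(shift_diff.equals(first_diff))
```

One round of differencing corresponds to d=1 in an ARIMA order (p, d, q).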

# MODEL CREATION

# MODEL 1 :- ARIMA-TESTING


In [None]:
#Finding the Partial Autocorrelation and Autocorrelation
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf

In [None]:
plot_pacf(India['confirmed'],lags=12);#Plotting the PACF


In [None]:
plot_acf(India['confirmed'],lags=12); #Plotting the ACF

In [None]:
# Orders read from the plots: p=1, d=1, q=1 or 2

In [None]:
from statsmodels.tsa.arima.model import ARIMA

# Define and fit the ARIMA model
model = ARIMA(train['confirmed'], order=(1, 1, 2))
result = model.fit()


In [None]:
# Generating the predictions using the fitted model
predict = result.predict(start=test.index[0], end=test.index[-1], dynamic=False)


In [None]:
predict

In [None]:
test['arima_pred']=predict

In [None]:
test

In [None]:
sns.lineplot(data=India,x=India.index,y='confirmed',label='Confirmed')
sns.lineplot(data=test,x=test.index,y='arima_pred',label='Arima_model')
plt.xticks(rotation=45)
# Show the plot
plt.show()

# MODEL EVALUATION

In [None]:
from sklearn.metrics import mean_squared_error


In [None]:
m=np.sqrt(mean_squared_error(test['confirmed'],predict))  # square root of the MSE, i.e. the RMSE
print("Root Mean Squared Error:", m)

### The root mean squared error (RMSE) for the ARIMA model is 538135.4134552339.
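Taking the square root of the mean squared error gives the root mean squared error (RMSE), which is in the same units as the data (cases), while the MSE is in squared units. A worked sketch with made-up numbers:

```python
import numpy as np

actual = np.array([100.0, 110.0, 120.0])     # made-up observed values
predicted = np.array([90.0, 115.0, 130.0])   # made-up forecasts

errors = actual - predicted      # 10, -5, -10
mse = np.mean(errors ** 2)       # (100 + 25 + 100) / 3 = 75
rmse = np.sqrt(mse)              # square root of 75, about 8.66

print(mse, rmse)
```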

# MODEL 2 :- SARIMAX

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

In [None]:
India

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX
import numpy as np


#Model parameters
order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

try:
    # Define and fit the SARIMAX model
    model = SARIMAX(train['confirmed'], order=order, seasonal_order=seasonal_order)
    result = model.fit()
    prediction = result.predict(start=test.index[0], end=test.index[-1])

except np.linalg.LinAlgError:
    print("Singular matrix encountered. Adjusting model parameters or data preprocessing may help.")


In [None]:
prediction

In [None]:
test['sarima_pred']=prediction

In [None]:
sns.lineplot(data=India,x=India.index,y='confirmed',label='Confirmed')
sns.lineplot(data=test,x=test.index,y='sarima_pred',label='Sarimax_pred')
plt.xticks(rotation=45)

# Show the plot
plt.show()

# MODEL EVALUATION

In [None]:
np.sqrt(mean_squared_error(test['confirmed'],prediction))

### The root mean squared error (RMSE) for the SARIMAX model is 112996.42785296212.


# MODEL 3 :- HOLT-WINTER

In [None]:
# Add 1 so that no value is zero; multiplicative trend and seasonality need strictly positive data
constant=1
train['Conf_']=train['confirmed']+constant



In [None]:
train

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
hwmodel=ExponentialSmoothing(train.Conf_,trend="mul",seasonal="mul",seasonal_periods=30,damped_trend=True).fit()

In [None]:
test_pred=hwmodel.forecast(len(test))  # one forecast per test observation

In [None]:
test_pred

In [None]:
test['test_pred']=test_pred

In [None]:
train['Conf_'].plot(legend=True,label='Train',figsize=(10,6))
test['confirmed'].plot(legend=True,label='Test')
test_pred.plot(legend=True,label='ExponentialSmoothing_pred')

# MODEL EVALUATION

In [None]:
np.sqrt(mean_squared_error(test['confirmed'],test_pred))#evaluation

### The root mean squared error (RMSE) for the Exponential Smoothing model is 187485.6519744362.

## Comparing the 3 models, SARIMAX has the lowest error and performs much better than the other models, so we use SARIMAX for future prediction.
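The selection step can be made explicit by collecting the three error values reported above and picking the smallest. The dictionary name `rmse_by_model` is hypothetical; the numbers are the ones quoted in the evaluation sections.

```python
# Error values reported in the three MODEL EVALUATION sections above.
rmse_by_model = {
    "ARIMA": 538135.4134552339,
    "SARIMAX": 112996.42785296212,
    "Holt-Winters": 187485.6519744362,
}

# Lower error is better, so pick the key with the smallest value.
best_model = min(rmse_by_model, key=rmse_by_model.get)

for name, value in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: {value:,.0f}")
print("Best:", best_model)
```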

# FUTURE PREDICTION

In [None]:
import pandas as pd

# Generate a DataFrame with dates as index
f = pd.DataFrame(index=pd.date_range(start="2020-09-21", end="2020-12-21"))

# Check the first few rows of the DataFrame
print(f.head())


In [None]:
f#Creating a new dataframe for future prediction
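The length of this date index is the forecast horizon in days. A quick sketch of checking it, using the same start and end dates; `future_index` and `horizon` are hypothetical names.

```python
import pandas as pd

# Same start and end dates as the future dataframe above (daily frequency by default).
future_index = pd.date_range(start="2020-09-21", end="2020-12-21")
horizon = len(future_index)

print(future_index[0].date(), future_index[-1].date(), horizon)
```

Both endpoints are included, so the range covers 92 daily steps.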

In [None]:
result.predict(start=f.index[0],end=f.index[-1])#Predicting the values for the new dataframe

In [None]:
result.predict(start=f.index[0],end=f.index[-1]).plot()#Plotting the value

In [None]:
sns.lineplot(data=India,x=India.index,y='confirmed',label='Confirmed')
sns.lineplot(data=test,x=test.index,y='sarima_pred',label='Sarima_model')
a=result.predict(start=f.index[0],end=f.index[-1])
a.plot(label="Future_values")
plt.legend()

# CONCLUSION
- The JohnsHopkinsCovid19 project consists of 3 datasets: 'time_series_covid19_confirmed_global', 'time_series_covid19_deaths_global' and 'time_series_covid19_recovered_global'.
- Each dataset has 248 columns: the first 4 are Province/State, Country/Region, Lat and Long, and the rest are dates, so we converted each wide dataset to long format using the melt function.
- After melting the 3 datasets, we merged them into one single dataset.
- The EDA shows that the US, Brazil and India are the countries most affected by the Covid19 virus.
- We tried 3 models for forecasting: ARIMA, SARIMAX and Exponential Smoothing.
- After evaluating these time series models, SARIMAX showed the best performance based on root mean squared error (RMSE). We therefore selected the SARIMAX model to predict confirmed cases for the period "2020-09-21" to "2020-12-21".


# SUGGESTION
#### Based on our analysis of India, these are our suggestions:
- Improve testing capabilities:- Make testing more accessible and increase capacity so that affected people can be detected and isolated quickly.
- Improve Healthcare System:- Strengthen the healthcare system by making sure there are enough hospital beds, medical supplies, and equipment to handle any spikes in COVID-19 cases.
- Affordable medical support:- Because most medical facilities are expensive, many people avoid them, which can threaten their lives.
- Proper awareness programs in rural areas:- As the population in rural areas is comparatively larger, proper awareness programs are necessary.

# RISKS
- Since there were 3 datasets, merging them was difficult.
- Finding the best time series model was also challenging.
