# Example of prediction using the Predictor class

In [1]:
from predictor import Predictor

The Predictor class takes three arguments:
* $\texttt{state}$: the abbreviation of the state for which the predictions are made.
* $\texttt{path}$: the path to the COVIDMINDER folder. The program reads the file COVIDMINDER/data/csv/time_series/covid_TS_counties_long.cases.csv.
* $\texttt{col}$: the column to predict (for example p_cases, p_diff, cases, deaths, etc.).

In [2]:
state = 'NY'     # state abbreviation
path = '../'     # path to the COVIDMINDER folder
col = 'p_cases'  # column to predict
Pred = Predictor(state=state, path=path, col=col)

Once the Predictor is built, it needs to build the data it will train on. The method $\texttt{Build_Data}$ does this and returns the DataFrame on which the model will be trained. This DataFrame is indexed by date and has one column per county; each cell holds the value of the selected column for that county on that date.

In [3]:
Data=Pred.Build_Data()
display(Data)

date,Albany,Allegany,Bronx,Broome,Cattaraugus,Cayuga,Chautauqua,Chemung,Chenango,Clinton,...,Sullivan,Tioga,Tompkins,Ulster,Warren,Washington,Wayne,Westchester,Wyoming,Yates
2020-03-04T00:00:00Z,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.033585,0.000000,0.000000
2020-03-05T00:00:00Z,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.860454,0.000000,0.000000
2020-03-06T00:00:00Z,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.963812,0.000000,0.000000
2020-03-07T00:00:00Z,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,5.891436,0.000000,0.000000
2020-03-08T00:00:00Z,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,8.578758,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-07-13T00:00:00Z,728.299935,143.194984,3411.138148,434.148083,181.299841,164.542415,134.748588,177.338957,357.997755,135.428962,...,1942.146569,334.004108,182.031709,1048.582836,437.883148,406.836154,232.433995,3651.346865,255.902055,192.670493
2020-07-14T00:00:00Z,735.501103,147.534226,3415.086796,441.497627,182.613608,165.848308,141.840618,178.537193,362.234414,136.671429,...,1943.472266,334.004108,182.031709,1054.214323,437.883148,408.470035,237.994617,3655.377848,258.410898,196.684462
2020-07-15T00:00:00Z,746.302855,149.703847,3419.529025,447.272269,185.241142,167.154200,146.568639,180.933666,370.707734,137.913897,...,1944.797964,338.153227,193.775690,1063.224702,444.138621,408.470035,245.779488,3658.271887,258.410898,200.698431
2020-07-16T00:00:00Z,749.576113,151.873468,3425.099439,451.996976,186.554909,167.154200,150.508656,180.933666,377.062724,137.913897,...,1947.449358,338.153227,193.775690,1066.603594,445.702490,408.470035,248.003737,3661.165926,258.410898,200.698431
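
The wide layout returned by $\texttt{Build_Data}$ (dates as index, one column per county) can be illustrated with a pandas pivot. This is only a sketch of the reshaping idea, not the actual implementation of the method: the helper `build_wide`, the column names `state`, `County` and `date`, and the sample rows are all assumptions made for illustration.

```python
import io

import pandas as pd

# Made-up rows in a long layout (one row per county and date).
# The column names "state", "County" and "date" are assumptions;
# the real CSV file may use different names.
csv_text = """state,County,date,p_cases
NY,Albany,2020-03-04,0.0
NY,Albany,2020-03-05,1.5
NY,Bronx,2020-03-04,2.0
NY,Bronx,2020-03-05,3.5
NJ,Bergen,2020-03-04,4.0
"""
long_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])


def build_wide(long_df, state, col):
    """Hypothetical helper: keep one state, then pivot to date x county."""
    subset = long_df[long_df["state"] == state]
    wide = subset.pivot(index="date", columns="County", values=col)
    return wide.sort_index()


wide = build_wide(long_df, state="NY", col="p_cases")
print(wide)
```

Each county then becomes a separate time series (one column), which is a convenient shape for fitting one model per county.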


Once this is done, the best parameters can be found using $\texttt{GridSearch(n_days)}$. This evaluates the Mean Absolute Error of each parameter combination over $\texttt{n_days}$ days, then prints the best combination and its score. This step takes a while.

In [None]:
Pred.GridSearch(n_days=7)

Training failed for parameters : (2, 0, 3)
Training failed for parameters : (2, 0, 4)
Training failed for parameters : (3, 0, 1)
Training failed for parameters : (3, 1, 2)
Training failed for parameters : (3, 0, 3)
Training failed for parameters : (3, 0, 4)
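
The general shape of such a search can be sketched as follows. This is a minimal illustration, not the code of $\texttt{GridSearch}$: it assumes the score is the Mean Absolute Error on the last `n_days` observations held out from training, and it skips combinations whose fit raises an error, which would produce messages like the ones above. The names `grid_search`, `forecast_fn` and `toy_forecast` are hypothetical, and `toy_forecast` is a deliberately simple stand-in, not an ARIMA model.

```python
from itertools import product


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(series, forecast_fn, n_days, p_values, d_values, q_values):
    """Return (best_order, best_mae) over all (p, d, q) combinations.

    The last n_days points are held out and used only for scoring.
    """
    train, test = series[:-n_days], series[-n_days:]
    best_order, best_mae = None, float("inf")
    for order in product(p_values, d_values, q_values):
        try:
            predicted = forecast_fn(train, order, n_days)
        except Exception:
            # A failed fit is reported and skipped, not treated as fatal.
            print(f"Training failed for parameters : {order}")
            continue
        mae = mean_absolute_error(test, predicted)
        if mae < best_mae:
            best_order, best_mae = order, mae
    return best_order, best_mae


def toy_forecast(train, order, n_days):
    """Stand-in forecaster (NOT ARIMA), used only to make the sketch run.

    d=0 repeats the last value, d=1 extends the last difference,
    any other d raises; p and q are ignored.
    """
    _, d, _ = order
    if d == 0:
        return [train[-1]] * n_days
    if d == 1:
        step = train[-1] - train[-2]
        return [train[-1] + step * (i + 1) for i in range(n_days)]
    raise ValueError("unsupported differencing order")


series = [float(2 * t) for t in range(30)]  # made-up linear series
best_order, best_mae = grid_search(
    series, toy_forecast, n_days=7,
    p_values=[1, 2], d_values=[0, 1, 2], q_values=[0],
)
print(best_order, best_mae)
```

In a real setting, `forecast_fn` would wrap the fit and forecast of an actual ARIMA model; keeping it as a parameter lets the search loop stay independent of the modelling library.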


The prediction can then be made using $\texttt{Prediction(n_days,params)}$, where n_days is the number of out-of-sample days to predict. The params argument can be either the $(p,d,q)$ parameters of the ARIMA model or 'Best', which uses the parameters found by the grid search. By default, the best parameters are used. The output has 5 columns: the usual date and county, as well as:
* $\texttt{mean}$: the actual prediction.
* $\texttt{mean_ci_upper}$ and $\texttt{mean_ci_lower}$: respectively the upper and lower bounds of the confidence interval.

The output can then be saved using the $\texttt{save_pred(pathsave)}$ method.

In [None]:
output=Pred.Prediction(n_days=7)
display(output)
Pred.save_pred('Prediction_NY.csv')