# Step 3: Testing different predictive algorithms

This chapter presents the machine learning algorithms that were tested on the data selected in the previous step.

## **First explorations of the Automated Time Series Forecasting (AutoTS) tool**

AutoTS searches more than 20 predefined models for the one best suited to the dataset, taking its seasonality and trend into account. It also cleans the data and removes outliers.

Source: https://pypi.org/project/AutoTS/

Source: https://analyticsindiamag.com/hands-on-guide-to-autots-effective-model-selection-for-multiple-time-series/

Source: https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html

```{figure} /fig/autots.png
:name: Modelos

[source](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html)
```
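
The idea of an automated model search can be illustrated with a minimal sketch. It is not AutoTS's actual procedure: the two candidate forecasters, the toy series, and the helper names (`last_value_forecast`, `mean_forecast`, `pick_best_model`) are all assumptions made for illustration. Each candidate is scored on a held-out tail of the series and the one with the lowest mean absolute error is kept.

```python
import numpy as np

def last_value_forecast(history, horizon):
    """Repeat the last observed value over the forecast horizon."""
    return np.full(horizon, history[-1], dtype=float)

def mean_forecast(history, horizon):
    """Repeat the historical mean over the forecast horizon."""
    return np.full(horizon, np.mean(history), dtype=float)

def pick_best_model(series, horizon, candidates):
    """Hold out the last `horizon` points and return the candidate with the lowest MAE."""
    train, test = series[:-horizon], series[-horizon:]
    errors = {
        name: float(np.mean(np.abs(test - func(train, horizon))))
        for name, func in candidates.items()
    }
    best = min(errors, key=errors.get)
    return best, errors

# Toy series with a steady upward trend (made-up values)
toy_series = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0])
candidates = {"last_value": last_value_forecast, "mean": mean_forecast}
best_name, mae_by_model = pick_best_model(toy_series, horizon=3, candidates=candidates)
print(best_name, mae_by_model)
```

AutoTS additionally searches over data transformations and model parameters across several generations, which this sketch leaves out.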


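In [16]:
import pickle

# Load the DataFrame saved in the previous step
with open("datasets/dados.pickle", "rb") as file_to_read:
    dados = pickle.load(file_to_read)

# Work on a copy and move the datetime index into a regular column named "index"
dados_autoML = dados.copy()
dados_autoML.reset_index(level=0, inplace=True)
dados_autoML.info()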

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   index                   84 non-null     datetime64[ns]
 1   year                    84 non-null     int64         
 2   month                   84 non-null     int64         
 3   interruptores           84 non-null     int64         
 4   pulsadores              84 non-null     int64         
 5   tomadas                 84 non-null     int64         
 6   total                   84 non-null     int64         
 7   day                     84 non-null     int64         
 8   IBC-Br                  84 non-null     float64       
 9   Taxa de Desocupação     84 non-null     float64       
 10  Imobiliário             84 non-null     float64       
 11  Comercial               84 non-null     float64       
 12  Residencial             84 non-null     float64     
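
The `index` column in the listing above comes from `reset_index`: when the index has no name, pandas stores it in a column called `index`. A small sketch on made-up data (the `toy` frame is an assumption, not the project's dataset) shows the effect:

```python
import pandas as pd

# Toy monthly frame with an unnamed DatetimeIndex (values are made up)
toy = pd.DataFrame(
    {"total": [100, 120, 90]},
    index=pd.date_range("2014-01-01", periods=3, freq="MS"),
)

# The unnamed index becomes a regular column called "index",
# and the frame gets a fresh RangeIndex
flat = toy.reset_index()
print(flat.columns.tolist())
print(flat.dtypes)
```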

In [17]:
# Rename the former index column to 'Data' and save a CSV copy of the dataset
dados_autoML.rename(columns={'index': 'Data'}, inplace=True)
dados_autoML.to_csv('datasets/dados_autoML.csv', index=True)
dados_autoML

| | Data | year | month | interruptores | pulsadores | tomadas | total | day | IBC-Br | Taxa de Desocupação | ... | PIB | Variacao_PIB | CONSUMO | ICST-A | ICST-R | geracaoGWh | IndEletronica | IndEletrica | IndEletronica+Eletrica | IndGeral |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-01 | 2014 | 1 | 5109625 | 149613 | 8168659 | 13427897 | 1 | 2.75 | 6.4 | ... | 455935.0 | -3.720284 | 4.027976e+07 | 99.6 | 97.8 | 50045.962407 | 47.4 | 45.3 | 46.308 | 53.1 |
| 1 | 2014-02-01 | 2014 | 2 | 5235998 | 141128 | 8774994 | 14152120 | 1 | 3.13 | 6.7 | ... | 450358.8 | -1.223025 | 4.165349e+07 | 98.4 | 96.7 | 46459.730274 | 48.8 | 49.1 | 48.956 | 52.3 |
| 2 | 2014-03-01 | 2014 | 3 | 5100680 | 144853 | 8465237 | 13710770 | 1 | 3.09 | 7.2 | ... | 462159.8 | 2.620355 | 4.026943e+07 | 97.1 | 96.3 | 48004.277170 | 48.9 | 47.6 | 48.224 | 52.4 |
| 3 | 2014-04-01 | 2014 | 4 | 4695875 | 132094 | 7774926 | 12602895 | 1 | 2.37 | 7.1 | ... | 468767.5 | 1.429744 | 3.959185e+07 | 92.9 | 92.6 | 45412.583753 | 45.8 | 44.2 | 44.968 | 49.2 |
| 4 | 2014-05-01 | 2014 | 5 | 5563254 | 156946 | 8883242 | 14603442 | 1 | 2.23 | 7.0 | ... | 473347.1 | 0.976945 | 3.910050e+07 | 95.2 | 94.6 | 44890.282526 | 42.6 | 40.1 | 41.300 | 47.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 79 | 2020-08-01 | 2020 | 8 | 11713475 | 210921 | 22059317 | 33983713 | 1 | -3.35 | 14.4 | ... | 628818.8 | -0.325028 | 3.912152e+07 | 87.5 | 87.8 | 45954.347797 | 58.1 | 58.2 | 58.152 | 57.0 |
| 80 | 2020-09-01 | 2020 | 9 | 10635364 | 229234 | 21341217 | 32205815 | 1 | -3.56 | 14.6 | ... | 632047.2 | 0.513407 | 4.020856e+07 | 90.5 | 91.5 | 46045.040560 | 63.0 | 63.1 | 63.052 | 61.6 |
| 81 | 2020-10-01 | 2020 | 10 | 11688385 | 213752 | 21847140 | 33749277 | 1 | -3.94 | 14.3 | ... | 660199.5 | 4.454145 | 4.245064e+07 | 93.7 | 95.2 | 46098.056273 | 56.7 | 62.8 | 59.872 | 61.8 |
| 82 | 2020-11-01 | 2020 | 11 | 11337980 | 237001 | 22187188 | 33762169 | 1 | -4.06 | 14.1 | ... | 666928.9 | 1.019298 | 4.100828e+07 | 92.2 | 93.8 | 45883.704482 | 61.1 | 64.6 | 62.920 | 62.9 |


In [18]:
dados_autoML.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Data                    84 non-null     datetime64[ns]
 1   year                    84 non-null     int64         
 2   month                   84 non-null     int64         
 3   interruptores           84 non-null     int64         
 4   pulsadores              84 non-null     int64         
 5   tomadas                 84 non-null     int64         
 6   total                   84 non-null     int64         
 7   day                     84 non-null     int64         
 8   IBC-Br                  84 non-null     float64       
 9   Taxa de Desocupação     84 non-null     float64       
 10  Imobiliário             84 non-null     float64       
 11  Comercial               84 non-null     float64       
 12  Residencial             84 non-null     float64     

In [19]:
from autots import AutoTS

long = True  # the data is passed with an explicit date column and value column

model = AutoTS(
    forecast_length=3,
    frequency='infer',
    prediction_interval=0.9,
    ensemble=None,
    model_list="superfast",  # this list of candidate models can be reduced
    transformer_list="fast",
    max_generations=10,  # original 5
    num_validations=4,  # original 2
    validation_method="backwards"
)
# The series to forecast is the 'total' column.
model = model.fit(
    dados_autoML,
    date_col='Data' if long else None,
    value_col='total' if long else None,
    id_col=None,  # single series: there is no ID column
)

prediction = model.predict()

 SeasonalNaive
Model Number: 45 with model ZeroesNaive in generation 0 of 10
Model Number: 46 with model LastValueNaive in generation 0 of 10
Template Eval Error: ValueError('Unable to coerce to Series, length must be 1: given 81') in model 46: LastValueNaive
Model Number: 47 with model LastValueNaive in generation 0 of 10
Model Number: 48 with model ZeroesNaive in generation 0 of 10
Model Number: 49 with model SeasonalNaive in generation 0 of 10
Template Eval Error: ValueError('Unable to coerce to Series, length must be 1: given 81') in model 49: SeasonalNaive
Model Number: 50 with model SeasonalNaive in generation 0 of 10
Template Eval Error: ValueError('Unable to coerce to Series, length must be 1: given 81') in model 50: SeasonalNaive
Model Number: 51 with model GLS in generation 0 of 10
Model Number: 52 with model GLS in generation 0 of 10
Model Number: 53 with model AverageValueNaive in generation 0 of 10
Template Eval Error: ValueError('Unable to coerce to Series, length must be
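
The `date_col`, `value_col` and `id_col` arguments passed to `fit` describe a "long" table, with one row per date and series. The sketch below uses only pandas, on made-up values, to show how a wide frame (one column per series) relates to that long layout. The `series_id` and `value` column names are assumptions chosen for illustration, not names required by AutoTS.

```python
import pandas as pd

# Wide layout: one column per series, dates in the index (made-up values)
wide = pd.DataFrame(
    {"interruptores": [5, 6], "tomadas": [8, 9]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01"]),
)
wide.index.name = "Data"

# Long layout: one row per (date, series) pair
long_df = wide.reset_index().melt(
    id_vars="Data", var_name="series_id", value_name="value"
)

# Pivoting the long layout recovers the wide one
back_to_wide = long_df.pivot(index="Data", columns="series_id", values="value")
print(long_df)
print(back_to_wide)
```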

In [21]:
# Print the details of the best model
print(model)

Initiated AutoTS object with best model: 
SeasonalNaive
{"fillna": "ffill", "transformations": {"0": "RobustScaler", "1": "StandardScaler", "2": "DifferencedTransformer", "3": "bkfilter"}, "transformation_params": {"0": {}, "1": {}, "2": {}, "3": {}}}
{"method": "LastValue", "lag_1": 420, "lag_2": "None"}
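
For intuition about the selected model, a seasonal naive forecast with a "last value" rule can be sketched as follows. This is a simplified illustration, not AutoTS's `SeasonalNaive` implementation: the function name, the toy series and the handling of `lag` are assumptions, and the transformations listed above are ignored.

```python
import numpy as np

def seasonal_naive_last_value(history, horizon, lag):
    """Forecast each future step with the value seen `lag` steps earlier."""
    extended = [float(v) for v in history]
    if lag < 1 or lag > len(extended):
        raise ValueError("lag must be between 1 and the history length")
    forecast = []
    for _ in range(horizon):
        value = extended[-lag]
        forecast.append(value)
        # Append the forecast so horizons longer than `lag` keep cycling
        extended.append(value)
    return np.array(forecast)

# Two "seasons" of length 4 (made-up values)
toy_history = [1, 2, 3, 4, 5, 6, 7, 8]
toy_forecast = seasonal_naive_last_value(toy_history, horizon=3, lag=4)
print(toy_forecast)
```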


In [22]:
# point forecasts dataframe
forecasts_df = prediction.forecast
# upper and lower forecasts
forecasts_up, forecasts_low = prediction.upper_forecast, prediction.lower_forecast
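
The upper and lower forecasts bound the interval requested with `prediction_interval=0.9`. One simple way to check such an interval against observed values is the share of observations that fall inside it. The sketch below uses made-up numbers and a hypothetical `containment` helper; AutoTS's own `containment` metric may be computed differently.

```python
import numpy as np

def containment(actual, lower, upper):
    """Fraction of actual values lying inside [lower, upper]."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())

# Made-up observations and interval bounds
toy_actual = [10, 12, 15, 20]
toy_lower = [9, 11, 16, 18]
toy_upper = [11, 13, 17, 22]
toy_containment = containment(toy_actual, toy_lower, toy_upper)
print(toy_containment)
```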

In [None]:
# accuracy of all tried model results
model_results = model.results()
model_results

| | ID | Model | ModelParameters | TransformationParameters | TransformationRuntime | FitRuntime | PredictRuntime | TotalRuntime | Ensemble | Exceptions | Runs | ValidationRound | smape | mae | rmse | containment | spl | contour | smape_weighted | mae_weighted | rmse_weighted | containment_weighted | spl_weighted | contour_weighted | TotalRuntimeSeconds | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2887d2af24ace64615ccf0154ef2e4be | AverageValueNaive | {"method": "Mean"} | {"fillna": "fake_date", "transformations": {"0... | 0 days 00:00:00.040499 | 0 days 00:00:00.001050 | 0 days 00:00:00.002149 | 0 days 00:00:00.043698 | 0 | | 1 | 0 | 5.130918 | 1.669637e+06 | 1.788363e+06 | 1.000000 | 0.517606 | 0.5 | 5.130918 | 1.669637e+06 | 1.788363e+06 | 1.000000 | 0.517606 | 0.5 | 1 | 20.332584 |
| 1 | 48b83f14f0704b892258390f56edbd5f | AverageValueNaive | {"method": "Mean"} | {"fillna": "mean", "transformations": {"0": "C... | 0 days 00:00:00.101405 | 0 days 00:00:00.000770 | 0 days 00:00:00.001415 | 0 days 00:00:00.103590 | 0 | | 1 | 0 | 4.794812 | 1.604529e+06 | 2.230759e+06 | 1.000000 | 0.808371 | 0.5 | 4.794812 | 1.604529e+06 | 2.230759e+06 | 1.000000 | 0.808371 | 0.5 | 1 | 19.497838 |
| 2 | da3c53c0617d9f838e7cfcf5e786e702 | AverageValueNaive | {"method": "Mean"} | {"fillna": "rolling_mean_24", "transformations... | 0 days 00:00:00 | 0 days 00:00:00 | 0 days 00:00:00 | 0 days 00:00:00 | 0 | ValueError('Unable to coerce to Series, length... | 1 | 0 | | | | | | | | | | | | | 1 | |
| 3 | 36ca223faaa43b7034902862aafc1026 | GLS | {} | {"fillna": "rolling_mean", "transformations": ... | 0 days 00:00:00.205293 | 0 days 00:00:00.001402 | 0 days 00:00:00.029477 | 0 days 00:00:00.236172 | 0 | | 1 | 0 | 13.122061 | 4.623450e+06 | 4.918553e+06 | 0.333333 | 0.690577 | 0.5 | 13.122061 | 4.623450e+06 | 4.918553e+06 | 0.333333 | 0.690577 | 0.5 | 1 | 51.815702 |
| 4 | c1d0aca175f6832e905ebfd3f8fe1985 | GLS | {} | {"fillna": "median", "transformations": {"0": ... | 0 days 00:00:00.028544 | 0 days 00:00:00.001380 | 0 days 00:00:00.022714 | 0 days 00:00:00.052638 | 0 | | 1 | 0 | 67.712041 | 1.670516e+07 | 1.673927e+07 | 1.000000 | 1.269718 | 0.5 | 67.712041 | 1.670516e+07 | 1.673927e+07 | 1.000000 | 1.269718 | 0.5 | 1 | 258.358695 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 293 | c77a1f7c1494d8f7ddee27f6bc376217 | GLS | {} | {"fillna": "mean", "transformations": {"0": "R... | 0 days 00:00:00.017065 | 0 days 00:00:00.001432 | 0 days 00:00:00.026842 | 0 days 00:00:00.045339 | 0 | | 1 | 4 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 1 | 17.804546 |
| 294 | cdfbec9a22055d8a31fc5ebec3c6acb5 | GLS | {} | {"fillna": "rolling_mean_24", "transformations... | 0 days 00:00:00.019139 | 0 days 00:00:00.001322 | 0 days 00:00:00.026185 | 0 days 00:00:00.046646 | 0 | | 1 | 4 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 1 | 17.804546 |
| 295 | 681ab6cfbe282fda3ed819bc3134d6b9 | GLS | {} | {"fillna": "zero", "transformations": {"0": "C... | 0 days 00:00:00.013519 | 0 days 00:00:00.001906 | 0 days 00:00:00.024209 | 0 days 00:00:00.039634 | 0 | | 1 | 4 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 1 | 17.804546 |
| 296 | d1f7f6a70c683c4d26a96d6b1e57010a | GLS | {} | {"fillna": "mean", "transformations": {"0": "C... | 0 days 00:00:00.014112 | 0 days 00:00:00.002237 | 0 days 00:00:00.023150 | 0 days 00:00:00.039499 | 0 | | 1 | 4 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 4.490592 | 1.158753e+06 | 1.231249e+06 | 1.000000 | 0.828747 | 0.5 | 1 | 17.804546 |
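
The `smape`, `mae` and `rmse` columns are standard point-forecast error metrics. A minimal sketch of one common definition of each is given below, on made-up values. The sMAPE variant used here (scaled to 0-200) is an assumption, since several definitions exist and AutoTS may use a different one, and the helper does not guard against a zero denominator.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))

def rmse(actual, forecast):
    """Root mean squared error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denominator = np.abs(actual) + np.abs(forecast)
    return float(100.0 * np.mean(2.0 * np.abs(actual - forecast) / denominator))

# Made-up actual values and forecasts
toy_actual = [100.0, 200.0]
toy_forecast = [110.0, 180.0]
print(mae(toy_actual, toy_forecast), rmse(toy_actual, toy_forecast), smape(toy_actual, toy_forecast))
```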


In [23]:
# and aggregated from cross validation
model.results("validation")

| | ID | Model | ModelParameters | TransformationParameters | Ensemble | Runs | smape | mae | rmse | containment | spl | contour | smape_weighted | mae_weighted | rmse_weighted | containment_weighted | contour_weighted | spl_weighted | TotalRuntimeSeconds | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0070f556f9604f90051ebc8e74c696d0 | GLS | {} | {"fillna": "ffill_mean_biased", "transformatio... | 0 | 5 | 17.953649 | 4.552921e+06 | 5.496600e+06 | 0.266667 | 2.157826 | 0.6 | 17.953649 | 4.552921e+06 | 5.496600e+06 | 0.266667 | 0.6 | 2.157826 | 1 | 61.514566 |
| 1 | 042fac61bfcf57fdd727fcc1be079e9b | GLS | {} | {"fillna": "rolling_mean", "transformations": ... | 0 | 5 | 17.953649 | 4.552921e+06 | 5.496600e+06 | 0.933333 | 1.544637 | 0.6 | 17.953649 | 4.552921e+06 | 5.496600e+06 | 0.933333 | 0.6 | 1.544637 | 1 | 61.086799 |
| 2 | 044c8ad1cdef05c9c6529fa2fefb6336 | ZeroesNaive | {} | {"fillna": "ffill", "transformations": {"0": "... | 0 | 1 | 23.457447 | 6.943993e+06 | 7.047625e+06 | 0.000000 | 3.533724 | 0.5 | 23.457447 | 6.943993e+06 | 7.047625e+06 | 0.000000 | 0.5 | 3.533724 | 1 | 81.278559 |
| 3 | 049cd546893ce7ee47d42d2bd6507070 | SeasonalNaive | {"method": "LastValue", "lag_1": 420, "lag_2":... | {"fillna": "rolling_mean_24", "transformations... | 0 | 1 | 117.102081 | 4.831422e+08 | 8.153968e+08 | 0.000000 | 241.908013 | 0.5 | 117.102081 | 4.831422e+08 | 8.153968e+08 | 0.000000 | 0.5 | 241.908013 | 1 | 1057.295125 |
| 4 | 0506fbf3a7519055588447e44e402bc8 | SeasonalNaive | {"method": "Mean", "lag_1": 2, "lag_2": "None"} | {"fillna": "rolling_mean_24", "transformations... | 0 | 1 | 101.842091 | 4.816650e+08 | 8.175588e+08 | 0.000000 | 241.281571 | 0.5 | 101.842091 | 4.816650e+08 | 8.175588e+08 | 0.000000 | 0.5 | 241.281571 | 1 | 1009.482918 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 169 | f8ce8393397de812619568174d7c9403 | ZeroesNaive | {} | {"fillna": "rolling_mean", "transformations": ... | 0 | 1 | 58.389489 | 1.492699e+07 | 1.496515e+07 | 0.000000 | 7.596185 | 0.5 | 58.389489 | 1.492699e+07 | 1.496515e+07 | 0.000000 | 0.5 | 7.596185 | 1 | 199.440434 |
| 170 | f96455a8ac6ee2f47b4de9aad04262f0 | SeasonalNaive | {"method": "Median", "lag_1": 24, "lag_2": "No... | {"fillna": "ffill", "transformations": {"0": "... | 0 | 1 | 73.722613 | 1.777779e+07 | 1.782824e+07 | 0.000000 | 5.087147 | 1.0 | 73.722613 | 1.777779e+07 | 1.782824e+07 | 0.000000 | 1.0 | 5.087147 | 1 | 247.785259 |
| 171 | fd3b98481e6f569f5f55ebc6f2161451 | SeasonalNaive | {"method": "LastValue", "lag_1": 2, "lag_2": 24} | {"fillna": "rolling_mean", "transformations": ... | 0 | 1 | 3.405680 | 1.125735e+06 | 1.470055e+06 | 1.000000 | 0.491924 | 0.5 | 3.405680 | 1.125735e+06 | 1.470055e+06 | 1.000000 | 0.5 | 0.491924 | 1 | 12.005247 |
| 172 | fe9ef9028f3190e9a1eb218f7a358215 | LastValueNaive | {} | {"fillna": "rolling_mean_24", "transformations... | 0 | 1 | 31.442781 | 8.981628e+06 | 9.057372e+06 | 0.000000 | 4.114129 | 0.5 | 31.442781 | 8.981628e+06 | 9.057372e+06 | 0.000000 | 0.5 | 4.114129 | 1 | 108.218777 |
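
The `Runs` column reaches 5 for models that were evaluated once initially and then on each of the 4 validation rounds. As a rough sketch of how backwards validation can slice a series, the hypothetical helper below walks back from the end in steps of the forecast length. It assumes one initial evaluation plus `num_validations` further slices, and AutoTS's exact slicing may differ.

```python
def backward_splits(n_obs, forecast_length, num_validations):
    """Return (train_end, test_end) index pairs, most recent slice first.

    Each slice trains on rows [0, train_end) and tests on rows [train_end, test_end).
    """
    splits = []
    for k in range(num_validations + 1):  # initial evaluation plus the validations
        test_end = n_obs - k * forecast_length
        train_end = test_end - forecast_length
        if train_end <= 0:
            raise ValueError("not enough observations for this many validations")
        splits.append((train_end, test_end))
    return splits

# Settings used in this chapter: 84 monthly rows, 3-step forecasts, 4 validations
splits = backward_splits(n_obs=84, forecast_length=3, num_validations=4)
for train_end, test_end in splits:
    print(f"train rows 0..{train_end - 1}, test rows {train_end}..{test_end - 1}")
```

Under these assumptions the first slice trains on 81 rows, which matches the length reported in the error messages of the search log.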
