#Time Series with PyCaret Regression Module

Time series forecasting can broadly be categorized into the following categories:

- Classical / Statistical Models — Moving Averages, Exponential smoothing, ARIMA, SARIMA, TBATS
- Machine Learning — Linear Regression, XGBoost, Random Forest, or any ML model with reduction methods
- Deep Learning — RNN, LSTM

**For this project, we will use second category i.e. Machine Learning.**

# All Imports and Installation

Uncomment the lines to install packages 

In [None]:
!pip install pycaret 
!pip install markupsafe==2.0.1
!pip install --upgrade plotly

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pycaret
  Downloading pycaret-2.3.10-py3-none-any.whl (320 kB)
[K     |████████████████████████████████| 320 kB 14.5 MB/s 
Collecting kmodes>=0.10.1
  Downloading kmodes-0.12.1-py2.py3-none-any.whl (20 kB)
Collecting Boruta
  Downloading Boruta-0.3-py3-none-any.whl (56 kB)
[K     |████████████████████████████████| 56 kB 4.0 MB/s 
[?25hCollecting pandas-profiling>=2.8.0
  Downloading pandas_profiling-3.2.0-py2.py3-none-any.whl (262 kB)
[K     |████████████████████████████████| 262 kB 62.3 MB/s 
[?25hCollecting numba<0.55
  Downloading numba-0.54.1-cp37-cp37m-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.3 MB)
[K     |████████████████████████████████| 3.3 MB 48.6 MB/s 
[?25hCollecting spacy<2.4.0
  Downloading spacy-2.3.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.4 MB)
[K     |████████████████████████████████| 10.4 MB 73.7 MB/s 
[?25hCollecting sc

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting markupsafe==2.0.1
  Downloading MarkupSafe-2.0.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (31 kB)
Installing collected packages: markupsafe
  Attempting uninstall: markupsafe
    Found existing installation: MarkupSafe 2.1.1
    Uninstalling MarkupSafe-2.1.1:
      Successfully uninstalled MarkupSafe-2.1.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.2.0 requires markupsafe~=2.1.1, but you have markupsafe 2.0.1 which is incompatible.[0m
Successfully installed markupsafe-2.0.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting plotly
  Downloading plotly-5.9.0-py2.py3-none-any.whl (15.2 MB)
[K     |██████████████

In [1]:
import pandas as pd
import numpy as np
import pycaret
import jinja2
import plotly.express as px

  defaults = yaml.load(f)


# Data Loading

In [23]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

Unnamed: 0,name,datetime,tempmax,tempmin,temp(avg),humidity,precip,snow,windspeed,visibility
0,dallas,2019-01-01,43.1,36.5,39.2,75.2,0.0,0.0,15.1,9.7
1,dallas,2019-01-02,36.8,32.8,34.7,88.6,0.83,0.0,14.9,6.7
2,dallas,2019-01-03,40.2,35.4,37.7,92.2,0.59,0.0,13.4,6.2
3,dallas,2019-01-04,59.3,37.2,45.4,66.8,0.0,0.0,13.9,9.6
4,dallas,2019-01-05,70.9,37.3,53.1,57.9,0.0,0.0,11.5,9.9


In [3]:
# create 12 month moving average
data['MA12'] = data['tempmax'].rolling(12).mean()
# plot the data and MA

fig = px.line(data, x="datetime", y=["tempmax", "MA12"], template = 'plotly_dark')
fig.show()

Since algorithms cannot directly deal with dates, let’s extract some simple features from dates such as month and year, and drop the original date column.

In [4]:
data=data.drop(columns=['name'],axis=0)

In [5]:
# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]
# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)
# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'tempmax']] 
# check the head of the dataset
data.head()

Unnamed: 0,Series,Year,Month,Day,tempmax
0,1,2019,1,1,43.1
1,2,2019,1,2,36.8
2,3,2019,1,3,40.2
3,4,2019,1,4,59.3
4,5,2019,1,5,70.9


In [6]:
# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

((731, 5), (580, 5))

#Initialize Setup

Now it’s time to initialize the setup function, where we will explicitly pass the training data, test data, and cross-validation strategy using the fold_strategy parameter.

In [7]:
# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'tempmax', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True, 
          session_id = 123)

Unnamed: 0,Description,Value
0,session_id,123
1,Target,tempmax
2,Original Data,"(731, 5)"
3,Missing Values,False
4,Numeric Features,4
5,Categorical Features,0
6,Ordinal Features,False
7,High Cardinality Features,False
8,High Cardinality Method,
9,Transformed Train Set,"(731, 4)"


INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=True, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=['Series', 'Year',
                                                          'Month', 'Day'],
                                      target='tempmax', time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_nu...
                ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                ('cluster_all', 'p

#Train and Evaluate all Models

In [8]:
best = compare_models(sort = 'MAE')

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
et,Extra Trees Regressor,10.2986,194.9693,13.3397,0.2113,0.185,0.1525,0.1833
lightgbm,Light Gradient Boosting Machine,13.0132,288.4556,16.3918,-0.2556,0.2265,0.1838,0.0967
gbr,Gradient Boosting Regressor,13.0235,281.6556,16.2252,-0.2199,0.222,0.1817,0.0467
ada,AdaBoost Regressor,13.0659,289.5799,16.4983,-0.2455,0.2255,0.1838,0.0467
rf,Random Forest Regressor,13.2037,290.8962,16.6721,-0.2548,0.2283,0.1842,0.25
dt,Decision Tree Regressor,13.8874,325.473,17.7837,-0.4187,0.2479,0.1939,0.0167
dummy,Dummy Regressor,13.9154,264.6506,16.1804,-0.1175,0.2175,0.1896,0.02
llar,Lasso Least Angle Regression,13.9154,264.6506,16.1804,-0.1175,0.2175,0.1896,0.0167
knn,K Neighbors Regressor,15.8692,404.3162,20.0417,-0.7883,0.2749,0.2278,0.08
par,Passive Aggressive Regressor,16.4457,533.7774,21.4258,-1.0862,0.2726,0.2495,0.0167


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion

The best model based on cross-validated MAE is Extra Trees Regressor (MAE: 9.9391). Let’s check the score on the test set.

In [9]:
prediction_holdout = predict_model(best);

INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,8.3419,121.4603,11.0209,0.5559,0.1891,0.1367


In [10]:
# generate predictions on the original dataset
predictions = predict_model(best, data=data)

INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,3.6905,53.7353,7.3304,0.7948,0.1257,0.0605


In [11]:
predictions

Unnamed: 0,Series,Year,Month,Day,tempmax,Label
0,1,2019,1,1,43.1,43.099998
1,2,2019,1,2,36.8,36.799999
2,3,2019,1,3,40.2,40.200001
3,4,2019,1,4,59.3,59.299999
4,5,2019,1,5,70.9,70.900002
...,...,...,...,...,...,...
1306,1307,2022,7,30,99.5,90.061318
1307,1308,2022,7,31,101.6,87.557394
1308,1309,2022,8,1,99.9,89.560012
1309,1310,2022,8,2,100.5,90.387583


# Five Day Forecasting

In [12]:
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

Unnamed: 0,Series,Year,Month,Day
0,1311,2022,8,4
1,1312,2022,8,5
2,1313,2022,8,6
3,1314,2022,8,7
4,1315,2022,8,8


In [13]:
#best model to use for prediction
final_best = finalize_model(best)

INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                

## Five Day Prediction of Dallas

In [14]:
predictions_future = predict_model(final_best, data=future_df)
predictions_future

INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Series,Year,Month,Day,Label
0,1311,2022,8,4,89.738404
1,1312,2022,8,5,88.934504
2,1313,2022,8,6,92.013947
3,1314,2022,8,7,92.072986
4,1315,2022,8,8,92.116803
5,1316,2022,8,9,92.34744


# Plot Five Day Prediction/Forecasting

In [15]:
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["tempmax", "Label"], template = 'plotly_dark')
fig.show()

# Yearly Forecasting for better visualization

In [18]:
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

Unnamed: 0,Series,Year,Month,Day
0,1311,2022,8,4
1,1312,2022,8,5
2,1313,2022,8,6
3,1314,2022,8,7
4,1315,2022,8,8


In [19]:
predictions_future = predict_model(final_best, data=future_df)
predictions_future

INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Series,Year,Month,Day,Label
0,1311,2022,8,4,89.738404
1,1312,2022,8,5,88.934504
2,1313,2022,8,6,92.013947
3,1314,2022,8,7,92.072986
4,1315,2022,8,8,92.116803
...,...,...,...,...,...
366,1677,2023,8,5,88.934504
367,1678,2023,8,6,92.013947
368,1679,2023,8,7,92.072986
369,1680,2023,8,8,92.116803


In [20]:
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["tempmax", "Label"], template = 'plotly_dark')
fig.show()

# Average Temp Forecasting

In [25]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['temp(avg)'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["temp(avg)", "MA12"], template = 'plotly_dark')
fig.show()

# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]
# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)
# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'temp(avg)']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'temp(avg)', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True, 
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["temp(avg)", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["temp(avg)", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
et,Extra Trees Regressor,10.2285,203.183,13.3093,0.1784,0.2084,0.1788,0.18
dt,Decision Tree Regressor,11.9762,257.3299,15.3547,-0.1089,0.2378,0.2017,0.02
gbr,Gradient Boosting Regressor,12.6857,282.8888,16.0841,-0.2642,0.2496,0.2056,0.05
ada,AdaBoost Regressor,12.7593,277.1889,15.914,-0.2454,0.249,0.2071,0.06
lightgbm,Light Gradient Boosting Machine,12.7746,282.1301,15.945,-0.2771,0.25,0.2065,0.0367
rf,Random Forest Regressor,12.8045,288.7545,16.3989,-0.282,0.255,0.2075,0.2533
dummy,Dummy Regressor,13.8563,253.3902,15.798,-0.0963,0.2393,0.2144,0.0133
llar,Lasso Least Angle Regression,13.8563,253.3902,15.798,-0.0963,0.2393,0.2144,0.0167
knn,K Neighbors Regressor,16.1175,417.5797,20.4258,-0.9088,0.315,0.2697,0.0833
par,Passive Aggressive Regressor,16.7517,560.0456,21.5755,-1.1772,0.3093,0.301,0.0167


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,7.6941,102.412,10.1199,0.6289,0.2061,0.1516


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,3.4039,45.3081,6.7311,0.8224,0.1371,0.0671


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                

   Series  Year  Month  Day      Label
0    1311  2022      8    4  81.227713
1    1312  2022      8    5  81.403371
2    1313  2022      8    6  82.489452
3    1314  2022      8    7  83.127069
4    1315  2022      8    8  83.182568
5    1316  2022      8    9  83.309605


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

# Tempmin Forecasting

In [26]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['tempmin'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["tempmin", "MA12"], template = 'plotly_dark')
fig.show()

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)

# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]

# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'tempmin']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'tempmin', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True, 
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["tempmin", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["tempmin", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
et,Extra Trees Regressor,10.277,217.4199,13.4652,0.147,0.2451,0.2202,0.18
gbr,Gradient Boosting Regressor,12.1897,263.3541,15.4789,-0.1392,0.2776,0.2413,0.05
ada,AdaBoost Regressor,12.4995,275.9336,15.8547,-0.2155,0.2878,0.2459,0.0633
lightgbm,Light Gradient Boosting Machine,12.532,272.9379,15.6813,-0.225,0.2852,0.2413,0.11
rf,Random Forest Regressor,12.8665,303.9109,16.8042,-0.3327,0.304,0.2499,0.2467
dummy,Dummy Regressor,14.0453,254.4339,15.826,-0.0812,0.2781,0.2557,0.0133
llar,Lasso Least Angle Regression,14.0453,254.4339,15.826,-0.0812,0.2781,0.2557,0.0167
dt,Decision Tree Regressor,14.5332,378.9724,18.9012,-0.7277,0.3626,0.2785,0.0167
knn,K Neighbors Regressor,16.6603,459.4659,21.3736,-1.0262,0.375,0.3384,0.0833
par,Passive Aggressive Regressor,17.8071,641.2787,22.4305,-1.4095,0.3668,0.3855,0.02


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,8.3986,111.8496,10.5759,0.6116,0.2648,0.2274


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,3.7156,49.4834,7.0344,0.8103,0.1761,0.1006


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                

   Series  Year  Month  Day      Label
0    1311  2022      8    4  71.102888
1    1312  2022      8    5  71.543011
2    1313  2022      8    6  71.457718
3    1314  2022      8    7  72.023040
4    1315  2022      8    8  72.058812
5    1316  2022      8    9  71.690855


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                                max_depth=None, max_features='auto',
                                max_leaf_nodes=None, max_samples=None,
                                min_impurity_decrease=0.0,
                                min_impurity_split=None, min_samples_leaf=1,
                                min_samples_split=2,
                                min_weight_fraction_leaf=0.0, n_estimators=100,
                                n_jobs=-1, oob_score=False,
                                power_transformer_method='box-cox',
                                powe...
                                regressor=ExtraTreesRegressor(bootstrap=False,
                                                              ccp_alpha=0.0,
                                                              criterion='mse',
                                  

# Humidity Forecasting

In [27]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['humidity'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["humidity", "MA12"], template = 'plotly_dark')
fig.show()

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)
# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]

# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'humidity']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'humidity', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True, 
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["humidity", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["humidity", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
llar,Lasso Least Angle Regression,11.1619,178.847,13.3528,-0.0989,0.2203,0.1965,0.0167
dummy,Dummy Regressor,11.1619,178.847,13.3528,-0.0989,0.2203,0.1965,0.0167
lasso,Lasso Regression,11.4296,189.6173,13.7192,-0.1549,0.2227,0.1942,0.02
ada,AdaBoost Regressor,11.497,192.4419,13.8534,-0.1879,0.226,0.1989,0.0433
omp,Orthogonal Matching Pursuit,11.5532,194.3698,13.8959,-0.1882,0.2247,0.1953,0.02
en,Elastic Net,11.5819,194.2268,13.8936,-0.1899,0.2253,0.1969,0.02
rf,Random Forest Regressor,11.5916,194.6184,13.9431,-0.2142,0.2311,0.2086,0.2533
br,Bayesian Ridge,11.6252,195.0977,13.9298,-0.1968,0.2253,0.1969,0.0167
et,Extra Trees Regressor,11.6619,197.8501,14.0564,-0.2346,0.2326,0.2105,0.18
lightgbm,Light Gradient Boosting Machine,12.0093,208.7833,14.4383,-0.289,0.2379,0.2107,0.03


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(alpha=1.0, copy_X=True,
                                eps=2.220446049250313e-16, fit_intercept=True,
                                fit_path=True, jitter=None, max_iter=500,
                                normalize=True, positive=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                precompute='auto', random_state=123,
                                regressor=LassoLars(alpha=1.0, copy_X=True,
                                                    eps=2.220446049250313e-16,
                                                    fit_intercept=True,
                                                    fit_path=True, jitter=None,
                                                    max_iter=500,
                                  

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Least Angle Regression,11.2651,196.6158,14.022,-0.0339,0.2521,0.2213


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True,
                                eps=2.220446049250313e-16, fit_intercept=True,
                                fit_path=True, jitter=None, max_iter=500,
                                normalize=True, positive=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                precompute='auto', random_state=123,
                                regressor=LassoLars(alpha=1.0, copy_X=True,
                                                    eps=2.220446049250313e-16,
                                                    fit_intercept=True,
                                                    fit_path=True, jitter=None,
                                                    max_iter=500,
                                                    normalize=True,
        

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Least Angle Regression,11.0253,183.9152,13.5615,-0.0037,0.2307,0.1998


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True,
                                eps=2.220446049250313e-16, fit_intercept=True,
                                fit_path=True, jitter=None, max_iter=500,
                                normalize=True, positive=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                precompute='auto', random_state=123,
                                regressor=LassoLars(alpha=1.0, copy_X=True,
                                                    eps=2.220446049250313e-16,
                                                    fit_intercept=True,
                                                    fit_path=True, jitter=None,
                                                    max_iter=500,
                                                    normalize=True,
      

   Series  Year  Month  Day      Label
0    1311  2022      8    4  62.478963
1    1312  2022      8    5  62.478963
2    1313  2022      8    6  62.478963
3    1314  2022      8    7  62.478963
4    1315  2022      8    8  62.478963
5    1316  2022      8    9  62.478963


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True,
                                eps=2.220446049250313e-16, fit_intercept=True,
                                fit_path=True, jitter=None, max_iter=500,
                                normalize=True, positive=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                precompute='auto', random_state=123,
                                regressor=LassoLars(alpha=1.0, copy_X=True,
                                                    eps=2.220446049250313e-16,
                                                    fit_intercept=True,
                                                    fit_path=True, jitter=None,
                                                    max_iter=500,
                                                    normalize=True,
        

# Precip Forecasting
- in notebook: Weather_Forescasting_Dallas_using_Maching_Learning_Precip.ipynb

In [29]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['precip'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["precip", "MA12"], template = 'plotly_dark')
fig.show()

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)

# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]

# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'precip']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'precip', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3,  
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best)

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["precip", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["precip", "Label"], template = 'plotly_dark')
fig.show()



Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
huber,Huber Regressor,0.1372,0.1872,0.42,-0.1144,0.2494,0.9848,0.0333
knn,K Neighbors Regressor,0.1971,0.2093,0.4437,-0.2505,0.27,2.4718,0.08
dummy,Dummy Regressor,0.2301,0.1694,0.4011,-0.0267,0.2403,2.5849,0.0167
llar,Lasso Least Angle Regression,0.2301,0.1694,0.4011,-0.0267,0.2403,2.5849,0.0167
dt,Decision Tree Regressor,0.2533,0.241,0.4897,-0.9113,0.3053,3.349,0.0167
omp,Orthogonal Matching Pursuit,0.2744,0.177,0.413,-0.1157,0.2632,3.1516,0.0133
ridge,Ridge Regression,0.2833,0.1861,0.4255,-0.2105,0.2744,3.2491,0.0167
et,Extra Trees Regressor,0.2874,0.2213,0.4686,-0.6634,0.2965,4.0656,0.4567
lasso,Lasso Regression,0.3084,0.183,0.4196,-0.1456,0.274,4.545,0.0133
par,Passive Aggressive Regressor,0.3098,0.1951,0.4287,-0.1774,0.2792,4.7064,0.0133


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False)
INFO:logs:compare_models() succesfully completed......................................
INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Huber Regressor,0.0774,0.0703,0.2651,-0.0932,0.1737,1.0002


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Huber Regressor,0.1107,0.1374,0.3707,-0.0979,0.223,1.0


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False), fit_kwargs=None, groups=None, model_only=True, display=None, experiment_custom_tags=None, return_train_score=False)
INFO:logs:Finalizing HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False)
INFO:logs:Initializing create_model()
INFO:logs:create_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False), fold=None, round=4, cross_validation=True, predict=True, fit_kwargs={}, groups=None, refit=True, verbose=False, system=False, metrics=None, experiment_custom_tags=None, add_to_model_list=False, probability_threshold=None, display=None, return_train_score=False, kwargs={})
INFO:logs:Checking exceptions
INFO:logs:Importing libraries
INFO:logs:Copying t

   Series  Year  Month  Day     Label
0    1311  2022      8    4 -0.000011
1    1312  2022      8    5 -0.000012
2    1313  2022      8    6 -0.000014
3    1314  2022      8    7 -0.000015
4    1315  2022      8    8 -0.000016
5    1316  2022      8    9 -0.000017


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


# Snow Forecasting

In [28]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['snow'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["snow", "MA12"], template = 'plotly_dark')
fig.show()

# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]
# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)
# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'snow']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'snow', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          #transform_target = True, 
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["snow", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["snow", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
lr,Linear Regression,0.0,0.0,0.0,1.0,0.0,,0.0133
br,Bayesian Ridge,0.0,0.0,0.0,1.0,0.0,,0.0167
lasso,Lasso Regression,0.0,0.0,0.0,1.0,0.0,,0.02
ridge,Ridge Regression,0.0,0.0,0.0,1.0,0.0,,0.0167
en,Elastic Net,0.0,0.0,0.0,1.0,0.0,,0.02
lar,Least Angle Regression,0.0,0.0,0.0,1.0,0.0,,0.0133
llar,Lasso Least Angle Regression,0.0,0.0,0.0,1.0,0.0,,0.0167
omp,Orthogonal Matching Pursuit,0.0,0.0,0.0,1.0,0.0,,0.0167
par,Passive Aggressive Regressor,0.0,0.0,0.0,1.0,0.0,,0.0167
gbr,Gradient Boosting Regressor,0.0,0.0,0.0,1.0,0.0,,0.0333


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)
INFO:logs:compare_models() succesfully completed......................................
INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Linear Regression,0.0151,0.0162,0.1271,-0.0144,0.0813,1.0


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Linear Regression,0.0067,0.0071,0.0845,-0.0063,0.0541,1.0


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False), fit_kwargs=None, groups=None, model_only=True, display=None, experiment_custom_tags=None, return_train_score=False)
INFO:logs:Finalizing LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)
INFO:logs:Initializing create_model()
INFO:logs:create_model(estimator=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False), fold=None, round=4, cross_validation=True, predict=True, fit_kwargs={}, groups=None, refit=True, verbose=False, system=False, metrics=None, experiment_custom_tags=None, add_to_model_list=False, probability_threshold=None, display=None, return_train_score=False, kwargs={})
INFO:logs:Checking exceptions
INFO:logs:Importing libraries
INFO:logs:Copying training dataset
INFO:logs:Defining folds
INFO:logs:Declaring metric variables
INFO:logs:Importing untrained model
INFO:logs:Decla

   Series  Year  Month  Day  Label
0    1311  2022      8    4    0.0
1    1312  2022      8    5    0.0
2    1313  2022      8    6    0.0
3    1314  2022      8    7    0.0
4    1315  2022      8    8    0.0
5    1316  2022      8    9    0.0


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor


# Visibility Forecasting

In [30]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['visibility'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["visibility", "MA12"], template = 'plotly_dark')
fig.show()

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)
# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]

# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'visibility']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'visibility', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True, 
          session_id = 123)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["visibility", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["visibility", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
huber,Huber Regressor,0.4749,1.2125,1.0871,-0.1725,0.1306,0.0723,0.0233
omp,Orthogonal Matching Pursuit,0.484,1.1672,1.0689,-0.1355,0.129,0.0728,0.0167
br,Bayesian Ridge,0.4871,1.162,1.0664,-0.1299,0.1288,0.073,0.02
lasso,Lasso Regression,0.4917,1.1503,1.0603,-0.1161,0.1282,0.0734,0.68
en,Elastic Net,0.4931,1.1565,1.0632,-0.1222,0.1285,0.0736,0.67
knn,K Neighbors Regressor,0.4974,1.1715,1.0685,-0.1322,0.1289,0.074,0.0833
ada,AdaBoost Regressor,0.5017,1.0507,1.0135,-0.02,0.124,0.0733,0.0333
lightgbm,Light Gradient Boosting Machine,0.5084,1.0496,1.0143,-0.0239,0.1239,0.0738,0.1
rf,Random Forest Regressor,0.5286,1.0989,1.0383,-0.0733,0.1261,0.0763,0.25
par,Passive Aggressive Regressor,0.5334,1.3191,1.1346,-0.2773,0.1345,0.079,0.02


INFO:logs:create_model_container: 15
INFO:logs:master_model_container: 15
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True,
                                max_iter=100,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                regressor=HuberRegressor(alpha=0.0001,
                                                         epsilon=1.35,
                                                         fit_intercept=True,
                                                         max_iter=100,
                                                         tol=1e-05,
                                                         warm_start=False),
                                tol=1e-05, warm_start=False)
INFO:logs:compare_models() succesfully completed......................................
INFO:logs:Initializing predict_model()
IN

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Huber Regressor,0.4026,0.8689,0.9322,-0.0806,0.1181,0.063


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True,
                                max_iter=100,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                regressor=HuberRegressor(alpha=0.0001,
                                                         epsilon=1.35,
                                                         fit_intercept=True,
                                                         max_iter=100,
                                                         tol=1e-05,
                                                         warm_start=False),
                                tol=1e-05, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Huber Regressor,0.4721,1.1436,1.0694,-0.1147,0.1303,0.0727


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True,
                                max_iter=100,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                regressor=HuberRegressor(alpha=0.0001,
                                                         epsilon=1.35,
                                                         fit_intercept=True,
                                                         max_iter=100,
                                                         tol=1e-05,
                                                         warm_start=False),
                                tol=1e-05, warm_start=False), fit_kwargs=None, groups=None, model_only=True, display=None, experiment_custom_tags=None, return_train_score=False)
INFO:logs:Finalizing PowerTransformedTargetRegresso

   Series  Year  Month  Day     Label
0    1311  2022      8    4  9.804932
1    1312  2022      8    5  9.804246
2    1313  2022      8    6  9.803561
3    1314  2022      8    7  9.802874
4    1315  2022      8    8  9.802187
5    1316  2022      8    9  9.801500


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True,
                                max_iter=100,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                regressor=HuberRegressor(alpha=0.0001,
                                                         epsilon=1.35,
                                                         fit_intercept=True,
                                                         max_iter=100,
                                                         tol=1e-05,
                                                         warm_start=False),
                                tol=1e-05, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=

# Windspeed Forecasting

In [31]:
data = pd.read_excel('/content/DatasetDallas.xlsx')
data['datetime'] = pd.to_datetime(data['datetime'])
data.head()

# create 12 month moving average
data['MA12'] = data['windspeed'].rolling(12).mean()
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="datetime", y=["windspeed", "MA12"], template = 'plotly_dark')
fig.show()

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)

# extract month and year from dates
data['Month'] = [i.month for i in data['datetime']]
data['Year'] = [i.year for i in data['datetime']]
data['Day'] = [i.day for i in data['datetime']]

# drop unnecessary columns and re-arrange
data.drop(['datetime', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month','Day', 'windspeed']] 
# check the head of the dataset
data.head()

# split data into train-test set
train = data[data['Year'] < 2021]
test = data[data['Year'] >= 2021]
# check shape
train.shape, test.shape

# import the regression module
from pycaret.regression import *
# initialize setup
s = setup(data = train, 
          test_data = test, 
          target = 'windspeed', 
          fold_strategy = 'timeseries', 
          numeric_features = ['Series', 'Year', 'Month','Day'], 
          fold = 3, 
          transform_target = True)

#train and evaluate
best = compare_models(sort = 'MAE')

#prediction on testing set
prediction_holdout = predict_model(best);

# generate predictions on the original dataset
predictions = predict_model(best, data=data)

#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2022-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#best model to use for prediction
final_best = finalize_model(best)

#five day prediction of Dallas
predictions_future = predict_model(final_best, data=future_df)
print(predictions_future)

#Plotting Five Day of Dallas
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2022-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["windspeed", "Label"], template = 'plotly_dark')
fig.show()

#Yearly Plotting
#Future dates
future_dates = pd.date_range(start = '2022-08-04', end = '2023-08-09')
future_df = pd.DataFrame()
future_df['Series'] = np.arange(1311,(1311+len(future_dates)))
future_df['Year'] = [i.year for i in future_dates]
future_df['Month'] = [i.month for i in future_dates]
future_df['Day'] = [i.day for i in future_dates]      
future_df.head()

#Prediction Yearly
predictions_future = predict_model(final_best, data=future_df)
predictions_future

#Yearly Visualization
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2019-01-01', end = '2023-08-09')
concat_df.set_index(concat_df_i, inplace=True)
fig = px.line(concat_df, x=concat_df.index, y=["windspeed", "Label"], template = 'plotly_dark')
fig.show()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
ridge,Ridge Regression,3.4635,18.7175,4.3234,-0.0464,0.2963,0.2816,0.02
omp,Orthogonal Matching Pursuit,3.6085,19.6989,4.4318,-0.1019,0.3069,0.3055,0.0167
llar,Lasso Least Angle Regression,3.6442,19.9628,4.4634,-0.1181,0.3092,0.309,0.6667
dummy,Dummy Regressor,3.6442,19.9628,4.4634,-0.1181,0.3092,0.309,0.0133
ada,AdaBoost Regressor,3.6555,21.2858,4.5834,-0.1753,0.3097,0.2853,0.0467
et,Extra Trees Regressor,3.6874,21.1443,4.5924,-0.1886,0.3144,0.3052,0.1867
huber,Huber Regressor,3.6949,20.8468,4.5494,-0.1564,0.3134,0.3138,0.0267
en,Elastic Net,3.7163,22.1555,4.6609,-0.2189,0.3141,0.284,0.0167
rf,Random Forest Regressor,3.7247,21.5023,4.6339,-0.2029,0.3153,0.3041,0.2467
gbr,Gradient Boosting Regressor,3.7256,21.8683,4.6693,-0.2199,0.3179,0.3034,0.05


INFO:logs:create_model_container: 16
INFO:logs:master_model_container: 16
INFO:logs:display_container: 2
INFO:logs:PowerTransformedTargetRegressor(alpha=1.0, copy_X=True, fit_intercept=True,
                                max_iter=None, normalize=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                random_state=1268,
                                regressor=Ridge(alpha=1.0, copy_X=True,
                                                fit_intercept=True,
                                                max_iter=None, normalize=False,
                                                random_state=1268,
                                                solver='auto', tol=0.001),
                                solver='auto', tol=0.001)
INFO:logs:compare_models() succesfully completed......................................
INFO:logs:Initializing predict_model()
INFO:logs:pr

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,3.796,25.241199,5.0241,-0.0929,0.3057,0.2569


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True, fit_intercept=True,
                                max_iter=None, normalize=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                random_state=1268,
                                regressor=Ridge(alpha=1.0, copy_X=True,
                                                fit_intercept=True,
                                                max_iter=None, normalize=False,
                                                random_state=1268,
                                                solver='auto', tol=0.001),
                                solver='auto', tol=0.001), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,3.5983,21.6927,4.6575,-0.0196,0.2929,0.2579


INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True, fit_intercept=True,
                                max_iter=None, normalize=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                random_state=1268,
                                regressor=Ridge(alpha=1.0, copy_X=True,
                                                fit_intercept=True,
                                                max_iter=None, normalize=False,
                                                random_state=1268,
                                                solver='auto', tol=0.001),
                                solver='auto', tol=0.001), fit_kwargs=None, groups=None, model_only=True, display=None, experiment_custom_tags=None, return_train_score=False)
INFO:logs:Finalizing PowerTransformedTargetRegressor(alpha=1.

   Series  Year  Month  Day      Label
0    1311  2022      8    4  11.674988
1    1312  2022      8    5  11.716306
2    1313  2022      8    6  11.757709
3    1314  2022      8    7  11.799233
4    1315  2022      8    8  11.840842
5    1316  2022      8    9  11.882548


INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=PowerTransformedTargetRegressor(alpha=1.0, copy_X=True, fit_intercept=True,
                                max_iter=None, normalize=False,
                                power_transformer_method='box-cox',
                                power_transformer_standardize=True,
                                random_state=1268,
                                regressor=Ridge(alpha=1.0, copy_X=True,
                                                fit_intercept=True,
                                                max_iter=None, normalize=False,
                                                random_state=1268,
                                                solver='auto', tol=0.001),
                                solver='auto', tol=0.001), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO