# Microsoft Stock Price Prediction with Online/Incremental Learning


Ekin Bozyel
<br>

* Phone: 0534 328 27 56
* Mail address : ekinbozyel@gmail.com
* Linkedin account: (www.linkedin.com/in/ekin-bozyel-453934269)
* Github account : (https://github.com/john-fante)
* Kaggle account : (https://www.kaggle.com/banddaniel)
* Stack Overflow account :  (https://stackoverflow.com/users/22880135)

<hr>

I developed a solution using the steps below

* Data finding (dataset -> https://www.kaggle.com/datasets/benjaminpo/s-and-p-500-with-dividends-and-splits-daily-updated), data reading, preprocessing
* In addition to the Microsoft stock, I used the same domain 2 stock prices (Oracle and IBM) (I only used daily closing price and volume data), first I added Time Series Features(weekly, daily, quarterly), then I added RSI and MACD Features for other shares except Microsoft.
* I used tuned LinearRegression model from river library(online learning package).
* Evaluation metrics -> MAE and RMSE (the result MAE: 0.00301 , RMSE: 0.00301) (By looking at both MAE and RMSE together, you can get a better idea of your model's performance in terms of both average error (MAE) and the severity of large errors (RMSE).)
* Hyperparameter tuned with optuna.

## References 
* https://en.wikipedia.org/wiki/Online_machine_learning
* https://riverml.xyz/dev/api/overview/
* https://optuna.org

In [3]:
# Installing river package for online learning
!pip install river -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.9/89.9 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.0/62.0 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.1/3.1 MB[0m [31m57.3 MB/s[0m eta [36m0:00:00[0m:00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.1/13.1 MB[0m [31m77.8 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.6/40.6 MB[0m [31m38.0 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.1 which is incompatible.
google-colab 1.0.0 requires notebook==6.5.5, but you have notebook 6.5.4 which

In [4]:
# Importing dependencies

import warnings
warnings.filterwarnings('ignore')
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import itertools
import plotly.express as px

from river import compose, optim, metrics, preprocessing
from river.stream import iter_pandas
from river.linear_model import LinearRegression
from river.utils import Rolling

import optuna

# 1) Reading and Preparing Raw Data

In [5]:
# Creating raw df with dropping some features, I only use Close, Date and Volume features

def create_raw_df(main_file_path):
    data = pd.read_csv(main_file_path)
    data.drop(['Adj Close', 'High','Low'], axis = 1, inplace = True)
    data.rename(columns={'Open':'open','Close': 'close', 'Date': 'date', 'Volume':'volume'}, inplace=True)
    data.index = pd.to_datetime(data['date'])
    data.drop(['date'], axis = 1, inplace = True)
    return data

In [84]:
# orcl_data -> Oracle stock history up to date today
# ibm_data -> IBM stock history up to date today
# microsoft_data -> Microsoft stock history up to date today

orcl_data = create_raw_df('/kaggle/input/s-and-p-500-with-dividends-and-splits-daily-updated/ORCL.csv')
ibm_data = create_raw_df('/kaggle/input/s-and-p-500-with-dividends-and-splits-daily-updated/IBM.csv')
microsoft_data = create_raw_df('/kaggle/input/s-and-p-500-with-dividends-and-splits-daily-updated/MSFT.csv')

merged_1 = pd.merge(orcl_data, ibm_data, left_index=True, right_index=True, how='outer')
merged_1.rename(columns={'close_x': 'close_orcl','open_x': 'open_orcl', 'volume_x': 'volume_orcl', 'close_y':'close_ibm', 'open_y': 'open_ibm','volume_y':'volume_ibm'}, inplace=True)

final_merged_df = pd.merge(merged_1, microsoft_data, left_index=True, right_index=True, how='outer')
final_merged_df.rename(columns={'close': 'close_msft', 'volume':'volume_msft','open': 'open_msft'}, inplace=True)

# drop na values
final_merged_df.dropna(inplace=True)

In [85]:
# final data (IBM + Oracle + Microsoft, close prices and volumes)
final_merged_df.tail()

Unnamed: 0_level_0,close_orcl,open_orcl,volume_orcl,close_ibm,open_ibm,volume_ibm,close_msft,open_msft,volume_msft
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2025-01-27,158.279999,168.899994,42201600.0,224.130005,222.190002,4898400.0,434.559998,424.01001,35647800.0
2025-01-28,164.0,162.990005,20319300.0,225.660004,224.320007,4485400.0,447.200012,434.600006,23491700.0
2025-01-29,162.020004,164.029999,9866600.0,228.630005,225.619995,7079800.0,442.329987,446.690002,23581400.0
2025-01-30,170.380005,164.779999,14981700.0,258.269989,250.0,15381900.0,414.98999,418.769989,54586300.0
2025-01-31,170.059998,170.410004,6430145.0,255.679993,256.049988,4547481.0,415.059998,418.730011,33325644.0


# 2) Feature Engineering 


In [86]:
class FeatureEngineering:
    def __init__(self, df: pd.DataFrame):
        self.df = df.copy()


    # a function for creating time series features
    def create_time_series_features(self) -> pd.DataFrame:
        """
        Generates new time-related features from a DataFrame with a datetime index.
    
        The function extracts the following information from the datetime index:
        - Day of the week (week number)
        - Month of the year
        - Quarter of the year
    
        Parameters:
        ----------
        df : pandas.DataFrame
            A DataFrame with a datetime index. The index is expected to represent time 
            (e.g., `pd.to_datetime()`).
    
        Returns:
        -------
        pandas.DataFrame
            A DataFrame with the original data and new cyclic features:
            - `week_sin`: Sine transformation of the week (day of the week).
            - `week_cos`: Cosine transformation of the week (day of the week).
            - `month_sin`: Sine transformation of the month.
            - `month_cos`: Cosine transformation of the month.
            - `quarter_sin`: Sine transformation of the quarter.
            - `quarter_cos`: Cosine transformation of the quarter.
        """
        
        df = self.df
        df['week'] = df.index.dayofweek
        df['month'] = df.index.month
        df['quarter'] = df.index.quarter
        
        # creating cycles features
        df['week_sin'] = np.sin(df['week']*(2.*np.pi/7))
        df['week_cos'] = np.sin(df['week']*(2.*np.pi/7))
    
        df['month_sin'] = np.sin(df['month']*(2.*np.pi/12))
        df['month_cos'] = np.sin(df['month']*(2.*np.pi/12))
    
        df['quarter_sin'] = np.sin(df['quarter']*(2.*np.pi/4))
        df['quarter_cos'] = np.sin(df['quarter']*(2.*np.pi/4))
    
        df.drop(['week', 'month' ,'quarter'], axis = 1, inplace = True)
    
        return df


    # a function for calculating rsi with default args
    def create_rsi_features(self, column_name:str, window:int=14) -> pd.DataFrame:
        """
        Calculate the Relative Strength Index (RSI) for a given column in a DataFrame.
        The function computes the RSI, a momentum oscillator that measures the speed and 
        change of price movements. The RSI values are added as a new column in the input 
        DataFrame with the name `<column_name>_rsi`.
    
        Parameters:
        ----------
        df : pd.DataFrame
            A pandas DataFrame containing time series data. The DataFrame should have at least 
            one numeric column with the specified `column_name` for which RSI will be calculated.
        
        column_name : str
            The name of the column in the DataFrame for which the RSI will be calculated.
        
        window : int, default=14
            The number of periods (days) used for calculating the rolling average of gains and losses. 
            The default value is 14, which is the typical period used in most RSI calculations.
    
        Returns:
        -------
        pd.DataFrame
            The original DataFrame with an additional column named `<column_name>_rsi`, 
            which contains the calculated RSI values.
        
        """
        df = self.df
        delta = df[column_name].diff()
        
        gain = delta.where(delta > 0, 0)
        loss = -delta.where(delta < 0, 0)
        
        avg_gain = gain.rolling(window=window, min_periods=1).mean()
        avg_loss = loss.rolling(window=window, min_periods=1).mean()
        
        rs = avg_gain / avg_loss
        
        rsi = 100 - (100 / (1 + rs))    
        df[str(column_name)+'_rsi'] = rsi
        return df
    
    # a function for calculating macd with default args
    def create_macd_features(self, column_name, short_window:int=12, long_window:int=26, signal_window:int=9) -> pd.DataFrame:
        """
        Calculate the Moving Average Convergence Divergence (MACD) and related features for a given column in a DataFrame.
    
        The function computes the MACD, Signal Line, and Histogram for the specified column in the input DataFrame.
        The MACD is calculated by subtracting the long-term exponential moving average (EMA) from the short-term EMA.
        The Signal Line is the EMA of the MACD. The Histogram represents the difference between the MACD and the Signal Line.
    
        Parameters:
        ----------
        df : pd.DataFrame
            A pandas DataFrame containing time series data. The DataFrame should have at least 
            one numeric column with the specified `column_name` for which MACD will be calculated.
        
        column_name : str
            The name of the column in the DataFrame for which the MACD, Signal Line, and Histogram will be calculated.
        
        short_window : int, default=12
            The window (in days) used for calculating the short-term exponential moving average (EMA).
            The default value is 12, which is commonly used in most MACD calculations.
    
        long_window : int, default=26
            The window (in days) used for calculating the long-term exponential moving average (EMA).
            The default value is 26, which is typically used in MACD calculations.
    
        signal_window : int, default=9
            The window (in days) used for calculating the Signal Line, which is the EMA of the MACD.
            The default value is 9, which is a standard period for the Signal Line.
    
        Returns:
        -------
        pd.DataFrame
            The original DataFrame with the following new columns:
            - `<column_name>_macd`: The MACD value, which is the difference between the short-term and long-term EMAs.
            - `<column_name>_signal_line`: The Signal Line, which is the EMA of the MACD.
            - `<column_name>_histogram`: The Histogram, representing the difference between the MACD and the Signal Line.    
        """
        df = self.df
        ema_short = df[column_name].ewm(span=short_window, adjust=False).mean()
        ema_long = df[column_name].ewm(span=long_window, adjust=False).mean()    
        macd = ema_short - ema_long
        
        signal_line = macd.ewm(span=signal_window, adjust=False).mean()
        
        histogram = macd - signal_line
        
        df[str(column_name)+'_macd'] = macd
        df[str(column_name)+'_signal_line'] = signal_line
        df[str(column_name)+'_histogram'] = histogram
        return df

In [164]:
# adding feature engineering methods
feature_engineering = FeatureEngineering(df=final_merged_df)
df_added_time_series_features = feature_engineering.create_time_series_features()

df_added_rsi_feature = feature_engineering.create_rsi_features(column_name = 'close_orcl')
df_added_rsi_feature = feature_engineering.create_rsi_features(column_name = 'close_ibm')

df_added_macd_features = feature_engineering.create_macd_features(column_name = 'close_orcl')
df_added_macd_features = feature_engineering.create_macd_features(column_name = 'close_ibm')

df_added_features = df_added_macd_features.copy()

# drop nan variables
df_added_features.dropna(inplace=True)
df_added_features.tail()

Unnamed: 0_level_0,close_orcl,open_orcl,volume_orcl,close_ibm,open_ibm,volume_ibm,close_msft,open_msft,volume_msft,week_sin,...,quarter_sin,quarter_cos,close_orcl_rsi,close_ibm_rsi,close_orcl_macd,close_orcl_signal_line,close_orcl_histogram,close_ibm_macd,close_ibm_signal_line,close_ibm_histogram
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2025-01-27,158.279999,168.899994,42201600.0,224.130005,222.190002,4898400.0,434.559998,424.01001,35647800.0,0.0,...,1.0,1.0,44.665601,53.445089,-0.030118,-1.929834,1.899716,0.166577,-0.298608,0.465185
2025-01-28,164.0,162.990005,20319300.0,225.660004,224.320007,4485400.0,447.200012,434.600006,23491700.0,0.781831,...,1.0,1.0,48.949657,56.502838,-0.470843,-1.638036,1.167193,0.338213,-0.171244,0.509456
2025-01-29,162.020004,164.029999,9866600.0,228.630005,225.619995,7079800.0,442.329987,446.690002,23581400.0,0.974928,...,1.0,1.0,49.993656,59.464934,-0.968723,-1.504173,0.53545,0.705753,0.004156,0.701598
2025-01-30,170.380005,164.779999,14981700.0,258.269989,250.0,15381900.0,414.98999,418.769989,54586300.0,0.433884,...,1.0,1.0,54.208327,82.776029,-0.680865,-1.339512,0.658646,3.350113,0.673347,2.676766
2025-01-31,170.059998,170.410004,6430145.0,255.679993,256.049988,4547481.0,415.059998,418.730011,33325644.0,-0.433884,...,1.0,1.0,60.012867,84.095665,-0.473105,-1.16623,0.693125,5.177118,1.574101,3.603016


In [18]:
# final dateframe for training
df_added_features.describe()

Unnamed: 0,close_orcl,open_orcl,volume_orcl,close_ibm,open_ibm,volume_ibm,close_msft,open_msft,volume_msft,week_sin,...,quarter_sin,quarter_cos,close_orcl_rsi,close_ibm_rsi,close_orcl_macd,close_orcl_signal_line,close_orcl_histogram,close_ibm_macd,close_ibm_signal_line,close_ibm_histogram
count,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,...,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0,9798.0
mean,27.412283,27.407689,35164270.0,94.487395,94.450009,7189836.0,63.158106,63.14458,56212450.0,0.360677,...,-0.01102266,-0.01102266,52.967336,51.660787,0.120861,0.121337,-0.000476,0.134637,0.133994,0.000643
std,31.944608,31.952379,32638400.0,55.459996,55.423926,4542361.0,98.643352,98.627868,36827960.0,0.515456,...,0.7038013,0.7038013,16.0795,17.248504,0.864115,0.80755,0.274062,1.758767,1.643244,0.560121
min,0.041667,0.041667,388800.0,9.799235,9.799235,682829.0,0.090278,0.090278,2304000.0,-0.433884,...,-1.0,-1.0,5.002807,0.0,-5.686894,-4.931338,-2.654133,-11.371106,-9.583369,-3.904638
25%,3.203704,3.222222,14939280.0,32.952915,32.982792,4328400.0,5.882813,5.900391,31414100.0,0.0,...,-1.0,-1.0,41.612223,38.920324,-0.061694,-0.056576,-0.052168,-0.63672,-0.61244,-0.233865
50%,17.095,17.110001,29469250.0,97.390057,97.117588,6031446.0,27.49,27.43625,49462350.0,0.433884,...,-2.449294e-16,-2.449294e-16,53.168957,51.664749,0.016682,0.017158,1e-05,0.105677,0.096585,0.003031
75%,40.098436,40.09,45272450.0,136.852299,136.711288,8645740.0,47.59,47.590313,70271600.0,0.781831,...,1.224647e-16,1.224647e-16,64.701645,64.409734,0.269632,0.265975,0.049461,0.972415,0.94854,0.228896
max,192.429993,196.300003,1030963000.0,258.269989,256.049988,72639160.0,467.559998,467.0,788688000.0,0.974928,...,1.0,1.0,100.0,99.22648,8.117906,7.194285,3.520009,7.238391,6.813125,3.603016


In [88]:
# final training data info with columns
df_added_features.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9798 entries, 1986-03-14 to 2025-01-31
Data columns (total 23 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   close_orcl              9798 non-null   float64
 1   open_orcl               9798 non-null   float64
 2   volume_orcl             9798 non-null   float64
 3   close_ibm               9798 non-null   float64
 4   open_ibm                9798 non-null   float64
 5   volume_ibm              9798 non-null   float64
 6   close_msft              9798 non-null   float64
 7   open_msft               9798 non-null   float64
 8   volume_msft             9798 non-null   float64
 9   week_sin                9798 non-null   float64
 10  week_cos                9798 non-null   float64
 11  month_sin               9798 non-null   float64
 12  month_cos               9798 non-null   float64
 13  quarter_sin             9798 non-null   float64
 14  quarter_cos           

# 3) LinearRegression Model and Online/Incremental Training Pipeline

In [165]:
def create_online_learning_pipeline(df:pd.DataFrame, l2_val:float, window_size:int=1, learning_rate:float=0.03, intercept_lr_val:float=0.1):
    
    df['close_msft_shifted'] = df['close_msft'].shift(-1)
    df.drop(['close_msft'], axis=1, inplace=True)
    df.dropna(inplace=True)
    data = df.copy()
    print(data.shape)
    # creating stream dataset
    # for prediction  the close price of the MSFT stock
    y = data.pop('close_msft_shifted')
    X_y_stream_dataset = iter_pandas(data, y)
    
    
    # --------------------  ONLINE/INCREMENTAL LEARNING MODEL  --------------------

    # creating Online/Incremental model with the River library
    # added columns for training 
    selected_columns = set(list(data.columns))
    print("columns for training")
    print(selected_columns)
    print('\n')
    model = compose.Select(*selected_columns)
    
    # scaling (Actually, you need to use another scaler because there are negative values in the data, but it worked in my tests.)
    model |= preprocessing.MinMaxScaler()
    
    #final model
    model |= LinearRegression(l2=l2_val, intercept_lr = intercept_lr_val, optimizer=optim.Adam(learning_rate))

    
    # --------------------  TRAINING  --------------------
    # train metrics (rolling metrics for online learning)
    mae_metric = Rolling(metrics.MAE() , window_size = window_size)
    rmse_metric = Rolling(metrics.RMSE() , window_size = window_size)

    dates = data.index
    y_trues = []
    y_preds = []
    
    # training loop
    for x, y in X_y_stream_dataset:

        y_pred = model.predict_one(x)
        if y_pred < 0: y_pred = 0  # for minus value validation
        #learn only one sample
        model.learn_one(x, y)
        mae_metric.update(y, y_pred)
        rmse_metric.update(y, y_pred)
    
        y_trues.append(y)
        y_preds.append(y_pred)
    
    # final dataframe with predictions
    final_df = pd.DataFrame({'time': data.index,'true': y_trues, 'prediction': y_preds}, index = data.index)
    print(str(mae_metric) + ' , ' + str(rmse_metric))
    return final_df, mae_metric, rmse_metric

# 4) Prediction Plotting (day by day prediction)

In [166]:
# a function for plotting predictions and ground truths
# data -> data frame
# plot_title -> title of the graph
def plot_predictions(data, plot_title):
    fig = px.line(data, x='time', y=['prediction','true'])
    
    fig.update_layout(xaxis_range=['2024-01-01','2025-01-30'], title_text= plot_title +" - after January 2024")
    
    fig.update_xaxes(rangeslider_visible=True)
    fig.show()

# 5) Training

In [167]:
# training, the best params with optuna
preds, mae, rmse = create_online_learning_pipeline(df_added_features,
                                                   l2_val=0.14297374376783614,
                                                   window_size=1,
                                                   learning_rate=0.25435208069550486, 
                                                   intercept_lr_val=0.4975315164402574)

(9797, 23)
columns for training
{'volume_msft', 'open_msft', 'volume_orcl', 'close_orcl_macd', 'open_orcl', 'close_ibm', 'close_ibm_signal_line', 'close_ibm_histogram', 'quarter_cos', 'volume_ibm', 'week_sin', 'close_ibm_rsi', 'close_orcl_histogram', 'month_cos', 'week_cos', 'quarter_sin', 'open_ibm', 'close_orcl_rsi', 'close_orcl', 'month_sin', 'close_ibm_macd', 'close_orcl_signal_line'}


MAE: 0.000713 , RMSE: 0.000713


# 6) Prediction and Results

In [168]:
# predictions plot
plot_predictions(preds, 'the window size is 1 (daily)')

In [169]:
# random 10 predictions
# each prediction is estimated according to the data up to itself
preds.sample(15, random_state = 999).sort_values(by='date')

Unnamed: 0_level_0,time,true,prediction
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1988-10-10,1988-10-10,0.345486,0.33972
1989-05-22,1989-05-22,0.393229,0.546137
1994-02-16,1994-02-16,2.457031,2.252445
1995-06-05,1995-06-05,5.195313,7.135154
1995-06-15,1995-06-15,5.4375,4.527081
1996-03-04,1996-03-04,6.132813,6.82008
2006-02-23,2006-02-23,26.629999,26.609608
2011-02-16,2011-02-16,27.209999,27.331648
2011-08-18,2011-08-18,24.049999,24.74903
2012-03-16,2012-03-16,32.200001,32.116457


In [170]:
# the last 10 day predictions
# each prediction is estimated according to the data up to itself

preds[-10:]

Unnamed: 0_level_0,time,true,prediction
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2025-01-16,2025-01-16,429.029999,425.610967
2025-01-17,2025-01-17,428.5,430.840192
2025-01-21,2025-01-21,446.200012,427.765458
2025-01-22,2025-01-22,446.709991,448.707456
2025-01-23,2025-01-23,444.059998,449.079851
2025-01-24,2025-01-24,434.559998,445.57059
2025-01-27,2025-01-27,447.200012,432.527996
2025-01-28,2025-01-28,442.329987,447.715415
2025-01-29,2025-01-29,414.98999,442.802445
2025-01-30,2025-01-30,415.059998,415.060711


# 6) BONUS: Hyperparameter Optimization with Optuna

In [148]:
def create_online_learning_pipeline_for_optuna(df:pd.DataFrame, l2_val:float, window_size:int=1, learning_rate:float=0.03, intercept_lr_val:float=0.1):
  
    data = df.copy()
    # creating stream dataset
    # for prediction  the close price of the MSFT stock
    y = data.pop('close_msft_shifted')
    X_y_stream_dataset = iter_pandas(data, y)
    
    
    # --------------------  ONLINE/INCREMENTAL LEARNING MODEL  --------------------

    # creating Online/Incremental model with the River library
    # added columns for training 
    model = compose.Select(*set(list(data.columns)))
    
    # scaling (Actually, you need to use another scaler because there are negative values in the data, but it worked in my tests.)
    model |= preprocessing.MinMaxScaler()
    
    #final model
    model |= LinearRegression(l2=l2_val, intercept_lr = intercept_lr_val, optimizer=optim.Adam(learning_rate))

    

    # --------------------  TRAINING  --------------------
    # train metrics (rolling metrics for online learning)
    mae_metric = Rolling(metrics.MAE() , window_size = window_size)
    rmse_metric = Rolling(metrics.RMSE() , window_size = window_size)

    dates = data.index
    y_trues = []
    y_preds = []
    
    # training loop
    for x, y in X_y_stream_dataset:

        y_pred = model.predict_one(x)
        if y_pred < 0: y_pred = 0  # for minus value validation
        #learn only one sample
        model.learn_one(x, y)
        mae_metric.update(y, y_pred)
        rmse_metric.update(y, y_pred)
    
        y_trues.append(y)
        y_preds.append(y_pred)


    return mae_metric.get(), rmse_metric.get()

In [153]:
# objective function
# I tried to minimize the average of MAE and RMSE values to minimize the lost function. -> MINIMIZE [(mae_metric+rmse_metric)/2]


df_for_tuning = df_added_features.copy()    
df_for_tuning['close_msft_shifted'] = df_for_tuning['close_msft'].shift(-1)
df_for_tuning.drop(['close_msft'], axis=1, inplace=True)
df_for_tuning.dropna(inplace=True)

def objective_func(trial):
    window_size = trial.suggest_int('window_size', 1, 14)
    learning_rate = trial.suggest_float('learning_rate', 1e-2, 0.5, log=True)
    intercept_lr_val = trial.suggest_float('intercept_lr_val', 1e-1, 1.2, log=True)
    l2_val = trial.suggest_float('l2_val', 1e-1, 1.5, log=True)

    rmse_metric, mae_metric = create_online_learning_pipeline_for_optuna(df_for_tuning,l2_val=l2_val, window_size=window_size,learning_rate=learning_rate,intercept_lr_val=intercept_lr_val)
    return (mae_metric+rmse_metric)/2

study = optuna.create_study(direction='minimize')

[I 2025-02-02 08:10:57,133] A new study created in memory with name: no-name-f34a8b9d-0f95-4efc-9ab7-3ead5333672a


In [162]:
# trials
study.optimize(objective_func, n_trials=2500)

[I 2025-02-02 09:05:38,896] Trial 2000 finished with value: 16.94573729082552 and parameters: {'window_size': 2, 'learning_rate': 0.32334029762994104, 'intercept_lr_val': 0.48968166162329185, 'l2_val': 0.141421676964721}. Best is trial 1815 with value: 0.0007130734483666856.
[I 2025-02-02 09:05:40,569] Trial 2001 finished with value: 1.8486105100861323 and parameters: {'window_size': 1, 'learning_rate': 0.26012165732250453, 'intercept_lr_val': 0.526914021758965, 'l2_val': 0.15442742132659615}. Best is trial 1815 with value: 0.0007130734483666856.
[I 2025-02-02 09:05:42,195] Trial 2002 finished with value: 1.4795553602873497 and parameters: {'window_size': 1, 'learning_rate': 0.29578996853052114, 'intercept_lr_val': 0.4743499416457335, 'l2_val': 1.269021084178561}. Best is trial 1815 with value: 0.0007130734483666856.
[I 2025-02-02 09:05:43,833] Trial 2003 finished with value: 0.025279723203709636 and parameters: {'window_size': 1, 'learning_rate': 0.2481053783652559, 'intercept_lr_val'

In [163]:
# the best params
study.best_params #0.0007130734483666856

{'window_size': 1,
 'learning_rate': 0.25435208069550486,
 'intercept_lr_val': 0.4975315164402574,
 'l2_val': 0.14297374376783614}