Symbolic Regression: 
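We can generalise the GBM stochastic differential equation to: 


$$ dS_t = f(S_t)\,dt + g(S_t)\,dW_t$$

<br> We can use symbolic regression to discover closed-form expressions for the drift $f$ and the diffusion $g$ from data. For GBM itself, $f(S) = \mu S$ and $g(S) = \sigma S$, which gives the closed-form solution:

$$S_t = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right]$$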
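
<br> To see how the two functions could be read off from price data, here is a short worked step. It assumes an Euler-Maruyama discretisation with step $\Delta t$ and takes the driving noise to be a Brownian motion $W_t$, as in the GBM solution above. Both are assumptions of this sketch rather than something the notebook states.

$$\Delta S_t = S_{t+\Delta t} - S_t \approx f(S_t)\,\Delta t + g(S_t)\,\Delta W_t, \qquad \Delta W_t \sim \mathcal{N}(0, \Delta t)$$

Taking conditional moments given $S_t$:

$$\mathbb{E}\left[\frac{\Delta S_t}{\Delta t} \,\middle|\, S_t\right] \approx f(S_t), \qquad \mathbb{E}\left[\frac{(\Delta S_t)^2}{\Delta t} \,\middle|\, S_t\right] \approx g(S_t)^2 + f(S_t)^2\,\Delta t$$

Under these assumptions, $\Delta S_t / \Delta t$ and $(\Delta S_t)^2 / \Delta t$ act as noisy regression targets for $f$ and $g^2$ when the input feature is the price level $S_t$. The second target carries a bias of order $\Delta t$. For GBM the two targets would be approximately $\mu S_t$ and $\sigma^2 S_t^2$.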



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from gplearn.genetic import SymbolicRegressor
import sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sympy import *
from sklearn.utils.random import check_random_state
import graphviz
import time
import yfinance as yf
from pysr import PySRRegressor


In [37]:
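ticker_appl = "AAPL"

ticker = yf.Ticker(ticker_appl)

# Download one year of daily price history
hist_data = ticker.history(period="1y")

hist_closing = np.array(hist_data["Close"])

# Daily price increments dS (differences, not percentage returns)
T = 1  # time step, measured in trading days
dS = np.diff(hist_closing)

# First and second moments of the increments per unit time
drift_approx_target = dS / T
diffusion_approx = (dS ** 2) / T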
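
The cell above builds the two targets from the increments alone. If the goal is to learn $f(S)$ and $g(S)$ as functions of the price, the regression input would need to be the price level at the start of each step. The following is a minimal sketch of that alignment, not the notebook's method. The helper name `build_sde_targets` and the toy prices are hypothetical and used only for illustration.

```python
import numpy as np

def build_sde_targets(prices, dt=1.0):
    """Align price levels with the increments that follow them.

    Returns (X, drift_target, diffusion_target), where X is a column of
    S_t values and the targets are dS/dt and dS**2/dt for the same step.
    """
    prices = np.asarray(prices, dtype=float)
    dS = np.diff(prices)                  # S_{t+1} - S_t
    X = prices[:-1].reshape(-1, 1)        # price level at the start of each step
    drift_target = dS / dt                # noisy estimate of f(S_t)
    diffusion_target = dS ** 2 / dt       # noisy estimate of g(S_t)**2
    return X, drift_target, diffusion_target

# Toy prices, made up for illustration (not market data)
toy_prices = [100.0, 102.0, 101.0, 105.0]
X, drift_target, diffusion_target = build_sde_targets(toy_prices)
```

With real data, `X` and one of the two targets could then be passed to a regressor's `fit(X, y)` call in place of the increment-only arrays.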


In [38]:
default_pysr_param = dict(
    populations = 30, 
    model_selection="best"
)

https://www.scribd.com/document/660463529/On-Numerical-Methods-for-Stochastic-SINDy-2023-25
<br>https://arxiv.org/html/2306.17814v2
<br>https://medium.com/@polanitzer/estimating-the-parameters-for-a-geometric-brownian-motion-stochastic-process-using-two-different-6c7cbdf20c8f


In [39]:
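# Feature matrix: the daily increments dS as a single column.
# The target passed to fit() below is dS**2, so the search recovers y = x0 * x0.
drift_approx = drift_approx_target.reshape(-1, 1)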

model_miu = PySRRegressor(
    niterations=40,  # < Increase me for better results
    binary_operators=["+", "*", "/", "-"],
    unary_operators=[
        "log",
        "exp",
        "sqrt"
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    **default_pysr_param,
)
model_miu.fit(drift_approx, diffusion_approx)




Expressions evaluated per second: 2.010e+05
Progress: 1061 / 1200 total iterations (88.417%)
════════════════════════════════════════════════════════════════════════════════════════════════════
───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           3.250e+03  0.000e+00  y = 18.707
3           1.231e-08  1.315e+01  y = x₀ * x₀
5           1.199e-08  1.300e-02  y = (x₀ * x₀) * 1
8           1.082e-08  3.442e-02  y = (6.9889e-13 / exp(x₀)) + (x₀ * x₀)
9           9.216e-09  1.600e-01  y = (x₀ * x₀) + (9.5981e-06 / (x₀ + 9.556))
11          9.207e-09  5.041e-04  y = -3.3956e-06 + ((x₀ * x₀) + (9.7972e-06 / (x₀ + 9.556))...
                                      )
13          9.207e-09  1.815e-05  y = (((x₀ * x₀) + (-4.1333e-06 / (x₀ + 9.5714))) + 0.44716...
                                      ) - 0.44716
15          9.028e-09  9.794e-03  y = ((x₀ * x₀) * 2.3204e-07) + ((x₀ * x₀) + (9.5981e

[ Info: Started!
[ Info: Final population:
[ Info: Results saved to:
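
The Score column in the table above appears consistent with the drop in log-loss per unit of added complexity between consecutive rows. The sketch below recomputes it from the printed (rounded) complexity and loss values of the first five rows, so small differences from the printed scores are expected.

```python
import numpy as np

# (complexity, loss) pairs copied from the first five rows of the table above
complexity = np.array([1, 3, 5, 8, 9])
loss = np.array([3.250e+03, 1.231e-08, 1.199e-08, 1.082e-08, 9.216e-09])

# Score = -(change in log loss) / (change in complexity); the first row has no
# predecessor, so its score is 0
score = np.zeros_like(loss)
score[1:] = -np.diff(np.log(loss)) / np.diff(complexity)

for c, s in zip(complexity, score):
    print(f"complexity {c}: score {s:.4g}")
```

The large score at complexity 3 reflects the jump from a constant to `x₀ * x₀`. The later rows add complexity for only marginal reductions in loss.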


Parameter         Value
model_selection   'best'
binary_operators  ['+', '*', ...]
unary_operators   ['log', 'exp', ...]
expression_spec
niterations       40
populations       30
population_size   27
max_evals
maxsize           30
maxdepth


In [48]:
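# Feature matrix: the squared increments dS**2 as a single column.
# The target passed to fit() below is the increment dS itself.
diff_approx = diffusion_approx.reshape(-1, 1)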

model_sigma = PySRRegressor(
    niterations=40,  # < Increase me for better results
    binary_operators=["+", "*", "/", "-"],
    unary_operators=[
        "log",
        "exp",
        "sqrt"
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    **default_pysr_param,
)
model_sigma.fit(diff_approx, drift_approx_target)




Expressions evaluated per second: 2.320e+05
Progress: 1122 / 1200 total iterations (93.500%)
════════════════════════════════════════════════════════════════════════════════════════════════════
───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           1.867e+01  0.000e+00  y = 0.19863
3           1.845e+01  5.970e-03  y = x₀ * 0.0085164
5           1.763e+01  2.270e-02  y = x₀ * (x₀ * 3.1644e-05)
6           1.602e+01  9.567e-02  y = exp(-6.6192 - (x₀ * -0.014235))
9           1.443e+01  3.493e-02  y = (-0.00030579 - (x₀ * -5.1646e-07)) * (x₀ * x₀)
11          1.406e+01  1.296e-02  y = (0.035476 - (((x₀ * -7.2439e-07) + 0.00049946) * x₀)) ...
                                      * x₀
13          1.405e+01  3.166e-04  y = ((((x₀ * -7.0245e-07) + 0.00048055) * x₀) - 0.032628) ...
                                      * (-4.2595 - x₀)
15          1.405e+01  2.444e-05  y = ((((x₀ * -7.0245e-0

[ Info: Started!
[ Info: Final population:
[ Info: Results saved to:
  - outputs\20251112_232023_5RJVxP\hall_of_fame.csv


In [50]:
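# Only the value of the last expression in a cell is displayed,
# so the output below is the equation selected by model_sigma.
model_miu.sympy()
model_sigma.sympy()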

x0*x0*(-(-5.1646066e-7)*x0 - 0.00030578606)
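
A symbolic expression like the one above can be turned into a numeric function with SymPy's `lambdify`. The sketch below re-enters the displayed expression by hand, with the double negative simplified, and evaluates it on a few illustrative inputs. The grid values are made up and are not data from the notebook.

```python
import numpy as np
import sympy as sp

x0 = sp.Symbol("x0")

# Expression copied from the output above; -(-a)*x0 is written as +a*x0
expr = x0 * x0 * (5.1646066e-7 * x0 - 0.00030578606)

# Convert the symbolic expression into a function that accepts NumPy arrays
predict = sp.lambdify(x0, expr, modules="numpy")

grid = np.array([0.0, 10.0, 100.0])   # illustrative inputs only
values = predict(grid)
print(values)
```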

In [None]:
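def geometricBrownianMotion(S0, mu, sigma, numofPaths, T, timeSteps):
    """Simulate GBM paths; returns an array of shape (timeSteps + 1, numofPaths)."""
    dt = T / timeSteps

    # Brownian increments dW ~ N(0, dt), one column per path after the transpose
    dW = np.random.normal(0, np.sqrt(dt), size=(numofPaths, timeSteps)).T

    # Exact log-price update for each step
    drift = (mu - (sigma ** 2) / 2) * dt
    diffusion = sigma * dW
    increments = np.exp(drift + diffusion)

    # Prepend a row of ones so every path starts at S0, then compound the increments
    St = np.vstack([np.ones(numofPaths), increments]).cumprod(axis=0) * S0

    return St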
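
As a sanity check on the closed-form solution, the log returns of a GBM path are normally distributed with mean $(\mu - \sigma^2/2)\,\Delta t$ and variance $\sigma^2 \Delta t$, so both parameters can be estimated from sample moments. The sketch below is one simple estimator under that assumption. The helper name `estimate_gbm_params` and the synthetic parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def estimate_gbm_params(prices, dt):
    """Moment estimates of (mu, sigma) from the log returns of a price path."""
    log_returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    sigma_hat = log_returns.std(ddof=1) / np.sqrt(dt)
    # The mean log return is (mu - sigma^2 / 2) * dt, so add the correction back
    mu_hat = log_returns.mean() / dt + 0.5 * sigma_hat ** 2
    return mu_hat, sigma_hat

# Synthetic path with known parameters (illustrative values, not fitted to AAPL)
rng = np.random.default_rng(0)
mu_true, sigma_true, dt, n_steps = 0.05, 0.2, 1 / 252, 50_000
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
log_path = np.cumsum((mu_true - 0.5 * sigma_true ** 2) * dt + sigma_true * dW)
prices = 100.0 * np.exp(np.concatenate([[0.0], log_path]))

mu_hat, sigma_hat = estimate_gbm_params(prices, dt)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

The volatility estimate tightens quickly as the number of observations grows. The drift estimate depends mainly on the total time span covered, so it stays noisy on short histories such as one year of daily closes.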