Symbolic Regression: 
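Geometric Brownian motion (GBM) can be generalised to the stochastic differential equation


$$ dS_t = f(S_t)\,dt + g(S_t)\,dW_t $$

<br> where $f$ is the drift, $g$ is the diffusion and $W_t$ is a Wiener process. We can use symbolic regression to find expressions for $f$ and $g$. For GBM itself, $f(S) = \mu S$ and $g(S) = \sigma S$, which gives the closed-form solution

$$S_t = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right]$$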
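
<br> The regression targets used later can be motivated by a short derivation. This is only a sketch, assuming the noise term is a Wiener increment $\Delta W \sim \mathcal{N}(0, \Delta t)$ and an Euler-Maruyama discretisation with step $\Delta t$ (both are assumptions, not stated above):

$$
\begin{aligned}
\Delta S &\approx f(S)\,\Delta t + g(S)\,\Delta W \\
\mathbb{E}[\Delta S \mid S] &\approx f(S)\,\Delta t \\
\mathbb{E}[(\Delta S)^2 \mid S] &\approx g(S)^2\,\Delta t + f(S)^2\,\Delta t^2
\end{aligned}
$$

For small $\Delta t$ the last term is negligible, so $\Delta S / \Delta t$ is a noisy sample of $f(S)$ and $(\Delta S)^2 / \Delta t$ is a noisy sample of $g(S)^2$.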



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from gplearn.genetic import SymbolicRegressor
import sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sympy import *
from sklearn.utils.random import check_random_state
import graphviz
import time
import yfinance as yf
from pysr import PySRRegressor


In [37]:
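ticker_symbol = "AAPL"

ticker = yf.Ticker(ticker_symbol)

# One year of daily price history
hist_data = ticker.history(period="1y")

hist_closing = hist_data["Close"].to_numpy()

# Daily price increments dS_t = S_{t+1} - S_t (price differences, not returns)
T = 1  # time step between observations, in trading days
dS = np.diff(hist_closing)

# Noisy samples of the drift f(S) and of the squared diffusion g(S)**2
drift_approx_target = dS / T
diffusion_approx = (dS**2) / T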
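
The targets above depend only on `dS`. To learn $f$ and $g$ as functions of the price level, each increment could be paired with the price at the start of its step. The following is a minimal sketch on a synthetic GBM path, so it runs without a network connection. The function name `fit_gbm_coefficients`, the parameter values, and the model forms $f(S) = \mu S$ and $g(S)^2 = \sigma^2 S^2$ are assumptions for illustration. Plain least squares stands in for symbolic regression here.

```python
import numpy as np


def fit_gbm_coefficients(prices, dt):
    """Least-squares fit of f(S) = mu * S and g(S)**2 = sigma**2 * S**2."""
    S = prices[:-1]          # price at the start of each step (the feature)
    dS = np.diff(prices)     # increment over that step

    drift_target = dS / dt           # noisy sample of f(S)
    diffusion_target = dS**2 / dt    # noisy sample of g(S)**2

    # Closed-form one-parameter least squares: argmin_a sum((y - a * x)**2)
    mu_hat = np.sum(S * drift_target) / np.sum(S**2)
    sigma_sq_hat = np.sum(S**2 * diffusion_target) / np.sum(S**4)
    return mu_hat, np.sqrt(sigma_sq_hat)


# Synthetic GBM path with known (hypothetical) parameters
rng = np.random.default_rng(0)
mu_true, sigma_true = 0.05, 0.2
dt, n_steps = 1 / 2520, 20_000
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
log_steps = (mu_true - 0.5 * sigma_true**2) * dt + sigma_true * dW
prices = 100.0 * np.exp(np.concatenate([[0.0], np.cumsum(log_steps)]))

mu_hat, sigma_hat = fit_gbm_coefficients(prices, dt)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

The diffusion coefficient is usually recovered well from a single path. The drift estimate is much noisier, because its standard error shrinks with the total time span covered rather than with the number of steps.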


In [38]:
default_pysr_param = dict(
    populations = 30, 
    model_selection="best"
)

https://www.scribd.com/document/660463529/On-Numerical-Methods-for-Stochastic-SINDy-2023-25
<br>https://arxiv.org/html/2306.17814v2
<br>https://medium.com/@polanitzer/estimating-the-parameters-for-a-geometric-brownian-motion-stochastic-process-using-two-different-6c7cbdf20c8f


In [60]:
# Input feature: the price increments dS as a column vector
drift_approx = drift_approx_target.reshape(-1, 1)

model_miu = PySRRegressor(
    niterations=40,  # increase for better results
    binary_operators=["+", "*", "/", "-"],
    unary_operators=[
        "log",
        "exp",
        "sqrt"
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    **default_pysr_param,
)
# The target dS**2 is an exact function of the feature dS,
# so the search is expected to find y = x0 * x0
model_miu.fit(drift_approx, diffusion_approx)



───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           3.250e+03  0.000e+00  y = 18.707
3           1.231e-08  1.315e+01  y = x₀ * x₀
5           1.214e-08  6.946e-03  y = exp(log(x₀ * x₀))
7           1.209e-08  1.901e-03  y = x₀ * ((x₀ + 1.1965e-06) * 1)
9           1.198e-08  4.802e-03  y = (x₀ / ((1 - x₀) + x₀)) * x₀
10          1.072e-08  1.111e-01  y = x₀ * ((exp(-33.99 - x₀) + 1) * x₀)
12          1.044e-08  1.322e-02  y = (exp(-33.843 - x₀) + 1) * ((x₀ + 4.431e-06) * x₀)
14          1.033e-08  5.080e-03  y = ((((exp(-33.843 - x₀) + 1) * x₀) + 1) * x₀) - x₀
───────────────────────────────────────────────────────────────────────────────────────────────────


[ Info: Started!
[ Info: Final population:
[ Info: Results saved to:
  - outputs\20251112_233418_nsYKZn\hall_of_fame.csv


Parameter         Value
model_selection   'best'
binary_operators  ['+', '*', ...]
unary_operators   ['log', 'exp', ...]
expression_spec
niterations       40
populations       30
population_size   27
max_evals
maxsize           30
maxdepth


In [59]:
# Input feature: the squared increments dS**2 as a column vector
diff_approx = diffusion_approx.reshape(-1, 1)

model_sigma = PySRRegressor(
    niterations=40,  # increase for better results
    binary_operators=["+", "*", "/", "-"],
    unary_operators=[
        "log",
        "exp",
        "sqrt"
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    **default_pysr_param,
)
# Fit the increments dS as a function of their squares dS**2
model_sigma.fit(diff_approx, drift_approx_target)

[ Info: Started!
[ Info: Final population:
[ Info: Results saved to:



───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           1.867e+01  0.000e+00  y = 0.19863
3           1.845e+01  5.970e-03  y = x₀ * 0.0085249
4           1.821e+01  1.272e-02  y = exp(x₀ * 0.0043938)
5           1.594e+01  1.336e-01  y = exp(sqrt(x₀)) * 9.4462e-11
7           1.593e+01  2.711e-04  y = (exp(sqrt(x₀)) * 9.4462e-11) - -0.10384
9           1.526e+01  2.129e-02  y = (exp(sqrt(x₀)) * 1.1919e-10) + (x₀ * -0.024105)
11          1.442e+01  2.853e-02  y = (2.7183 - x₀) * (((x₀ * -5.4875e-07) + 0.00032593) * x₀)
12          1.438e+01  2.521e-03  y = ((sqrt(x₀) - x₀) * ((x₀ * -5.4875e-07) + 0.00032593)) * x₀


Parameter         Value
model_selection   'best'
binary_operators  ['+', '*', ...]
unary_operators   ['log', 'exp', ...]
expression_spec
niterations       40
populations       30
population_size   27
max_evals
maxsize           30
maxdepth


In [50]:
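# Jupyter displays only the value of the last expression in a cell,
# so the output below is the equation selected by model_sigma.
model_miu.sympy()
model_sigma.sympy()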

x0*x0*(-(-5.1646066e-7)*x0 - 0.00030578606)
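
To inspect the selected expression numerically, it could be rewritten as a plain function. This is a minimal sketch using NumPy only. The function name `selected_expression` and the sample inputs are hypothetical, and the coefficients are copied from the output above. Within PySR itself, `model_sigma.predict(X)` evaluates the selected equation directly.

```python
import numpy as np


def selected_expression(x0):
    """x0*x0*(-(-5.1646066e-7)*x0 - 0.00030578606), coefficients from the output above."""
    return x0 * x0 * (5.1646066e-7 * x0 - 0.00030578606)


# Hypothetical inputs; in the notebook x0 is the squared increment dS**2
x0 = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])
print(selected_expression(x0))

# The factor in parentheses changes sign at this value of x0
root = 0.00030578606 / 5.1646066e-7
print(root)
```

The expression is a cubic in `x0`. It is negative for small positive `x0` and becomes positive once `x0` exceeds roughly 592.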

In [None]:
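def geometricBrownianMotion(S0, mu, sigma, numofPaths, T, timeSteps):
    """Simulate GBM paths; returns an array of shape (timeSteps + 1, numofPaths)."""
    dt = T/timeSteps

    # Brownian increments, one row per time step and one column per path
    dW = np.random.normal(0, np.sqrt(dt), size=(numofPaths, timeSteps)).T

    # Growth factor of each step from the exact GBM solution
    drift = (mu - ((sigma**2)/2)) * dt
    diffusion = sigma * dW
    increments = np.exp(drift + diffusion)

    # Prepend a row of ones so that every path starts at S0, then compound
    St = np.vstack([np.ones(numofPaths), increments]).cumprod(axis=0) * S0

    return St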
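
A short usage sketch for the function above, with a sanity check against the known GBM mean $\mathbb{E}[S_T] = S_0 e^{\mu T}$. The function is repeated in compact form so the block runs on its own. The parameter values and the seed are hypothetical choices for illustration.

```python
import numpy as np


def geometricBrownianMotion(S0, mu, sigma, numofPaths, T, timeSteps):
    # Same logic as the function above, repeated so this block is self-contained
    dt = T / timeSteps
    dW = np.random.normal(0, np.sqrt(dt), size=(numofPaths, timeSteps)).T
    increments = np.exp((mu - sigma**2 / 2) * dt + sigma * dW)
    return np.vstack([np.ones(numofPaths), increments]).cumprod(axis=0) * S0


np.random.seed(0)  # fixed seed so the run is reproducible
S0, mu, sigma, T = 100.0, 0.05, 0.2, 1.0  # hypothetical parameters
paths = geometricBrownianMotion(S0, mu, sigma, numofPaths=100_000, T=T, timeSteps=12)

print(paths.shape)  # (timeSteps + 1, numofPaths)

# Compare the simulated terminal mean with the theoretical value
empirical_mean = paths[-1].mean()
theoretical_mean = S0 * np.exp(mu * T)
print(empirical_mean, theoretical_mean)
```

Each column of `paths` is one simulated price path, and the first row holds the common starting value `S0`.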