# Symbolic regression with PySR



> PySR is an open-source library for practical symbolic regression, a type of machine learning
> which aims to discover human-interpretable symbolic models. Symbolic Regression is a supervised
> learning task where the model space is spanned by analytic expressions. In this family of
> algorithms, instead of fitting concrete parameters in some overparameterized general model,
> one searches the space of simple analytic expressions for accurate and interpretable models.
>
> -- Cranmer, *Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl*
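The sketch below makes the idea of "searching the space of simple analytic expressions" concrete with a deliberately tiny brute-force search. It is not PySR's algorithm, which runs a multi-population evolutionary search over expression trees.

Instead, it enumerates every candidate of the form `c * (xi op xj)` over a hypothetical four-operator grammar. For each candidate it fits the scale `c` by least squares, then ranks all candidates by mean squared error on synthetic data. The names `OPERATORS` and `toy_symbolic_regression` are illustrative only.

```python
import itertools

import numpy as np

# Hypothetical toy grammar: four binary operators applied to a pair of variables.
OPERATORS = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "/": np.divide,
}
COMMUTATIVE = {"+", "*"}


def toy_symbolic_regression(X, y, names):
    """Enumerate c * (xi op xj), fit c by least squares, return (mse, expr) sorted by MSE."""
    results = []
    pairs = itertools.product(range(X.shape[1]), repeat=2)
    for (i, j), (sym, op) in itertools.product(pairs, OPERATORS.items()):
        if sym in COMMUTATIVE and i > j:
            continue  # x1 * x0 duplicates x0 * x1
        with np.errstate(divide="ignore", invalid="ignore"):
            z = op(X[:, i], X[:, j])
        if not np.all(np.isfinite(z)) or not np.any(z):
            continue  # skip undefined or identically zero candidates such as x0 - x0
        c = np.dot(z, y) / np.dot(z, z)  # closed-form least-squares scale
        mse = np.mean((y - c * z) ** 2)
        results.append((mse, f"{c:.4g} * ({names[i]} {sym} {names[j]})"))
    return sorted(results)


# Synthetic data with a known answer, y = -0.18 * x0 * x1
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 2.0, size=(200, 2))
y_demo = -0.18 * X_demo[:, 0] * X_demo[:, 1]
best_mse, best_expr = toy_symbolic_regression(X_demo, y_demo, ["x0", "x1"])[0]
print(best_expr, best_mse)
```

Exhaustive enumeration explodes combinatorially as the grammar and expression depth grow. This is why practical tools such as PySR use guided search instead.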

```python
%load_ext autoreload
%autoreload 2

# Project-local module; provides GetInputData and GetC0andC2 used below
from LoadData import *

from pathlib import Path
import sys
import time
from random import randrange

import numpy as np
import sympy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from joblib import dump, load
from pysr import PySRRegressor

plt.rcParams.update({'font.size': 22})
plt.interactive(True)
plt.close('all')
```





## Load input data for fully developed channel flow, $Re = 550$

```python
# Wall-unit window reported by GetInputData: minYplus <= y+ <= maxYplus
minYplus = 30
maxYplus = 2200

y_DNS, yplus_DNS, u_DNS, uu_DNS, vv_DNS, ww_DNS, uv_DNS, k_DNS, eps_DNS, dudy_DNS = GetInputData('FullyDevelopedChannel_Re550', minYplus, maxYplus)
# Returns c = [c0, c2], a11 and a33
c, a11_DNS, a33_DNS = GetC0andC2(k_DNS, eps_DNS, dudy_DNS, uu_DNS, vv_DNS, ww_DNS)
# Turbulent time scale tau = k / epsilon
T = np.abs(k_DNS / eps_DNS)
```

```
Returning data from: FullyDevelopedChannel_Re550. Min yplus: 30. Max yplus: 2200
Returning c = [c0, c2], a11 and a33
```
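`GetInputData` comes from the project-local `LoadData` module, whose source is not shown here. Assume it keeps only the samples with `minYplus <= y+ <= maxYplus`. Under that assumption, a minimal sketch of the windowing step, followed by the time-scale computation `T = |k/ε|` from the cell above, could look like this.

`select_yplus_window` is a hypothetical helper, and the profile values are made-up illustrative numbers, not DNS data.

```python
import numpy as np


def select_yplus_window(yplus, profiles, min_yplus, max_yplus):
    """Hypothetical helper: keep only samples with min_yplus <= y+ <= max_yplus.

    `profiles` maps names to 1-D arrays sampled at the same y+ points as `yplus`.
    """
    mask = (yplus >= min_yplus) & (yplus <= max_yplus)
    return yplus[mask], {name: arr[mask] for name, arr in profiles.items()}


# Made-up profile values, only to exercise the helper (not DNS data).
yplus = np.array([5.0, 30.0, 100.0, 500.0, 3000.0])
k = np.array([2.0, 4.0, 3.5, 2.5, 1.0])
eps = np.array([0.2, 0.05, 0.02, 0.004, 0.001])

yplus_w, sel = select_yplus_window(yplus, {"k": k, "eps": eps}, 30, 2200)
tau = np.abs(sel["k"] / sel["eps"])  # turbulent time scale k / epsilon
print(yplus_w, tau)
```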


![image-2.png](attachment:image-2.png) 
![image-3.png](attachment:image-3.png)


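Let's start with the simplest case: the Reynolds shear stress $\overline{v_1' v_2'}$ in a pure shear flow. We model it as a function of the turbulent time scale $\tau = k/\varepsilon$ and the mean velocity gradient $\partial \overline{v}_1/\partial x_2$.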



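```python
# Features: x0 = tau = k/eps, x1 = dU/dy; target: uv
X = np.array([T, dudy_DNS]).T
y = uv_DNS

# Settings shared by every fit below
default_pysr_params = dict(
    populations=50,
    model_selection="best",
)

# Learn equations: multiplication is the only operator, no unary operators
model = PySRRegressor(
    niterations=50,
    binary_operators=["*"],
    unary_operators=[],
    **default_pysr_params,
)

model.fit(X, y)
```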


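The `model_selection="best"` setting decides which equation from PySR's hall of fame becomes the model's final answer. The hall of fame holds one best equation per complexity.

According to the PySR documentation, each equation's score is the negative slope of log-loss versus complexity. `"best"` then picks the highest-scoring equation among those whose loss is within 1.5x of the lowest loss.

The sketch below re-implements that documented rule in plain Python on a hypothetical hall of fame with made-up losses. PySR's internal implementation may differ in details, and `select_best` is an illustrative name.

```python
import math


def select_best(hall_of_fame, loss_factor=1.5):
    """Pick an equation following the documented rule for model_selection="best".

    hall_of_fame: list of (complexity, loss, equation) tuples, one per complexity.
    Score = negative slope of log(loss) versus complexity; among equations whose
    loss is within loss_factor of the lowest loss, the highest score wins.
    """
    rows = sorted(hall_of_fame)  # ascending complexity
    scored = []
    prev_complexity, prev_loss = None, None
    for complexity, loss, eq in rows:
        if prev_loss is None:
            score = 0.0  # simplest equation has no predecessor to compare with
        elif loss <= 0.0:
            score = math.inf  # exact fit
        else:
            score = -math.log(loss / prev_loss) / (complexity - prev_complexity)
        scored.append((score, complexity, loss, eq))
        prev_complexity, prev_loss = complexity, loss
    threshold = loss_factor * min(loss for _, _, loss, _ in scored)
    candidates = [row for row in scored if row[2] <= threshold]
    return max(candidates, key=lambda row: row[0])


# Hypothetical hall of fame (made-up losses) for features x0 = tau, x1 = dudy.
hof = [
    (1, 0.50, "x1"),
    (3, 0.020, "x0 * x1"),
    (5, 0.0040, "-0.18 * x0 * x1"),
    (7, 0.0039, "-0.18 * x0 * (x1 * 1.01)"),
]
score, complexity, loss, eq = select_best(hof)
print(eq, round(score, 3))
```

In this toy table, `x0 * x1` has the largest score but is excluded by the loss threshold. The complexity-7 equation is within the threshold but barely improves the loss, so the complexity-5 equation wins.

On a fitted model, the real table is available as `model.equations_`, a pandas DataFrame with `complexity`, `loss` and `score` columns. The selected expression is returned by `model.sympy()`.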


```python
# Repeat the fit for the fully developed channel at Re = 5200
y_DNS, yplus_DNS, u_DNS, uu_DNS, vv_DNS, ww_DNS, uv_DNS, k_DNS, eps_DNS, dudy_DNS = GetInputData('FullyDevelopedChannel_Re5200', minYplus, maxYplus)
c, a11_DNS, a33_DNS = GetC0andC2(k_DNS, eps_DNS, dudy_DNS, uu_DNS, vv_DNS, ww_DNS)
T = np.abs(k_DNS / eps_DNS)
X = np.array([T, dudy_DNS]).T
y = uv_DNS

# Learn equations with the same settings as above
model = PySRRegressor(
    niterations=50,
    binary_operators=["*"],
    unary_operators=[],
    **default_pysr_params,
)

model.fit(X, y)
```

```
Returning data from: FullyDevelopedChannel_Re5200. Min yplus: 30. Max yplus: 2200
Returning c = [c0, c2], a11 and a33
```




```python
# Repeat the fit for the boundary-layer data
y_DNS, yplus_DNS, u_DNS, uu_DNS, vv_DNS, ww_DNS, uv_DNS, k_DNS, eps_DNS, dudy_DNS = GetInputData('BoundaryLayer', minYplus, maxYplus)
c, a11_DNS, a33_DNS = GetC0andC2(k_DNS, eps_DNS, dudy_DNS, uu_DNS, vv_DNS, ww_DNS)
T = np.abs(k_DNS / eps_DNS)
X = np.array([T, dudy_DNS]).T
y = uv_DNS

# Learn equations with the same settings as above
model = PySRRegressor(
    niterations=50,
    binary_operators=["*"],
    unary_operators=[],
    **default_pysr_params,
)

model.fit(X, y)
```

```
Returning data from: BoundaryLayer. Min yplus: 30. Max yplus: 2200
Returning c = [c0, c2], a11 and a33
```


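The fits use `binary_operators=["*"]` and no unary operators. Under that setting, every expression PySR can build from the two features and constants is a product of the form $c\,\tau^{a} (\partial \overline{v}_1/\partial x_2)^{b}$, with non-negative integer exponents $a$ and $b$.

A quick, independent check of which exponents the data favor is an ordinary least-squares fit in log space, which allows real-valued exponents. This swapped-in technique (log-linear power-law regression) is not what PySR does.

The sketch assumes `tau` and `dudy` are positive and that `y` is nonzero with a single sign. Points where the shear stress or the gradient vanish, for example near the channel centerline, would have to be dropped first. `fit_power_law` is an illustrative name.

```python
import numpy as np


def fit_power_law(tau, dudy, y):
    """Fit |y| = |c| * tau**a * dudy**b by least squares in log space.

    Assumes tau > 0, dudy > 0 and y nonzero with a single sign.
    Returns (c, a, b) with the sign of y restored on c.
    """
    A = np.column_stack([np.ones_like(tau), np.log(tau), np.log(dudy)])
    coef, *_ = np.linalg.lstsq(A, np.log(np.abs(y)), rcond=None)
    log_c, a, b = coef
    return np.sign(y[0]) * np.exp(log_c), a, b


# Synthetic check with a known answer: y = -0.18 * tau * dudy
rng = np.random.default_rng(1)
tau = rng.uniform(5.0, 50.0, 100)
dudy = rng.uniform(0.01, 0.5, 100)
c_fit, a_fit, b_fit = fit_power_law(tau, dudy, -0.18 * tau * dudy)
print(round(c_fit, 4), round(a_fit, 3), round(b_fit, 3))
```

With the notebook's filtered arrays, the call would be `fit_power_law(T, dudy_DNS, uv_DNS)`. Exponents near 1 for both features would be consistent with a product of $\tau$ and the velocity gradient.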


We did not prescribe a numerical value for $c_\mu$. With multiplication as the only allowed operator, PySR selects an expression of the form

$$\overline{v_1' v_2'} = - c_\mu \tau \frac{\partial \overline{v}_1}{\partial x_2}, \qquad \tau = \frac{k}{\varepsilon},$$

where the fitted value of $c_\mu$ is:

| Parameter | Fully developed channel, $Re = 550$ | Fully developed channel, $Re = 5200$ | Boundary layer |
| --- | --- | --- | --- |
| $c_\mu$ | 0.18145 | 0.18345 | 0.16505 |
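Because the selected expression is linear in $c_\mu$, the constant can also be estimated in closed form. Let $z_i = \tau_i \, (\partial \overline{v}_1/\partial x_2)_i$. Minimizing $\sum_i \left(\overline{v_1'v_2'}_i + c_\mu z_i\right)^2$ over $c_\mu$ gives $c_\mu = -\sum_i \overline{v_1'v_2'}_i \, z_i / \sum_i z_i^2$.

The sketch below implements that formula and checks it on synthetic data with a known constant. It should agree with PySR's value only insofar as PySR's loss is the plain squared error and its constant optimizer has converged. `least_squares_cmu` is an illustrative name.

```python
import numpy as np


def least_squares_cmu(tau, dudy, uv):
    """Closed-form least-squares c_mu for the model uv = -c_mu * tau * dudy."""
    z = tau * dudy
    return -np.dot(uv, z) / np.dot(z, z)


# Synthetic data with a known constant (0.18) plus small Gaussian noise
rng = np.random.default_rng(2)
tau = rng.uniform(5.0, 50.0, 500)
dudy = rng.uniform(0.01, 0.5, 500)
uv = -0.18 * tau * dudy + rng.normal(0.0, 1e-3, 500)
c_mu = least_squares_cmu(tau, dudy, uv)
print(c_mu)
```

With the notebook's variables, `least_squares_cmu(T, dudy_DNS, uv_DNS)` can be compared with the corresponding column of the table for the case currently loaded.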