# Environment setup

Install symbolfit via pip:

In [None]:
!pip install symbolfit

Then import PySR, which installs its dependencies on first import (this can take a few minutes):

In [None]:
import pysr

After installation, import all other packages:

In [None]:
from symbolfit.symbolfit import *
from pysr import PySRRegressor

# A simple 1D dataset

Five inputs are needed, each of which can be a Python list or a NumPy array:

1. ``x``: independent variable (bin center).

2. ``y``: dependent variable.

3. ``y_up``: upward uncertainty in y per bin.

4. ``y_down``: downward uncertainty in y per bin.

5. ``bin_widths_1d``: bin widths for x.

- Elements of both y_up and y_down should be non-negative.
- These values are the "delta" in y:
  - y + y_up = y shifted up by one standard deviation,
  - y - y_down = y shifted down by one standard deviation.
- If the dataset has no uncertainties, set y_up and y_down to arrays of ones with the same shape as x.
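As a minimal sketch of the conventions above, the snippet below builds the fallback uncertainties for a dataset without measured errors and shows how the deltas translate into a one-standard-deviation band. The names `band_high` and `band_low` are illustrative only and are not SymbolFit inputs.

```python
import numpy as np

x = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [3, 2.8, 2.7, 2.7, 2.8, 2.6, 2.1, 1.7, 1]

# No measured uncertainties: fall back to ones with the same shape as x.
y_up = np.ones_like(x, dtype=float)
y_down = np.ones_like(x, dtype=float)

# y_up and y_down are deltas, so the one-standard-deviation band is
# obtained by adding and subtracting them from y (illustrative names).
band_high = np.asarray(y, dtype=float) + y_up
band_low = np.asarray(y, dtype=float) - y_down
```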

In [None]:
# A simple 1D binned data.
x = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [3, 2.8, 2.7, 2.7, 2.8, 2.6, 2.1, 1.7, 1]
y_up = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05]
y_down = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05]
bin_widths_1d = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

Plot the dataset to see what we will be fitting to:

In [None]:
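# numpy and matplotlib may already be in scope from the star import above; import them explicitly to be safe.
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(figsize = (6, 4))
# Asymmetric y errors are passed as [lower, upper]; the x error bars span half a bin width on each side.
plt.errorbar(np.array(x).flatten(),
             np.array(y).flatten(),
             yerr = [np.array(y_down).flatten(), np.array(y_up).flatten()],
             xerr = np.array(bin_widths_1d)/2,
             fmt = '.', c = 'black', ecolor = 'grey', capsize = 0,
            )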

## Configure the fit

### Configure PySR to define the function space searched by symbolic regression

In [None]:
pysr_config = PySRRegressor(
    model_selection = 'accuracy',
    niterations = 100,
    maxsize = 40,
    binary_operators = [
        '+', '*', '/', '^'
                     ],
    unary_operators = [
        'exp',
        'tanh',
    ],
    nested_constraints = {
        'exp':    {'exp': 0, 'tanh': 0, '*': 2, '/': 1, '^': 1},
        'tanh':   {'exp': 0, 'tanh': 0, '*': 2, '/': 1, '^': 1},
        '*':      {'exp': 1, 'tanh': 1, '*': 2, '/': 1, '^': 1},
        '^':      {'exp': 1, 'tanh': 1, '*': 2, '/': 1, '^': 0},
        '/':      {'exp': 1, 'tanh': 1, '*': 2, '/': 0, '^': 1},
    },
    loss='loss(y, y_pred, weights) = (y - y_pred)^2 * weights',
)

Here, we allow four binary operators (+, *, /, ^) and two unary operators (exp, tanh) when searching for functional forms.

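Nested constraints are imposed to prohibit certain compositions, e.g., exp(exp(x)). Each entry maps an outer operator to the maximum number of times each inner operator may be nested inside it, so a value of 0 forbids that nesting entirely.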
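To make the meaning of the nested-constraint dictionary concrete, here is a toy checker written in plain Python. It is not PySR's implementation: expressions are represented as hypothetical nested tuples `(operator, arg1, arg2, ...)`, and the helper names `nesting_depth` and `satisfies` are made up for this sketch. It assumes a limit of `n` means the inner operator may appear at most `n` levels deep inside the outer one.

```python
def nesting_depth(node, op):
    """Largest number of `op` nodes found along any path down through `node`."""
    if not isinstance(node, tuple):
        return 0  # leaves are variables or constants
    head, *args = node
    deepest = max((nesting_depth(arg, op) for arg in args), default=0)
    return deepest + (1 if head == op else 0)


def satisfies(node, constraints):
    """Check every operator in the tree against its nesting limits."""
    if not isinstance(node, tuple):
        return True
    head, *args = node
    for inner, limit in constraints.get(head, {}).items():
        if any(nesting_depth(arg, inner) > limit for arg in args):
            return False
    return all(satisfies(arg, constraints) for arg in args)


# A subset of the constraints used in the configuration above.
constraints = {
    'exp':  {'exp': 0, 'tanh': 0},
    'tanh': {'exp': 0, 'tanh': 0},
}

allowed = satisfies(('exp', ('*', 'x', 2.0)), constraints)   # exp(x * 2)
rejected = satisfies(('exp', ('exp', 'x')), constraints)     # exp(exp(x))
```

In this toy model, `allowed` is `True` and `rejected` is `False`, which matches the intent of forbidding exp(exp(x)).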

The loss function is a weighted MSE, where, by default, SymbolFit derives the weights from the squared uncertainties.
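The loss string passed to PySR can be mirrored in NumPy as a rough sketch. The function name `weighted_mse` is hypothetical, the per-point terms are aggregated with a plain mean as an assumption (PySR's own aggregation may differ), and how SymbolFit turns y_up and y_down into weights is not reproduced here.

```python
import numpy as np


def weighted_mse(y, y_pred, weights):
    """Mean of the per-point terms (y - y_pred)^2 * weights."""
    y, y_pred, weights = (np.asarray(a, dtype=float) for a in (y, y_pred, weights))
    per_point = (y - y_pred) ** 2 * weights  # same expression as the loss string
    return per_point.mean()  # assumption: plain mean over all points


# Illustrative values only: the second point is off by 2 and carries weight 0.5.
loss = weighted_mse([1.0, 2.0], [1.0, 4.0], [1.0, 0.5])
```

With all weights equal to one, this reduces to the ordinary MSE.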

For PySR options, please see:
- https://github.com/MilesCranmer/PySR
- https://astroautomata.com/PySR/

### Configure SymbolFit with the PySR config and for the re-optimization process

In [None]:
model = SymbolFit(
    # Dataset: x, y, y_up, y_down.
    x = x,
    y = y,
    y_up = y_up,
    y_down = y_down,
    # PySR configuration of function space.
    pysr_config = pysr_config,
    # Constrain the maximum function size and overwrite maxsize in pysr_config.
    max_complexity = 25,
    # Whether to scale input x to be within 0 and 1 during fits for stability, as large x could lead to overflow.
    input_rescale = False,
    # Whether to scale y during fits for stability (when input_rescale is True): None / 'mean' / 'max' / 'l2'.
    scale_y_by = None,
    # Set a maximum standard error (%) for all parameters to avoid bad fits during re-optimization (will re-parameterize and re-fit with fewer parameters when the errors are too large).
    max_stderr = 20,
    # Consider y_up and y_down to weight the MSE loss during SR search and re-optimization.
    fit_y_unc = True,
    # Set a random seed for returning the same batch of functional forms every time (single-threaded), otherwise set None to explore more functions every time.
    random_seed = 12345,
    # Custom loss weights to replace y_up and y_down.
    loss_weights = None
)
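The `input_rescale` option above is described as scaling x to lie within 0 and 1. A minimal sketch of what such a rescaling could look like is shown below, assuming ordinary min-max scaling; the helper `minmax_scale` is hypothetical and SymbolFit's internal transformation may differ in detail.

```python
import numpy as np


def minmax_scale(values):
    """Map values linearly onto [0, 1]; a constant array maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


x = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
x_scaled = minmax_scale(x)  # smallest bin center maps to 0, largest to 1
```

Keeping the inputs in a bounded range is what guards against overflow in terms such as exp(x) when the raw x values are large.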

## Symbol fit it!

Run the fit: SR fit for functional forms -> parameterization -> re-optimization fit for improved best-fits and uncertainty estimation -> evaluation.

In [None]:
model.fit()

## Save results to output files

Save results to csv tables:

- ``candidates.csv``: saves all candidate functions and evaluations in a csv table.
- ``candidates_reduced.csv``: saves a reduced version for essential information without intermediate results.

In [None]:
model.save_to_csv(output_dir = 'output_dir/')
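Once the tables are written, they can be inspected with the standard library. The sketch below assumes the file ends up at `output_dir/candidates.csv`; it makes no assumption about the column names and simply lists whatever the file contains. The helper `read_candidates` is hypothetical.

```python
import csv
from pathlib import Path


def read_candidates(csv_path):
    """Return (column_names, rows) from a CSV file, or None if it does not exist."""
    path = Path(csv_path)
    if not path.is_file():
        return None
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows


result = read_candidates("output_dir/candidates.csv")  # assumed location
if result is not None:
    columns, rows = result
    print(f"{len(rows)} candidates, columns: {columns}")
```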

Plot results to pdf files:

- ``candidates.pdf``: plots all candidate functions with associated uncertainties one by one for fit quality evaluation.
- ``candidates_sampling.pdf``: plots all candidate functions with total uncertainty coverage generated by sampling parameters.
- ``candidates_gof.pdf``: plots the goodness-of-fit scores.
- ``candidates_correlation.pdf``: plots the correlation matrices for the parameters of the candidate functions.

In [None]:
model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = bin_widths_1d,
    #bin_edges_2d = bin_edges_2d,
    plot_logy = False,
    plot_logx = False,
    sampling_95quantile = False
)

Download ``output_dir`` from the tab on the left and see the results!

# A simple 2D dataset