# Symbolic DAG Search

We search for the computational DAG (directed acyclic graph) that has the lowest loss on the given input data.
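
To make the idea concrete, here is a minimal sketch of a computational DAG and its loss, written in plain Python. It does not use `dag_search`; the names `dag`, `evaluate`, and `mse` are hypothetical and chosen only for illustration.

```python
import math

# Toy DAG for f(x) = sin(x_0^2) + x_1 (illustrative, not the dag_search format).
# Each node maps to (operation, parent node names). The dict is written in
# topological order, so every parent is evaluated before its children.
dag = {
    "sq": (lambda a: a * a, ["x_0"]),
    "sin": (math.sin, ["sq"]),
    "out": (lambda a, b: a + b, ["sin", "x_1"]),
}

def evaluate(dag, inputs, output="out"):
    """Evaluate the DAG for one sample given a dict of input values."""
    values = dict(inputs)
    for name, (op, parents) in dag.items():
        values[name] = op(*(values[p] for p in parents))
    return values[output]

def mse(dag, X, y):
    """Mean squared error of the DAG's output over all samples."""
    preds = [evaluate(dag, {"x_0": a, "x_1": b}) for a, b in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

X = [(0.1, 0.2), (0.5, 0.3), (0.9, 0.7)]
y = [math.sin(a * a) + b for a, b in X]
loss = mse(dag, X, y)
print(loss)
```

A search procedure would generate many such graphs and keep the ones with the lowest loss.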

## Sklearn Interface for 1D Regression Problems

Note: here $y$ must be a 1D vector.

**Example**:

Symbolic Regression for the function

\begin{align}
f(x) = \sin(x_0^2) + x_1
\end{align}

In [18]:
import numpy as np
import dag_search
X = np.random.rand(100, 2)
y = np.sin(X[:, 0]**2) + X[:, 1]

In [19]:
est = dag_search.SDS(n_calc_nodes = 2, max_orders = int(1e4))
est.fit(X, y, verbose = 2)
est.model()

Creating evaluation orders


100%|██████████████████████████████████████████████████████████████████████████| 7200/7200 [00:00<00:00, 340117.00it/s]


Total orders: 405
Evaluating orders


 22%|██████████████▊                                                     | 88/405 [00:06<00:23, 13.44it/s, best_loss=0]

Found graph with loss 0.0

x_1 + sin(x_0**2)

## Advanced usage

**Example**: 

Symbolic Regression for the function

\begin{align}
f(x) = \begin{bmatrix}
0.5x\\
-x
\end{bmatrix}
\end{align}

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import dag_search

In [2]:
# Symbolic Regression: f(x) = 0.5x, -x
X = np.random.rand(100, 1)
c = 0.5
y = np.column_stack([c*X[:,0], -X[:,0]])


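m = X.shape[1]  # number of input dimensions
n = y.shape[1]  # number of output dimensions
k = 1           # number of constants in the graph (here: one, for c)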

loss_fkt = dag_search.MSE_loss_fkt(y)

### Exhaustive Search


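Pro:
- every graph in the search space is considered, so no good solution within that space is missed

Con:
- only feasible for shallow computational graphs, because the number of candidates grows quickly with the number of nodes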
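
The feasibility limit can be illustrated with a rough count. The sketch below counts syntactic expression trees (not the evaluation orders that `dag_search` enumerates) for an assumed toy vocabulary of 2 inputs, 3 unary operations, and 3 binary operations. The function `count_expressions` and these vocabulary sizes are hypothetical.

```python
def count_expressions(depth, n_leaves=2, n_unary=3, n_binary=3):
    """Count expression trees of depth <= `depth` over the given vocabulary."""
    count = n_leaves  # depth 0: a bare input variable
    for _ in range(depth):
        # A tree is a leaf, a unary op over a smaller tree,
        # or a binary op over two smaller trees.
        count = n_leaves + n_unary * count + n_binary * count ** 2
    return count

counts = [count_expressions(d) for d in range(4)]
print(counts)  # [2, 20, 1262, 4781720]
```

Even in this small setting the count grows from 20 to several million within two extra levels, which is why an exhaustive pass is practical only for shallow graphs.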

In [4]:
params = {
    'X' : X,
    'n_outps' : n,
    'loss_fkt' : loss_fkt,
    'k' : k,
    'n_calc_nodes' : 1,
    'n_processes' : 1,
    'topk' : 5,
    'opt_mode' : 'grid_zoom',
    'verbose' : 2,
    'max_orders' : 10000, 
    'stop_thresh' : 1e-4
}

res = dag_search.exhaustive_search(**params)

100%|████████████████████████████████████████████████████████████████████████████| 864/864 [00:00<00:00, 151146.09it/s]


Total orders: 305
Evaluating orders


  6%|████▏                                                               | 19/305 [00:02<00:32,  8.83it/s, best_loss=0]


In [5]:
res['graphs'][0].evaluate_symbolic(c = res['consts'][0])

[0.5*x_0, -x_0]

In [6]:
res['losses'][0]

0.0

### Sampling

Pro:
- much deeper graphs can be searched in a comparable amount of time

Con:
- only a sample of the search space is evaluated, so some good solutions will be missed
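
The trade-off can be sketched with a toy random search in plain Python. This is not how `dag_search.sample_search` works internally; the operation set `ops`, the chain representation, and all sizes below are assumptions made for illustration.

```python
import math
import random

rng = random.Random(0)
xs = [rng.random() for _ in range(100)]
ys = [math.sin(x * x) for x in xs]  # toy target: sin(x^2)

# Hypothetical vocabulary of bounded unary operations.
ops = {
    "id": lambda v: v,
    "neg": lambda v: -v,
    "sq": lambda v: v * v,
    "sin": math.sin,
    "cos": math.cos,
}
names = sorted(ops)
depth = 6
total = len(names) ** depth  # size of the full toy search space
n_samples = 200

def loss(chain):
    """Mean squared error of a chain of unary operations."""
    err = 0.0
    for x, y in zip(xs, ys):
        v = x
        for name in chain:  # apply the operations innermost first
            v = ops[name](v)
        err += (v - y) ** 2
    return err / len(xs)

seen = set()
best_loss, best_chain = math.inf, None
for _ in range(n_samples):
    chain = tuple(rng.choice(names) for _ in range(depth))
    seen.add(chain)
    current = loss(chain)
    if current < best_loss:
        best_loss, best_chain = current, chain

coverage = len(seen) / total  # fraction of the space actually visited
print(best_chain, best_loss, coverage)
```

With 200 samples from 15625 possible chains, less than 2% of the space is visited, so whether the exact expression is found depends on the random draw.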

In [7]:
np.random.seed(0)
params = {
    'X' : X,
    'n_outps' : n,
    'loss_fkt' : loss_fkt,
    'k' : k,
    'n_calc_nodes' : 5,
    'n_processes' : 1,
    'topk' : 5,
    'opt_mode' : 'grid_zoom',
    'verbose' : 2,
    'n_samples' : 10000,
    'stop_thresh' : 1e-4
}
res = dag_search.sample_search(**params)

Generating graphs


100%|██████████████████████████████████████████████████████████████████████████| 10000/10000 [00:09<00:00, 1003.37it/s]


Evaluating graphs


  6%|███▎                                                     | 584/10000 [00:00<00:09, 1000.88it/s, best_loss=9.47e-7]


In [8]:
res['graphs'][0].evaluate_symbolic(c = res['consts'][0])

[0.497571047891727*x_0, -x_0]

In [9]:
res['losses'][0]

9.470318754244756e-07
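
The remaining loss is explained by the slightly inexact constant. The check below is a sketch in plain Python under two assumptions: the loss is the mean squared error averaged over all entries of both outputs, and the inputs are uniform on [0, 1]. It draws fresh samples rather than reusing the notebook's `X`, so it reproduces only the order of magnitude.

```python
import random

rng = random.Random(0)
xs = [rng.random() for _ in range(100)]  # fresh uniform samples, not the notebook's X

c_true = 0.5
c_found = 0.497571047891727  # constant reported by the sampling run

# Only the first output depends on the constant; the second (-x) is matched exactly.
sq_err_first = [((c_found - c_true) * x) ** 2 for x in xs]
mse = sum(sq_err_first) / (2 * len(xs))  # average over 2 outputs per sample

# Closed-form expectation for x ~ U(0, 1), where E[x^2] = 1/3.
expected = (c_found - c_true) ** 2 / 3 / 2
print(mse, expected)
```

The closed-form value is about 9.8e-7, the same order of magnitude as the reported loss, and below the `stop_thresh` of 1e-4 that ended the search.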