# EQL (Equation Learner) Demo

This notebook demonstrates how to run the EQL-based symbolic regression pipeline. The `run_eql.py` script handles both training and evaluation.

### How it Works

The EQL model (`SymbolicNet`) is a neural network in which the usual activation functions are replaced by a library of primitive mathematical functions (e.g., identity `x`, `sin`, `x^2`, and the product `*`). The network learns how to combine these primitives to fit the data. An L1 penalty on the weights, scaled by the `--reg_weight` argument, pushes most connections toward zero. The trained network therefore uses as few connections as possible, and the extracted equation stays sparse, simple, and interpretable.
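The NumPy sketch below illustrates the idea for a single hidden layer. It is not the `SymbolicNet` implementation. The unit set (identity, `sin`, square, and one product unit), the layout of the pre-activations, and the helper names `eql_layer` and `l1_penalty` are assumptions made for illustration.

```python
import numpy as np

def eql_layer(x, W, b):
    """One EQL-style hidden layer (illustrative sketch, not SymbolicNet).

    The linear map produces 5 pre-activations per sample (assumed layout):
    columns 0-2 feed unary units (identity, sin, square),
    columns 3-4 feed one binary product unit.
    """
    z = x @ W + b                      # shape (n_samples, 5)
    identity = z[:, 0]
    sine = np.sin(z[:, 1])
    square = z[:, 2] ** 2
    product = z[:, 3] * z[:, 4]        # binary unit multiplies two pre-activations
    return np.stack([identity, sine, square, product], axis=1)

def l1_penalty(weights, reg_weight):
    """reg_weight times the sum of absolute values over all weight matrices."""
    return reg_weight * sum(np.abs(w).sum() for w in weights)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(8, 1))    # 8 samples, 1 input variable
W = rng.normal(size=(1, 5))
b = np.zeros(5)

h = eql_layer(x, W, b)
print(h.shape)                         # (8, 4)
print(l1_penalty([W], reg_weight=0.001))
```

Stacking such layers and adding a final linear output gives expressions like `2*x + sin(x) + x*sin(x)`; the L1 term is added to the training loss so that unused units end up with near-zero weights.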

## Step 1: Run on Dataset 1 (Known Formula)

First, we will test the model on **Dataset 1**, which is programmatically generated from the known formula: **$y = 2x + \sin(x) + x\sin(x)$**.
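For reference, a dataset like this can be sketched with NumPy as follows. The sampling range, sample count, seed, and the function names `true_formula` and `make_dataset_1` are hypothetical; `run_eql.py` may generate Dataset 1 differently.

```python
import numpy as np

def true_formula(x):
    """Dataset 1 target: y = 2x + sin(x) + x*sin(x)."""
    return 2 * x + np.sin(x) + x * np.sin(x)

def make_dataset_1(n_samples=100, x_min=-3.0, x_max=3.0, seed=0):
    """Sample x uniformly and evaluate the known formula (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, size=(n_samples, 1))
    y = true_formula(x)
    return x, y

x, y = make_dataset_1()
print(x.shape, y.shape)  # (100, 1) (100, 1)
```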

We use a small regularization weight (`--reg_weight 0.001`) so that the penalty does not noticeably shrink the true coefficients and the expression stays accurate.

```
# This command will:
# 1. Auto-generate Dataset 1 (selected by the 'dataset_1' keyword)
# 2. Run 5 trials (the default, passed explicitly here) and keep the best expression
# 3. Save the resulting plot to outputs/eql_plot_dataset_1.png
# 4. Save the expression to outputs/eql_expr_dataset_1.txt

!python ../scripts/run_eql.py --dataset_path dataset_1 --n_layers 2 --epochs 20000 --reg_weight 0.001 --trials 5
```

### Analysis 1: Dataset 1 Results

After the script finishes, check `logs/symbolic_regression.log` for the detailed trial-by-trial output. The plot saved to `outputs/eql_plot_dataset_1.png` should show the predicted function (blue line) almost perfectly overlapping the true test data (red dots). The discovered expression, saved to `outputs/eql_expr_dataset_1.txt`, should be very close to the true formula, for example:

```
Best Expression: 1.99*x_1 + 1.01*Product(x_1, sin(1.0*x_1)) + 0.98*sin(1.0*x_1)
```

This confirms that the EQL network can recover a known nonlinear formula: each coefficient in the example is within 0.02 of its true value (2, 1, and 1).
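A quick numerical check makes "very close" concrete. This sketch evaluates the example expression and the true formula on a grid; the grid range $[-3, 3]$ is an assumption, and `Product(a, b)` is transcribed as `a * b`.

```python
import numpy as np

def true_formula(x):
    """y = 2x + sin(x) + x*sin(x)"""
    return 2 * x + np.sin(x) + x * np.sin(x)

def discovered(x):
    # Example expression from the log, with Product(a, b) written as a * b
    return 1.99 * x + 1.01 * x * np.sin(1.0 * x) + 0.98 * np.sin(1.0 * x)

x = np.linspace(-3, 3, 601)  # hypothetical evaluation range
max_abs_err = np.max(np.abs(discovered(x) - true_formula(x)))
print(f"max |error| on [-3, 3]: {max_abs_err:.4f}")
```

The difference is $-0.01x + 0.01\,x\sin x - 0.02\sin x$, whose magnitude on $[-3, 3]$ is bounded by $0.03 + 0.0182 + 0.02 < 0.07$.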

## Step 2: Run on Dataset 2 (Hidden Formula)

Next, we run the model on `data/hidden_formula_dataset.csv`. This is a 2-variable dataset ($x_1, x_2$) where the formula is unknown.

We first run it with the same small regularization weight as before (`--reg_weight 0.001`) to see what happens.

```
# Note: --dataset_path now points to the CSV file 'data/hidden_formula_dataset.csv'
!python ../scripts/run_eql.py --dataset_path data/hidden_formula_dataset.csv --n_layers 2 --epochs 20000 --reg_weight 0.001 --trials 5
```

### Analysis 2: Overfitting

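This run is likely to produce a long, complicated, and uninterpretable expression. With only 50 data points and a weak L1 penalty, the network keeps many small but nonzero connections that help fit the data without being part of the underlying formula, so it overfits in terms of expression complexity. The log may still report a low test loss, but the expression is not simple. (No plot is produced because the input is 2D.)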
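Why a weak penalty leaves extra terms can be seen from the training objective. Assuming `run_eql.py` minimizes the usual EQL-style loss, mean squared error plus an L1 term scaled by `--reg_weight` (written $\lambda$ here) over the network weights $w_j$:

$$
\mathcal{L}(w) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat{y}(x_i; w)\bigr)^2 + \lambda \sum_{j} |w_j|
$$

Each weight costs $\lambda |w_j|$, so, roughly, a spurious connection survives whenever it lowers the MSE by more than that cost. With $\lambda = 0.001$, a weight of magnitude $0.1$ costs only $0.001 \times 0.1 = 10^{-4}$; with $\lambda = 0.01$ the same weight costs $10^{-3}$, ten times more.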

## Step 3 (Bonus): Finding the Simple Formula

Now run Dataset 2 again, this time with a **strong L1 regularization weight** (`--reg_weight 0.01`, ten times the previous value). This penalizes complexity more heavily and pushes the model toward the *simplest* expression that still fits the data.
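One standard way to see why a larger L1 weight yields sparser solutions is the proximal operator of the L1 penalty, soft thresholding, which shrinks every weight toward zero and zeroes out those smaller than the threshold. This is a swapped-in illustration, not necessarily what `run_eql.py` does: plain gradient descent on the L1 term shrinks weights in the same direction but does not produce exact zeros. The weights below are hypothetical, and the `reg_weight` values are used directly as thresholds (i.e., a step size of 1) purely for illustration.

```python
import numpy as np

def soft_threshold(w, threshold):
    """Proximal operator of threshold * |w|: shrink toward 0, zero out small weights."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

# Hypothetical weights: one large "true" weight plus several small spurious ones
w = np.array([4.99, 0.008, -0.004, 0.006, -0.009])

weak = soft_threshold(w, threshold=0.001)    # like --reg_weight 0.001
strong = soft_threshold(w, threshold=0.01)   # like --reg_weight 0.01
print(np.count_nonzero(weak), np.count_nonzero(strong))  # 5 1
```

The weak threshold keeps all five connections, while the strong one keeps only the large weight, at the cost of shrinking it slightly (4.99 to 4.98).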

```
# The --reg_weight flag is the key to finding the simple formula (0.01 instead of 0.001).
# We use the same 'data/hidden_formula_dataset.csv' path.
!python ../scripts/run_eql.py --dataset_path data/hidden_formula_dataset.csv --n_layers 2 --epochs 20000 --reg_weight 0.01 --trials 5
```

### Analysis 3: The Hidden Formula Revealed

By enforcing sparsity, the network prunes the unnecessary connections. The trials in `logs/symbolic_regression.log` should now converge to a much simpler expression, and the final output in `outputs/eql_expr_hidden_formula_dataset.txt` should look like:

```
Best Expression: 4.99*Product(x_1, Square(x_2))
```

This recovers the hidden formula: **$y = 5x_1x_2^2$** (the fitted coefficient 4.99 rounds to 5).
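Because the recovered expression has the same structure as the formula read off from it, the only discrepancy is the coefficient. A small NumPy check on hypothetical inputs (the range and seed are assumptions; 50 points to match the dataset size) shows that 4.99 versus 5 is a uniform 0.2% scale difference.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=50)   # hypothetical inputs
x2 = rng.uniform(-2, 2, size=50)

found = 4.99 * x1 * x2**2          # Product(x_1, Square(x_2)) with its fitted coefficient
target = 5 * x1 * x2**2            # formula obtained by rounding 4.99 to 5

mask = target != 0                 # avoid dividing by zero
rel_dev = (found[mask] - target[mask]) / target[mask]
print(np.allclose(rel_dev, -0.002))  # True: the two differ only by a constant 0.2% scale
```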