# Transformer (Seq2Seq) Demo

This notebook demonstrates the complete 3-stage pipeline for the Transformer-based symbolic regression model.

## How it Works

This model treats symbolic regression as a **translation task**. 

1.  **Encoder:** Reads the entire set of 50 `(x, y)` data points and summarizes them into a single "context vector".
2.  **Decoder:** Takes that context vector and "translates" it into a sequence of mathematical tokens (e.g., `mul`, `C`, `x1`, `sq`, `x2`).
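
To make the decoder's output concrete, here is a minimal sketch of how a token sequence could be turned back into an expression tree. It assumes the tokens are emitted in prefix (Polish) notation and that operators have the arities listed in the hypothetical `ARITY` table. The project's real vocabulary and decoding code may differ.

```python
# Arity of each operator token. The names follow the tokens mentioned above,
# but this table is an assumption, not the project's actual vocabulary.
ARITY = {"add": 2, "mul": 2, "sin": 1, "sq": 1}


def parse_prefix(tokens):
    """Turn a prefix-notation token list into a nested tuple."""

    def parse(pos):
        tok = tokens[pos]
        pos += 1
        n_args = ARITY.get(tok, 0)  # leaves such as C, x1, x2 take no arguments
        if n_args == 0:
            return tok, pos
        args = []
        for _ in range(n_args):
            arg, pos = parse(pos)
            args.append(arg)
        return (tok, *args), pos

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return tree


tokens = ["mul", "C", "mul", "x1", "sq", "x2"]
tree = parse_prefix(tokens)
print(tree)  # ('mul', 'C', ('mul', 'x1', ('sq', 'x2')))
```

The nested tuple reads as `C * (x1 * x2**2)`, where `C` is a placeholder whose value is not predicted by the decoder.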

The workflow has three steps:
1.  **Generate Data:** Create thousands of `(data, expression)` pairs to teach the model how to translate.
2.  **Train Model:** Train the Transformer on this large dataset.
3.  **Evaluate Model:** Use the trained model to predict expressions for our two test datasets.

## Step 1: Generate Training Data

First, we must run the data generation script. This will create thousands of random mathematical expressions, sample 50 data points from each, and save them as `(data, expression)` pairs in the `data/transformer_pregen/` directory.

**Note: This step can take a long time (10-20 minutes).** It only needs to be run once.
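
As a rough illustration of what one `(data, expression)` pair might look like, the sketch below builds a single sample by hand. The expression, the file names (`sample_0.npy`, `sample_0.txt`), the sampling ranges, and the `(50, 3)` column layout are all assumptions made for this example. The real script chooses its expressions randomly and may store the data differently.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expression in prefix form; the real script generates these randomly.
expression = "mul C mul x1 sq x2"
C = rng.uniform(-5.0, 5.0)  # random value for the placeholder constant

# Sample 50 points (matching --nb_sample_pts 50) for two input variables.
X = rng.uniform(-3.0, 3.0, size=(50, 2))
y = C * X[:, 0] * X[:, 1] ** 2
data = np.column_stack([X, y])  # assumed layout: columns x1, x2, y

# A temporary directory stands in for data/transformer_pregen/.
out_dir = Path(tempfile.mkdtemp())
np.save(out_dir / "sample_0.npy", data)
(out_dir / "sample_0.txt").write_text(expression)

# Read the pair back to confirm it round-trips.
loaded = np.load(out_dir / "sample_0.npy")
saved_expression = (out_dir / "sample_0.txt").read_text()
print(loaded.shape, saved_expression)
```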

In [None]:
# This will create ~7900 .npy (data) and .txt (expression) files
# in data/transformer_pregen/

!python ../scripts/generate_transformer_data.py --nb_trails 10000 --nb_sample_pts 50

## Step 2: Train the Transformer Model

Now that we have our training data, we can train the Transformer. This script will load the data from `data/transformer_pregen/`, split it into train/validation sets, and train the model.
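
The train/validation split can be pictured with the short sketch below. The sample IDs, the 10% validation fraction, and the fixed seed are assumptions for illustration only. The training script's actual split logic is not shown in this notebook.

```python
import random

# Hypothetical sample IDs; the real script would discover them from the
# files in data/transformer_pregen/.
sample_ids = [f"sample_{i}" for i in range(1000)]


def split_train_val(ids, val_fraction=0.1, seed=0):
    """Shuffle the IDs reproducibly and hold out a validation fraction."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


train_ids, val_ids = split_train_val(sample_ids)
print(len(train_ids), len(val_ids))  # 900 100
```

Splitting by sample ID keeps each `.npy` file and its matching `.txt` file in the same subset.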

**Note: This step takes a very long time (hours, depending on GPU).**

You can monitor progress by watching the console output or by checking the log file:
`tail -f logs/symbolic_regression.log`

In [None]:
# This will train for 100 epochs and save the best model to
# outputs/transformer_best.pth

!python ../scripts/train_transformer.py --epochs 100 --batch_size 128 --d_model 256

## Step 3: Evaluate the Trained Model

After training is complete, we can use our saved model (`outputs/transformer_best.pth`) to make predictions on our two datasets.

This script does two things:
1.  Uses the trained model to generate an expression skeleton, which contains a placeholder constant `C`.
2.  Uses `scipy.optimize.minimize` to find the optimal numerical value for `C` that best fits the data.
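
The constant-fitting step can be sketched as follows. This is a minimal example under stated assumptions, not the evaluation script's code: the data are synthetic (generated from $y = 5 x_1 x_2^2$), the skeleton is hard-coded in the hypothetical helper `predicted`, and mean squared error is assumed as the objective.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic stand-in data from y = 5 * x1 * x2^2. The real datasets are
# loaded by the evaluation script; this is only for illustration.
X = rng.uniform(-2.0, 2.0, size=(50, 2))
y = 5.0 * X[:, 0] * X[:, 1] ** 2


def predicted(c, X):
    """Evaluate the skeleton (mul, C, (mul, x1, (sq, x2))) for a given C."""
    return c * X[:, 0] * X[:, 1] ** 2


def mse(params):
    # minimize passes a 1-D parameter array, so the constant is params[0].
    return np.mean((predicted(params[0], X) - y) ** 2)


# With no bounds or constraints, minimize uses BFGS by default.
result = minimize(mse, x0=np.array([1.0]))
C_fit = result.x[0]
print(f"C = {C_fit:.4f}")
```

Because the objective is quadratic in `C`, the optimizer should converge to a value very close to 5.0 from any reasonable starting point. Skeletons with several constants or nonlinear dependence on them may need multiple restarts.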

In [None]:
!python ../scripts/evaluate_transformer.py --model_path outputs/transformer_best.pth

### Analysis: Transformer Results

The evaluation script writes its results to the log and saves the plots and fitted expressions to the `outputs/` directory.

**Dataset 1 (Known Formula: $y = 2x + \sin(x) + x\sin(x)$):**

* The model predicts a sequence like `(add, (add, (mul, C, x1), (sin, x1)), (mul, x1, (sin, x1)))`.
* The optimizer then solves for `C` and finds $C \approx 2.0$.
* The final plot `outputs/transformer_plot_dataset_1.png` shows a near-perfect fit, successfully recovering the known formula.

**Dataset 2 (Hidden Formula):**

* The model, having seen similar structures during training, predicts the correct form: `(mul, C, (mul, x1, (sq, x2)))`.
* The optimizer then solves for the constant `C`, finding that **$C \approx 5.0$**.
* The final expression is saved to `outputs/transformer_expr_dataset_2.txt`.

This demonstrates the Transformer's strength: having learned from thousands of generated equations, it can identify the structure of a new, unseen problem, leaving only the constant `C` to be fitted by the optimizer.