# 🚀 Quickstart

This guide walks you through using **Genetic Feature Synthesis (GFS)** to automatically generate interpretable features from tabular data.

By the end, you'll:

- Load a dataset
- Run genetic feature synthesis
- Extract and visualize new features
- Integrate with your ML pipeline

---

## 🛠️ Installation

Install `featuristic` via pip:

```bash
pip install featuristic
```

---

## 📦 Step 1: Load a Dataset

```python
from featuristic.datasets import fetch_cars_dataset

X, y = fetch_cars_dataset()

X.head()
```

`X` is a pandas DataFrame of numeric features (e.g., cylinders, horsepower), and `y` is the target (miles per gallon).

---

## 🧬 Step 2: Run Genetic Feature Synthesis

```python
from featuristic import GeneticFeatureSynthesis

gfs = GeneticFeatureSynthesis(
    num_features=5,
    max_generations=20,
    population_size=100,
    verbose=True,
)

X_new = gfs.fit_transform(X, y)
```

This evolves a population of 100 candidate symbolic expressions for up to 20 generations (fewer if early stopping is triggered) and returns a new dataset with 5 synthesized features.
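
To give a feel for what "evolving symbolic expressions" means, here is a toy genetic-programming loop in plain Python. It is a minimal sketch under stated assumptions, not featuristic's implementation: the data, the operator set, tournament selection, subtree mutation, the depth limit and the fitness function (negative absolute Pearson correlation with the target, so lower is better) are all illustrative choices. `population_size` and `max_generations` only mirror the parameters above by analogy.

```python
import math
import random

random.seed(0)

# Hypothetical toy data (not the cars dataset): the target is exactly a * b.
ROWS = [{"a": random.uniform(1, 10), "b": random.uniform(1, 10)} for _ in range(50)]
TARGET = [r["a"] * r["b"] for r in ROWS]

# Primitive set (an assumption): binary operators plus column names as terminals.
OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
COLS = ["a", "b"]
MAX_DEPTH = 4  # crude bloat control: mutants deeper than this are rejected


def random_tree(depth):
    """Grow a random expression: a column name or a tuple (op, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(COLS)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))


def evaluate(tree, row):
    """Evaluate an expression tree on one row."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def fitness(tree):
    """Lower is better: negative absolute correlation of the feature with the target."""
    return -abs(pearson([evaluate(tree, r) for r in ROWS], TARGET))


def mutate(tree):
    """Subtree mutation: swap one randomly chosen subtree for a freshly grown one."""
    def replace(node):
        if isinstance(node, str) or random.random() < 0.3:
            return random_tree(2)
        op, left, right = node
        if random.random() < 0.5:
            return (op, replace(left), right)
        return (op, left, replace(right))

    child = replace(tree)
    return child if depth(child) <= MAX_DEPTH else tree


def evolve(population_size=30, max_generations=10):
    population = [random_tree(MAX_DEPTH) for _ in range(population_size)]
    for _ in range(max_generations):
        best = min(population, key=fitness)  # elitism: carry the best program forward
        # Tournament selection: the fitter of two random programs becomes a parent.
        parents = [min(random.sample(population, 2), key=fitness)
                   for _ in range(population_size - 1)]
        population = [best] + [mutate(p) for p in parents]
    return min(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Elitism keeps the best fitness from getting worse between generations; real libraries typically add crossover and richer operator sets on top of this skeleton.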

---

## 🔍 Step 3: Explore Synthesized Features

```python
gfs.get_feature_info()
```

You'll get a table like:

| name       | formula                                   | fitness |
| ---------- | ----------------------------------------- | ------- |
| feature\_0 | log(abs(horsepower)) \* sin(weight)       | -0.87   |
| feature\_1 | (acceleration + displacement) / cylinders | -0.84   |
| ...        | ...                                       | ...     |
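
Because each formula is an ordinary expression over the original columns, a synthesized feature can be recomputed by hand to check what it actually measures. The sketch below does this in plain Python for the two example formulas in the table; the row values are hypothetical, and `feature_0` / `feature_1` are illustrative functions written from the formula strings, not code that featuristic generates.

```python
import math

# Hypothetical rows with made-up values; in practice these would be rows of X.
rows = [
    {"horsepower": 150.0, "weight": 3500.0, "acceleration": 12.0,
     "displacement": 300.0, "cylinders": 8},
    {"horsepower": 90.0, "weight": 2200.0, "acceleration": 16.0,
     "displacement": 100.0, "cylinders": 4},
]


def feature_0(row):
    # log(abs(horsepower)) * sin(weight); math.sin interprets its argument as radians
    return math.log(abs(row["horsepower"])) * math.sin(row["weight"])


def feature_1(row):
    # (acceleration + displacement) / cylinders
    return (row["acceleration"] + row["displacement"]) / row["cylinders"]


print([feature_1(r) for r in rows])  # [39.0, 29.0]
print([feature_0(r) for r in rows])
```

Reading formulas this way also helps spot expressions that correlate with the target but are hard to justify physically, such as the sine of a weight.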

---

## 📈 Step 4: Visualize Evolution History

```python
gfs.plot_history()
```

This plots the fitness score and parsimony coefficient across generations:

- Blue line: parsimony coefficient (pressure toward simpler formulas)
- Orange line: best fitness achieved each generation
- Grey dashed line: early stopping point (if triggered)
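
The curves relate to two ideas common in genetic programming: a parsimony penalty that adds a cost proportional to formula size, so that among equally fit formulas the shorter one wins, and early stopping once the best fitness stops improving. The sketch below illustrates both under stated assumptions: `penalized_fitness`, `early_stop_generation` and `patience` are hypothetical names, the coefficient is held fixed, and featuristic's actual rules (including whether it adjusts the coefficient between generations) may differ.

```python
def penalized_fitness(raw_fitness, program_size, parsimony_coefficient):
    """Lower is better: add a cost proportional to program size (e.g. node count)."""
    return raw_fitness + parsimony_coefficient * program_size


def early_stop_generation(best_per_generation, patience):
    """Return the generation index at which evolution would stop, or None.

    Evolution stops once the best (lowest) fitness has failed to improve
    for `patience` consecutive generations.
    """
    best_so_far = float("inf")
    stale = 0
    for generation, value in enumerate(best_per_generation):
        if value < best_so_far:
            best_so_far, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return generation
    return None


# Two formulas with the same raw fitness: the smaller one scores better (lower).
print(penalized_fitness(-0.85, 5, 0.01) < penalized_fitness(-0.85, 9, 0.01))  # True

# Hypothetical best-fitness history with no improvement after generation 2.
history = [-0.50, -0.70, -0.80, -0.80, -0.79, -0.80]
print(early_stop_generation(history, patience=3))  # 5
```

In this hypothetical run the grey dashed line would sit at generation 5.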

---

## 🤖 Step 5: Use New Features in Your Model

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Use synthesized features only
scores = cross_val_score(RandomForestRegressor(), X_new, y, cv=5)

print("CV R² Score with Synthesized Features:", scores.mean())
```
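
With no `scoring` argument, `cross_val_score` falls back to the estimator's own `score` method, which for scikit-learn regressors such as `RandomForestRegressor` is the coefficient of determination R², so the printed value is the mean R² over the 5 folds. For reference, R² = 1 − SS_res / SS_tot. The stdlib sketch below computes it for one fold; `r2_score` here is a hypothetical helper for illustration (scikit-learn ships its own `sklearn.metrics.r2_score`).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.5, 2.0, 3.0, 3.5]))  # 0.9
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than predicting the mean
```

A score of 1.0 is a perfect fit, and R² can go negative when predictions are worse than always predicting the mean.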

Or concatenate with original features:

```python
import pandas as pd

# pd.concat(axis=1) aligns rows on the index, so X_new must share X's index
X_combined = pd.concat([X, X_new], axis=1)
```

---

## ✅ Summary

You've just:

- Generated symbolic features using GFS
- Viewed their formulas and fitness scores
- Visualized the evolutionary process
- Integrated them into a scikit-learn workflow
