# 🧮 Fitness Functions in Feature Synthesis

In `GeneticFeatureSynthesis`, a fitness function scores how well each symbolic program (i.e., candidate feature) models the target variable.

The more closely a program's output tracks the target `y`, the lower its fitness score. `GeneticFeatureSynthesis` minimizes this score during evolution, so lower = better.

---

## 🧪 What Is a Fitness Function?

A fitness function takes:

- a symbolic program (tree),
- a parsimony coefficient that penalizes program complexity,
- the true labels `y_true`,
- the predicted output `y_pred`,

and returns a single numeric value: the fitness.

All built-in fitness functions follow this signature:

```python
def fitness_func(program, parsimony: float, y_true, y_pred) -> float:
    ...
```
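
As a rough illustration of this signature, here is a minimal, dependency-free sketch of a fitness function based on mean squared error. It is not the library's implementation: the program is stood in for by a plain list of node names, using `len(program)` as the program size is an assumption, and the multiplicative size penalty mirrors the custom example later in this guide.

```python
from typing import Sequence


def mse_fitness(program: Sequence[str], parsimony: float,
                y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error scaled by a size penalty; lower is better."""
    # NaN is the only float that is not equal to itself, so this detects
    # missing predictions and assigns them the worst possible score.
    if any(p != p for p in y_pred):
        return float("inf")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # Assumption: program size is the number of nodes, here len(program).
    return mse * len(program) ** parsimony


y_true = [1.0, 2.0, 3.0, 4.0]
program = ["add", "x0", "x1"]  # hypothetical stand-in for a program tree
close = mse_fitness(program, 0.001, y_true, [1.1, 1.9, 3.2, 3.8])
far = mse_fitness(program, 0.001, y_true, [4.0, 3.0, 2.0, 1.0])
print(close < far)  # the closer prediction gets the lower (better) fitness
```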

---

## 📚 Built-in Fitness Functions

You can choose a fitness function by name:

```python
GeneticFeatureSynthesis(fitness_function="pearson")
```

Or pass a custom callable.

Here are the built-in options:

| Name            | Task                 | Description                                                 |
| --------------- | -------------------- | ----------------------------------------------------------- |
| `"pearson"`     | Regression           | Maximize absolute correlation between `y_true` and `y_pred` |
| `"r2"`          | Regression           | Maximize R² score                                           |
| `"mse"`         | Regression           | Minimize Mean Squared Error                                 |
| `"mutual"`      | Regression           | Maximize mutual information                                 |
| `"spearman"`    | Regression / Ranking | Maximize rank-order correlation                             |
| `"kendall"`     | Regression / Ranking | Maximize Kendall's tau correlation                          |
| `"log_loss"`    | Classification       | Minimize log loss on binary probabilities                   |
| `"accuracy"`    | Classification       | Maximize classification accuracy                            |
| `"f1"`          | Classification       | Maximize F1 score (thresholded at 0.5)                      |

Use `featuristic.list_fitness_functions()` to see all registered options.
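
To make the `"f1"` row more concrete, the sketch below thresholds predicted probabilities at 0.5, computes F1 by hand, and negates it so that a minimizing search prefers higher F1. This is an illustration only: the negation, the `>=` comparison at the threshold, and the function name `neg_f1` are assumptions, not the library's code.

```python
def neg_f1(y_true, y_prob, threshold=0.5):
    """Return -F1 so that a better classifier gets a lower score."""
    # Turn probabilities into hard 0/1 predictions.
    y_hat = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    if tp == 0:
        return 0.0  # F1 is 0 when there are no true positives
    return -(2 * tp / (2 * tp + fp + fn))


y_true = [1, 0, 1, 1, 0]
good = neg_f1(y_true, [0.9, 0.2, 0.8, 0.6, 0.4])  # every label recovered
poor = neg_f1(y_true, [0.9, 0.7, 0.3, 0.6, 0.4])  # one false positive, one miss
print(good < poor)
```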

---

## ✂️ Parsimony Penalty

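To avoid bloated or overly complex symbolic formulas, every fitness function is automatically adjusted using a parsimony coefficient. The custom example later in this guide applies it as a multiplicative penalty, `score * program_size ** parsimony`, where: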

- `program_size` = number of nodes in the symbolic program
- `parsimony` = a small positive float (e.g., 0.001)
- Larger `parsimony` → favors smaller programs
- If `adaptive_parsimony=True`, the coefficient is adjusted each generation based on the average program size

You can visualize parsimony pressure with `plot_history()`.
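
To get a feel for how strongly the coefficient acts, the sketch below applies a multiplicative penalty of the form `score * program_size ** parsimony` (the form used in the custom example in this guide) to a small and a large program with the same raw score. The helper name `penalized` is hypothetical, and the sketch assumes a positive raw score where lower is better.

```python
def penalized(score: float, program_size: int, parsimony: float) -> float:
    """Scale a positive, lower-is-better score by a size penalty."""
    return score * program_size ** parsimony


raw_score = 0.20  # same raw error for both programs
for parsimony in (0.0, 0.001, 0.01, 0.1):
    small = penalized(raw_score, 5, parsimony)   # 5-node program
    large = penalized(raw_score, 50, parsimony)  # 50-node program
    # The ratio large/small equals 10 ** parsimony, so it grows with parsimony.
    print(f"parsimony={parsimony:<6} small={small:.4f} "
          f"large={large:.4f} ratio={large / small:.3f}")
```

With `parsimony=0.0` the two programs tie; as the coefficient grows, the larger program falls further behind.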

---

## 🧠 Choosing a Fitness Function

| Goal                      | Recommended Fitness           |
| ------------------------- | ----------------------------- |
| Predict a numeric target  | `"pearson"` or `"r2"`         |
| Rank or monotonic outputs | `"spearman"` or `"kendall"`   |
| Classification (binary)   | `"log_loss"` or `"f1"`        |
| Custom modeling criteria  | Use a custom callable         |
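
The difference between the first two rows can be seen with a small, dependency-free sketch. It compares Pearson and Spearman correlation on a feature that is monotonically but not linearly related to the target. The helper functions are written out by hand for illustration (ties are not handled) and are not the library's implementations.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [math.exp(x) for x in feature]  # monotonic, strongly nonlinear

p = pearson(feature, target)
s = spearman(feature, target)
print(f"pearson={p:.3f} spearman={s:.3f}")
```

The rank-based score treats the feature as a perfect predictor of ordering, while the linear score does not, which is why rank correlations suit monotonic or ranking targets.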

---

## 🔧 Using a Custom Fitness Function

You can pass your own fitness function:

```python
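import pandas as pd
from featuristic import GeneticFeatureSynthesis
from sklearn.metrics import median_absolute_error

# `node_count` is assumed to be the library's helper that returns the number
# of nodes in a program; import it from wherever your installed version exposes it.


def my_fitness(program, parsimony, y_true, y_pred):
    # Reject programs that produce missing values.
    if pd.isna(y_pred).any():
        return float("inf")
    score = median_absolute_error(y_true, y_pred)
    # Scale the error by program size so larger programs score worse.
    penalty = node_count(program) ** parsimony
    return score * penalty


gfs = GeneticFeatureSynthesis(fitness_function=my_fitness)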
```

---

## 🧪 Registering a Custom Fitness Function (Optional)

You can also register a named fitness function globally:

```python
from featuristic import register_fitness

@register_fitness("mad")
def mad_fitness(program, parsimony, y_true, y_pred):
    ...
```

Then refer to it by name:

```python
GeneticFeatureSynthesis(fitness_function="mad")
```
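
If you are curious how a name-based registry like this can work in general, here is a minimal, self-contained sketch of the decorator pattern. It is not featuristic's implementation: `register`, `resolve`, and `_REGISTRY` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Union

FitnessFn = Callable[..., float]

# Hypothetical module-level registry mapping names to fitness functions.
_REGISTRY: Dict[str, FitnessFn] = {}


def register(name: str) -> Callable[[FitnessFn], FitnessFn]:
    """Decorator factory that stores a function under `name`."""
    def decorator(func: FitnessFn) -> FitnessFn:
        if name in _REGISTRY:
            raise ValueError(f"fitness function already registered: {name!r}")
        _REGISTRY[name] = func
        return func  # return the function unchanged so it stays callable
    return decorator


def resolve(fitness: Union[str, FitnessFn]) -> FitnessFn:
    """Accept either a registered name or a callable."""
    if callable(fitness):
        return fitness
    try:
        return _REGISTRY[fitness]
    except KeyError:
        raise ValueError(f"unknown fitness function: {fitness!r}") from None


@register("mae")
def mae_fitness(program, parsimony, y_true, y_pred):
    # Mean absolute error; the parsimony penalty is omitted for brevity.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(resolve("mae") is mae_fitness)
```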

---

## 🧬 Fitness and Population Evolution

Fitness values are used by the evolutionary algorithm to:

- Select the best individuals (via tournament selection)
- Track progress across generations
- Apply early stopping when no improvement occurs

You can monitor this visually using:

```python
gfs.plot_history()
```

The plot shows fitness scores and parsimony dynamics per generation.
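
The selection and early-stopping roles of fitness listed above can be sketched in a few lines. This is a generic illustration of tournament selection and a patience-based stopping rule under a lower-is-better score, not the library's code; the function names, the tournament size `k`, and the `patience` parameter are assumptions.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Pick k random individuals and return the one with the lowest fitness."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def should_stop(best_per_generation, patience):
    """Stop when the last `patience` generations brought no improvement."""
    if len(best_per_generation) <= patience:
        return False
    best_before = min(best_per_generation[:-patience])
    best_recent = min(best_per_generation[-patience:])
    return best_recent >= best_before


rng = random.Random(0)
population = ["x0", "x0 + x1", "sin(x2)", "x1 * x3"]
fitnesses = [0.90, 0.35, 0.60, 0.20]

# With k equal to the population size, the overall best always wins.
parent = tournament_select(population, fitnesses, k=4, rng=rng)
print(parent)

history = [0.90, 0.50, 0.50, 0.50]  # best fitness per generation
print(should_stop(history, patience=2))
```

Smaller tournaments give weaker individuals a chance to be selected, which helps keep the population diverse.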

---

## ✅ Summary

| Concept          | Description                                  |
| ---------------- | -------------------------------------------- |
| Fitness function | Evaluates symbolic programs during evolution |
| Goal             | Minimize the fitness score                   |
| Parsimony        | Penalizes large programs to prevent bloat    |
| Custom functions | Fully supported via callable or registry     |


