# Regression with NEAT

In this laboratory session, we will use **NEAT** to solve a regression problem on the [Energy Efficiency Dataset](https://www.kaggle.com/datasets/elikplim/eergy-efficiency-dataset), which contains 8 attributes, denoted by **X1**, ..., **X8**, and 2 targets, denoted by **y1** and **y2**. Specifically, we have:

- **X1**: Relative Compactness
- **X2**: Surface Area
- **X3**: Wall Area
- **X4**: Roof Area
- **X5**: Overall Height
- **X6**: Orientation
- **X7**: Glazing Area
- **X8**: Glazing Area Distribution
- **y1**: Heating Load
- **y2**: Cooling Load

Let us import some useful modules.

In [220]:
import neat
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import numpy as np
from sklearn.model_selection import train_test_split

random.seed(0)

We can load the data from the ```data``` directory.

In [None]:
data_path = "../data/ENB2012_data.xlsx"

df = pd.read_excel(data_path)
df.head(10)

Explore the dataset and perform some preprocessing, if necessary.

In [None]:
# CODE HERE
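
One common preprocessing step for this kind of data is min-max scaling, which brings attributes with very different ranges (for example surface area versus relative compactness) onto the same [0, 1] scale. The sketch below shows only the arithmetic on plain Python lists; the function name `min_max_scale` and the toy rows are made up for illustration and are not part of the dataset or the lab solution.

```python
def min_max_scale(rows):
    """Rescale every column of a list of equal-length rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Made-up rows with three attributes each; not taken from the real dataset.
toy_rows = [
    [0.5, 500.0, 3.5],
    [0.75, 600.0, 7.0],
    [1.0, 800.0, 7.0],
]
scaled_rows = min_max_scale(toy_rows)
print(scaled_rows[1])
```

With pandas, the same rescaling of a numeric DataFrame can be written as `(df - df.min()) / (df.max() - df.min())`. Whatever preprocessing you choose, the splitting cell expects two DataFrames named `X` (attributes) and `y` (targets) to exist afterwards.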

Split the dataset into two parts: a training set and a validation set.

In [227]:
X_train, X_val, y_train, y_val = train_test_split(X.to_numpy(), y.to_numpy(), test_size=0.2, random_state=0)

Define a function ```eval_genomes``` to calculate the fitness of the genomes.

In [228]:
# Define the evaluation function for the entire population
def eval_genomes(genomes, config):
    for _, genome in genomes:
        # CODE HERE
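
Since NEAT is usually configured to maximize fitness, one possible choice for a regression task is the negative mean squared error, so that a smaller error gives a higher fitness. The sketch below is only one option among many: `negative_mse` is a hypothetical helper, and `SumNet` is a made-up stand-in that mimics the `activate` method of a NEAT network so the example runs without a trained genome. It assumes the targets are given as one list of values per sample.

```python
def negative_mse(net, inputs, targets):
    """Return minus the mean squared error of `net` over all samples and outputs."""
    squared_error = 0.0
    count = 0
    for x, y in zip(inputs, targets):
        prediction = net.activate(x)  # one value per output node
        for predicted, expected in zip(prediction, y):
            squared_error += (predicted - expected) ** 2
            count += 1
    return -squared_error / count


class SumNet:
    """Hypothetical stand-in for a NEAT network: predicts the sum of its inputs."""

    def activate(self, x):
        return [sum(x)]


toy_inputs = [[1.0, 2.0], [0.5, 0.5]]
toy_targets = [[3.0], [2.0]]
print(negative_mse(SumNet(), toy_inputs, toy_targets))
```

Inside the loop of `eval_genomes`, a network can be built with `neat.nn.FeedForwardNetwork.create(genome, config)` and the result of a helper like this, computed on `X_train` and `y_train`, assigned to `genome.fitness`.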

Define a custom class ```MyReporter``` that, for each generation, saves to a text file the training fitness of the best individual together with its fitness on the validation set. You can also use this class to save other statistics of interest.

In [229]:
class MyReporter(neat.reporting.BaseReporter):
    def __init__(self, file_path):
        self.file_path = file_path
        self.generation = 0
        self.best_train_history = []
        self.best_val_history = []
        with open(self.file_path, 'w') as file:
            file.write("Generation,Train_fitness,Validation_fitness\n")

    def post_evaluate(self, config, population, species, best_genome):
        self.generation += 1

        # CODE HERE
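
The file-writing part of the reporter could look roughly like the sketch below, which appends one line per generation in the same column order as the header written in `__init__`. The helper name `append_history_row` and the fitness values are made up for illustration, and the demonstration writes to a temporary file rather than to the lab's history file.

```python
import os
import tempfile


def append_history_row(file_path, generation, train_fitness, val_fitness):
    """Append one 'generation,train,validation' line to the history file."""
    with open(file_path, "a") as file:
        file.write(f"{generation},{train_fitness},{val_fitness}\n")


# Demonstration on a temporary file with made-up fitness values.
demo_path = os.path.join(tempfile.mkdtemp(), "history.txt")
with open(demo_path, "w") as file:
    file.write("Generation,Train_fitness,Validation_fitness\n")
append_history_row(demo_path, 1, -12.5, -14.0)
append_history_row(demo_path, 2, -9.75, -11.25)

with open(demo_path) as file:
    history_lines = file.read().splitlines()
print(history_lines)
```

In `post_evaluate`, the training fitness is available as `best_genome.fitness`, while the validation fitness has to be computed by building a network from `best_genome` and scoring it on `X_val` and `y_val`. The same two values can also be appended to `self.best_train_history` and `self.best_val_history`.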

Configure the parameters in ```neat_config.txt```. You can try several settings and see which works best for the task (see the [documentation](https://neat-python.readthedocs.io/en/latest/config_file.html)).

In [230]:
config = neat.Config(neat.genome.DefaultGenome,
                     neat.reproduction.DefaultReproduction,
                     neat.species.DefaultSpeciesSet,
                     neat.stagnation.DefaultStagnation,
                     "./neat_config.txt")
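
A frequent source of errors is a configuration whose number of inputs or outputs does not match the data. Because the configuration file uses an INI-style layout, a quick sanity check can be sketched with the standard library alone. The excerpt below is hypothetical: a real `neat_config.txt` needs many more parameters, and the values shown are placeholders rather than recommended settings.

```python
import configparser

# Hypothetical excerpt of neat_config.txt; the real file needs many more
# parameters, and these values are only placeholders.
config_excerpt = """
[NEAT]
fitness_criterion = max
pop_size = 100

[DefaultGenome]
num_inputs = 8
num_outputs = 2
"""

parser = configparser.ConfigParser()
parser.read_string(config_excerpt)

num_inputs = parser.getint("DefaultGenome", "num_inputs")
num_outputs = parser.getint("DefaultGenome", "num_outputs")

# Assumes all 8 attributes are used as inputs and both targets are predicted.
n_features, n_targets = 8, 2
if (num_inputs, num_outputs) != (n_features, n_targets):
    raise ValueError("config does not match the shape of the data")
print(num_inputs, num_outputs)
```

Once the configuration has been loaded with `neat.Config`, the same two values can be read from `config.genome_config.num_inputs` and `config.genome_config.num_outputs`. If you predict only one of the two targets, `num_outputs` should be 1 instead.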

Define the initial population with the chosen configuration.

In [None]:
population = neat.Population(config)

history_path = "./history.txt"
population.add_reporter(MyReporter(history_path))

We can now run our NEAT algorithm for a given number of generations.

In [232]:
winner = population.run(eval_genomes, 100)

Visualize the results and plot the history of both training and validation fitness.

In [None]:
# CODE HERE
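
To plot the two fitness curves, the history file first has to be read back. The sketch below assumes the reporter wrote one comma-separated row per generation under the header `Generation,Train_fitness,Validation_fitness`; the helper name `load_history` and the sample values are made up, and an in-memory file stands in for the real history file.

```python
import csv
import io


def load_history(file):
    """Read the reporter's CSV history into three parallel lists."""
    generations, train, val = [], [], []
    for row in csv.DictReader(file):
        generations.append(int(row["Generation"]))
        train.append(float(row["Train_fitness"]))
        val.append(float(row["Validation_fitness"]))
    return generations, train, val


# Made-up history standing in for the contents of the history file.
demo_file = io.StringIO(
    "Generation,Train_fitness,Validation_fitness\n"
    "1,-12.5,-14.0\n"
    "2,-9.75,-11.25\n"
)
generations, train_fitness, val_fitness = load_history(demo_file)
print(generations, train_fitness, val_fitness)
```

With the real file, open `history_path` and pass the file object to the helper, then draw both curves with `plt.plot(generations, train_fitness, label="train")` and `plt.plot(generations, val_fitness, label="validation")`, followed by `plt.legend()`. A validation curve that flattens or drops while the training curve keeps rising is a sign of overfitting.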

In [None]:
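n_outputs = config.genome_config.num_outputs  # number of output nodes set in neat_config.txt
n_inputs = config.genome_config.num_inputs  # number of input nodes set in neat_config.txt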
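g = nx.DiGraph()
# Input nodes have negative keys (-n_inputs, ..., -1): layer 0
for name in range(-n_inputs, 0):
    g.add_node(name, node_type=0)
# Output nodes have keys 0, ..., n_outputs - 1 (layer 2); all other nodes are hidden (layer 1)
for name in winner.nodes.keys():
    if name < n_outputs:
        node_type = 2
    else:
        node_type = 1
    g.add_node(name, node_type=node_type)
# Add one edge per connection gene
for i, j in winner.connections.keys():
    g.add_edge(i, j)
# Arrange the nodes in columns by layer
pos = nx.multipartite_layout(g, subset_key="node_type")
nx.draw(g, pos=pos)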