# SLIM-GSGP LIBRARY TESTING + EXERCISES

This notebook tests the SLIM library and works through selected exercises to better understand its organization.

## **For all 3 algorithms (GP, GSGP, SLIM):**

### 1. Add a constant set and use it during the evolution process; show the final individual to confirm it contains constants

As configured, the library evolves programs from the function set and the input features only; constant values are not used. Adding a constant set gives the evolution a richer set of terminals to build individuals from.

#### **GP**:

##### Libraries:

In [4]:
from slim_gsgp.main_gp import gp  # import the GP entry point of the slim_gsgp library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split  # import the train-test split function

##### Dataset:

In [5]:
# Load the PPB dataset
X, y = load_ppb(X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)
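
The two consecutive splits above leave 60% of the rows for training, 20% for validation and 20% for testing. The sketch below reproduces only that arithmetic on row indices; `split_indices` is a hypothetical helper written for this illustration and is not the library's `train_test_split`.

```python
import random

def split_indices(n, p_test, seed=0):
    """Shuffle range(n) and return (train_indices, test_indices)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * p_test))
    return idx[n_test:], idx[:n_test]

# First split: 40% of the rows are held out
train_idx, rest_idx = split_indices(100, p_test=0.4)

# Second split: the held-out rows are halved into validation and test
val_pos, test_pos = split_indices(len(rest_idx), p_test=0.5)
val_idx = [rest_idx[i] for i in val_pos]
test_idx = [rest_idx[i] for i in test_pos]

print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```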

##### MODEL:

**Baseline with no constant set:**

In [6]:
# Apply the GP algorithm
final_tree_f = gp(X_train=X_train, y_train=y_train,
                X_test=X_val, y_test=y_val,
                dataset_name='ppb', pop_size=100, n_iter=100)

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   40.56873321533203   |   39.31814956665039      |   0.061817169189453125 |      3           |
|     ppb                 |       1      |   36.83185958862305   |   38.91961669921875      |   0.010618925094604492 |      3           |
|     ppb                 |       2      |   34.91804504394531   |   36.0217399597168       |   0.010009050369262695 |      3           |
|     ppb                 |       3      |   30.349258422851562  |   33.08704376220703      |   0.007938861846923828 |      3           |
|     ppb        

KeyboardInterrupt: 

Getting representation:

In [4]:
# Show the best individual structure at the last generation
final_tree_f.print_tree_representation()

add(
  add(
    x534
    add(
      x457
      add(
        x457
        add(
          x524
          add(
            x457
            add(
              x15
              divide(
                x605
                add(
                  x407
                  add(
                    x79
                    add(
                      x457
                      add(
                        x524
                        add(
                          x407
                          subtract(
                            x379
                            x100
                          )
                        )
                      )
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
  )
  add(
    x457
    add(
      x524
      add(
        x457
        add(
          x15
          divide(
            x605
            add(
              x407
              add(
                x79
                add(
          

Getting RMSE to check performance:

In [5]:
# Get the prediction of the best individual on the test set
predictions = final_tree_f.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

28.124900817871094


**Model with constant set:**

After exploring the code, a few things stand out:
- The main GP implementation (*main_gp.py*) already supports a constant set: constants can be added to the search space dictionary (*pi_init*).
- All that is needed is to open **gp_config.py** and set the constant probability in the search space configuration (*gp_pi_init*) to a non-zero value. It was set to 0, so no constants were ever added.

Testing for **p_c = 0.3**:
- Changing the probability alone raises an error: *"mul(): argument 'input' (position 1) must be Tensor, not dict"*. Fixing it required correcting **apply_tree** in **tree.py** and changing the **CONSTANTS** dictionary in **gp_config.py** (which raised *"abs(): argument 'input' (position 1) must be Tensor, not function"*).
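
Both error messages point to the same kind of mistake: an operator received the constants dictionary, or a function stored in it, instead of the constant's value. The sketch below illustrates the idea with plain floats instead of tensors. The dictionaries and this `apply_tree` are simplified, hypothetical stand-ins, not the library's actual code.

```python
# Hypothetical, simplified stand-ins for the configuration dictionaries.
FUNCTIONS = {
    "add": {"function": lambda a, b: a + b, "arity": 2},
    "multiply": {"function": lambda a, b: a * b, "arity": 2},
}
TERMINALS = {"x0": 0, "x1": 1}  # terminal name -> column index
# Each constant maps to a callable that returns its value (an assumption).
CONSTANTS = {"constant_3.0": lambda: 3.0, "constant_5.0": lambda: 5.0}

def apply_tree(tree, row):
    """Evaluate a tuple-encoded tree on one data row (a list of floats)."""
    if isinstance(tree, tuple):
        spec = FUNCTIONS[tree[0]]
        args = [apply_tree(child, row) for child in tree[1:1 + spec["arity"]]]
        return spec["function"](*args)
    if tree in TERMINALS:
        return row[TERMINALS[tree]]
    if tree in CONSTANTS:
        # Call the entry: passing CONSTANTS or CONSTANTS[tree] itself to an
        # operator is what produces "must be Tensor, not dict / not function".
        return CONSTANTS[tree]()
    raise KeyError(f"unknown node: {tree}")

tree = ("add", "x0", ("multiply", "constant_3.0", "x1"))
print(apply_tree(tree, [2.0, 5.0]))  # 2.0 + 3.0 * 5.0 = 17.0
```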

In [3]:
final_tree_c = gp(X_train=X_train, y_train=y_train,
                X_test=X_val, y_test=y_val,
                dataset_name='ppb', pop_size=100, n_iter=100, p_xo = 0)

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   43.867095947265625  |   46.88303756713867      |   0.07063102722167969  |      3           |
|     ppb                 |       1      |   43.867095947265625  |   46.88303756713867      |   0.015156984329223633 |      3           |
|     ppb                 |       2      |   43.867095947265625  |   46.88303756713867      |   0.016156911849975586 |      3           |
|     ppb                 |       3      |   43.867095947265625  |   46.88303756713867      |   0.016138076782226562 |      3           |
|     ppb        

In [4]:
final_tree_c.print_tree_representation()

add(
  add(
    x427
    subtract(
      x496
      divide(
        x545
        divide(
          subtract(
            constant_3.0
            add(
              x45
              x166
            )
          )
          multiply(
            x220
            divide(
              constant_3.0
              divide(
                constant_5.0
                x139
              )
            )
          )
        )
      )
    )
  )
  multiply(
    x553
    add(
      x481
      multiply(
        constant_3.0
        x602
      )
    )
  )
)



In [5]:
# Get the prediction of the best individual on the test set
predictions_c = final_tree_c.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions_c)))

30.261333465576172


#### **GSGP**:

In [6]:
from slim_gsgp.main_gsgp import gsgp  # import the GSGP entry point of the slim_gsgp library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [7]:
# Load the PPB dataset
X, y = load_ppb(X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [8]:
# Apply the Standard GSGP algorithm
final_tree = gsgp(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', pop_size=100, n_iter=100,
                  reconstruct=True, ms_lower=0, ms_upper=1)


# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   57.22980880737305   |   60.118648529052734     |   0.1320788860321045   |      3           |
|     ppb                 |       1      |   57.012939453125     |   59.90546417236328      |   0.030529022216796875 |      9           |
|     ppb                 |       2      |   57.012939453125     |   59.90546417236328      |   0.030605792999267578 |      9           |
|     ppb                 |       3      |   56.786285400390625  |   59.68736267089844      |   0.03196072578430176  |      15          |
|     ppb        

With GSGP, we don't check tree representation as individuals usually get too big to be interpretable.

#### **SLIM**:

With SLIM, we already run into the problem of different dimensions; it must be fixed with a new version of the individual.

In [9]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split, show_individual  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [10]:
X, y = load_ppb(X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [12]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

# Show the best individual structure at the last generation
print(show_individual(final_tree, operator='sum'))

# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   44.30246353149414   |   46.856666564941406     |   0.052320003509521484 |      3           |
|     ppb                 |       1      |   44.30246353149414   |   46.856666564941406     |   0.018263816833496094 |      3           |
|     ppb                 |       2      |   44.30246353149414   |   46.856666564941406     |   0.01672816276550293  |      3           |
|     ppb                 |       3      |   44.18924331665039   |   46.75093078613281      |   0.02145099639892578  |      14          |
|     ppb        

-----

### 2. Change the function set by adding a new function with arity 1 (e.g. cos) and a new function with arity 2 (e.g. mean)

For this question, we simply need to change the function set in the **xx_config.py** file.
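
As a rough illustration, a function set can be seen as a dictionary that maps each name to a callable and its arity. The layout below is an assumption for this sketch and uses plain floats. The names `cosine` and `mean` match the node labels in the printed trees, and with tensors one would presumably use the torch equivalents (e.g. `torch.cos`).

```python
import math

# Hypothetical function set: name -> callable and arity (layout assumed).
FUNCTIONS = {
    "add": {"function": lambda a, b: a + b, "arity": 2},
    "subtract": {"function": lambda a, b: a - b, "arity": 2},
    "multiply": {"function": lambda a, b: a * b, "arity": 2},
}

# New arity-1 function
FUNCTIONS["cosine"] = {"function": math.cos, "arity": 1}

# New arity-2 function: arithmetic mean of the two arguments
FUNCTIONS["mean"] = {"function": lambda a, b: (a + b) / 2.0, "arity": 2}

def call(name, *args):
    """Apply a function from the set after checking its arity."""
    spec = FUNCTIONS[name]
    if len(args) != spec["arity"]:
        raise ValueError(f"{name} expects {spec['arity']} argument(s)")
    return spec["function"](*args)

print(call("cosine", 0.0))     # 1.0
print(call("mean", 2.0, 4.0))  # 3.0
```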

#### **GP**

In [1]:
from slim_gsgp.main_gp import gp  # import the GP entry point of the slim_gsgp library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split  # import the train-test split function

In [3]:
# Load the PPB dataset
X, y = load_ppb(X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [3]:
# Apply the GP algorithm
final_tree_f = gp(X_train=X_train, y_train=y_train,
                X_test=X_val, y_test=y_val,
                dataset_name='ppb', pop_size=100, n_iter=100)

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   40.56873321533203   |   39.31814956665039      |   0.045274972915649414 |      3           |
|     ppb                 |       1      |   36.83185958862305   |   38.91961669921875      |   0.006830930709838867 |      3           |
|     ppb                 |       2      |   34.91804504394531   |   36.0217399597168       |   0.006289005279541016 |      3           |
|     ppb                 |       3      |   30.349258422851562  |   33.08704376220703      |   0.0052111148834228516|      3           |
|     ppb        

In [4]:
final_tree_f.print_tree_representation()

mean(
  add(
    add(
      x201
      add(
        x201
        multiply(
          x103
          cosine(
            add(
              x518
              add(
                x201
                add(
                  x201
                  cosine(
                    subtract(
                      x390
                      cosine(
                        subtract(
                          x390
                          divide(
                            x578
                            x332
                          )
                        )
                      )
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
    add(
      x201
      add(
        x201
        cosine(
          add(
            x518
            add(
              x201
              add(
                x201
                add(
                  x201
                  add(
                    x201
                    cosine(
  

#### **GSGP**

In [5]:
from slim_gsgp.main_gsgp import gsgp  # import the GSGP entry point of the slim_gsgp library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [6]:
# Apply the Standard GSGP algorithm
final_tree = gsgp(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', pop_size=100, n_iter=100,
                  reconstruct=True, ms_lower=0, ms_upper=1)

# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   57.22980880737305   |   60.118648529052734     |   0.1385810375213623   |      3           |
|     ppb                 |       1      |   57.012939453125     |   59.90546417236328      |   0.03135108947753906  |      9           |
|     ppb                 |       2      |   57.012939453125     |   59.90546417236328      |   0.030820846557617188 |      9           |
|     ppb                 |       3      |   56.786285400390625  |   59.68736267089844      |   0.03261423110961914  |      15          |
|     ppb        

#### **SLIM**

In [1]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split, show_individual  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [4]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   44.542259216308594  |   57.63134765625         |   0.059311866760253906 |      3           |
|     ppb                 |       1      |   44.542259216308594  |   57.63134765625         |   0.018822908401489258 |      3           |
|     ppb                 |       2      |   44.470584869384766  |   57.602725982666016     |   0.016718626022338867 |      13          |
|     ppb                 |       3      |   44.470584869384766  |   57.602725982666016     |   0.016634225845336914 |      13          |
|     ppb        

In [5]:
show_individual(final_tree, 'sum')

"(np.str_('subtract'), np.str_('x0'), np.str_('x251')) + f((np.str_('multiply'), np.str_('x370'), np.str_('x476')) - (np.str_('subtract'), np.str_('x160'), np.str_('x86'))) + f((np.str_('divide'), np.str_('constant_2.0'), np.str_('x292')) - (np.str_('multiply'), np.str_('constant_3.0'), np.str_('constant__1.0'))) + f((np.str_('multiply'), np.str_('x313'), np.str_('x326')) - (np.str_('cosine'), np.str_('constant_2.0'))) + f((np.str_('multiply'), np.str_('x531'), np.str_('x245')) - (np.str_('subtract'), np.str_('x327'), np.str_('constant_5.0'))) + f((np.str_('multiply'), np.str_('x316'), np.str_('x379')) - (np.str_('cosine'), np.str_('x175'))) + f((np.str_('subtract'), np.str_('x417'), np.str_('x511')) - (np.str_('subtract'), np.str_('x139'), np.str_('constant_2.0'))) + f((np.str_('add'), np.str_('x496'), np.str_('x592')) - (np.str_('subtract'), np.str_('x150'), np.str_('x276'))) + f((np.str_('multiply'), np.str_('x132'), np.str_('x248')) - (np.str_('subtract'), np.str_('x616'), np.str_(

In [7]:
# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

50.721195220947266


----

### 3. Upload and use a new dataset, download it from: https://epistasislab.github.io/pmlb/

Actual GitHub directory: https://github.com/EpistasisLab/pmlb/blob/master/datasets/1089_USCrime/1089_USCrime.tsv.gz
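
The `load_uscrime` function used below was added to the data loader for this exercise and its code is not shown here. As a minimal sketch of what such a loader has to do, the function below reads a gzip-compressed TSV and separates features from target. It assumes the label column is called `target` and returns plain lists rather than tensors; `load_tsv_gz` is a hypothetical name, and the demo file is made up.

```python
import csv
import gzip
import os
import tempfile

def load_tsv_gz(path, target_column="target"):
    """Read a gzip-compressed TSV and split it into features X and target y."""
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        feature_names = [n for n in reader.fieldnames if n != target_column]
        X, y = [], []
        for record in reader:
            X.append([float(record[name]) for name in feature_names])
            y.append(float(record[target_column]))
    return X, y

# Tiny made-up file, only to exercise the function
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tsv.gz")
    with gzip.open(demo_path, mode="wt", newline="") as handle:
        handle.write("a\tb\ttarget\n1\t2\t3\n4\t5\t6\n")
    X_demo, y_demo = load_tsv_gz(demo_path)

print(X_demo, y_demo)  # [[1.0, 2.0], [4.0, 5.0]] [3.0, 6.0]
```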

In [1]:
import pandas as pd
import os
import torch
from slim_gsgp.utils.utils import train_test_split
from slim_gsgp.datasets.data_loader import load_uscrime


In [None]:
X, y = load_uscrime(X_y=True)

In [3]:
X, y

(tensor([[7.9100e+01, 1.5100e+02, 1.0000e+00, 9.1000e+01, 5.8000e+01, 5.6000e+01,
          5.1000e+02, 9.5000e+02, 3.3000e+01, 3.0100e+02, 1.0800e+02, 4.1000e+01,
          3.9400e+02],
         [1.6350e+02, 1.4300e+02, 0.0000e+00, 1.1300e+02, 1.0300e+02, 9.5000e+01,
          5.8300e+02, 1.0120e+03, 1.3000e+01, 1.0200e+02, 9.6000e+01, 3.6000e+01,
          5.5700e+02],
         [5.7800e+01, 1.4200e+02, 1.0000e+00, 8.9000e+01, 4.5000e+01, 4.4000e+01,
          5.3300e+02, 9.6900e+02, 1.8000e+01, 2.1900e+02, 9.4000e+01, 3.3000e+01,
          3.1800e+02],
         [1.9690e+02, 1.3600e+02, 0.0000e+00, 1.2100e+02, 1.4900e+02, 1.4100e+02,
          5.7700e+02, 9.9400e+02, 1.5700e+02, 8.0000e+01, 1.0200e+02, 3.9000e+01,
          6.7300e+02],
         [1.2340e+02, 1.4100e+02, 0.0000e+00, 1.2100e+02, 1.0900e+02, 1.0100e+02,
          5.9100e+02, 9.8500e+02, 1.8000e+01, 3.0000e+01, 9.1000e+01, 2.0000e+01,
          5.7800e+02],
         [6.8200e+01, 1.2100e+02, 0.0000e+00, 1.1000e+02, 1.1800e

In [4]:
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

#### Testing on SLIM:

In [5]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_ppb  # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import show_individual  # import the helper that shows an individual's structure
from slim_gsgp.utils.utils import generate_random_uniform 

In [6]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   86.95361328125      |   67.63053131103516      |   0.023955345153808594 |      3           |
|     ppb                 |       1      |   86.95361328125      |   67.63053131103516      |   0.011518001556396484 |      3           |
|     ppb                 |       2      |   86.95361328125      |   67.63053131103516      |   0.010486125946044922 |      3           |
|     ppb                 |       3      |   86.93455505371094   |   67.63024139404297      |   0.010703086853027344 |      16          |
|     ppb        

In [7]:
show_individual(final_tree, 'sum')

"(np.str_('add'), np.str_('x10'), np.str_('x8')) + f((np.str_('divide'), np.str_('x8'), np.str_('x2')) - (np.str_('multiply'), np.str_('x0'), np.str_('constant__1.0'))) + f((np.str_('divide'), np.str_('x3'), np.str_('constant_3.0')) - (np.str_('cosine'), np.str_('x3'))) + f((np.str_('subtract'), (np.str_('multiply'), np.str_('x7'), np.str_('constant_2.0')), np.str_('x12')) - (np.str_('subtract'), np.str_('constant_5.0'), (np.str_('multiply'), np.str_('x1'), np.str_('x11')))) + f((np.str_('mean'), np.str_('x10'), np.str_('x2')) - (np.str_('subtract'), (np.str_('cosine'), np.str_('x4')), np.str_('x7'))) + f((np.str_('multiply'), np.str_('constant_3.0'), np.str_('x0')) - (np.str_('subtract'), (np.str_('mean'), np.str_('x2'), np.str_('x9')), np.str_('x6'))) + f((np.str_('divide'), np.str_('constant_3.0'), (np.str_('add'), np.str_('constant__1.0'), (np.str_('mean'), np.str_('x11'), np.str_('x3')))) - (np.str_('subtract'), (np.str_('cosine'), (np.str_('cosine'), np.str_('x7'))), (np.str_('me

In [9]:
# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

96.54297637939453


---

## **Part 2: SLIM_GSGP**

### 1. Create a new logger level, log = 10, that stores the fitness and size of the second-best and worst individuals. Make sure that everything saved with log = 1 is also saved.

What needs to be done:
- The log level is selected in **slim_config.py** (*slim_gsgp_solve_parameters*).
- A log == 10 branch was added to slim_gsgp.solve().
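
The extra fields for the new level can be computed by ranking the population by fitness. The sketch below assumes a minimization problem and individuals that expose `fitness` and `nodes_count`; the `Individual` class, the helper name and the placeholder for the level-1 fields are hypothetical and stand in for whatever the library already logs.

```python
from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float
    nodes_count: int

def extra_log_fields(population, minimization=True):
    """Return fitness and size of the second-best and the worst individual."""
    ranked = sorted(population, key=lambda ind: ind.fitness,
                    reverse=not minimization)
    second_best = ranked[1] if len(ranked) > 1 else ranked[0]
    worst = ranked[-1]
    return [second_best.fitness, second_best.nodes_count,
            worst.fitness, worst.nodes_count]

population = [Individual(3.2, 15), Individual(1.5, 40),
              Individual(9.9, 3), Individual(2.0, 7)]

# Keep everything log = 1 already stores, then append the new fields.
log_level_1_fields = ["<fields stored with log = 1>"]
row = log_level_1_fields + extra_log_fields(population)
print(row)  # ['<fields stored with log = 1>', 2.0, 7, 9.9, 3]
```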

Testing on SLIM:

In [1]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_airfoil, load_ppb # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split, show_individual  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [4]:
X, y = load_airfoil(X_y=True)
#X, y = load_ppb(X_y=True)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [5]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

# Show the best individual structure at the last generation
print(show_individual(final_tree, operator='sum'))

# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   38.03727722167969   |   39.9749641418457       |   0.029391765594482422 |      3           |
|     ppb                 |       1      |   38.03727722167969   |   39.9749641418457       |   0.017300844192504883 |      3           |
|     ppb                 |       2      |   37.77348327636719   |   39.68068313598633      |   0.015926837921142578 |      14          |
|     ppb                 |       3      |   37.53711700439453   |   39.4355354309082       |   0.013244867324829102 |      46          |
|     ppb        

### 2. Create and use a crossover operator that swaps the first block of the two parent trees.

- To create a crossover operator that swaps the first block of the two parent trees, we first need to understand how trees are stored in SLIM.
- SLIM uses GSGP's geometric crossover, so the approach should be similar.

Checking *slim_gsgp.py* shows that crossover is not yet applied, so the algorithm always runs without crossover. The first part of implementing a new crossover is therefore to make sure the regular geometric_xo works correctly in SLIM_GSGP.

**STEP 1: Adding Crossover to SLIM_GSGP algorithm code:**

For this approach, we consider the most general way possible to implement crossover in slim_gsgp. This means that, independently of the crossover used, it should be enough to create or change the crossover operator and the algorithm stays fully functional, with no need to change the base algorithm.

To do so, the most common inputs of crossovers need to be considered, namely: random_tree, tree1, tree2, if we are testing (testing), if there is new_data (new_data).\
In the same way, we also need to consider the possibility of having only 1 output tree (geometric_semantic_crossover) or 2.

**Possible solutions:**
1. Always work at the outermost level possible, that is, always generate all the inputs listed above.
2. Check whether the inputs required by the chosen operator can be inspected.

Start by checking **option 2**:

If the inputs an operator requires can be inspected, we can easily iterate over each needed parameter and supply only those.
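
One way to inspect an operator's inputs in Python is `inspect.signature`. The sketch below uses two hypothetical stand-in operators (`demo_geometric_xo` and `demo_swap_xo`) and a hypothetical dispatcher. It is one possible design, not the code used in slim_gsgp.

```python
import inspect

def demo_geometric_xo(tree1, tree2, random_tree, testing=False, new_data=False):
    """Stand-in for an operator that returns a single offspring."""
    return "one offspring"

def demo_swap_xo(tree1, tree2):
    """Stand-in for an operator that returns two offspring."""
    return "offspring 1", "offspring 2"

def call_crossover(operator, available):
    """Call `operator` with only the arguments its signature declares."""
    wanted = inspect.signature(operator).parameters
    kwargs = {name: value for name, value in available.items()
              if name in wanted}
    result = operator(**kwargs)
    # Normalise the output so the caller always gets a list of offspring.
    return list(result) if isinstance(result, tuple) else [result]

available = {"tree1": "P1", "tree2": "P2", "random_tree": "R",
             "testing": False, "new_data": False}

print(call_crossover(demo_geometric_xo, available))  # ['one offspring']
print(call_crossover(demo_swap_xo, available))  # ['offspring 1', 'offspring 2']
```

With this pattern the base algorithm prepares every input it knows about once, and each operator receives only what it asks for, which also covers the one-offspring versus two-offspring cases.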

**SWAP_BASE_CROSSOVER:**

- The only thing I have not done is check the **max_depth** after crossover, as I don't know how to deal with it (with inflate we simply do a copy_parent or deflate).
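
Stripped of the library details, the operator can be sketched on individuals represented as plain lists of blocks. This is a simplified illustration under that assumption. A real implementation would also have to rebuild the offspring's semantics and node counts, and the max_depth check mentioned above is not handled here.

```python
def swap_base_crossover(parent1, parent2):
    """Return two offspring whose first blocks are exchanged.

    Parents are sequences of blocks; they are not modified.
    """
    offspring1 = [parent2[0]] + list(parent1[1:])
    offspring2 = [parent1[0]] + list(parent2[1:])
    return offspring1, offspring2

p1 = ["T1", "T2", "T3"]
p2 = ["T4", "T5"]
print(swap_base_crossover(p1, p2))  # (['T4', 'T2', 'T3'], ['T1', 'T5'])
```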

In [1]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_airfoil, load_ppb # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split, show_individual  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [2]:
#X, y = load_airfoil(X_y=True)
X, y = load_ppb(X_y=True)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [3]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

# Show the best individual structure at the last generation
print(show_individual(final_tree, operator='sum'))

# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   44.542259216308594  |   57.63134765625         |   0.09506916999816895  |      3           |
|     ppb                 |       1      |   44.540802001953125  |   57.631412506103516     |   0.024104833602905273 |      13          |
|     ppb                 |       2      |   44.540802001953125  |   57.631412506103516     |   0.021932125091552734 |      13          |
|     ppb                 |       3      |   44.520267486572266  |   57.63651657104492      |   0.02266216278076172  |      13          |
|     ppb        

**GEOMETRIC_CROSSOVER**

My biggest question is how geometric_crossover actually works with the Individual class, which holds multiple trees [T1, T2, T3...].

- So far I had assumed that we always deal with individuals of size 1, that is, a single tree. In fact, SLIM_GSGP works with individuals that are lists of trees, and all of those trees should be taken into account. I do not yet know how geometric crossover is applied when an individual has multiple trees: if we have [T1, T2, T3] and [T4, T5], how do we cross these two individuals to create a single offspring?
- A possible solution is to select one tree from each individual and apply geometric crossover to that pair, but this does not really make sense, as the resulting individuals would always have size 1.
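
For reference, geometric semantic crossover in GSGP combines the parents' outputs as r * P1 + (1 - r) * P2, with r in [0, 1] coming from a random tree. One possible reading for multi-block individuals, which is an assumption and not something confirmed by the library, is to apply that combination to each individual's combined semantics (here the sum of its blocks). The sketch below only shows this at the level of semantics vectors and does not answer how the offspring's block structure should be stored.

```python
def individual_semantics(block_semantics):
    """Combine block outputs into one vector (assumes the 'sum' operator)."""
    return [sum(values) for values in zip(*block_semantics)]

def geometric_crossover_semantics(sem1, sem2, r):
    """Convex combination r * sem1 + (1 - r) * sem2, element by element."""
    return [ri * a + (1.0 - ri) * b for a, b, ri in zip(sem1, sem2, r)]

# Made-up block outputs on two data points
ind1 = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]  # blocks T1, T2, T3
ind2 = [[2.0, 2.0], [0.0, 2.0]]              # blocks T4, T5

s1 = individual_semantics(ind1)  # [4.0, 7.0]
s2 = individual_semantics(ind2)  # [2.0, 4.0]
r = [0.5, 0.25]                  # random-tree outputs, assumed in [0, 1]

print(geometric_crossover_semantics(s1, s2, r))  # [3.0, 4.75]
```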

### 3. Create and use a new selection algorithm that takes into account both the fitness and the number of nodes (could be nested tournament selection)

To take into account both fitness (ind.fitness) and number of nodes (ind.nodes_count) we can simply use a nested tournament selection.\
In this selection, we first select a pool of individuals with the best fitness and, from that pool, we select the individual with the smallest number of nodes. For example, if the pool_size is 2, we select the 2 individuals with the best fitness and, from those, the individual with the smallest number of nodes.

**NOTE:** To actually apply nested_selection, both the config file and the main SLIM file need to be changed.
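
A minimal sketch of a nested tournament, assuming a minimization problem: each inner tournament picks the fittest of a few random individuals, and the outer step keeps the smallest of those winners. The `Individual` class, the function name and the `tournament_size` parameter are hypothetical; only `fitness`, `nodes_count` and `pool_size` come from the description above.

```python
import random
from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float
    nodes_count: int

def nested_tournament(population, pool_size=2, tournament_size=2, rng=random):
    """Pick `pool_size` fitness-tournament winners, then the smallest one."""
    winners = []
    for _ in range(pool_size):
        contestants = rng.sample(population, tournament_size)
        # Inner tournament: lowest fitness wins (minimization assumed)
        winners.append(min(contestants, key=lambda ind: ind.fitness))
    # Outer tournament: fewest nodes wins
    return min(winners, key=lambda ind: ind.nodes_count)

population = [Individual(1.0, 30), Individual(1.2, 5), Individual(4.0, 3)]
chosen = nested_tournament(population, pool_size=2, tournament_size=2,
                           rng=random.Random(0))
print(chosen)
```

Larger values of pool_size give the node count more weight, since more fitness winners compete on size.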

In [1]:
from slim_gsgp.main_slim import slim  # import the slim library
from slim_gsgp.datasets.data_loader import load_airfoil, load_ppb # import the loader for the dataset PPB
from slim_gsgp.evaluators.fitness_functions import rmse  # import the rmse fitness metric
from slim_gsgp.utils.utils import train_test_split, show_individual  # import the train-test split function
from slim_gsgp.utils.utils import generate_random_uniform  # import the mutation step function

In [2]:
#X, y = load_airfoil(X_y=True)
X, y = load_ppb(X_y=True)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=0.4)

# Split the test set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, p_test=0.5)

In [3]:
# Apply the SLIM GSGP algorithm
final_tree = slim(X_train=X_train, y_train=y_train,
                  X_test=X_val, y_test=y_val,
                  dataset_name='ppb', slim_version='SLIM+SIG2', pop_size=100, n_iter=100,
                  ms_lower=0, ms_upper=1, p_inflate=0.5)

# Show the best individual structure at the last generation
print(show_individual(final_tree, operator='sum'))

# Get the prediction of the best individual on the test set
predictions = final_tree.predict(X_test)

# Compute and print the RMSE on the test set
print(float(rmse(y_true=y_test, y_pred=predictions)))

Verbose Reporter
-----------------------------------------------------------------------------------------------------------------------------------------
|         Dataset         |  Generation  |     Train Fitness     |       Test Fitness       |        Timing          |      Nodes       |
-----------------------------------------------------------------------------------------------------------------------------------------
|     ppb                 |       0      |   44.542259216308594  |   57.63134765625         |   0.06141996383666992  |      3           |
|     ppb                 |       1      |   44.540802001953125  |   57.631412506103516     |   0.026263713836669922 |      13          |
|     ppb                 |       2      |   44.540802001953125  |   57.631412506103516     |   0.020376920700073242 |      13          |
|     ppb                 |       3      |   44.540802001953125  |   57.631412506103516     |   0.02352309226989746  |      13          |
|     ppb        