### Introduction
This notebook demonstrates how to define and use custom primitives in the Evolutionary Forest regression model.

Evolutionary Forest is a machine learning algorithm that combines genetic programming with decision trees to create an ensemble model for regression tasks. It uses genetic programming to evolve a set of complex features, with each feature represented as a computer program. Each program contains primitives, which represent functions or operators that can be used to build complex features.
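To make the idea of a feature as a program concrete, here is a minimal NumPy sketch that evaluates two expressions of the kind the model prints in its log. It does not use the library's own evaluation code. The `analytical_quotient` definition, $a / \sqrt{1 + b^2}$, is the usual one from the genetic programming literature and is assumed here; the library's implementation may differ in detail.

```python
import numpy as np

# Primitives: plain functions that operate elementwise on feature columns.
def add(a, b):
    return np.add(a, b)

def multiply(a, b):
    return np.multiply(a, b)

def analytical_quotient(a, b):
    # Assumed definition: a / sqrt(1 + b^2), a division that never hits zero.
    return a / np.sqrt(1.0 + np.square(b))

# Toy input: 3 samples with 5 original features; ARG0..ARG4 are the columns.
X = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
              [1.0, 2.0, 3.0, 4.0, 0.0],
              [2.0, 0.5, 1.0, 0.0, 1.0]])
ARG0, ARG1, ARG2, ARG3, ARG4 = X.T

# Each evolved feature is a nested call over primitives and input columns.
feature_a = multiply(ARG1, ARG0)
feature_b = add(analytical_quotient(ARG1, ARG4), analytical_quotient(ARG0, ARG0))

# The constructed features form a new design matrix for the base learners.
constructed = np.column_stack([feature_a, feature_b])
print(constructed.shape)  # prints (3, 2)
```

Each column of `constructed` is one evolved feature, and the decision trees in the ensemble are trained on these columns rather than on the raw inputs.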

By default, Evolutionary Forest comes with a set of built-in primitives, such as addition, subtraction, multiplication, and division. However, users can define their own custom primitives to build more complex features.

The code first trains an `EvolutionaryForestRegressor` with the default set of primitives as a baseline. It then adds a custom primitive, `Sin`, which computes the sine of its input, by passing a dictionary through the `custom_primitive` argument of the constructor.

With the default primitives, the baseline model reaches an $R^2$ score of 0.72 on the test set.
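For reference, the $R^2$ score computed by `r2_score` is the coefficient of determination. With true test targets $y_i$, predictions $\hat{y}_i$, and the mean of the true targets $\bar{y}$, it is

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A score of 1 means perfect predictions, 0 matches a model that always predicts $\bar{y}$, and negative values are worse than that constant predictor.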

```python
import sys
sys.path.insert(0, '../')

import random
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from evolutionary_forest.forest import EvolutionaryForestRegressor

random.seed(0)
np.random.seed(0)

# Generate dataset
X, y = make_friedman1(n_samples=100, n_features=5, random_state=0)
# Split dataset
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train Evolutionary Forest
r = EvolutionaryForestRegressor(max_height=5, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=1)
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))
```

```
data shape (80, 5) (80,)
   	      	                                                          fitness                                                           	                                  size                                  
   	      	----------------------------------------------------------------------------------------------------------------------------	------------------------------------------------------------------------
gen	nevals	25%         	75%         	avg         	gen	max         	median      	min         	nevals	std         	25%	75%	avg  	gen	max	median	min	nevals	std     
0  	200   	[0.19012237]	[0.43453516]	[0.27484596]	0  	[0.66708961]	[0.33572612]	[-0.7157414]	200   	[0.25761531]	4  	4  	3.995	0  	6  	4     	3  	200   	0.644186
defaultdict(<class 'int'>, {'1': 200})
P value of different population 0.14328738474412253
analytical_quotient(-1, analytical_quotient(-1, 1))
multiply(ARG1, ARG0)
add(analytical_quotient(ARG1, ARG4), analytical_quotient(ARG0, ARG0))
```

### Custom Defined Primitives

Now, we use the `custom_primitive` argument of the `EvolutionaryForestRegressor` constructor to pass a dictionary that maps each custom primitive name to a tuple containing the corresponding function and the number of arguments it takes.

Next, we train a new instance of the model with the custom primitive and evaluate it on the test set using the $R^2$ score. The score rises from 0.72 to 0.73. This is a small gain measured on a single split with 20 test samples, so it shows that the custom primitive is usable rather than providing firm evidence of an improvement. Note that this run also sets `basic_primitive='Add,Mul,Div'`, so it differs from the baseline in more than the added `Sin` primitive.

```python
# Train with an explicit basic primitive set plus the custom primitive Sin
r = EvolutionaryForestRegressor(max_height=5, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=1,
                                basic_primitive='Add,Mul,Div',
                                # name -> (function, number of arguments)
                                custom_primitive={
                                    'Sin': (np.sin, 1)
                                })
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))
```

```
data shape (80, 5) (80,)
   	      	                                                          fitness                                                           	                                  size                                  
   	      	----------------------------------------------------------------------------------------------------------------------------	------------------------------------------------------------------------
gen	nevals	25%         	75%         	avg         	gen	max         	median      	min          	nevals	std         	25%	75%	avg 	gen	max	median	min	nevals	std     
0  	200   	[0.14329716]	[0.42592341]	[0.25911083]	0  	[0.62619761]	[0.33537471]	[-0.81933259]	200   	[0.24130037]	4  	4  	4.02	0  	6  	4     	3  	200   	0.647765
defaultdict(<class 'int'>, {'1': 200})
P value of different population 2.3541591284105273e-05
subtract(0, ARG3)
analytical_quotient(-1, ARG3)
subtract(1, ARG4)
subtract(ARG3, ARG0)
analytical_quotient(multiply(ARG0, ARG1), subtract(A
```

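In summary, this notebook shows how to extend the Evolutionary Forest regression model by defining custom primitives and passing them through the `custom_primitive` argument. Custom primitives let users add functions that match the structure of the problem. For example, the Friedman #1 target used here contains a sine term, which makes `Sin` a natural primitive to add.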