# Solving supervised machine learning problems (from the perspective of inductive programming).
Inductive program synthesis (aka inductive programming) is a subfield of program synthesis that studies program generation from incomplete information, namely from examples of the program's desired input/output behavior. Genetic programming (GP) is one of the numerous approaches to inductive synthesis; it is characterized by performing the search in the space of syntactically correct programs of a given programming language.

In the context of supervised machine learning (SML) problem-solving, one can define the task of a GP algorithm as the induction, from input/output examples, of a program/function that identifies the mapping $f:S\mapsto R$ in the best possible way, generally measured through the solution's generalization ability on previously unseen data.
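
As a minimal sketch of what "generalization ability on previously unseen data" means in practice, the snippet below scores one hand-written candidate function with the root mean squared error (RMSE) on a training and on a held-out partition. The candidate, the helper ``rmse_sketch`` and the toy numbers are assumptions made up for illustration; they are not taken from the library or from the data set used later.

```python
import math

def rmse_sketch(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def candidate(x):
    """A hypothetical induced program: f(x) = 2 * x."""
    return 2.0 * x

# Made-up toy data, roughly following y = 2x
X_train, y_train = [0.0, 1.0, 2.0, 3.0], [0.1, 1.9, 4.2, 5.8]
X_test, y_test = [4.0, 5.0], [8.3, 9.6]

# The training error guides the search; the test error estimates generalization
train_error = rmse_sketch(y_train, [candidate(x) for x in X_train])
test_error = rmse_sketch(y_test, [candidate(x) for x in X_test])
print(f"train RMSE: {train_error:.3f}, test RMSE: {test_error:.3f}")
```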

## SML problem type.
Given the definitions provided above, and to enable the automatic induction of programs from input/output examples, we have conceptualized a module called ``inductive_programming`` which contains different problem types, materialized as classes. One of them, called ``SML``, a subclass of ``Problem``, aims at supporting SML problem-solving (specifically, symbolic regression and binary classification by means of GP).

<img src="https://upload.wikimedia.org/wikipedia/commons/7/77/Genetic_Program_Tree.png" alt="Drawing" style="width: 300px;"/>

# 1. Create an instance of ``SML``.

Loads the necessary classes and functions.

In [67]:
# Imports PyTorch and its data utilities
import torch
from torch.utils.data import TensorDataset, DataLoader
# Imports the problem type, the data set and the utilities
from gpol.problems.inductive_programming import SML
from gpol.utils.datasets import load_boston
from gpol.utils.utils import train_test_split, rmse
from gpol.utils.inductive_programming import function_map
# Imports metaheuristics 
from gpol.algorithms.random_search import RandomSearch
from gpol.algorithms.local_search import HillClimbing, SimulatedAnnealing
from gpol.algorithms.genetic_algorithm import GeneticAlgorithm
# Imports operators
from gpol.operators.initializers import rhh, prm_full, grow
from gpol.operators.selectors import prm_tournament
from gpol.operators.variators import prm_subtree_mtn, swap_xo

Creates an instance of the ``SML`` problem. The search space (*S*) of an instance of the ``SML`` problem consists of the following key-value pairs:
- ``"n_dims"`` is the number of input features (aka input dimensions) in the underlying ``SML`` problem's instance;
- ``"function_set"`` is the set of primitive functions;
- ``"constant_set"`` is the set of constants to draw terminals from;
- ``"p_constants"`` is the probability of generating a constant when sampling a terminal;
- ``"max_init_depth"`` is the trees' maximum depth during the initialization;
- ``"max_depth"`` is the trees' maximum depth during the evolution; and
- ``"n_batches"`` is the number of batches to use when evaluating solutions (more than one can be used).
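
To illustrate the role of ``"p_constants"``, the sketch below samples terminals in plain Python: with probability ``p_constants`` a constant is drawn from the constant set, otherwise an input-feature index is returned. The function ``sample_terminal`` and the representation of features as integer indices are assumptions made for this example, not the library's implementation.

```python
import random

def sample_terminal(n_dims, constant_set, p_constants, rng):
    """Returns a constant with probability p_constants, else a feature index."""
    if rng.random() < p_constants:
        return rng.choice(constant_set)
    return rng.randrange(n_dims)

rng = random.Random(0)
constants = [-1.0, -0.5, 0.5, 1.0]
# Illustrative values: 13 input features, 10% chance of a constant
draws = [sample_terminal(13, constants, 0.1, rng) for _ in range(10_000)]
share_constants = sum(isinstance(d, float) for d in draws) / len(draws)
print(f"share of constants among sampled terminals: {share_constants:.3f}")
```

With a low ``p_constants``, most leaves of the generated trees refer to input features rather than to constants.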

In [68]:
# Defines the processing device and the random state's seed
device, seed = 'cpu', 0
# Loads the data
X, y = load_boston(X_y=True)
# Defines parameters for the data usage
batch_size, shuffle, p_test = 50, True, 0.3
# Performs train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, p_test=p_test, seed=seed)
# Creates training and test data sets
ds_train = TensorDataset(X_train, y_train)
ds_test = TensorDataset(X_test, y_test)
# Creates training and test data loaders
dl_train = DataLoader(dataset=ds_train, batch_size=batch_size, shuffle=shuffle)
dl_test = DataLoader(dataset=ds_test, batch_size=batch_size, shuffle=shuffle)
# Characterizes the program elements: function and constant sets
function_set = [function_map["add"], function_map["sub"], function_map["mul"], function_map["div"]]
constant_set = torch.tensor([-1.0, -0.5, 0.5, 1.0], device=device)
# Defines the search space
sspace = {"n_dims": X.shape[1], "function_set": function_set, "constant_set": constant_set, "p_constants": 0.1, 
          "max_init_depth": 5, "max_depth": 15, "n_batches": 1}
# Creates an instance of SML
pi = SML(sspace, rmse, dl_train, dl_test, min_=True)

# 2. Choose and parametrize the algorithms.

## 2.1. Random search (RS).
The random search (RS) can be seen as the first rudimentary stochastic metaheuristic for problem-solving. Its strategy, far from being *intelligent*, consists of randomly sampling $S$ for a given number of iterations. As such, the only search-parameter of an instance of ``RandomSearch`` is the initialization function (the ``initializer``). The function ``grow`` stands for the Grow initialization technique and returns a single tree whose maximum initial depth equals ``max_init_depth``; the method assumes that, until the maximum depth is reached, the probability of sampling a program element from the set of functions is the same as from the set of terminals.
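
The following is a minimal, pure-Python sketch of the Grow technique and of a random search built on top of it. It assumes that trees are stored as prefix-ordered lists, that a single terminal has depth 1, and that functions and terminals are each chosen with probability 0.5 below the depth limit; the names ``grow_sketch``, ``tree_depth`` and ``random_search_sketch`` are hypothetical, and the library's own ``grow`` may differ in its details.

```python
import random

# Hypothetical primitives: function name -> arity
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2, "div": 2}
CONSTANTS = [-1.0, -0.5, 0.5, 1.0]

def sample_terminal(n_dims, p_constants, rng):
    """Returns a constant with probability p_constants, else a feature index."""
    if rng.random() < p_constants:
        return rng.choice(CONSTANTS)
    return rng.randrange(n_dims)

def grow_sketch(max_depth, n_dims, p_constants, rng, depth=1):
    """Returns a prefix-ordered tree whose depth does not exceed max_depth."""
    # At the depth limit only terminals are allowed; above it, functions and
    # terminals are equally likely (assumption stated in the lead-in)
    if depth >= max_depth or rng.random() < 0.5:
        return [sample_terminal(n_dims, p_constants, rng)]
    name = rng.choice(list(FUNCTIONS))
    tree = [name]
    for _ in range(FUNCTIONS[name]):
        tree += grow_sketch(max_depth, n_dims, p_constants, rng, depth + 1)
    return tree

def tree_depth(tree):
    """Computes the depth of a prefix-ordered tree."""
    def walk(i, depth):
        node = tree[i]
        arity = FUNCTIONS[node] if isinstance(node, str) else 0
        deepest, i = depth, i + 1
        for _ in range(arity):
            child_depth, i = walk(i, depth + 1)
            deepest = max(deepest, child_depth)
        return deepest, i
    return walk(0, 1)[0]

def random_search_sketch(initializer, fitness, n_iter):
    """Samples n_iter solutions independently and keeps the best (minimization)."""
    best = initializer()
    for _ in range(n_iter - 1):
        candidate = initializer()
        if fitness(candidate) <= fitness(best):
            best = candidate
    return best

rng = random.Random(0)
tree = grow_sketch(5, 13, 0.1, rng)
print(tree, "depth:", tree_depth(tree))
# Placeholder fitness for illustration only: the number of nodes in the tree
smallest = random_search_sketch(lambda: grow_sketch(5, 13, 0.1, rng), len, 30)
```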

The cell below creates a dictionary called ``pars``, which stores the algorithms' parameters. Each key-value pair stores the algorithm's type and a dictionary of the respective search-parameters. The first key-value pair regards the RS.

In [69]:
# Defines a single-point (SP) initializer
sp_init = grow
# Defines RS's parameters
pars = {RandomSearch: {"initializer": sp_init}}

## 2.2. Hill climbing (HC).
The local search (LS) algorithms can be seen as some of the first intelligent search strategies that improve upon the RS. They rely on the concept of neighborhood, which is explored at each iteration by sampling from $S$ a limited number of neighbors of the best-so-far solution. Usually, the LS algorithms are divided into two branches. In the first branch, called hill climbing (HC), or hill descent for minimization problems, the best-so-far solution is replaced by its neighbor when the latter is at least as good as the former.
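
The replacement rule of HC can be sketched on a toy one-dimensional minimization problem. The code below is a generic illustration written for this text, assuming that the best of ``nh_size`` sampled neighbors is compared against the best-so-far solution; the function ``hill_climbing_sketch`` and the toy fitness are hypothetical and do not reproduce the library's ``HillClimbing`` class.

```python
import random

def hill_climbing_sketch(start, fitness, neighbor, nh_size, n_iter, rng):
    """Minimizes fitness by moving to the best sampled neighbor when it is
    at least as good as the best-so-far solution."""
    best, best_fit = start, fitness(start)
    for _ in range(n_iter):
        # Samples a limited number of neighbors of the best-so-far solution
        neighbors = [neighbor(best, rng) for _ in range(nh_size)]
        candidate = min(neighbors, key=fitness)
        candidate_fit = fitness(candidate)
        if candidate_fit <= best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit

def toy_fitness(x):
    """Toy problem: squared distance to 3."""
    return (x - 3.0) ** 2

def toy_neighbor(x, rng):
    """Perturbs x by a small uniform step."""
    return x + rng.uniform(-0.5, 0.5)

rng = random.Random(0)
best, best_fit = hill_climbing_sketch(10.0, toy_fitness, toy_neighbor,
                                      nh_size=20, n_iter=50, rng=rng)
print(f"best x: {best:.3f}, fitness: {best_fit:.5f}")
```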

The cell below adds ``HillClimbing`` to ``pars``. Note that, unlike ``RandomSearch``, an instance of ``HillClimbing`` also requires the specification of a neighbor-generation function (``"nh_function"``) and the neighborhood's size (``"nh_size"``). In this example, the so-called *subtree mutation* is used: an operator that replaces a random subtree of the individual with another, randomly generated tree; the latter is generated by means of the Full initialization technique using the $S$ defined for the problem's instance (``initializer=prm_full(sspace)``). Note that the very same initialization function is used for both ``RandomSearch`` and ``HillClimbing``. Also, to foster the equivalence between the LS and the population-based (PB) algorithms, the value of ``"nh_size"`` is later reused as the population's size, and the parametrized neighbor-generation function is later reused as the mutation operator (details are given a few sections below).
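
A minimal sketch of the subtree mutation on prefix-ordered lists follows. It assumes that function nodes are strings with a known arity and that terminals are numbers; ``subtree_end``, ``subtree_mutation_sketch`` and the tiny ``random_tree`` generator (which stands in for the Full initialization) are hypothetical names introduced for this illustration only.

```python
import random

# Hypothetical primitives: function name -> arity
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2}

def subtree_end(tree, start):
    """Returns the index one past the subtree rooted at tree[start]."""
    pending, i = 1, start
    while pending > 0:
        node = tree[i]
        # A function opens as many slots as its arity; every node fills one
        pending += (ARITY[node] if isinstance(node, str) else 0) - 1
        i += 1
    return i

def subtree_mutation_sketch(tree, random_tree, rng):
    """Replaces a randomly chosen subtree with a newly generated tree."""
    start = rng.randrange(len(tree))
    end = subtree_end(tree, start)
    return tree[:start] + random_tree(rng) + tree[end:]

def random_tree(rng):
    """Stand-in for a tree initializer: a depth-2 tree with random leaves."""
    return [rng.choice(list(ARITY)), rng.randrange(4), rng.choice([-1.0, 0.5])]

rng = random.Random(0)
parent = ["add", "mul", 0, 1.0, "sub", 2, 3]  # (x0 * 1.0) + (x2 - x3)
mutant = subtree_mutation_sketch(parent, random_tree, rng)
print(mutant)
```

Because whole subtrees are exchanged, the mutant remains a syntactically valid prefix expression.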

In [70]:
# Defines the size of the population/neighborhood 
nh_size = 500
# Defines neighbor-generation function with the respective parameters
nh_function = prm_subtree_mtn(initializer=prm_full(sspace))
# Defines HC's parameters
pars[HillClimbing] = {"initializer": sp_init, "nh_function": nh_function, "nh_size": nh_size}

## 2.3. Simulated annealing (SA).
The second branch, called simulated annealing (SA), extends HC by formulating a non-zero probability of replacing the best-so-far solution by its neighbor when the latter is worse. Traditionally, such a probability is small and decreases as the search advances. The strategy adopted by SA is especially useful when the search has prematurely stagnated at a locally sub-optimal point in $S$.
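
The acceptance rule can be sketched with the classical Metropolis criterion, in which a worse neighbor is accepted with probability $\exp(-\Delta / c)$, where $\Delta$ is the fitness deterioration and $c$ is a control parameter that shrinks geometrically. This is an assumption made for illustration: the source does not state the exact formula behind the ``"control"`` and ``"update_rate"`` parameters, so the library's rule may differ.

```python
import math
import random

def accept(current_fit, neighbor_fit, control, rng):
    """Metropolis-style acceptance for a minimization problem."""
    if neighbor_fit <= current_fit:
        return True  # Behaves like hill climbing for non-worse neighbors
    # A worse neighbor is accepted with a probability that shrinks with
    # the deterioration and with a smaller control parameter
    return rng.random() < math.exp(-(neighbor_fit - current_fit) / control)

rng = random.Random(0)
control, update_rate = 1.0, 0.9
probabilities = []
for iteration in range(5):
    # Probability of accepting a neighbor that is worse by 0.5 fitness units
    probabilities.append(math.exp(-0.5 / control))
    control *= update_rate  # Geometric decrease of the control parameter
print([round(p, 3) for p in probabilities])
```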

The cell below adds ``SimulatedAnnealing`` to ``pars``.

In [71]:
# Defines SA's parameters
pars[SimulatedAnnealing] = {"initializer": sp_init, "nh_function": nh_function, "nh_size": nh_size, "control": 1.0, "update_rate": 0.9}

## 2.4. Genetic programming (GP) as genetic algorithm (GA).
Based on the number of candidate solutions they handle at each step, the metaheuristics can be categorized into single-point (SP) and population-based (PB) approaches. 
The search procedure in the SP metaheuristics is generally guided by the information provided by a single candidate solution from $S$, usually the best-so-far solution, that is gradually evolved in a well-defined manner in the hope of finding the global optimum. The abovementioned HC and SA are examples of SP metaheuristics, as the search is performed by exploring the neighborhood $N(i)$, where $i$ is the current best solution. In contrast, the search procedure in PB metaheuristics is generally guided by the information shared by a set of candidate solutions and by the exploitation of its collective behavior in different ways. In abstract terms, one can say that every PB metaheuristic shares, at least, the following two features: an object representing the set of simultaneously exploited candidate solutions (i.e., the population), and a procedure to *move* them across $S$.

The genetic algorithm (GA) is a metaheuristic introduced by J. Holland, strongly inspired by Darwin's theory of evolution by means of natural selection. Conceptually, the algorithm starts with a random-like population of candidate solutions (called *chromosomes*). Then, by mimicking natural selection and applying genetically inspired variation operators, such as the crossover and the mutation, the algorithm breeds a population of next-generation candidate solutions (called the *offspring population*) that replaces the previous population (a.k.a. the *parent population*). This procedure is iterated until some stopping criterion is met, like a maximum number of iterations (also called *generations*).
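
The generational loop described above can be sketched on a toy bit-string problem (maximizing the number of ones). The code is a generic, self-contained illustration under simplifying assumptions (tournament selection of size 3, one-point crossover, bit-flip mutation, elitism of one individual); all function names are hypothetical and the loop is not the library's ``GeneticAlgorithm``.

```python
import random

def tournament(pop, fits, k, rng):
    """Returns the fittest among k randomly drawn individuals (maximization)."""
    contestants = rng.sample(range(len(pop)), k)
    return pop[max(contestants, key=lambda i: fits[i])]

def one_point_xo(p1, p2, rng):
    """Exchanges the tails of two parents at a random cut point."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def bit_flip(ind, rng, p_bit=0.05):
    """Flips each bit independently with probability p_bit."""
    return [1 - g if rng.random() < p_bit else g for g in ind]

def genetic_algorithm_sketch(n_bits=30, pop_size=40, n_gen=40,
                             p_c=0.7, p_m=0.3, seed=0):
    rng = random.Random(seed)
    fitness = sum  # Toy fitness: the number of ones in the chromosome
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        fits = [fitness(ind) for ind in pop]
        history.append(max(fits))
        # Elitism: the best parent is copied unchanged into the offspring
        offspring = [pop[fits.index(max(fits))][:]]
        while len(offspring) < pop_size:
            p1 = tournament(pop, fits, 3, rng)
            p2 = tournament(pop, fits, 3, rng)
            if rng.random() < p_c:
                children = one_point_xo(p1, p2, rng)
            else:
                children = (p1[:], p2[:])
            for child in children:
                if rng.random() < p_m:
                    child = bit_flip(child, rng)
                offspring.append(child)
        # The offspring population replaces the parent population
        pop = offspring[:pop_size]
    return max(pop, key=fitness), history

best, history = genetic_algorithm_sketch()
print("best fitness per generation:", history)
```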

Genetic programming (GP) is a PB metaheuristic, proposed and popularized by J. Koza, which extends GAs to allow the exploration of the space of computer programs. Similar to other evolutionary algorithms (EAs), GP evolves a set of candidate solutions (the population) by mimicking the basic principles of Darwinian evolution. The evolutionary process involves fitness-based selection of the candidate solutions and their variation by means of genetically inspired operators (such as the crossover and the mutation). If abstracted from some implementation details, GP can be seen as a GA in which the initialization and variation operators were specifically adjusted to work upon tree-based representations of candidate solutions (this idea was inspired by the LISP programming language, in which programs and data structures are represented as trees). Concretely, programs are defined using two sets: a set of primitive functions, which appear as the internal nodes of the trees, and a set of terminals, which represent the leaves of the trees. In the context of SML problem-solving, the trees represent mathematical expressions in the so-called Polish prefix notation, in which the operators (primitive functions) precede their operands (terminals or nested sub-expressions).

Given that the initialization, selection, and variation operators are provided as constructor parameters to solve a specific problem type, one can create an instance of ``GeneticAlgorithm`` to solve potentially any kind of problem, whether it is of a continuous, combinatorial, or inductive program synthesis nature. The only two things one has to take into consideration are the correct specification of the problem-specific $S$ and of the operators. Following this perspective, by creating an instance of the class ``GeneticAlgorithm`` with, for example, ramped half-and-half (RHH) initialization, tournament selection, swap crossover and subtree mutation, all of them implemented in this library, one obtains a standard GP algorithm. Recall that a similarly flexible behavior is present in the branch of LS algorithms: by providing HC or SA with, for example, Grow initialization and subtree mutation, one obtains an LS-based program induction algorithm.
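
To make the prefix representation concrete, the sketch below evaluates a program stored as a prefix-ordered list on a single input vector. It assumes that strings denote primitive functions, integers denote input-feature indices and floats denote constants, and that division is protected (returning 1.0 for a near-zero denominator, a common GP convention). The names ``evaluate`` and ``protected_div`` are hypothetical; the library evaluates programs on tensors through its own ``function_map``.

```python
import operator

def protected_div(a, b):
    """Division returning 1.0 when the denominator is (almost) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Hypothetical primitives: name -> (callable, arity)
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "div": (protected_div, 2),
}

def evaluate(program, x):
    """Evaluates a prefix-ordered program on one input vector x."""
    def walk(i):
        node = program[i]
        if isinstance(node, str):  # Internal node: a primitive function
            func, arity = FUNCTIONS[node]
            args, i = [], i + 1
            for _ in range(arity):
                value, i = walk(i)
                args.append(value)
            return func(*args), i
        if isinstance(node, int):  # Leaf: an input-feature index
            return x[node], i + 1
        return node, i + 1  # Leaf: a constant
    return walk(0)[0]

# (x0 * 2.0) + (x1 - 0.5), written with operators before their operands
program = ["add", "mul", 0, 2.0, "sub", 1, 0.5]
print(evaluate(program, [3.0, 4.0]))  # 3.0 * 2.0 + (4.0 - 0.5) = 9.5
```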

The cell below adds ``GeneticAlgorithm`` to ``pars``. It uses the ramped half-and-half initialization (``rhh``), which returns a list of randomly generated trees. Note that the ``"mutator"`` key is assigned the very same function as the ``"nh_function"`` in the aforementioned LS algorithms, and that the population's size equals the neighborhood's size.
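
As a rough illustration of two operators this configuration relies on, the sketch below implements a subtree-swapping crossover and a tournament selector on prefix-ordered lists. Two assumptions are made here and are not confirmed by the source: that the swap crossover exchanges one randomly chosen subtree between the two parents, and that the tournament's ``pressure`` is the fraction of the population drawn into the selection pool. All names are hypothetical.

```python
import random

# Hypothetical primitives: function name -> arity
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2}

def subtree_end(tree, start):
    """Returns the index one past the subtree rooted at tree[start]."""
    pending, i = 1, start
    while pending > 0:
        pending += (ARITY[tree[i]] if isinstance(tree[i], str) else 0) - 1
        i += 1
    return i

def swap_crossover_sketch(p1, p2, rng):
    """Exchanges one random subtree of p1 with one random subtree of p2."""
    s1 = rng.randrange(len(p1))
    e1 = subtree_end(p1, s1)
    s2 = rng.randrange(len(p2))
    e2 = subtree_end(p2, s2)
    child1 = p1[:s1] + p2[s2:e2] + p1[e1:]
    child2 = p2[:s2] + p1[s1:e1] + p2[e2:]
    return child1, child2

def tournament_sketch(fitnesses, pressure, rng):
    """Returns the index of the best (lowest-fitness) member of a random pool
    whose size is a fraction `pressure` of the population (assumption)."""
    pool_size = max(1, int(pressure * len(fitnesses)))
    pool = rng.sample(range(len(fitnesses)), pool_size)
    return min(pool, key=lambda i: fitnesses[i])

rng = random.Random(0)
parent1 = ["add", "mul", 0, 1.0, "sub", 2, 3]
parent2 = ["div", 4, "add", 5, -0.5]
child1, child2 = swap_crossover_sketch(parent1, parent2, rng)
print(child1, child2)
# With 500 individuals and pressure=0.1, the pool would hold 50 candidates
winner = tournament_sketch([rng.random() for _ in range(500)], 0.1, rng)
```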

In [72]:
# Defines a population-based (PB) initializer
pb_init = rhh
# Defines GA's parameters
pars[GeneticAlgorithm] = {"pop_size": nh_size, "initializer": pb_init, "selector": prm_tournament(pressure=0.1), "mutator": nh_function,
                          "crossover": swap_xo, "p_m": 0.3, "p_c": 0.7, "elitism": True, "reproduction": False}

# 3. Execute the experiment.

Note that *many* parameters and functions are shared across the different algorithms in the experiment. This increases the control over, and the comparability between, different algorithmic approaches when solving a given problem's instance.

In [73]:
for isa_type, isa_pars in pars.items():
    print(isa_type)
    for p_name, p_val in isa_pars.items():
        print("\t", p_name, p_val)

<class 'gpol.algorithms.random_search.RandomSearch'>
	 initializer <function grow at 0x0000023F77957430>
<class 'gpol.algorithms.local_search.HillClimbing'>
	 initializer <function grow at 0x0000023F77957430>
	 nh_function <function prm_subtree_mtn.<locals>.subtree_mtn at 0x0000023F77B20820>
	 nh_size 500
<class 'gpol.algorithms.local_search.SimulatedAnnealing'>
	 initializer <function grow at 0x0000023F77957430>
	 nh_function <function prm_subtree_mtn.<locals>.subtree_mtn at 0x0000023F77B20820>
	 nh_size 500
	 control 1.0
	 update_rate 0.9
<class 'gpol.algorithms.genetic_algorithm.GeneticAlgorithm'>
	 pop_size 500
	 initializer <function rhh at 0x0000023F77957670>
	 selector <function prm_tournament.<locals>.tournament at 0x0000023F77B20280>
	 mutator <function prm_subtree_mtn.<locals>.subtree_mtn at 0x0000023F77B20820>
	 crossover <function swap_xo at 0x0000023F779530D0>
	 p_m 0.3
	 p_c 0.7
	 elitism True
	 reproduction False


Defines the computational resources for the experiment: the number of iterations.

In [74]:
n_iter = 30

Loops over the previously defined ``pars`` dictionary, which contains the algorithms and their parameters. Note that, besides the algorithm-specific parameters, the constructor of a search algorithm also receives the seed that initializes the pseudorandom number generator (``seed``) and the specification of the processing ``device`` (either CPU or GPU).

The ``solve`` method has the same signature for all the search algorithms and, in this example, receives the following parameters:
-  ``n_iter``: the number of iterations to conduct the search;
-  ``tol``: the minimum fitness improvement required to continue the search. When the fitness does not improve by at least ``tol`` for ``n_iter_tol`` consecutive iterations, the search is automatically interrupted;
-  ``n_iter_tol``: the maximum number of consecutive iterations allowed without meeting the ``tol`` improvement;
-  ``test_elite``: whether the best-so-far solution is also evaluated on the test data (reported as ``Test Fitness`` in the output below);
-  ``verbose``: the verbosity's detail level;
-  ``log``: the log files' detail level (if any).
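
The interplay between ``tol`` and ``n_iter_tol`` can be sketched as a simple counter over the best-so-far fitness trace. The function ``stopping_iteration`` is a hypothetical helper written for this text and its exact rule is an assumption; it is, however, consistent with the ``RandomSearch`` log shown further down, where the search stops at generation 8 after five consecutive generations without improvement.

```python
def stopping_iteration(fitness_trace, tol, n_iter_tol):
    """Returns the iteration at which the search would stop (minimization).

    fitness_trace[i] is the best-so-far fitness at iteration i."""
    stalled = 0
    for it in range(1, len(fitness_trace)):
        improvement = fitness_trace[it - 1] - fitness_trace[it]
        # Counts consecutive iterations whose improvement is below tol
        stalled = stalled + 1 if improvement < tol else 0
        if stalled >= n_iter_tol:
            return it
    return len(fitness_trace) - 1  # The iteration budget was exhausted

# Best-so-far training fitness as reported in the RandomSearch log of this
# example, padded with extra values to show that the search stops earlier
trace = [32.4791] * 3 + [15.4184] * 9
print(stopping_iteration(trace, tol=0.1, n_iter_tol=5))  # 8
```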

In [75]:
for isa_type, isa_pars in pars.items():
    isa = isa_type(pi=pi, **isa_pars, seed=seed, device=device)
    # Note: for a budget equivalent to the other algorithms, RS could instead be given n_iter*nh_size iterations
    isa.solve(n_iter=n_iter, tol=0.1, n_iter_tol=5, test_elite=True, verbose=2, log=0)
    print("Algorithm: {}".format(isa_type.__name__))
    print("Best solution's fitness: {:.3f}".format(isa.best_sol.fit))
    print("Best solution:", isa.best_sol.repr_, end="\n\n")

------------------------------------------------------------------
           |                    Best solution                    |
------------------------------------------------------------------
Generation   Length   Fitness          Test Fitness         Timing
0            7        32.4791          36.3134               0.005
1            7        32.4791          35.7816               0.001
2            7        32.4791          32.71                 0.001
3            15       15.4184          14.1555               0.001
4            15       15.4184          10.8436               0.001
5            15       15.4184          13.3441               0.001
6            15       15.4184          16.0714               0.001
7            15       15.4184          13.7092               0.001
8            15       15.4184          14.2241               0.001
Algorithm: RandomSearch
Best solution's fitness: 15.418
Best solution: [mul, mul, tensor(1.), 4, sub, add, 1, sub, add, 8, 3, div