# Control parameters

This notebook explains and demonstrates the use of the control parameters
available in `ProcessOptimizer`.

Control parameters are the independent variables that can be controlled in the optimization.
Examples include the amount of an ingredient to add, how long a certain process runs,
at what temperature, and with what equipment. Control parameters are contrasted with
quality parameters, which are dependent variables that measure the outcome of the
process.

The complete set of parameters needed to define one experiment is a point in a
search space of the class `Space`. A `Space` consists of a number of `Dimension`s.
Each `Dimension` represents the possible values of one independent variable.
Independent variables of types `Real`, `Integer` and `Categorical` are supported.

`Dimension`s can be created directly or through the creation of a `Space`. `Space`s
can either be initialized explicitly, or be created when initializing an `Optimizer`.

You can sample from a `Space` in different ways, e.g. by random sampling with
the method `Space.rvs` or by Latin hypercube sampling (LHS) with the method `Space.lhs`.

In [1]:
import ProcessOptimizer as po

explicit_dimensions = po.Space(
    dimensions=[
        po.Real(1., 10.),
        po.Integer(1, 10),
        po.Categorical(["cat", "dog", "elephant"]),
    ]
)
print(f"Explicitly defined dimensions:\n{explicit_dimensions}\n")

space_definition = [[1., 10.], [1, 10], ["cat", "dog", "elephant"]]
explicit_space = po.Space(dimensions=space_definition)
print(f"Explicitly defined space:\n{explicit_space}\n")

opt = po.Optimizer(dimensions=space_definition)
print(f"Implicitly defined space:\n{opt.space}")

random_samples = explicit_space.rvs(5)
print(f"\nRandom sampling from the space:\n{random_samples}")


Explicitly defined dimensions:
Space([Real(low=1.0, high=10.0, prior='uniform', transform='identity'),
       Integer(low=1, high=10),
       Categorical(categories=('cat', 'dog', 'elephant'), prior=None)])

Explicitly defined space:
Space([Real(low=1.0, high=10.0, prior='uniform', transform='identity'),
       Integer(low=1, high=10),
       Categorical(categories=('cat', 'dog', 'elephant'), prior=None)])

Implicitly defined space:
Space([Real(low=1.0, high=10.0, prior='uniform', transform='normalize'),
       Integer(low=1, high=10),
       Categorical(categories=('cat', 'dog', 'elephant'), prior=None)])

Random sampling from the space:
[[9.679015411860078, 4, 'dog'], [3.773620807245952, 6, 'dog'], [7.372177924314652, 5, 'elephant'], [5.690141650801657, 8, 'cat'], [3.2249261316345867, 9, 'cat']]
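
The shorthand definitions above resolve to the same dimension types as the explicit ones, as the printed output shows. The following minimal sketch mimics that inference in plain Python. `infer_dimension_type` is a hypothetical helper written for illustration, not part of `ProcessOptimizer`, and the library's actual rules may cover more cases (for example extra arguments such as a prior).

```python
def infer_dimension_type(definition):
    """Guess the dimension type from a shorthand definition (illustrative only)."""
    is_number_pair = len(definition) == 2 and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in definition
    )
    if is_number_pair:
        # At least one float bound: treat the pair as limits of a Real dimension.
        if any(isinstance(v, float) for v in definition):
            return "Real"
        # Two integer bounds: treat the pair as limits of an Integer dimension.
        return "Integer"
    # Anything else is treated as a list of categories.
    return "Categorical"


space_definition = [[1., 10.], [1, 10], ["cat", "dog", "elephant"]]
inferred_types = [infer_dimension_type(d) for d in space_definition]
print(inferred_types)  # → ['Real', 'Integer', 'Categorical']
```

Spelling out `po.Real`, `po.Integer` and `po.Categorical` explicitly avoids relying on any such inference, which helps when a list of two numbers is meant as two categories.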


## Real dimensions

A `Real` dimension represents the variable as a real (floating point) number
between a lower and an upper limit. Its prior probability is either uniform
(the default) or log-uniform, i.e. uniform in the logarithm of the value.

In [2]:
import numpy as np
import ProcessOptimizer as po
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()

space_with_uniform_prior = po.Space(
    dimensions=[
        po.Real(low=1.0, high=11.0, prior="uniform")
    ]
)
print(f"Space with one real dimension with uniform prior:\n{space_with_uniform_prior}")
space_with_log_uniform_prior = po.Space(
    [
        po.Real(low=1.0, high=11.0, prior="log-uniform")
    ]
)
print(f"Space with one real dimension with log-uniform prior:\n{space_with_log_uniform_prior}\n")

random_sample_uniform = space_with_uniform_prior.rvs(n_samples=5)
print(f"Random samples with uniform prior:\n{random_sample_uniform}")
random_sample_log = space_with_log_uniform_prior.rvs(n_samples=5)
print(f"Random samples with log-uniform prior:\n{random_sample_log}\n")

lhs_sample_uniform = space_with_uniform_prior.lhs(n=5)
print(f"LHS with uniform prior:\n{lhs_sample_uniform}")
lhs_sample_log = space_with_log_uniform_prior.lhs(n=5)
print(f"LHS with log-uniform prior:\n{lhs_sample_log}")

# Making histograms of random samples
large_random_sample_uniform = space_with_uniform_prior.rvs(n_samples=10000)
bins = np.linspace(1, 11, 100)
hist, edges = np.histogram(large_random_sample_uniform, bins=bins, density=True)
fig = figure(title="Histogram of 10 000 random samples with uniform prior", x_axis_label="Value", y_axis_label="Density")
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="white")
show(fig)

large_random_sample_log = space_with_log_uniform_prior.rvs(n_samples=10000)
bins = np.linspace(1, 11, 100)
hist, edges = np.histogram(large_random_sample_log, bins=bins, density=True)
fig = figure(title="Histogram of 10 000 random samples with log-uniform prior", x_axis_label="Value", y_axis_label="Density")
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="white")
show(fig)


Space with one real dimension with uniform prior:
Space([Real(low=1.0, high=11.0, prior='uniform', transform='identity')])
Space with one real dimension with log-uniform prior:
Space([Real(low=1.0, high=11.0, prior='log-uniform', transform='identity')])

Random samples with uniform prior:
[[7.48806983290016], [3.879737661907339], [6.055068096842593], [2.6039451751048817], [5.573384349207629]]
Random samples with log-uniform prior:
[[4.388502720891111], [2.0864651631288313], [9.758679188199931], [2.0783183555900258], [1.948339697994781]]

LHS with uniform prior:
[[10.0], [6.0], [8.0], [4.0], [2.0]]
LHS with log-uniform prior:
[[8.654727864164494], [3.3166247903554003], [5.357656669484114], [2.053136413658844], [1.2709816152101407]]
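
The LHS values above are consistent with splitting the prior into `n` equally probable strata, taking the midpoint quantile of each stratum, and shuffling the result. The sketch below reproduces the values (in sorted order) with plain Python. It assumes this midpoint rule and the standard inverse CDFs of the uniform and log-uniform distributions; it is not taken from the `ProcessOptimizer` source, and the function names are made up for illustration.

```python
def midpoint_quantiles(n):
    """Midpoints of n equally wide strata of the interval [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def uniform_ppf(q, low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return low + q * (high - low)


def log_uniform_ppf(q, low, high):
    """Inverse CDF of a log-uniform distribution on [low, high]."""
    return low * (high / low) ** q


quantiles = midpoint_quantiles(5)
uniform_points = [uniform_ppf(q, 1.0, 11.0) for q in quantiles]
log_uniform_points = [log_uniform_ppf(q, 1.0, 11.0) for q in quantiles]

print([round(x, 4) for x in uniform_points])
print([round(x, 4) for x in log_uniform_points])
```

The log-uniform points are evenly spaced in log(value), so they crowd towards the lower limit. This is the same skew that the histogram of random log-uniform samples shows.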


## Integer dimensions

An `Integer` dimension represents the variable as an integer between a lower and an upper limit. Both limits are inclusive.

In [3]:
import numpy as np
import ProcessOptimizer as po
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()

integer_space = po.Space([po.Integer(low=1, high=10)])
print(f"Space with one integer dimension:\n{integer_space}\n")

random_sample_integer = integer_space.rvs(n_samples=20)
print(f"Random samples from space with one integer dimension:\n{random_sample_integer}")

large_integer_random_sample = integer_space.rvs(n_samples=1000)
bins = np.linspace(1, 10, 100)
hist, edges = np.histogram(large_integer_random_sample, bins=bins)
fig = figure(title="Histogram of 1000 random integer samples", x_axis_label="Value", y_axis_label="Number of occurrences")
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="white")
show(fig)


Space with one integer dimension:
Space([Integer(low=1, high=10)])

Random samples from space with one integer dimension:
[[4], [3], [1], [7], [8], [3], [1], [6], [7], [2], [5], [3], [5], [3], [3], [2], [10], [8], [2], [5]]
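
Because both limits are inclusive, every integer from 1 to 10 can occur, and a per-value count makes that easy to check. The sketch below uses `random.randint` from the standard library as a stand-in for `Space.rvs`, since it is also inclusive at both ends. It is an illustration only and does not call `ProcessOptimizer`.

```python
import random
from collections import Counter

rng = random.Random(0)  # seeded so the run is repeatable
low, high = 1, 10

# random.randint includes both end points, like the Integer dimension described above.
samples = [rng.randint(low, high) for _ in range(1000)]
counts = Counter(samples)

# One line per possible value, including both limits.
for value in range(low, high + 1):
    print(f"{value:>2}: {counts[value]}")
```

Counting per value gives one bar per integer, which is often easier to read than a histogram with many narrow bins.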


## Categorical dimensions

A `Categorical` dimension lets the variable take one value from a given list of values.

In [4]:
import numpy as np
import ProcessOptimizer as po
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()

categories = ["cat", "dog", "elephant"]
categorical_space = po.Space([categories])
print(f"Categorical space:\n{categorical_space}\n")

random_sample_categorical = categorical_space.rvs(n_samples=10)
print(f"Random categorical samples:\n{random_sample_categorical}")

lhs_sample_categorical = categorical_space.lhs(n=5)
print(f"Categorical LHS:\n{lhs_sample_categorical}")

large_categorical_random_sample = categorical_space.rvs(n_samples=300)
unique, counts = np.unique(large_categorical_random_sample, return_counts=True)
# Reorder the counts so they follow the order of `categories`
counts = [counts[unique == category][0] for category in categories]
fig = figure(x_range=categories, title="Histogram of 300 random categorical samples", x_axis_label="Category", y_axis_label="Number of occurrences")
fig.vbar(x=categories, top=counts, width=0.5)
show(fig)

Categorical space:
Space([Categorical(categories=('cat', 'dog', 'elephant'), prior=None)])

Random categorical samples:
[['cat'], ['elephant'], ['dog'], ['dog'], ['elephant'], ['elephant'], ['dog'], ['dog'], ['elephant'], ['elephant']]
Categorical LHS:
[['elephant'], ['dog'], ['elephant'], ['cat'], ['cat']]
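
The categorical LHS output above (two `cat`, one `dog`, two `elephant`) is consistent with mapping evenly spaced midpoint quantiles onto equally wide slices of [0, 1], one slice per category, followed by a shuffle. A minimal sketch under that assumption, which may differ from the library's internals:

```python
categories = ["cat", "dog", "elephant"]
n = 5

# Midpoints of n equally wide strata of [0, 1]: 0.1, 0.3, 0.5, 0.7, 0.9.
quantiles = [(i + 0.5) / n for i in range(n)]

# Each category owns a slice of width 1 / len(categories).
lhs_categories = [categories[int(q * len(categories))] for q in quantiles]
print(lhs_categories)  # → ['cat', 'cat', 'dog', 'elephant', 'elephant']
```

When `n` is not a multiple of the number of categories, the categories cannot be represented equally often, which is why `dog` appears only once here.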


## Transformation

Modelling often works better in a transformed space. A `Space` can transform
points in control parameter space into a space that is better suited for modelling,
typically a hypercube of the form [0;1]^n. By default, no transform is used. To get
a transformed space, use the function `po.space.normalize_dimensions`. This
is done automatically if the `Space` is used in an `Optimizer` that uses
a Gaussian Process regressor.

`Space` has two methods, `transform` and `inverse_transform`, to switch between parameter
space and transformed space.

Note that `Real` and `Integer` dimensions are each transformed to a one-dimensional real
subspace, while `Categorical` dimensions by default transform into a subspace with as
many real dimensions as there are possible values (one-hot encoding).

Also note that `inverse_transform` on a categorical dimension isn't one-to-one
(injective), i.e. different values in transformed space can inverse transform into the same
point in parameter space.

In [5]:
import numpy as np
import ProcessOptimizer as po

numerical_space = po.Space(
    dimensions=[
        po.Real(1., 10.),
        po.Real(1., 10.),
        po.Integer(1, 10),
    ]
)
space = po.space.normalize_dimensions(numerical_space)

random_point = numerical_space.rvs(1, random_state=2)
transformed_point = numerical_space.transform(random_point)

print("In a purely numerical space, transformed and original points have the same dimensionality:\n")
print(f"Random point from the numerical space with dimensionality {len(random_point[0])}:\n{random_point}\n")
print(f"Transformed point with dimensionality {len(transformed_point[0])}:\n{transformed_point}\n\n")

space = po.Space(
    dimensions=[
        po.Real(1., 10.),
        po.Integer(1, 10),
        po.Categorical(["cat", "dog", "elephant"]),
    ]
)
space = po.space.normalize_dimensions(space)

random_point = space.rvs(1, random_state=2)
transformed_point = space.transform(random_point)

print("In a space with categorical dimension, transformed and original points do not have the same dimensionality:\n")
print(f"Random point from the space with dimensionality {len(random_point[0])}:\n{random_point}\n")
print(f"Transformed point with dimensionality {len(transformed_point[0])}:\n{transformed_point}\n\n")

native_transformed_point = np.array([[0.1, 0, 0.2, 0.5, 0.1]])
inverse_transformed_point = space.inverse_transform(native_transformed_point)

print("Inverse transformation of a point in a space with categorical dimension:\n")
print(f"Native transformed point:\n{native_transformed_point}\n")
print(f"Inverse transformed point:\n{inverse_transformed_point}")


In a purely numerical space, transformed and original points have the same dimensionality:

Random point from the numerical space with dimensionality 3:
[[3.3545092082438477, 3.68642029072711, 9]]

Transformed point with dimensionality 3:
[[3.35450921 3.68642029 9.        ]]


In a space with categorical dimension, transformed and original points do not have the same dimensionality:

Random point from the space with dimensionality 3:
[[3.3545092082438477, 3, 'elephant']]

Transformed point with dimensionality 5:
[[0.26161213 0.22222222 0.         0.         1.        ]]


Inverse transformation of a point in a space with categorical dimension:

Native transformed point:
[[0.1 0.  0.2 0.5 0.1]]

Inverse transformed point:
[[1.9, 1, 'dog']]
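
The numbers in the categorical example above are consistent with min-max scaling of the numeric dimensions, one-hot encoding of the category, and decoding by picking the largest one-hot entry. The sketch below reproduces them in plain Python. The helper functions are hypothetical and written for illustration; the decoding rule in particular is an assumption inferred from the output, not taken from the `ProcessOptimizer` source.

```python
def normalize(value, low, high):
    """Min-max scale a numeric value from [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def denormalize(scaled, low, high):
    """Map a value in [0, 1] back to [low, high]."""
    return low + scaled * (high - low)


def one_hot(value, categories):
    """Encode a category as a list with a single 1.0."""
    return [1.0 if value == c else 0.0 for c in categories]


def one_hot_decode(encoded, categories):
    """Pick the category with the largest entry (argmax)."""
    best_index = max(range(len(encoded)), key=lambda i: encoded[i])
    return categories[best_index]


categories = ["cat", "dog", "elephant"]

# Forward transform of the sampled point: 3 values become 5.
point = [3.3545092082438477, 3, "elephant"]
transformed = [
    normalize(point[0], 1.0, 10.0),
    normalize(point[1], 1, 10),
    *one_hot(point[2], categories),
]
print([round(x, 8) for x in transformed])

# Inverse transform of the hand-written point: 5 values become 3.
encoded = [0.1, 0, 0.2, 0.5, 0.1]
restored = [
    round(denormalize(encoded[0], 1.0, 10.0), 10),
    round(denormalize(encoded[1], 1, 10)),  # integers are rounded
    one_hot_decode(encoded[2:], categories),
]
print(restored)  # → [1.9, 1, 'dog']

# A different one-hot block decodes to the same category (not injective).
print(one_hot_decode([0.0, 0.9, 0.3], categories))  # → dog
```

Both `[0.2, 0.5, 0.1]` and `[0.0, 0.9, 0.3]` decode to `dog`, which illustrates why the inverse transform of a categorical dimension cannot be one-to-one.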


## Parameter names

The parameters can have names to make them easier to tell apart. These names are carried
over to some of the plots by default.

In [6]:
import ProcessOptimizer as po
space = po.Space(
    dimensions=[
        po.Real(90., 120., name="temperature"),
        po.Integer(1, 10, name="number_of_eggs"),
        po.Categorical(["pot", "pan", "skillet"], name="cooking_device"),
    ]
)
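
Names also make it easy to turn a point into a labelled record. The sketch below does this in plain Python; the `names` list and the example `point` are written out by hand for illustration rather than read from the `Space` object.

```python
names = ["temperature", "number_of_eggs", "cooking_device"]
point = [105.0, 3, "pan"]  # hypothetical experiment, in the order of the dimensions

# Pair each value with the name of its dimension.
labelled_point = dict(zip(names, point))
for name, value in labelled_point.items():
    print(f"{name}: {value}")
```

A labelled record like this is harder to misread than a bare list when the space has several dimensions of the same type.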
