# Sampling control parameters

This notebook explains the built-in strategies for sampling control parameters.

Wherever possible, sampling respects the prior of each dimension.

## Random sampling

A `Space` supports random sampling, which draws each sample according to the `prior`
distribution of each dimension in the `Space`:

In [1]:
import ProcessOptimizer as po

# Define the space
space = po.Space(
    dimensions=[
        po.Real(1, 1000, prior='uniform'),
        po.Real(1, 1000, prior='log-uniform'),
        po.Integer(1, 10),
        po.Categorical(["cat", "dog", "elephant"]),
    ]
)

# Generate random samples
random_sample_list = space.rvs(n_samples=10)

# Print the random samples
for i in range(10):
    print(f"{i}'th random sample: {random_sample_list[i]}")

0'th random sample: [54.174760171587415, 674.1542254907736, 1, 'cat']
1'th random sample: [716.9616655661121, 152.88667160503823, 3, 'cat']
2'th random sample: [656.481264360899, 41.233140642938295, 4, 'dog']
3'th random sample: [754.7391984415862, 18.011168456210566, 8, 'elephant']
4'th random sample: [229.02961612859752, 63.82480252005361, 10, 'dog']
5'th random sample: [526.8215344510819, 6.351362792457992, 9, 'dog']
6'th random sample: [232.80536356182446, 1.0621496189803472, 3, 'cat']
7'th random sample: [189.36602317461546, 3.573226370662693, 4, 'cat']
8'th random sample: [239.02737445097802, 2.718447964983453, 1, 'elephant']
9'th random sample: [772.7076878631481, 40.61510332081704, 5, 'cat']
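
To illustrate why the second dimension above produces so many small values, here is a minimal sketch of what a log-uniform prior means, written in plain Python. It assumes that a log-uniform draw is equivalent to drawing the base-10 logarithm uniformly; `sample_uniform` and `sample_log_uniform` are hypothetical helper names, not part of ProcessOptimizer.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Draw one value uniformly between low and high."""
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    """Draw one value whose base-10 logarithm is uniform (assumed definition)."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


rng = random.Random(0)  # fixed seed so the sketch is reproducible
uniform_draws = [sample_uniform(rng, 1, 1000) for _ in range(10_000)]
log_uniform_draws = [sample_log_uniform(rng, 1, 1000) for _ in range(10_000)]

# Fraction of draws that fall in the lowest decade, [1, 10)
uniform_low = sum(x < 10 for x in uniform_draws) / len(uniform_draws)
log_uniform_low = sum(x < 10 for x in log_uniform_draws) / len(log_uniform_draws)
print(f"uniform:     {uniform_low:.3f} of draws below 10")
print(f"log-uniform: {log_uniform_low:.3f} of draws below 10")
```

Under this assumption, each of the three decades between 1 and 1000 receives roughly a third of the log-uniform draws, whereas a uniform prior places under 1 % of its draws below 10.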


## Latin hypercube sampling

Random sampling is not a good starting point for Bayesian optimisation. It is better
to have the starting samples spread over each dimension in a controlled manner. Latin
hypercube sampling ensures this: its samples are guaranteed to be evenly distributed
along each dimension with a uniform prior (but not over combinations of dimensions).
On dimensions with non-uniform priors, the prior is respected, meaning that the points
sampled along that dimension follow the distribution specified by the prior.

In [2]:
# Generate LHS samples
LHS_sample_list = space.lhs(n=10)

# Print the LHS samples
for i in range(10):
    print(f"{i}'th LHS sample: {LHS_sample_list[i]}")

0'th LHS sample: [550.45, 22.387211385683397, 1, 'dog']
1'th LHS sample: [650.35, 354.8133892335753, 10, 'cat']
2'th LHS sample: [50.95, 5.62341325190349, 9, 'cat']
3'th LHS sample: [750.25, 89.12509381337456, 5, 'elephant']
4'th LHS sample: [350.65, 44.668359215096324, 8, 'elephant']
5'th LHS sample: [250.75, 707.9457843841375, 3, 'dog']
6'th LHS sample: [450.55, 177.82794100389225, 4, 'elephant']
7'th LHS sample: [950.05, 11.22018454301963, 6, 'dog']
8'th LHS sample: [150.85, 1.4125375446227544, 7, 'cat']
9'th LHS sample: [850.15, 2.8183829312644537, 2, 'dog']
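
The values printed above are consistent with taking the midpoint of each of `n` equal-probability bins along a dimension (50.95, 150.85, ... on the uniform dimension) and shuffling each dimension independently. The following is a minimal sketch of that idea in plain Python. It illustrates the technique under that assumption and is not ProcessOptimizer's implementation; `lhs_uniform` and `lhs_log_uniform` are hypothetical helper names.

```python
import math
import random


def lhs_uniform(low, high, n, rng):
    """Return the midpoints of n equal-width bins on [low, high], shuffled."""
    width = (high - low) / n
    midpoints = [low + (i + 0.5) * width for i in range(n)]
    rng.shuffle(midpoints)
    return midpoints


def lhs_log_uniform(low, high, n, rng):
    """Same idea, with bins of equal width in log10 space."""
    exponents = lhs_uniform(math.log10(low), math.log10(high), n, rng)
    return [10 ** e for e in exponents]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10
columns = [
    lhs_uniform(1, 1000, n, rng),      # uniform prior
    lhs_log_uniform(1, 1000, n, rng),  # log-uniform prior
]

# Combine the independently shuffled columns into n points
samples = [list(point) for point in zip(*columns)]
for point in samples:
    print(point)
```

Because each column is shuffled separately, every bin of every dimension is used exactly once, but nothing controls which bins end up paired in the same point.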


## Random states

Both random sampling and Latin hypercube sampling support a random seed
(`random_state` for `rvs`, `seed` for `lhs`) to allow reproducible sampling. The
seed can be given in a variety of formats, or as `None` for unseeded sampling.

By default, random sampling is unseeded, so repeated calls give different
samples, while Latin hypercube sampling is seeded, so repeated calls give the
same samples. Changing the seed of Latin hypercube sampling changes how the
values are combined into points, so (mostly) different points are sampled, but
the values sampled along each dimension stay the same.
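
The effect of the seed on Latin hypercube sampling can be sketched in plain Python. The example below assumes a midpoint-based Latin hypercube in which the seed only controls how each dimension is shuffled; `lhs_points` is a hypothetical helper written for this illustration and is not ProcessOptimizer's API.

```python
import random


def lhs_points(bounds, n, seed):
    """Midpoint Latin hypercube sketch: shuffle the bin midpoints per dimension."""
    rng = random.Random(seed)  # same seed -> same shuffles -> same points
    columns = []
    for low, high in bounds:
        width = (high - low) / n
        column = [low + (i + 0.5) * width for i in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


bounds = [(1.0, 10.0), (0.0, 1.0)]
first = lhs_points(bounds, n=5, seed=1)
repeat = lhs_points(bounds, n=5, seed=1)
other = lhs_points(bounds, n=5, seed=2)

# Re-using a seed reproduces the points exactly
print(f"same seed gives same points: {first == repeat}")

# A different seed only changes the pairing, not the values on each dimension
for dim in range(len(bounds)):
    same_values = sorted(p[dim] for p in first) == sorted(p[dim] for p in other)
    print(f"dimension {dim}: same values under both seeds: {same_values}")
```

In this sketch the seed never touches the midpoints themselves, which is why the per-dimension values are identical for every seed.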

In [3]:
# Define the space
space_definition = [[1., 10.], [1, 10], ["cat", "dog", "elephant"]]
space = po.Space(space_definition)

# Generate random samples and print them
for i in range(5):
    print(f"{i+1}'th random sample: {space.rvs(n_samples=1)}")

print("\n")

# Generate pseudo-random samples and print them
for i in range(5):
    print(f"{i+1}'th pseudo-random sample: {space.rvs(n_samples=1, random_state=2)}")

print("\n")

# Generate LHS samples and print them
print(f"First Latin hypercube sampling:  {space.lhs(n=5)}")
print(f"Second Latin hypercube sampling: {space.lhs(n=5)}")
print(f"LHS sampling with different seed:  {space.lhs(n=5, seed=2)}")

1'th random sample: [[4.799630764832692, 9, 'cat']]
2'th random sample: [[6.868553144430436, 1, 'cat']]
3'th random sample: [[5.198243894739806, 6, 'elephant']]
4'th random sample: [[8.773373646979952, 1, 'cat']]
5'th random sample: [[2.5516215654736087, 2, 'dog']]


1'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
2'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
3'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
4'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
5'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]


First Latin hypercube sampling:  [[9.1, 8, 'dog'], [5.5, 2, 'elephant'], [7.3, 4, 'cat'], [3.6999999999999997, 6, 'cat'], [1.9, 10, 'elephant']]
Second Latin hypercube sampling: [[9.1, 8, 'dog'], [5.5, 2, 'elephant'], [7.3, 4, 'cat'], [3.6999999999999997, 6, 'cat'], [1.9, 10, 'elephant']]
LHS sampling with different seed:  [[5.5, 6, 'elephant'], [9.1, 2, 'elephant'], [7.3, 10, 'dog'], [1.9, 8, 'cat'], 