# Sampling control parameters

This notebook explains the built-in sampling strategies for control parameters.

As far as possible, sampling respects the prior of each dimension.

## Random sampling

From a `Space`, it is possible to do random sampling, which draws samples according to
the prior distribution of each dimension of the `Space`:

In [6]:
import ProcessOptimizer as po

space = po.Space(
    dimensions=[
        po.Real(1, 1000, prior='uniform'),
        po.Real(1, 1000, prior='log-uniform'),
        po.Integer(1, 10),
        po.Categorical(["cat", "dog", "elephant"]),
    ]
)

random_sample_list = space.rvs(n_samples=10)
for i in range(10):
    print(f"{i}'th random sample: {random_sample_list[i]}")

0'th random sample: [739.593012223842, 15.457662369351741, 2, 'dog']
1'th random sample: [68.95816738762488, 13.915524343015452, 1, 'cat']
2'th random sample: [828.5451197510599, 288.91928387972035, 10, 'elephant']
3'th random sample: [836.7336112909996, 210.87625626774764, 8, 'dog']
4'th random sample: [491.93654908949105, 161.44928464330687, 4, 'dog']
5'th random sample: [293.97383241199447, 349.49202940768765, 2, 'dog']
6'th random sample: [540.2237486887326, 51.33297577507535, 1, 'cat']
7'th random sample: [411.6171736917781, 96.08847722772703, 7, 'elephant']
8'th random sample: [31.8077228608918, 10.834477554913208, 10, 'cat']
9'th random sample: [262.7062962324372, 6.022598002141888, 3, 'dog']
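
To illustrate what respecting the prior means, here is a minimal standard-library sketch. It is not ProcessOptimizer's implementation, and the helper names `sample_uniform` and `sample_log_uniform` are hypothetical. It assumes that a log-uniform prior can be modelled by drawing uniformly in log space, and it compares how often each prior yields a value below 10 on the range 1 to 1000.

```python
import math
import random


def sample_uniform(low, high, rng):
    """Draw one value with equal probability density everywhere in [low, high)."""
    return low + (high - low) * rng.random()


def sample_log_uniform(low, high, rng):
    """Draw one value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    log_low, log_high = math.log10(low), math.log10(high)
    return 10 ** (log_low + (log_high - log_low) * rng.random())


rng = random.Random(0)  # fixed seed so the sketch is reproducible
uniform_draws = [sample_uniform(1, 1000, rng) for _ in range(10_000)]
log_uniform_draws = [sample_log_uniform(1, 1000, rng) for _ in range(10_000)]

# Fraction of draws below 10, i.e. in the lowest of the three decades of [1, 1000]
frac_uniform = sum(x < 10 for x in uniform_draws) / len(uniform_draws)
frac_log_uniform = sum(x < 10 for x in log_uniform_draws) / len(log_uniform_draws)
print(f"uniform: {frac_uniform:.3f}, log-uniform: {frac_log_uniform:.3f}")
```

With a uniform prior only about 1% of the draws should fall below 10, whereas a log-uniform prior puts roughly a third of them there, because each decade is equally likely. This is why the second column of the output above contains many more small values than the first.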


## Latin Hypercube sampling

Random sampling is a poor starting point for Bayesian optimisation, because random points
can cluster and leave parts of the space unexplored. It is better to have the starting
samples spread out over each dimension. Latin hypercube sampling ensures this: its samples
are guaranteed to be evenly distributed along each dimension with a uniform prior (but not
over combinations of dimensions). On dimensions with non-uniform priors, the prior is
respected while the wide spread is preserved:

In [9]:
LHS_sample_list = space.lhs(n=10)
for i in range(10):
    print(f"{i}'th LHS sample: {LHS_sample_list[i]}")

0'th LHS sample: [550.45, 22.387211385683397, 1, 'dog']
1'th LHS sample: [650.35, 354.8133892335753, 10, 'cat']
2'th LHS sample: [50.95, 5.62341325190349, 9, 'cat']
3'th LHS sample: [750.25, 89.12509381337456, 5, 'elephant']
4'th LHS sample: [350.65, 44.668359215096324, 8, 'elephant']
5'th LHS sample: [250.75, 707.9457843841375, 3, 'dog']
6'th LHS sample: [450.55, 177.82794100389225, 4, 'elephant']
7'th LHS sample: [950.05, 11.22018454301963, 6, 'dog']
8'th LHS sample: [150.85, 1.4125375446227544, 7, 'cat']
9'th LHS sample: [850.15, 2.8183829312644537, 2, 'dog']
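
The idea behind this output can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not ProcessOptimizer's implementation. The function `lhs_sketch` and the per-dimension `transforms` are hypothetical. It assumes that each dimension is split into `n` equally probable strata, that the midpoint of each stratum is used exactly once, and that the midpoints are mapped through the prior before being shuffled independently per dimension.

```python
import math
import random


def lhs_sketch(n, transforms, seed=0):
    """Return n points in which every dimension uses each of its n strata exactly once."""
    rng = random.Random(seed)
    # Midpoints of n equally wide strata of the unit interval: 0.05, 0.15, ... for n = 10
    midpoints = [(i + 0.5) / n for i in range(n)]
    columns = []
    for transform in transforms:
        column = [transform(u) for u in midpoints]
        rng.shuffle(column)  # independent shuffle for each dimension
        columns.append(column)
    # Transpose the per-dimension columns into a list of points
    return [list(point) for point in zip(*columns)]


animals = ["cat", "dog", "elephant"]
transforms = [
    lambda u: 1 + u * (1000 - 1),                     # uniform prior on [1, 1000]
    lambda u: 10 ** (u * math.log10(1000)),           # log-uniform prior on [1, 1000]
    lambda u: 1 + math.floor(u * 10),                 # integers 1 to 10
    lambda u: animals[math.floor(u * len(animals))],  # one of three categories
]

points = lhs_sketch(10, transforms)
for point in points:
    print(point)
```

Under these assumptions the first column contains the evenly spaced values 50.95, 150.85, and so on up to 950.05 in shuffled order, and every integer from 1 to 10 appears exactly once. Pairs of dimensions are not guaranteed to be evenly covered, since the shuffles are independent.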



## Random states

Both random value sampling and Latin hypercube sampling support taking a
random seed to allow for reproducible sampling (`random_state` for `rvs`,
`seed` for `lhs`). They accept a variety of formats, or `None` for
non-reproducible sampling.

By default, random value sampling is unseeded, so every call returns new
values, while Latin hypercube sampling is pseudo-random, so repeated calls
return the same points. Note that changing the seed of the Latin hypercube
sampling results in (mostly) different points being sampled, but the set of
sampled values for each dimension stays the same.
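
The two effects described above can be illustrated with Python's standard `random` module. This is a sketch under assumptions, not the library's code: the helper `shuffled_strata` is hypothetical, and it assumes that the seed only affects the order in which stratum midpoints are assigned to points.

```python
import random


def shuffled_strata(n, seed):
    """Return the n stratum midpoints of [0, 1] in an order determined by the seed."""
    values = [(i + 0.5) / n for i in range(n)]
    random.Random(seed).shuffle(values)
    return values


# Re-creating the generator from the same seed on every call repeats the same draw
repeated = [random.Random(2).random() for _ in range(3)]
print(repeated)

# Different seeds typically change the ordering, but never the set of values
first = shuffled_strata(5, seed=1)
second = shuffled_strata(5, seed=2)
print(first)
print(second)
print(sorted(first) == sorted(second))  # prints True
```

The first part mirrors what happens when the same integer seed is passed on every call: each call starts from the same generator state and so returns the same sample.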

In [1]:
space_definition = [[1., 10.], [1, 10], ["cat", "dog", "elephant"]]
space = po.Space(space_definition)

for i in range(10):
    print(f"{i+1}'th random sample: {space.rvs(n_samples=1)}")

print("\n")

for i in range(10):
    print(f"{i+1}'th pseudo-random sample: {space.rvs(n_samples=1, random_state=2)}")

print("\n")

print(f"First Latin hypercube sampling:  {space.lhs(n=5)}")
print(f"Second Latin hypercube sampling: {space.lhs(n=5)}")
print(f"LHS sampling with different seed:  {space.lhs(n=5, seed=2)}")

1'th random sample: [[7.390529359011395, 9, 'elephant']]
2'th random sample: [[1.442210836475473, 2, 'cat']]
3'th random sample: [[8.107401471071999, 9, 'cat']]
4'th random sample: [[1.7276142925588749, 2, 'cat']]
5'th random sample: [[4.1641050842516965, 1, 'dog']]
6'th random sample: [[8.955956661217495, 2, 'cat']]
7'th random sample: [[2.9798107393062985, 5, 'elephant']]
8'th random sample: [[8.10343485571568, 9, 'cat']]
9'th random sample: [[8.268070627926466, 5, 'elephant']]
10'th random sample: [[9.219540236774463, 8, 'dog']]


1'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
2'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
3'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
4'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
5'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
6'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
7'th pseudo-random sample: [[3.3545092082438477, 3, 'elephant']]
8'th 