### ML4CE Coursework 2025: Data-driven Optimisation of a Bioprocess 

#### Coursework Brief
This coursework involves the optimisation of a simulated bioprocess at process scale utilising CHO cells to produce a desired protein. Experimentally, this would involve a resource-intensive screening campaign involving the growth and feeding of cells under precise conditions (temperature, pH, feed amount, cell type, etc.) to maximize the production of a desired product. This coursework offers a simulated method of mapping bioprocess input parameters to a final predicted titre concentration: a measure of cell productivity. The simulations are based on various kinetic parameters which are unique to the type of cells used. For this coursework, a different set of cell kinetic parameters will be used to evaluate your algorithm. 

#### Inputs and Outputs
Inputs to the bioprocess include 6 variables: temperature [°C], pH, the concentration of feed [mM] at 3 different timepoints over 150 minutes, and cell type. The output is the concentration of the titre (desired product) [g/L]. The goal is to design and create a data-driven optimisation algorithm to obtain input variables that correspond to the highest obtained titre. 

The bounds of the inputs (5 continuous variables and 1 categorical variable) are as follows:

```
temperature [°C]               -> 30 - 40
pH                             -> 6 - 8
first feed concentration [mM]  -> 0 - 50
second feed concentration [mM] -> 0 - 50
third feed concentration [mM]  -> 0 - 50
cell type                      -> 'celltype_1', 'celltype_2', 'celltype_3'
```
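
A candidate point can be checked against these bounds before it is sent to the simulator. The sketch below is a minimal example that assumes the bounds are inclusive (the examples in this document use boundary values such as pH 8 and a feed of 0 mM). The names `BOUNDS`, `CELL_TYPES` and `is_within_bounds` are illustrative and are not part of the `virtual_lab` package.

```python
# Illustrative names only; none of these are provided by virtual_lab.
BOUNDS = [(30, 40), (6, 8), (0, 50), (0, 50), (0, 50)]  # temperature, pH, feed1, feed2, feed3
CELL_TYPES = ['celltype_1', 'celltype_2', 'celltype_3']

def is_within_bounds(x):
    """Return True if x = [temperature, pH, feed1, feed2, feed3, cell type] respects the bounds."""
    if len(x) != 6:
        return False
    continuous_ok = all(lo <= value <= hi for value, (lo, hi) in zip(x[:5], BOUNDS))
    return continuous_ok and x[5] in CELL_TYPES

print(is_within_bounds([33, 6.25, 10, 20, 20, 'celltype_1']))  # True
print(is_within_bounds([45, 6.25, 10, 20, 20, 'celltype_1']))  # False: temperature above 40
```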

#### Running the virtual experiment
To run an experiment, use the `virtual_lab.conduct_experiment(X)` function from the `virtual_lab` package. This is your objective function. The input to this function is a matrix of shape (N, 6), where N is the number of data points and 6 is the total number of variables, in the following order: `[temperature, pH, feed1, feed2, feed3, cell type]`. An example is shown below. 

``` python
import MLCE_CWBO2025.virtual_lab as virtual_lab
import numpy as np

def objective_func(X):
	return (np.array(virtual_lab.conduct_experiment(X)))

X_initial = [[33, 6.25, 10, 20, 20, 'celltype_1'],
            [38, 8, 20, 10, 20, 'celltype_2']]
Y_initial = objective_func(X_initial)

print(Y_initial)
```
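
Because the sixth column is a string, a surrogate model such as a Gaussian Process usually needs a numerical encoding of the cell type. One common option is one-hot encoding, sketched below. The helper `encode_point` is hypothetical, and the choice of one-hot encoding is an assumption made for illustration, not a requirement of the coursework.

```python
CELL_TYPES = ['celltype_1', 'celltype_2', 'celltype_3']

def encode_point(x):
    """Map [T, pH, f1, f2, f3, cell type] to 8 numbers: 5 continuous values + 3 one-hot entries."""
    one_hot = [1.0 if x[5] == c else 0.0 for c in CELL_TYPES]
    return [float(v) for v in x[:5]] + one_hot

print(encode_point([38, 8, 20, 10, 20, 'celltype_2']))
# [38.0, 8.0, 20.0, 10.0, 20.0, 0.0, 1.0, 0.0]
```

The encoded vector is only for the surrogate model; the simulator still expects the original six-column format.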

#### Imports and Packages
For your final submission, you are only allowed the following imports and packages - please do not perform additional imports. 

```python 
import MLCE_CWBO2025.virtual_lab as virtual_lab
import numpy as np
import scipy
import random
import sobol_seq
import time
from datetime import datetime
```

#### BO Class Objectives and Constraints
----------------------------------------------------

Important Notice: 

The coursework was originally constructed to allow batch Bayesian Optimisation as the only optimizer, and the following instructions were written with this intention. However, the coursework can now be adapted for any data-driven method or optimizer learnt from the course. Although some of the following sections will be more or less important depending on your chosen optimizer, you *MUST* still follow the objectives, constraints and conventions listed in this document. For example, your class is required to be named `BO` and instantiated with the object name `BO_m` regardless of the chosen optimizer. 

----------------------------------------------------

Your goal is to develop a batch Bayesian Optimisation class `class BO:` to obtain the set of inputs which maximizes the titre. You will have to assign two attributes, `self.Y` and `self.time`, to record your results and runtime. 

You must include `start_time = datetime.timestamp(datetime.now())` as the first line within the initialisation `__init__(self, ...)` of your BO class. Runtime must be evaluated using the datetime package, as the difference between timestamps once the initialisation or a batch of evaluations has finished. It is imperative that your `self.Y` and `self.time` are of the same size, as these attributes will be used for your final grading.

You must perform exactly 15 iterations of your optimizer, with a maximum batch size of 5. The initialisation/training points do not count towards this limit - see the section 'Initialisation/Training Data Constraints'. Your final `self.Y` should satisfy `len(self.Y) == 81`.
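
The expected length follows from 6 initialisation points plus 15 iterations of 5 evaluations each. The check below is a small sketch of that arithmetic; `check_lengths` is a hypothetical helper and is not part of the grading code.

```python
N_INITIAL  = 6   # initialisation/training points
ITERATIONS = 15  # optimiser iterations
BATCH_SIZE = 5   # evaluations per iteration

expected_length = N_INITIAL + ITERATIONS * BATCH_SIZE
print(expected_length)  # 81

def check_lengths(Y, time_log):
    """Return True if Y and time_log both hold exactly one entry per evaluation."""
    return len(Y) == expected_length and len(time_log) == expected_length
```

Calling `check_lengths(BO_m.Y, BO_m.time)` after a run is a quick way to catch missing zero padding in `self.time`.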

An example BO class outline is as follows. 

```python
#BO class
class BO: 
    def __init__(self, ...):
        start_time = datetime.timestamp(datetime.now())

        self.Y    = []     # output data appended after evaluating input conditions with the objective function 
        self.time = []     # runtime data appended after each loop

        ### enter BO algorithm here ###

        self.Y    += [evaluations]

        self.time += [datetime.timestamp(datetime.now())-start_time] # for a singleton BO run; timestamp differences are in seconds
        # or
        self.time += [datetime.timestamp(datetime.now())-start_time] # if batches are used, remember to append the appropriate number of zeros
        self.time += [0]*(batch_size-1)                              # len(self.Y) is required to be equal to len(self.time)
        
        start_time = datetime.timestamp(datetime.now()) # resetting start_time at the end of the BO iteration to capture time taken for each iteration
```

Your outputs, evaluated from `self.Y`, will only be considered up to a maximum runtime of 60 seconds. Please be aware of how large the search space can become with 6 dimensions. Choose the number of batches and loops, within the constraints above, so that your code does not greatly exceed the allocated runtime. 
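
One way to respect the time limit is to track the cumulative runtime and switch to a cheaper strategy (for example random selection) once a chosen budget is nearly spent. The sketch below assumes that the entries of `self.time` are per-batch durations in seconds with zero padding, so that their sum is the elapsed time. `TIME_BUDGET_S`, `time_remaining` and the safety margin are illustrative choices.

```python
TIME_BUDGET_S = 60.0  # runtime limit stated in the brief

def time_remaining(time_log, safety_margin=5.0):
    """Seconds left before the budget (minus a safety margin) is used up."""
    return TIME_BUDGET_S - safety_margin - sum(time_log)

# Example log: initialisation took 1.2 s for 6 points, the first batch took 3.5 s for 5 points.
time_log = [1.2] + [0] * 5 + [3.5] + [0] * 4
print(time_remaining(time_log) > 0)  # True: there is still time for a model-based step
```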

#### Initialisation/Training Data Constraints
You must initialise with exactly 6 training/initialisation points. Remember that at least 2 points are needed for each continuous variable for a standard covariance matrix to be calculated for BO.
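
A simple way to build the six initialisation points is to draw each continuous variable uniformly within its bounds and cycle through the cell types so that each one appears twice. This is only a sketch under those assumptions; `make_initial_points` is a hypothetical helper, and a space-filling design (for example a Sobol sequence) may cover the space more evenly.

```python
import random

BOUNDS = [(30, 40), (6, 8), (0, 50), (0, 50), (0, 50)]  # temperature, pH, feed1, feed2, feed3
CELL_TYPES = ['celltype_1', 'celltype_2', 'celltype_3']

def make_initial_points(n_points=6, seed=0):
    """Draw n_points random inputs within the bounds, cycling through the cell types."""
    rng = random.Random(seed)  # local generator so the design is reproducible
    points = []
    for i in range(n_points):
        continuous = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
        points.append(continuous + [CELL_TYPES[i % len(CELL_TYPES)]])
    return points

X_initial = make_initial_points()
print(len(X_initial), [x[5] for x in X_initial])
```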

#### Submission
You are expected to produce and submit a .py file which includes: 
1. Group Details.
2. Any helper functions or classes. 
3. The BO class.
4. The BO execution block. 

Please do not include package imports. 

Please submit your group's .py file with a name according to the convention: MLCE_{group_name}_BO.py 
You can choose your group's name. 
Submission name example: MLCE_BOChampions_BO.py

#####  1. Group Details
Your submission should include the full names of all group members, their CID numbers, and a 0 or 1 for each member indicating which coursework they would like to be examined on in the oral assessment. A 1 indicates that the member would like to use this batch BO coursework for the oral assessment. 

```python
# Example
group_names     = ['Name 1','Name 2']
cid_numbers     = ['000000','111111']
oral_assessment = [0, 1]
```

##### 2. Helper functions or classes. 
There are no restrictions on the length or number of helper functions or classes. An example of what you might want to include is a class to contain your Gaussian Process. 

##### 3. The BO class. 
Please see the above section on what to include in the BO class. Again, you must include the attributes `self.Y` and `self.time` corresponding to all of your evaluated outputs and runtime as this will be used for scoring. 

##### 4. The BO execution block. 
There is great flexibility in how you want to perform the initialisation and build the search space. However, you must execute your BO class and assign it to the object with the variable `BO_m`. (`BO_m = BO(...)`). 


##### Example file submission

```python
# Your submission should look something like the following - see MLCE_ExampleSubmission_BO.py for a working example

# Group information
group_names     = ['Name 1','Name 2']
cid_numbers     = ['000000','111111']
oral_assessment = [0, 1]

# Helper classes/functions
class GP: 
    def __init__(self, ...):
        ...

# BO class
class BO: 
    def __init__(self, ...):
        start_time = datetime.timestamp(datetime.now())

        self.Y    = []  # output data appended after evaluating input conditions with the objective function 
        self.time = []  # runtime data appended after each loop

        self.Y    += [evaluations]
        self.time += [datetime.timestamp(datetime.now())-start_time] # remember to append the appropriate number of zeros according to batch size
        start_time = datetime.timestamp(datetime.now()) 

# BO Execution Block
X_training   = [...]
X_searchspace = [...]

BO_m = BO(...)
```

#### Scoring
Your final implementation score will be based on the average output (from `self.Y`) under 60 seconds (from `self.time`). The marking will be based on the best function value you have discovered so far, starting from Batch 3 onwards. How the marking works:

- Initialization + Batches 1 and 2 (first 16 evaluations): These do not contribute to your mark. Their purpose is purely for initialization and early exploration.
- Batch 3: After you submit the five evaluations from Batch 3, we will look at all evaluations so far (16 from earlier batches + 5 from Batch 3). Your score for Batch 3 will be the highest single function value among all these points.
- Batch 4: You submit five new samples. We evaluate them, and then consider all evaluations to date (16 + 5 + 5). Your score for Batch 4 will again be the best function value found so far, even if that point was found in an earlier batch.
- Batch 5 onwards: Same procedure: you submit five new points, they are evaluated, and we take the best value so far as your Batch score. 
- At the end we add your scores for each batch, and this is your final score. If your algorithm stops before finishing all iterations due to time, we take your highest evaluation and use that for the missing batches.
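
The procedure above can be sketched as a best-so-far calculation over `self.Y`. This is an illustration of the description, not the actual marking code: `batch_scores` is a hypothetical function, it assumes a batch size of 5, and it does not model the 60-second cutoff.

```python
N_INITIAL, BATCH_SIZE, N_BATCHES = 6, 5, 15
FIRST_SCORED_BATCH = 3

def batch_scores(Y):
    """Best-so-far value after each scored batch (Batch 3 to Batch 15)."""
    scores = []
    for batch in range(FIRST_SCORED_BATCH, N_BATCHES + 1):
        n_seen = N_INITIAL + BATCH_SIZE * batch  # evaluations available after this batch
        scores.append(max(Y[:n_seen]))           # slicing also covers runs that stopped early
    return scores

# Toy run: values increase by 1 per evaluation, so the best-so-far is always the latest point.
Y_toy = list(range(81))
scores = batch_scores(Y_toy)
print(scores[0], scores[-1], sum(scores))  # 20 80 650
```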

What this means for you:

- Your algorithm can explore: exploratory samples will not penalize you, since the score is based on the best-so-far value.
- The marking is such that:
  - Algorithms that find good points early will be rewarded.
  - Algorithms that achieve high values in general will also be rewarded.
- Your score for each assessed batch is simply the highest evaluation value your algorithm has achieved up to that point.

Your algorithm will be run with 5 repetitions. Your final average score will be graded and curved against the cohort to produce your final grade. 

----------------------------------------------------------------------------------------------------------------------------------
### Example .py submission file
See below for an example submission file. This is based on a randomized selection of input variables.

The following imports and helper functions will be imported when evaluating your code. Please do not include the following imports or perform any additional package imports.
```python 
import MLCE_CWBO2025.virtual_lab as virtual_lab
import numpy as np
import scipy
import random
import sobol_seq
import time
from datetime import datetime

def objective_func(X): 
    return(np.array(virtual_lab.conduct_experiment(X)))
```

#### > Submission example:

MLCE_ExampleSubmission_BO.py

```python 
# Group information
group_names     = ['Name 1','Name 2']
cid_numbers     = ['000000','111111']
oral_assessment = [0, 1]

# Helper Class
class RandomSelection:
    def __init__(self, X_searchspace, objective_func, batch): 
        self.X_searchspace = X_searchspace
        self.batch         = batch

        random_searchspace = [self.X_searchspace[random.randrange(len(self.X_searchspace))] for c in range(batch)]
        self.random_Y      = objective_func(random_searchspace)


# BO class
class BO:
    def __init__(self, X_initial, X_searchspace, iterations, batch, objective_func):
        start_time = datetime.timestamp(datetime.now())

        self.X_initial     = X_initial
        self.X_searchspace = X_searchspace
        self.iterations    = iterations
        self.batch         = batch

        self.Y     = objective_func(self.X_initial)
        self.time  = [datetime.timestamp(datetime.now())-start_time]
        self.time += [0]*(len(self.X_initial)-1)
        start_time = datetime.timestamp(datetime.now())
        
        for self.iteration in range(iterations):
            random_selection = RandomSelection(self.X_searchspace, objective_func, self.batch)
            self.Y           = np.concatenate([self.Y, random_selection.random_Y])
            self.time        += [datetime.timestamp(datetime.now())-start_time]
            self.time        += [0]*(len(random_selection.random_Y)-1)
            start_time = datetime.timestamp(datetime.now())

# BO Execution Block

X_initial = ([[33, 6.25, 10, 20, 20, 'celltype_1'],
              [38, 8, 20, 10, 20, 'celltype_3'],
              [37, 6.8, 0, 50, 0, 'celltype_1'],
              [36, 6.0, 20, 20, 10, 'celltype_3'],
              [36, 6.1, 20, 20, 10, 'celltype_2'],
              [38, 6.0, 30, 50, 10, 'celltype_1']])

temp = np.linspace(30, 40, 5)
pH   = np.linspace(6, 8, 5)
f1   = np.linspace(0, 50, 5)
f2   = np.linspace(0, 50, 5)
f3   = np.linspace(0, 50, 5)
celltype = ['celltype_1','celltype_2','celltype_3']

X_searchspace = [[a,b,c,d,e,f] for a in temp for b in pH for c in f1 for d in f2 for e in f3 for f in celltype]

BO_m = BO(X_initial, X_searchspace, 15, 5, objective_func)
```
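
The grid in the example above uses 5 levels for each of the five continuous variables and 3 cell types. The short calculation below shows how quickly the number of candidate points grows with resolution, which is worth checking before building a finer grid. The variable names are illustrative.

```python
levels_per_variable = 5
n_continuous = 5
n_cell_types = 3

grid_size = levels_per_variable ** n_continuous * n_cell_types
print(grid_size)  # 9375

# Doubling the resolution to 10 levels per variable grows the grid 32-fold.
finer_grid_size = 10 ** n_continuous * n_cell_types
print(finer_grid_size)  # 300000
```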

---------------------------------
#### Feedback Example
Several plots will be produced showing the performance of your code. Only outputs up to 60000 ms will be considered in the final score. See below as an example using the above submission. 

1. Plots showing the performance of your code.

<img src="group1_individual.png" width=750  height=1050>

2. Plot showing performance of your code against your cohort. 
- Example group in a cohort of 5 (with 5 repeats each).

<img src="group1_vscohort.png" width=750  height=500>

3. Plots showing scoring of individual runs and final average score. 

<img src="group1_scoring.png" width=750  height=1150>

----------------------------------

### Checking Your Environment

```python
# Check your environment - run the following package imports and code to see if the environment was successfully built

import MLCE_CWBO2025.virtual_lab as virtual_lab
import numpy as np
import scipy
import random
import sobol_seq
import time
from datetime import datetime

def objective_func(X):
    return (np.array(virtual_lab.conduct_experiment(X)))

X_initial = [[33, 6.25, 10, 20, 20, 'celltype_1'],
            [38, 8, 20, 10, 20, 'celltype_2']]
Y_initial = objective_func(X_initial)

print(Y_initial)
```

Output:

```
[27.61197099  1.29258542]
```
