<div align=left style="width: 200; height: 80px; overflow: hidden">
    <img src=http://static1.squarespace.com/static/571446ff60b5e92c3a2b4249/57d8a40b9de4bb459f731cf3/58cb2f229de4bb4a049d38c2/1505340359463/teselaGenlogo.jpg align=right width=200>
</div>

# Closing the DBTL loop with TeselaGen

With TeselaGen's platform you can close the Design-Build-Test-Learn (DBTL) cycle using machine learning algorithms that automatically learn from your data. The DISCOVER module can suggest new candidates that may improve your results, based on your previous experimental rounds. This document shows how to turn those candidates into new designs in the DESIGN module, ready for the next DBTL cycle.

**Inputs**: Results of the Evolutions algorithm in a DISCOVER module instance

**Outputs**: New designs created at DESIGN module

#### Requirements: 

* Access permissions to the lab where the evolutions results are stored
* Python 3 installed on your local computer, with Pandas and TG's *api-client*
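As a quick sanity check of the requirements above, the following minimal sketch reports which packages are not importable in the current environment. The module names `pandas` and `teselagen` are assumed from the imports used later in this notebook, and `missing_packages` is a hypothetical helper, not part of the api-client.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from the imports used in this notebook.
required = ["pandas", "teselagen"]

assert sys.version_info.major == 3, "This notebook requires Python 3"
print("Missing packages:", missing_packages(required) or "none")
```

If the list is not empty, install the missing packages before running the remaining cells.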

First, we make all the required imports:

In [1]:
import platform

from IPython.display import HTML, display
import pandas as pd

from teselagen.api import TeselaGenClient
from teselagen.utils.candidates_to_design import build_design_from_candidates

print(f"python version     : {platform.python_version()}")
print(f"pandas version     : {pd.__version__}")

python version     : 3.9.6
pandas version     : 1.3.0


## Look for your Evolution results

Here, the concept of *closing the DBTL loop* refers to the ability to generate designs out of what was learned from previous experiments. Those designs can be used to conduct new experimental rounds. This notebook assumes you've already trained an *Evolution* model.

The results of an *Evolution* model contain a set of ranked candidates that may outperform your current measurements. Each of the proposed candidates is a combination of the parts (and possibly other variables) you have already tested within the designs in your experiments. These new combinations were evaluated and ranked by a machine learning algorithm and we will generate proper designs with them. 
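To make the idea of a candidate concrete, here is a minimal sketch of the search space: every pairing of known parts, minus the pairings already tested. The variant names and the `tested` set are hypothetical, and the sketch only enumerates combinations; it does not reproduce how DISCOVER evaluates or ranks them.

```python
from itertools import product

# Hypothetical variants for two design bins (illustrative names only).
enzyme_a = ["Variant A0", "Variant A1", "Variant A2"]
enzyme_b = ["Variant B0", "Variant B1"]

# Combinations assumed to have been built and measured already.
tested = {("Variant A0", "Variant B0"), ("Variant A1", "Variant B1")}

# Every possible pairing, minus the ones already tested.
untested = [combo for combo in product(enzyme_a, enzyme_b) if combo not in tested]

total = len(enzyme_a) * len(enzyme_b)
print(f"{len(untested)} untested combinations out of {total}")
for combo in untested:
    print(combo)
```

An Evolution model scores combinations like these and returns the most promising ones as ranked candidates.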

This guide starts from the output of the Evolutions tool in DISCOVER. The next cell connects the notebook to DISCOVER and selects the lab (`The Test Lab`) that holds our sample experiment:

In [2]:
# Connect to your TeselaGen instance by passing its URL as the 'host_url' argument: TeselaGenClient(host_url=host_url)
# client = TeselaGenClient(host_url="https://your-instance-name.teselagen.com")
client = TeselaGenClient()
client.login()
client.select_laboratory(lab_name="The Test Lab")

Connection Accepted
Selected Lab: The Test Lab


Next, we find the `evolutive` model with name `Teselagen Example Evolutive Model`:

In [3]:
search_for_name = "Teselagen Example Evolutive Model"
evolution_models_info = client.discover.get_models_by_type('evolutive')
model_id = -1

for info in evolution_models_info:
    if info['name'] == search_for_name:
        model_id = info['id']
        print(f"Model id {info['id']}, name: {info['name']}")

if model_id == -1:
    raise LookupError(f"Model not found: {search_for_name}")

Model id 101, name: Teselagen Example Evolutive Model
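The same lookup can be written more compactly with `next` and a default value. This is only a sketch: `models_info` is a hypothetical stand-in for the list returned by the client, and `find_model_id` is an illustrative helper rather than part of the api-client. Unlike the loop above, it returns the first match instead of the last.

```python
# Hypothetical stand-in for the list of model descriptions returned by the client.
models_info = [
    {"id": 100, "name": "Some Other Model"},
    {"id": 101, "name": "Teselagen Example Evolutive Model"},
]


def find_model_id(models, name):
    """Return the id of the first model called `name`, or None if there is none."""
    return next((m["id"] for m in models if m["name"] == name), None)


model_id = find_model_id(models_info, "Teselagen Example Evolutive Model")
if model_id is None:
    raise LookupError("Model not found")
print(f"Model id {model_id}")
```

Returning `None` for a missing model avoids relying on a sentinel value such as `-1`.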


Then we get the model's results. The results object contains predictions for several untested combinations. We will focus on the rows with a valid priority value, which are the best candidates suggested by the algorithm:

In [4]:
results = client.discover.get_model_datapoints(
    model_id=model_id,
    datapoint_type="output",
    batch_size=400,
    batch_number=1,
)

data = pd.DataFrame(results['data'])
data = data.dropna(subset=['priority']).reset_index(drop=True)
display(data)

| | Teselagen Enzyme A | Teselagen Enzyme B | sigma | acq | in_batch | priority | Production | prediction |
|---|---|---|---|---|---|---|---|---|
| 0 | Variant A1 | Variant B5 | 2.85876 | 0.2897875 | True | 0.0 | NaN | 6.531618 |
| 1 | Variant A4 | Variant B3 | 2.569104 | 0.1767566 | True | 1.0 | NaN | 6.267417 |
| 2 | Variant A5 | Variant B4 | 2.315206 | 0.134317 | True | 2.0 | NaN | 6.349708 |
| 3 | Variant A1 | Variant B3 | 3.147615 | 0.1329677 | True | 3.0 | NaN | 4.887956 |
| 4 | Variant A0 | Variant B5 | 3.406001 | 0.1957839 | True | 4.0 | NaN | 5.043227 |
| 5 | Variant A3 | Variant B5 | 2.005093 | 0.1634227 | True | 5.0 | NaN | 7.06181 |
| 6 | Variant A4 | Variant B4 | 2.243716 | 0.1266118 | True | 6.0 | NaN | 6.403903 |
| 7 | Variant A5 | Variant B2 | 2.484564 | 0.03890292 | True | 7.0 | NaN | 4.70956 |
| 8 | Variant A5 | Variant B5 | 2.032704 | 0.2400916 | True | 8.0 | NaN | 7.443619 |
| 9 | Variant A2 | Variant B3 | 2.284042 | 0.07849138 | True | 9.0 | NaN | 5.825411 |


Note that the algorithm doesn't suggest candidates you've already tested. That's why the `Production` column, the target variable that is still *unknown* for untested combinations in this example, contains only `NaN` values.
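If you only want to carry a few candidates into the next round, you can sort by `priority` and keep the top rows. The sketch below uses three rows copied from the table above plus one hypothetical unranked row, and it assumes that lower `priority` values rank first.

```python
import pandas as pd

# Three rows copied from the table above, plus one hypothetical unranked row.
toy = pd.DataFrame({
    "Teselagen Enzyme A": ["Variant A5", "Variant A1", "Variant A4", "Variant A0"],
    "Teselagen Enzyme B": ["Variant B4", "Variant B5", "Variant B3", "Variant B0"],
    "priority": [2.0, 0.0, 1.0, None],
    "prediction": [6.349708, 6.531618, 6.267417, 3.0],
})

# Drop rows without a priority, then order the rest (assumption: lower ranks first).
ranked = (
    toy.dropna(subset=["priority"])
    .sort_values("priority")
    .reset_index(drop=True)
)

top_two = ranked.head(2)
print(top_two)
```

The resulting DataFrame can be converted with `to_dict(orient="records")`, exactly as in the next section.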

## Build the design JSON

Now we need to generate a JSON representation of the candidates so that it can be imported into DESIGN. The api-client library includes a utility for this called `build_design_from_candidates`. It receives a list of dictionaries as input and requires you to explicitly declare which columns should be interpreted as bins. Continuing with the example:

In [5]:
design = build_design_from_candidates(
    candidates_data=data.to_dict(orient="records"),
    bin_cols=['Teselagen Enzyme A', 'Teselagen Enzyme B'],
    name="Closing DBTL Example",
    priority_col='priority',
)

Generating design using 26 candidates


The `design` variable contains a dictionary representation of the design. This representation can be easily stored as a JSON file and then uploaded into DESIGN. Here we upload it directly through the `design` attribute of the client we already created. The method `post_design` returns the id of the generated design on success:

In [6]:
response = client.design.post_design(design=design)
display(response)

{'id': '1303'}
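As mentioned earlier, the design dictionary can also be stored as a JSON file, for example to keep a local copy of what was uploaded. The following is a minimal sketch: `design_stub` is a hypothetical placeholder, not the real structure produced by `build_design_from_candidates`, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder; the real dictionary comes from build_design_from_candidates.
design_stub = {"name": "Closing DBTL Example", "placeholder": True}

# Write the dictionary to a JSON file in the system's temporary directory.
out_path = Path(tempfile.gettempdir()) / "closing_dbtl_design.json"
with out_path.open("w", encoding="utf-8") as fh:
    json.dump(design_stub, fh, indent=2)

# Read it back to confirm that the round trip preserves the content.
with out_path.open(encoding="utf-8") as fh:
    restored = json.load(fh)

print(f"Saved design to {out_path}")
```

In the notebook you would pass the actual `design` variable to `json.dump` instead of the placeholder.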

The new design should be created and look like this:

<div align=left style="">
    <img src=https://downloads.intercomcdn.com/i/o/236560355/5b4d66d19c53bc8b31dc202e/image.png>
</div>


Uncomment and run the following cell to get the design link:

In [7]:
# design_url = f"{client.host_url}/design/client/designs/{response['id']}"
# display(HTML(f"""<a href="{design_url}">{design_url}</a>"""))