<a href="https://colab.research.google.com/github/manulera/ShareYourCloning_backend/blob/master/RSE_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install pip dependencies
!pip install git+https://github.com/BjornFJohansson/pydna@4fd760d075f77cceeb27969e017e04b42f6d0aa3
!pip install regex
!pip install requests

Collecting git+https://github.com/BjornFJohansson/pydna@4fd760d075f77cceeb27969e017e04b42f6d0aa3
  Cloning https://github.com/BjornFJohansson/pydna (to revision 4fd760d075f77cceeb27969e017e04b42f6d0aa3) to /tmp/pip-req-build-zggt6wxy
  Running command git clone --filter=blob:none --quiet https://github.com/BjornFJohansson/pydna /tmp/pip-req-build-zggt6wxy
  Running command git rev-parse -q --verify 'sha^4fd760d075f77cceeb27969e017e04b42f6d0aa3'
  Running command git fetch -q https://github.com/BjornFJohansson/pydna 4fd760d075f77cceeb27969e017e04b42f6d0aa3
  Resolved https://github.com/BjornFJohansson/pydna to commit 4fd760d075f77cceeb27969e017e04b42f6d0aa3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [29]:
# Get files to import functions
import os
if not os.path.isfile('dna_functions.py'):
  !wget 'https://raw.githubusercontent.com/manulera/ShareYourCloning_backend/master/dna_functions.py'
if not os.path.isfile('pydantic_models.py'):
  !wget 'https://raw.githubusercontent.com/manulera/ShareYourCloning_backend/master/pydantic_models.py'
if not os.path.isfile('assembly2.py'):
  !wget 'https://raw.githubusercontent.com/manulera/ShareYourCloning_backend/master/assembly2.py'

--2024-02-29 15:25:10--  https://raw.githubusercontent.com/manulera/ShareYourCloning_backend/master/assembly2.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39860 (39K) [text/plain]
Saving to: ‘assembly2.py’


2024-02-29 15:25:10 (11.9 MB/s) - ‘assembly2.py’ saved [39860/39860]



# Intro

Simplistically, we can think of DNA sequences as text strings. Molecular cloning is then the set of experimental manipulations that let us copy and paste fragments of different sequences to generate a new sequence. The problem is that there is no recognised standard or data model to represent these manipulations, and therefore no way to document the provenance of sequences.

[ShareYourCloning](https://shareyourcloning.netlify.app/) is a project that aims to develop:
* A data model to represent the "source" of a sequence. A source can be either:
  * A minimal description of an experimental manipulation that also links the child sequence to its parents
  * Metadata indicating the provenance of externally imported or naturally occurring DNA sequences (e.g. an identifier in a plasmid repository, or the genome coordinates of a gene of interest).
* A web application for researchers to plan molecular cloning and export it in that data model format.

You can see a [video](https://www.youtube.com/watch?v=HRQb6s8m8_s&t=2s&ab_channel=Genestorian) of an older version of the app, or visit the [hosted version](https://shareyourcloning.netlify.app/) to see how it looks. It has a [FastAPI python backend](https://github.com/manulera/ShareYourCloning_backend) and a [React Frontend](https://github.com/manulera/ShareYourCloning_frontend) (repos linked).

## This notebook

This notebook is a follow-up to [this issue](https://github.com/manulera/ShareYourCloning_backend/issues/83), where I describe the kind of questions that I have. In this notebook I give some practical examples of what I am trying to do, and how it could map to a data model / schema.

The thing to keep in mind while going through this notebook is that I want to use [LinkML](https://linkml.io/linkml/index.html), a framework to define data models, because:
  * You describe your schema using yaml, but it can turn your schema into other things (json schema, SQL...) and it has a framework for migration of the schema.
  * It can produce pydantic models that are used by FastAPI as API inputs and return values (FastAPI parses the request payload json into pydantic objects and serializes pydantic objects into json for the response).
  * The other side of the coin is that if I wrote my own pydantic models instead of using the LinkML-generated ones, I could add methods to them that are convenient for parsing some inputs, and some fields could have non-primitive data types (for instance, a restriction enzyme class from Biopython instead of a list of strings containing enzyme names).

I have included `> Note` in comments where I think input would be useful, and I have added some questions at the end.



# Representing sources of sequences

For now, sources of sequences are represented as pydantic models. They are all children of the class `Source`, which has the following fields:

In [5]:
from pydantic_models import Source

Source.model_fields


{'id': FieldInfo(annotation=Union[int, NoneType], required=False, description='Unique identifier of the source'),
 'kind': FieldInfo(annotation=str, required=False, default='source', description='The kind entity (always equal to "source"). Should probably be removed.'),
 'input': FieldInfo(annotation=list[int], required=False, default=[], description="Identifiers of the sequences that are an input to this source.                              If the source represents external import of a sequence, it's empty."),
 'output': FieldInfo(annotation=Union[int, NoneType], required=False, description='Identifier of the sequence that is an output of this source.'),
 'type': FieldInfo(annotation=Union[SourceType, NoneType], required=True, description='The type source (PCR, restriction, etc.)'),
 'info': FieldInfo(annotation=dict, required=False, default={}, description='Additional information about the source (not used much yet, and probably should be removed)')}
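
To illustrate how the `input` and `output` fields link sources and sequences into a provenance chain, here is a minimal sketch that uses only the standard library. `SourceSketch` and `ancestors` are hypothetical names introduced for this example (they are not part of `pydantic_models.py`), and the `"imported"` type label is made up:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceSketch:
    """Hypothetical stand-in for Source, keeping only the linking fields."""
    id: int
    type: str
    input: List[int] = field(default_factory=list)
    output: Optional[int] = None


def ancestors(sequence_id, sources):
    """Return the ids of all sequences that sequence_id was derived from."""
    by_output = {s.output: s for s in sources}
    found = []
    stack = [sequence_id]
    while stack:
        source = by_output.get(stack.pop())
        if source is None:
            continue
        for parent_id in source.input:
            found.append(parent_id)
            stack.append(parent_id)
    return found


sources = [
    SourceSketch(id=1, type="imported", input=[], output=1),  # made-up type label
    SourceSketch(id=2, type="restriction", input=[1], output=2),
    SourceSketch(id=3, type="restriction", input=[2], output=3),
]
lineage = ancestors(3, sources)
print(lineage)  # [2, 1]
```

A source with an empty `input` ends the chain, which matches the description of externally imported sequences above.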

# Representing the sequences themselves

I don't pay much attention to this, because most of the information about a sequence is stored as GenBank text (in a string). There is, however, a pydantic model to represent sequences in order to:
* Assign them a unique identifier, so they can be linked to sources as inputs or outputs.
* Document whether they have sequence overhangs.
  > What are overhangs? Normally, the DNA sequences that we use in the lab are double-stranded DNA, but sometimes they can have single-stranded DNA at one of their ends, and this is called an overhang. This can happen when cutting with a [restriction enzyme](https://images.ctfassets.net/d49zkrle08v9/7dmzFrgzvfdgg0IK0jZ6v6/64d89fb0722f498561bf1c3be9072bf4/Restriction_enzyme_cuts.png).

In [6]:
from pydantic_models import SequenceEntity, GenbankSequence
import pprint

pp = pprint.PrettyPrinter(depth=4)
pp.pprint(SequenceEntity.model_fields)
print()
pp.pprint(GenbankSequence.model_fields)



{'id': FieldInfo(annotation=Union[int, NoneType], required=False, description='Unique identifier of the sequence'),
 'kind': FieldInfo(annotation=str, required=False, default='entity', description='The kind entity (always equal to "entity"). Should probably be removed.'),
 'sequence': FieldInfo(annotation=Union[GenbankSequence, NoneType], required=True, description='The sequence in genbank format + some extra info that is not captured by the genbank format')}

{'file_content': FieldInfo(annotation=str, required=False, default=''),
 'file_extension': FieldInfo(annotation=str, required=False, default='gb'),
 'overhang_crick_3prime': FieldInfo(annotation=int, required=False, default=0, description="Taken from pydna's `dseq::ovhg`        An integer describing the length of the        crick strand overhang in the 5' of the molecule, or 3' of the crick strand"),
 'overhang_watson_3prime': FieldInfo(annotation=int, required=False, default=0, description='The equivalent of `overhang_crick_3pri

# Example 1 - Restriction cut

## 0 What's a restriction cut


Let's look at a first example: how we simulate and document the cut by a restriction enzyme.

Restriction enzymes cut the DNA at particular sequences. For instance, the enzyme `EcoRI` cuts at the sequence `G^AATT_C` (`^` marks the cut on the top strand and `_` the cut on the bottom strand), meaning that if the sequence `GAATTC` is present, it will cut after the `G` on the top strand, and, reading along the top strand, after the second `T` on the bottom strand.

![](https://images.ctfassets.net/d49zkrle08v9/7dmzFrgzvfdgg0IK0jZ6v6/64d89fb0722f498561bf1c3be9072bf4/Restriction_enzyme_cuts.png)
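
As a rough illustration of the description above (this is not how Biopython or pydna implement it), the positions of the `EcoRI` cuts can be found with a plain regular expression. `find_ecori_cuts` is a hypothetical helper, and both cut positions are expressed in top-strand coordinates:

```python
import re


def find_ecori_cuts(seq):
    """Return (top_cut, bottom_cut) positions for every GAATTC site.

    Both positions are zero-based indexes along the top strand: the top
    strand is cut after the G, the bottom strand after the second T.
    """
    cuts = []
    # The lookahead lets the search also report overlapping matches
    for match in re.finditer(r"(?=GAATTC)", seq.upper()):
        start = match.start()
        cuts.append((start + 1, start + 5))
    return cuts


ecori_cuts = find_ecori_cuts("ACGAATTCTAGAATTCAA")
print(ecori_cuts)  # [(3, 7), (11, 15)]
```

The four-base gap between the two positions of each site is the single-stranded overhang left on each fragment.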


These are the extra fields that the source representing enzyme cuts contains. They will be explained in detail below.

In [7]:
from pydantic_models import RestrictionEnzymeDigestionSource

restriction_fields = RestrictionEnzymeDigestionSource.model_fields.copy()

for key in Source.model_fields:
  restriction_fields.pop(key)

restriction_fields

{'left_edge': FieldInfo(annotation=Union[tuple[int, int], NoneType], required=False, description='The left edge of the cut, in the format (cut_watson, ovhg)'),
 'right_edge': FieldInfo(annotation=Union[tuple[int, int], NoneType], required=False, description='The right edge of the cut, in the format (cut_watson, ovhg)'),
 'restriction_enzymes': FieldInfo(annotation=List[Union[str, NoneType]], required=True, description='Enzymes associated with the left and right sides of the cut. It can contain None to represent the edge the sequence in linear sequences.', metadata=[Len(min_length=1, max_length=None)])}

The web API receives as input:
* The sequence to be cut
* A `RestrictionEnzymeDigestionSource` object where only the `restriction_enzymes` field is populated with a list of enzymes as strings

## 1 API inputs

In [8]:
from pydna.dseqrecord import Dseqrecord
from Bio.Restriction.Restriction import RestrictionBatch


# The input sequence actually comes as a pydantic object and then
# would be transformed into a Dseqrecord (pydna object that represents
# a double-stranded sequence), for now let's skip that.
input_sequence = Dseqrecord('ACGAATTCTAGAATTCAA')

# Note that the source that is submitted to the server still does not
# have an id (it will be assigned at the frontend once the user picks the desired
# fragment)
input_source = RestrictionEnzymeDigestionSource(
            input=[1],
            restriction_enzymes=['EcoRI'],
        )

# We convert the list of enzyme strings into a set of enzyme objects
# using the Biopython library.
enzymes = RestrictionBatch(first=[e for e in input_source.restriction_enzymes])
# > Note: This could be done directly when the object is received by the API,
# the field `restriction_enzymes` could be of type `RestrictionBatch`, and one
# can write a parser that converts the json input into that. That would be the
# ideal scenario, but not sure that is compatible with LinkML.


# See how this object represents a double stranded sequence
input_sequence.seq

Dseq(-18)
ACGAATTCTAGAATTCAA
TGCTTAAGATCTTAAGTT

## 2 Simulating the cloning + keeping track of what we do

We can simulate this using the cloning library, as shown below:

In [9]:
print('parent', repr(input_sequence.seq), sep='\n')
print()
child1, child2, child3 = input_sequence.cut(enzymes)
print('child1', repr(child1.seq), sep='\n')
print()
print('child2', repr(child2.seq), sep='\n')
print()
print('child3', repr(child3.seq), sep='\n')

parent
Dseq(-18)
ACGAATTCTAGAATTCAA
TGCTTAAGATCTTAAGTT

child1
Dseq(-7)
ACG
TGCTTAA

child2
Dseq(-12)
AATTCTAG
    GATCTTAA

child3
Dseq(-7)
AATTCAA
    GTT


In the actual API call, we do this in several steps, so that we also keep track of where each cut happens.



### Find cutsites



First, we get the cutsites present in the sequence for the given enzymes. A cutsite is represented as `((cut_watson, ovhg), enz)`:

- `cut_watson` is a non-negative integer contained in `[0,len(seq))`, where `seq` is the sequence that will be cut. It represents the position of the cut on the watson strand.
- `ovhg` is the overhang left after the cut. For the example `EcoRI`, the value is -4 (4 bases missing from top strand in the left fragment after the cut).
- `enz` is the enzyme object. It is not needed to perform the cut, but can be used to keep track of which enzyme was used.
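
One possible way to make this tuple more data model-friendly is a small class. The sketch below is only an illustration: `CutSite` and its members are hypothetical names, the enzyme is stored as a plain string, and the `cut_crick` property assumes a linear sequence where the crick cut (in top-strand coordinates) is `cut_watson - ovhg`, which agrees with the `EcoRI` example above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutSite:
    """Hypothetical class equivalent of ((cut_watson, ovhg), enz)."""
    cut_watson: int
    ovhg: int
    enzyme: str

    @property
    def cut_crick(self):
        # Assumption: linear sequence, no wrap-around at the origin
        return self.cut_watson - self.ovhg

    @classmethod
    def from_tuple(cls, cutsite):
        (cut_watson, ovhg), enzyme = cutsite
        return cls(cut_watson, ovhg, str(enzyme))

    def to_tuple(self):
        return ((self.cut_watson, self.ovhg), self.enzyme)


site = CutSite.from_tuple(((3, -4), "EcoRI"))
print(site)
print(site.cut_crick)  # 7
```

Keeping `from_tuple` and `to_tuple` next to each other makes the conversion between the tuple form and the class form explicit in both directions.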

In [10]:
cutsites = input_sequence.seq.get_cutsites(*enzymes)
cutsites
# > Note: these cutsites are defined as tuples, but this is probably not very
# data model-friendly, would it make sense to turn them into a class?


[((3, -4), EcoRI), ((11, -4), EcoRI)]

### Pairing cutsites

A fragment produced by restriction is represented by a tuple of length 2 that may contain cutsites or `None`:

- Two cutsites: represents the extraction of a fragment between those two cutsites, in that orientation.
- `None`, cutsite: represents the extraction of a fragment between the left edge of a linear sequence and the cutsite.
- cutsite, `None`: represents the extraction of a fragment between the cutsite and the right edge of a linear sequence.
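
For a linear sequence, the pairing described above can be sketched in a few lines. This is a simplified illustration and not pydna's implementation: `pair_cutsites_linear` is a hypothetical helper, enzymes are plain strings, and circular sequences are not handled:

```python
def pair_cutsites_linear(cutsites):
    """Pair consecutive cutsites of a linear sequence, using None for the edges."""
    # Sort by the position of the cut on the watson strand
    ordered = sorted(cutsites, key=lambda cutsite: cutsite[0][0])
    edges = [None, *ordered, None]
    return list(zip(edges, edges[1:]))


example_cutsites = [((3, -4), "EcoRI"), ((11, -4), "EcoRI")]
example_pairs = pair_cutsites_linear(example_cutsites)
for pair in example_pairs:
    print(pair)
```

Two cutsites give three pairs, one per fragment, with `None` standing in for each end of the sequence.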


In [11]:
cutsite_pairs = input_sequence.seq.get_cutsite_pairs(cutsites)

for s in cutsite_pairs:
  print(s)

(None, ((3, -4), EcoRI))
(((3, -4), EcoRI), ((11, -4), EcoRI))
(((11, -4), EcoRI), None)


This information is sufficient to instantiate a `RestrictionEnzymeDigestionSource` describing the provenance of each fragment, so we create one for each:

In [12]:
sources = list()
for (left_cut, right_cut) in cutsite_pairs:
  sources.append(
      RestrictionEnzymeDigestionSource.from_cutsites(
          left=left_cut,
          right=right_cut,
          input=input_source.input, # This we already knew
          id=input_source.id
      )
  )

# Let's print them
for s in sources:
  print(s)



id=None kind='source' input=[1] output=None type=<SourceType.restriction: 'restriction'> info={} left_edge=None right_edge=(3, -4) restriction_enzymes=[None, 'EcoRI']
id=None kind='source' input=[1] output=None type=<SourceType.restriction: 'restriction'> info={} left_edge=(3, -4) right_edge=(11, -4) restriction_enzymes=['EcoRI', 'EcoRI']
id=None kind='source' input=[1] output=None type=<SourceType.restriction: 'restriction'> info={} left_edge=(11, -4) right_edge=None restriction_enzymes=['EcoRI', None]


### Executing the cut

Now we want to generate all possible fragments:

In [13]:
fragments = list()
for (left_cut, right_cut) in cutsite_pairs:
  fragments.append(input_sequence.apply_cut(left_cut,right_cut))

# Verify that they are the same as before:
for f in fragments:
  print(repr(f.seq))
  print()

Dseq(-7)
ACG
TGCTTAA

Dseq(-12)
AATTCTAG
    GATCTTAA

Dseq(-7)
AATTCAA
    GTT
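
To make explicit what applying a pair of cuts means, here is a string-only sketch for linear sequences. It is not pydna's `apply_cut`: `apply_cut_linear` is a hypothetical helper, each edge is a `(cut_watson, ovhg)` tuple or `None`, and it assumes that the crick cut sits at `cut_watson - ovhg` in top-strand coordinates:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_cut_linear(seq, left, right):
    """Return (top, bottom) strands of the fragment between two edges.

    The bottom strand is written left to right as it would be drawn
    under the top strand (that is, 3' to 5').
    """
    if left is None:
        left_watson = left_crick = 0
    else:
        left_watson, left_crick = left[0], left[0] - left[1]
    if right is None:
        right_watson = right_crick = len(seq)
    else:
        right_watson, right_crick = right[0], right[0] - right[1]
    top = seq[left_watson:right_watson]
    bottom = seq[left_crick:right_crick].translate(COMPLEMENT)
    return top, bottom


parent = "ACGAATTCTAGAATTCAA"
edge_pairs = [(None, (3, -4)), ((3, -4), (11, -4)), ((11, -4), None)]
cut_fragments = [apply_cut_linear(parent, left, right) for left, right in edge_pairs]
for top, bottom in cut_fragments:
    print(top, bottom)
```

The strand pairs agree with the three fragments printed above, except that the sketch does not indent the bottom strand to show the overhang.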



## 3 Returning values to user

Now we simply return a response containing a list of possible sources and sequences, and the user picks one.

In [14]:
from dna_functions import format_sequence_genbank
# format_sequence_genbank converts the sequence object into a SequenceEntity
# pydantic object, which FastAPI serializes into json
formatted_fragments = [format_sequence_genbank(f) for f in fragments]

# This would be the return value of the API.
return_value = {'sequences': formatted_fragments, 'sources': sources}

# Example 2 - A Gibson assembly

## 0 What is a Gibson Assembly

In essence, a Gibson assembly can join sequences based on "common substrings". Imagine we have three DNA sequences `ataCCC` `CCCttaTTT` `TTTgcg`. One possible Gibson assembly would be:

```
ataCCC
   |||
   CCCttaTTT
         |||
         TTTgcg

Produces:
ataCCCttaTTTgcg
```
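
The join drawn above can be mimicked with plain strings. This is only a toy sketch of the "common substring at the ends" idea, not the algorithm used by the cloning library; `join_by_terminal_overlap` is a hypothetical helper and the minimal overlap of 3 is chosen for this dummy example:

```python
def join_by_terminal_overlap(left, right, min_overlap):
    """Join two sequences if the end of `left` matches the start of `right`.

    Returns the joined sequence, or None if no overlap of at least
    `min_overlap` bases is found. The comparison ignores case.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:].upper() == right[:size].upper():
            return left + right[size:]
    return None


product = "ataCCC"
for fragment in ["CCCttaTTT", "TTTgcg"]:
    product = join_by_terminal_overlap(product, fragment, min_overlap=3)
print(product)  # ataCCCttaTTTgcg
```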

> Info: Gibson assemblies join sequences only if these common substrings are at the ends of the sequences, but other types of assemblies can produce joins like this:
> ```
> CCCttaTTTaa
>       |||
>     ccTTTgcg
>  
> Output: CCCttaTTTgcg (aa and cc are lost)
> ```

Keep in mind that these sequences are double-stranded DNA, so they could be joined in either orientation. Let's take the middle sequence as an example:

```
Forward orientation            Reverse orientation

CCCttaTTT                      AAAtaaGGG
|||||||||                      |||||||||
GGGaatAAA                      TTTattCCC
```
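
The reverse orientation is the reverse complement of the top strand. A minimal sketch with plain strings (in the notebook itself this is done by the `reverse_complement` method of the sequence objects; the function below is a stand-alone stand-in):

```python
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base (keeping its case), then reverse the string."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


forward = "CCCttaTTT"
reverse = reverse_complement(forward)
print(reverse)  # AAAtaaGGG
```

Applying the function twice returns the original sequence, which is why a fragment and its reverse complement can be treated as two orientations of the same molecule.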


The `GibsonAssemblySource` represents a Gibson assembly, and contains two extra fields as shown below.


In [25]:
from pydantic_models import GibsonAssemblySource

gibson_fields = GibsonAssemblySource.model_fields.copy()

for key in Source.model_fields:
  gibson_fields.pop(key)

gibson_fields

{'assembly': FieldInfo(annotation=Union[Annotated[List[tuple[int, int, str, str]], Len], NoneType], required=False, description='The assembly plan as a list of tuples (part_1_id, part_2_id, loc1, loc2)'),
 'circular': FieldInfo(annotation=Union[bool, NoneType], required=False, description='Whether the assembly is circular or not')}


The list of tuples `assembly` contains the information of how to join subfragments of the parent sequences to produce the child sequence.
As mentioned in the previous [GitHub issue](https://github.com/manulera/ShareYourCloning_backend/issues/83), if the three example sequences above had ids 10, 11 and 12, the `assembly` field would contain the following:

```json
{
    "assembly": [
        [1, 2, "3..6", "0..3"],
        [2, 3, "6..9", "0..3"]
    ]
}
```

The `assembly` field is an array of arrays of length 4, each of them representing the join between two fragments:

- The first and second integers represent the index (one-based) of the joined fragments in the input list. The sign of the integer represents the orientation of each fragment, positive for forward orientation, negative for reverse orientation.
- The strings represent the location of the overlap in the first and second fragment. This is standard sequence location syntax.
- The assembly can be a loop in some cases, if it starts and finishes with the same fragment. The `circular` field indicates that.
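
As a sanity check on this representation, the sketch below verifies that the two locations of every join point to the same bases. It rests on assumptions: fragments are plain strings, locations such as `"3..6"` are read as zero-based, end-exclusive intervals (as in the example above), and they refer to the fragment in the orientation given by the sign. All function names are hypothetical:

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def parse_location(location):
    """Turn 'start..end' into a (start, end) tuple of integers."""
    start, end = location.split("..")
    return int(start), int(end)


def oriented_fragment(fragments, index):
    """Pick a fragment by signed one-based index, flipping it if negative."""
    seq = fragments[abs(index) - 1]
    return seq if index > 0 else reverse_complement(seq)


def overlaps_match(fragments, assembly):
    """Check that both locations of every join contain the same bases."""
    for u, v, loc_u, loc_v in assembly:
        start_u, end_u = parse_location(loc_u)
        start_v, end_v = parse_location(loc_v)
        overlap_u = oriented_fragment(fragments, u)[start_u:end_u]
        overlap_v = oriented_fragment(fragments, v)[start_v:end_v]
        if overlap_u.upper() != overlap_v.upper():
            return False
    return True


example_fragments = ["ataCCC", "CCCttaTTT", "TTTgcg"]
example_assembly = [[1, 2, "3..6", "0..3"], [2, 3, "6..9", "0..3"]]
plan_is_consistent = overlaps_match(example_fragments, example_assembly)
print(plan_is_consistent)  # True
```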




## 1 API Inputs



In [26]:
# The API receives a list of sequences as pydantic objects, below we construct
# those objects using the function format_sequence_genbank

sequences = [
  Dseqrecord('ataCCC'),
  Dseqrecord('CCCttaTTT'),
  Dseqrecord('TTTgcg')
]

input_sequences = [format_sequence_genbank(s) for s in sequences]
ids = [10, 11, 12]

for seq, seq_id in zip(input_sequences, ids):
  seq.id = seq_id

# This is how the seq pydantic object looks (note that the entire gb file is
# contained in the sequence.file_content field)
input_sequences[0]

SequenceEntity(id=10, kind='entity', sequence=GenbankSequence(type='file', file_extension='gb', file_content='LOCUS       name                       6 bp    DNA     linear   UNK 01-JAN-1980\nDEFINITION  description.\nACCESSION   id\nVERSION     id\nKEYWORDS    .\nSOURCE      .\n  ORGANISM  .\n            .\nFEATURES             Location/Qualifiers\nORIGIN\n        1 ataccc\n//', overhang_crick_3prime=0, overhang_watson_3prime=0))

In [27]:
# The other input is a GibsonAssemblySource where the only set field is input
input_source = GibsonAssemblySource(
            input=[10, 11, 12],
        )


Basically, the payload of the request is:
```json
{
  "source": input_source,
  "sequences": input_sequences
}
```

Note that this is not great, because from the start there is no way to guarantee that the ids in `input_sequences` correspond to the ones in `input_source.input`.
> One obvious fix for this particular use-case is to not send the source at all in the payload, and just send the sequences. The issue is that the same endpoint can be used to get all possible assemblies (as we are using now, when only the `input` field is set), as well as to execute a particular known assembly (if `assembly` and `circular` fields are set), so that problem would also exist there.

In [28]:
from dna_functions import read_dsrecord_from_json

# Because of the issue with ids in input not necessarily matching those of the
# sequences, this is how I process the inputs:
fragments = [
  next((read_dsrecord_from_json(seq) for seq in input_sequences if seq.id == seq_id), None)
  for seq_id in input_source.input
]
if any(f is None for f in fragments):
  # Commented out because it needs api library for special error
  # raise HTTPException(400, f'Invalid fragment id in input')
  pass

# Fragments now contains the dseqrecords of the input sequences:
fragments


[Dseqrecord(-6), Dseqrecord(-9), Dseqrecord(-6)]

## 2 Simulating the cloning - keeping track of what we do

We can simulate this using the cloning library, as shown below:

In [52]:
from assembly2 import Assembly, gibson_overlap, assembly2str, assemble

# this is a parameter that the user sets, describing the minimal homology
# required to join fragments, for this dummy example we use 3, in reality
# more would be needed
minimal_homology = 3

# This object contains all assemblies
asm = Assembly(fragments, algorithm=gibson_overlap, use_fragment_order=False, use_all_fragments=True, limit=minimal_homology)

# We want all possible assemblies
possible_assemblies = asm.assemble_linear() + asm.assemble_circular()

# Only the one we mentioned above is returned (it's the only possible one)
print('possible assemblies length:',len(possible_assemblies))
print(repr(possible_assemblies[0].seq))



possible assemblies length: 1
Dseq(-15)
ATACCCTTATTTGCG
TATGGGAATAAACGC


### Why are assemblies represented like that?

To understand why the assemblies are represented the way they are, it helps to understand how the possible assemblies are calculated.

The `Assembly` class contains a directed graph, where nodes represent fragments and edges represent overlaps between fragments:
- The node keys are integers, representing the index of the fragment in the input list of fragments. The sign of the node key represents the orientation of the fragment, positive for forward orientation, negative for reverse orientation.
- The edges contain the locations of the overlaps in the fragments. For an edge `(u, v, key)`:
    - `u` and `v` are the nodes connected by the edge.
    - `key` is a string that represents the location of the overlap, in the format
    `u[start:end](strand):v[start:end](strand)` (the strand part does not appear in the edges printed below).
    - Edges have a `locations` attribute, which is a list of two `Location` objects, representing the location of the overlap in the `u` and `v` fragment.
    - You can think of an edge as a representation of the join of two fragments.

Let's look at the edges of the previous graph:

In [53]:
print(*asm.G.edges, sep='\n')

(1, 2, '1[3:6]:2[0:3]')
(2, 3, '2[6:9]:3[0:3]')
(-2, -1, '-2[6:9]:-1[0:3]')
(-3, -2, '-3[3:6]:-2[0:3]')
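
For this toy example, the four edges printed above can be reproduced with a sketch that uses only the standard library. It is a simplification and not the `Assembly` implementation: fragments are plain strings, only terminal overlaps are searched (as in a Gibson assembly), and `terminal_overlap` and `overlap_edges` are hypothetical helpers:

```python
from itertools import permutations


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def terminal_overlap(left, right, min_length):
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left[-size:].upper() == right[:size].upper():
            return size
    return 0


def overlap_edges(fragments, min_length):
    """Return (u, v, key) edges between signed, one-based fragment indexes."""
    nodes = {}
    for index, seq in enumerate(fragments, start=1):
        nodes[index] = seq
        nodes[-index] = reverse_complement(seq)
    edges = []
    for u, v in permutations(nodes, 2):
        if abs(u) == abs(v):
            continue  # a fragment is never joined to itself
        size = terminal_overlap(nodes[u], nodes[v], min_length)
        if size:
            end_u = len(nodes[u])
            key = f"{u}[{end_u - size}:{end_u}]:{v}[0:{size}]"
            edges.append((u, v, key))
    return edges


toy_edges = overlap_edges(["ataCCC", "CCCttaTTT", "TTTgcg"], min_length=3)
print(*toy_edges, sep="\n")
```

Every join appears twice, once in each orientation, which is the redundancy that the constraints on assemblies have to remove.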


In [54]:

# we can also get the linear assemblies represented as explained above
assemblies = asm.get_linear_assemblies()
print('possible assemblies:', len(assemblies))

# Note that here, the locations of the assembly are not strings (unlike in the
# GibsonAssemblySource), they are Location objects (from Biopython)
# > Note: this is a case where adding a custom parser to the assembly
# pydantic objects would reduce the need to convert.
print(assemblies[0])


possible assemblies: 1
((1, 2, SimpleLocation(ExactPosition(3), ExactPosition(6)), SimpleLocation(ExactPosition(0), ExactPosition(3))), (2, 3, SimpleLocation(ExactPosition(6), ExactPosition(9)), SimpleLocation(ExactPosition(0), ExactPosition(3))))


In [55]:
# For a concise representation we use this function:
print(assembly2str(assemblies[0]))

('1[3:6]:2[0:3]', '2[6:9]:3[0:3]')


In [56]:
# See how it changes when one of the elements is inverted (that's what we get with
# reverse_complement)
fragments2 = [fragments[0], fragments[1].reverse_complement(), fragments[2]]
asm = Assembly(fragments2, algorithm=gibson_overlap, use_fragment_order=False, use_all_fragments=True, limit=minimal_homology)
assembly_plan = asm.get_linear_assemblies()[0]
print(assembly_plan)


((1, -2, SimpleLocation(ExactPosition(3), ExactPosition(6)), SimpleLocation(ExactPosition(0), ExactPosition(3))), (-2, 3, SimpleLocation(ExactPosition(6), ExactPosition(9)), SimpleLocation(ExactPosition(0), ExactPosition(3))))


In [61]:
# When you want to generate the sequence given by a particular assembly, you
# use the function assemble:
assembled_seq = assemble(fragments=fragments2, assembly=assembly_plan, is_circular=False)

assembled_seq.seq


Dseq(-15)
ATACCCTTATTTGCG
TATGGGAATAAACGC

Using the indexes of the fragments in the input list to the `Assembly` class is convenient because we want to apply the following constraints:
* Each assembly should contain every fragment in the input exactly once.
* Each possible assembly must be returned only once.

We can use a graph library to find paths between nodes and easily apply these constraints:
* Representing a fragment and its reverse complement as 1,-1 2,-2 etc. makes it easy to apply the constraint that each fragment must be present only once.
* The linear assembly from above `('1[3:6]:2[0:3]', '2[6:9]:3[0:3]')` is identical to `('-3[3:6]:-2[0:3]', '-2[6:9]:-1[0:3]')`; it's just the reverse complement (inverted fragments in inverted order). We apply the constraint that the first element must always be in the forward orientation in linear assemblies.
* We also apply this in circular assemblies, with the extra constraint that the first element in the assembly is the one with the smallest index. Otherwise, a circular assembly of 3 elements could be represented in three different ways.
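
These rules can be illustrated on paths written as tuples of signed fragment indexes. The sketch below is a simplification (the real assemblies are tuples of edges with locations) and all function names are hypothetical. It lists the equivalent representations of the example assembly and keeps only the ones that satisfy the constraints:

```python
def flip(path):
    """Reverse complement of a path: inverted fragments in inverted order."""
    return tuple(-node for node in reversed(path))


def rotations(path):
    """All the ways of writing a circular path, one per starting fragment."""
    return [path[i:] + path[:i] for i in range(len(path))]


def keep_linear(path):
    # The first element must be in the forward orientation
    return path[0] > 0


def keep_circular(path):
    # Forward orientation, and the first element has the smallest index
    return path[0] > 0 and path[0] == min(abs(node) for node in path)


linear_equivalents = [(1, 2, 3), flip((1, 2, 3))]
circular_equivalents = rotations((1, 2, 3)) + rotations(flip((1, 2, 3)))

kept_linear = [p for p in linear_equivalents if keep_linear(p)]
kept_circular = [p for p in circular_equivalents if keep_circular(p)]
print(kept_linear)    # [(1, 2, 3)]
print(kept_circular)  # [(1, 2, 3)]
```

Of the six ways of writing the circular assembly (three rotations in each orientation), only one passes the filter.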




### Back to the cloning

In [62]:
# To keep track of the cloning information, this is how I do it

asm = Assembly(fragments, algorithm=gibson_overlap, use_fragment_order=False, use_all_fragments=True, limit=minimal_homology)

# Helper to shorten the code below, since we use it twice to instantiate
# GibsonAssemblySource objects
def create_source(a, is_circular):
  return GibsonAssemblySource.from_assembly(assembly=a, circular=is_circular, id=input_source.id, input=input_source.input)

# We want all possible assemblies, circular and linear
out_sources = [create_source(a, True) for a in asm.get_circular_assemblies()]
out_sources += [create_source(a, False) for a in asm.get_linear_assemblies()]

# An important thing to notice here is that out_sources is a list of
# GibsonAssemblySource (pydantic objects), which means that they contain
# locations as strings. Below we need the locations as objects again
# to generate the output sequences, so we convert those strings back into
# locations.

out_sequences = [format_sequence_genbank(assemble(fragments, s.get_assembly_plan(), s.circular)) for s in out_sources]

## 3 Returning values to user

In [63]:

# This would be the return value of the API.
return_value = {'sequences': out_sequences, 'sources': out_sources}

# Questions / Input

All this works and