In [None]:
%%capture
# Install opencloning (only when running on Colab)
import sys
if 'google.colab' in sys.modules:
    %pip install opencloning

# Gibson Assembly

Let's imagine we want to clone a gene of interest into a plasmid vector using Gibson Assembly. Here, we will clone the gene ase1 from _S. cerevisiae_ into the vector pREP42-MCS+ (Addgene ID: 52691).

## Importing sequences

You can use OpenCloning to fetch the sequences directly, or provide your own files. Let's see how we can do this.

### Getting the Addgene plasmid

In [None]:
from opencloning.endpoints.external_import import get_from_repository_id_addgene
from opencloning.pydantic_models import AddGeneIdSource

# To load the plasmid from AddGene, first we initialize a source
addgene_source = AddGeneIdSource(
    id=0, # For now, leave a zero here, we will revisit this
    repository_id='52691',
    repository_name='addgene',
    )

# Then we call the API endpoint function get_from_repository_id_addgene (the same function that
# runs when a request is made to the OpenCloning API) to fetch the plasmid sequence.
# Why await? Endpoint functions are asynchronous, so we need to await their result.
resp = await get_from_repository_id_addgene(addgene_source)

# The return value of endpoints is always a dictionary with keys 'sources' and 'sequences',
# which are lists of the respective objects
addgene_source = resp['sources'][0]
plasmid_seq = resp['sequences'][0]

# Let's pretty-print what we have
print(plasmid_seq.model_dump_json(indent=2))
print(addgene_source.model_dump_json(indent=2))



{
  "id": 0,
  "type": "TextFileSequence",
  "sequence_file_format": "genbank",
  "overhang_crick_3prime": 0,
  "overhang_watson_3prime": 0,
  "file_content": "LOCUS       pREP42-MCS+             8315 bp    DNA     circular SYN 26-AUG-2024\nDEFINITION  Same as pREP42, with better MCS.\nACCESSION   .\nVERSION     .\nKEYWORDS    .\nSOURCE      synthetic DNA construct\n  ORGANISM  synthetic DNA construct\n            .\nREFERENCE   1  (bases 1 to 8315)\n  TITLE     various\n  JOURNAL   Unpublished\nREFERENCE   2  (bases 1 to 8315)\n  AUTHORS   .\n  TITLE     Direct Submission\n  JOURNAL   Exported Aug 26, 2024 from SnapGene Server 7.0.3\n            https://www.snapgene.com\nCOMMENT     SGRef: number: 1; type: \"Journal Article\"; journalName:\n            \"Unpublished\"\nFEATURES             Location/Qualifiers\n     source          1..8315\n                     /mol_type=\"other DNA\"\n                     /organism=\"synthetic DNA construct\"\n     promoter        531..1685\n         

'LOCUS       sequence-210876-        8315 bp DNA     circular SYN 26-AUG-2024\n'
Found locus 'sequence-210876-' size '8315' residue_type 'DNA'
Some fields may be wrong.
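
The `await` calls in this notebook work because Jupyter (and Colab) already run an event loop, which allows top-level `await`. A plain Python script does not allow that, so you would wrap the calls in an `async def` function and run it with `asyncio.run`. The sketch below shows that pattern with a hypothetical stand-in coroutine, `fake_endpoint`, instead of a real OpenCloning endpoint, so it runs without network access; it only mimics the return shape described above (a dictionary with `'sources'` and `'sequences'` lists).

```python
import asyncio


async def fake_endpoint(source: dict) -> dict:
    """Hypothetical stand-in for an endpoint such as get_from_repository_id_addgene.
    It only mimics the shape of the return value: 'sources' and 'sequences' lists."""
    await asyncio.sleep(0)  # pretend to do some asynchronous I/O
    return {
        'sources': [dict(source)],
        'sequences': [{'id': 0, 'type': 'TextFileSequence'}],
    }


async def main() -> tuple[dict, dict]:
    # Inside an async function, `await` works just as in a notebook cell
    resp = await fake_endpoint({'id': 0, 'repository_id': '52691'})
    # Endpoints return lists; here we expect exactly one source and one sequence
    return resp['sources'][0], resp['sequences'][0]


if __name__ == '__main__':
    source, sequence = asyncio.run(main())
    print(source['repository_id'], sequence['type'])
```

Do not call `asyncio.run` inside the notebook itself: it raises a `RuntimeError` when an event loop is already running, so plain `await` remains the right choice there.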


The types of `addgene_source` and `plasmid_seq` are `AddGeneIdSource` (now updated with extra fields) and `TextFileSequence`, respectively.
These are the same types you will find in the json data model, but `TextFileSequence` is not very convenient for exploring the sequence. Let's see how to convert it to a `Dseqrecord` object from [pydna](https://github.com/pydna-group/pydna), which is a subclass of Biopython's `SeqRecord`.

In [18]:
from opencloning.dna_functions import read_dsrecord_from_json

plasmid_seq_pydna = read_dsrecord_from_json(plasmid_seq)

# You can do things like:

print('Is circular?', plasmid_seq_pydna.circular)
print('Length:', len(plasmid_seq_pydna))
print('ID:', plasmid_seq_pydna.id)
print('Description:', plasmid_seq_pydna.description)
print('features:')
for feature in plasmid_seq_pydna.features:
    print()
    print(feature)


Is circular? True
Length: 8315
ID: 0
Description: Same as pREP42, with better MCS
features:

type: source
location: [0:8315](+)
qualifiers:
    Key: mol_type, Value: ['other DNA']
    Key: organism, Value: ['synthetic DNA construct']


type: promoter
location: [530:1685](+)
qualifiers:
    Key: label, Value: ['nmt1 P41 promoter']
    Key: note, Value: ['mutant nmt1 promoter from Schizosaccharomyces pombe, conferring medium strength thiamine-repressible expression']


type: rep_origin
location: [2801:3585](+)
qualifiers:
    Key: label, Value: ['ars1']
    Key: note, Value: ['Schizosaccharomyces pombe autonomously replicating sequence ars1']


type: primer_bind
location: [3934:3952](-)
qualifiers:
    Key: label, Value: ['M13 Forward']
    Key: note, Value: ['In lacZ gene. Also called M13-F20 or M13 (-21) Forward']


type: primer_bind
location: [3934:3951](-)
qualifiers:
    Key: label, Value: ['M13 fwd']
    Key: note, Value: ['common sequencing primer, one of multiple similar variants
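
Notice that the printed locations differ from the GenBank text by one at the start: the promoter is `531..1685` in the GenBank file but `[530:1685]` in the feature above. GenBank uses 1-based, inclusive coordinates, while Biopython (and therefore pydna) uses 0-based, half-open coordinates that work directly as Python slice bounds. The helper below is a minimal sketch of that conversion for simple `start..end` and `complement(start..end)` locations only; it does not handle joins or fuzzy positions, and it is not how Biopython parses locations internally. The second example input is reconstructed from the printed `[3934:3952](-)` location rather than copied from the file.

```python
import re

# Simple GenBank locations: "531..1685" or "complement(3935..3952)"
_PLAIN = re.compile(r'^(\d+)\.\.(\d+)$')
_COMPLEMENT = re.compile(r'^complement\((\d+)\.\.(\d+)\)$')


def genbank_to_python(location: str) -> tuple[int, int, int]:
    """Convert a simple GenBank location (1-based, inclusive) into a
    0-based, half-open (start, end, strand) tuple, as used by Biopython."""
    text = location.strip()
    for pattern, strand in ((_PLAIN, 1), (_COMPLEMENT, -1)):
        match = pattern.match(text)
        if match:
            start, end = (int(group) for group in match.groups())
            # Only the start shifts: the inclusive 1-based end is already
            # the exclusive 0-based end
            return start - 1, end, strand
    raise ValueError(f'Unsupported location: {location!r}')


print(genbank_to_python('531..1685'))               # (530, 1685, 1), the promoter
print(genbank_to_python('complement(3935..3952)'))  # (3934, 3952, -1), M13 Forward
```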

### Getting the locus sequence

This assumes you know the coordinates of the locus. You can find them using the OpenCloning website, or automate this with the [NCBI datasets API](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/api/rest-api/).

In [19]:
from opencloning.endpoints.external_import import genome_coordinates
from opencloning.pydantic_models import GenomeCoordinatesSource

locus_source = GenomeCoordinatesSource(
    id=0,
    type="GenomeCoordinatesSource",
    assembly_accession="GCF_000146045.2",
    sequence_accession="NC_001147.6",
    locus_tag="YOR058C",
    gene_id=854223,
    start=432688,
    end=437345,
    strand=-1,
)

resp = await genome_coordinates(locus_source)

locus_seq = resp['sequences'][0]
locus_source = resp['sources'][0]

# Let's pretty-print what we have
print(locus_seq.model_dump_json(indent=2))
print(locus_source.model_dump_json(indent=2))


{
  "id": 0,
  "type": "TextFileSequence",
  "sequence_file_format": "genbank",
  "overhang_crick_3prime": 0,
  "overhang_watson_3prime": 0,
  "file_content": "LOCUS       NC_001147               4658 bp    DNA     linear   CON 05-MAR-2025\nDEFINITION  Saccharomyces cerevisiae S288C chromosome XV, complete sequence.\nACCESSION   NC_001147\nVERSION     NC_001147.6\nDBLINK      BioProject: PRJNA128\n            Assembly: GCF_000146045.2\nKEYWORDS    RefSeq.\nSOURCE      Saccharomyces cerevisiae S288C\n  ORGANISM  Saccharomyces cerevisiae S288C\n            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;\n            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;\n            Saccharomyces.\nREFERENCE   1  (bases 1 to 4658)\n  AUTHORS   Engel,S.R., Wong,E.D., Nash,R.S., Aleksander,S., Alexander,M.,\n            Douglass,E., Karra,K., Miyasato,S.R., Simison,M., Skrzypek,M.S.,\n            Weng,S. and Cherry,J.M.\n  TITLE     New data and collaborations at the Saccharomyce
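
The `start` and `end` values appear to be 1-based and inclusive, like GenBank coordinates: the returned record is 4658 bp, which is exactly `437345 - 432688 + 1`. With `strand=-1`, the returned sequence is presumably the reverse complement of that chromosome region, which puts `YOR058C` (the "C" marks a gene on the Crick strand) in the forward orientation. The snippet below is a small standard-library sketch of that arithmetic and of a reverse complement; `extract` is a hypothetical helper and the toy chromosome string is invented for illustration.

```python
# Coordinates from the GenomeCoordinatesSource above (assumed 1-based, inclusive)
START, END, STRAND = 432688, 437345, -1

_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(chromosome: str, start: int, end: int, strand: int) -> str:
    """Hypothetical helper: slice a 1-based inclusive region from a chromosome
    string and reverse-complement it when strand == -1."""
    region = chromosome[start - 1:end]
    return reverse_complement(region) if strand == -1 else region


print(region_length(START, END))  # 4658, the length in the LOCUS line

toy_chromosome = 'AAACCCGGGTTT'  # made-up sequence for illustration
print(extract(toy_chromosome, 1, 4, 1))   # AAAC
print(extract(toy_chromosome, 1, 4, -1))  # GTTT
```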

### Loading a file

In [22]:
from fastapi import UploadFile, Response
from opencloning.endpoints.external_import import read_from_file

with open('pFA6a-hphMX6.gb', 'rb') as f:
    dummy_resp = Response()
    # The default parameter values of read_from_file are FastAPI Query objects, which are not
    # resolved when the function is called directly, so every argument is passed explicitly.
    resp = await read_from_file(dummy_resp, UploadFile(file=f, filename='pFA6a-hphMX6.gb'), None, None, True, None)

file_source = resp['sources'][0]
file_seq = resp['sequences'][0]

# Let's pretty-print what we have
print(file_seq.model_dump_json(indent=2))
print(file_source.model_dump_json(indent=2))

{
  "id": 0,
  "type": "TextFileSequence",
  "sequence_file_format": "genbank",
  "overhang_crick_3prime": 0,
  "overhang_watson_3prime": 0,
  "file_content": "LOCUS       pFA6a-hphMX6            4157 bp    DNA     circular SYN 24-DEC-2013\nDEFINITION  Plasmid for yeast gene deletion using the hphMX6 selectable marker\n            conferring hygromycin resistance.\nACCESSION   .\nVERSION     .\nKEYWORDS    pFA6a-hphMX6.\nSOURCE      synthetic DNA construct\n  ORGANISM  synthetic DNA construct\n            .\nREFERENCE   1  (bases 1 to 4157)\n  AUTHORS   Hentges P, Van Driessche B, Tafforeau L, Vandenhaute J, Carr AM.\n  TITLE     Three novel antibiotic marker cassettes for gene disruption and\n            marker switching in Schizosaccharomyces pombe.\n  JOURNAL   Yeast 2005;22:1013-9.\n   PUBMED   16200533\nREFERENCE   2  (bases 1 to 4157)\n  AUTHORS   EUROSCARF\n  TITLE     Direct Submission\n  JOURNAL   Exported Thursday, Feb 4, 2021 from SnapGene 5.2.4\n            https://www.snap
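
Since `file_content` is plain GenBank text, you can also glance at a `TextFileSequence` without converting it to a `Dseqrecord`, for example by reading the name, length and topology from its `LOCUS` line. This is a rough sketch that assumes a standard, whitespace-separated `LOCUS` line and treats anything not marked `circular` as linear; `locus_summary` is a hypothetical helper, and `read_dsrecord_from_json` (or Biopython) remains the robust option for real parsing.

```python
import re

# Name, length in bp, and the remainder of a GenBank LOCUS line
_LOCUS = re.compile(r'^LOCUS\s+(\S+)\s+(\d+)\s+bp\b(.*)$')


def locus_summary(file_content: str) -> dict:
    """Return name, length and topology from the first line of GenBank text."""
    first_line = file_content.splitlines()[0]
    match = _LOCUS.match(first_line)
    if match is None:
        raise ValueError('No LOCUS line found')
    name, length, rest = match.groups()
    # This sketch treats anything not explicitly circular as linear
    topology = 'circular' if 'circular' in rest.split() else 'linear'
    return {'name': name, 'length': int(length), 'topology': topology}


example = (
    'LOCUS       pFA6a-hphMX6            4157 bp    DNA     circular SYN 24-DEC-2013\n'
    'DEFINITION  Plasmid for yeast gene deletion'
)
print(locus_summary(example))
# {'name': 'pFA6a-hphMX6', 'length': 4157, 'topology': 'circular'}
```

In the notebook, you could call `locus_summary(file_seq.file_content)` directly on the object printed above.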

## Writing a cloning strategy json file

Now that we have these three sequences, we can write a cloning strategy json file.

In [24]:
from opencloning.pydantic_models import BaseCloningStrategy

# We create an empty cloning strategy
cloning_strategy = BaseCloningStrategy(
    sequences=[],
    sources=[],
    primers=[],
    description='My example cloning strategy',
)

# Then add the sequences and sources that produce them
cloning_strategy.add_source_and_sequence(file_source, file_seq)
cloning_strategy.add_source_and_sequence(locus_source, locus_seq)
cloning_strategy.add_source_and_sequence(addgene_source, plasmid_seq)

# Notice that now, unique ids have been assigned to the sources and sequences
print(file_source.id)
print(file_seq.id)
print(locus_source.id)
print(locus_seq.id)
print(addgene_source.id)
print(plasmid_seq.id)


1
2
3
4
5
6
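
Judging from the output, ids are handed out in the order in which sources and sequences are added, starting from 1, with each source numbered before the sequence it produces. The sketch below mimics only that observable behaviour with hypothetical `Item` and `MiniStrategy` classes; it is not OpenCloning's implementation, and the real `add_source_and_sequence` may do more (for example, validating ids or linking sources to their inputs).

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a source or sequence; id 0 means unassigned."""
    name: str
    id: int = 0


@dataclass
class MiniStrategy:
    """Hypothetical container that mimics the id assignment seen above."""
    sources: list = field(default_factory=list)
    sequences: list = field(default_factory=list)
    next_id: int = 1

    def _new_id(self) -> int:
        new_id = self.next_id
        self.next_id += 1
        return new_id

    def add_source_and_sequence(self, source: Item, sequence: Item) -> None:
        # Ids are assigned in place: first the source, then its sequence
        source.id = self._new_id()
        sequence.id = self._new_id()
        self.sources.append(source)
        self.sequences.append(sequence)


strategy = MiniStrategy()
pairs = [
    (Item('file_source'), Item('file_seq')),
    (Item('locus_source'), Item('locus_seq')),
    (Item('addgene_source'), Item('plasmid_seq')),
]
for src, seq in pairs:
    strategy.add_source_and_sequence(src, seq)

ids = [item.id for pair in pairs for item in pair]
print(ids)  # [1, 2, 3, 4, 5, 6]
```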


We can now export the cloning strategy to a json file:

In [25]:
with open('first_strategy.json', 'w') as f:
    f.write(cloning_strategy.model_dump_json(indent=2))


If you load it into the OpenCloning website, you should see something like this:

<img width="400px" src="first_strategy.png" alt="OpenCloning website screenshot">
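
Before loading the file in the website, you may want to sanity-check it from Python. Because it is plain json, the standard `json` module is enough. The sketch below only relies on the `sequences`, `sources` and `primers` lists and the `description` field used to build the strategy above; the exported file may contain other keys as well. `summarize_strategy` is a hypothetical helper and the inline string is a made-up minimal example; with the real file you would pass it `open('first_strategy.json').read()`.

```python
import json


def summarize_strategy(text: str) -> dict:
    """Count the main entries in a cloning strategy json string."""
    data = json.loads(text)
    return {
        'description': data.get('description'),
        'n_sequences': len(data.get('sequences', [])),
        'n_sources': len(data.get('sources', [])),
        'n_primers': len(data.get('primers', [])),
    }


# Made-up minimal strategy, only for illustration
example = json.dumps({
    'sequences': [{'id': 2}, {'id': 4}, {'id': 6}],
    'sources': [{'id': 1}, {'id': 3}, {'id': 5}],
    'primers': [],
    'description': 'My example cloning strategy',
})
print(summarize_strategy(example))
```

For the strategy built above, you would expect three sequences, three sources and no primers.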


## Doing PCR and Gibson Assembly

The principle is the same as before, but now we use sources that take sequences and primers as inputs.
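
Gibson Assembly joins fragments whose ends share homology: the end of one fragment must match the start of the next. As a rough illustration of that requirement (not of how OpenCloning computes assemblies), the standard-library sketch below finds the longest terminal overlap between two fragments given as top-strand strings. The hypothetical `terminal_overlap` helper, its `min_overlap` threshold of 15 bp and the toy sequences are arbitrary assumptions.

```python
def terminal_overlap(left: str, right: str, min_overlap: int = 15) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`,
    or 0 if no such overlap of at least `min_overlap` bases exists."""
    if min_overlap < 1:
        raise ValueError('min_overlap must be at least 1')
    left, right = left.upper(), right.upper()
    longest = min(len(left), len(right))
    # Try the longest possible overlap first and stop at the first match
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


homology = 'GATTACAGATTACAGA'      # 16 bp shared end (made up)
fragment_a = 'ATGCCC' + homology   # ends with the homology arm
fragment_b = homology + 'TTTGGG'   # starts with the same arm

print(terminal_overlap(fragment_a, fragment_b))      # 16
print(terminal_overlap(fragment_a, 'TTTGGG' * 4))    # 0, no shared end
```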