# Cookbook for pydna

Björn Johansson
CBMA
University of Minho
Braga
Portugal

<div>
<img src="logo.png" width="15%"/>
</div>

<a target="_blank" href="https://colab.research.google.com/github/pydna-group/pydna/blob/master/docs/cookbook/cookbook.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## What is pydna?

Pydna is a Python package that provides functions and data types for working with double-stranded DNA. It depends mainly on Biopython (a Python bioinformatics package) and networkx (a graph theory package).

## What does Pydna provide?

Pydna provides classes and functions for molecular biology in Python. Notably, it supports PCR, cut-and-paste cloning (sub-cloning) and homologous recombination between linear DNA fragments.

Most functionality is implemented as methods of the double-stranded DNA sequence record classes "Dseq" and "Dseqrecord", which are subclasses of the Biopython Seq and SeqRecord classes, respectively.

Pydna was designed to semantically imitate how sub-cloning experiments are typically documented in scientific literature. One use case for pydna is to create executable documentation for a sub-cloning experiment. 

The advantage of documenting with pydna is that the code unambiguously describes the experiment and can be executed to yield the sequence of the resulting DNA molecule(s) and of all intermediate steps. Pydna code describing a sub-cloning experiment is reasonably compact and is meant to be easy to read.

Look [here](https://github.com/MetabolicEngineeringGroupCBMA/pydna-examples?tab=readme-ov-file#pydna-examples) for examples.

### Example 1: Sub cloning by restriction digestion and ligation

The construction of the vector YEp24PGK_XK is described on page 4250 in the publication below:

[Johansson et al., “Xylulokinase Overexpression in Two Strains of Saccharomyces cerevisiae Also Expressing Xylose Reductase and Xylitol Dehydrogenase and Its Effect on Fermentation of Xylose and Lignocellulosic Hydrolysate” Applied and Environmental Microb](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC93154/)

Briefly, the XKS1 gene from _Saccharomyces cerevisiae_ was amplified by PCR using two primers called primer1 and primer3.

The primers add restriction sites for BamHI to the ends of the XKS1 gene.

The gene is digested with BamHI and ligated to the YEp24PGK plasmid, which has previously been digested with BglII. BglII cuts the plasmid in a single location. BamHI and BglII leave compatible cohesive ends, so fragments cut with either enzyme can be ligated together.
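
The compatibility of the two enzymes can be checked with a few lines of plain Python. The sketch below is not pydna code: it hard-codes the two recognition sites (BamHI cuts G^GATCC, BglII cuts A^GATCT) and derives the 5' overhang each enzyme leaves. The helper name `five_prime_overhang` is made up for this illustration.

```python
# Recognition site and top-strand cut offset for each enzyme.
# BamHI cuts G^GATCC and BglII cuts A^GATCT.
SITES = {"BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1)}


def five_prime_overhang(site: str, cut: int) -> str:
    """Return the 5' overhang left by a palindromic site cut `cut` bases in.

    The bottom strand is cut at the mirrored position, so the bases
    between the two cuts stay single stranded.
    """
    return site[cut:len(site) - cut]


overhangs = {name: five_prime_overhang(site, cut) for name, (site, cut) in SITES.items()}
print(overhangs)

# Ligating a BamHI end to a BglII end gives a hybrid junction that
# matches neither recognition site, so neither enzyme can re-cut it.
bam, bgl = SITES["BamHI"][0], SITES["BglII"][0]
hybrid_junctions = {bgl[:1] + bam[1:], bam[:1] + bgl[1:]}
print(hybrid_junctions)
```

Both enzymes leave the same GATC overhang, which is why the BamHI-cut insert can be ligated into the BglII-cut vector.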

Fig. 1 shows an image outlining the strategy. BamHI is the blue enzyme and BglII is shown in yellow.

<div>
<img src="figure1.png" width="50%"/>
</div>

In [None]:
%%capture
# Install pydna for Colab.
!pip install pydna



In [6]:
from pydna.genbank import Genbank

In [7]:
gb = Genbank("myemail@mydomain.com")

In [8]:
YEp24PGK = gb.nucleotide("KC562906.1")

The representation of the YEp24PGK object includes a link to the record on GenBank.

In [9]:
YEp24PGK

In [10]:
YEp24PGK.seq

Dseq(o9637)
GAAT..TCAA
CTTA..AGTT

In [11]:
from pydna.parsers import parse_primers

In [12]:
p1, p3 = parse_primers(
    '''
>primer1
GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG
>primer3
AGATCTGGATCCTTAGATGAGAGTCTTTTCCAG'''
)

In [13]:
XKS1 = gb.nucleotide("Z72979.1").rc()

In [14]:
XKS1

In [15]:
XKS1.seq

Dseq(-3140)
ATGA..AAAA
TACT..TTTT

In [16]:
from pydna.amplify import pcr

In [17]:
PCR_prod = pcr(p1, p3, XKS1)

Primer1 and primer3 add BamHI restriction sites to the ends of the PCR product. The short stuffer fragments outside these sites are removed by the digestion.
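
As a quick check that does not depend on pydna, the BamHI site (GGATCC) can be located in each primer with plain string search. This is only a sketch; the variable names are made up for the illustration.

```python
primers = {
    "primer1": "GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG",
    "primer3": "AGATCTGGATCCTTAGATGAGAGTCTTTTCCAG",
}
BAMHI_SITE = "GGATCC"  # BamHI cuts after the first G (G^GATCC)

# Index of the BamHI site in each primer (-1 would mean "not found").
site_positions = {name: seq.find(BAMHI_SITE) for name, seq in primers.items()}

for name, seq in primers.items():
    pos = site_positions[name]
    # Bases 5' of the cut are lost with the stuffer after digestion.
    print(name, "site at index", pos, "| 5' of cut:", seq[: pos + 1])
```

Each primer carries one BamHI site near its 5' end, so only a few bases are lost on either side of the gene.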

In [18]:
PCR_prod.figure()

                    5TGTTCAGTAATTCAG...CTGGAAAAGACTCTCATCTAA3
                                       |||||||||||||||||||||
                                      3GACCTTTTCTGAGAGTAGATTCCTAGGTCTAGA5
5GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG3
                     |||||||||||||||
                    3ACAAGTCATTAAGTC...GACCTTTTCTGAGAGTAGATT5
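
The annealing shown in the figure can be imitated in plain Python. The sketch below is a simplified stand-in and not how `pcr` is implemented: it takes the longest 3' end of each primer that matches the template exactly. The template is a hypothetical toy sequence built from the two annealing regions in the figure and a made-up filler, and the helper names `revcomp` and `footprint` are invented here.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def footprint(primer: str, template: str) -> str:
    """Longest 3' end of `primer` that occurs in `template`."""
    for start in range(len(primer)):
        if primer[start:] in template:
            return primer[start:]
    return ""


primer1 = "GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG"
primer3 = "AGATCTGGATCCTTAGATGAGAGTCTTTTCCAG"

# Hypothetical toy template: the two annealing regions from the figure
# joined by a made-up 12 bp filler. It is NOT the XKS1 sequence.
template = "TGTTCAGTAATTCAG" + "ACGTACGTTAGC" + "CTGGAAAAGACTCTCATCTAA"

fwd = footprint(primer1, template)           # anneals to the bottom strand
rev = footprint(primer3, revcomp(template))  # anneals to the top strand

# The product is primer1, the template between the two footprints,
# and the reverse complement of primer3.
start = template.find(fwd) + len(fwd)
end = template.find(revcomp(rev))
product = primer1 + template[start:end] + revcomp(primer3)
print(len(fwd), len(rev), len(product))
```

With this toy template the footprints are 15 and 21 nucleotides long, the same stretches that are paired in the figure, and the primer tails end up at both ends of the product.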

In [19]:
from Bio.Restriction import BamHI, BglII

In [20]:
stuffer1, insert, stuffer2 = PCR_prod.cut(BamHI)

In [21]:
stuffer1, insert, stuffer2

(Dseqrecord(-7), Dseqrecord(-1819), Dseqrecord(-11))

In [22]:
insert.seq

Dseq(-1819)
GATCCTCT..TAAG    
    GAGA..ATTCCTAG
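
The fragment sizes can be reasoned about without pydna. The sketch below cuts a hypothetical stand-in for the PCR product (the two primers around a made-up filler, not the real gene) at every BamHI site and reports fragment lengths with the single-stranded overhangs counted. The function name `digest_lengths` is invented for this illustration.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


primer1 = "GCGGATCCTCTAGAATGGTTTGTTCAGTAATTCAG"
primer3 = "AGATCTGGATCCTTAGATGAGAGTCTTTTCCAG"

# Hypothetical stand-in for the PCR product: the primers flank a
# made-up 12 bp filler instead of the XKS1 gene.
toy_product = primer1 + "ACGTACGTTAGC" + revcomp(primer3)


def digest_lengths(seq, site="GGATCC", cut=1):
    """Fragment lengths after cutting linear `seq`, overhangs included."""
    overhang = len(site) - 2 * cut  # 4 nt for BamHI (G^GATCC)
    positions = [i for i in range(len(seq)) if seq.startswith(site, i)]
    # Each fragment starts at a top-strand cut and extends to the
    # bottom-strand cut of the next site.
    starts = [0] + [p + cut for p in positions]
    ends = [p + cut + overhang for p in positions] + [len(seq)]
    return [end - start for start, end in zip(starts, ends)]


lengths = digest_lengths(toy_product)
print(lengths)
```

The two outer fragments come out at 7 and 11, the same sizes as `stuffer1` and `stuffer2` above. The middle fragment is short only because the toy filler replaces the gene.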

In [23]:
YEp24PGK_BglII = YEp24PGK.linearize(BglII)

In [24]:
YEp24PGK_BglII.seq

Dseq(-9641)
GATCTCCC..AAAA    
    AGGG..TTTTCTAG

In [25]:
YEp24PGK_XK = (YEp24PGK_BglII + insert).looped()

In [26]:
YEp24PGK_XK = YEp24PGK_XK.synced(YEp24PGK)

In [27]:
YEp24PGK_XK.seguid()

In [None]:
YEp24PGK_XK.write("YEp24PGK_XK.gb")

### Example 2: Sub cloning by homologous recombination

The construction of the vector pGUP1 is described in the publication:

[Régine Bosson, Malika Jaquenoud, and Andreas Conzelmann, “GUP1 of Saccharomyces cerevisiae Encodes an O-acyltransferase Involved in Remodeling of the GPI Anchor,” Molecular Biology of the Cell 17, no. 6 (June 2006): 2636–2645.](https://www.molbiolcell.org/doi/10.1091/mbc.e06-02-0104)

Our objective is to replicate the cloning steps with pydna in order to obtain the final sequence of the plasmid.

The cloning is described in the upper left part of page 2637 of the paper:

"The expression vectors harboring GUP1 or GUP1H447A were obtained as follows: the open reading frame of GUP1 was amplified by PCR using plasmid pBH2178 (kind gift from Morten Kielland-Brandt) as a template and using primers  and, underlined sequences being homologous to the target vector pGREG505 (Jansen et al., 2005). The PCR fragment was purified by a PCR purification kit (QIAGEN, Chatsworth, CA) and introduced into pGREG505 by co transfection into yeast cells thus generating pGUP1 (Jansen et al., 2005)."


<div>
<img src="figure2.png" width="50%"/>
</div>


Briefly, the GUP1 gene was amplified from _Saccharomyces cerevisiae_ chromosomal DNA using the two primers GUP1rec1sens and GUP1rec2AS:

    >GUP1rec1sens 
    gaattcgatatcaagcttatcgataccgatgtcgctgatcagcatcctgtc

    >GUP1rec2AS
    gacataactaattacatgactcgaggtcgactcagcattttaggtaaattccg

Then the vector pGREG505 was digested with the restriction enzyme SalI. This is not mentioned in Bosson et al., but they refer to Jansen et al. 2005:

Jansen G, Wu C, Schade B, Thomas DY, Whiteway M. 2005. Drag&Drop cloning in yeast. Gene, 344: 43–51. 

Jansen et al. describe the pGREG505 vector and state that it is digested with SalI before cloning. SalI cuts the vector in two places, so a fragment containing the HIS3 gene is removed.

The SalI sites are visible in the plasmid drawing in Fig. 3.

<div>
<img src="pGREG505.png" width="30%"/>
</div>
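
Two cuts in a circular molecule release two linear fragments, here the vector backbone and the HIS3 fragment. The sketch below illustrates this with a hypothetical 60 bp circular sequence carrying two SalI sites (SalI cuts G^TCGAC). The sequence and the function name `circular_digest_lengths` are made up, and the lengths are top-strand lengths that ignore the overhangs.

```python
SALI_SITE, CUT = "GTCGAC", 1  # SalI cuts G^TCGAC

# Hypothetical 60 bp circular plasmid with two SalI sites (made-up sequence).
plasmid = "GTCGAC" + "AT" * 7 + "GTCGAC" + "CA" * 17


def circular_digest_lengths(seq, site=SALI_SITE, cut=CUT):
    """Top-strand lengths of the fragments from cutting a circular sequence."""
    n = len(seq)
    doubled = seq + seq[: len(site) - 1]  # lets a site span the origin
    cuts = [i + cut for i in range(n) if doubled.startswith(site, i)]
    if not cuts:
        return []  # the circle stays uncut
    # Distance from each cut to the next one, wrapping around the circle.
    # A single cut gives one fragment of full length n.
    return [(cuts[(k + 1) % len(cuts)] - c) % n or n for k, c in enumerate(cuts)]


fragments = circular_digest_lengths(plasmid)
print(fragments)
```

The two fragment lengths add up to the size of the circle, just as the backbone and the HIS3 fragment together account for the whole plasmid.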

In [None]:
GUP1rec1sens, GUP1rec2AS = parse_primers(
    '''
>GUP1rec1sens
gaattcgatatcaagcttatcgataccgatgtcgctgatcagcatcctgtc
>GUP1rec2AS
gacataactaattacatgactcgaggtcgactcagcattttaggtaaattccg
'''
)

In [None]:
GUP1_locus = gb.nucleotide("Z72606")

In [None]:
insert = pcr(GUP1rec1sens, GUP1rec2AS, GUP1_locus)

In [None]:
insert.figure()

                               5tcagcattttaggtaaattccg...gacaggatgctgatcagcgacat3
                                                         |||||||||||||||||||||||
                                                        3ctgtcctacgactagtcgctgtagccatagctattcgaactatagcttaag5
5gacataactaattacatgactcgaggtcgactcagcattttaggtaaattccg3
                                ||||||||||||||||||||||
                               3agtcgtaaaatccatttaaggc...ctgtcctacgactagtcgctgta5

In [None]:
from pydna.readers import read

In [None]:
pGREG505 = read("pGREG505.gb")

In [None]:
pGREG505

In [None]:
from Bio.Restriction import SalI

In [None]:
his3_stuffer, lin_vect = pGREG505.cut(SalI)

In [None]:
lin_vect, his3_stuffer

(Dseqrecord(-8301), Dseqrecord(-1172))

In [None]:
from pydna.assembly import Assembly

The Assembly class implements homologous recombination and makes use of the [NetworkX](https://networkx.org) package to find all recombination products.
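
The idea of joining fragments through shared sequence can be sketched in plain Python. The example below is a deliberately simplified stand-in, not the algorithm used by `Assembly`: it only looks for a shared stretch at the very end of one fragment and the very start of the next, and it ignores the graph search. The function name `terminal_overlap` and the toy fragments are hypothetical. The default of 25 mirrors the `limit(bp)` value in the Assembly summary.

```python
def terminal_overlap(left: str, right: str, limit: int = 25) -> str:
    """Longest suffix of `left` that is also a prefix of `right`.

    Returns "" when the shared stretch is shorter than `limit`.
    """
    for size in range(min(len(left), len(right)), limit - 1, -1):
        if left.endswith(right[:size]):
            return right[:size]
    return ""


# 5' tail of GUP1rec1sens, i.e. the part that is not paired in the figure.
tail = "gaattcgatatcaagcttatcgataccg"

# Hypothetical toy fragments: made-up bases placed around the shared tail.
vector_end = "ccatggtacc" + tail
insert_start = tail + "atgtcgctgatcagcatcctgtc"

shared = terminal_overlap(vector_end, insert_start)
print(len(shared), shared)

# A shared stretch below the limit is not reported.
too_short = terminal_overlap("aaaa" + tail[:10], tail[:10] + "cc")
```

The 28 bp primer tail is long enough to pass the 25 bp limit, which is why the PCR product can recombine with the linearized vector.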

In [None]:
asm = Assembly((lin_vect, insert))

In [None]:
asm

Assembly
fragments..: 8301bp 1742bp
limit(bp)..: 25
G.nodes....: 4
algorithm..: common_sub_strings

In [None]:
candidates = asm.assemble_circular()

The two candidates are equivalent: both describe the same circular molecule.

In [None]:
candidate1, candidate2 = candidates

In [None]:
candidate1.cseguid() == candidate2.cseguid()

True
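
Why two differently written circular sequences can share one checksum is shown in the sketch below. It picks a canonical form (the alphabetically smallest rotation over both strands) and hashes it. This only illustrates the idea behind a circular checksum: the function names are made up, and the values are not claimed to match those printed by pydna.

```python
import base64
import hashlib


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_circular(seq: str) -> str:
    """Smallest rotation over both strands of a circular sequence."""
    seq = seq.upper()
    rotations = []
    for strand in (seq, revcomp(seq)):
        rotations.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    return min(rotations)


def circular_checksum(seq: str) -> str:
    """URL-safe base64 SHA-1 of the canonical form (illustrative only)."""
    digest = hashlib.sha1(canonical_circular(seq).encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical 12 bp circle written in three equivalent ways.
original = "ATGCCGTTAGCA"
rotated = original[5:] + original[:5]  # different starting point
flipped = revcomp(original)            # other strand

checksums = {circular_checksum(s) for s in (original, rotated, flipped)}
print(checksums)
```

All three spellings collapse to a single checksum, so comparing checksums is a convenient way to test whether two circular records describe the same molecule.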

In [None]:
pGUP1 = candidate1

In [None]:
pGUP1 = pGUP1.synced(pGREG505)

In [None]:
pGUP1.cseguid()

'0R8hr15t-psjHVuuTj_JufGxOPg'

In [None]:
pGUP1.write("pGUP1.gb")