# Analysis of context effects on synthetic gene expression levels

In this example we study the effects of compositional and cellular context on gene expression using triple reporter plasmids. See the paper (https://www.biorxiv.org/content/10.1101/590299v1) for details of plasmid composition. In summary, each plasmid contains three transcription units (TUs) producing RFP, YFP and CFP. The CFP TU is the same in all plasmids, while the promoters of the RFP and YFP TUs vary, generating 14 different combinations, or contexts, with a common reference gene.

First, let's import the packages we need, including the Flapjack API, and set some parameters for plotting with matplotlib:

In [None]:
import flapjack
from flapjack import Flapjack
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import plotly
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.io as io
import json
import pandas as pd
import seaborn as sns
import getpass
%matplotlib inline

SMALL_SIZE = 6
MEDIUM_SIZE = 10
BIGGER_SIZE = 12

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=SMALL_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=SMALL_SIZE)  # fontsize of the figure title

io.orca.shutdown_server()

We log in to the API:

In [None]:
user = input()
passwd = getpass.getpass()
#fj = Flapjack('flapjack.rudge-lab.org:8000')
fj = Flapjack('rudge-lab.org:8000')
fj.log_in(username=user, password=passwd)

## Figure 5B - Plotting the data

As well as using the Flapjack webapp (https://github.com/SynBioUC/flapjack_frontend/wiki/Context-effects-on-gene-expression-levels), you can reproduce Figure 5B using the Flapjack Python package. Filter the data to select:
* study: "Context effects",
* vector (plasmid): "pAAA",
* strain: "MG1655z1",
* media: "M9-glucose"

To compare measurements we group the data by "Vector" (subplots) and "Signal" (lines). To compare signals of very different magnitudes we normalize, here by the min/max of the measurements for each sample:

* normalize='Min/Max',
* subplots='Vector',
* markers='Signal',
* plot='Mean'

In [None]:
# Get objects ids
study_id = fj.get('study', name='Context effects').id
vector_id = fj.get('vector', name='pAAA').id
strain_id = fj.get('strain', name='MG1655z1').id
media_id = fj.get('media', name='M9-glucose').id

# Query and plot data using Python package
fig = fj.plot(study=study_id,
              vector=vector_id,
              strain=strain_id,
              media=media_id,
              normalize='Min/Max',
              subplots='Vector',
              markers='Signal',
              plot='Mean')

fig
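
To make the normalization step concrete, here is a minimal sketch of min/max scaling on two toy signals. The function name `min_max_normalize` and the numbers are hypothetical, and the exact rescaling that Flapjack applies internally for `normalize='Min/Max'` is an assumption here, not something taken from its source.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale a 1-D series to the range [0, 1] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant series carries no dynamics; map it to zeros
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Two toy signals with very different magnitudes
rfp = min_max_normalize([100.0, 150.0, 300.0])
cfp = min_max_normalize([1000.0, 3000.0, 5000.0])
print(rfp, cfp)
```

After scaling, both series span the same range, so their shapes can be compared on a shared axis.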

This plot looks nice in the web interface, but for publications or reports we can format it better using Plotly. Here we format the figure to be half the width of a 1-column figure (1.65 inches wide) with a 6pt font.

In [None]:
# Modify width and size
fig = flapjack.layout_print(fig, width=1.65, height=1.1, font_size=6)
fig

In [None]:
fname = 'Figure5B_raw.png'
io.write_image(fig, fname)

## Figure 5B - Plotting the expression rate of pAAA
To analyze the behaviour of the TUs in more detail we can compute the expression rate (or synthesis rate) of the reporters using the direct method (Zulkower et al., 2015). To do this, we pass additional parameters to the method used above (fj.plot()). Note that the code below also switches the normalization to 'Mean/std'.

* type='Expression Rate (direct)',
* degr=0,
* eps_L=1e-7,
* biomass_signal=biomass_signal_id

In [None]:
# Get OD id
biomass_signal_id = fj.get('signal', name='OD').id[0]

# Query and plot data using Python package
fig = fj.plot(study=study_id,
              vector=vector_id,
              strain=strain_id,
              media=media_id,              
              type='Expression Rate (direct)',
              degr=0,
              eps_L=1e-7,
              biomass_signal=biomass_signal_id,
              normalize='Mean/std',
              subplots='Vector',
              markers='Signal',
              plot='Mean')

fig
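
As a rough illustration of what an expression rate is, the sketch below estimates a synthesis rate per unit biomass by finite differences: the time derivative of fluorescence, plus a degradation term, divided by the biomass signal. This is a simplified stand-in under stated assumptions, not Flapjack's implementation of 'Expression Rate (direct)'. The function name and the toy data are hypothetical, and the `eps_L` parameter is deliberately not modelled.

```python
import numpy as np

def direct_expression_rate(t, fluorescence, biomass, degr=0.0):
    """Finite-difference estimate of synthesis rate per unit biomass.

    Assumes dF/dt = rate * biomass - degr * F, so that
    rate = (dF/dt + degr * F) / biomass.
    """
    t = np.asarray(t, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    biomass = np.asarray(biomass, dtype=float)
    dfdt = np.gradient(fluorescence, t)
    return (dfdt + degr * fluorescence) / biomass

# Toy data: fluorescence rising linearly at constant biomass
t = np.linspace(0.0, 10.0, 11)
od = np.full_like(t, 0.5)
fluo = 3.0 * t
rate = direct_expression_rate(t, fluo, od)
print(rate)
```

With no degradation (`degr=0`, as in the call above) the estimate reduces to the slope of the fluorescence curve divided by the biomass.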

In [None]:
# Modify width and size
fig = flapjack.layout_print(fig, width=1.65, height=1.1, font_size=6)
fig

You can also save the figure as a PNG using the method below (or hover over the figure and click the camera icon, "Download plot as png"):

In [None]:
fname = 'Figure5B_Expression_rate.png'
io.write_image(fig, fname)

## Figure 5C - Summarizing dynamics with mean expression rates
A first approach to summarizing the overall dynamics of a genetic circuit is to take the mean level of expression, as approximated by the signal detected in the assay. This allows us to compare the average rates of gene expression.

In [None]:
od = fj.get('signal', name='OD')
cfp = fj.get('signal', name='CFP')
study = fj.get('study', name=['Context effects'])
exp = fj.analysis(study=study.id,
                    type='Mean Expression',
                    biomass_signal=od.id[0]
                      )

In [None]:
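# Normalize RFP and YFP mean expression by the CFP reference in each sample
normalized = []
for samp, data in exp.groupby('Sample'):
    # Work on a copy so the assignments do not touch a view of `exp`
    data = data.copy()
    yfp = data[data.Signal=='YFP']['Expression'].values
    rfp = data[data.Signal=='RFP']['Expression'].values
    cfp = data[data.Signal=='CFP']['Expression'].values
    data.loc[data.Signal=='YFP', ['Expression']] = yfp/cfp
    data.loc[data.Signal=='RFP', ['Expression']] = rfp/cfp
    normalized.append(data)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
nexp = pd.concat(normalized)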
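
The per-sample ratio to the CFP reference can also be written without a loop. The sketch below uses a toy dataframe with the same column names (`Sample`, `Signal`, `Expression`) and assumes one row per sample and signal. The values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the analysis output: one row per (Sample, Signal)
toy = pd.DataFrame({
    'Sample': [1, 1, 1, 2, 2, 2],
    'Signal': ['RFP', 'YFP', 'CFP'] * 2,
    'Expression': [2.0, 4.0, 2.0, 9.0, 3.0, 3.0],
})

# One row per sample, one column per signal
wide = toy.pivot(index='Sample', columns='Signal', values='Expression')

# Divide the RFP and YFP columns by the CFP reference, row by row
ratios = wide[['RFP', 'YFP']].div(wide['CFP'], axis=0)
print(ratios)
```

The wide layout makes the reference normalization a single division, at the cost of dropping the other metadata columns (Vector, Strain, Media), which the loop above keeps.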

Create a heatmap of gene expression in each condition by pivoting the dataframe:

In [None]:
fig,ax = plt.subplots(2,1, figsize=(3.5,2.25), sharex=True)
for i,name in enumerate(['RFP', 'YFP']):
    df_x = nexp[nexp['Signal']==name].copy()
    df_heatmap = df_x.pivot_table(values='Expression',
                                index=['Strain', 'Media'],
                                columns='Vector', aggfunc=np.mean)
    # pAAAF is used in Figure 6
    if 'pAAAF' in df_heatmap.columns:
        df_heatmap = df_heatmap.drop('pAAAF', axis=1)
    # Normalize to mean of columns
    df_heatmap = df_heatmap / df_heatmap.mean()
    # Normalize rows to mean
    #df_heatmap = df_heatmap.div( df_heatmap.mean(axis=1), axis=0 )
    # Take log of normalized values
    df_heatmap = df_heatmap.apply(np.log2)
    
    # Plot heatmap
    sns.heatmap(df_heatmap, annot=False, ax=ax[i], 
                square=True, 
                cmap='bwr', 
                center=0,
                #clim=[-1,1],
                vmin=-1, vmax=1, 
                facecolor='gray',
                linewidths=0.5, linecolor='black')
    # Format plot
    bottom, top = ax[i].get_ylim()
    ax[i].set_ylim(bottom + 0.5, top - 0.5)
    ax[i].set_title(name)
    ax[i].set_xlabel('')
    ax[i].set_ylabel('')
    #plt.tight_layout()
    plt.subplots_adjust(hspace=0.2)
    plt.xticks(rotation=90)
    plt.title(name)
plt.tight_layout()
plt.savefig('heatmap_rpus.png', dpi=300, bbox_inches='tight')

## Figure 5D - Using SynBioHub to compare compositional contexts

For each vector in the study we first get its DNA, then query SynBioHub to get the part composition and add this to our dataframe. We are interested in the identity of the RFP and YFP TUs, which are encoded as "engineered regions":

In [None]:
from sbol2 import *

# Some nicer names for display purposes
TU_names = {
    'TU1_1': 'A',
    'TU1_2': 'B',
    'TU1_5': 'E',
    'TU1_8': 'G',
    
    'TU2_1': 'A',
    'TU2_3': 'C',
    'TU2_5': 'E',
    'TU2_6': 'D',
    'TU2_7': 'F',
}

# The URI of "Engineered region" used to encode the TUs
TU_role = 'http://identifiers.org/so/SO:0000804'

df = nexp.copy()
vectors = df.Vector.unique()

synbiouc = PartShop('http://3.128.232.8:7777')

result = pd.DataFrame()
rows_to_add = []
for vector in vectors:
    vec = fj.get('vector', name=[vector])
    dna_id = vec.dnas[0]
    dna = fj.get('dna', id=[dna_id])
    sboluri = dna.sboluri[0]
    data = df[df.Vector==vec.name[0]]

    if sboluri!='':
        # Create a new SBOL document
        doc = Document()
        synbiouc.pull(sboluri, doc)
        plasmid = doc.componentDefinitions[sboluri]
        composition = plasmid.getPrimaryStructure()
        TUs = [component.displayId for component in composition \
                       if TU_role in component.roles]
        # The first TU is the RFP TU
        data = data.assign(rfp_tu=TU_names[TUs[0]])
        # The second TU is the YFP TU
        data = data.assign(yfp_tu=TU_names[TUs[1]])
        rows_to_add.append(data)
    else:
        print(f"Vector {vector} does not have SBOL URI")

# DataFrame.append was removed in pandas 2.0; concatenate the collected rows
df = pd.concat(rows_to_add)
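
The role-based filtering in the loop above can be tried offline with stand-in objects. In this sketch `Component` is a hypothetical namedtuple that only mimics the `displayId` and `roles` attributes used above, `OTHER_ROLE` is a placeholder URI, and the composition is invented for illustration. It does not call SynBioHub or pySBOL2.

```python
from collections import namedtuple

# Hypothetical stand-in for the components returned by getPrimaryStructure()
Component = namedtuple('Component', ['displayId', 'roles'])

TU_ROLE = 'http://identifiers.org/so/SO:0000804'  # "engineered region"
OTHER_ROLE = 'http://example.org/other-role'      # placeholder, not a real SO term

# Subset of the display names defined above
names = {'TU1_2': 'B', 'TU2_3': 'C'}

# Invented primary structure: two TUs followed by a non-TU part
composition = [
    Component('TU1_2', [TU_ROLE]),
    Component('TU2_3', [TU_ROLE]),
    Component('backbone_part', [OTHER_ROLE]),
]

# Keep only the engineered regions, preserving their order in the plasmid
tus = [c.displayId for c in composition if TU_ROLE in c.roles]
rfp_tu, yfp_tu = names[tus[0]], names[tus[1]]
print(rfp_tu, yfp_tu)
```

The assignment relies on the order of the primary structure: the first engineered region is taken as the RFP TU and the second as the YFP TU, as in the loop above.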

In [None]:
np.any(np.isnan(df.Expression.values))

Now we can make heatmaps to compare the mean expression rate ratio of each TU in its different compositional contexts. To do this we pivot the table to have the YFP TU name along the x-axis and the RFP TU name along the y-axis. In order to see the effect of context irrespective of the overall magnitude of expression, we normalize by the mean of the rows of the heatmap for RFP and by the mean of the columns for YFP. We then take the log base 2 to see the fold change over the mean. In this way we expect the heatmap to be uniformly zero if there are no context effects.
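
The claim that the heatmap is uniformly zero without context effects can be checked on a toy table. In this sketch the RFP expression depends only on the RFP TU (rows), so normalizing each row by its mean and taking log2 gives zeros. Doubling a single cell then shows up as a non-zero entry. The helper name and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def log2_fold_over_row_mean(table):
    """Log2 fold change of each cell over the mean of its row."""
    return np.log2(table.div(table.mean(axis=1), axis=0))

# Rows: RFP TU, columns: YFP TU. Expression set by the RFP TU alone.
rfp_strength = pd.Series([1.0, 2.0, 4.0], index=['A', 'B', 'E'])
no_context = pd.DataFrame({tu: rfp_strength for tu in ['A', 'C', 'D']})
flat = log2_fold_over_row_mean(no_context)

# Introduce a context effect: RFP TU 'B' doubles next to YFP TU 'D'
with_context = no_context.copy()
with_context.loc['B', 'D'] *= 2
shifted = log2_fold_over_row_mean(with_context)
print(flat)
print(shifted)
```

Only row 'B' changes in the second table, because each row is normalized independently. The same reasoning applies to YFP with columns in place of rows.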

In [None]:
grouped_media = df.groupby('Media')
for media,media_data in grouped_media:
    grouped_strain = media_data.groupby('Strain')
    for strain,df in grouped_strain:
        fig,ax = plt.subplots(1,2, figsize=(3.3,1.4), sharex=False, sharey=False)
        #cbar_ax = fig.add_axes([0.91, .1, .03, .75])
        cbar_ax = fig.add_axes([0.9, .1, .03, .75])
        df_c = df[df['Signal']=='CFP']
        for name,i in zip(['RFP', 'YFP'], np.arange(0,2)):
            df_x = df[df['Signal']==name].copy(deep=True)
            df_heatmap = df_x.pivot_table(values='Expression',
                                            index='rfp_tu',
                                            columns='yfp_tu', aggfunc=np.mean)
            if name=='YFP':
                # Normalize columns to mean
                df_heatmap = df_heatmap / df_heatmap.mean()
            else:
                # Normalize rows to mean
                df_heatmap = df_heatmap.div( df_heatmap.mean(axis=1), axis=0 )
            
            # Take log of normalized values
            df_heatmap = df_heatmap.apply(np.log2)
                
            g = sns.heatmap(df_heatmap, annot=True, ax=ax[i], 
                        square=True, fmt='0.1f', cmap='bwr', 
                        center=0, vmin=-3., vmax=3., linewidths=1, linecolor='black',
                        cbar=(i==1), cbar_ax=cbar_ax)
            g.set_facecolor('gray')
            bottom, top = ax[i].get_ylim()
            ax[i].set_title(name)
            #ax[i].set_ylim(bottom + 0.5, top - 0.5)
            ax[i].set_xlabel('YFP TU')
            if i==0:
                ax[i].set_ylabel('RFP TU')
            else:
                ax[i].set_ylabel('')
        plt.subplots_adjust(bottom=0.3)
        plt.suptitle(strain + ' in ' + media)
        plt.savefig('heatmap_rpu_tu_'+media+'_'+strain+'.png', dpi=300)

## Figure 5E - Effect of context on gene expression time dynamics

In [None]:
# Fetch the study once and reuse it for the id
study = fj.get('study', name='Context effects')
study_id = study.id
od = fj.get('signal', name='OD')

In [None]:
yfp_vectors = [
    ['pBFA', 'pEFA', 'pGFA'],
    ['pBDA', 'pEDA', 'pGDA'],
    ['pBCA', 'pECA', 'pGCA'],
    ['pAAA', 'pBAA', 'pEAA', 'pGAA']
]

yfp_vector_ids = [[fj.get('vector', name=name).id[0] for name in vecs] for vecs in yfp_vectors]
yfp_id = fj.get('signal', name='YFP').id

medias = ['M9-glucose', 'M9-glycerol']
strains = ['MG1655z1', 'Top10']

# YFP figures
for media in medias:
    for strain in strains:
        print(media, strain)
        for vi,vector_id in enumerate(yfp_vector_ids):
            media_id = fj.get('media', name=media).id
            strain_id = fj.get('strain', name=strain).id
            fig = fj.plot(study=study.id, 
                           vector=vector_id,
                           media=media_id,
                           strain=strain_id,
                           signal=yfp_id,
                           type='Expression Rate (direct)',
                           degr=0,
                           eps_L=1e-6,
                           biomass_signal=od.id[0],
                           normalize='Mean/std', 
                           subplots='Signal', 
                           markers='Vector', 
                           plot='Mean')
            fig = flapjack.layout_print(fig, width=1.65, height=1.25)
            fname = '-'.join([media, strain, yfp_vectors[vi][0][2], 'YFP.png'])
            io.write_image(fig, fname)

In [None]:
rfp_vectors = [
    ['pBAA', 'pBCA', 'pBDA', 'pBFA'],
    ['pEAA', 'pECA', 'pEDA', 'pEFA'],
    ['pGAA', 'pGCA', 'pGDA', 'pGEA', 'pGFA']
]

rfp_vector_ids = [[fj.get('vector', name=name).id[0] for name in vecs] for vecs in rfp_vectors]
rfp_id = fj.get('signal', name='RFP').id

medias = ['M9-glucose', 'M9-glycerol']
strains = ['MG1655z1', 'Top10']

# RFP figures
for media in medias:
    for strain in strains:
        print(media, strain)
        for vi,vector_id in enumerate(rfp_vector_ids):            
            media_id = fj.get('media', name=media).id
            strain_id = fj.get('strain', name=strain).id
            fig = fj.plot(study=study.id, 
                           vector=vector_id,
                           media=media_id,
                           strain=strain_id,
                           signal=rfp_id,
                           type='Expression Rate (direct)',
                           degr=0,
                           eps_L=1e-6,
                           biomass_signal=od.id[0], 
                           normalize='Mean/std', 
                           subplots='Signal', 
                           markers='Vector', 
                           plot='Mean')
            fig = flapjack.layout_print(fig, width=1.65, height=1.25)
            fname = '-'.join([media, strain, rfp_vectors[vi][0][1], 'RFP.png'])
            io.write_image(fig, fname)
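
The hard-coded vector groups above could also be derived from the vector names. The sketch below assumes, based on how the file names are built above, that the second character after "p" names the YFP TU (string index 2) and the first names the RFP TU (string index 1). That naming convention is an inference from this notebook, and the helper `group_by_letter` is hypothetical.

```python
from collections import defaultdict

def group_by_letter(vector_names, position):
    """Group vector names by the character at the given string index."""
    groups = defaultdict(list)
    for name in vector_names:
        groups[name[position]].append(name)
    return dict(groups)

# The vector names used in the YFP figures above
names = ['pAAA', 'pBAA', 'pBCA', 'pBDA', 'pBFA',
         'pEAA', 'pECA', 'pEDA', 'pEFA',
         'pGAA', 'pGCA', 'pGDA', 'pGFA']

by_yfp = group_by_letter(names, 2)  # same YFP TU, varying RFP TU
by_rfp = group_by_letter(names, 1)  # same RFP TU, varying YFP TU
print(by_yfp)
print(by_rfp)
```

Each group holds one TU fixed while its neighbour varies, which is the comparison each saved figure shows.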

In [None]:
cfp_id = fj.get('signal', name='CFP').id
fig = fj.plot(study=study.id, 
               signal=cfp_id,
               type='Expression Rate (direct)',
               degr=0,
               eps_L=1e-6,
               biomass_signal=od.id[0],
               normalize='Mean/std', 
               subplots='Signal', 
               markers='Vector', 
               plot='Mean')
fig = flapjack.layout_print(fig, width=1.65, height=1.25)
fig.update_traces(showlegend=False)
fname = 'CFP.png'
io.write_image(fig, fname)