# Analysis of context effects on synthetic gene expression

In this example we study the effects of compositional and cellular context on gene expression using triple reporter plasmids. See paper (REF) for details of plasmid composition. In summary, each plasmid contains three transcription units (TUs) producing RFP, YFP and CFP. The CFP TU is the same in all plasmids, while the promoters of the RFP and YFP TUs vary, generating 14 different combinations, or contexts, with a common reference gene.

First let's import the packages that we need, including the Flapjack API, and set some parameters for plotting with matplotlib:

In [None]:
from flapjack_api import FlapjackSession, layout_print, load_figures, replace_array_columns
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import plotly
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.io as io
import json
import pandas as pd
import seaborn as sns
import getpass
%matplotlib inline

SMALL_SIZE = 12
MEDIUM_SIZE = 10
BIGGER_SIZE = 12

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=SMALL_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=SMALL_SIZE)  # fontsize of the figure title

io.orca.shutdown_server()

## Plotting the data
Using the Flapjack web app, filter the data to select the study "Phase space" (** update name), and then choose the DNA (plasmid) named "pAAA". To compare measurements, group the data by DNA (tabs), DNA (subplots), and "Name" (lines). To compare signals of different magnitudes we normalize them, here by the min/max of the measurements for each sample, which is selected from the dropdown menu. Now click the "View" button to see the plots. These plots look nice in the web interface, but for publications or reports we can format them better using Plotly. Here we format the figure to be half the width of a 1-column figure (1.65 inches wide) with a 6pt font. To do this, click "Figure (JSON)" to download the figure as a JSON file, and load it in the cell below by choosing the correct file name:

In [None]:
figs = load_figures('pAAA_raw_data_normalized.json')
fig = figs['pAAA']
layout_print(fig, width=3.3/2, aspect=1.5, font_size=6)
fig.update_yaxes(title='Measurement')
fig.update_traces(showlegend=False, selector=dict(fill='toself'))
fig.update_traces(line=dict(color='#ff0000'), selector=dict(name='RFP'))
fig.update_traces(line=dict(color='#ffff00'), selector=dict(name='YFP'))
fig.update_traces(line=dict(color='#00ffff'), selector=dict(name='CFP'))
fig.update_traces(line=dict(color='#000000'), selector=dict(name='OD'))
fig.show()
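
The min/max normalization chosen in the web app rescales each sample's measurements so that signals of very different magnitude can share an axis. A minimal sketch of the idea, assuming a long-format table with hypothetical columns `sample`, `name` and `value` and a rescaling to the range 0 to 1 (Flapjack's own column names and implementation may differ):

```python
import pandas as pd

# Toy measurements: two samples with very different magnitudes (made-up values)
toy = pd.DataFrame({
    'sample': [1, 1, 1, 2, 2, 2],
    'name': ['RFP'] * 6,
    'value': [100.0, 150.0, 200.0, 1.0, 3.0, 5.0],
})

def minmax(values):
    # Rescale a series so its minimum maps to 0 and its maximum to 1
    return (values - values.min()) / (values.max() - values.min())

# Normalize each sample (and signal) independently
toy['normalized'] = toy.groupby(['sample', 'name'])['value'].transform(minmax)
print(toy)
```

After this step both samples span the same range, even though their raw values differ by two orders of magnitude.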

## Plotting the expression rate of each TU
To analyze the behaviour of the TUs in more detail we can compute the expression rate (or synthesis rate) of the reporters. To do this, go to the analysis form. Choose "Expression rate (direct)". This implements the linear inversion method of NAME et al. (REF). Choose the degradation rate of the protein as 0 (stable protein) and the density name as "OD" (this is the biomass for normalization). Then click the analyze button to produce the plot. Again we can save the plot as JSON and reformat here in the notebook:

In [None]:
figs = load_figures('pAAA_expr_rate_direct_normalized.json')
fig = figs['pAAA']
layout_print(fig, width=3.3/2, aspect=1.5, font_size=6.)
fig.update_yaxes(title='Expression rate')
fig.update_traces(showlegend=False, selector=dict(fill='toself'))
fig.update_traces(line=dict(color='#ff0000'), selector=dict(name='RFP'))
fig.update_traces(line=dict(color='#ffff00'), selector=dict(name='YFP'))
fig.update_traces(line=dict(color='#00ffff'), selector=dict(name='CFP'))
fig.update_traces(line=dict(color='#000000'), selector=dict(name='OD'))
fig.show()
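
To give a feel for what an expression rate per unit biomass means, the sketch below assumes a simple model in which fluorescence accumulates in proportion to density and decays at a fixed rate, and recovers the rate with a finite-difference derivative. This is an illustration on synthetic data only: the function name and the model are assumptions, and it is not the method implemented by Flapjack's "Expression rate (direct)" option.

```python
import numpy as np

def expression_rate(t, fluo, density, degr=0.0):
    # Assumed model: dF/dt = rate * density - degr * F,
    # so rate = (dF/dt + degr * F) / density
    dfdt = np.gradient(fluo, t)
    return (dfdt + degr * fluo) / density

# Synthetic data: constant density of 2 and a true rate of 3 per unit biomass
t = np.linspace(0, 10, 101)
density = np.full_like(t, 2.0)
fluo = 3.0 * 2.0 * t  # integral of rate * density with no degradation

rate = expression_rate(t, fluo, density)
print(rate[:3])
```

With the degradation rate set to 0, as chosen above for a stable protein, the second term vanishes and the rate is simply the slope of the fluorescence curve divided by the density.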

## Summarizing dynamics with mean expression rates
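A first approach to summarizing the overall dynamics of a genetic circuit is to take the mean level of expression, as approximated by the signal detected in the assay. This allows us to compare the average rates of gene expression between conditions. Here we request the mean expression rate of each reporter relative to the CFP reference: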

In [None]:
user = input()
passwd = getpass.getpass()
session = FlapjackSession('http://localhost:8989', user, passwd)

In [None]:
filter = { 
          'study': ['Context effects'],
            'remove_data': False,
          'normalize': 'off',
          'averaging': 'None',
          'density_name': 'OD',
            'ref_name': 'CFP'
         }
exp,s = session.get_mean_expression_rate_ratio(filter)
exp = replace_array_columns(exp)
# Preview the result last so that the notebook displays it
exp.head()
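
The quantity requested here is, as the method name suggests, a ratio of mean expression rates with CFP as the reference (`ref_name`). The server-side calculation is not shown in this notebook, so the sketch below only illustrates the idea on made-up rate curves with hypothetical variable names:

```python
import numpy as np

# Made-up expression rate time series for one sample (arbitrary units)
rates = {
    'RFP': np.array([2.0, 4.0, 6.0]),
    'YFP': np.array([1.0, 1.0, 1.0]),
    'CFP': np.array([1.0, 2.0, 3.0]),  # common reference TU
}

ref_mean = rates['CFP'].mean()

# Mean rate of each reporter relative to the mean rate of the reference
ratios = {name: r.mean() / ref_mean for name, r in rates.items()}
print(ratios)
```

Dividing by the reference puts every sample on a common scale, so the reference itself always has a ratio of 1.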

Create a heatmap of gene expression in each condition by pivoting the dataframe:

In [None]:
fig,ax = plt.subplots(2,1, figsize=(6,4), sharex=True)
df_c = exp[exp['name']=='CFP'].copy()
for name,i in zip(['RFP', 'YFP'], np.arange(0,2)):
    df_x = exp[exp['name']==name].copy()
    df_heatmap = df_x.pivot_table(values='value',
                                index=['sample__strain__name', 'sample__media__name'],
                                columns='sample__dna__names', aggfunc='mean')
    # Normalize to mean of columns
    df_heatmap = df_heatmap / df_heatmap.mean()
    # Normalize rows to mean
    #df_heatmap = df_heatmap.div( df_heatmap.mean(axis=1), axis=0 )
    # Take log of normalized values
    df_heatmap = df_heatmap.apply(np.log2)
    # Plot heatmap
    sns.heatmap(df_heatmap, annot=False, ax=ax[i], 
                square=True, 
                cmap='bwr', fmt='0.1f', 
                center=0, 
                vmin=-1.2, vmax=1.2, 
                linewidths=0.5, linecolor='black')
    # Format plot
    bottom, top = ax[i].get_ylim()
    ax[i].set_ylim(bottom + 0.5, top - 0.5)
    ax[i].set_title(name)
    ax[i].set_xlabel('')
    ax[i].set_ylabel('')
    #plt.tight_layout()
    plt.subplots_adjust(hspace=0.2)
    plt.xticks(rotation=90)
    plt.title(name)
plt.savefig('heatmap_rpus.png', dpi=300, bbox_inches='tight')
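
The pivot step turns the long-format results (one row per value) into a grid with one row per strain and media combination and one column per plasmid. A small sketch on made-up values, reusing the column names from the cell above, shows the shape of the result and the effect of the column normalization:

```python
import numpy as np
import pandas as pd

# Made-up long-format data with the same column names as `exp`
toy = pd.DataFrame({
    'sample__strain__name': ['Top10', 'Top10', 'MG1655z1', 'MG1655z1'],
    'sample__media__name': ['M9-glucosa'] * 4,
    'sample__dna__names': ['pAAA', 'pBAA', 'pAAA', 'pBAA'],
    'value': [1.0, 4.0, 3.0, 4.0],
})

# Rows: (strain, media); columns: plasmid; cells: mean of 'value'
grid = toy.pivot_table(values='value',
                       index=['sample__strain__name', 'sample__media__name'],
                       columns='sample__dna__names', aggfunc='mean')

# Divide each column by its mean, then express as log2 fold change
log_fold = np.log2(grid / grid.mean())
print(log_fold)
```

In this toy case pBAA has the same value in both strains, so its column is all zeros, while pAAA differs between strains and shows non-zero fold changes.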

## Using SynBioHub to compare compositional contexts

First get the DNAs in the study:

In [None]:
# Prompt for credentials instead of hard-coding them in the notebook
session = FlapjackSession('http://synbio.ing.puc.cl:8989', input('Username: '), getpass.getpass())
filter = { 
          'study': ['Context effects']
         }

dnas,s = session.get_dnas(filter)

Next we query SynBioHub to get the part composition and add this to our dataframe. We are interested in the identity of the RFP and YFP TUs, which are encoded as "engineered regions":

In [None]:
from sbol import *

# Some nicer names for display purposes
TU_names = {
    'TU1_1': 'A',
    'TU1_2': 'B',
    'TU1_5': 'E',
    'TU1_8': 'G',
    
    'TU2_1': 'A',
    'TU2_3': 'C',
    'TU2_5': 'E',
    'TU2_6': 'D',
    'TU2_7': 'F',
}

# The URI of "Engineered region" used to encode the TUs
TU_role = 'http://identifiers.org/so/SO:0000804'

df = exp.copy()

synbiouc = PartShop('http://synbio.ing.puc.cl:7777')

rows_to_add = []
for idx,dna in dnas.iterrows():
    dna_string = ' + '.join(dna['names'])
    # Copy so that adding columns below does not modify a view of df
    data = df.loc[df['sample__dna__names']==dna_string].copy()
    sboluris = dna['sboluris']
    # We know there is just one SBOL URI because there is only one plasmid
    sboluri = sboluris[0]
    if sboluri!='none':
        # Create a new SBOL document
        doc = Document()
        synbiouc.pull(sboluri, doc)
        plasmid = doc.componentDefinitions[sboluri]
        composition = plasmid.getPrimaryStructure()
        TUs = [component.displayId for component in composition
                       if TU_role in component.roles]
        # The first TU is the RFP TU
        data['rfp_tu'] = TU_names[TUs[0]]
        # The second TU is the YFP TU
        data['yfp_tu'] = TU_names[TUs[1]]
        rows_to_add.append(data)

# DataFrame.append was removed in pandas 2.0; concatenate the pieces instead
df = pd.concat(rows_to_add)

# Save data to JSON for later analysis
df.to_json('phase_space_ratio_of_mean_expression_rates.json')

df.head()

Now we can make heatmaps to compare the mean expression rate ratio of each TU in its different compositional contexts. To do this we pivot the table to have the YFP TU name along the x-axis and the RFP TU name along the y-axis. In order to see the effect of context irrespective of the overall magnitude of expression, we normalize by the mean of the rows of the heatmap for RFP and by the mean of the columns for YFP. We then take the log base 2 to see the fold change over the mean. In this way we expect the heatmap to be uniformly zero if there are no context effects.

In [None]:
grouped_media = df.groupby('sample__media__name')
for media,media_data in grouped_media:
    grouped_strain = media_data.groupby('sample__strain__name')
    for strain,strain_data in grouped_strain:
        fig,ax = plt.subplots(1,2, figsize=(6.7,2.75), sharex=False, sharey=False)
        cbar_ax = fig.add_axes([0.91, .1, .03, .75])
        df_c = strain_data[strain_data['name']=='CFP']
        for name,i in zip(['RFP', 'YFP'], np.arange(0,2)):
            df_x = strain_data[strain_data['name']==name].copy(deep=True)
            df_heatmap = df_x.pivot_table(values='value',
                                            index='rfp_tu',
                                            columns='yfp_tu', aggfunc='mean')
            if name=='YFP':
                # Normalize columns to mean
                df_heatmap = df_heatmap / df_heatmap.mean()
            else:
                # Normalize rows to mean
                df_heatmap = df_heatmap.div( df_heatmap.mean(axis=1), axis=0 )
            
            # Take log of normalized values
            df_heatmap = df_heatmap.apply(np.log2)
                
            g = sns.heatmap(df_heatmap, annot=True, ax=ax[i], 
                        square=True, fmt='0.1f', cmap='bwr', 
                        center=0, vmin=-3., vmax=3., linewidths=1, linecolor='black',
                        cbar=(i==1), cbar_ax=cbar_ax)
            g.set_facecolor('gray')
            bottom, top = ax[i].get_ylim()
            ax[i].set_title(name)
            ax[i].set_ylim(bottom + 0.5, top - 0.5)
            ax[i].set_xlabel('YFP TU')
            if i==0:
                ax[i].set_ylabel('RFP TU')
            else:
                ax[i].set_ylabel('')
        #plt.tight_layout()
        plt.subplots_adjust(hspace=0.2)
        plt.suptitle(strain + ' in ' + media)
        plt.savefig('heatmap_rpu_tu_'+media+'_'+strain+'.png', dpi=300)
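
To see why a context-free system gives a heatmap of zeros, consider made-up RFP values that depend only on the RFP TU. After dividing each row by its mean and taking log2, every cell is exactly zero; changing a single cell to mimic a context effect produces non-zero entries in that row. The values and TU letters below are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up RFP expression: each row (RFP TU) has the same value in every column (YFP TU)
rfp = pd.DataFrame([[2.0, 2.0, 2.0],
                    [8.0, 8.0, 8.0]],
                   index=['B', 'E'], columns=['A', 'C', 'D'])

def log2_fold_over_row_mean(table):
    # Normalize rows to their mean, then take log2
    return np.log2(table.div(table.mean(axis=1), axis=0))

no_context = log2_fold_over_row_mean(rfp)

# Mimic a context effect: RFP TU 'B' is 4x higher next to YFP TU 'D'
perturbed = rfp.copy()
perturbed.loc['B', 'D'] *= 4
with_context = log2_fold_over_row_mean(perturbed)
print(no_context)
print(with_context)
```

Note that the row normalization removes the four-fold difference between the two RFP TUs, so only deviations within a row remain visible.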

## Statistical analysis of context effects on gene expression

In [None]:
import statsmodels as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm


label_map = {
    'C(sample__media__name)': 'Media',
    'C(sample__strain__name)': 'Strain',
    'C(rfp_tu)': 'RFP TU',
    'C(yfp_tu)': 'YFP TU',
    'C(rfp_tu):C(yfp_tu)': 'RFP/YFP TU',
    'Residual': 'Other'
}

df = pd.read_json('phase_space_ratio_of_mean_expression_rates.json')

#fig,ax = plt.subplots(1,2, figsize=(6.7,2.5))
fig = make_subplots(1, 2, specs=[[{'type':'domain'}, {'type':'domain'}]])
i = 0
for name in ['RFP', 'YFP']:
    data = df[df['name']==name].dropna()
    #print(data.head())
    results = ols('value ~ C(sample__media__name) \
                  + C(sample__strain__name) \
                  + C(rfp_tu) + C(yfp_tu) \
                  + C(rfp_tu):C(yfp_tu)', data=data).fit()
    results.summary()

    aov_table = anova_lm(results, typ=2)

    ss_tot = aov_table['sum_sq'].sum()
    #print(ss_tot)

    aov_table['eta'] = aov_table['sum_sq'] / ss_tot *100

    print(aov_table)

    labels=[label_map[ind] for ind in aov_table.index]
    #ax[i].pie(aov_table['eta']) #, labels=labels)
    #plt.legend(labels)
    #ax[i].set_title(name)
    pie = go.Pie(labels=labels, values=aov_table['eta'])
    fig.add_trace(pie, row=1, col=i+1)
    i += 1
    
fig.update_traces(marker=dict(line=dict(color='#000000', width=1)))
fig = layout_print(fig, width=3.3, aspect=2.5, font_size=6.)
fig.show()
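
The `eta` column computed above is each term's sum of squares as a percentage of the summed sums of squares, which we read as the share of variation attributed to that factor. A hand-computed one-factor version on made-up, balanced data may make the quantity concrete; the notebook itself uses a multi-factor type 2 ANOVA from statsmodels, which this simplified sketch does not reproduce:

```python
import numpy as np

# Made-up values for two levels of a single factor (e.g. two media)
groups = {
    'media_1': np.array([1.0, 3.0]),
    'media_2': np.array([5.0, 7.0]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Factor sum of squares: spread of the group means around the grand mean
ss_factor = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in groups.values())
# Residual sum of squares: spread of values around their own group mean
ss_resid = sum(((v - v.mean()) ** 2).sum() for v in groups.values())

ss_tot = ss_factor + ss_resid
eta = {'factor': ss_factor / ss_tot * 100, 'residual': ss_resid / ss_tot * 100}
print(eta)
```

Here the factor accounts for 80% of the variation and the residual for 20%, which is the kind of breakdown the pie charts display for each reporter.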


## Effect of context on gene expression time dynamics

In [None]:
# Reuse the credentials entered earlier instead of hard-coding them
session = FlapjackSession('http://localhost:8989', user, passwd)
filter = { 
          'study': ['Context effects'],
          'degr': 0.03,
          'eps_L': 1e-7,
            'remove_data': False,
          'bg_correction': 2.,
          'normalize': 'temporal_mean',
          'averaging': 'mean',
          'density_name': 'OD',
            'groupby1': 'name',
            'groupby2': 'name',
            'groupby3': 'sample__dna__names'
         }

medias = [
    ['M9-glucosa'], 
    ['M9-glicerol']
]

strains = [
    ['MG1655z1'],
    ['Top10']
]

In [None]:
yfp_dnas = [
    [['pBFA'], ['pEFA'], ['pGFA']],
    [['pBDA'], ['pEDA'], ['pGDA']],
    [['pBCA'], ['pECA'], ['pGCA']],
    [['pAAA'], ['pBAA'], ['pEAA'], ['pGAA']]
]

# YFP figures
for media in medias:
    for strain in strains:
        print(media, strain)
        for dna in yfp_dnas:
            filter['dna'] = dna
            filter['media'] = media
            filter['strain'] = strain
            figs,s = session.plot_rate_direct(filter)
            yfp = figs['YFP']
            # Layout the figure better for our purposes
            layout_print(yfp, width=3.3/2, aspect=1.25, font_size=6.)
            yfp.update_yaxes(rangemode='tozero')
            yfp.update_yaxes(title='Expression rate')
            # Set the figure title
            yfp['layout']['annotations'][0]['text'] = 'YFP'
            yfp.show()
            fname = '-'.join([media[0], strain[0], dna[0][0][2], 'YFP.png'])
            io.write_image(yfp, fname, 'png')

In [None]:
rfp_dnas = [
    [['pBAA'], ['pBCA'], ['pBDA'], ['pBFA']],
    [['pEAA'], ['pECA'], ['pEDA'], ['pEFA']],
    [['pGAA'], ['pGCA'], ['pGDA'], ['pGEA'], ['pGFA']]
]

# RFP figures
for media in medias:
    for strain in strains:
        print(media, strain)
        for dna in rfp_dnas:
            filter['dna'] = dna
            filter['media'] = media
            filter['strain'] = strain
            figs,s = session.plot_rate_direct(filter)
            rfp = figs['RFP']
            # Layout the figure better for our purposes
            layout_print(rfp, width=3.3/2, aspect=1.25, font_size=6.)
            rfp.update_yaxes(title='Expression rate')
            rfp.update_yaxes(rangemode='tozero')
            # Set the figure title
            rfp['layout']['annotations'][0]['text'] = 'RFP'
            rfp.show()
            fname = '-'.join([media[0], strain[0], dna[0][0][1], 'RFP.png'])
            print(fname)
            io.write_image(rfp, fname, 'png')
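
The plasmid groups above were written out by hand. Judging from the lists and the file names, the plasmid names appear to follow the pattern `pXYZ`, where X, Y and Z identify the RFP, YFP and CFP TUs. Under that assumption the groups could be derived from the names; the helper below is hypothetical and not part of the Flapjack API:

```python
from collections import defaultdict

# The 14 plasmids that appear in the lists above
plasmids = ['pAAA', 'pBAA', 'pBCA', 'pBDA', 'pBFA',
            'pEAA', 'pECA', 'pEDA', 'pEFA',
            'pGAA', 'pGCA', 'pGDA', 'pGEA', 'pGFA']

def group_by_tu(names, position):
    # Group plasmid names by the character at `position`
    # (assumed: 1 = RFP TU, 2 = YFP TU), keeping groups with at least two members
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name[position]].append([name])
    return [group for _, group in sorted(groups.items()) if len(group) > 1]

rfp_groups = group_by_tu(plasmids, 1)
yfp_groups = group_by_tu(plasmids, 2)
print(rfp_groups)
```

Each group holds one TU fixed while the neighbouring TU varies, and groups with a single plasmid are dropped because they offer no comparison.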

In [None]:
# CFP figures
for media in medias:
    for strain in strains:
        print(media, strain)
        filter['dna'] = []
        filter['media'] = media
        filter['strain'] = strain
        figs,s = session.plot_rate_direct(filter)
        cfp = figs['CFP']
        # Layout the figure better for our purposes
        layout_print(cfp, width=3.3/2, aspect=1.25, font_size=6.)
        cfp.update_yaxes(title='Expression rate')
        cfp.update_yaxes(rangemode='tozero')
        #cfp.update_traces(showlegend=False)
        # Set the figure title
        cfp['layout']['annotations'][0]['text'] = 'CFP'
        cfp.show()
        fname = '-'.join([media[0], strain[0], 'CFP.png'])
        io.write_image(cfp, fname, 'png')