# Correlation between (log) protein synthesis and (log) mRNA abundance

According to https://www.ncbi.nlm.nih.gov/pubmed/28365149:

>The overall correlation between mRNA and protein abundances across all conditions was low (0.46), but for differentially expressed proteins (n = 202), the median mRNA-protein correlation was 0.88.

Their Fig. 2A shows $R^2 = 0.45$.

We only have protein synthesis rates here, not protein abundances, so the comparison below uses synthesis rates.
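As a minimal sketch of the comparison named in the title, the snippet below computes the Pearson correlation between log synthesis rate and log mRNA abundance for the genes present in both data sets. The function name `log_correlation` and the toy dictionaries are hypothetical and only illustrate the shape of the calculation; they are not taken from the notebook's data.

```python
import numpy as np
import pandas as pd


def log_correlation(synthesis_rates, mrna_abundance):
    """Return (Pearson r, number of genes) on log10-transformed values.

    Both arguments are dicts mapping gene name -> positive number.
    Genes missing from either dict, or with non-positive values, are dropped.
    """
    df = pd.DataFrame({
        "synthesis": pd.Series(synthesis_rates),
        "mrna": pd.Series(mrna_abundance),
    }).dropna()
    # The logarithm is undefined for zero or negative entries.
    df = df[(df["synthesis"] > 0) & (df["mrna"] > 0)]
    log_df = np.log10(df)
    r = log_df["synthesis"].corr(log_df["mrna"])  # Pearson by default
    return r, len(log_df)


# Toy data (hypothetical): synthesis is exactly proportional to abundance.
toy_rates = {"geneA": 0.01, "geneB": 0.1, "geneC": 1.0, "geneD": 10.0}
toy_mrna = {"geneA": 1, "geneB": 10, "geneC": 100, "geneD": 1000, "geneE": 5}

r, n = log_correlation(toy_rates, toy_mrna)
print("r = {:.3f}, R^2 = {:.3f}, n = {}".format(r, r ** 2, n))
```

Squaring `r` gives the $R^2$ that would be compared with the value quoted from the paper's figure.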

#### Protein synthesis rates:

In [63]:
import pandas as pd
import pickle as pkl
import numpy as np

In [4]:
# open in binary mode and close the file handle after loading
with open("../../parameters/prot_speeds.p", "rb") as f:
    synthrates = pkl.load(f)

In [6]:
pd.Series(synthrates).describe()

count    4475.000000
mean        1.349305
std         7.847669
min         0.001149
25%         0.049425
50%         0.122989
75%         0.379763
max       193.922989
dtype: float64

In [12]:
pd.Series(synthrates).sum() * 3600

21737298.352131646

Interesting: summed over all proteins, this gives about 21.7 M proteins per hour (the factor 3600 converts from per second to per hour).

### Excursus: comparison with the 11 M estimate

#### On average transcriptome:

In [21]:
# open in binary mode and close the file handle after loading
with open('../../parameters/init_rates_plotkin.p', 'rb') as f:
    init_rates_plotkin = pkl.load(f)

In [22]:
pd.Series(init_rates_plotkin).describe()

count    4.839000e+03
mean     1.567727e-06
std      1.128263e-06
min      9.375766e-10
25%      8.320521e-07
50%      1.291872e-06
75%      1.962904e-06
max      1.440641e-05
dtype: float64

In [23]:
# open in binary mode and close the file handle after loading
with open('../../parameters/transcriptome_plotkin.p', 'rb') as f:
    transcriptome = pkl.load(f)

In [26]:
sum(transcriptome.values())

60000

Sum of init rates weighted by transcript abundance:

In [33]:
initations_per_second = {gene: init_rates_plotkin[gene] * transcriptome[gene] for gene in transcriptome}

In [37]:
sum(initations_per_second.values())

0.21047256610692186

In [38]:
sum(initations_per_second.values()) * 0.16 * 200000

6735.122115421499

In [39]:
sum(initations_per_second.values()) * 0.16 * 200000 * 3600

24246439.615517396
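The three cells above can be condensed into one helper, sketched here under stated assumptions. The factor `0.16 * 200000` is copied verbatim from the cells; the notebook does not say what the two numbers stand for, so the sketch treats their product as an opaque `scale` parameter. The function name and the toy dictionaries are hypothetical.

```python
def proteins_per_second(init_rates, transcript_counts, scale=0.16 * 200000):
    """Abundance-weighted sum of initiation rates, multiplied by `scale`.

    init_rates:        dict gene -> initiation rate
    transcript_counts: dict gene -> number of transcripts
    scale:             factor taken as-is from the notebook cells
    """
    # Only genes present in both dicts can contribute.
    common = set(init_rates) & set(transcript_counts)
    weighted_sum = sum(init_rates[g] * transcript_counts[g] for g in common)
    return weighted_sum * scale


# Toy input (hypothetical values).
toy_rates = {"a": 1e-6, "b": 2e-6}
toy_counts = {"a": 10, "b": 5, "c": 3}  # "c" has no initiation rate
per_second = proteins_per_second(toy_rates, toy_counts)
per_hour = per_second * 3600

# Reproducing the notebook's own arithmetic from the printed weighted sum.
notebook_per_second = 0.21047256610692186 * 0.16 * 200000
notebook_per_hour = notebook_per_second * 3600
print(per_second, per_hour, notebook_per_second, notebook_per_hour)
```

With the weighted sum printed in the notebook, this reproduces roughly 6,735 proteins per second and 24.2 M per hour.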

#### On time-resolved transcriptome:

In [40]:
# open in binary mode and close the file handle after loading
with open('../../parameters/transcriptome_time_dependent_v2.p', 'rb') as f:
    transcriptome_t = pkl.load(f)

In [43]:
pd.DataFrame(transcriptome_t).sum()

0     42123
5     42167
10    72667
15    72764
20    72928
25    27655
30    27355
35    27315
40    27222
45    27559
50    16741
55    32065
60    22535
dtype: int64

In [58]:
pd.DataFrame(transcriptome_t).sum().mean()

39315.07692307692

In [48]:
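# initiation rate times transcript count, per gene and per time point;
# genes without a transcript count at time t are skipped
initations_per_second_per_phase = {
    t: {
        gene: init_rates_plotkin[gene] * transcriptome_t[t][gene]
        for gene in init_rates_plotkin
        if gene in transcriptome_t[t]
    }
    for t in transcriptome_t
}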

In [50]:
# number of genes retained at each time point
for t in transcriptome_t:
    print("{} {}".format(t, len(initations_per_second_per_phase[t])))

0 4716
35 4716
5 4716
40 4716
10 4716
45 4716
15 4716
50 4716
20 4716
55 4716
25 4716
60 4716
30 4716


OK, good enough. (Of the 4839 genes with initiation rates, 4716 remain at every time point, so 123 genes were lost.)
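The number of dropped genes follows from the counts printed above (4839 genes with initiation rates, 4716 retained per time point). To see which genes drop out, a set difference per time point is enough. This is a sketch with a hypothetical helper name and toy inputs, not the notebook's data.

```python
def genes_lost(init_rates, transcriptome_by_time):
    """Return dict time -> sorted list of genes that have an initiation
    rate but no entry in the transcriptome at that time point."""
    lost = {}
    for t, counts in transcriptome_by_time.items():
        lost[t] = sorted(set(init_rates) - set(counts))
    return lost


# Arithmetic check using the counts shown in the notebook output.
n_with_rate = 4839
n_retained = 4716
n_lost = n_with_rate - n_retained

# Toy example (hypothetical genes).
toy_rates = {"a": 1e-6, "b": 2e-6, "c": 3e-6}
toy_transcriptome_t = {0: {"a": 4, "b": 1}, 5: {"a": 2, "c": 7, "d": 1}}
lost = genes_lost(toy_rates, toy_transcriptome_t)
print(n_lost, lost)
```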

In [54]:
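total = 0

for t in sorted(transcriptome_t):
    # proteins per second at time point t
    rate = sum(initations_per_second_per_phase[t].values()) * 0.16 * 200000
    print("{} {}".format(t, rate))

    # each time point is taken to represent a 5-minute interval;
    # with 13 time points the total therefore spans 65 minutes
    total += rate * 5 * 60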

0 2996.65882243
5 2959.11618101
10 5117.34159415
15 5110.98080077
20 5166.46340274
25 1935.34116088
30 1948.10801725
35 1946.47814169
40 1927.31369308
45 1986.97177212
50 1188.58638812
55 2276.13370683
60 1585.15075859


In [55]:
total

10843393.331895405
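The total above treats each time point as a constant rate held for 5 minutes. Because there are 13 time points (0 to 60 min inclusive), that reading covers 65 minutes rather than 60; whether this is intended is not stated in the notebook. The sketch below redoes the sum from the per-second rates printed above and also shows the value rescaled to exactly one hour. The names `INTERVAL_S`, `covered_minutes` and `per_hour` are introduced here for illustration.

```python
# Per-second rates copied from the printed output of the loop above.
rates_per_second = {
    0: 2996.65882243, 5: 2959.11618101, 10: 5117.34159415,
    15: 5110.98080077, 20: 5166.46340274, 25: 1935.34116088,
    30: 1948.10801725, 35: 1946.47814169, 40: 1927.31369308,
    45: 1986.97177212, 50: 1188.58638812, 55: 2276.13370683,
    60: 1585.15075859,
}

INTERVAL_S = 5 * 60  # each time point stands for a 5-minute window

# Piecewise-constant integration: rate times window length, summed.
total = sum(rates_per_second.values()) * INTERVAL_S

# 13 windows of 5 minutes cover 65 minutes in total.
covered_minutes = len(rates_per_second) * 5

# Rescale to a 60-minute hour for comparison with per-hour figures.
per_hour = total * 60.0 / covered_minutes

print("total over {} min: {:.0f}".format(covered_minutes, total))
print("rescaled to 60 min: {:.0f}".format(per_hour))
```

Under this rescaling the estimate is roughly 10.0 M per hour instead of 10.8 M, which is still in the vicinity of the 11 M figure in the section heading.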

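Mean $p_I$ (abundance-weighted mean initiation rate per transcript) at each time point, followed by its average over all time points: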

In [57]:
pI = 0

for t in sorted(transcriptome_t):
    # total initiations per second divided by total transcript count at time t
    pI_t = 1.0 * sum(initations_per_second_per_phase[t].values()) / sum(transcriptome_t[t].values())
    print("{} {}".format(t, pI_t))

    pI += pI_t

# unweighted average over the time points
pI = pI / len(transcriptome_t)
print(pI)

0 2.22314621942e-06
5 2.19300354914e-06
10 2.20068153105e-06
15 2.19501607971e-06
20 2.21385450493e-06
25 2.18692501456e-06
30 2.22549353095e-06
35 2.22688786117e-06
40 2.21249551498e-06
45 2.2530885692e-06
50 2.21870405763e-06
55 2.21828093991e-06
60 2.19817888644e-06
2.21275048147e-06


In [59]:
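# open in binary mode and close the file handle after loading
with open("../../parameters/orf_coding.p", "rb") as f:
    orf_genomic_dict = pkl.load(f)
# ORF length in nucleotides
orf_lengths = {prot: len(orf_genomic_dict[prot]) for prot in orf_genomic_dict}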

In [62]:
orf_lengths = {gene: orf_lengths[gene] for gene in init_rates_plotkin if gene in orf_lengths} 

In [65]:
# mean ORF length in codons (3 nucleotides per codon)
np.mean(list(orf_lengths.values())) / 3

523.4016115351993

Weighted by transcript abundance (this value is in nucleotides, not codons; divided by 3 it is about 390.5 codons):

In [66]:
weighted_orf_lengths = {gene: orf_lengths[gene] * transcriptome[gene] for gene in transcriptome if gene in orf_lengths}

In [72]:
# result is in nucleotides (not divided by 3)
np.mean(list(weighted_orf_lengths.values())) / np.mean(list(transcriptome.values()))

1171.5209252226464
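The cell above divides a mean taken over genes with a known ORF length by a mean taken over all genes in the transcriptome. The two means use the same gene set only if every transcript has an ORF length, which the notebook does not confirm. A sketch of an abundance-weighted mean that restricts both numerator and denominator to the shared genes, using `numpy.average`, could look like this. The function name and toy inputs are hypothetical.

```python
import numpy as np


def weighted_mean_orf_length(orf_lengths_nt, transcript_counts):
    """Abundance-weighted mean ORF length.

    orf_lengths_nt:    dict gene -> ORF length in nucleotides
    transcript_counts: dict gene -> number of transcripts
    Returns (mean in nucleotides, mean in codons).
    """
    # Use only genes that have both a length and a transcript count.
    genes = sorted(set(orf_lengths_nt) & set(transcript_counts))
    lengths = np.array([orf_lengths_nt[g] for g in genes], dtype=float)
    weights = np.array([transcript_counts[g] for g in genes], dtype=float)
    mean_nt = np.average(lengths, weights=weights)
    return mean_nt, mean_nt / 3.0


# Toy data (hypothetical): the short ORF is the abundant one.
toy_lengths = {"a": 300, "b": 3000, "c": 900}
toy_counts = {"a": 9, "b": 1, "d": 50}  # "d" has no ORF length
mean_nt, mean_codons = weighted_mean_orf_length(toy_lengths, toy_counts)
print(mean_nt, mean_codons)
```

Returning both units side by side avoids comparing a value in codons with one in nucleotides.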