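# Studying the evolution of citrate utilization in the Long-Term Evolution Experiment
Here we demonstrate how to use COMETS to study the effect of specific mutations on population dynamics during experimental evolution. To do so, we draw on one of the best-known evolution experiments, the E. coli Long-Term Evolution Experiment (LTEE).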

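After roughly 33,000 generations, a large population expansion was observed in one of the replicate populations (Ara-3) of the LTEE (Blount et al., 2008). This expansion is associated with two key mutations that together enabled the evolution of strong aerobic citrate utilization (the Cit++ phenotype). The first mutation (arising at roughly generation 31,000) caused the citT transporter to be expressed under aerobic conditions, producing a weak citrate growth phenotype (Cit+; Blount et al., 2012). A subsequent mutation (arising at roughly generation 33,000) caused high-level, constitutive expression of dctA, a proton-driven dicarboxylic acid transporter (Quandt et al., 2014). Because both mutations introduce known reactions into the E. coli metabolic network, we can simulate them with COMETS.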

This example simulation uses the same E. coli model, parameters, and reaction knockout strategy as Bajic et al. (2018).

First, create the layout and set up the medium to mimic DM25.
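
The glucose and citrate amounts set in the next cell are consistent with the DM25 recipe (25 mg/L glucose, 1.7 mM citrate) if the simulated volume is taken to be 1 mL. That volume is an assumption made here for illustration; the notebook does not state it. A minimal sketch of the conversion:

```python
# Convert DM25 concentrations into the amounts (mmol) passed to the layout.
# Assumption: the simulated volume is 1 mL (not stated in the notebook).
GLUCOSE_MG_PER_L = 25.0       # DM25 contains 25 mg/L glucose
GLUCOSE_G_PER_MOL = 180.156   # molar mass of glucose
CITRATE_MM = 1.7              # citrate concentration in DM medium, mmol/L
VOLUME_L = 1e-3               # assumed volume: 1 mL

glucose_mM = GLUCOSE_MG_PER_L / GLUCOSE_G_PER_MOL  # mg/L divided by g/mol gives mmol/L
glucose_mmol = glucose_mM * VOLUME_L
citrate_mmol = CITRATE_MM * VOLUME_L

print(f"glucose: {glucose_mmol:.6f} mmol")
print(f"citrate: {citrate_mmol:.4f} mmol")
```

Rounded, these match the values 0.000139 and 0.0017 passed to `set_specific_metabolite`.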

In [1]:
import cometspy as c
import matplotlib.pyplot as plt
import cobra.io
import cobra
import pandas as pd
import numpy as np

layout = c.layout()

#Set up media to be DM25
layout.add_typical_trace_metabolites()
layout.set_specific_metabolite('glc__D_e', 0.000139)
layout.set_specific_metabolite('cit_e', 0.0017)

building empty layout model
models will need to be added with layout.add_model()



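Now load the model and build the mutants. To illustrate compatibility with the COBRApy toolbox, we load the model and apply the mutations with COBRA, then pass the modified models as input to build the COMETS models.

The ancestral LTEE strain REL606 (like E. coli in general) carries the genes needed for citrate utilization but does not express them under aerobic conditions. The iJO1366 model, in contrast, leaves the citrate and succinate utilization reactions unconstrained by default. To reproduce the ancestral phenotype we therefore knock out three reactions: CITt7pp (the transport reaction encoded by citT), SUCCt2_2pp (the transport reaction encoded by dctA), and SUCCt2_3pp (the transport reaction encoded by dcuA or dcuB).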
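
The next cell builds the three genotypes by blocking one reaction at a time and taking a copy of the model after each step. The sketch below mimics that ordering with a plain dictionary standing in for the metabolic model (reaction id mapped to upper bound). The dictionary and variable names are hypothetical; only the reaction ids come from the text.

```python
import copy

# Stand-in for a metabolic model: reaction id -> upper bound (illustrative only).
model = {"CITt7pp": 1000.0, "SUCCt2_2pp": 1000.0, "SUCCt2_3pp": 1000.0}

model["SUCCt2_3pp"] = 0.0             # block the dcuA/dcuB reaction in every genotype
cit_plus_plus = copy.deepcopy(model)  # Cit++: citT and dctA reactions still open

model["SUCCt2_2pp"] = 0.0             # additionally block the dctA reaction
cit_plus = copy.deepcopy(model)       # Cit+: only the citT reaction still open

model["CITt7pp"] = 0.0                # additionally block the citT reaction
ancestor = copy.deepcopy(model)       # Ancestor: all three reactions blocked

genotypes = {"Ancestor": ancestor, "Cit+": cit_plus, "Cit++": cit_plus_plus}
open_reactions = {
    name: sorted(rxn for rxn, ub in bounds.items() if ub > 0)
    for name, bounds in genotypes.items()
}
for name, rxns in open_reactions.items():
    print(name, rxns)
```

Taking each copy before the next knockout is what makes the genotypes nested: each snapshot keeps exactly the reactions that had not yet been blocked at that point.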

In [2]:
# Load the E. coli iJO1366 model 
mod  = cobra.io.load_json_model("iJO1366.json")

# Set exchange reaction lower bounds to -1000 to allow them being controlled by COMETS 
for i in mod.reactions:
    if 'EX_' in i.id:
        i.lower_bound =-1000.0

#now create the mutants 
mod.reactions.SUCCt2_3pp.upper_bound=0.0
CitTdctA = mod.copy()
mod.reactions.SUCCt2_2pp.upper_bound =0.0
CitT = mod.copy()
mod.reactions.CITt7pp.upper_bound =0.0
WT = mod.copy()
WT.id= 'Ancestor'
CitT.id = 'Cit+'
CitTdctA.id = 'Cit++'

# Generate comets models and set their initial population size
p = c.model(WT)
p.initial_pop = [0, 0, 3.9e-11] # We'll introduce genotypes 100 cells at a time to avoid the risk of them going extinct through drift
p2 = c.model(CitT)
p2.initial_pop = [0, 0, 0] # not present at start
p3 = c.model(CitTdctA)
p3.initial_pop = [0, 0, 0] # not present at start

# Add the models to the simulation 
layout.add_model(p)
layout.add_model(p2)
layout.add_model(p3)

Read LP format model from file C:\Users\99374\AppData\Local\Temp\tmpzx5t_7kr.lp
Reading time = 0.03 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
Read LP format model from file C:\Users\99374\AppData\Local\Temp\tmpzj34sgvv.lp
Reading time = 0.03 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
Read LP format model from file C:\Users\99374\AppData\Local\Temp\tmpq1l3ts57.lp
Reading time = 0.03 seconds
: 1805 rows, 5166 columns, 20366 nonzeros




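Set the simulation parameters. We use a COMETS time step of 1 hour to speed up the simulation. Shortening it to the more commonly used 0.1 hours does not substantially change the final result, but it does significantly increase the time needed to run the simulation.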

In [3]:
### Setting parameters for the simulation ###
b_params = c.params()
b_params.all_params['timeStep'] = 1.0 
b_params.all_params['deathRate'] = 0.01
b_params.all_params['batchDilution'] =True
b_params.all_params['dilTime'] =24
b_params.all_params['dilFactor'] =100
b_params.all_params['cellSize']= 3.9e-13 #Size of a single cell
b_params.all_params['minSpaceBiomass'] = 3.8e-13
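
Two quantities follow directly from these parameters and are useful for reading the rest of the notebook: the number of cells that a biomass of 3.9e-11 represents, and the number of doublings a population needs to recover from a 100-fold dilution. The sketch below is simple arithmetic on the values above; the constant names are illustrative.

```python
import math

CELL_SIZE = 3.9e-13   # 'cellSize' parameter: biomass of a single cell
INOCULUM = 3.9e-11    # initial_pop biomass used when a genotype is introduced
DIL_FACTOR = 100      # 'dilFactor' parameter: 100-fold dilution per transfer

cells_introduced = INOCULUM / CELL_SIZE
doublings_per_transfer = math.log2(DIL_FACTOR)

print(f"cells introduced: {cells_introduced:.0f}")
print(f"doublings per transfer: {doublings_per_transfer:.2f}")
```

The first number matches the "100 cells at a time" comment in the model setup. The second, about 6.64, is close to the factor of 6.67 generations per day that the notebook uses when converting cycles to generations.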

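Run the simulation. We split the whole simulation into three separate COMETS runs. We start at generation 25,000 and run for roughly 6,000 generations. At roughly generation 31,000 we introduce the CitT genotype and run for about 2,000 more generations. Finally, at roughly generation 33,000 we introduce the CitTdctA genotype and run for a final 6,000 generations. Each run starts from the biomass composition at the end of the previous run. The biomass data from each run are stored in a separate data frame, and the data frames are then concatenated for combined analysis.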
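
As a rough check on the phase lengths, the sketch below converts the number of simulated days in each run (900, 300, and 900, as in the next cell) into COMETS cycles and approximate generation ranges. It assumes the 1-hour time step set above and the notebook's conversion factor of 6.67 generations per day, and it ignores the one-cycle offset used for the first phase.

```python
TIME_STEP_H = 1.0            # 'timeStep' parameter, hours per cycle
GENERATIONS_PER_DAY = 6.67   # conversion factor used in the notebook
cycles_per_day = 24.0 / TIME_STEP_H

phases = []                  # (label, max_cycles, start_generation, end_generation)
start = 25000.0
for label, days in [("phase 1", 900), ("phase 2", 300), ("phase 3", 900)]:
    max_cycles = int(days * cycles_per_day)
    end = start + GENERATIONS_PER_DAY * days
    phases.append((label, max_cycles, start, end))
    print(f"{label}: {max_cycles} cycles, generations {start:.0f} to {end:.0f}")
    start = end
```

With a 0.1-hour time step, `cycles_per_day` would be 240 and every run would need ten times as many cycles, which is why the coarser step is used here.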

In [5]:
cycles_per_day = 24.0/b_params.all_params['timeStep']

# We'll start at Generation 25,000.  and run for around 6000 generations
batch_test = c.comets(layout, b_params)
batch_test.parameters.all_params['maxCycles'] = int(900*cycles_per_day)
batch_test.run()
phase_1 = pd.DataFrame({'Ancestor' : batch_test.total_biomass.Ancestor/(3.9e-13),
                        'CitT' : batch_test.total_biomass['Cit+']/(3.9e-13),
                        'CitTdctA' : batch_test.total_biomass['Cit++']/(3.9e-13),
                        'Generations' : 6.67*(batch_test.total_biomass.cycle+1)/cycles_per_day + 25000})

# At roughly Generation 31,000 we introduce the CitT genotype and run for around 2000 Generations
batch_test.layout.models[0].initial_pop = [0, 0, float(batch_test.total_biomass.Ancestor.iloc[-1])]
batch_test.layout.models[1].initial_pop = [0, 0, 3.9e-11]
batch_test.layout.build_initial_pop()
batch_test.parameters.set_param('maxCycles', int(300*cycles_per_day))
batch_test.run()
phase_2 = pd.DataFrame({'Ancestor' : batch_test.total_biomass.Ancestor/(3.9e-13),
                        'CitT' : batch_test.total_biomass['Cit+']/(3.9e-13),
                        'CitTdctA' : batch_test.total_biomass['Cit++']/(3.9e-13),
                        'Generations' : 6.67*(batch_test.total_biomass.cycle)/cycles_per_day + max(phase_1.Generations)})

# At roughly Generation 33,000 we introduce the CitTdctA genotype and run for a final 6000 generations
batch_test.layout.models[0].initial_pop = [0, 0, float(batch_test.total_biomass.Ancestor.iloc[-1])]
batch_test.layout.models[1].initial_pop = [0, 0, float(batch_test.total_biomass['Cit+'].iloc[-1])]
batch_test.layout.models[2].initial_pop = [0, 0, 3.9e-11]
batch_test.layout.build_initial_pop()
batch_test.parameters.set_param('maxCycles', int(900*cycles_per_day))
batch_test.run()
phase_3 = pd.DataFrame({'Ancestor' : batch_test.total_biomass.Ancestor/(3.9e-13),
                        'CitT' : batch_test.total_biomass['Cit+']/(3.9e-13),
                        'CitTdctA' : batch_test.total_biomass['Cit++']/(3.9e-13),
                        'Generations' : 6.67*(batch_test.total_biomass.cycle)/cycles_per_day + max(phase_2.Generations) })


Running COMETS simulation ...


TypeError: 'int' object is not subscriptable

Combine the results of all three runs and plot the stationary-phase population size over time.

In [ ]:
#Remove the final timepoint from each phase
phase_1.drop(phase_1.tail(1).index, inplace=True)
phase_2.drop(phase_2.tail(1).index, inplace=True)
phase_3.drop(phase_3.tail(1).index, inplace=True)

final_df = pd.concat([phase_1,phase_2,phase_3])
final_df = final_df.reset_index(drop=True)  # give the combined frame a unique index
final_df = final_df[np.round((final_df.Generations - 25000) % 6.67,3) == 6.67]
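
The filter above relies on a floating-point modulo to pick one time point per daily growth cycle. One possible alternative, sketched below on synthetic data, is to select rows by the integer cycle counter instead. It assumes a 1-hour time step (24 cycles per day) and that the last cycle of each day is the stationary-phase point of interest. The data frame and its values are made up for illustration and are not COMETS output.

```python
import pandas as pd

CYCLES_PER_DAY = 24  # assumes timeStep = 1.0 h

# Synthetic stand-in for one phase of simulation output (values are made up).
phase = pd.DataFrame({
    "cycle": list(range(1, 73)),                        # three simulated days
    "Ancestor": [100.0 + c % 24 for c in range(1, 73)],
})

# Keep only the last cycle of each day, using exact integer arithmetic.
day_end = phase[phase["cycle"] % CYCLES_PER_DAY == 0].reset_index(drop=True)
day_end["day"] = day_end["cycle"] // CYCLES_PER_DAY
print(day_end)
```

Because the selection uses integers, it does not depend on how rounding errors accumulate in the `Generations` column.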