Problem using the built-in sampling #558
I'm having an issue with using the built-in sampling in cobrapy v0.7.0. Upon sampling my model, I get identical values for all of the samples within each of my reactions of interest. I ran almost identical code (same reaction bounds, same model, same sampling parameters) on my computer using MATLAB and the optGpSampler code available here and received the desired result of varied values within my reactions of interest. I have tried using both the achr and optgp solving methods in cobrapy. The reason I would like to continue using the Python version of this software rather than sticking with the MATLAB version that works is that I would like to use the compute server that I have access to, but I had problems setting up optGpSampler on the server.
    import cobra
    import pandas
    from cobra.flux_analysis.sampling import sample

    fluxes = pandas.read_excel("Flux_Values.xlsx")  # Load flux values from Excel sheet
    model = cobra.io.load_matlab_model("iCHOv1_gimme_final.mat", variable_name="gimmeK1")  # Load CHOK1 model

    for i in range(18):
        rxn_id = str(fluxes["Reaction"][i])
        rxn = model.reactions.get_by_id(rxn_id)
        flux_val = fluxes["Exchange rate(mmol/gDW/hr)"][i]
        if i < 6:  # First 6 reactions in the Excel sheet are constraints
            rxn.lower_bound = flux_val
            rxn.upper_bound = flux_val
        else:  # Last 12 reactions are the ones we want to sample
            # Set bounds to be +- 15% of the experimental seed value.
            # We are sampling uptake rates, so all of the exchange reactions will have negative fluxes
            rxn.lower_bound = 1.15*flux_val
            rxn.upper_bound = .85*flux_val

    model.objective = "biomass_cho_producing"
    sModel = sample(model, 1500, thinning=1000, processes=4)  # Sample 1500 points with a thinning factor of 1000, using 4 processes
    sModel.to_excel('./outputs/output_new.xlsx')  # Save flux values for later analysis
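The bound assignment in the loop above relies on the seed fluxes being negative: multiplying a negative value by 1.15 gives the more negative number, so it ends up as the lower bound. A minimal sketch of a sign-independent version follows. `bounds_around` is a hypothetical helper, not part of cobrapy, and the 15% default only mirrors the comment in the script.

```python
def bounds_around(value, fraction=0.15):
    """Return (lower, upper) bounds within +/- `fraction` of `value`.

    Hypothetical helper, not part of cobrapy. Taking min/max of the two
    candidates keeps lower <= upper whether `value` is negative (uptake)
    or positive (secretion).
    """
    candidates = (value * (1.0 - fraction), value * (1.0 + fraction))
    return min(candidates), max(candidates)


# Example with an invented uptake rate (not taken from the spreadsheet).
lower, upper = bounds_around(-2.0)
```

In the loop, the result could then be assigned as `rxn.lower_bound, rxn.upper_bound = bounds_around(flux_val)`, which would give the same bounds as the script for negative fluxes.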
    load('CELS201_6mr9dq_Additional_Files\cels201_mmc2_6mr9dq\iCHOv1_gimme_final.mat','gimmeK1'); %Load CHOK1 model
    [bounds, rxnList, raw] = xlsread('CalculatedfluxValues-AMBICExp1.xlsx', 'B2:C19'); %This is the same excel sheet as in the python code, I just renamed it
    for i = 1:length(rxnList)
        if (i < 7)
            gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i), 'b');
        else
            gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i)*1.15, 'l');
            gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i)*.85, 'u');
        end
    end
    model = changeObjective(gimmeK1,'biomass_cho_producing');
    sModel = optGpSampler(model,,1.5e3,1e3,4,'glpk',0);
As described above, the Python code returns an Excel sheet that has identical values for all of the samples within many reactions, including the reactions that I am trying to study.
I expect the sampled values within each reaction, other than the reactions that I have specifically constrained, to vary. This is the result I achieve with the MATLAB version of the code.
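To pin down which reactions show the problem, it can help to list every reaction whose samples never vary. The sketch below is one way to do that using only the standard library. `constant_reactions` and the toy data are invented for illustration. It assumes the samples are available as a mapping from reaction id to a list of fluxes; if the sampling result is a pandas DataFrame (as the `to_excel` call suggests), `sModel.to_dict(orient="list")` should produce such a mapping.

```python
import statistics


def constant_reactions(samples, tol=1e-9):
    """Return the sorted ids of reactions whose sampled values never vary.

    `samples` maps reaction id -> sequence of sampled fluxes. A reaction
    counts as constant when its population standard deviation is <= `tol`.
    """
    return sorted(
        rxn_id
        for rxn_id, values in samples.items()
        if statistics.pstdev(values) <= tol
    )


# Invented toy data standing in for a real sampling result.
toy = {
    "EX_fixed": [-1.0, -1.0, -1.0],
    "EX_free": [-2.1, -1.9, -2.3],
}
stuck = constant_reactions(toy)
```

Reactions that were fixed on purpose are expected to appear in this list; any of the sampled exchange reactions showing up there would point to the behaviour reported here.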
I will be happy to help debug, but I will need a minimal reproducible example. Do you think you could append your final model (with all the applied constraints)? To save the model you could run the following just before the call to
    from cobra.io import to_json

    with open("model.json", "w") as outfile:
        outfile.write(to_json(model))
The saved model (maybe zip it) can be posted here or you can mail it to me confidentially to
The OptGPSampler code you referenced does not support inhomogeneous problems natively, so you basically cannot have equality constraints different from zero and still sample efficiently. Have you checked whether the samples returned from that OptGPSampler respect your equality constraints for the first 6 reactions?
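A check along those lines could look like the following sketch, which compares every sample of a fixed reaction with the value it was pinned to. `constraint_violations`, the reaction ids and the numbers are all hypothetical. The samples are assumed to be a mapping from reaction id to a list of fluxes, and the tolerance is an arbitrary choice.

```python
def constraint_violations(samples, fixed, tol=1e-6):
    """Return {reaction id: worst absolute deviation} for violated constraints.

    `samples` maps reaction id -> sequence of sampled fluxes; `fixed` maps
    reaction id -> the flux value that reaction was pinned to. Reactions
    whose samples all lie within `tol` of the target are left out.
    """
    violations = {}
    for rxn_id, target in fixed.items():
        worst = max(abs(value - target) for value in samples[rxn_id])
        if worst > tol:
            violations[rxn_id] = worst
    return violations


# Invented toy data: R1 honours its fixed value, R2 drifts away from it.
toy_samples = {"R1": [0.5, 0.5, 0.5], "R2": [0.0, 0.1, -0.2]}
toy_fixed = {"R1": 0.5, "R2": 0.3}
bad = constraint_violations(toy_samples, toy_fixed)
```

An empty result would mean every fixed reaction stayed at its target in all samples.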
referenced this issue
Jul 25, 2017
Okay fixed now in #556. Helped me to simplify the inhomogeneous sampling a bit :D
For your model I get the following with the fix:
    In : from cobra.io import load_json_model

    In : mod = load_json_model("Downloads/model.json")

    In : from cobra.flux_analysis.sampling import OptGPSampler

    In : %time optgp = OptGPSampler(mod, 6)
    CPU times: user 6min 21s, sys: 56.6 s, total: 7min 18s
    Wall time: 2min 44s

    In : %time s = optgp.sample(100)
    CPU times: user 15.3 ms, sys: 35.7 ms, total: 51 ms
    Wall time: 1.16 s

    In : s.std().describe()
    Out:
    count    4.723000e+03
    mean     4.009833e+01
    std      8.795487e+01
    min      0.000000e+00
    25%      5.191518e-15
    50%      4.732379e-04
    75%      8.952801e-01
    max      5.478859e+02
    dtype: float64

    # So you see the constraints were respected :)
    In : s.biomass_cho_producing.head()
    Out:
    0    0.0193
    1    0.0193
    2    0.0193
    3    0.0193
    4    0.0193
    Name: biomass_cho_producing, dtype: float64
So now there is a lot of variance even with the constraints.