New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem using the built-in sampling #558

Closed
bkamino1 opened this Issue Jul 25, 2017 · 5 comments

Comments

Projects
None yet
2 participants
@bkamino1

bkamino1 commented Jul 25, 2017

I'm having an issue with using the built-in sampling in cobrapy v0.7.0. Upon sampling my model, I get identical values for all of the samples within each of my reactions of interest. I ran almost identical code (same reaction bounds, same model, same sampling parameters) on my computer using MATLAB and the optGpSampler code available here and received the desired result of varied values within my reactions of interest. I have tried using both the achr and optgp solving methods in cobrapy. The reason I would like to continue using the Python version of this software rather than sticking with the MATLAB version that works is that I would like to use the compute server that I have access to, but I had problems setting up optGpSampler on the server.

Code Sample

import cobra
import pandas

from cobra.flux_analysis.sampling import sample

fluxes = pandas.read_excel("Flux_Values.xlsx") #Load flux values from Excel sheet
model = cobra.io.load_matlab_model("iCHOv1_gimme_final.mat", variable_name="gimmeK1") #Load CHOK1 model

for i in xrange(18):
    rxn_id = str(fluxes["Reaction"][i])
    rxn = model.reactions.get_by_id(rxn_id)
    flux_val = fluxes["Exchange rate(mmol/gDW/hr)"][i]
    if i < 6: #First 6 reactions in the Excel sheet are contraints
        rxn.lower_bound = flux_val
        rxn.upper_bound = flux_val
    else: #Last 12 reactions are the ones we want to sample
          #Set the fluxes to be +- 15% of the experimental seed value
        #We are sampling uptake rates, so all of the exchange reactions will have negative fluxes
        #Set bounds to be +- 15% of experimental seed value
        rxn.lower_bound = 1.15*flux_val
        rxn.upper_bound = .85*flux_val

model.objective = "biomass_cho_producing"

sModel = sample(model, 1500, thinning=1000, processes=4)  #Sample fluxes for model with 1500 returned points and 500 samples for each returned point. Uses 4 processes

sModel.to_excel('./outputs/output_new.xlsx') #Save flux values for later analysis
load('CELS201_6mr9dq_Additional_Files\cels201_mmc2_6mr9dq\iCHOv1_gimme_final.mat','gimmeK1'); %Load CHOK1 model
[bounds, rxnList, raw] = xlsread('CalculatedfluxValues-AMBICExp1.xlsx', 'B2:C19'); %This is the same excel sheet as in the python code, I just renamed it
for i = 1:length(rxnList)
    if (i < 7)
        gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i), 'b');
    else
        gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i)*1.15, 'l');
        gimmeK1 = changeRxnBounds(gimmeK1, rxnList(i), bounds(i)*.85, 'u');
    end
end

model = changeObjective(gimmeK1,'biomass_cho_producing');
sModel = optGpSampler(model,[],1.5e3,1e3,4,'glpk',0);

Actual Output

As described above, the Python code returns and excel sheet that has identical values for all of the samples within many reactions, including the reactions that I am trying to study.

Expected Output

I expect the sampled values within each reaction, other than the reactions that I have specifically constrained, to vary. This is the result I achieve with the MATLAB version of the code.

Output of cobra.show_versions()

System Information

OS Linux
OS-release 2.6.32-573.22.1.el6.x86_64
Python 2.7.10

Package Versions

pip 9.0.1
setuptools 28.8.0
cobra 0.7.0
future 0.16.0
swiglpk 1.4.3
optlang 1.2.1
ruamel.yaml 0.14.12
pandas 0.19.2
numpy 1.12.1
tabulate 0.7.7
scipy 0.19.0
matplotlib 1.5.3

@cdiener

This comment has been minimized.

Show comment
Hide comment
@cdiener

cdiener Jul 25, 2017

Member

I will be happy to help debug, but I will need a minimal reproducible example. Do you think you could append your final model (with all the applied constraints)? To save the model you could run the following just before the call to sample:

from cobra.io import to_json

with open("model.json", "w") as outfile:
    outfile.write(to_json(model))

The saved model (maybe zip it) can be posted here or you can mail it to me confidentially to mail [at] cdiener.com.

The OptGPSampler code you referenced does not support inhomogeneous problems natively, so basically you can not have equality constraints different from zero and sample efficiently. Have you checked whether the samples returned from that OptGPSampler respect your equality constraints for the first 6 reactions?

Member

cdiener commented Jul 25, 2017

I will be happy to help debug, but I will need a minimal reproducible example. Do you think you could append your final model (with all the applied constraints)? To save the model you could run the following just before the call to sample:

from cobra.io import to_json

with open("model.json", "w") as outfile:
    outfile.write(to_json(model))

The saved model (maybe zip it) can be posted here or you can mail it to me confidentially to mail [at] cdiener.com.

The OptGPSampler code you referenced does not support inhomogeneous problems natively, so basically you can not have equality constraints different from zero and sample efficiently. Have you checked whether the samples returned from that OptGPSampler respect your equality constraints for the first 6 reactions?

@bkamino1

This comment has been minimized.

Show comment
Hide comment
@bkamino1

bkamino1 Jul 25, 2017

I've sent my model to your email address. Thank you!

bkamino1 commented Jul 25, 2017

I've sent my model to your email address. Thank you!

@bkamino1

This comment has been minimized.

Show comment
Hide comment
@bkamino1

bkamino1 Jul 25, 2017

I just re-ran my sampling code in MATLAB and the optGpSampler is not respecting the constraints that I put.

bkamino1 commented Jul 25, 2017

I just re-ran my sampling code in MATLAB and the optGpSampler is not respecting the constraints that I put.

@cdiener

This comment has been minimized.

Show comment
Hide comment
@cdiener

cdiener Jul 25, 2017

Member

I can reproduce your problem with the cobrapy salmonella model. Working on a fix...

Member

cdiener commented Jul 25, 2017

I can reproduce your problem with the cobrapy salmonella model. Working on a fix...

@cdiener

This comment has been minimized.

Show comment
Hide comment
@cdiener

cdiener Jul 25, 2017

Member

Okay fixed now in #556. Helped me to simplify the inhomogeneous sampling a bit :D

For your model I get the following with the fix:

In [1]: from cobra.io import load_json_model

In [3]: mod = load_json_model("Downloads/model.json")

In [4]: from cobra.flux_analysis.sampling import OptGPSampler

In [5]: %time optgp = OptGPSampler(mod, 6)
CPU times: user 6min 21s, sys: 56.6 s, total: 7min 18s
Wall time: 2min 44s

In [6]: %time s = optgp.sample(100)
CPU times: user 15.3 ms, sys: 35.7 ms, total: 51 ms
Wall time: 1.16 s

In [7]: s.std().describe()
Out[7]: 
count    4.723000e+03
mean     4.009833e+01
std      8.795487e+01
min      0.000000e+00
25%      5.191518e-15
50%      4.732379e-04
75%      8.952801e-01
max      5.478859e+02
dtype: float64

# So you see the constraints were respected :)
In [9]: s.biomass_cho_producing.head()
Out[9]: 
0    0.0193
1    0.0193
2    0.0193
3    0.0193
4    0.0193
Name: biomass_cho_producing, dtype: float64

So now there is a lot of variance even with the constraints.

Member

cdiener commented Jul 25, 2017

Okay fixed now in #556. Helped me to simplify the inhomogeneous sampling a bit :D

For your model I get the following with the fix:

In [1]: from cobra.io import load_json_model

In [3]: mod = load_json_model("Downloads/model.json")

In [4]: from cobra.flux_analysis.sampling import OptGPSampler

In [5]: %time optgp = OptGPSampler(mod, 6)
CPU times: user 6min 21s, sys: 56.6 s, total: 7min 18s
Wall time: 2min 44s

In [6]: %time s = optgp.sample(100)
CPU times: user 15.3 ms, sys: 35.7 ms, total: 51 ms
Wall time: 1.16 s

In [7]: s.std().describe()
Out[7]: 
count    4.723000e+03
mean     4.009833e+01
std      8.795487e+01
min      0.000000e+00
25%      5.191518e-15
50%      4.732379e-04
75%      8.952801e-01
max      5.478859e+02
dtype: float64

# So you see the constraints were respected :)
In [9]: s.biomass_cho_producing.head()
Out[9]: 
0    0.0193
1    0.0193
2    0.0193
3    0.0193
4    0.0193
Name: biomass_cho_producing, dtype: float64

So now there is a lot of variance even with the constraints.

hredestig added a commit to cdiener/cobrapy that referenced this issue Jul 26, 2017

fix: reset bug and reduce memory footprint
* Bug fixes

Gives a putative solution for recursion limit errors when using
`multiprocessing.Pool`. Apparently arguments can not be a `zip`
object. No idea why though...

Fixes bugs related to inhomogeneous sampling. Turns out the reprojection
strategy was overly complicated and did not work well since it spans
very little of the space. As a consequence inhomogeneous sampling often
returned identical samples (fixes opencobra#558). Using `delta = any_warmup -
center` as new direction already works for inhomogeneous problems since
`S*(any_warmup - center) = b - b = 0`. This spans the search space much
better and gives a much better distribution.

* Minor fixes

The sampling problem now stores the nullspace rather than the entire
projection (N * N.T) which reduces the memory footprint since it uses
memory in the order of `2 * n_reaction * n_metabolites` instead of `4 *
n_reactions^2`. Does not seem to influence sampling speed at all.

Log some minor issues the sampler resolves automatically as info for
better debugging.

opencobra#556

hredestig added a commit to cdiener/cobrapy that referenced this issue Jul 26, 2017

fix: inhomogeneous sampling
* Bug fixes

Gives a putative solution for recursion limit errors when using
`multiprocessing.Pool`. Apparently arguments can not be a `zip`
object. No idea why though...

Fixes bugs related to inhomogeneous sampling. Turns out the reprojection
strategy was overly complicated and did not work well since it spans
very little of the space. As a consequence inhomogeneous sampling often
returned identical samples (fixes opencobra#558). Using `delta = any_warmup -
center` as new direction already works for inhomogeneous problems since
`S*(any_warmup - center) = b - b = 0`. This spans the search space much
better and gives a much better distribution.

* Minor fixes

The sampling problem now stores the nullspace rather than the entire
projection (N * N.T) which reduces the memory footprint since it uses
memory in the order of `2 * n_reaction * n_metabolites` instead of `4 *
n_reactions^2`. Does not seem to influence sampling speed at all.

Log some minor issues the sampler resolves automatically as info for
better debugging.

opencobra#556

hredestig added a commit that referenced this issue Jul 26, 2017

fix: reset bug and reduce memory footprint
* Bug fixes

Gives a putative solution for recursion limit errors when using
`multiprocessing.Pool`. Apparently arguments can not be a `zip`
object. No idea why though...

Fixes bugs related to inhomogeneous sampling. Turns out the reprojection
strategy was overly complicated and did not work well since it spans
very little of the space. As a consequence inhomogeneous sampling often
returned identical samples (fixes #558). Using `delta = any_warmup -
center` as new direction already works for inhomogeneous problems since
`S*(any_warmup - center) = b - b = 0`. This spans the search space much
better and gives a much better distribution.

* Minor fixes

The sampling problem now stores the nullspace rather than the entire
projection (N * N.T) which reduces the memory footprint since it uses
memory in the order of `2 * n_reaction * n_metabolites` instead of `4 *
n_reactions^2`. Does not seem to influence sampling speed at all.

Log some minor issues the sampler resolves automatically as info for
better debugging.

#556

hredestig added a commit that referenced this issue Jul 26, 2017

fix: inhomogeneous sampling
* Bug fixes

Gives a putative solution for recursion limit errors when using
`multiprocessing.Pool`. Apparently arguments can not be a `zip`
object. No idea why though...

Fixes bugs related to inhomogeneous sampling. Turns out the reprojection
strategy was overly complicated and did not work well since it spans
very little of the space. As a consequence inhomogeneous sampling often
returned identical samples (fixes #558). Using `delta = any_warmup -
center` as new direction already works for inhomogeneous problems since
`S*(any_warmup - center) = b - b = 0`. This spans the search space much
better and gives a much better distribution.

* Minor fixes

The sampling problem now stores the nullspace rather than the entire
projection (N * N.T) which reduces the memory footprint since it uses
memory in the order of `2 * n_reaction * n_metabolites` instead of `4 *
n_reactions^2`. Does not seem to influence sampling speed at all.

Log some minor issues the sampler resolves automatically as info for
better debugging.

#556
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment