Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error writing to sbml without prefixing identifiers #883

Open
Karrenbelt opened this issue Jul 25, 2019 · 18 comments
Open

error writing to sbml without prefixing identifiers #883

Karrenbelt opened this issue Jul 25, 2019 · 18 comments
Assignees
Labels
SBML Related to reading and writing SBML models.

Comments

@Karrenbelt
Copy link

Karrenbelt commented Jul 25, 2019

Trying to prevent cobra from prefixing my identifiers when writing SBML, I got this (using the textbook model, model history error is unrelated):

cobra.io.write_sbml_model(model, 'test.xml', f_replace={})

Error encountered trying to <set model history>.
LibSBML error code -5: The object passed as an argument to the method is not of a type that is valid for the operation or kind of object involved. For example, handing an invalidly-constructed ASTNode to a method expecting an ASTNode will result in this error.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/venv/lib/python3.6/site-packages/cobra/io/sbml.py", line 831, in write_sbml_model
    doc = _model_to_sbml(cobra_model, f_replace=f_replace, **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/venv/lib/python3.6/site-packages/cobra/io/sbml.py", line 1059, in _model_to_sbml
    gpa.setAssociation(gpr_new)
UnboundLocalError: local variable 'gpr_new' referenced before assignment

replacing gpr_new with gpr in these lines should fix this

@matthiaskoenig
Copy link
Contributor

matthiaskoenig commented Jul 25, 2019

I basically fixed this yesterday in the devel branch as part of other bugfixes ;). Please try the devel version.
Do in your virtualenv

(cobra) git clone https://github.com/opencobra/cobrapy.git
(cobra) cd cobrapy
(cobra) git checkout devel
(cobra) pip install -e .

This will be resolved with the next release.

@Karrenbelt
Copy link
Author

Karrenbelt commented Jul 25, 2019

it writes, but reading back it shows to be erroneous

cobra.io.read_sbml_model('test.xml')

Traceback (most recent call last):
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 210, in read_sbml_model
    **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 381, in _sbml_to_model
    sid = _check_required(specie, specie.getIdAttribute(), "id")
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 1194, in _check_required
    raise CobraSBMLError(msg)
cobra.io.sbml.CobraSBMLError: Required attribute 'id' cannot be found or parsed in '<Species "3-Phospho-D-glyceroyl phosphate">'. with name '3-Phospho-D-glyceroyl phosphate'
None
Traceback (most recent call last):
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 210, in read_sbml_model
    **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 381, in _sbml_to_model
    sid = _check_required(specie, specie.getIdAttribute(), "id")
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 1194, in _check_required
    raise CobraSBMLError(msg)
cobra.io.sbml.CobraSBMLError: Required attribute 'id' cannot be found or parsed in '<Species "3-Phospho-D-glyceroyl phosphate">'. with name '3-Phospho-D-glyceroyl phosphate'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 217, in read_sbml_model
    "Something went wrong reading the SBML model. Most likely the SBML"
cobra.io.sbml.CobraSBMLError: Something went wrong reading the SBML model. Most likely the SBML model is not valid. Please check that your model is valid using the `cobra.io.sbml.validate_sbml_model` function or via the online validator at http://sbml.org/validator .
	`(model, errors) = validate_sbml_model(filename)`
If the model is valid and cannot be read please open an issue at https://github.com/opencobra/cobrapy/issues .

same when using libsbml to read

doc = libsbml.readSBMLFromFile('test.xml')
doc.printErrors() # 16 in total

@Midnighter Midnighter reopened this Jul 25, 2019
@Midnighter Midnighter added the SBML Related to reading and writing SBML models. label Jul 25, 2019
@Midnighter
Copy link
Member

I think this is a different error and a duplicate of #879. Basically, the model you were using doesn't have correct SBML identifiers and those are dropped upon writing the model thus the model is broken on import.

@cdiener
Copy link
Member

cdiener commented Jul 25, 2019

Yeah, most common pitfall is that you have identifiers that start with numbers (common for metabolites) and this is not allowed in SBML (and is why we add a to IDs).

@matthiaskoenig
Copy link
Contributor

Not sure what to do here. There are clear error messages on writing the SBML that the identifiers are incorrect. We agreed on only reading valid SBML.
Only solution here is to change the invalid identifiers if you want to write them out.

We could be more drastic and just not create a SBML file at all on write errors like identifiers. Or restrict the allowed identifiers within cobrapy (nobody will like that most of all the bigg database).

Not sure what the issue is here. Please let me know if there is anything to fix.

@cdiener
Copy link
Member

cdiener commented Jul 26, 2019

I would say there is a slightly incorrect behavior in writing the initial SBML. It seems to have invalid identifiers but no error is given. Writing a SBML model with invalid identifiers should nope out and make it clear that this is not right.

@Midnighter
Copy link
Member

I agree, invalid identifiers is definitely an error that should not only be a logged error level message but actually raise an exception and not write the file.

@gregmedlock
Copy link
Member

Maybe we could steer users in the right direction with a function that checks SBML compliance prior to writing, so that what we end up putting in documentation is more like:

# check SBML compliance
cobra.io.check_sbml_model(model) # prints any compliance issues

# write the compliant SBML model
cobra.io.write_sbml_model(model)

check_sbml_model could even return a "validated model" that is the only acceptable input to write_sbml_model if we wanted to be really strict... check_sbml_model could also have output that reads more like a report than an error, telling the user what they need to fix to be compliant.

I think only writing fully valid SBML files, plus adding a checking function of some sort that we recommend use of prior to writing, is the way to go. For cases like BIGG, IMO identifiers should be changed on a case-by-case basis for SBML compliance and/or shared as json.

@matthiaskoenig matthiaskoenig self-assigned this Jul 29, 2019
@matthiaskoenig
Copy link
Contributor

My personal take on this is to just make the used ids SBML compliant, i.e., if somebody tries to set an id on a cobra model which starts with a number the user should get a warning/error and the replacement should be performed right there and then.

Having invalid SBML ids internally in cobra creates many issues, e.g.,

  • things like tab completion are not working (Prepend underscores for tab completion of IDs that start with a number #828)
  • mapping data on networks becomes cumbersome, because one has to keep track of two set of ids (one internally in cobrapy and working with python, a second set of ids as soon as using the SBML model in a different tool; this hampers interoperability; makes it complicated to do simple things like hashmap lookups, because one has to perform possible id replacements everywhere and so on)
  • SBML export becomes slower and much more complicated (because one has to keep track of id replacements all the way through the model)
  • default SBML export behavior becomes strange (why are ids replaced by default)? I look for my cobrapy id in my SBML model but cannot find it, because strange replacements were made.

I would prefer the following behavior:

  • enforce (or highly recommend the use of) SBML compliant ids internally in cobrapy
  • add a convenience function allowing to make all ids SBML compliant
  • SBML export with invalid SBML ids does raise an error
  • id replacement is not the norm on SBML export any more (but can be enabled if people want this, for instance if some tools expect M_, R_ or G_ prefixes)

@cdiener
Copy link
Member

cdiener commented Jul 29, 2019

@gregmedlock I think this woul be great but may be hard to implement given the sheer size of the SBML spec. The only way that seems simple would be to write the broken model and use a validator on the written SBML file (maybe @matthiaskoenig) knows if there is functionality for that in libSBML (something like the online validator).

@matthiaskoenig That sounds good. I think due to backwards compatibility we can't enforce those IDs but we could nudge users to use good ones (even though a warning may be too strong since non-SBML compliant IDs are valid in other export formats such as JSON and YAML). The rest sounds great too me (error on write and optional replacement).

@gregmedlock
Copy link
Member

@cdiener agreed--I should have been more specific. The proposed function would only check compliance for SBML aspects that are already checked when writing the file via libSBML (e.g., IDs).

The idea is that users should be free to modify IDs for their own use, but should hit a hard roadblock when they attempt to write SBML without making the IDs compliant. Of course we should encourage compliance from the beginning, but I think throwing errors when a user sets a non-compliant ID is going to scare people away from the package. In my experience, most people struggle to understand the dire need for SBML when they first start using COBRA tools. For that reason, I favor warnings over errors when non-compliant IDs are set (and when a model containing non-compliant IDs is read). E.g., the suggestions by @matthiaskoenig with strong preference for warnings over errors when IDs are first set/read.

@cdiener
Copy link
Member

cdiener commented Jul 30, 2019

@gregmedlock Very much agreed. To clarify I would only throw an error if the user enforces no prefixes and has invalid SBML IDs. The default would still be to add prefixes if the IDs are not compliant and just show a warning.

@akaviaLab
Copy link
Contributor

I ran across this problem or a similar one. What about implementing something similar to https://github.com/opencobra/cobratoolbox/blob/develop/src/base/io/utilities/SBML/convertSBMLID.m
This is needed for various reasons, but since both read and write SBML call it, the user can't even tell that anything is happening.
If this is okay with you guys, I can take a stab at it.

@matthiaskoenig
Copy link
Contributor

@akaviaLab This is happening right now. Code is part of the current behavior and is already implemented.

We are only talking about the very special use case in this issue of "someone has invalid identifiers but does not want the id replacements on writing SBML. What should we do in this case?"

@matthiaskoenig
Copy link
Contributor

@akaviaLab In addition, ID replacement is a nightmare for interoperability between tools. Whenever possible tools should highly encourage to use valid ids. It becomes very difficult to map datasets onto networks if the ids change between formats. E.g. I run a flux simulation in cobra and want to visualize this with the SBML network. The ids of the flux simulation do not match the ids of the SBML exchange format any more.

@Karrenbelt
Copy link
Author

naming conventions are more often a pain than they ought to be.
I agree with matthiaskoenig and cdiener, it is best to only throw an error in the special case someone does not want identifier replacement and thereby would create an invalid SBML model.

@akaviaLab
Copy link
Contributor

@matthiaskoenig - the code does not replace the GIDs when reading old-style SBML files that have the gene association in NOTES field, like Recon 2.2. See example file exampleModel.xml.gz, where if you read SBML and try to write SBML, you get an error because the gene IDs have ":".
I think the SBML needs a function like the MATLAB Cobratoolbox I linked to. It might work with GPRcleaner, but it would need to be extended to colons - I also think it is a good idea that the replacement is the same in Cobrapy and cobatoolbox.
If you'd like, I can open a different bug for this issue.

@Midnighter
Copy link
Member

If someone feels like mediating, coordinating with the Matlab folks to standardize on how identifiers are transformed could be a useful thing?

Other than that, I take from this issue that we need to:

  1. Do transformations for GPR read from notes.
  2. Raise an error upon writing invalid SBML.
  3. Anything else I missed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SBML Related to reading and writing SBML models.
Projects
None yet
Development

No branches or pull requests

6 participants