json-schema for json export #125

Closed
zakandrewking opened this issue Sep 19, 2014 · 14 comments

@zakandrewking
Contributor

To ensure the long-term viability of exported JSON models, it would be good to define a schema for them. json-schema lets you define these easily, and validators are available in a number of languages, including Python and JavaScript.

Escher will use json-schema for defining the Escher map file format. We can also consider .escher and .cobra filename extensions to clarify which file is which.

I can help out with this.

@zakandrewking
Contributor Author

For an update, I've decided not to use .escher filenames. Instead, every JSON map starts with an informative section, so you can see what the file is in a text editor. E.g.,

[ { "map_name": "my map", 
    "map_id": "6574684",  
    "map_description": "my long map description", 
    "homepage": "https://zakandrewking.github.io/escher", 
    "schema": "https://zakandrewking.github.io/escher/jsonschema/1-0-0#" },   
{ map data } ]
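
A rough sketch of how a reader could check that header before touching the map data, assuming the two-element `[header, map data]` layout from the example. The function name `read_map_header` is made up for illustration, and the map data is left empty here.

```python
import json


def read_map_header(text):
    """Return the header dict of a [header, map data] JSON map, or raise ValueError."""
    doc = json.loads(text)
    if not (isinstance(doc, list) and len(doc) == 2 and isinstance(doc[0], dict)):
        raise ValueError("expected a [header, map data] list")
    header = doc[0]
    if "schema" not in header:
        raise ValueError("header has no 'schema' key")
    return header


# Header values copied from the example; the map data is an empty placeholder.
example = json.dumps([
    {
        "map_name": "my map",
        "map_id": "6574684",
        "map_description": "my long map description",
        "homepage": "https://zakandrewking.github.io/escher",
        "schema": "https://zakandrewking.github.io/escher/jsonschema/1-0-0#",
    },
    {},
])

header = read_map_header(example)
print(header["map_name"], header["schema"])
```

Because the header is the first element, a tool can decide what kind of file it has without looking at the rest of the map.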

@aebrahim
Member

aebrahim commented Oct 7, 2014

I think that makes sense.

(Also FYI your schema url doesn't resolve)

@zakandrewking
Contributor Author

Yeah. It isn’t live yet.

@zakandrewking
Contributor Author

Actually, the link WAS wrong. Here's a correct one:

https://zakandrewking.github.io/escher/escher/jsonschema/1-0-0#

@aebrahim
Member

Speaking of this, can you tell me if I am on the right track with a spec file (still incomplete)?

{
    "title": "cobra schema",
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "description": "short name for a model (e.g. iAF1260)"
        },
        "description": {
            "type": "string",
            "description": "e.g. A Model of Escherichia Coli K-12 MG1655"
        },
        "metabolites": {
            "type": "array",
            "items": {
                "id": {
                    "type": "string"
                },
                "annotation": {
                    "type": "object"
                },
                "charge": {
                    "type": "integer"
                }
            }
        },
        "reactions": {
            "type": "array",
            "items": {
                "id": {
                    "type": "string"
                },
                "lower_bound": {
                    "type": "number"
                }
            }
        }
    }
}

@zakandrewking
Contributor Author

Yeah. That's the basic idea.

You might need the equivalents of these attributes on the top level for it to validate:

  "id": "https://zakandrewking.github.io/escher/escher/jsonschema/1-0-0#",
  "$schema": "http://json-schema.org/draft-04/schema#",

And for an array, you need to wrap the item properties in an "object", like this:

            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "lower_bound": {
                        "type": "number"
                    }
                }
            }
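
To show why the wrapper matters, here is a toy checker in plain Python. It handles only `type`, `properties`, and `items`, so it is an illustration rather than a JSON Schema implementation; a real validator should be used in practice. The names `check` and `TYPES` and the sample reactions are made up, and the sketch uses `number`, the JSON Schema type for floating-point values.

```python
# Toy checker covering only "type", "properties" and "items".
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
}


def check(instance, schema, path="$"):
    """Return a list of error strings; an empty list means nothing was flagged."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types
        is_bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, TYPES[expected]) or is_bool_as_number:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += check(instance[key], sub, f"{path}.{key}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


# Items wrapped in an object with "properties", as suggested.
wrapped = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "lower_bound": {"type": "number"},
        },
    },
}

# Items listing the fields directly, without the wrapper.
unwrapped = {
    "type": "array",
    "items": {
        "id": {"type": "string"},
        "lower_bound": {"type": "number"},
    },
}

good = [{"id": "R1", "lower_bound": -1000.0}]  # made-up sample data
bad = [{"id": 42, "lower_bound": "low"}]

print(check(good, wrapped))   # no errors
print(check(bad, wrapped))    # both fields are flagged
print(check(bad, unwrapped))  # nothing is flagged: the field rules are never reached
```

Without the wrapper, `id` and `lower_bound` sit where schema keywords are expected, so in this sketch they are never applied as property rules and bad data passes silently.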

@aebrahim
Member

Noted. Thanks.

@phantomas1234
Contributor

Ok, regarding #170, why would you have an annotation attribute for metabolites and not for reactions? SBML would also be ok but much harder to implement I guess. I generated a couple of universal models from metanetx and using cobrapy's reaction.annotation was pretty useful in that respect. Is there any specific reason to not include this in the JSON spec?

@aebrahim
Member

There isn't really a good reason not to include it in the JSON spec, other than "laziness." If it makes your life easier, we can include this in the JSON format (from the cobrapy perspective, not sure about the escher perspective).

I brought up SBML for storage of these efforts for a kind of different reason: in my quest to make cobrapy more "maintainable", I wanted to focus on specific use cases for each of the supported I/O formats to avoid duplication of effort and features:

SBML - general purpose storage of models
JSON - working with escher
mat - working with cobra toolbox (until it supports SBML3+fbc2)
pickle - specific cobrapy-only use cases

If we go with this, it would make sense for SBML to be the place where features like this get supported. While working with SBML files and all their various flavors is often a pain (and getting support for features we need will probably take way too much time and effort), the unfortunate fact is that it is the standard for modeling, and cobra should probably "play ball" and use it.

I'd like to know your general thoughts on this, @zakandrewking and @phantomas1234

@phantomas1234 What are your use cases for reaction.annotation? Was it something like reaction.annotation["metanetx_id"] = 10343? Adding support for something like that to SBML shouldn't be too hard (and something I was planning on implementing anyways), but for arbitrary expressions you are right that SBML support will be much harder.

@phantomas1234
Contributor

Laziness doesn't really count though because I served you the PR on a golden platter 😃

@aebrahim I guess I was not aware that the JSON schema was primarily intended for escher (given that bigg.ucsd.edu currently doesn't serve SBML via the API I thought the JSON models would be a good way to serialize models). In principle I agree that SBML should be the standard for storing models (if they can be read fast enough). pickles are just too large and too slow to read. I found the JSON files to be very convenient for non-sharing serialization. Anyways, I just hacked my way around it by adding annotation myself to _to_dict(model) for the few models I wanted to store. I will try writing them to SBML+FBC and test your parsers. Doesn't really solve the problem with the annotations though ... I don't think it would be too hard to translate the dict content of reaction.annotation into some well formatted xml that can be placed into the SBML annotation tags (see screenshot). Andreas D. would be a good person to ask for how to proceed.
[image: screenshot of SBML annotation tags]
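
A very rough sketch of that dict-to-XML idea, using only the standard library. The `entry` element and `key` attribute are hypothetical placeholders, not the layout SBML actually expects inside its annotation tags; that would need to be settled with the SBML folks. The sample value reuses the `metanetx_id` example from the previous comment.

```python
import xml.etree.ElementTree as ET


def annotation_to_xml(annotation):
    """Serialize a flat annotation dict into an <annotation> element (hypothetical layout)."""
    root = ET.Element("annotation")
    for key in sorted(annotation):
        entry = ET.SubElement(root, "entry", {"key": key})
        entry.text = str(annotation[key])
    return root


def xml_to_annotation(root):
    """Inverse of annotation_to_xml; every value comes back as a string."""
    return {entry.get("key"): entry.text for entry in root.findall("entry")}


elem = annotation_to_xml({"metanetx_id": 10343})
print(ET.tostring(elem, encoding="unicode"))
print(xml_to_annotation(elem))
```

Note that the round trip loses types (the integer comes back as a string), which is one of the things a proper mapping would have to decide on.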

@aebrahim
Member

Yeah I'm just that lazy :) In all seriousness though, Zak and I have talked about updating the JSON format anyways, and I'd like to do them at the same time (so the format only has to change once), so this wouldn't get merged until that is finalized.

SBML I/O is now a lot faster than it used to be with libsbml (by a factor of 2-3x for large models like iJO1366).

And yes, I definitely intend to consult Andreas on how to do this for SBML. Can you send me an example of the type of annotation you have with your models?

@aebrahim
Member

Also, I don't think the new bigg serves JSON models via the api. They're treated just like the SBML models (static exported files).

@zakandrewking
Contributor Author

Some comments:

  • JSON will be useful for Escher, and also any other web applications that rely on Escher or BiGG. JSON is going to be far easier to work with on the web.
  • BiGG does serve JSON right now in the api (e.g. http://bigg.ucsd.edu/api/v2/models/iJO1366/download)
  • Users will expect that the JSON format includes the content from all attributes for reaction, metabolite, and gene objects in the COBRApy model. If we ignore any of these attributes, people (a.k.a. me) will get confused. Are there any attributes we currently ignore in the export?
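
One way to answer that last question might be to compare an object's public attributes with the keys in its exported dict. This is only a sketch: `FakeReaction` is a stand-in class with made-up values, not the COBRApy one, and `ignored_attributes` is a hypothetical helper.

```python
class FakeReaction:
    """Stand-in for a model object; not the COBRApy class."""

    def __init__(self):
        self.id = "R1"
        self.name = "example reaction"
        self.lower_bound = -1000.0
        self.annotation = {"metanetx_id": 10343}


def ignored_attributes(obj, exported):
    """List public instance attributes of obj that have no key in the exported dict."""
    return sorted(
        name for name in vars(obj)
        if not name.startswith("_") and name not in exported
    )


# Pretend export that leaves out the annotation.
exported = {"id": "R1", "name": "example reaction", "lower_bound": -1000.0}
print(ignored_attributes(FakeReaction(), exported))
```

Here the helper would report `annotation` as the attribute missing from the export.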

@aebrahim
Member

OK. So looks like this should definitely get merged.

aebrahim added a commit to aebrahim/cobrapy that referenced this issue Jul 27, 2015
The defined format changed a few things in a compatible way by marking
some attributes as optional.

This fixes opencobra#125
aebrahim added a commit to aebrahim/cobrapy that referenced this issue Jul 28, 2015
The defined format changed a few things in a compatible way by marking
some attributes as optional.

This fixes opencobra#125 and replaces opencobra#170