Import

This page describes the import actions available under the File menu.

Content

File
- Import...
  - Peptide
  - KGML
  - Cross-references (navigates to Tools)
- Import Excel
- Import model-SEED
- Import SBML
- Import ENA Genome
Importing BioPax (via BioPax2SBML)

Importing SBML

If no reconstruction is open, create a new reconstruction (see Creating Metabolic Entities)
Select the menu option File > Import SBML
Choose the file you wish to import and click Open
Configure any import options
Configure any unidentified compartments

Options

merge entries

Unidentified Compartments

Metingear will attempt to resolve each compartment name to one of several defined compartments. If a compartment cannot be identified or is ambiguous, a popup will interrupt the loading of the file and ask you to select a definition for the unidentified compartment.

Below is an example where a model has a compartment named cell, which is too general for the current definitions. A list of possible compartments is displayed with the appropriate GO terms, and in this case cytoplasm is selected.

Selecting SBML Compartments

Another common case is a model with the compartments in and out. For these simple cases it suffices to annotate the compartments as cytoplasm and extracellular, respectively.

Currently Metingear does not define a boundary compartment.

imported data

The following details how basic data is converted.

sbml:name is loaded as a metabolite/reaction name
sbml:id is loaded as a metabolite/reaction id
sbml:metaid is loaded as a metabolite/reaction abbreviation
annotations with MIRIAM URNs (e.g. urn:miriam:kegg.compound:C00009) and identifiers.org URLs (e.g. http://identifiers.org/kegg.compound/C00009/) are loaded as cross-references of the relevant species
annotations with InChIs from rdf.openmolecules.net (e.g. http://rdf.openmolecules.net?InChI=1/CH4/h1H4) are loaded as InChI annotations (link)
comments are loaded as Comment annotations
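
The two cross-reference styles listed above can be told apart with simple pattern matching. The following is a minimal sketch in Python and not Metingear's own code: the function name parse_cross_reference and both regular expressions are assumptions made for illustration, and accessions that themselves contain a slash are not handled.

```python
import re
from urllib.parse import unquote

# Illustrative patterns for the two annotation styles described above.
# These are assumptions for this sketch, not taken from Metingear.
MIRIAM_URN = re.compile(r"^urn:miriam:(?P<resource>[^:]+):(?P<accession>.+)$")
IDENTIFIERS_URL = re.compile(
    r"^https?://identifiers\.org/(?P<resource>[^/]+)/(?P<accession>[^/]+)/?$"
)


def parse_cross_reference(uri):
    """Return (resource, accession) for a recognised URI, otherwise None."""
    uri = uri.strip()
    for pattern in (MIRIAM_URN, IDENTIFIERS_URL):
        match = pattern.match(uri)
        if match:
            # MIRIAM URNs may percent-encode characters such as ':'.
            return match.group("resource"), unquote(match.group("accession"))
    return None


if __name__ == "__main__":
    for example in (
        "urn:miriam:kegg.compound:C00009",
        "http://identifiers.org/kegg.compound/C00009/",
    ):
        print(example, "->", parse_cross_reference(example))
```

Both example URIs from the list above resolve to the same resource and accession pair, which is why they end up as the same kind of cross-reference.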

Importing Microsoft Excel Worksheets

Models/Networks in a worksheet are unstructured and thus additional knowledge of the data type and location is required. Metingear provides a dialog wizard to guide the user through the import process.

If no reconstruction is open, create a new reconstruction (see Creating Metabolic Entities)
Select the menu option File > Import Excel
Choose the file you wish to import and click Open. Currently only .xls files are supported; .xlsx files are not.
Select which sheets list the reactions/metabolites - a guess is attempted but some worksheets may contain multiple reaction sheets (e.g. internal and exchange reactions).
Proceed to the next step
Select a range covering a continuous block of data (i.e. no blank lines or separator rows)
Select which columns in the reaction sheet refer to the predefined types
Proceed to the next step
Select a range covering a continuous block of data (i.e. no blank lines or separator rows)
Select which columns in the metabolite sheet refer to the predefined types
Proceed to the next step
Click finish to import the reconstruction

selection dialog

When a range is selected, only data between the given rows is imported. Rows outside the range are shown in grey in the preview at the bottom of the dialog and will not be loaded.

Range

There are several other actions within the dialog. The following diagram depicts each action (red) and, where the effect is subtle, its result (green).

Wizard

imported data

Because the data is unstructured, one reconstruction may contain more information than another. At its core, all that is required is a well-formatted reaction equation (see Creating Metabolic Entities) and unique metabolite identifiers. The following lists all importable data; required fields are marked [required].

reaction table (basic)

equation: describes the participants of a reaction, referring to each by metabolite abbreviation or name [required]
name: loaded as the name of the reaction [optional]
abbreviation/id: loaded as the reaction abbreviation [optional]
locus, subsystem, references: loaded as direct annotations [optional]
classification: parsed and matched to either an Enzyme or Transport Classification number

metabolite table

abbreviation/id: loaded as the metabolite abbreviation [required]
name: loaded as the metabolite name [optional]
charge: loaded as the formal charge annotation on the metabolite [optional]
molecular formula: loaded as the molecular formula annotation on the metabolite
compartment: sometimes the compartment is specified in the metabolite table; providing this option will override any compartments identified within the reaction equation [optional]
KEGG, ChEBI and PubChem: loaded as cross-references to the specified resource

reaction table (advanced)

gibbs free energy/error: loaded as a single annotation of the Gibbs free energy with an error range [optional]
direction: indicates the direction of a reaction; only required if the direction is not specified in the reaction equation (e.g. a + b = c + d has no direction). If no direction can be identified, the direction is loaded as unknown [optional]
flux bounds: the lower/upper flux bounds of a reaction are loaded as the appropriate annotations [optional]

Please note: gene and protein tables are not yet imported (this is planned), but if locus information is provided on a reaction then these links can be resolved after import.

Importing KGML

If no reconstruction is open, create a new reconstruction (see Creating Metabolic Entities)
Select the menu option File > Import > KGML
Choose the file you wish to import and click Open

imported data

The KGML file only provides the compound and reaction identifiers. As such, the compound identifier is loaded as the id, name and abbreviation, and as a cross-reference. The cross-reference can then be used to transfer information from the compound entry. The reaction participants are preserved but, as with the compounds, no human-readable names are loaded.
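
To illustrate why only identifiers are available, the sketch below reads a small KGML fragment with Python's standard xml.etree.ElementTree module. It is not Metingear's implementation. The fragment is hand-written in the KGML layout for illustration only, the function name read_kgml and the dictionary keys are assumptions, and only the first name listed on an element is used.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the KGML layout, for illustration only.
SAMPLE_KGML = """
<pathway name="path:map00010" org="map" number="00010">
  <entry id="1" name="cpd:C00031" type="compound"/>
  <entry id="2" name="cpd:C00092" type="compound"/>
  <reaction id="3" name="rn:R00299" type="irreversible">
    <substrate id="1" name="cpd:C00031"/>
    <product id="2" name="cpd:C00092"/>
  </reaction>
</pathway>
"""


def accession(name):
    """Strip the database prefix from the first name, e.g. 'cpd:C00031' -> 'C00031'."""
    return name.split()[0].split(":", 1)[-1]


def read_kgml(text):
    """Collect compound and reaction identifiers from a KGML string."""
    root = ET.fromstring(text)
    compounds = []
    for entry in root.iter("entry"):
        if entry.get("type") == "compound":
            acc = accession(entry.get("name"))
            # The identifier is all KGML offers, so it fills every field.
            compounds.append({"id": acc, "name": acc, "abbreviation": acc,
                              "xref": ("kegg.compound", acc)})
    reactions = []
    for reaction in root.iter("reaction"):
        reactions.append({
            "id": accession(reaction.get("name")),
            "substrates": [accession(s.get("name")) for s in reaction.findall("substrate")],
            "products": [accession(p.get("name")) for p in reaction.findall("product")],
        })
    return compounds, reactions


if __name__ == "__main__":
    compounds, reactions = read_kgml(SAMPLE_KGML)
    print(compounds)
    print(reactions)
```

Every compound field carries the same accession, which mirrors the behaviour described above: readable names have to be transferred afterwards through the cross-reference.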

Importing model-SEED Excel Spreadsheets

Reconstructions from the popular model-SEED can be imported without selecting where the data is. With an active project (see Creating Metabolic Entities) the menu item File > Import model-SEED (xls) will open a file chooser. Selecting the desired file will import all available data on metabolites and reactions.

Importing genomes from the European Nucleotide Archive (ENA)

Important: only complete genes are fully supported. If a genome has genes split across multiple contigs, an error may occur when it is loaded.

Genome data can be imported from ENA .xml files into an active project. These genomes can be downloaded from http://www.ebi.ac.uk/genomes/ by selecting the download option.

Downloading ENA XML

Metingear will import the genome sequence across the marked genes as well as the protein sequence. Any recognised cross-references (e.g. UniProt, InterPro) are converted to cross-references on the relevant genes and proteins.

Importing FASTA format

FASTA formatted sequences are imported as empty proteins into an active project. The FASTA identifier (everything before the first space) is imported as the protein id and the rest is loaded as the name.

>id a longer description of the protein 
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFIL

where id is imported as the id and a longer description of the protein is loaded as the protein name. Identifiers with resource prefixes are supported. For example, gb|AAD44166.1 will be loaded as a GenBank identifier.
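
The header rule described above (everything before the first space is the id, the rest is the name) can be sketched as follows. This is an illustrative Python example and not Metingear's parser: the function name parse_fasta, the record keys and the prefix table are assumptions, and only the gb prefix is taken from the text above.

```python
# Hypothetical prefix table; only "gb" is mentioned in the text above.
RESOURCE_PREFIXES = {"gb": "GenBank"}

SAMPLE = """>id a longer description of the protein 
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFIL
"""


def parse_fasta(text):
    """Split FASTA text into records holding id, name, xref and sequence."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Everything before the first space is the id, the rest is the name.
            identifier, _, name = line[1:].partition(" ")
            xref = None
            prefix, bar, rest = identifier.partition("|")
            if bar and prefix in RESOURCE_PREFIXES:
                # Keep the accession without a trailing '|' if one is present.
                xref = (RESOURCE_PREFIXES[prefix], rest.rstrip("|"))
            records.append({"id": identifier, "name": name.strip(),
                            "xref": xref, "sequence": ""})
        elif records:
            # Sequence lines are appended to the most recent header.
            records[-1]["sequence"] += line
    return records


if __name__ == "__main__":
    for record in parse_fasta(SAMPLE):
        print(record["id"], "|", record["name"], "|", len(record["sequence"]))
```

Whether the stored id keeps the resource prefix is not stated above, so this sketch leaves the id untouched and reports the recognised resource separately.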

Importing BioPax (via BioPax2SBML)

Metingear does not currently support BioPax import but using the online tool BioPax2SBML we can convert the owl file into something readable. Unfortunately the transition isn't seamless and there are a few hacks which need to be made.

This section demonstrates how to import from BioPax using a fragment from Rhea, an annotated reaction database (RHEA fragment download). With the file downloaded, unzip the owl file and upload it to BioPax2SBML.

upload

once the file has been uploaded

upload-run

Select the convert tool

sbml-tools

convert-config

When the conversion has finished, download the file and uncompress it.

convert-download

We now have to modify the XML slightly. SBML Level 3 does not support annotations (they need to be loaded by a module) and, unfortunately, that module isn't complete yet, so we need to manually downgrade the SBML version. You can still import Level 3 XML, but there will be no annotations. With the SBML file open in a text editor, change the opening <sbml tag.

Head of XML

<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!-- Created by BioPAX2SBML version 1.0 on 2013-04-11 17:39 with JSBML version 1.0-rc1. -->
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" qual:required="false" level="3" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1" version="1">
  <model id="SId_83805498" name="/rahome/webservices/galaxy/database/files/003/dataset_3078_files/10000-11219-rhea-biopax_lite.owl" metaid="meta_SId_83805498" timeUnits="time" substanceUnits="substance" volumeUnits="volume">
...

original tag

<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" qual:required="false" level="3" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1" version="1">

changed tag

<sbml xmlns="http://www.sbml.org/sbml/level2" qual:required="false" level="2" version="4">

the head of the file should now look like this

<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!-- Created by BioPAX2SBML version 1.0 on 2013-04-11 17:39 with JSBML version 1.0-rc1. -->
<sbml xmlns="http://www.sbml.org/sbml/level2" qual:required="false" level="2" version="4">
  <model id="SId_83805498" name="/rahome/webservices/galaxy/database/files/003/dataset_3078_files/10000-11219-rhea-biopax_lite.owl" metaid="meta_SId_83805498" timeUnits="time" substanceUnits="substance" volumeUnits="volume">
...

You can now import the file using Import SBML.
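
If there are many files to convert, the same manual edit can be scripted. The following is a minimal Python sketch, not part of Metingear or BioPax2SBML. The function names and the file names in the usage example are hypothetical, and the replacement tag is copied verbatim from the changed tag shown above.

```python
import re

# Replacement tag copied from the "changed tag" shown above.
LEVEL2_TAG = ('<sbml xmlns="http://www.sbml.org/sbml/level2" '
              'qual:required="false" level="2" version="4">')

# Matches the opening <sbml ...> tag but not the closing </sbml>.
SBML_OPEN_TAG = re.compile(r"<sbml\b[^>]*>")


def downgrade_sbml_header(xml_text):
    """Return xml_text with its opening <sbml> tag swapped for LEVEL2_TAG."""
    # A function replacement avoids any escape handling in the new tag.
    new_text, count = SBML_OPEN_TAG.subn(lambda match: LEVEL2_TAG, xml_text, count=1)
    if count != 1:
        raise ValueError("no opening <sbml> tag found")
    return new_text


def downgrade_file(source_path, target_path):
    """Read source_path, rewrite the header and write the result to target_path."""
    with open(source_path, encoding="utf-8") as handle:
        text = handle.read()
    with open(target_path, "w", encoding="utf-8") as handle:
        handle.write(downgrade_sbml_header(text))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    downgrade_file("rhea-level3.xml", "rhea-level2.xml")
```

Only the opening tag is touched; the rest of the file, including the model element and the closing tag, is written back unchanged.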
