Tools
This manual details the Tools menu items and their usage.
- Annotation
- Select Choke Points
- Merge Loci (use Associate Reactions to Gene Products)
- Remove Worst Structures
- Sequence Homology
- Transfer Functional Annotations
- Create Stoichiometric Matrix
- Structure
- Associate
- Merge
- File
Tools > Annotation > Automatic Cross-reference
active when one or more metabolites are selected
This tool will attempt to match a metabolite name to one of several prioritised resources. The tool proceeds in two stages. First, a search is performed to gather candidate matches. The candidates are retrieved from the resources in the order shown by Resource Priority. To widen this search stage an approximate match can be performed. The approximate match will recall more candidates but can take considerably longer. Second, once the candidates have been found, a direct match is attempted between the names. If no exact match is found the names are normalised *. Only if the names are identical after normalisation are they considered a match. Any matched entries are annotated as cross-references on the selected metabolite.
Resource Priority - indicates the order in which resources will be searched. For example, if we prioritised the resources as ChEBI, KEGG Compound and HMDB (as in our image) and found a match in ChEBI, we would not search for matches in KEGG Compound or HMDB. The ordering can be changed by dragging and dropping the names of resources. Resources can also be removed from the list by clicking the trash icon. If you delete a resource accidentally use the refresh symbol to repopulate the list. If no resources are visible you may need to load entries from one of several databases (see Resources) or use the web service option.
Greedy Mode - continue to look for matches even if a match was found in the highest priority resource. For our example dialog, if a match was found in ChEBI, searches would also be performed on HMDB and KEGG Compound. All matched references will be annotated on the selected metabolite.
Approximate Match - in the query stage more candidate names are retrieved. Due to some search heuristics this may find more matches.
Allow Web Services - this option is disabled by default. Metingear can use ChEBI and KEGG Compound web services to perform the search stage if no Resources are loaded. Due to the number of searches required it can take a long time to complete the operation for many metabolites.
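The matching stage and the Resource Priority / Greedy Mode options can be illustrated with a minimal Python sketch. This is not Metingear's implementation: the set of characters treated as chemical punctuation, the function names and the `resources` structure are all assumptions made for illustration, and the accessions in the usage note are placeholders.

```python
import re

# Assumption: "chemical punctuation" is taken here to mean hyphens, commas,
# brackets and apostrophes. The exact set Metingear strips is not listed
# in this manual.
CHEMICAL_PUNCTUATION = re.compile(r"[-,()\[\]{}'\s]")


def normalise(name):
    """Lower-case a name and strip chemical punctuation and spaces."""
    return CHEMICAL_PUNCTUATION.sub("", name.lower())


def names_match(query, candidate):
    """Try a direct match first, then compare the normalised names."""
    return query == candidate or normalise(query) == normalise(candidate)


def cross_reference(name, resources, greedy=False):
    """Search resources in priority order and collect matching accessions.

    `resources` is an ordered list of (resource_name, {accession: name})
    pairs, a hypothetical stand-in for the loaded Resources.
    """
    matches = []
    for resource_name, entries in resources:
        found = [(resource_name, accession)
                 for accession, candidate in entries.items()
                 if names_match(name, candidate)]
        matches.extend(found)
        if found and not greedy:
            break  # stop at the highest-priority resource with a match
    return matches
```

With two resources that both contain a matching name, `cross_reference("1-butene", resources)` returns only the hit from the first resource, while `greedy=True` returns the hits from both.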
\* to normalise the names they are converted to lower case and have _chemical punctuation_ and spaces removed. The removal of _chemical punctuation_ allows matching of names such as `1-butene` and `1,butene`. Both of these would be converted to `1butene`.
## Assign Flag Annotations
Tools > Annotation > Assign Flag Annotations
requires one or more selected entities
An annotation flag is a stateless logical annotation, that is to say it is either present on an entity or absent but does not have a value. These flags serve as markers of special entities in a model. Flags can be assigned manually or can be inferred from a given condition. Selecting this menu item will check all selected entities against each flag condition and assign the flag if the condition is met.
The list below outlines the current flags, their meaning and the condition that results in each flag being assigned.
- Lumped - this flag indicates metabolites which are lumped artificial metabolites. An example would be a metabolite like 'Protein Product' or 'average Fatty-Acid Composition'. These entities are an artefact of the granularity of the model and cannot be assigned a chemical structure. Usually these entries occur in or near the biomass equation. The Lumped flag is assigned if a metabolite has a molecular formula and there are more than 500 atoms present.
- ACP Associated - this flag indicates metabolites which are associated with the Acyl-Carrier-Protein. These metabolites can be ubiquitous in metabolic models and cannot be assigned a complete structure. Although a generic structure which annotates the 'ACP' as an alias atom can be used, it is often useful to treat such entities separately. If the metabolite name contains the letters `ACP` in uppercase this flag is assigned.
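The two flag conditions above can be sketched in Python as follows. This is only an illustration: the formula parser handles plain formulas such as `C6H12O6` (no brackets, charges or isotopes), and the function names are hypothetical rather than part of Metingear.

```python
import re

ATOM_THRESHOLD = 500  # from the manual: more than 500 atoms present


def atom_count(formula):
    """Count atoms in a simple molecular formula such as 'C6H12O6'."""
    total = 0
    for _symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(count) if count else 1
    return total


def infer_flags(name, formula=None):
    """Return the set of flag names whose condition is met."""
    flags = set()
    if formula and atom_count(formula) > ATOM_THRESHOLD:
        flags.add("Lumped")
    if "ACP" in name:  # case-sensitive: only uppercase 'ACP' counts
        flags.add("ACP Associated")
    return flags
```

A metabolite without a molecular formula can never be flagged as Lumped by this condition, which is why `formula` is optional here.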
Tools > Annotation > Transfer Chemical Structure from Cross-references
requires one or more selected metabolites
This utility will attach chemical structures to the selected metabolites. The chemical structures are assigned from the cross-references. As with Automatic Cross-reference, resources can be prioritised.
With one or more metabolites selected the menu item Tools > Annotation > Transfer Chemical Structure will open a dialog with several options.
Allow Web-services - allow the use of web-services to retrieve structures. If not selected, one or more resources must be loaded (see Resources) to use the utility. If no resources are visible in the Resource Priority then the utility will not work.
Greedy Mode - if selected, the chemical structures for all cross-references will be fetched. If de-selected only one structure will be fetched for each metabolite.
Resource Priority - indicates the order in which resources will be searched. For example, if we prioritised the resources as ChEBI, KEGG Compound and HMDB (as in our image) and found a match in ChEBI, we would not search for matches in KEGG Compound or HMDB. The ordering can be changed by dragging and dropping the names of resources. Resources can also be removed from the list by clicking the trash icon. If you delete a resource accidentally use the refresh symbol to repopulate the list. If no resources are visible you may need to load entries from one of several databases (see Resources) or use the web service option.
In this dialog only resources which are referenced in the selected metabolites and are loaded or have a web-service available will be listed. Given a set of metabolites that contain references to ChEBI and KEGG Compound, if ChEBI is loaded (as a Resource) but KEGG Compound is not, then only ChEBI will appear in the list. If we have metabolites that contain cross-references to HMDB, KEGG Compound, ChEBI, PubChem, and MetaCyc then the list will contain all resources (providing all resources are loaded and web-services is selected for PubChem).
## Curate Metabolite
Tools > Annotation > Curate Metabolite
requires a selection of one or more metabolites
This tool simplifies curation of chemical structure by providing several more manual methods to attach structures. The curation can be performed in batch, whereby you can skip to the next selected metabolite or skip all.
- Database Search - Search a number of resources and manually choose the best candidate cross-reference
- Assign Structure - Directly attach a chemical structure from MDL Mol, CML, InChI and SMILES
- Generate Peptide - Generate the desired dipeptide structure
- Manual Cross-reference - Manually add a cross-reference
- Web Search - Search the internet for the given name (useful for copying InChI/SMILES records across)
This widget allows a user to manually select the best entry from searching multiple databases.
The top of the widget displays the current match (the selected entry in the table on the left, or the first entry). The match indicator shows how well the name, formula and charge of the entity (left) match the candidate (right). It follows a traffic light scheme of green -> good, orange -> okay and red -> bad. It may show a matching metabolite with an exact name match but a different formula and protonation state. This is the case in the example. The formula of 4-Phospho-L-aspartate is orange as it matches when the charge difference is considered. If the formula is completely different then it is coloured red. In this case there is a better match with CHEBI:57535. You can check this match by selecting it in the table.
To assign this cross-reference you must click the Assign button below the resource list. This allows multiple cross-references to be assigned.
By default the search will only query the resource at the top of the list. You can change which resource is searched by reordering the list. If we drag KEGG Compound to the top we now have one candidate. Note that you may need to reselect the text field above the resource list to trigger a new search.
In this case it looks like the ChEBI entry would fit better. We can also search KEGG for an approximate match by selecting this option. This will produce many more candidates. As you can see, selecting one which is not right will turn all match indicators red.
The text field above the resource list is the actual query being searched. Sometimes an entity may have a typo in its name, in which case it is beneficial to change the query; in most cases, however, this isn't needed.
#### Assign Structure
The second widget allows you to attach a structure to the metabolite in various formats. The format will be auto detected but it can also be selected in the top left.
The InChI and SMILES line notations can easily be copied from chemical databases or a Wikipedia ChemBox.
It is also possible to paste MDL Mol or CML input. This input can be obtained from chemical structure editors such as MarvinSketch (ChemAxon) or JChemPaint. When editing a structure in MarvinSketch the default copy format is CML which can be directly pasted into the text area.
MDL Mol and CML are preferable to SMILES/InChI as they maintain the structure diagram (i.e. atom coordinates). See Generate Structure Diagram for how to generate a structure diagram when one is not available.
#### Generate Peptide
This curation section allows you to generate a poly-peptide for a selected metabolite. For the example below the name `ser-ser-ser-ser` was provided. You can see under the Generate Peptide section the form has been filled out with the correct amino-acid residues. By default the L stereo form will be selected.
You can add a residue at any position by clicking the green (+) icon. With a new residue added, we can select the type of residue.
You can also remove residues by clicking the red (-) icon. When you have configured your peptide click Okay at the bottom right of the dialog.
A structure for your peptide chain will be generated and added to the selected entry.
#### Manual Cross-reference
This widget allows you to manually specify cross-references. The input is identical to how cross-references are entered in the metabolite table (see Editing).
#### Web Search
This widget launches an internet search in your default web browser and is useful when combined with the previous widget to manually enter a cross-reference. By default the search is restricted to common chemical database sites. These sites are: ebi.ac.uk/chebi, pubchem.ncbi.nlm.nih.gov, metacyc.org, ebi.ac.uk/chembl, hmdb.ca, molecular-networks.com/biopath and chemspider.com
Clicking Search will launch a new page/tab in your default browser.
Tools > Annotation > Extract cross-references from notes
requires a selection of one or more metabolic entities
This tool will try to match cross-references from Note annotations. As an example, we have imported an entry from SBML which has a ChEBI identifier specified in the comments of a species. This has been loaded as a Note in Metingear.
The note reads `ChEBI id: CHEBI:15422`. To extract this cross-reference a regular expression is used to match the database indicator, separator and the actual accession. In this case the database indicator is `ChEBI id`, the separator is `:` and the accession is `CHEBI:15422`. In future it will be possible to customise each part but at present this is not possible. Running the tool on the above example would correctly extract the ChEBI cross-reference.
Verify accession is valid - this option will check that the parsed accession is valid for the given database. For example, consider a PubChem-Compound ID `PubChem-Compound: 5957` - this is a valid identifier. The same document may however specify empty entries as follows: `PubChem-Compound: N/A`. In this case it is desirable to check whether N/A is a valid PubChem-Compound ID. As it is not valid it is not added as a cross-reference.
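The indicator / separator / accession split and the validity check can be sketched with Python's `re` module. The patterns below are assumptions chosen to fit the two examples in this section; the regular expressions Metingear actually uses are not given in this manual, and `extract_xref` is a hypothetical name.

```python
import re

# indicator, then the ':' separator, then the accession
NOTE_PATTERN = re.compile(
    r"^\s*(?P<indicator>[^:]+?)\s*:\s*(?P<accession>\S.*?)\s*$")

# Hypothetical validity patterns, keyed by database indicator.
ACCESSION_PATTERNS = {
    "ChEBI id": re.compile(r"CHEBI:\d+"),
    "PubChem-Compound": re.compile(r"\d+"),
}


def extract_xref(note, verify=True):
    """Return (indicator, accession) parsed from a note, or None."""
    match = NOTE_PATTERN.match(note)
    if match is None:
        return None
    indicator = match.group("indicator")
    accession = match.group("accession")
    pattern = ACCESSION_PATTERNS.get(indicator)
    if pattern is None:
        return None  # unknown database indicator
    if verify and not pattern.fullmatch(accession):
        return None  # e.g. 'N/A' is not a valid PubChem-Compound ID
    return indicator, accession
```

With verification enabled, `PubChem-Compound: N/A` is rejected; with `verify=False` the sketch would return `N/A` as the accession, which is the situation the option is meant to prevent.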
In newer versions it is also possible to use this dialog to extract cross-references from the identifier, name and abbreviation attributes. Change the tab of the dialog to 'Extract from Id, Name or Abrv' and choose which attribute you wish to convert and which cross-reference it should be set as. This is useful when imported entities contain database identifiers as the names of the compounds: you can use this tool to add these names as cross-references to a specific resource. This then allows other tools which require a cross-reference to be used on the entity.
## Extract Textual Annotations from Notes
Tools > Annotation > Extract text annotations from notes
requires a selection of one or more metabolic entities
This tool allows you to extract information from comments and notes using regular expressions. As an example we may have the following entry imported from an SBML file.
The formula was specified as a comment in the SBML - we can easily extract the formula from this comment by selecting the correct matcher.
Selecting the Molecular Formula annotation will display the regular expression pattern which will be used.
Pattern - a regular expression pattern with a single set of capturing parentheses. To take the whole string from a note and load it into the selected annotation one may use `(.+)`. It is also possible to specify more specific matches such as `Locus:\s+(.+)` to match the string 'Locus:' and capture everything after it.
Case Insensitive - whether the pattern is case insensitive (on by default). This allows a pattern like `Locus:\s+(.+)` to match two notes whose prefix differs only by case: `Locus: B231` and `locus: B241`. If this option is not selected only the first note would be matched.
Remove After Match - whether the matched Note is removed after a match is successful. This is useful to avoid duplicate information, however leaving the note intact allows one to see where an annotation might have come from.
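The effect of the three options can be tried out with Python's `re` module. This is a minimal sketch, not the tool's own code: the function name is hypothetical, and it assumes the pattern may match anywhere in the note.

```python
import re


def extract_annotation(notes, pattern, case_insensitive=True,
                       remove_after_match=False):
    """Apply a single-capture-group pattern to each note.

    Returns (extracted_values, remaining_notes).
    """
    flags = re.IGNORECASE if case_insensitive else 0
    regex = re.compile(pattern, flags)
    values, remaining = [], []
    for note in notes:
        match = regex.search(note)
        if match:
            values.append(match.group(1))  # the captured text
            if remove_after_match:
                continue  # drop the note once its value is extracted
        remaining.append(note)
    return values, remaining
```

For the notes `Locus: B231` and `locus: B241` and the pattern `Locus:\s+(.+)`, both values are extracted by default, and only `B231` when case insensitivity is switched off.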
## Rename from Resource
Tools > Annotation > Rename from Resource
requires a selection of metabolites with cross-references
This tool allows you to rename metabolites that have cross-references assigned. This is particularly useful when importing a pathway or collection of reactions which is exclusively defined by identifiers (i.e. KEGG Compound) or when a metabolite has an incorrect name.
Begin by selecting the metabolites from the metabolite view.
With a selection, click the menu item Tools > Annotation > Rename from Resource.
You can select whether to allow use of web-services and which resource to use. If no resources are selectable, ensure the selected metabolites have cross-references and a resource is available. If no resource is available it may need loading (see Resources). Once the resource is selected click Okay to rename the entries. This action can be undone.
Tools > Annotation > Expand UniProt Annotation
for proteins with a UniProt-KB cross-reference transfer additional annotation from UniProt
Requires the UniProt cross-references to be indexed: Resources/UniProt
This tool will allow you to expand out the annotations for an entity with a UniProt cross-reference. As an example UniProt entry P39594 has an Enzyme Nomenclature annotation EC 2.5.1.3. Such an annotation is useful for linking reactions to protein products and we can transfer from the UniProt entry on to our selected protein product.
For this example I have some genes and proteins preloaded from ENA (see Import/ENA).
With the gene product view active, select one or more protein products from the table.
With the entries selected, go to the menu item Tools > Annotation > Expand UniProt Annotations. There are no configuration options so the tool will run straight away. When a large number of proteins is selected an indicator will show the estimated completion time. When the transfer is complete the entries will be updated with the annotations from the UniProt entry.
Tools > Select Choke Points
requires an active project and a selected set of reactions
A choke point is a reaction which uniquely consumes or produces a metabolite. From a selected set of reactions Metingear will identify which reactions in that set uniquely consume or produce a given metabolite. After selecting the menu item the choke points are selected in the current view and annotated as to which participant they uniquely consume or produce.
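The definition can be made concrete with a short Python sketch. It assumes each reaction is given as a pair of substrate and product lists and is read in the direction written (reversibility is ignored); the data layout and function name are illustrative only.

```python
from collections import defaultdict


def choke_points(reactions):
    """Find reactions that uniquely consume or produce a metabolite.

    `reactions` maps a reaction id to (substrates, products).
    Returns {reaction_id: [(metabolite, "consumes" | "produces"), ...]}.
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for reaction_id, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(reaction_id)
        for metabolite in products:
            producers[metabolite].add(reaction_id)

    result = defaultdict(list)
    for role, index in (("consumes", consumers), ("produces", producers)):
        for metabolite, reaction_ids in index.items():
            if len(reaction_ids) == 1:  # only one reaction has this role
                (reaction_id,) = reaction_ids
                result[reaction_id].append((metabolite, role))
    return dict(result)
```

Because the search is restricted to the selected set, a reaction may be reported as a choke point in a small selection but not when the whole reconstruction is selected.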
## Merge Loci
This tool is deprecated in the most recent release, please use Associate Reactions to Gene Products.
Tools > Merge Loci
_requires a selection of reactions with `Locus` annotations and gene products whose primary identifier matches the loci of the reactions_
This tool allows you to associate reactions with gene products using a reaction's `Locus` annotation. Our gene product table contains proteins that have an accession which can be matched to a `Locus` annotation.
Below you can see our reaction has a `Locus` annotation that can be found in the gene products table (see the selected entry above).
Selecting the menu item Tools > Merge Loci will associate this reaction with the gene product BG13302, shown here by its name.
## Remove Worst Structures
This menu item will inspect the chemical structure annotations of metabolites and remove those which score the worst. If there is no good or okay match to the specified formula and charge then no structures are removed. In the example below there is one good match and two okay matches. The two okay matches would be removed.
In this next example, there are three okay matches. None of these would be removed.
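One reading of this rule that is consistent with both examples is "keep only the structures with the best score, unless the best score is bad". The Python sketch below implements that reading; it is an assumption, since the manual does not say what happens when good, okay and bad matches are all present.

```python
RANK = {"good": 2, "okay": 1, "bad": 0}


def remove_worst(structures):
    """Return the structures that are kept.

    `structures` is a list of (structure_id, score) pairs where score is
    'good', 'okay' or 'bad' (hypothetical labels for the match quality).
    """
    best = max((RANK[score] for _id, score in structures), default=0)
    if best == RANK["bad"]:
        return list(structures)  # no good or okay match: remove nothing
    return [(sid, score) for sid, score in structures
            if RANK[score] == best]
```

With one good and two okay matches only the good one is kept, and with three okay matches all three are kept.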
## Sequence Homology
`Tools > Sequence Homology`
_requires blast+ configured in preferences, an active project and a selected set of protein products_
Configuring ncbi-blast+: see Preferences.
This tool will run a local homology search on the selected protein sequences and attach the results to the entity. From the Gene Products view select one or more proteins on which you wish to perform a homology search. Selecting Tools > Sequence Homology will display a dialog with several options.
Program: Not currently configurable (FASTA is planned)
Threads: The number of CPUs to use for the homology search. This option maps to the command line parameter `threads`. By default this will be set to the number of available processors.
Expected Value: The expectation value (E) threshold for saving hits. This option maps to the command line parameter `evalue`.
Database: The database to search for homology in. The list is fetched from the Blast database root (see preferences).
Max Results: Only return the top n hits. This option maps to both the `num_descriptions` and `num_alignments` command line options.
Parse Alignments: Specifies whether the actual alignment should be parsed or just the score. Parsing the alignment increases the size of both the temporary search file and the saved reconstruction.
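To show how these options relate to a BLAST+ invocation, the Python sketch below assembles an equivalent argument list. It assumes the `blastp` program (the manual only says the program is not configurable), the default E-value is a placeholder, and the file and database names in the usage note are hypothetical. Note that BLAST+ spells the thread option `-num_threads`.

```python
import os


def blastp_command(query_fasta, database, evalue=1e-5, max_results=10,
                   threads=None):
    """Build a blastp argument list mirroring the dialog options."""
    # Threads defaults to the number of available processors.
    threads = threads or os.cpu_count() or 1
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-num_threads", str(threads),            # Threads
        "-evalue", str(evalue),                  # Expected Value
        "-num_descriptions", str(max_results),   # Max Results
        "-num_alignments", str(max_results),     # Max Results
    ]
```

For example, `blastp_command("selected.fasta", "uniprot_sprot", max_results=5)` yields a list that could be passed to `subprocess.run` on a machine where BLAST+ is installed and the database exists.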
Normally the only option which needs configuring is the selection of the Database. Once the database has been selected, clicking Okay will create a runnable task. A runnable task is a computation that does not run straight away; instead it can be run in the background. With the task built the side bar will update with the named task.
Clicking on the task will change to the Task View.
All tasks start out queued. To run all queued tasks select Run > Run Queued Tasks.
When a task is running the sidebar will display a continuous progress icon.
The table will also indicate this.
When the task is completed all affected entities will be updated with their new annotations/observations.
With the local homology task complete you can view the local sequence homologies by clicking on one of the selected proteins.
The score of the alignment is visible by hovering the mouse over one of the alignments.
If the Parse Alignments option was selected then the actual sequence alignment is also displayed.
## Transfer Functional Annotations (experimental)
Tools > Transfer Functional Annotations
requires UniProt annotations to be loaded (see Resources) and a selection of one or more gene products
Database cross-references from locally aligned sequences can be transferred from UniProt entries. Currently this feature is experimental and will simply transfer all cross-references from all alignments.
Beside each transferred annotation is an evidence button. Clicking on the evidence button will indicate where that annotation was transferred from.
## Create Stoichiometric Matrix
Tools > Create Stoichiometric Matrix
requires reactions to be present in the reconstruction
To create a matrix select one or more reactions from the reaction view. If no reactions are selected all reactions will be added to the matrix.
With a set of reactions selected, click the menu item Tools > Create Stoichiometric Matrix. A dialog will pop up and show the completed stoichiometric matrix.
This matrix can be exported to various outputs (see Export).
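For readers unfamiliar with the layout, a stoichiometric matrix has one row per metabolite and one column per reaction. The Python sketch below builds one from a plain dictionary; the data structure and the sign convention (negative for consumed, positive for produced) are assumptions for illustration, not a description of Metingear's internals.

```python
def stoichiometric_matrix(reactions):
    """Build a matrix with metabolites as rows and reactions as columns.

    `reactions` maps a reaction id to {metabolite: coefficient}, with
    negative coefficients for consumed metabolites.
    Returns (metabolites, reaction_ids, matrix).
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    reaction_ids = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in reaction_ids]
              for m in metabolites]
    return metabolites, reaction_ids, matrix
```

Each column then describes a single reaction, so two reactions with identical participants and coefficients produce identical columns.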
## Generate Structure Diagram
When importing a chemical structure from line notation (i.e. InChI or SMILES) the structure will not have any coordinates. In this case the following message is displayed.
To generate coordinates simply select the entry and the menu item Tools > Structure > Generate Structure Diagram.
Note: care should be taken not to overwrite existing coordinates (default option) as the implementation of the Structure Diagram (from the Chemistry Development Kit) can clobber stereochemistry.
## Associate, Reaction to Gene Products
Tools > Associate > Reaction to Gene Products
associate reactions to their gene products (i.e. the enzyme catalysing the reaction)
This dialog allows you to semantically link a reaction to a gene product and indicate that the gene product is the enzyme responsible for the specified reaction. The association is done by selecting which entity attribute will be used to find associations. As an example one could associate reactions with gene products by finding those which share the same enzyme classification.
In the mock example below there are two reactions, rxn1 and rxn2.
There are also three protein products p1, p2 and p3. The products p1 and p2 have enzyme classification (E.C.) numbers of 1.1.1.1 and 1.1.1.85 respectively.
We can link all protein products to their respective reaction using the dialog Tools > Associate > Reactions to Gene Products.
We will associate these entities using the E.C. annotation for both gene product and reaction.
Once completed you can jump to the entities in each table by following the associations in the inspector. Here are the associations for p1.
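Associating by a shared attribute amounts to a join on that attribute. The Python sketch below shows the idea for E.C. numbers; the function name and data layout are hypothetical, and the E.C. numbers given to the two reactions in the usage note are assumed for the sake of the example.

```python
from collections import defaultdict


def associate_by_attribute(reactions, gene_products):
    """Link reactions to gene products that share an attribute value.

    Both arguments map an entity id to a set of attribute values
    (for example E.C. numbers).
    Returns {reaction_id: sorted list of gene product ids}.
    """
    index = defaultdict(list)
    for product_id, values in gene_products.items():
        for value in values:
            index[value].append(product_id)

    associations = {}
    for reaction_id, values in reactions.items():
        linked = sorted({p for v in values for p in index.get(v, [])})
        if linked:
            associations[reaction_id] = linked
    return associations
```

If one reaction is annotated with 1.1.1.1 and the other with 1.1.1.85, the first is linked to p1 and the second to p2, while p3, which has no E.C. number, stays unassociated.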
Tools > Merge > Metabolites
Merge all metabolites in the model which match a set condition. For manually merging by selection see: Edit/Merge
To demonstrate this tool we will use two Escherichia coli K-12 MG1655 pathways from KEGG.
- eco00010.xml - Glycolysis / Gluconeogenesis
- eco00020.xml - Citrate cycle (TCA cycle)
The files were imported with Import/KGML, providing 52 metabolites and 50 reactions. The metabolites were then renamed using Tools/Rename from Resource to replace the compound id with a recognisable name.
When we sort by name we can identify several metabolites which refer to the same entity (e.g. Acetyl-COA, Pyruvate).
Open the menu item Tools > Merge > Metabolites. At the time of writing you can merge on three values: Identifier, Abbreviation or Name. In the near future there will also be support for merging by cross-reference or chemical structure. For now, select Name to merge by and press Okay.
The metabolites with matching names have now been merged into single entries. There are now 42 metabolites instead of 52. This merge can be undone with Edit > Undo. It is important to note that this action will not remove duplicate reactions which result from the merge. These reactions can however be easily filtered out from a stoichiometric matrix by looking for identical columns.
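Both steps, merging on an attribute and then spotting duplicate reactions as identical columns, can be sketched in Python. The data layout and function names are hypothetical, and the sketch assumes exact (case-sensitive) equality of the chosen attribute.

```python
def merge_by(metabolites, key):
    """Merge metabolites that share the same value for `key`.

    `metabolites` is a list of dicts with at least 'id' and `key`.
    Returns (merged_list, {old_id: kept_id}).
    """
    kept = {}
    mapping = {}
    for metabolite in metabolites:
        first = kept.setdefault(metabolite[key], metabolite)
        mapping[metabolite["id"]] = first["id"]
    return list(kept.values()), mapping


def duplicate_reactions(reactions, mapping):
    """Group reactions whose columns are identical after the merge.

    `reactions` maps a reaction id to {metabolite_id: coefficient}.
    """
    columns = {}
    for reaction_id, coeffs in reactions.items():
        column = {}
        for metabolite_id, coefficient in coeffs.items():
            target = mapping.get(metabolite_id, metabolite_id)
            column[target] = column.get(target, 0) + coefficient
        columns.setdefault(frozenset(column.items()), []).append(reaction_id)
    return [ids for ids in columns.values() if len(ids) > 1]
```

Two reactions that differed only in which copy of Pyruvate they referenced end up with the same column once the copies are merged, and are reported together.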
File > Import... > Cross-references and Annotations
import cross-references and annotations from a tab separated value (`.tsv`) or comma separated value (`.csv`) file
There are several options to configure when importing.
- Selected - the file to import from, to choose a file click on the folder icon to the right.
- Header - indicate whether the file you added contains a header row (column names).
- Separator - select 'Tabs' to import from `.tsv` or 'Commas' to import from `.csv`.
- Map With - indicate how you wish to pair the cross-references in the file with the entities in your loaded reconstruction. You can pair cross-references using the entity identifiers, abbreviations or names.
- Destination - select one or more entity types to import the cross-references on to.
- Mode - how to handle the file
  - Resource Inference - this will attempt to guess which resource the identifier is from. Identifiers such as `CHEBI:15422` and `C00002` will be correctly picked up as ChEBI and KEGG Compound but identifiers such as `5957` from PubChem-Compound will not be imported as they are ambiguous. In such a case the other two modes may be able to help.
  - Single Resource - this will import all identifiers as a single specified resource. If your file contains cross-references to a single resource then you can use this to specify what the identifiers are.
  - Resource Mapping - if the file lists multiple resources these can be specified in the third column. Note the name of the resource should appear as listed in the MIRIAM Registry.
  - Single Annotation - this will import the second column of a table as the specified annotation. This allows you to import one of the selected annotation types (e.g. Gibbs Energy, SMILES, InChI, Formula) onto the entries in the model.
| Abbreviation | Pubchem-Compound |
| --- | --- |
| atp | 5957 |
| gdp | 6830 |
| gtp | 6022 |

| Abbreviation | Accession | Resource |
| --- | --- | --- |
| atp | META:ATP | BioCyc |
| atp | CHEBI:15422 | ChEBI |
| atp | C00002 | KEGG Compound |
| atp | 5957 | PubChem-Compound |
| gdp | 6830 | PubChem-Compound |
| gtp | 6022 | PubChem-Compound |
R/1286739 | 5.2 ± 0.5 |
R/1286740 | 4.3 ± 0.3 |
R/1286740 | -3.1 ± 0.6 |
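The Resource Inference mode described above can be pictured as pattern matching on the identifier, where an identifier is only accepted if exactly one resource's pattern fits. The Python sketch below is an assumption about how such inference could work, with a deliberately small, hypothetical pattern table; it is not the list of resources or patterns Metingear uses.

```python
import re

# Hypothetical identifier patterns for a handful of resources.
PATTERNS = {
    "ChEBI": re.compile(r"CHEBI:\d+"),
    "KEGG Compound": re.compile(r"C\d{5}"),
    "PubChem-Compound": re.compile(r"\d+"),
    "PubChem-Substance": re.compile(r"\d+"),
}


def infer_resource(identifier):
    """Return the single resource whose pattern fits, else None."""
    hits = [name for name, pattern in PATTERNS.items()
            if pattern.fullmatch(identifier)]
    return hits[0] if len(hits) == 1 else None
```

Here `CHEBI:15422` and `C00002` resolve to one resource each, while the purely numeric `5957` fits more than one pattern and is therefore left unresolved, which is when the Single Resource or Resource Mapping modes are needed.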