John May edited this page Apr 24, 2014 · 93 revisions

This manual details the Tools menu items and their usage.

## Automatic Cross-reference

Tools > Annotation > Automatic Cross-reference active when one or more metabolites are selected

This tool will attempt to match a metabolite name to one of several prioritised resources. The tool proceeds in two parts. Firstly, a search is performed to gather candidate matches. The candidate matches are retrieved from the resources in the order shown by Resource Priority. To maximise recall in this search stage an approximate match can be performed. The approximate match will retrieve more candidates in the search stage but can take considerably longer. Secondly, once the candidates have been found, a direct match is attempted between the names. If no exact match is found the names are normalised *. Only if the names have no difference after normalisation are they considered a match. Any matched entries are annotated as cross-references on the selected metabolite.
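The matching stage can be sketched roughly as follows. This is a minimal illustration and not Metingear's own code: the function names are hypothetical, and the exact set of characters treated as chemical punctuation is an assumption, since this manual only gives the `1-butene` / `1,butene` example.

```python
import re

# Assumed set of "chemical punctuation" plus whitespace; the tool's real list
# is not documented here.
CHEMICAL_PUNCTUATION = re.compile(r"[\s\-,()\[\]{}']")


def normalise(name: str) -> str:
    """Lower-case the name and strip spaces and (assumed) chemical punctuation."""
    return CHEMICAL_PUNCTUATION.sub("", name.lower())


def names_match(query: str, candidate: str) -> bool:
    """Try a direct match first, then compare the normalised forms."""
    if query == candidate:
        return True
    return normalise(query) == normalise(candidate)
```

Under these assumptions `names_match("1-Butene", "1,butene")` is true because both names normalise to `1butene`, whereas `1-butene` and `2-butene` still differ after normalisation.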

Automatic Cross-reference

Resource Priority - indicates the order in which resources will be searched. For example, suppose we prioritised the resources as ChEBI, KEGG Compound and HMDB, as in our image. If we found a match in ChEBI we would not search for matches in KEGG Compound or HMDB. The ordering can be changed by dragging and dropping the names of resources. Resources can also be removed from the list by clicking the trash icon. If you delete a resource accidentally use the refresh symbol to repopulate the list. If no resources are visible you may need to load entries from one of several databases (see Resources) or use the web service option.

Greedy Mode - continue to find matches even if a match was found in the highest priority resource. In our example dialog, if a match was found in ChEBI, searches would also be performed on HMDB and KEGG Compound. All matched references will be annotated on the selected metabolite.
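The interaction between Resource Priority and Greedy Mode can be sketched as a simple loop. This is only an illustration under assumptions: the function name, the `lookup` callables and the toy name tables are hypothetical and stand in for the real resource searches.

```python
from typing import Callable, Dict, List


def find_cross_references(
    name: str,
    priority: List[str],
    lookup: Dict[str, Callable[[str], List[str]]],
    greedy: bool = False,
) -> List[str]:
    """Search resources in priority order; stop at the first hit unless greedy."""
    matches: List[str] = []
    for resource in priority:
        hits = lookup[resource](name)
        matches.extend(hits)
        if hits and not greedy:
            break  # a higher priority resource matched, skip the rest
    return matches


# Toy name tables standing in for loaded resources (hypothetical data layout).
CHEBI = {"atp": ["CHEBI:15422"]}
KEGG = {"atp": ["C00002"]}
HMDB: Dict[str, List[str]] = {}

LOOKUP = {
    "ChEBI": lambda n: CHEBI.get(n, []),
    "KEGG Compound": lambda n: KEGG.get(n, []),
    "HMDB": lambda n: HMDB.get(n, []),
}
PRIORITY = ["ChEBI", "KEGG Compound", "HMDB"]
```

With this toy data, `find_cross_references("atp", PRIORITY, LOOKUP)` stops after ChEBI, while passing `greedy=True` also collects the KEGG Compound accession.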

Approximate Match - in the query stage more candidate names are retrieved. Due to some search heuristics this may find more matches.

Allow Web Services - this option is disabled by default. Metingear can use ChEBI and KEGG Compound web services to perform the search stage if no Resources are loaded. Due to the number of searches required it can take a long time to complete the operation for many metabolites.

\* to normalise the names they are converted to lower case and have _chemical punctuation_ and spaces removed. The removal of _chemical punctuation_ allows matching of names such as `1-butene` and `1,butene`. Both of these would be converted to `1butene`.

## Assign Flag Annotations

Tools > Annotation > Assign Flag Annotations requires one or more selected entities

An annotation flag is a stateless logical annotation, that is to say it is either present on an entity or absent but does not have a value. These flags serve as markers of special entities in a model. Flags can be assigned manually or can be inferred from a given condition. Selecting this menu item will check all selected entities for each flag condition and assign the flag if the condition is met.

The list below outlines the meaning of each current flag and the condition which results in the flag being assigned.

  • Lumped - this flag indicates metabolites which are lumped artificial metabolites. An example would be a metabolite like 'Protein Product' or 'average Fatty-Acid Composition'. These entities are an artefact of the granularity of the model and cannot be assigned a chemical structure. Usually these entries occur in or near the biomass equation. The Lumped flag is assigned if a metabolite has a molecular formula and there are more than 500 atoms present.

  • ACP Associated - this flag indicates metabolites which are associated with the Acyl-Carrier-Protein. These metabolites can be ubiquitous in metabolic models and can not be assigned a complete structure. Although a generic structure which annotates the 'ACP' as an alias atom can be used it is often useful to treat such entities separately. If the metabolite name contains the letters ACP in uppercase this flag is assigned.
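The two flag conditions above can be expressed as a short sketch. The function names are hypothetical, and the formula parser is an assumption that only handles simple formulas such as `C6H12O6` (no brackets, charges or generic groups); the tool's own formula handling is not described in this manual.

```python
import re
from typing import Optional, Set

# One element symbol followed by an optional count, e.g. "C6" or "O".
ATOM_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_count(formula: str) -> int:
    """Total number of atoms in a simple molecular formula."""
    return sum(int(count) if count else 1 for _, count in ATOM_TOKEN.findall(formula))


def infer_flags(name: str, formula: Optional[str]) -> Set[str]:
    """Return the flags whose condition is met for one metabolite."""
    flags: Set[str] = set()
    # Lumped: a formula is present and it has more than 500 atoms.
    if formula and atom_count(formula) > 500:
        flags.add("Lumped")
    # ACP Associated: the name contains the upper-case letters 'ACP'.
    if "ACP" in name:
        flags.add("ACP Associated")
    return flags
```

For instance, a metabolite named `Malonyl-ACP` would receive the ACP Associated flag, while a lower-case `acp` in the name would not trigger it.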

## Transfer Chemical Structure from Cross-references

Tools > Annotation > Transfer Chemical Structure from Cross-references requires one or more selected metabolites

This utility will attach chemical structures to the selected metabolites. The chemical structures are assigned from the cross-references. As with the Automatic Cross-reference tool, resources can be prioritised.

With one or more metabolites selected the menu item Tools > Annotation > Transfer Chemical Structure will open a dialog with several options.

fetch-structures-chebi

Allow Web-services - allow the use of web-services to retrieve structures. If not selected one or more resources must be loaded (see Resources) to use the utility. If no resources are visible in the Resource Priority list then the utility will not work.

Greedy Mode - if selected, the chemical structures for all cross-references will be fetched. If de-selected only one structure will be fetched for each metabolite.

Resource Priority - indicates the order in which resources will be searched. For example, suppose we prioritised the resources as ChEBI, KEGG Compound and HMDB, as in our image. If we found a match in ChEBI we would not search for matches in KEGG Compound or HMDB. The ordering can be changed by dragging and dropping the names of resources. Resources can also be removed from the list by clicking the trash icon. If you delete a resource accidentally use the refresh symbol to repopulate the list. If no resources are visible you may need to load entries from one of several databases (see Resources) or use the web service option.

In this dialog only resources which are referenced in the selected metabolites and are loaded or have a web-service available will be listed. With a set of metabolites which contain references to ChEBI and KEGG Compound, if ChEBI is loaded as a Resource but KEGG Compound is not then only ChEBI will appear in the list. If we have metabolites that contain cross-references to HMDB, KEGG Compound, ChEBI, PubChem, and MetaCyc then the list will contain all resources (providing all resources are loaded and web-services is selected for PubChem).

fetch-structures-all

## Curate Metabolite

Tools > Annotation > Curate Metabolite requires a selection of one or more metabolites

This tool simplifies curation of chemical structure by providing several more manual methods to attach structures. The curation can be performed in batch, whereby you can skip to the next selected metabolite or skip all.

  1. Database Search - Search a number of resources and manually choose the best candidate cross-reference
  2. Assign Structure - Directly attach a chemical structure from MDL Mol, CML, InChI and SMILES
  3. Generate Peptide - Generate the desired dipeptide structure
  4. Manual Cross-reference - Manually add a cross-reference
  5. Web Search - Search the internet for the given name (useful for copying InChI/SMILES records across)

curate-metabolite

#### Database Search

This widget allows a user to manually select the best entry from searching multiple databases.

curate-metabolite-db-search

The top of the widget displays the current match (selected entry in table on left, or first entry). The match indicator shows how well the name, formula and charge of the entity (left) match the candidate (right). The match follows a traffic light scheme of green -> good, orange -> okay and red -> bad. It may show a matching metabolite with an exact name match but a different formula and protonation state. This is the case in the example. The formula of 4-Phospho-L-aspartate is orange as it matches when the charge difference is considered. If the formula is completely different then it is coloured red. In this case there is a better match with CHEBI:57535. You can check this match by selecting it in the table.

curate-metabolite-db-search-match

To assign this cross-reference you must click the Assign button below the resource list. This allows multiple cross-references to be assigned.

By default the search will only search the resource at the top of the list. You can change which resource is searched by reordering the list. If we drag KEGG Compound to the top we now have one candidate. Note you may need to reselect the text field above the resource list to trigger a new search.

curate-metabolite-db-search-kegg

In this case it looks like the ChEBI entry would fit better. We can also search KEGG for an approximate match by selecting this option. This will produce a lot more candidates. As you can see selecting one which is not right will turn all match indicators red.

curate-metabolite-db-search-kegg-aprx

The text field above the resource list is the actual query being searched. Sometimes an entity may have a typo in the name, in which case it is beneficial to change the query. In most cases this isn't needed.

#### Assign Structure

The second widget allows you to attach a structure to the metabolite in various formats. The format will be auto detected but it can also be selected in the top left.

curate-metabolite-assign-structure

The InChI and SMILES line notations can easily be copied from chemical databases or a Wikipedia ChemBox.

curate-metabolite-assign-structure-inchi

It is also possible to paste MDL Mol or CML input. This input can be obtained from chemical structure editors such as MarvinSketch (ChemAxon) or JChemPaint. When editing a structure in MarvinSketch the default copy format is CML which can be directly pasted into the text area.

curate-metabolite-assign-structure-mol

MDL Mol and CML are preferable to SMILES/InChI as they maintain the structure diagram (i.e. atom coordinates). See Generate Structure Diagram for how to generate a structure diagram when one is not available.

#### Generate Peptide

This curation section allows you to generate a poly-peptide for a selected metabolite. For the example below the name ser-ser-ser-ser was provided. You can see under the Generate Peptide section the form has been filled out with the correct amino-acid residues. By default the L stereo form will be selected.

Generate Peptide

You can add a residue at any position by clicking the green (+) icon. With a new residue added, we can select the type of residue.

Generate Peptide

You can also remove residues by clicking the red (-) icon. When you have configured your peptide click Okay at the bottom right of the dialog.

Generate Peptide

A structure for your peptide chain will be generated and added to the selected entry.

Generate Peptide

#### Manual Cross-reference

This widget allows you to manually specify cross-references. The input is identical to how cross-references are entered in the metabolite table (see Editing).

curate-metabolite-xref

#### Web Search

This widget launches an internet search in your default web browser and is useful when combined with the previous widget to manually enter a cross-reference. By default the search is restricted to common chemical database sites. These sites are: ebi.ac.uk/chebi, pubchem.ncbi.nlm.nih.gov, metacyc.org, ebi.ac.uk/chembl, hmdb.ca, molecular-networks.com/biopath and chemspider.com

curate-metabolite-web-search

Clicking Search will launch a new page/tab in your default browser.

curate-metabolite-web-hits

## Extract Unencoded Cross-references

Tools > Annotation > Extract cross-references from notes requires a selection of one or more metabolic entities

This tool will try to match cross-references from Note annotations. As an example, we have imported an entry from SBML which has a ChEBI identifier specified in the comments of a species. This has been loaded as a Note in Metingear.

metabolite-xref-note

The note reads ChEBI id: CHEBI:15422. To extract this cross-reference a regular expression is used to match the database indicator, separator and the actual accession. In this case the database indicator is ChEBI id, the separator is : and the accession is CHEBI:15422. In future it will be possible to customise each part but at present this is not possible. Running the tool on the above example would correctly extract the ChEBI cross-reference.

Verify accession is valid - this option will check that the parsed accession is valid for the given database. For example consider a PubChem-Compound ID PubChem-Compound: 5957 - this is a valid identifier. The same document may however specify empty entries as follows PubChem-Compound: N/A. In this case it is desirable to check whether N/A is a valid PubChem-Compound ID. As it is not valid, it is not added as a cross-reference.
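The extraction and the optional verification step can be sketched as below. This is an illustration only: the regular expression, the function name and the two accession patterns are assumptions made for this example and are not the expressions Metingear uses internally.

```python
import re
from typing import Optional, Tuple

# Assumed accession formats for two resources (illustrative only).
ACCESSION_PATTERNS = {
    "chebi": re.compile(r"CHEBI:\d+"),
    "pubchem-compound": re.compile(r"\d+"),
}

# Database indicator (optionally followed by 'id'), a ':' separator, then the accession.
NOTE_PATTERN = re.compile(
    r"^\s*(?P<indicator>[A-Za-z][A-Za-z\- ]*?)(?:\s+id)?\s*:\s*(?P<accession>\S+)\s*$",
    re.IGNORECASE,
)


def extract_cross_reference(note: str, verify: bool = True) -> Optional[Tuple[str, str]]:
    """Return (resource, accession) parsed from a note, or None if nothing usable."""
    match = NOTE_PATTERN.match(note)
    if match is None:
        return None
    resource = match.group("indicator").strip().lower()
    accession = match.group("accession")
    pattern = ACCESSION_PATTERNS.get(resource)
    if pattern is None:
        return None  # unknown database indicator
    if verify and pattern.fullmatch(accession) is None:
        return None  # e.g. 'N/A' is not a valid PubChem-Compound ID
    return resource, accession
```

With verification enabled the note `PubChem-Compound: N/A` yields nothing, while `ChEBI id: CHEBI:15422` yields the ChEBI accession.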

In newer versions it is also possible to use this dialog to extract cross-references from the identifier, name and abbreviation attributes. Change the tab of the dialog to 'Extract from Id, Name or Abrv' and choose which attribute you wish to convert and which cross-reference it should be set to. This is useful when imported entities contain database identifiers as the names of the compounds: you can use this tool to add these names as cross-references to a specific resource. This then allows other tools which require a cross-reference to be used on the entity.

## Extract Textual Annotations from Notes

Tools > Annotation > Extract text annotations from notes requires a selection of one or more metabolic entities

This tool allows you to extract information from comments and notes using regular expressions. As an example we may have the following entry imported from an SBML file.

metabolite-with-notes

The formula was specified as a comment in the SBML - we can easily extract the formula from this comment by selecting the correct matcher.

extract-text-annotation

Selecting the Molecular Formula annotation will display the regular expression pattern which will be used.

metabolite-with-extracted-annotation

Pattern - A regular expression pattern with a single set of capturing parentheses. To take the whole string from a note and load it into the selected annotation one may use `(.+)`. It is also possible to specify specific matches such as `Locus:\s+(.+)` to match the string 'Locus:' and capture everything after.

Case Insensitive - whether the pattern is case insensitive (on by default). This allows a pattern like `Locus:\s+(.+)` to match two notes which only differ by case: Locus: B231 and locus: B241. If this option is not selected only the first note would be matched.

Remove After Match - whether the matched Note is removed after a match is successful. This is useful to avoid duplicate information, however leaving the note intact allows one to see where an annotation might have come from.
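The three options above can be illustrated with a short sketch. The function name and the decision to search anywhere in the note (rather than requiring the whole note to match) are assumptions; only the pattern and the example notes come from this manual.

```python
import re
from typing import List, Tuple


def extract_annotations(
    notes: List[str],
    pattern: str,
    case_insensitive: bool = True,
    remove_after_match: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (captured values, notes kept on the entity)."""
    regex = re.compile(pattern, re.IGNORECASE if case_insensitive else 0)
    if regex.groups != 1:
        raise ValueError("pattern must contain exactly one capturing group")
    values: List[str] = []
    remaining: List[str] = []
    for note in notes:
        match = regex.search(note)
        if match:
            values.append(match.group(1))
            if not remove_after_match:
                remaining.append(note)  # keep the note as provenance
        else:
            remaining.append(note)
    return values, remaining
```

Applied to the notes `Locus: B231` and `locus: B241` with the pattern `Locus:\s+(.+)`, both values are captured by default, and only the first when case sensitivity is enforced.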

## Rename from Resource

Tools > Annotation > Rename from Resource requires a selection of metabolites with cross-references

This tool allows you to rename metabolites that have cross-references assigned. This is particularly useful when importing a pathway or collection of reactions which is exclusively defined by identifiers (i.e. KEGG Compound) or when a metabolite has an incorrect name.

Begin by selecting the metabolites from the metabolite view.

rename-from-resource-select

With a selection, click the menu item Tools > Annotation > Rename from Resource.

rename-from-resource-menu

You can select whether to allow use of web-services and which resource to use. If no resources are selectable ensure the selected metabolites have cross-references and a resource is available. If no resource is available it may need loading (see Resources). Once the resource is selected click Okay to rename the entries. This action can be undone.

rename-from-resource-output

## Expand UniProt Annotations

Tools > Annotation > Expand UniProt Annotations for proteins with a UniProt-KB cross-reference, transfers additional annotations from UniProt

Requires the UniProt cross-references to be indexed: Resources/UniProt

This tool will allow you to expand out the annotations for an entity with a UniProt cross-reference. As an example UniProt entry P39594 has an Enzyme Nomenclature annotation EC 2.5.1.3. Such an annotation is useful for linking reactions to protein products and we can transfer it from the UniProt entry on to our selected protein product.

For this example I have some genes and proteins preloaded from ENA (see Import/ENA).

exp-uniprot-start

With the gene product view active, select one or more protein products from the table.

exp-uniprot-select

With the entries selected, go to the menu item Tools > Annotation > Expand UniProt Annotations. There are no configuration options so the tool will run straight away. When a large number of proteins is selected there will be an indicator of the estimated completion time. When the transfer is complete the selected entries will be updated with the annotations from the UniProt entry.

exp-uniprot-done

## Select Choke Points

Tools > Select Choke Points requires an active project and a selected set of reactions

A choke point is a reaction which uniquely consumes or produces a metabolite. From a selected set of reactions Metingear will identify which reactions in that set uniquely consume or produce a given metabolite. After selecting the menu item the choke points are selected in the current view and annotated as to which participant they uniquely consume or produce.

## Merge Loci

This tool is deprecated in the most recent release, please use Associate Reactions to Gene Products.

Tools > Merge Loci _requires a selection of reactions with Locus annotations and gene products whose primary identifier matches the loci of the reactions_

This tool allows you to associate reactions with gene products using a reaction's Locus annotation. Our gene product table contains Proteins that have an accession which can be matched to a Locus annotation.

Merge Loci Gene Products

Below you can see our reaction has a Locus annotation that can be found in the gene products table (see selected entry above).

Merge Loci Reaction

Selecting the menu item Tools > Merge Loci will associate this reaction with the gene product BG13302, shown here by its name.

Merge Loci Reaction Linked

## Remove Worst Structures

`Tools > Remove Worst Structures` _requires an active project and a selected set of metabolites_

This menu item will inspect the chemical structure annotations of metabolites and remove those which score the worst. If there is no good or okay match to the specified formula and charge then no structures are removed. In the example below there is one good match and two okay matches. The two okay matches would be removed.

Remove Worst

In this next example, there are three okay matches. None of these would be removed.

Remove Worst 2
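One reading of this rule that is consistent with both examples is sketched below: keep only the structures with the best score, unless the best score is bad, in which case nothing is removed. The function name, the score labels and the data layout are hypothetical, and the tool's actual scoring may be finer grained.

```python
from typing import Dict

# Lower rank is better; labels follow the good/okay/bad scheme used in this manual.
RANK = {"good": 0, "okay": 1, "bad": 2}


def remove_worst(structures: Dict[str, str]) -> Dict[str, str]:
    """Map of structure id -> score; returns the structures that are kept."""
    if not structures:
        return {}
    best = min(RANK[score] for score in structures.values())
    if best == RANK["bad"]:
        # No good or okay match to the formula and charge: remove nothing.
        return dict(structures)
    return {sid: score for sid, score in structures.items() if RANK[score] == best}
```

With one good and two okay structures only the good one is kept, and with three okay structures all three are kept.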

## Sequence Homology

`Tools > Sequence Homology` _requires blast+ configured in preferences, an active project and a selected set of protein products_

Configuring ncbi-blast+: see Preferences

This tool will run a local homology search on the selected protein sequences and attach the results to the entity. From the Gene Products view select one or more proteins on which you wish to perform a homology search. Selecting Tools > Sequence Homology will display a dialog with several options.

local-homology-dialog

Program: Not currently configurable (FASTA is planned)

Threads: The number of CPUs to use for the homology search. This option maps to the command line parameter num_threads. By default this will be set to the number of available processors.

Expected Value: The expectation value (E) threshold for saving hits. This option maps to the command line parameter evalue.

Database: The database to search for homology in. The list is fetched from the Blast database root (see preferences).

Max Results: Only return the top n hits. This option maps to both the num_descriptions and num_alignments command line options.

Parse Alignments: Specifies whether the actual alignment should be parsed or just the score. Parsing the alignment increases the size of both the temporary search file and the saved reconstruction.
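To show how these options relate to the BLAST+ command line, the sketch below assembles an argument list. It is not how Metingear invokes BLAST: the helper name is hypothetical, `blastp` is assumed because the input is protein sequences, and the file and database names in the usage note are placeholders. The flags themselves (`-query`, `-db`, `-evalue`, `-num_threads`, `-num_descriptions`, `-num_alignments`) are standard BLAST+ options.

```python
import os
from typing import List, Optional


def blastp_command(
    query: str,
    database: str,
    evalue: float,
    threads: Optional[int] = None,
    max_results: int = 10,
) -> List[str]:
    """Build (but do not run) a blastp argument list from the dialog options."""
    if threads is None:
        # Mirror the dialog default: use the number of available processors.
        threads = os.cpu_count() or 1
    return [
        "blastp",
        "-query", query,
        "-db", database,
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        # Max Results maps to both of these options.
        "-num_descriptions", str(max_results),
        "-num_alignments", str(max_results),
    ]
```

A call such as `blastp_command("query.fasta", "uniprot_sprot", evalue=1e-30, threads=4, max_results=5)` returns a list that could be passed to `subprocess.run` on a machine where BLAST+ is installed.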

Normally the only option which needs configuring is the selection of the database. Once the database has been selected, clicking Okay will create a runnable task. A runnable task is a computation that does not run straight away; instead it can be run in the background. With the task built the side bar will update with the named task.

local-homology-task-queued

Clicking on the task will change to the Task View.

local-homology-task-table

All tasks start in the queued state. To run all queued tasks select Run > Run Queued Tasks.

When a task is running the sidebar will indicate a continuous progress icon.

local-homology-task-running

The table will also indicate this.

local-homology-task-table-running

When the task is completed all affected entities will be updated with their new annotations/observations.

Viewing sequence homologies

With the local homology task complete you can view the local sequence homologies by clicking on one of the selected proteins.

local-homology-observations

The score of the alignment is visible by hovering the mouse over one of the alignments.

local-homology-tooltip

If the Parse Alignments option was selected then the actual sequence alignment is also displayed.

local-homology-alignment

## Transfer Functional Annotations (experimental)

Tools > Transfer Functional Annotations requires UniProt annotations are loaded (see Resources) and a selection of one or more gene products

Database cross-references from locally aligned sequences can be transferred from UniProt entries. Currently this feature is experimental and will simply transfer all cross-references from all alignments.

local-homology-evidence

Beside each transferred annotation is an evidence button. Clicking on the evidence button will indicate where that annotation was transferred from.

## Create Stoichiometric Matrix

Tools > Create Stoichiometric Matrix requires reactions are present in the reconstruction

To create a matrix select one or more reactions from the reaction view. If no reactions are selected all reactions will be added to the matrix.

matrix-select-reactions

With a selected set of reactions, click the menu item Tools > Create Stoichiometric Matrix. A dialog will pop up and show the completed stoichiometric matrix.

matrix-pane

This matrix can be exported to various outputs (see Export).
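For readers unfamiliar with the layout, a stoichiometric matrix has one row per metabolite and one column per reaction. The sketch below builds one from a hypothetical in-memory representation in which each reaction maps metabolite names to signed coefficients (negative for consumed, positive for produced); the row ordering and function name are assumptions and not Metingear's internal model.

```python
from typing import Dict, List, Tuple


def stoichiometric_matrix(
    reactions: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (row labels, column labels, matrix) for the given reactions."""
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


# Toy reactions: r1 is A -> B, r2 is B -> 2 C.
EXAMPLE = {
    "r1": {"A": -1, "B": 1},
    "r2": {"B": -1, "C": 2},
}
```

For `EXAMPLE` the rows are A, B and C, the columns are r1 and r2, and metabolite B has the entries 1 and -1 because it is produced by r1 and consumed by r2.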

## Generate Structure Diagram

When importing a chemical structure from line-notation (e.g. InChI or SMILES) the structure will not have any coordinates. In this case the following message is displayed.

sdg-no-coordinates

To generate coordinates simply select the entry and the menu item Tools > Structure > Generate Structure Diagram.

sdg-with-coordinates

Note: care should be taken not to overwrite existing coordinates (default option) as the implementation of the Structure Diagram (from the Chemistry Development Kit) can clobber stereochemistry.

## Associate, Reaction to Gene Products

Tools > Associate > Reaction to Gene Products associate reactions to their gene products (i.e. the enzyme encoding the reaction)

This dialog allows you to semantically link a reaction to a gene product and indicate that the gene product is the enzyme responsible for the specified reaction. The association is done by selecting which entity attribute will be used to find associations. As an example one could associate reactions with gene products by finding those which share the same enzyme classification.

In the mock example below there are two reactions, rxn1 and rxn2.

asc-gpr-rxn-tbl

There are also three protein products p1, p2 and p3. The products p1 and p2 have enzyme classification (E.C.) numbers of 1.1.1.1 and 1.1.1.85 respectively.

asc-gpr-gp-tbl

We can link all protein products to their respective reaction using the dialog Tools > Associate > Reactions to Gene Products.

asc-gpr-dialog-1

We will associate these entities using the E.C. annotation for both gene product and reaction.

asc-gpr-dialog-2

Once completed you can now jump to the entities in each table by following the associations in the inspector. Here are the associations for p1.

asc-gpr-done
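Association by a shared attribute can be sketched as an index lookup. In the illustration below the E.C. numbers of p1 and p2 come from the example above, but the E.C. numbers given to the reactions, the function name and the data layout are assumptions made for the sake of a runnable example.

```python
from collections import defaultdict
from typing import Dict, Set


def associate_by_ec(
    reaction_ec: Dict[str, Set[str]],
    product_ec: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Link each reaction to the gene products sharing at least one E.C. number."""
    by_ec: Dict[str, Set[str]] = defaultdict(set)
    for product, ecs in product_ec.items():
        for ec in ecs:
            by_ec[ec].add(product)
    associations: Dict[str, Set[str]] = {}
    for reaction, ecs in reaction_ec.items():
        linked: Set[str] = set()
        for ec in ecs:
            linked |= by_ec.get(ec, set())
        if linked:
            associations[reaction] = linked
    return associations


# p3 has no E.C. annotation; the reaction annotations are assumed for illustration.
PRODUCTS = {"p1": {"1.1.1.1"}, "p2": {"1.1.1.85"}, "p3": set()}
REACTIONS = {"rxn1": {"1.1.1.1"}, "rxn2": {"1.1.1.85"}}
```

Under these assumptions the first reaction is linked to p1 and the second to p2, while p3 remains unassociated.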

## Merge Metabolites

Tools > Merge > Metabolites Merge all metabolites in the model which match a set condition. For manually merging by selection see: Edit/Merge

To demonstrate this tool we will use two Escherichia coli K-12 MG1655 pathways from KEGG.

The files were imported with (Import/KGML) providing 52 metabolites and 50 reactions. The metabolites were then renamed using Tools/Rename from Resource to replace the compound id with a recognisable name.

mrg-mtb-unsorted

When we sort by name we can identify several metabolites which refer to the same entity (e.g. Acetyl-COA, Pyruvate).

mrg-mtb-sorted

Open the menu item Tools > Merge > Metabolites. At the time of writing you can merge on three values: Identifier, Abbreviation or Name. In the near future there will also be support to merge by cross-reference or chemical structure. For now, select Name to merge by and press Okay.

mrg-mtb-dialog

The metabolites with matching names have now been merged into a single entry. There are now 42 metabolites instead of 52. This merge can be undone with Edit > Undo. It is important to note that this action will not remove duplicate reactions which result from the merge. These reactions can however be easily filtered out from a stoichiometric matrix by looking for identical columns.

mrg-mtb-done
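Both the merge and the identical-column check can be sketched briefly. The function names and data layout are hypothetical, names are compared exactly (whether the tool ignores case is not stated here), and the identifiers in the toy data are placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def merge_by_name(metabolites: Dict[str, str]) -> Dict[str, str]:
    """Map each metabolite id to the first id seen with the same name."""
    first_with_name: Dict[str, str] = {}
    return {mid: first_with_name.setdefault(name, mid) for mid, name in metabolites.items()}


def duplicate_reactions(
    reactions: Dict[str, Dict[str, float]],
    mapping: Dict[str, str],
) -> List[List[str]]:
    """Group reactions whose matrix columns are identical after the merge."""
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for reaction, coeffs in reactions.items():
        column: Dict[str, float] = defaultdict(float)
        for metabolite, coefficient in coeffs.items():
            column[mapping.get(metabolite, metabolite)] += coefficient
        key = frozenset((m, c) for m, c in column.items() if c != 0)
        groups[key].append(reaction)
    return [group for group in groups.values() if len(group) > 1]


# Toy data: two entries named Pyruvate coming from two imported pathways.
METABOLITES = {"m1": "Pyruvate", "m2": "Pyruvate", "m3": "Acetyl-CoA"}
RXNS = {"r1": {"m1": -1, "m3": 1}, "r2": {"m2": -1, "m3": 1}}
```

After merging, `m2` maps onto `m1`, so `r1` and `r2` have identical columns and are reported as one duplicate group.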

File

Import

## Cross-references and Annotations

File > Import... > Cross-references and Annotations import cross-references and annotations from a tab separated value (.tsv) or comma separated value (.csv)

There are several options to configure when importing.

  • Selected - the file to import from, to choose a file click on the folder icon to the right.
  • Header - indicate whether the file you added contains a header row (column names).
  • Separator - select 'Tabs' to import from .tsv or 'Commas' to import from .csv.
  • Map With - indicate how you wish to pair the cross-references in the file with the entities in your loaded reconstruction. You can pair cross-references using the entity identifiers, abbreviations or names.
  • Destination - select one or more entity types to import the cross-references on to.
  • Mode - how to handle the file
    • Resource Inference - this will attempt to guess which resource the identifier is from. Identifiers such as CHEBI:15422 and C00002 will be correctly picked up as ChEBI and KEGG Compound but identifiers such as 5957 from PubChem-Compound will not be imported as they are ambiguous. In such a case one of the other modes may be able to help.
    • Single Resource - this will import all identifiers as a single specified resource. If your file contains cross-references to a single resource then you can use this to specify what the identifiers are.
    • Resource Mapping - if the file lists multiple resources these can be specified in the third column. Note the name of the resource should appear as listed in the MIRIAM Registry.
    • Single Annotation - this will import the second column of a table as the specified annotation. This allows you to import one of the selected annotation types (e.g. Gibbs Energy, SMILES, InChI, Formula) onto the entries in the model.
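The idea behind Resource Inference can be sketched with identifier patterns: an identifier is only assigned when exactly one resource pattern matches it. The patterns and the function name below are assumptions chosen to reproduce the examples in the list above and are not the rules Metingear applies.

```python
import re
from typing import Optional

# Assumed identifier formats (illustrative subset of resources).
RESOURCE_PATTERNS = {
    "ChEBI": re.compile(r"CHEBI:\d+"),
    "KEGG Compound": re.compile(r"C\d{5}"),
    "PubChem-Compound": re.compile(r"\d+"),
    "PubChem-Substance": re.compile(r"\d+"),
}


def infer_resource(identifier: str) -> Optional[str]:
    """Return the single resource whose pattern matches, or None if ambiguous."""
    matches = [
        name
        for name, pattern in RESOURCE_PATTERNS.items()
        if pattern.fullmatch(identifier)
    ]
    return matches[0] if len(matches) == 1 else None
```

Here `5957` matches more than one numeric-only pattern and is therefore left unassigned, which is the situation where Single Resource or Resource Mapping is the better choice.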

Example of a table to be imported as 'Single Resource' and the configured dialog

| Abbreviation | Pubchem-Compound |
|--------------|------------------|
| atp          | 5957             |
| gdp          | 6830             |
| gtp          | 6022             |

import-cross-reference-single

Example of a table to be imported as 'Resource Mapping' and the configured dialog

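| Abbreviation | Accession   | Resource         |
|--------------|-------------|------------------|
| atp          | META:ATP    | BioCyc           |
| atp          | CHEBI:15422 | ChEBI            |
| atp          | C00002      | KEGG Compound    |
| atp          | 5957        | PubChem-Compound |
| gdp          | 6830        | PubChem-Compound |
| gtp          | 6022        | PubChem-Compound |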

import-cross-reference-mapped

Example of Gibbs energy to be imported as 'Single Annotation' and the configured dialog

R/1286739 5.2 ± 0.5
R/1286740 4.3 ± 0.3
R/1286740 -3.1 ± 0.6
![import-cross-reference-gibbs](http://johnmay.github.com/metingear/images/tutorial/import-cross-reference-gibbs.png)