
G Links annotation aggregator service

gaou edited this page Oct 15, 2020 · 37 revisions

G-Links

G-Links is a rapid data "broker" service that collects and adds related information to a given gene (or gene set).

With numerous curated databases now available, researchers can make efficient use of a multitude of biological data by integrating these resources through hyperlinks and cross-references. A large proportion of bioinformatics research, however, still consists of labor-intensive fetching, parsing, and merging of datasets and functional annotations from dispersed databases and web-based services. Data integration is therefore one of the key challenges of bioinformatics. Here we present G-Links, a gateway server for querying and retrieving gene annotation data. The system supports rapid querying with numerous gene IDs from multiple databases, or with nucleotide/amino acid sequences, by internally centralizing gene annotations around UniProt entries. It first converts the query into a UniProt ID, either by ID conversion or by sequence similarity search, and then returns the related annotations and cross-references. Moreover, users can run external web-based tools on the query gene. G-Links is implemented as a RESTful service, so users can easily access it from any web browser. The service and its documentation are freely available at http://link.g-language.org/.

Base URL

Quick Start

The query form accepts an input sequence (amino acid or nucleotide) or a gene ID, together with the following options:

  • Output format: HTML (default), Tabular, Notation3, RDF
  • E-value
  • Identity
  • Feeling lucky

Documentation

Usage (Syntax)

This section describes the G-Links URI syntax conventions; for usage examples, see the Usage Examples section below. G-Links is provided through a REST interface. Database cross-reference information related to a given gene ID or sequence (nucleotide or amino acid) can be accessed through an HTTP GET/POST request to a unique URI.

REST URI conventions

URL Syntax of G-Links. G-Links is implemented as a RESTful service that can be queried by altering the URL.

figure1
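
As an illustration of the URI convention, the following Python sketch assembles a query URL from one or more gene IDs and a set of qualifiers. It is not part of G-Links: the function name `build_glinks_url` is hypothetical, and the only qualifier confirmed on this page is `extract`. The sketch assumes that other qualifiers follow the same `key=value` path-segment pattern.

```python
from urllib.parse import quote

# Base URL as given on this page.
BASE_URL = "http://link.g-language.org/"


def build_glinks_url(genes, **qualifiers):
    """Return a URL of the form BASE/<gene[,gene...]>/<key>=<value>/...

    Hypothetical helper. Only the `extract` qualifier is documented here;
    the key=value layout for any other qualifier is an assumption.
    """
    if isinstance(genes, str):
        genes = [genes]
    # Several IDs are joined with commas, as in the query examples on this page.
    query = ",".join(quote(g, safe=":_.") for g in genes)
    # Each qualifier becomes one "key=value" path segment, in the order given.
    segments = [f"{quote(k)}={quote(str(v))}" for k, v in qualifiers.items()]
    return BASE_URL + "/".join([query] + segments)


print(build_glinks_url("NC_000913", extract="GOslim_process"))
```

The resulting string can be passed to any HTTP client, such as `curl` or `urllib.request.urlopen`.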

HTML output

HTML output example for BRCA1_HUMAN (UniProt ID of the BRCA1 gene in humans). By default, accessing G-Links from a web browser displays the results as interactive HTML, with a related image gallery implemented with CoverFlow (http://imageflow.finnrudolph.de/) at the top, followed by a large table of annotations and cross-references.

figure2

Qualifiers

Standard qualifiers
  • GENE
    • Sequence (nucleotide or amino acid)
    • Gene ID (Available ID list is here)
      • Note: An NCBI Entrez Gene ID must be specified as "GeneID:\d" (the prefix GeneID: followed by the numeric ID), because G-Links treats numbers-only IDs as taxonomy IDs.
    • NCBI tax ID (e.g. 9606)
    • RefSeq Genome ID (e.g. NC_000913)
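
The disambiguation rule in the note above can be sketched in Python as follows. This is only an illustration of the stated convention, not the server's actual logic: the function names and category labels are hypothetical, and the `NC_` pattern for RefSeq genome IDs is an assumption generalized from the single example NC_000913.

```python
import re


def classify_query(query):
    """Guess how a GENE qualifier would be interpreted (hypothetical labels)."""
    q = query.strip()
    if re.fullmatch(r"\d+", q):
        # Numbers-only IDs are treated as NCBI taxonomy IDs.
        return "taxonomy_id"
    if re.fullmatch(r"GeneID:\d+", q):
        # Entrez Gene IDs need the explicit "GeneID:" prefix.
        return "entrez_gene_id"
    if re.fullmatch(r"NC_\d+", q):
        # Assumption: RefSeq genome IDs look like the NC_000913 example.
        return "refseq_genome_id"
    # Any other gene ID or a sequence; not distinguished in this sketch.
    return "other"


def entrez_query(gene_id):
    """Format a numeric Entrez Gene ID so it is not read as a taxonomy ID."""
    return f"GeneID:{int(gene_id)}"


print(classify_query("9606"))          # taxonomy_id
print(classify_query("NC_000913"))     # refseq_genome_id
print(classify_query(entrez_query(1))) # entrez_gene_id
```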
Optional qualifiers

List of available databases (IDs)

Overview of supported databases and web services in G-Links:

table1

Detailed list is as follows:

Usage Examples

Query URLs

Sample Scripts

Example of programmatic access

One of the strengths of G-Links is its programmatic access. For example, the GO slim classification of all E. coli genes for the GO:Process ontology can be retrieved from the following URL:

  • http://link.g-language.org/NC_000913/extract=GOslim_process

This result is shown as a formatted HTML page when viewed in a browser, but when it is accessed from the command line or from a program, it is automatically returned as a TSV file. As a result, a simple combination of UNIX commands can produce a classification summary of all genes in E. coli by GOslim:Process terms. Here is an example:
$ curl -v http://link.g-language.org/NC_000913/extract=GOslim_process  |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn

Here, G-Links is accessed from the command line, with "curl -v" writing the result to standard output, and the sections containing GO terms and their descriptions are extracted ("|grep \# |cut -f 2,3 |grep GO:"). The terms are then sorted and counted ("|sort |uniq -c") and printed in descending order ("|sort -rn").

The following is the output of the above command line:

1056 GO:0009058	biosynthetic process
1032 GO:0008150	biological_process
 860 GO:0034641	cellular nitrogen compound metabolic process
 636 GO:0044281	small molecule metabolic process
 526 GO:0006810	transport
 484 GO:0006950	response to stress
 381 GO:0005975	carbohydrate metabolic process
 374 GO:0009056	catabolic process
 285 GO:0055085	transmembrane transport
 273 GO:0006259	DNA metabolic process
 257 GO:0006520	cellular amino acid metabolic process
 190 GO:0051186	cofactor metabolic process
 169 GO:0006629	lipid metabolic process
 127 GO:0006464	cellular protein modification process
 127 GO:0006091	generation of precursor metabolites and energy
  98 GO:0006790	sulfur compound metabolic process
  96 GO:0042592	homeostatic process
  92 GO:0032196	transposition
  84 GO:0006399	tRNA metabolic process
  79 GO:0007165	signal transduction
  76 GO:0071554	cell wall organization or biogenesis
  72 GO:0022607	cellular component assembly
  63 GO:0006412	translation
  52 GO:0034655	nucleobase-containing compound catabolic process
  50 GO:0051301	cell division
  50 GO:0007155	cell adhesion
  50 GO:0006457	protein folding
  45 GO:0048870	cell motility
  43 GO:0006461	protein complex assembly
  39 GO:0007049	cell cycle
  37 GO:0040011	locomotion
  32 GO:0051604	protein maturation
  31 GO:0071941	nitrogen cycle metabolic process
  31 GO:0051276	chromosome organization
  21 GO:0061024	membrane organization
  19 GO:0019748	secondary metabolic process
  18 GO:0044403	symbiosis, encompassing mutualism through parasitism
  17 GO:0000003	reproduction
  15 GO:0022618	ribonucleoprotein complex assembly
  14 GO:0007059	chromosome segregation
  14 GO:0002376	immune system process
   9 GO:0006605	protein targeting
   7 GO:0008219	cell death
   5 GO:0042254	ribosome biogenesis
   4 GO:0006397	mRNA processing
   3 GO:0000902	cell morphogenesis
   2 GO:0048646	anatomical structure formation involved in morphogenesis
   2 GO:0030198	extracellular matrix organization
   2 GO:0030154	cell differentiation
   1 GO:0065003	macromolecular complex assembly
   1 GO:0015979	photosynthesis
   1 GO:0007267	cell-cell signaling
   1 GO:0007010	cytoskeleton organization
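
Where UNIX tools are not available, the same filter-and-count steps can be approximated in Python. The sketch below is not part of G-Links. The function name `count_terms` is hypothetical, and the sample rows are made up purely to exercise the code, since the exact TSV layout is not described on this page beyond what the pipeline implies (lines containing "#", with the term ID and description in tab-separated columns 2 and 3).

```python
from collections import Counter


def count_terms(tsv_text, marker="GO:"):
    """Mimic: grep \\# | cut -f 2,3 | grep <marker> | sort | uniq -c | sort -rn"""
    counts = Counter()
    for line in tsv_text.splitlines():
        if "#" not in line:                  # grep \#
            continue
        fields = line.split("\t")
        selected = "\t".join(fields[1:3])    # cut -f 2,3
        if marker in selected:               # grep GO:
            counts[selected] += 1            # sort | uniq -c
    # sort -rn: highest count first; ties in reverse text order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


# Made-up rows for illustration only; real G-Links output may differ.
sample = (
    "#gene1\tGO:0006950\tresponse to stress\n"
    "#gene2\tGO:0006950\tresponse to stress\n"
    "#gene2\tGO:0006259\tDNA metabolic process\n"
)
for term, n in count_terms(sample):
    print(n, term)
```

Unlike `cut`, which passes lines without any tab through unchanged, this sketch drops them. Passing `marker="ko"` gives the counterpart of a pipeline that filters on KEGG identifiers instead of GO terms.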

If you have a specific set of genes, such as RECA_ECOLI,RUVB_ECOLI,LEXA_ECOLI,UMUD_ECOLI, that may be overrepresented in a microarray experiment, running the same routine on this list produces the Gene Ontology classification of the genes of interest.

$ curl -v http://link.g-language.org/RECA_ECOLI,RUVB_ECOLI,LEXA_ECOLI,UMUD_ECOLI/extract=GOslim_process  |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn

This will produce:

   4 GO:0006950	response to stress
   4 GO:0006259	DNA metabolic process
   3 GO:0008150	biological_process
   2 GO:0009058	biosynthetic process
   1 GO:0051276	chromosome organization
   1 GO:0048870	cell motility
   1 GO:0034641	cellular nitrogen compound metabolic process

These counts can now be used directly to test for enrichment, for example with Fisher's exact test, to calculate Gene Ontology enrichment scores.
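
As a rough illustration of such a test, the sketch below computes a one-sided Fisher's exact test p-value (the upper tail of the hypergeometric distribution) in pure Python. The function name `fisher_right_tail` is hypothetical. The counts 4 and 484 for "response to stress" come from the two outputs above, but the background size N = 4000 is a placeholder assumption: this page does not state how many E. coli genes carry annotations, so the printed value is not a real result.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k), X ~ Hypergeometric(N, K, n).

    k: genes of interest annotated with the term
    n: size of the gene set of interest
    K: genes in the background annotated with the term
    N: total genes in the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# 4 of 4 query genes vs. 484 genome-wide hits for "response to stress".
# N = 4000 is a placeholder assumption, not a figure from G-Links.
p = fisher_right_tail(k=4, n=4, K=484, N=4000)
print(f"p = {p:.3g}")
```

In practice, a statistics library and a multiple-testing correction across all tested terms would normally be used.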

If an alternative classification is desired, simply change the extracted term from GOslim_process to, for example, the KEGG BRITE hierarchy (KEGG_Brite).

$ curl -v http://link.g-language.org/NC_000913/extract=KEGG_Brite  |grep \# |cut -f 2,3 |grep ko |sort |uniq -c |sort -rn

This will produce:

1452 ko00001	KEGG Orthology (KO)
1017 ko01000	 Enzymes
 838 ko00002	 KEGG pathway modules
 358 ko01000	Enzymes
 282 ko02000	 Transporters
 197 ko02000	Transporters
 129 ko03000	Transcription factors
  89 ko03400	 DNA repair and recombination proteins
  84 ko03016	 Transfer RNA biogenesis
  65 ko01002	 Peptidases
  61 ko02035	 Bacterial motility proteins
  58 ko02022	 Two-component system
  57 ko03011	 Ribosome
  56 ko03011	 M00178  Ribosome, bacteria
  52 ko02044	 Secretion system
  49 ko03009	 Ribosome biogenesis
  45 ko01007	 Amino acid related enzymes
  44 ko00002	KEGG pathway modules
  39 ko01005	 Lipopolysaccharide biosynthesis proteins
  33 ko01001	 Protein kinases
  31 ko03011	 M00179  Ribosome, archaea
  28 ko03036	Chromosome
  27 ko03110	Chaperones and folding catalysts
  27 ko01003	 Glycosyltransferases
  26 ko03036	 Chromosome
  26 ko03032	 DNA replication proteins
  25 ko02044	Secretion system
  20 ko03110	 Chaperones and folding catalysts
  20 ko01004	 Lipid biosynthesis proteins
  19 ko03009	Ribosome biogenesis
  15 ko03012	Translation factors
  13 ko02044	 M00331  Type II general secretion system
  12 ko02044	 M00335  Sec (secretion) system
  11 ko02000	 M00240  Iron complex transport system
  11 ko01002	Peptidases
  10 ko03021	Transcription machinery
  10 ko03021	 Transcription machinery
  10 ko02035	Bacterial motility proteins
  10 ko02000	 M00324  Dipeptide transport system
   9 ko03400	 M00260  DNA polymerase III complex, bacteria
   9 ko03032	 M00260  DNA polymerase III complex, bacteria
   9 ko02000	 M00306  PTS system, fructose-specific II-like component
   8 ko03400	DNA repair and recombination proteins
   8 ko03032	DNA replication proteins
   8 ko03000	 Transcription factors
   8 ko00194	 Photosynthesis proteins
   7 ko02000	 M00221  Putative simple sugar transport system
   7 ko01006	 Prenyltransferases
   6 ko02000	 M00439  Oligopeptide transport system
   6 ko02000	 M00239  Peptides/nickel transport system
   6 ko02000	 M00237  Branched-chain amino acid transport system
   6 ko01005	Lipopolysaccharide biosynthesis proteins
   5 ko03016	Transfer RNA biogenesis
   5 ko03012	 Translation factors
   5 ko02000	 M00440  Nickel transport system
   5 ko02000	 M00279  PTS system, galactitol-specific II component
   5 ko02000	 M00229  Arginine transport system
   5 ko02000	 M00185  Sulfate transport system
   4 ko03400	 M00183  RNA polymerase, bacteria
   4 ko03021	 M00183  RNA polymerase, bacteria
   4 ko02044	 M00336  Twin-arginine translocation (Tat) system
   4 ko02022	Two-component system
   4 ko02000	 M00349  Microcin C transport system
   4 ko02000	 M00348  Glutathione transport system
   4 ko02000	 M00300  Putrescine transport system
   4 ko02000	 M00299  Spermidine/putrescine transport system
   4 ko02000	 M00283  PTS system, ascorbate-specific II component
   4 ko02000	 M00238  D-Methionine transport system
   4 ko02000	 M00230  Glutamate/aspartate transport system
   4 ko02000	 M00226  Histidine transport system
   4 ko02000	 M00225  Lysine/arginine/ornithine transport system
   4 ko02000	 M00222  Phosphate transport system
   4 ko02000	 M00219  AI-2 transport system
   4 ko02000	 M00209  Osmoprotectant transport system
   4 ko02000	 M00198  Putative sn-glycerol-phosphate transport system
   4 ko02000	 M00197  Putative fructooligosaccharide transport system
   4 ko02000	 M00194  Maltose/maltodextrin transport system
   4 ko02000	 M00193  Putative spermidine/putrescine transport system
   4 ko02000	 M00189  Molybdate transport system
   3 ko04812	 Cytoskeleton proteins
   3 ko02035	 M00506  CheA-CheYBV (chemotaxis) two-component regulatory system
   3 ko02030	 M00506  CheA-CheYBV (chemotaxis) two-component regulatory system
   3 ko02022	 M00506  CheA-CheYBV (chemotaxis) two-component regulatory system
   3 ko02022	 M00474  RcsC-RcsD-RcsB (capsule synthesis) two-component regulatory system
   3 ko02001	Solute carrier family
   3 ko02000	 M00436  Sulfonate transport system
   3 ko02000	 M00435  Taurine transport system
   3 ko02000	 M00320  Lipopolysaccharide export system
   3 ko02000	 M00287  PTS system, galactosamine-specific II component
   3 ko02000	 M00280  PTS system, glucitol/sorbitol-specific II component
   3 ko02000	 M00276  PTS system, mannose-specific II component
   3 ko02000	 M00275  PTS system, cellobiose-specific II component
   3 ko02000	 M00274  PTS system, mannitol-specific II component
   3 ko02000	 M00259  Heme transport system
   3 ko02000	 M00255  Lipoprotein-releasing system
   3 ko02000	 M00254  ABC-2 type transport system
   3 ko02000	 M00248  Putative antibiotic transport system
   3 ko02000	 M00242  Zinc transport system
   3 ko02000	 M00241  Vitamin B12 transport system
   3 ko02000	 M00234  Cystine transport system
   3 ko02000	 M00232  General L-amino acid transport system
   3 ko02000	 M00227  Glutamine transport system
   3 ko02000	 M00217  D-Allose transport system
   3 ko02000	 M00215  D-Xylose transport system
   3 ko02000	 M00214  Methyl-galactoside transport system
   3 ko02000	 M00213  L-Arabinose transport system
   3 ko02000	 M00212  Ribose transport system
   3 ko02000	 M00210  Putative ABC transport system
   3 ko02000	 M00208  Glycine betaine/proline transport system
   3 ko02000	 M00207  Putative multiple sugar transport system
   3 ko02000	 M00192  Putative thiamine transport system
   3 ko02000	 M00191  Thiamine transport system
   3 ko01008	 Polyketide biosynthesis proteins
   2 ko04040	Ion channels
   2 ko02044	 M00429  Competence-related DNA transformation transporter
   2 ko02042	Bacterial toxins
   2 ko02022	 M00502  GlrK-GlrR (amino sugar metabolism) two-component regulatory system
   2 ko02022	 M00500  AtoS-AtoC (complexed poly-(R)-3-hydroxybutyrate biosynthesis) two-component regulatory system
   2 ko02022	 M00499  HydH-HydG (metal tolerance) two-component regulatory system
   2 ko02022	 M00497  GlnL-GlnG (nitrogen regulation) two-component regulatory system
   2 ko02022	 M00488  DcuS-DcuR (aerobic C4-dicarboxylate metabolism) two-component regulatory system
   2 ko02022	 M00486  CitA-CitB (citrate fermentation) two-component regulatory system
   2 ko02022	 M00477  EvgS-EvgA (acid and drug tolerance) two-component regulatory system
   2 ko02022	 M00475  BarA-UvrY (central carbon metabolism) two-component regulatory system
   2 ko02022	 M00473  UhpB-UhpA (hexose phosphates uptake) two-component regulatory system
   2 ko02022	 M00472  NarQ-NarP (nitrate respiration) two-component regulatory system
   2 ko02022	 M00471  NarX-NarL (nitrate respiration) two-component regulatory system
   2 ko02022	 M00456  ArcB-ArcA (anoxic redox control) two-component regulatory system
   2 ko02022	 M00455  TorS-TorR (trimethylamine N-oxide respiration) two-component regulatory system
   2 ko02022	 M00454  KdpD-KdpE (potassium transport) two-component regulatory system
   2 ko02022	 M00453  QseC-QseB (quorum sensing) two-component regulatory system
   2 ko02022	 M00452  CusS-CusR (copper tolerance) two-component regulatory system
   2 ko02022	 M00451  BasS-BasR (antimicrobial peptide resistance) two-component regulatory system
   2 ko02022	 M00450  BaeS-BaeR (envelope stress response) two-component regulatory system
   2 ko02022	 M00449  CreC-CreB (phosphate regulation) two-component regulatory system
   2 ko02022	 M00447  CpxA-CpxR (envelope stress response) two-component regulatory system
   2 ko02022	 M00446  RstB-RstA two-component regulatory system
   2 ko02022	 M00445  EnvZ-OmpR (osmotic stress response) two-component regulatory system
   2 ko02022	 M00444  PhoQ-PhoP (magnesium transport) two-component regulatory system
   2 ko02022	 M00434  PhoR-PhoB (phosphate starvation response) two-component regulatory system
   2 ko02000	 M00303  PTS system, N-acetylmuramic acid-specific II component
   2 ko02000	 M00272  PTS system, arbutin-, cellobiose-, and salicin-specific II component
   2 ko02000	 M00270  PTS system, trehalose-specific II component
   2 ko02000	 M00266  PTS system, maltose and glucose-specific II component
   2 ko02000	 M00265  PTS system, glucose-specific II component
   2 ko02000	 M00258  Putative ABC transport system
   2 ko02000	 M00256  Cell division transport system
   2 ko02000	 M00224  Putative phosphonate transport system
   2 ko02000	 M00223  Phosphonate transport system
   2 ko02000	 M00211  Putative ABC transport system
   1 ko04121	 Ubiquitin system
   1 ko04090	 Cellular antigens
   1 ko03051	 Proteasome
   1 ko02044	 M00571  AlgE-type Mannuronan C-5-Epimerase transport system
   1 ko02044	 M00339  RaxAB-RaxC type I secretion system
   1 ko02044	 M00326  RTX toxin transport system
   1 ko02000	 M00491  Putative arabinogalactan oligomer transport system
   1 ko02000	 M00325  alpha-Hemolysin/cyclolysin transport system
   1 ko02000	 M00305  PTS system, 2-O-A-mannosyl-D-glycerate-specific II component
   1 ko02000	 M00277  PTS system, N-acetylgalactosamine-specific II component
   1 ko02000	 M00273  PTS system, fructose-specific II component
   1 ko02000	 M00271  PTS system, beta-glucosides-specific II component
   1 ko02000	 M00268  PTS system, arbutin-like II component
   1 ko02000	 M00267  PTS system, N-acetylglucosamine-specific II component
   1 ko02000	 M00190  Iron(III) transport system
   1 ko00194	Photosynthesis proteins

Versions

The G-Links database is updated once every six months. The next update is Feb 2016.

  • UniProt : 2015_10
    • idmapping, Swiss-Prot, TrEMBL, taxnomic_divisions
  • GEO : 2015_10
    • GeoDb_blob82
  • Enzyme : 16-Sep-2015
  • PharmGKB : 2015-10-04
    • Genes, RSID mapping
  • PID : 2012-9-18 (latest)
  • BIOGRID : 3.4.129
  • Gene Ontology : 2015-09-24

License

Reference
