<a href="https://colab.research.google.com/github/pachterlab/gget_examples/blob/main/gget_workflow_terminal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: `gget` in the terminal

[`gget`](https://github.com/pachterlab/gget) currently consists of the following nine modules:
- `gget ref`
Fetch FTP download links and metadata for reference genomes and annotations from [Ensembl](https://www.ensembl.org/) by species.
- `gget search`
Fetch genes and transcripts from [Ensembl](https://www.ensembl.org/) using free-form search terms.
- `gget info`
Fetch extensive gene and transcript metadata from [Ensembl](https://www.ensembl.org/), [UniProt](https://www.uniprot.org/), and [NCBI](https://www.ncbi.nlm.nih.gov/) using Ensembl IDs.  
- `gget seq`
Fetch nucleotide or amino acid sequences of genes or transcripts from [Ensembl](https://www.ensembl.org/) or [UniProt](https://www.uniprot.org/), respectively.  
- `gget blast`
BLAST a nucleotide or amino acid sequence against any [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) database.
- `gget blat` 
Find the genomic location of a nucleotide or amino acid sequence using [BLAT](https://genome.ucsc.edu/cgi-bin/hgBlat).
- `gget muscle` 
Align multiple nucleotide or amino acid sequences to each other using [Muscle5](https://www.drive5.com/muscle/).
- `gget enrichr`
Perform an enrichment analysis on a list of genes using [Enrichr](https://maayanlab.cloud/Enrichr/).
- `gget archs4` 
Find the genes most correlated with a gene of interest, or the gene's tissue expression atlas, using [ARCHS4](https://maayanlab.cloud/archs4/).

___

Install gget:

In [None]:
!pip install gget -q

___


<h1><center>Terminal version</center></h1>
<center>Jupyter lab version below.</center>


In [None]:
!gget

In [None]:
# # Show complete manual
# !gget -h

___
# Find reference genome metadata and download links

In [None]:
# Show manual
!gget ref

In [None]:
# Fetch the reference genome metadata of the latest Homo sapiens genome
!gget ref human

In [None]:
# Fetch only the GTF (annotation reference) FTP link of the latest Homo sapiens genome
!gget ref -w gtf -ftp human

At the time of writing, Ensembl had just released Ensembl 106. Note that `gget ref` and `gget search` automatically fetch from the latest release unless an earlier release is specified (all other modules are release-independent). Show the genomes that are newly available in the latest Ensembl release compared with release 105:

In [None]:
!comm -13 <(gget ref -l -r 105 | sort) <(gget ref -l | sort)
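`comm -13 A B` prints only the lines that occur in B but not in A, and it expects both inputs to be sorted, which is why each `gget ref -l` listing is piped through `sort`. The bash sketch below (run it in a terminal, or in a `%%bash` cell in Colab) replays the same pattern offline; `old_release` and `new_release` are hypothetical stand-ins that print made-up species lists instead of real `gget ref -l` output.

```bash
# Hypothetical stand-ins for `gget ref -l -r 105` and `gget ref -l`
# (toy species lists, not real Ensembl data).
old_release() { printf '%s\n' homo_sapiens mus_musculus danio_rerio; }
new_release() { printf '%s\n' mus_musculus homo_sapiens taeniopygia_guttata danio_rerio; }

# comm needs sorted input. -1 hides lines only in the first input,
# -3 hides lines common to both, so -13 keeps lines only in the second input.
new_species=$(comm -13 <(old_release | sort) <(new_release | sort))
echo "new: $new_species"

# -23 keeps lines only in the first input: entries that disappeared.
removed_species=$(comm -23 <(old_release | sort) <(new_release | sort))
echo "removed: ${removed_species:-none}"
```

Swapping `-13` for `-23` in the original command would therefore list genomes present in release 105 but missing from the latest release.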

___

# Find gene IDs based on free-form search terms:
Searching for 'fun' genes in the zebra finch (*Taeniopygia guttata*) genome.

In [None]:
# Show manual
!gget search

In [None]:
!gget search --species taeniopygia_guttata fun

# Use [Enrichr](https://maayanlab.cloud/Enrichr/) to perform a pathway enrichment analysis on a list of genes

In [None]:
# Show manual
!gget enrichr

In [None]:
!gget enrichr -db pathway AIMP1 MFHAS1 BFAR FUNDC1 AIMP2 ASF1A
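`gget enrichr` takes the gene symbols as positional arguments, as in the cell above. When a gene list is kept in a text file with one symbol per line, a bash array can carry it onto the command line. The file contents and the helper `read_gene_list` below are hypothetical; the sketch only builds the argument list and prints the resulting command instead of querying Enrichr.

```bash
# Hypothetical helper: read one gene symbol per line into the global array GENES,
# stripping whitespace (including Windows \r) and skipping blank lines.
read_gene_list() {
  GENES=()
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line//[[:space:]]/}"
    if [[ -n "$line" ]]; then
      GENES+=("$line")
    fi
  done < "$1"
}

# Demo: the genes from the cell above, with a blank line, a \r and no final newline.
gene_file=$(mktemp)
printf 'AIMP1\nMFHAS1\n\nBFAR\r\nFUNDC1\nAIMP2\nASF1A' > "$gene_file"
read_gene_list "$gene_file"
rm -f "$gene_file"

# Print the command that would run; remove `echo` to actually call gget.
echo gget enrichr -db pathway "${GENES[@]}"
```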

# Find the 100 genes most correlated with a gene of interest, or show its tissue expression, using the [ARCHS4](https://maayanlab.cloud/archs4/) database

In [None]:
# Show manual
!gget archs4

In [None]:
!gget archs4 AIMP1

In [None]:
!gget archs4 --which tissue AIMP1
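The correlation ranking itself is computed by the ARCHS4 service; gget only retrieves it. To make "correlated" concrete, the bash sketch below computes Pearson's correlation coefficient r between two expression vectors with awk. Treating the ARCHS4 score as Pearson's r is an assumption here, and the expression values are invented.

```bash
# Pearson's r between two whitespace-separated numeric vectors of equal length.
pearson() {
  awk -v xs="$1" -v ys="$2" 'BEGIN {
    n = split(xs, x, " "); m = split(ys, y, " ")
    if (n != m || n < 2) { print "NaN"; exit 1 }
    for (i = 1; i <= n; i++) { sx += x[i]; sy += y[i] }
    mx = sx / n; my = sy / n
    for (i = 1; i <= n; i++) {
      dx = x[i] - mx; dy = y[i] - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    if (sxx == 0 || syy == 0) { print "NaN"; exit 1 }   # constant vector
    printf "%.4f\n", sxy / sqrt(sxx * syy)
  }'
}

# Made-up expression of two genes across five samples.
pearson "1 2 3 4 5" "2 4 6 8 10"   # rises together: 1.0000
pearson "1 2 3 4 5" "5 4 3 2 1"    # moves in opposite directions: -1.0000
```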

# Fetch additional information about genes/transcripts:

In [None]:
# Show manual
!gget info

In [None]:
# Show short info on a few of the genes (includes the canonical transcript for each)
!gget info ENSTGUG00000006139 ENSTGUG00000026050 ENSTGUG00000004956

# Fetch the **nucleotide** sequence of a gene, or the **nucleotide** sequences of all its known isoforms.

In [None]:
# Show manual
!gget seq

In [None]:
# The -o flag sets the file the results are saved to
!gget seq -o gene_fasta.fa ENSTGUG00000006139

Get the nucleotide sequences of all known isoforms of ENSTGUG00000006139:

In [None]:
!gget seq -iso -o gene_iso_fasta.fa ENSTGUG00000006139
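`gget seq` writes FASTA: each record is a header line starting with `>` followed by one or more sequence lines. A quick sanity check on a file such as `gene_fasta.fa` or `gene_iso_fasta.fa` is to print each record's ID and sequence length. The helper `fasta_lengths` and the two toy records below are hypothetical, not actual gget output.

```bash
# Print "<first word of header>\t<sequence length>" for every FASTA record.
fasta_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # flush the previous record
      id = substr($1, 2); len = 0       # drop the leading ">"
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (id != "") print id "\t" len }
  ' "$@"
}

# Demo on a made-up two-record FASTA file (first record wrapped over two lines).
fa=$(mktemp)
printf '>tx1 toy record\nATGGCC\nTTAG\n>tx2 another\nATGAAATAA\n' > "$fa"
fasta_lengths "$fa"
rm -f "$fa"
```

With the files created above, `fasta_lengths gene_iso_fasta.fa` should print one line per isoform.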

# Fetch the **amino acid** sequence of the canonical transcript of a gene, or the **amino acid** sequences corresponding to all its known protein isoforms.

In [None]:
# Get the amino acid (AA) sequence of the canonical transcript
!gget seq --transcribe -o transcript_fasta.fa ENSTGUG00000006139

In [None]:
# Get AA sequences of all isoforms
!gget seq --transcribe -iso -o transcript_iso_fasta.fa ENSTGUG00000006139

Note: if the isoform option is used with a transcript ID, gget simply fetches the sequence of that transcript and notifies the user that the isoform option only applies to genes:

In [None]:
!gget seq --transcribe -iso ENSTGUT00000027003.1
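Whether an ID names a gene or a transcript can be read off the ID itself. Vertebrate Ensembl stable IDs usually consist of `ENS`, an optional species code (`TGU` for zebra finch, none for human), a feature letter (`G` gene, `T` transcript, `P` protein, `E` exon), 11 digits, and an optional `.version`. The hypothetical helper below classifies IDs under that assumption; IDs following other schemes (for example from non-vertebrate Ensembl Genomes) are reported as unknown.

```bash
# Classify an Ensembl stable ID by its feature letter.
# Assumes the vertebrate pattern ENS[species code]<letter><11 digits>[.version].
ensembl_id_type() {
  local re='^ENS([A-Z]{3})?([GTPE])[0-9]{11}(\.[0-9]+)?$'
  if [[ "$1" =~ $re ]]; then
    case "${BASH_REMATCH[2]}" in
      G) echo gene ;;
      T) echo transcript ;;
      P) echo protein ;;
      E) echo exon ;;
    esac
  else
    echo unknown
  fi
}

for id in ENSTGUG00000006139 ENSTGUT00000027003.1 ENSG00000157764 AT1G01010; do
  printf '%s\t%s\n' "$id" "$(ensembl_id_type "$id")"
done
```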

# BLAST the gene **nucleotide** sequence:

In [None]:
# Show manual
!gget blast

Note: `blast` also accepts a sequence passed as a string instead of a FASTA (.fa) file.

In [None]:
!gget blast gene_fasta.fa
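Because `gget blast` also accepts the sequence itself as a string, a sequence stored in a FASTA file can be passed inline by dropping the header and joining the wrapped sequence lines. The helper `first_fasta_seq` is hypothetical and keeps only the first record of the file.

```bash
# Print the first record of a FASTA file as one unbroken sequence string.
first_fasta_seq() {
  awk '
    /^>/ { if (seen) exit; seen = 1; next }   # stop at the second header
    seen { gsub(/[ \t\r]/, ""); printf "%s", $0 }
    END { printf "\n" }
  ' "$1"
}

# Demo on a made-up FASTA file whose first record is wrapped over two lines.
fa=$(mktemp)
printf '>seq1\nATGC\nGGTA\n>seq2\nTTTT\n' > "$fa"
query=$(first_fasta_seq "$fa")
rm -f "$fa"
echo "$query"
```

With a file produced earlier, the helper could feed `gget blast` directly: `gget blast "$(first_fasta_seq gene_fasta.fa)"`.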

# BLAST the **amino acid** sequence of the canonical transcript:

In [None]:
!gget blast transcript_fasta.fa
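`gget blast` is expected to pick its default BLAST program and database from the kind of sequence it receives (nucleotide versus amino acid); the manual printed above lists the exact defaults. The bash sketch below shows one simple way such a check could work: call a sequence nucleotide if every letter is A, C, G, T, U or N. This alphabet heuristic is an illustrative assumption, not gget's implementation, and it can misclassify short protein fragments made only of those letters.

```bash
# Illustrative heuristic (not gget's code): guess whether a sequence string
# is nucleotide or amino acid from the letters it contains.
guess_seq_type() {
  local seq
  seq=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # upper-case
  seq="${seq//[^A-Z]/}"                                  # keep letters only
  if [[ -z "$seq" ]]; then
    echo empty
  elif [[ "$seq" =~ ^[ACGTUN]+$ ]]; then
    echo nucleotide
  else
    echo amino_acid
  fi
}

guess_seq_type "atgGCCaaTTag"   # nucleotide
guess_seq_type "MSDKPLVQ"       # amino_acid
```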

# Use the MUSCLE algorithm to align the **nucleotide** sequences of all transcripts:

In [None]:
# Show manual
!gget muscle

In [None]:
!gget muscle gene_iso_fasta.fa

# Use the MUSCLE algorithm to align the **amino acid** sequences of all transcripts:

In [None]:
!gget muscle transcript_iso_fasta.fa
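A multiple alignment inserts gap characters (`-`) so that corresponding positions line up in columns. One common way to summarize any two rows of such an alignment is percent identity; the bash sketch below computes it for two already-aligned, equal-length strings, counting only columns where neither row has a gap. The helper name, the toy rows and this particular definition of identity (several are in use) are assumptions for illustration.

```bash
# Percent identity of two aligned sequences of equal length,
# ignoring columns where either sequence has a gap ("-").
pct_identity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (length(a) != length(b)) { print "length mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= length(a); i++) {
      ca = toupper(substr(a, i, 1)); cb = toupper(substr(b, i, 1))
      if (ca == "-" || cb == "-") continue
      cols++
      if (ca == cb) same++
    }
    if (cols == 0) { print "NaN"; exit 1 }
    printf "%.1f\n", 100 * same / cols
  }'
}

# Made-up aligned pair: 8 gap-free columns, 6 of them identical -> 75.0
pct_identity "ATG-CCTTA" "ATGACGTCA"
```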