# Quantifying bulk RNA-seq with kallisto 

This notebook shows how to quantify bulk RNA-seq reads with kallisto, which is described in the paper:
* Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).



In [20]:
!date

Mon Oct 21 21:51:15 UTC 2019


### Download and install kallisto

In [21]:
!wget https://github.com/pachterlab/kallisto/releases/download/v0.46.0/kallisto_linux-v0.46.0.tar.gz
!tar -xf kallisto_linux-v0.46.0.tar.gz
!cp kallisto/kallisto /usr/local/bin/

--2019-10-21 21:51:16--  https://github.com/pachterlab/kallisto/releases/download/v0.46.0/kallisto_linux-v0.46.0.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found

### Test the installation

In [22]:
!kallisto
!ls ./kallisto/test/

kallisto 0.46.0

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index 
    quant         Runs the quantification algorithm 
    bus           Generate BUS files for single-cell data 
    pseudo        Runs the pseudoalignment step 
    merge         Merges several batch runs 
    h5dump        Converts HDF5-formatted results to plaintext
    inspect       Inspects and gives information about an index
    version       Prints version information
    cite          Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>

chrom.txt  reads_1.fastq.gz  transcripts.fasta.gz
output	   reads_2.fastq.gz  transcripts.gtf.gz
README.md  Snakefile	     transcripts.idx
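If you prefer to check the installation from Python instead of with `!kallisto`, here is a minimal standard-library sketch. It assumes the binary is named `kallisto` and is on `PATH` (the `cp` into `/usr/local/bin/` above takes care of that). The helper name `kallisto_version` is hypothetical, and it relies only on the `version` subcommand listed in the usage message.

```python
import shutil
import subprocess
from typing import Optional


def kallisto_version(executable: str = "kallisto") -> Optional[str]:
    """Return the text printed by `<executable> version`, or None if it is not on PATH.

    Hypothetical helper for this notebook, not part of kallisto itself.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # `version` is one of the subcommands listed in kallisto's usage message.
    result = subprocess.run([path, "version"], capture_output=True, text=True, check=True)
    # Use whichever stream the version string was written to.
    return (result.stdout or result.stderr).strip()


print(kallisto_version())
```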


### Build an index

In [23]:
!kallisto index -i kallisto/test/transcripts.idx kallisto/test/transcripts.fasta.gz





[build] loading fasta file kallisto/test/transcripts.fasta.gz
[build] k-mer length: 31
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 27 contigs and contains 22118 k-mers 
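The `[build]` lines summarize kallisto's index: the k-mers of every transcript (k = 31 by default) arranged in a de Bruijn graph whose contigs are associated with sets of transcripts, the equivalence classes. The toy sketch below skips the graph entirely. It maps each k-mer directly to the set of transcripts that contain it and pseudoaligns a read by intersecting those sets. Assumptions: a k-mer and its reverse complement share one key (standing in for kallisto's strand-agnostic default), k-mers absent from the index are simply ignored, and none of this reflects kallisto's real data structures or speed. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterator, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield the canonical k-mers of seq, skipping windows with non-ACGT characters."""
    for i in range(len(seq) - k + 1):
        window = seq[i : i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def build_toy_index(
    transcripts: Dict[str, str], k: int = 31
) -> Tuple[Dict[str, FrozenSet[str]], Set[FrozenSet[str]]]:
    """Map each k-mer to the transcripts containing it; also return the equivalence classes."""
    targets_by_kmer: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq.upper(), k):
            targets_by_kmer[km].add(name)
    index = {km: frozenset(names) for km, names in targets_by_kmer.items()}
    # An equivalence class is a distinct set of transcripts that some k-mer maps to.
    return index, set(index.values())


def pseudoalign(read: str, index: Dict[str, FrozenSet[str]], k: int = 31) -> FrozenSet[str]:
    """Return the transcripts compatible with every indexed k-mer of the read."""
    compatible = None
    for km in kmers(read.upper(), k):
        targets = index.get(km)
        if targets is None:
            continue  # simplification: k-mers missing from the index are ignored
        compatible = targets if compatible is None else compatible & targets
    # No indexed k-mers (or an empty intersection) means the read does not pseudoalign.
    return compatible if compatible is not None else frozenset()


# Tiny example with k = 3 so the sets are easy to check by hand.
index, eq_classes = build_toy_index({"t1": "AAACCC", "t2": "AAAGGG"}, k=3)
print(len(eq_classes))                          # 3 classes: {t1}, {t2}, {t1, t2}
print(sorted(pseudoalign("AAAC", index, k=3)))  # ['t1']
print(sorted(pseudoalign("GTTT", index, k=3)))  # ['t1'] (reverse complement of AAAC)
```

In the example, `CCC` (from t1) and `GGG` (from t2) are reverse complements, so they collapse to one key shared by both transcripts. That is why a read can be compatible with several transcripts even when it comes from only one of them.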



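### Run kallisto

Quantify the paired-end test reads against the index using two threads (`-t 2`) and 100 bootstrap samples (`-b 100`). Results are written to `kallisto/test/output`.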


In [24]:
!kallisto quant -t 2 -i kallisto/test/transcripts.idx -o kallisto/test/output -b 100 kallisto/test/reads_1.fastq.gz kallisto/test/reads_2.fastq.gz




[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 14
[index] number of k-mers: 22,118
[index] number of equivalence classes: 20
[quant] running in paired-end mode
[quant] will process pair 1: kallisto/test/reads_1.fastq.gz
                             kallisto/test/reads_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 10,000 reads, 9,413 reads pseudoaligned
[quant] estimated average fragment length: 178.02
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds
[bstrp] number of EM bootstraps complete: 1[bstrp] number of EM bootstraps complete: 2[bstrp] number of EM bootstraps complete: 3[bstrp] number of EM bootstraps complete: 4[bstrp] number of EM bootstraps complete: 5[bstrp] number of EM bootstraps complete: 6[bstrp] number of EM bootstraps complete: 7[bstrp] number of EM bootstraps complete: 8[bstrp] number of EM

### Examine output

In [26]:
!head kallisto/test/output/abundance.tsv
!head kallisto/test/output/run_info.json

target_id	length	eff_length	est_counts	tpm
ENST00000513300.5	1924	1746.98	102.328	11129.2
ENST00000282507.7	2355	2177.98	1592.02	138884
ENST00000504685.5	1476	1298.98	68.6528	10041.8
ENST00000243108.4	1733	1555.98	343.499	41944.9
ENST00000303450.4	1516	1338.98	664	94221.8
ENST00000243082.4	2039	1861.98	55	5612.36
ENST00000303406.4	1524	1346.98	304.189	42908.2
ENST00000303460.4	1936	1758.98	47	5076.85
ENST00000243056.4	2423	2245.98	42	3553.05
{
	"n_targets": 14,
	"n_bootstraps": 100,
	"n_processed": 10000,
	"n_pseudoaligned": 9413,
	"n_unique": 7174,
	"p_pseudoaligned": 94.1,
	"p_unique": 71.7,
	"kallisto_version": "0.46.0",
	"index_version": 10,
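`abundance.tsv` has one row per target. `est_counts` is the estimated number of fragments, `eff_length` is the effective length, and `tpm` is transcripts per million. The columns relate through the usual TPM definition, assumed here to be $\mathrm{TPM}_t = 10^6 \cdot \frac{c_t/\tilde{l}_t}{\sum_u c_u/\tilde{l}_u}$. The rows shown are consistent with this up to rounding: $(1592.02/2177.98) / (102.328/1746.98) \approx 12.479 \approx 138884/11129.2$. Because TPM is normalized over all 14 targets, recomputing absolute values needs the whole file, not just the `head` above. In every row shown, `eff_length` also equals `length` minus 177.02. That matches subtracting the estimated average fragment length (178.02, from the quant log) and adding one, an approximation expected to hold only for transcripts much longer than the fragments. The helper names below are hypothetical.

```python
import csv
from typing import Dict, List


def read_abundance(path: str) -> List[Dict[str, str]]:
    """Read kallisto's tab-separated abundance.tsv into a list of row dicts."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def tpm_from_counts(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Recompute TPM from est_counts and eff_length (rows with eff_length <= 0 are skipped)."""
    rates = {
        r["target_id"]: float(r["est_counts"]) / float(r["eff_length"])
        for r in rows
        if float(r["eff_length"]) > 0
    }
    total = sum(rates.values())
    return {t: 1e6 * rate / total for t, rate in rates.items()}


def approx_eff_length(length: float, mean_frag_len: float) -> float:
    """Long-transcript approximation: length - mean fragment length + 1."""
    return length - mean_frag_len + 1


# Usage on the notebook's output (uses all rows, not just the `head`):
# rows = read_abundance("kallisto/test/output/abundance.tsv")
# recomputed = tpm_from_counts(rows)
# for r in rows:
#     print(r["target_id"], r["tpm"], round(recomputed[r["target_id"]], 1))
```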


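`run_info.json`, cut off by `head` above, summarizes the run. Its percentages are plain ratios of the read counts. 9,413 of 10,000 processed reads pseudoaligned, giving the reported 94.1%. 7,174 of 10,000 pseudoaligned uniquely, giving 71.7%. A small sketch that loads the file and recomputes both percentages. It assumes only the keys visible in the output above, and the function names are hypothetical.

```python
import json
from typing import Any, Dict


def load_run_info(path: str) -> Dict[str, Any]:
    """Load kallisto's run_info.json."""
    with open(path) as fh:
        return json.load(fh)


def mapping_rates(info: Dict[str, Any]) -> Dict[str, float]:
    """Recompute the pseudoalignment percentages from the raw read counts."""
    n = info["n_processed"]
    return {
        "p_pseudoaligned": 100 * info["n_pseudoaligned"] / n,
        "p_unique": 100 * info["n_unique"] / n,
    }


# info = load_run_info("kallisto/test/output/run_info.json")
# print(mapping_rates(info), info["p_pseudoaligned"], info["p_unique"])
```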
In [27]:
!date

Mon Oct 21 21:51:27 UTC 2019
