Reference Genome
Obtain a reference genome from iGenomes. In this example analysis we will use the human hg19/NCBI build 37 version of the genome. To make the analysis run faster, we will use only a single chromosome (chr22) plus the ERCC spike-in sequences.
Create the necessary working directory
cd $RNA_HOME
mkdir -p refs/hg19/fasta/chr22_ERCC92/
cd refs/hg19/fasta/chr22_ERCC92/
Make a copy of the chr22 FASTA file in your working directory. The complete data from which these files were obtained can be found at: http://cufflinks.cbcb.umd.edu/igenomes.html. You could use wget to download the Homo_sapiens_Ensembl_GRCh37.tar.gz file (under Homo sapiens -> Ensembl -> GRCh37), then unzip and untar it.
This has already been done for you, and the resulting file has been placed on a server. Download it now.
wget https://xfer.genome.wustl.edu/gxfer1/project/gms/testdata/bams/brain_vs_uhr_w_ercc/downsampled_5pc_chr22/chr22_ERCC92.fa.gz
gunzip chr22_ERCC92.fa.gz
View the first 10 lines of this file
head chr22_ERCC92.fa
How many lines and characters are in this file?
wc chr22_ERCC92.fa
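Beyond the line and character counts from wc, it can help to count the sequences in the file and list their names. The following is a minimal sketch, assuming standard FASTA formatting in which every record begins with a header line starting with `>`. The helper names `fasta_count` and `fasta_names` are illustrative and not part of the course materials.

```shell
# Count the records in a FASTA file (one header line starting with '>' per record).
fasta_count() {
    grep -c '^>' "$1"
}

# List sequence names: the header text after '>' up to the first whitespace.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Usage: only runs if the downloaded file is present in the current directory.
if [ -f chr22_ERCC92.fa ]; then
    fasta_count chr22_ERCC92.fa
    fasta_names chr22_ERCC92.fa | head
fi
```

The names printed by `fasta_names` are the ones an aligner will report, so they are the names your annotation file needs to use.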
To get all chromosomes instead of just chr22 you could do the following:
cd $RNA_HOME
mkdir -p refs/hg19/fasta/
cd refs/hg19/fasta/
cp /media/cbwdata/CourseData/RNA_data/iGenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Chromosomes/* .
cat *.fa > hg19.fa
Note: Instead of the above, you might consider getting reference genomes and associated annotations from UCSC, e.g., ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
Wherever you get them from, the names of your reference sequences (chromosomes) must match those used in your annotation GTF files.
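One way to check this requirement is to compare the sequence names in the FASTA headers with the names in the first column of the GTF. The sketch below assumes a tab-delimited GTF whose comment lines start with `#`. The function name `gtf_names_missing_from_fasta` is a hypothetical helper, not a tool from this course.

```shell
# Print reference names used in a GTF (column 1) that are absent from a FASTA.
# Empty output means every annotated sequence name exists in the reference.
gtf_names_missing_from_fasta() {
    fasta="$1"
    gtf="$2"
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA names: header text after '>' up to the first whitespace
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
    # GTF names: first tab-separated column, skipping '#' comment lines
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$gtf_names"
    # comm -13 keeps lines found only in the second file (the GTF names)
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}
```

For example, `gtf_names_missing_from_fasta chr22_ERCC92.fa genes.gtf` (where `genes.gtf` stands in for your own annotation file) would print names such as `chr22` if the GTF uses UCSC-style names while the FASTA uses Ensembl-style names such as `22`.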
NOTICE: This resource has been moved to rnabio.org. The version here will be maintained for legacy use only. All future development and maintenance will occur only at rnabio.org. Please proceed to rnabio.org for the current version of this course.