Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
bird-genomes
============

Non-coding annotations, analyses scripts and resulting manuscript for 48 bird genomes.

######################################################################
#0. Directories and files:

README      -- this file
data/       -- data directory, primarily genome annotations
  conserved-merged-annotations/
            --Is populated with GFFs generated by gffs2heatmaps.pl. A subset of the "merged-annotations" that are conserved in >10% of the bird genomes. 
  mirbase/  -- miRBase derived annotations in GFF format
  R/        -- data matrices used to generate graphics using R
  RNA-seq/  -- annotations that have been validated using RNA-seq data
  rfam/     -- Rfam derived annotations in GFF format
  trnascan/ -- tRNAscan-SE derived annotations in GFF format
  merged-annotations/
            -- merged annotations from miRBase, Rfam, tRNAscan, ... in GFF format. Generated by the "compete_clans.pl" script
  stadler-annotations/
            -- annotations from Peter Stadler group's set of RNA tools
	    -- fix mis-named chromosomes
paper/      -- files associated with the manuscript (figures/, *.tex, *.bib, ...)
scripts/    -- scripts used for analysing the datasets

######################################################################
#1. annotation tools:

curl ftp://ftp.sanger.ac.uk/pub/databases/Rfam/11.0/database_files/rfam.txt.gz | gunzip > /tmp/rfam.txt &&  cat /tmp/rfam.txt | cut -f 3,4,19 | sort > data/rfam2type.txt

######################################################################
#2. analyses tools:

cd bird-genomes

#Merge all the different annotations, resolve the major overlaps:
# Overlap rules are:
#  1. specific methods (e.g. tRNAscan, miRBase, ...) are selected over Rfam predictions
#  2. High-scoring predictions are selected over low-scoring predictions
#paste all the filenames together (check they coincide if adding new annotations) and run compete_clans.pl:
ls data/rfam/*gff > /tmp/blah1 && 
ls data/trnascan/*gff > /tmp/blah2 && 
ls data/mirbase/*gff > /tmp/blah3 && 
ls data/stadler-annotations/*gff > /tmp/blah4 && 

#CHECK: paste /tmp/blah1 /tmp/blah2 /tmp/blah3 /tmp/blah4 
paste /tmp/blah1 /tmp/blah2 /tmp/blah3 /tmp/blah4 | perl -lane 'if(/mirbase\/(\S+).gff/){print "scripts/compete_clans.pl -g $F[0] -g $F[1] -g $F[2] -g $F[3] -cl data/clan_info.txt -od data/overlaps/ > data/merged-annotations/$1.gff"}' | sh

#generate the count matrices (in data/R) and run the R-code that generates the figures: 
./scripts/gffs2heatmaps.pl


About

Non-coding annotations, analyses scripts and resulting manuscript for 48 bird genomes

Resources

Releases

No releases published

Packages

No packages published