Cooper Lab Repository:

This repository contains scripts I wrote as a graduate research assistant in Dr. Elizabeth Cooper's lab at UNC Charlotte. The scripts perform a genome annotation and carry out each step needed to identify horizontal transfer of transposable elements (TEs) with the vhica R package.

https://elizcooperlab.com

Genome Annotation

wyeomyia.EDTA.slurm

Bash script for de novo TE annotation. A genome FASTA file is required for this script.

VHICA

random-50.py

Python script to pick 50 random orthologues from the Single_Copy_Orthologue_Sequences directory of the OrthoFinder output.
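The random selection can be sketched as follows. This is a minimal illustration rather than the contents of random-50.py: the function name `pick_random_orthologues`, the `.fa` file suffix, the destination directory, and the optional seed are all assumptions made for the example.

```python
import random
import shutil
from pathlib import Path


def pick_random_orthologues(src_dir, dest_dir, n=50, seed=None):
    """Copy n randomly chosen FASTA files from src_dir into dest_dir.

    Assumes each orthologue is stored as one '.fa' file in src_dir
    (hypothetical layout; adjust the suffix to match your files).
    Returns the sorted names of the chosen files.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    # Sort first so that a fixed seed gives the same sample on every run.
    files = sorted(p for p in src.iterdir() if p.is_file() and p.suffix == ".fa")
    if len(files) < n:
        raise ValueError(f"only {len(files)} FASTA files found, need {n}")
    rng = random.Random(seed)
    chosen = rng.sample(files, n)  # sampling without replacement
    dest.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy(path, dest / path.name)
    return sorted(path.name for path in chosen)
```

Passing a fixed `seed` makes the choice of orthologues reproducible, which may be useful when the downstream alignments need to be rerun.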

macse4orthologues.sh

Bash script to align the sequences in each orthologue file. The 50 random files from the previous step are required for this script.

concat.sh

Bash script to combine the FASTA-format genome annotations of all 7 mosquito species. The genome annotation files are required for this script.
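For readers who prefer Python, the combining step might look like the sketch below. It is not a translation of concat.sh, which is a Bash script and may simply concatenate the files. Prefixing every header with a species label, the `|` separator, and the function name `concat_fastas` are assumptions added here so that each sequence can be traced back to its species after clustering.

```python
def concat_fastas(inputs, out_path, sep="|"):
    """Merge several FASTA files into one, tagging headers by species.

    inputs: mapping of species label -> path to that species' FASTA file
            (labels and paths are supplied by the caller; hypothetical).
    Returns the number of sequence records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for species, path in inputs.items():
            with open(path) as fh:
                for line in fh:
                    if line.startswith(">"):
                        # Rewrite '>id ...' as '>species|id ...'
                        out.write(f">{species}{sep}{line[1:].rstrip()}\n")
                        n_records += 1
                    elif line.strip():
                        # Copy sequence lines, skipping blank lines
                        out.write(line.rstrip() + "\n")
    return n_records
```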

cdhit-GA-0.8.sh

Bash script to cluster the TEs in the combined annotation files using CD-HIT. The output from the previous step is required for this script.

find-2s-400NT-seqs.py

Python script that applies an initial filter to the cluster output based on the number of species and the sequence length. It produces one file listing the clusters that meet the criteria and one file containing the sequences that belong to the kept clusters. The clustered output from the previous step is required for this script.
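The filtering idea can be illustrated with a short sketch that reads CD-HIT's `.clstr` format. Several details are assumptions and are not taken from find-2s-400NT-seqs.py: the thresholds of 2 species and 400 nucleotides are inferred from the script name only, sequence IDs are assumed to look like `species|te_id`, and short members are dropped before the species are counted.

```python
import re
from collections import defaultdict

# A .clstr member line looks like: "0<TAB>1200nt, >seq_id... *"
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster name: [(sequence id, length), ...]}."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
        else:
            match = MEMBER_RE.match(line)
            if match and current is not None:
                clusters[current].append((match.group(2), int(match.group(1))))
    return dict(clusters)


def filter_clusters(clusters, min_species=2, min_len=400, sep="|"):
    """Keep clusters whose long-enough members span at least min_species.

    Assumes the species label is the part of the ID before `sep`
    (hypothetical naming scheme).
    """
    kept = {}
    for name, members in clusters.items():
        long_members = [(sid, n) for sid, n in members if n >= min_len]
        species = {sid.split(sep, 1)[0] for sid, _ in long_members}
        if len(species) >= min_species:
            kept[name] = long_members
    return kept
```

A typical call would be `filter_clusters(parse_clstr(open(path)))`, where `path` points to the `.clstr` file written by CD-HIT.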

transdecoder.sh

Bash script that runs TransDecoder to identify candidate coding regions. The cluster output from the previous step is required for this script.

long-orf-files.py

Python script to filter clusters based on the TransDecoder output and produce one file per cluster containing the corresponding sequences. The TransDecoder output and the previously filtered cluster file are required for this script.
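One way this step could work is sketched below. It is an illustration under stated assumptions, not the logic of long-orf-files.py: it assumes TransDecoder ORF headers carry IDs of the form `<sequence_id>.p<N>`, that a cluster is kept only when at least two of its members have a predicted ORF (an alignment needs two or more sequences), and that cluster membership and sequences are already loaded into dictionaries. All function and parameter names are hypothetical.

```python
import re
from pathlib import Path

ORF_SUFFIX = re.compile(r"\.p\d+$")  # TransDecoder appends .p1, .p2, ...


def orf_source_ids(header_lines):
    """Collect the IDs of input sequences that have a predicted ORF."""
    ids = set()
    for line in header_lines:
        if line.startswith(">"):
            orf_id = line[1:].split()[0]
            ids.add(ORF_SUFFIX.sub("", orf_id))
    return ids


def write_cluster_files(clusters, sequences, orf_ids, out_dir, min_members=2):
    """Write one FASTA file per cluster, keeping only members with an ORF.

    clusters:  {cluster name: [sequence id, ...]}
    sequences: {sequence id: nucleotide string}
    Returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, member_ids in clusters.items():
        keep = [sid for sid in member_ids if sid in orf_ids and sid in sequences]
        if len(keep) < min_members:
            continue  # too few ORF-bearing sequences to align
        path = out / f"{name.replace(' ', '_')}.fasta"
        with path.open("w") as fh:
            for sid in keep:
                fh.write(f">{sid}\n{sequences[sid]}\n")
        written.append(path)
    return written
```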

macse4vhica.sh

Bash script to align the sequences in each cluster file. The output from the previous step is required for this script.

vhica.R

R script to determine whether horizontal transfer of TEs has occurred. The outputs from step 2 (macse4orthologues.sh) and step 8 (macse4vhica.sh) of this VHICA section are required for this script.

lydia-holley/Cooper-Lab
