jferna10/EnvPaper

Code Related to Functional and Structural Segregation of Overlapping Helices in HIV-1

This GitHub repo contains code related to the submitted paper "Functional and Structural Segregation of Overlapping Helices in HIV-1". The files deposited here are intended to make the analyses, as they were done at the time of writing the paper, transparent. However, because of things like renamed files (e.g. the GEO fastq names from GSE179046 differ slightly from the original names) and differences in the compute environment, you probably can't just run this code and get the figures. It should be pretty close, though, and you shouldn't hesitate to contact us if you notice any issues.

These scripts use:

bowtie2
RStudio
java
standard bash commands

Here is the overview of what you will find in this repository:

Stats

Basic QC metrics for the MiSeq run for the Env deep mutational scanning (DMS) data. These are run-wide stats such as demultiplexing stats.

Reports

Basic QC metrics for each of the fastqs from the Env deep mutational scanning data.

seq

Reference genome sequence and associated bowtie2 index for mapping. Note this virus is the HIV-1 NL4-3 sequence with rev-in-nef.

process_fastqs

Code used to generate codon and amino acid counts.

First, the fastqs are aligned to the reference with bowtie2 using the following additional flags: --fast-local --rdg 100,3 --rfg 100,3. These flags raise the gap penalties so that reads carrying the randomized codon align to the reference sequence as mismatches instead of with indels.
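As an illustration of the alignment step above, here is a minimal sketch that assembles the bowtie2 command line with those flags. The index prefix, fastq name, and SAM name are hypothetical placeholders, and the use of `-U` assumes single-end reads; the scripts in process_fastqs are the authoritative version.

```python
import shlex


def bowtie2_command(index_prefix, fastq_path, sam_path):
    """Build a bowtie2 argument list using the flags described in this README.

    The paths are placeholders, and -U (unpaired input) is an assumption;
    check the scripts in process_fastqs for the invocation actually used.
    """
    return [
        "bowtie2",
        "--fast-local",       # local-alignment preset named in the README
        "--rdg", "100,3",     # read gap open,extend penalties
        "--rfg", "100,3",     # reference gap open,extend penalties
        "-x", index_prefix,   # bowtie2 index prefix (hypothetical path)
        "-U", fastq_path,     # unpaired reads (assumption)
        "-S", sam_path,       # SAM output file
    ]


cmd = bowtie2_command("seq/reference", "sample.fastq.gz", "sample.sam")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the alignment on a machine where bowtie2 is installed.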

The bulk of the work is done by countDMS, a simple Java program that attempts to count codons from each SAM file generated by bowtie2. If there is an indel in the alignment, the read is not counted.
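To make the counting idea concrete, the following is a rough Python sketch of codon counting from SAM records that skips any read whose CIGAR contains an insertion or deletion. It is not a port of countDMS: the function names, the `orf_start` coordinate, and the codon numbering are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(pos, cigar, seq):
    """Map 1-based reference positions to read bases.

    Returns None when the read is unaligned or its CIGAR contains an indel.
    """
    if cigar == "*" or "I" in cigar or "D" in cigar:
        return None
    ref_pos, read_pos, bases = pos, 0, {}
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes both read and reference
            for i in range(length):
                bases[ref_pos + i] = seq[read_pos + i]
            ref_pos += length
            read_pos += length
        elif op == "S":      # soft clip (from local alignment): read only
            read_pos += length
        elif op == "N":      # skipped reference region: reference only
            ref_pos += length
        # H and P consume neither read nor reference
    return bases


def count_codons(sam_lines, orf_start, first_codon, last_codon):
    """Count codons per codon index for reads without indels.

    orf_start is the 1-based reference coordinate of codon 1 (hypothetical).
    Only codons fully covered by a read are counted.
    """
    counts = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & 4:
            continue                      # malformed or unmapped record
        bases = aligned_bases(int(fields[3]), fields[5], fields[9])
        if bases is None:
            continue                      # read with an indel is not counted
        for idx in range(first_codon, last_codon + 1):
            start = orf_start + 3 * (idx - 1)
            codon = "".join(bases.get(start + k, "") for k in range(3))
            if len(codon) == 3:
                counts[idx][codon] += 1
    return counts
```

A call such as `count_codons(open("sample.sam"), orf_start=1, first_codon=1, last_codon=10)` would return a mapping from codon index to a `Counter` of observed codons; the coordinates shown are placeholders, not the values used in the paper.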

countDMS outputs codon and amino acid count files in tab-delimited format.
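For readers who want to go from codon counts to amino acid counts themselves, here is a hedged sketch using the standard genetic code. The three-column layout written by `write_tab` (position, key, count) is a made-up example and is not the column layout of the files in aa_tab or codon_tab.

```python
import csv
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_counts(codon_counts):
    """Collapse {position: {codon: count}} into {position: Counter of amino acids}."""
    aa_counts = {}
    for position, codons in codon_counts.items():
        totals = Counter()
        for codon, n in codons.items():
            aa = CODON_TABLE.get(codon.upper())
            if aa is not None:            # skip codons with ambiguous bases
                totals[aa] += n
        aa_counts[position] = totals
    return aa_counts


def write_tab(counts, handle):
    """Write counts as tab-delimited rows (hypothetical column layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["position", "key", "count"])
    for position in sorted(counts):
        for key in sorted(counts[position]):
            writer.writerow([position, key, counts[position][key]])
```

The same `write_tab` helper works for either codon or amino acid counts, since both are keyed by position and then by a string.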

Note that the BAMs provided differ slightly from the tab files, as the BAMs are the result of a more recent remapping than the one used for the figure in the paper. However, the differences are slight (slightly better mapping with the more recent remapping, maybe due to an upgraded version of bowtie2).

If you wish to perform a similar analysis and are worried about alignment artifacts, or wish to avoid using the custom countDMS program, I suggest using seqkit and its amplicon feature to extract the DMS region and then parsing the resulting sequence.
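As one possible way to parse an extracted DMS region, the sketch below compares an extracted sequence with the reference region codon by codon. It assumes the region has already been extracted (for example with seqkit), that the reference region starts in frame, and that any length difference signals an indel; the function name and return format are illustrative choices, not part of this repository.

```python
def variant_codons(region_seq, reference_region):
    """List codons that differ between an extracted region and the reference.

    Returns a list of (codon_number, reference_codon, observed_codon) tuples,
    numbered from 1 within the region, or None when the lengths differ
    (treated here as an indel, so the read would not be counted).
    """
    region_seq = region_seq.upper()
    reference_region = reference_region.upper()
    if len(reference_region) % 3 != 0:
        raise ValueError("reference region must be a whole number of codons")
    if len(region_seq) != len(reference_region):
        return None
    diffs = []
    for i in range(0, len(reference_region), 3):
        ref_codon = reference_region[i:i + 3]
        obs_codon = region_seq[i:i + 3]
        if obs_codon != ref_codon:
            diffs.append((i // 3 + 1, ref_codon, obs_codon))
    return diffs
```

Because no aligner is involved, this approach sidesteps alignment artifacts, at the cost of discarding any read whose extracted region is not exactly the expected length.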

aa_tab

Amino acid counts generated from countDMS with simple number relabeling to make the coordinates readable.

codon_tab

Codon counts generated from countDMS. These may be useful if you care about specific codons.

Fernandes_GEO_seq_template.xlsx

This file should provide the metadata that maps the naming changes between GEO and the filenames in this repo.
