Code Related to Functional and Structural Segregation of Overlapping Helices in HIV-1

This GitHub repo contains code related to the submitted paper "Functional and Structural Segregation of Overlapping Helices in HIV-1". The files deposited here are intended to make the analyses, as they were done at the time of writing the paper, transparent. However, because of file renaming (e.g. the GEO fastq names from GSE179046 differ slightly from the original names), differences in compute environment, and similar factors, you probably can't run this code as-is and reproduce the figures exactly. It should be pretty close, though, and please don't hesitate to contact us if you notice any issues.

These scripts use:

bowtie2
RStudio
Java
standard bash commands

Here is the overview of what you will find in this repository:

Stats

Basic QC metrics for the MiSeq run of the Env Deep Mutational Scanning Data. These are run-wide stats, such as demultiplexing stats.

Reports

Basic QC metrics for each of the fastqs for the Env Deep Mutational Scanning Data.

seq

Reference genome sequence and associated bowtie2 index for mapping. Note this virus is the HIV-1 NL4-3 sequence with rev-in-nef.

process_fastqs

Code used to generate codon and amino acid counts.

First, the fastqs are aligned to the reference with bowtie2 using the following additional flags: --fast-local --rdg 100,3 --rfg 100,3. The high gap-open penalties for the read (--rdg) and the reference (--rfg) let the randomized codon align to the reference sequence as mismatches rather than as indels.
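As a rough illustration of the alignment step, the sketch below assembles a bowtie2 command line with the flags quoted above. The index prefix, file names, and the use of `-U` (single-end input) are assumptions made for the example, not details taken from this repo; the command is only built and printed, not executed.

```python
import shlex

def bowtie2_command(index_prefix, fastq_path, sam_path):
    """Build a bowtie2 argument list using the flags described in this README."""
    return [
        "bowtie2",
        "--fast-local",       # local-alignment preset
        "--rdg", "100,3",     # read gap open,extend penalties (open set very high)
        "--rfg", "100,3",     # reference gap open,extend penalties (open set very high)
        "-x", index_prefix,   # bowtie2 index prefix (hypothetical path)
        "-U", fastq_path,     # unpaired reads (assumption: single-end input)
        "-S", sam_path,       # SAM output file
    ]

# Hypothetical file names, for illustration only.
cmd = bowtie2_command("seq/reference", "sample.fastq.gz", "sample.sam")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` if bowtie2 is installed; paired-end data would need `-1`/`-2` in place of `-U`.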

The bulk of the work is done by countDMS, a simple Java program that attempts to count codons from each SAM file generated by bowtie2. If there is an indel in the alignment, the read is not counted.
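The counting idea can be sketched in a few lines of Python. This is not the countDMS source; it is a minimal illustration that assumes the region of interest starts at a known 1-based reference coordinate (`region_start`, hypothetical) and is in frame, and it skips any read whose CIGAR string contains an insertion, deletion, or skipped region.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def count_codons(sam_lines, region_start, n_codons):
    """Count codons at each position of a region from SAM records.

    region_start is the 1-based reference coordinate of the first base of
    the first codon (a hypothetical parameter for this sketch).
    """
    counts = [Counter() for _ in range(n_codons)]
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        ops = CIGAR_OP.findall(cigar)
        if any(op in "IDN" for _, op in ops):
            continue  # read contains an indel: do not count it
        # Drop leading soft-clipped bases so seq[0] sits at reference position pos.
        lead_clip = 0
        for n, op in ops:
            if op == "S":
                lead_clip += int(n)
            elif op != "H":
                break
        aligned_len = sum(int(n) for n, op in ops if op in "M=X")
        aligned = seq[lead_clip:lead_clip + aligned_len]
        for i in range(n_codons):
            offset = region_start + 3 * i - pos
            if offset < 0 or offset + 3 > len(aligned):
                continue  # read does not fully cover this codon
            counts[i][aligned[offset:offset + 3]] += 1
    return counts

# Three toy SAM records: a full-length read, a read with an insertion,
# and a soft-clipped read starting at reference position 4.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tref\t1\t42\t9M\t*\t0\t0\tATGGCTAAA\tIIIIIIIII",
    "r2\t0\tref\t1\t42\t4M1I4M\t*\t0\t0\tATGGACTAA\tIIIIIIIII",
    "r3\t0\tref\t4\t42\t2S6M\t*\t0\t0\tTTGCTAAA\tIIIIIIII",
]
codon_counts = count_codons(sam, region_start=1, n_codons=3)
```

Because SAM stores reverse-strand reads already reverse-complemented onto the reference strand, no strand handling is needed for this kind of per-position counting.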

The outputs of countDMS are codon and amino acid count files in tab-delimited format.
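To show how codon counts relate to amino acid counts, here is a small sketch that collapses a codon tally with the standard genetic code and formats one tab-delimited row. The column layout is an assumption for illustration; the actual files in aa_tab and codon_tab may be arranged differently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def aa_counts(codon_counts):
    """Collapse codon counts into amino acid counts; unknown codons (e.g. with N) are dropped."""
    totals = Counter()
    for codon, n in codon_counts.items():
        aa = CODON_TABLE.get(codon)
        if aa is not None:
            totals[aa] += n
    return totals

def tab_row(position, counts, symbols):
    """Format one position as a tab-delimited row (hypothetical layout)."""
    return "\t".join([str(position)] + [str(counts.get(s, 0)) for s in symbols])

example = Counter({"GCT": 5, "GCC": 2, "TAA": 1, "NNN": 3})
aa = aa_counts(example)
row = tab_row(1, aa, ["A", "*"])
print(row)
```

Here the two alanine codons are merged into a single count, the stop codon is reported as `*`, and the ambiguous `NNN` codon is ignored.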

Note that the BAMs provided differ slightly from the tab files, because the BAMs come from a more recent remapping than the one used for the figure in the paper. The differences are small (slightly better mapping with the more recent run, possibly due to an upgraded version of bowtie2).

If you wish to perform a similar analysis and are worried about alignment artifacts, or wish to avoid the custom countDMS program, I suggest using seqkit and its amplicon feature to extract the DMS region, then parsing the resulting sequence.
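For the alignment-free route, the parsing step might look something like the sketch below. It assumes the DMS region has already been extracted from each read (for example with seqkit's amplicon feature) and that the extracted region is in frame with the reference region; the function name and inputs are hypothetical.

```python
def mutated_codons(ref_region, read_region):
    """Compare an extracted read region to the reference region codon by codon.

    Returns a list of (codon_number, ref_codon, read_codon) for codons that
    differ, or None when the lengths disagree (which suggests an indel) or
    the region is not a whole number of codons.
    """
    if len(ref_region) != len(read_region) or len(ref_region) % 3 != 0:
        return None
    diffs = []
    for i in range(0, len(ref_region), 3):
        ref_codon = ref_region[i:i + 3]
        read_codon = read_region[i:i + 3]
        if ref_codon != read_codon:
            diffs.append((i // 3 + 1, ref_codon, read_codon))
    return diffs

# Toy sequences: codon 2 changes from GCT to TGG.
single = mutated_codons("ATGGCTAAA", "ATGTGGAAA")
# A read one base short is rejected rather than counted.
short = mutated_codons("ATGGCTAAA", "ATGGCTAA")
```

Rejecting length mismatches plays the same role as skipping indel-containing alignments in the SAM-based approach.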

aa_tab

Amino acid counts generated by countDMS, with positions renumbered to make the coordinates readable.

codon_tab

Codon counts generated from countDMS. These may be useful if you care about specific codons.

Fernandes_GEO_seq_template.xlsx

This file should provide the metadata that maps the naming changes between GEO and the filenames in this repo.

jferna10/EnvPaper

Folders and files

Latest commit

History

Repository files navigation

Code Related to Functional and Structural Segregation of Overlapping Helices in HIV-1

Stats

Reports

seq

process_fastqs

aa_tab

codon_tab

Fernandes_GEO_seq_template.xlsx

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages