Supporting material for book chapter 'Identification of parent-of-origin-dependent QTLs using bulk-segregant sequencing (Bulk-Seq)'

BulkSeq

This set of scripts accompanies the chapter 'Identification of parent-of-origin-dependent QTLs using bulk-segregant sequencing (Bulk-Seq)' in the book Plant Chromatin Dynamics (Springer, 2018).

GenomeSNPmask.py: remove or replace known SNP positions in a genome sequence file (FASTA)
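
As an illustration of what masking known SNP positions means, here is a minimal sketch in plain Python. It is not the code in GenomeSNPmask.py: the function name `mask_snps`, the 1-based position convention and the default replacement character `N` are all assumptions made for this example.

```python
def mask_snps(sequence, positions, replacement="N"):
    """Return `sequence` with every listed 1-based position replaced.

    Assumption: positions are 1-based, as in VCF files.
    Passing replacement="" removes the base instead of masking it.
    Positions that fall outside the sequence are ignored.
    """
    bases = list(sequence)
    for pos in positions:
        if 1 <= pos <= len(bases):
            bases[pos - 1] = replacement
    return "".join(bases)


# Toy sequence with hypothetical SNPs at positions 2 and 5.
masked = mask_snps("ACGTACGT", [2, 5])
print(masked)  # ANGTNCGT
```

Masking with `N` keeps the coordinates of the genome unchanged, whereas removing bases shifts every downstream position, so the choice matters for later steps that look up SNPs by coordinate.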

mapping.sh: minimal set of commands to filter and map reads from a fastq file, call SNPs and output an out.vcf file with allele frequencies

snpFile.R: retrieve publicly available SNP data for the Cvi-0 and Ler-1 accessions of Arabidopsis thaliana, merge them and output a reformatted SNP matrix (snpm.txt)

cleanCounts.R: merge information from the SNP matrix (snpm.txt) with the measured allele frequencies (out.vcf), filter the result and output a counts.csv file with allele counts
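
The merge-and-filter idea can be sketched as follows. This is only an illustration under stated assumptions, not a port of cleanCounts.R: the in-memory data layout, the function name `clean_counts` and the `min_depth` threshold are hypothetical, and the real column layouts of snpm.txt, out.vcf and counts.csv are not reproduced here.

```python
def clean_counts(known_snps, observed, min_depth=10):
    """Keep observed positions that are known SNPs with enough coverage.

    known_snps: {(chrom, pos): (allele_a, allele_b)}  (hypothetical layout)
    observed:   {(chrom, pos): {base: read_count}}    (hypothetical layout)
    min_depth:  assumed minimum number of reads carrying either allele
    Returns a sorted list of (chrom, pos, count_a, count_b) tuples.
    """
    rows = []
    for key in sorted(known_snps):
        if key not in observed:
            continue  # SNP not covered in this sample
        allele_a, allele_b = known_snps[key]
        count_a = observed[key].get(allele_a, 0)
        count_b = observed[key].get(allele_b, 0)
        if count_a + count_b < min_depth:
            continue  # too few informative reads
        rows.append((key[0], key[1], count_a, count_b))
    return rows


# Toy data: two known SNPs, one of them with low coverage.
known = {("Chr1", 100): ("A", "G"), ("Chr1", 250): ("C", "T")}
seen = {("Chr1", 100): {"A": 12, "G": 9}, ("Chr1", 250): {"C": 2, "T": 1}}
print(clean_counts(known, seen))  # [('Chr1', 100, 12, 9)]
```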

pool.R: combine allele frequencies from two samples (obtained with cleanCounts.R) and calculate relative frequencies along chromosomes
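
One way to picture the comparison of two pools is sketched below. It is an assumption-laden illustration rather than the method in pool.R: here "relative frequency" is taken to mean the difference in allele frequency between the two samples, smoothed with a sliding-window mean, and the function names and window size are hypothetical.

```python
def allele_frequency(count_a, count_b):
    """Fraction of reads carrying allele A, or None if there are no reads."""
    total = count_a + count_b
    return count_a / total if total else None


def frequency_difference(sample_1, sample_2):
    """Per-position difference (sample_2 minus sample_1) in allele-A frequency.

    Each sample maps a position to (count_a, count_b). Only positions
    present in both samples with at least one read each are compared.
    """
    result = []
    for pos in sorted(set(sample_1) & set(sample_2)):
        f1 = allele_frequency(*sample_1[pos])
        f2 = allele_frequency(*sample_2[pos])
        if f1 is not None and f2 is not None:
            result.append((pos, f2 - f1))
    return result


def rolling_mean(values, window):
    """Mean of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy counts for three positions on one chromosome.
wt_pool = {100: (8, 12), 200: (10, 10), 300: (12, 8)}
mutant_pool = {100: (16, 4), 200: (15, 5), 300: (14, 6)}
diffs = frequency_difference(wt_pool, mutant_pool)
smoothed = rolling_mean([d for _, d in diffs], window=2)
print(diffs)
print(smoothed)
```

Smoothing over neighbouring SNPs reduces the sampling noise of individual positions, at the cost of blurring the boundaries of any region where the two pools differ.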

Requires:

FastQC 0.11.3; cutadapt 1.8.3; Samtools 1.2 (using htslib 1.2.1); Bowtie 2 2.2.9; R 3.3.1 with the packages scales 0.4.0, ggplot2 2.1.0 and zoo 1.7-13; Python 3.4.0 with Biopython (Bio) 1.65


Example fastq datasets that can be used in this analysis are available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-5196:

WT_pool_1 (1.56GB)

mea_pool_1 (2GB)