Allele Specific Expression
Allele Specific Expression (ASE) mapping using the read count field in a VCF file. It can either use the genotypes in the same VCF file to select the heterozygous samples for a variant, or use a secondary genotype file with high-quality genotype calls.
Getting started
Downloading the software
You can download the latest version of the software here: Latest version.
Make sure to download eqtl-mapping-pipeline-*.*.*-SNAPSHOT-dist.zip
or eqtl-mapping-pipeline-*.*.*-SNAPSHOT-dist.tar.gz
Running an ASE mapping
You can run an ASE analysis using this command:
java -jar eqtl-mapping-pipeline.jar --mode ase
and the options described below.
In the case of an out-of-memory exception you need to allocate more memory to Java, like this:
java -Xmx10g -Xms10g -jar eqtl-mapping-pipeline.jar --mode ase
If you run into problems please have a look at this page: Basic introduction to commandline
Using VCF files
Before VCF files can be used they need to be compressed using bgzip and indexed with tabix. This prevents having to read all data into memory yet still allows quick access.
Installing tabix and bgzip
wget http://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
tar -jxvf tabix-0.2.6.tar.bz2
cd tabix-0.2.6/
make
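Building produces both the tabix and bgzip binaries in the build directory. One way to make them available for the commands below, assuming you are still in the tabix-0.2.6 directory, is to add it to your PATH:
export PATH=$PWD:$PATH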
Preparing a VCF file
bgzip -c example.vcf > example.vcf.gz
tabix -p vcf example.vcf.gz
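To check that the compressed file and index work, you can query a small region with tabix (the chromosome name and coordinates here are hypothetical; use ones that occur in your file):
tabix example.vcf.gz 1:1-100000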
Creating VCF files from pileup files
Pileup files can be created using samtools mpileup (http://samtools.sourceforge.net/mpileup.shtml). Take note of the recommendation in the mpileup manual for getting the precise read depth:
Use -BQ0 -d10000000 -f ref.fa
if the purpose is to get the precise depth of coverage rather than call SNPs. Under this setting, mpileup will count low-quality bases, process all reads (by default the depth is capped at 8000), and skip the time-demanding BAQ calculation.
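For example, a pileup with precise read depth could be generated like this (the reference and BAM file names are placeholders; -B disables the BAQ calculation, -Q0 includes low-quality bases, and -d10000000 effectively removes the depth cap):
samtools mpileup -B -Q0 -d10000000 -f ref.fa sample.bam > sample.pileup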
To use these pileup files they need to be converted to VCF. This can be done using the command below; running it will print its help.
java -jar eqtl-mapping-pipeline.jar --mode pileupToVcf
ASE mapping options
### Basic commands
Short | Long | Description |
---|---|---|
-i | --input | Paths to one or more vcf.gz files or folders containing one vcf.gz file per chromosome. A tabix index file must be present |
-l | --inputList | One or more text files with, on each line, a path to a vcf.gz file or a folder containing one vcf.gz file per chromosome. Tabix index files must be present. Can be combined with --input |
-g | --genotypes | The path to the reference genotypes. These genotypes will be used to determine if a sample is heterozygous |
-G | --genotypesType | The input data type. If not defined, the first matching dataset on the specified path is selected automatically (see the --genotypesType options below) |
-sc | --sampleCoupling | Tab separated file, no header, with in column 1 the sample ID and in column 2 the corresponding sample ID in the reference genotype data (see the example below this table). If it only contains a mapping for a subset of samples, the original identifiers in the reference are used for the remaining samples |
-o | --output | Path to output folder |
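For example, a --sampleCoupling file could look like this (hypothetical sample IDs; the two columns are separated by a tab, with the RNA-seq sample ID first and the reference genotype sample ID second):
RNA_sample1	HG00096
RNA_sample2	HG00097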
### Fine tuning ASE
Short | Long | Description |
---|---|---|
-m | --maxReads | Maximum number of total reads |
-p | --minAllelePercentage | Min percentage of reads per allele. Default 0. Can be used in combination with --minAlleleReads |
-r | --minReads | Min number of total reads for a genotype |
-s | --minNumSamples | Min number of samples per ASE effect |
### Other options
Short | Long | Description |
---|---|---|
-f | --gtf | Optional .gtf file for annotations of ASE effects. Must be grouped by chromosome |
-t | --threads | Maximum number of threads to start for parallel reading of multiple VCF files. Defaults to number of cores. |
-d | --debug | Activate debug mode. This will result in a more verbose log file |
#### --genotypesType options
Base path refers to the path specified with --genotypes
- PED_MAP
- Expects Plink PED file at:
${base path}.ped
- Expects Plink MAP file at:
${base path}.map
- Note: it is strongly recommended to use PLINK_BED due to the large memory usage of the current PED_MAP implementation. When dealing with large datasets it is recommended to first use Plink to convert the data to binary Plink format using this command
plink --file pedmapData --make-bed --out binaryData
- VCF
- Expects VCF file at:
${base path}.vcf.gz
- Must be compressed using bgzip. (see chapter: Preparing a VCF file)
- Expects tabix file at:
${base path}.vcf.gz.tbi
(see chapter: Preparing a VCF file)
- PLINK_BED
- Expects Plink BED file at:
${base path}.bed
- Expects Plink BIM file at:
${base path}.bim
- Expects Plink FAM file at:
${base path}.fam
- Must be in SNP major mode. This is the default of Plink 1.07.
- VCFFOLDER
- Matches all vcf.gz files in the folder specified with the base path.
- SHAPEIT2
- Expects haps file at:
${base path}.haps
- Expects sample file at:
${base path}.sample
- See also chapter on 'Using SHAPEIT2 Output and Oxford Gen format'
- GEN
- Expects gen file at:
${base path}.gen
or without the extension: ${base path}
- Expects sample file at:
${base path}.sample
- See also chapter on 'Using SHAPEIT2 Output and Oxford Gen format'
- TRITYPER
- Expects folder with TriTyper data. Can handle dosage matrix
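Putting this together, a typical invocation could look like the sketch below (all paths, the memory setting, and the PLINK_BED choice are examples; adjust them to your own data):
java -Xmx10g -jar eqtl-mapping-pipeline.jar --mode ase -i rnaSeqVcf/ -g referenceGenotypes -G PLINK_BED -sc sampleCoupling.txt -o aseResults/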