Skip to content

This study was performed to establish a novel method named LDFF to accurately quantify cell-free fetal DNA fraction (FF) in maternal plasma by utilizing linkage disequilibrium (LD) information from maternal and fetal haplotypes.

jujia-BGI/LDFF_V1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 

Repository files navigation

LDFF_V1

This study was performed to establish a novel method named LDFF to accurately quantify cell-free fetal DNA fraction (FF) in maternal plasma by utilizing linkage disequilibrium (LD) information from maternal and fetal haplotypes. The workflow consists of four processes, the first of regional LD-ratios were calculated on the 521 genomic regions, as well as the genome coverage, coverage of the reads with a MQ score>0 and PCR duplication rate. Then, several multivariate regression models were generated with different MAF filtering cutoff using all training samples. Thrid, the outliers in the training samples in different MAF filtering models were identified and removed from the corresponding model to avoid over-fitting. Multivariate regression models were rebuilt using the remaining samples. Finally, MAF filtering cutoff was selected as the model has best accuracy.

Dependence:

The following steps are to work in the Perl 5 or R environment. And samtools and STITCH (version v1.5.3.0008) should be installed.

In addition, you need to prepare the genome reference file "hg19.fasta" and the reference panels. The reference panels contain haplotype files and legend files for each chromosome from 1000GP Phase3. The reference panel format used by STITCH is the same as used by QUILT. Please see the QUILT web page here for specific details of the format.

Step1:Regional LD-ratios and confounders calculation

The "bamlist" file records the path of your input bam files with only 1 column as "example/bamlist". The "bamstats" file records the path of your input bam stats files which were generated by the command "samtools stats input.bam > input.bam.stats" with only 1 column as "example/bamstats". The "samplelist" file contains the sample names which record in the "SM" tag in the bam header of your input bam.

samtools mpileup -f database_hg19/hg19.fasta -b bamlist -r chr1:1-5000000 -v -o result/chr1.1.5000000/mpileup.chr1.1.5000000.vcf.gz -l database_hg19/1kg.easaf0.2/chr1.pos.txt

Rscript bin/STITCH.full.step2.R result/chr1.1.5000000/ bamlist database_hg19/1kg.easaf0.01/chr1.pos.txt chr1 1 5000000 database_hg19/1000GP_Phase3/1000GP_Phase3_chr1.legend.gz database_hg19/1000GP_Phase3/1000GP_Phase3_chr1.hap.gz database_hg19/1000GP_Phase3/1000GP_Phase3.sample (For more information about how STITCH works, please refer to the article DOI: 10.1038/ng.3594)

perl bin/regional_LDratio.pl result/chr1.1.5000000/mpileup.chr1.1.5000000.vcf.gz result/chr1.1.5000000/stitch.chr1.1.5000000.vcf.gz samplelist result/chr1.1.5000000/regional_LDratios.txt database_hg19/1000GP_Phase3/1000GP_Phase3_chr1.legend.gz

perl bin/bamstat.combine.pl bamstats > confounders

Step2:Build multivariate regression models when MAF filtering cutoff=0.2

Pastes the regional LD-ratios of maf 0.02 in regional_LDratios.txt and the genome coverage, coverage of the reads with a MQ score>0 and PCR duplication rate in confounders into train.input.maf0.2

Rscript bin/Linear_regression_chrYbasedff.R train.input.maf0.2 test.input.maf0.2

Step3:Remove outliers in the training set

Rscript bin/run_Reg_Diag.R Linear-model.RData |grep '*' > outliers_line.log

perl bin/remove_outliers.pl train.input.maf0.2 outliers_line.log > train.input.maf0.2.removeoutliers

Step4:multivariate regression models retraining

Rscript bin/Linear_regression_chrYbasedff.R train.input.maf0.2.removeoutliers test.input.maf0.2> test.input.log

About

This study was performed to establish a novel method named LDFF to accurately quantify cell-free fetal DNA fraction (FF) in maternal plasma by utilizing linkage disequilibrium (LD) information from maternal and fetal haplotypes.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages