CamCNV

CamCNV is a pipeline for calling rare copy number variants (CNVs) from Illumina genotyping array data. For further details, see the paper in Genetic Epidemiology, "Detecting rare copy number variants from Illumina genotyping arrays with the CamCNV pipeline: Segmentation of z-scores improves detection and reliability" (https://pubmed.ncbi.nlm.nih.gov/33020983/), and the analysis of a large breast cancer dataset, "Rare germline copy number variants (CNVs) and breast cancer risk" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8766486/).

The basic steps of the pipeline as outlined in the example code are:

1. Load Log R Ratio intensities (LRR) and B Allele Frequencies (BAF), sorted by chromosome and position, from Illumina genotyping into a NetCDF data store.
2. Run a principal component adjustment on the LRR.
3. Calculate the mean and standard deviation of the LRR for each probe across all samples and convert the LRR into z-scores.
4. For each sample, segment the z-scores using DNAcopy (https://bioconductor.org/packages/release/bioc/html/DNAcopy.html) and identify potential CNVs from the mean z-scores of the segments.
5. Generate additional QC scores for each CNV.
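
As a rough illustration of the z-score conversion and of calling a segment from its mean z-score, here is a minimal sketch in Python. It is not the pipeline's R code: the function names and the cutoff values (`del_cutoff`, `dup_cutoff`) are hypothetical and chosen only for the example, and the segment boundaries are assumed to have already been produced by a segmentation step such as DNAcopy.

```python
from statistics import mean, stdev


def probe_z_scores(lrr):
    """Convert LRR values to z-scores, standardising each probe across samples.

    lrr is a list of samples, each a list of LRR values in the same probe order.
    """
    n_probes = len(lrr[0])
    z = [[0.0] * n_probes for _ in lrr]
    for p in range(n_probes):
        column = [row[p] for row in lrr]
        mu = mean(column)
        sd = stdev(column)  # sample standard deviation across samples
        if sd == 0:
            raise ValueError(f"probe {p} has zero variance across samples")
        for s, value in enumerate(column):
            z[s][p] = (value - mu) / sd
    return z


def classify_segment(z_row, start, end, del_cutoff=-3.0, dup_cutoff=2.0):
    """Label one segment [start, end) of a sample's z-scores by its mean.

    The cutoffs are illustrative assumptions, not values taken from CamCNV.
    """
    seg_mean = mean(z_row[start:end])
    if seg_mean <= del_cutoff:
        return "deletion", seg_mean
    if seg_mean >= dup_cutoff:
        return "duplication", seg_mean
    return "none", seg_mean
```

Standardising each probe against its own mean and standard deviation puts probes with different noise levels on a common scale before segmentation, so a single cutoff on the segment mean can be applied across the array.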

The pipeline uses a NetCDF data store and the R ncdf4 library (https://cran.r-project.org/web/packages/ncdf4/index.html), but it could easily be adapted to use simple R data objects. For loading large datasets into NetCDF files, I have found the Perl NetCDF libraries (https://metacpan.org/source/DHUNT/PDL-NetCDF-4.20/netcdf.pd) to be faster. Note that the bigpca package (http://cran.nexr.com/web/packages/bigpca/index.html) has not been updated and requires an older version of R, e.g. R-3.4.2; other R PCA packages could be used instead.

