kbvstmd/XCNV

1. Introduction

X-CNV is a tool to predict CNV pathogenicity using an XGBoost classifier.

X-CNV calculates a meta-voting prediction (MVP) score that quantifies the probability that a CNV is disease-causing; the score can be used to determine CNV pathogenicity. The tool integrates comprehensive CNV data and annotations from various publicly available genetic variant repositories. Features covering genomics, genome region, variation type, and population genetics properties are taken into account to boost prediction power. The reference genome version used by X-CNV is GRCh37/hg19.

Cite: Zhang L, Shi J, Ouyang J, Zhang R, Tao Y, Yuan D, Lv C, Wang R, Ning B, Roberts R, Tong W, Liu Z, Shi T. X-CNV: genome-wide prediction of the pathogenicity of copy number variations. Genome Med. 2021 Aug 18;13(1):132. doi: 10.1186/s13073-021-00945-4. PMID: 34407882.

2. Requirements

The local version of X-CNV requires two R packages, data.table and xgboost, and Bedtools v2.26.0. If you run into problems installing the xgboost package, see #1 for help. If the R packages and bedtools cannot be installed automatically, install them manually. The bedtools executable should be placed in ./tools/.

Memory requirement: at least 8 GB.
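
Before running the tool, it can help to confirm that the requirements listed above are in place. The following is a minimal sketch, not part of X-CNV: `check_xcnv_requirements` is a hypothetical helper, and it assumes the executable placed in ./tools/ is named `bedtools` and that `Rscript` is on the PATH. It does not check the bedtools version or available memory.

```shell
# check_xcnv_requirements [XCNV_DIR]
# Hypothetical helper (not shipped with X-CNV). Returns non-zero if
# bedtools or one of the two R packages appears to be missing.
check_xcnv_requirements() {
  root=${1:-.}
  missing=0

  # Assumption: the executable under ./tools/ is named "bedtools".
  if [ ! -x "$root/tools/bedtools" ]; then
    echo "missing: $root/tools/bedtools" >&2
    missing=1
  fi

  if ! command -v Rscript >/dev/null 2>&1; then
    echo "missing: Rscript (R does not appear to be on PATH)" >&2
    missing=1
  else
    # Ask R whether each required package can be loaded.
    for pkg in data.table xgboost; do
      if ! Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" >/dev/null 2>&1; then
        echo "missing R package: $pkg" >&2
        missing=1
      fi
    done
  fi

  return "$missing"
}
```

Calling `check_xcnv_requirements .` from the XCNV directory prints one line per missing item to standard error.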

3. Installation

git clone https://github.com/kbvstmd/XCNV.git
cd XCNV
sh Install.sh

4. Usage and example

Usage:

./bin/XCNV prefix.bed

The output filename: prefix.output.csv

Example:

./bin/XCNV ./example_data/1.bed

The results are written to 1.output.csv.
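
When several BED files need scoring, the single-file usage above can be wrapped in a loop. This is only a sketch: `run_xcnv_batch` and the `XCNV_BIN` variable are hypothetical names introduced here, and the loop assumes it is run from the XCNV directory so that `./bin/XCNV` resolves.

```shell
# run_xcnv_batch DIR
# Hypothetical helper: runs X-CNV once per *.bed file in DIR.
# XCNV_BIN is an assumed override for the executable path.
run_xcnv_batch() {
  dir=$1
  bin=${XCNV_BIN:-./bin/XCNV}
  status=0
  for bed in "$dir"/*.bed; do
    [ -e "$bed" ] || continue          # the glob matched nothing
    echo "Running X-CNV on $bed" >&2
    "$bin" "$bed" || { echo "X-CNV failed on $bed" >&2; status=1; }
  done
  return "$status"
}
```

For example, `run_xcnv_batch ./example_data` would process every BED file in the example directory and return non-zero if any run failed.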

5. Input & output

Input file format (columns are tab-separated; a header is not required):

2	2222999	3000222	gain

Column 1: Chromosome (without the “chr” prefix)
Column 2: Start
Column 3: End
Column 4: CNV type (gain or loss)
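
CNV calls from other tools often carry a “chr” prefix or use spaces instead of tabs. The sketch below shows one way to bring such a file into the four-column layout described above. `normalize_cnv_bed` is a hypothetical helper, not part of X-CNV; requiring Start to be smaller than End and accepting upper-case type labels are assumptions made here, not rules stated by the tool.

```shell
# normalize_cnv_bed < calls.txt > prefix.bed
# Hypothetical helper: reads whitespace-separated CNV calls on stdin and
# writes tab-separated lines in the layout X-CNV expects.
normalize_cnv_bed() {
  awk 'BEGIN { OFS = "\t" }
  {
    chrom = $1
    sub(/^chr/, "", chrom)             # "chr2" becomes "2"
    type = tolower($4)                 # assumption: accept GAIN/LOSS too
    if (NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
        $2 + 0 >= $3 + 0 || (type != "gain" && type != "loss")) {
      print "skipping line " NR ": " $0 > "/dev/stderr"
      next
    }
    print chrom, $2, $3, type
  }'
}
```

Lines that do not fit the layout are reported on standard error and left out of the output, so the result can be passed to ./bin/XCNV directly.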

The output file has 35 columns and is provided in Comma-Separated Values (CSV) format.

| Column | Description | Category |
| --- | --- | --- |
| Chr | Chromosome | Input |
| Start | Start position | Input |
| End | End position | Input |
| CNV type | CNV type (gain or loss) | Input |
| FATHMM score | FATHMM Dnase score for the CNV region | Coding |
| LR score | LR score for the CNV region | Coding |
| LRT score | LRT Dnase score for the CNV region | Coding |
| MutationAssessor score | MutationAssessor score for the CNV region | Coding |
| MutationTaster score | MutationTaster score for the CNV region | Coding |
| Polyphen2-HDIV score | Polyphen2_HDIV score for the CNV region | Coding |
| Polyphen2-HVAR score | Polyphen2_HVAR score for the CNV region | Coding |
| RadialSVM score | RadialSVM Dnase score for the CNV region | Coding |
| SIFT score | SIFT Dnase score for the CNV region | Coding |
| VEST3 score | VEST3 score for the CNV region | Coding |
| pLI | Probability of being loss-of-function intolerant | Coding |
| Episcore | A computational method to predict haploinsufficiency leveraging epigenomic data from a broad range of tissue and cell types by machine learning methods | Coding |
| GHIS | An integrative approach to predicting haploinsufficient genes | Coding |
| CADD score | Average CADD score for the CNV region | Genome-wide |
| GERP | GERP++_RS Dnase score for the CNV region | Genome-wide |
| phyloP100way | phyloP100way_vertebrate score for the CNV region | Genome-wide |
| phyloP46way | phyloP46way_placental score for the CNV region | Genome-wide |
| SiPhy29way | SiPhy_29way_logOdds score for the CNV region | Genome-wide |
| cdts-1st | The coverage ratio between CDTS percentile < 1% and the CNV region | Noncoding |
| cdts-5th | The coverage ratio between CDTS percentile < 5% and the CNV region | Noncoding |
| pELS | The coverage of proximal enhancer-like sequence (pELS) within the CNV region | Noncoding |
| CTCF-bound | The coverage of CTCF-bound sequence within the CNV region | Noncoding |
| PLS | The coverage of promoter-like sequence within the CNV region | Noncoding |
| dELS | The coverage of distal enhancer-like sequence within the CNV region | Noncoding |
| CTCF-only | The coverage of CTCF-only sequence within the CNV region | Noncoding |
| DNase-H3K4me3 | The coverage of DNase-H3K4me3 sequence within the CNV region | Noncoding |
| gain-PAF | Population allele frequency for duplication | Universal |
| Length | CNV length | Universal |
| loss-PAF | Population allele frequency for deletion | Universal |
| Type.1 | CNV type (gain or loss, coded as 1 or 0) | Universal |
| MVP_score | The MVP score | Output |
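
To pull out the highest-scoring CNVs from the output, the CSV can be filtered on the MVP_score column. The sketch below is one way to do that with awk. `filter_by_mvp` is a hypothetical helper, the threshold is a value the user supplies (this README does not state a recommended cutoff), and the code assumes the header field is spelled `MVP_score` and that no field contains an embedded comma.

```shell
# filter_by_mvp THRESHOLD < prefix.output.csv
# Hypothetical helper: keeps the header plus rows whose MVP_score is at
# least THRESHOLD. Assumes simple CSV with no commas inside fields.
filter_by_mvp() {
  awk -F',' -v min="$1" '
  NR == 1 {
    # Locate the MVP_score column by name, ignoring optional quotes.
    for (i = 1; i <= NF; i++) {
      name = $i
      gsub(/"/, "", name)
      if (name == "MVP_score") col = i
    }
    if (!col) { print "MVP_score column not found" > "/dev/stderr"; exit 1 }
    print
    next
  }
  {
    score = $col
    gsub(/"/, "", score)
    if (score + 0 >= min + 0) print
  }'
}
```

For example, `filter_by_mvp 0.5 < 1.output.csv > 1.filtered.csv` would keep rows scoring 0.5 or higher; the value 0.5 is purely illustrative.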
