Binary Classification project carried out in Applied Data Science Bootcamp at Kodluyoruz.org

mertyldrr/GeneticVariantClassification

Genetic Variant Classification

This is a binary classification project carried out during the Applied Data Science Bootcamp at Kodluyoruz, under the instruction of Çağlar Subaşı, using the Genetic Variant Classification dataset on Kaggle.

Dataset: https://www.kaggle.com/kevinarvai/clinvar-conflicting

-- Project Status: [Completed]

Project Intro/Objective

The objective of this project is to predict whether a ClinVar variant will have conflicting classifications. This is presented here as a binary classification problem, where each record in the dataset is a genetic variant.

Methods Used

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Predictive Modeling

Technologies

  • Python
  • Pandas, NumPy
  • Matplotlib, Seaborn
  • Missingno
  • Sklearn

Modelling Algorithms

  • Logistic Regression
  • XGBoost Classifier
  • KNeighbors Classifier
  • Decision Tree Classifier
  • LightGBM Classifier
  • Gradient Boosting Classifier
  • Hist Gradient Boosting Classifier

Project Description

ClinVar is a public resource containing annotations about human genetic variants. These variants are classified (usually manually) by clinical laboratories on a categorical spectrum: benign, likely benign, uncertain significance, likely pathogenic, and pathogenic. Variants with conflicting classifications (from laboratory to laboratory) can cause confusion when clinicians or researchers try to interpret whether the variant has an impact on a given patient's disease.

A variant has conflicting classifications when its submissions fall into at least two of the following three categories. Two submissions within the same category are not considered conflicting.

  1. Likely Benign or Benign
  2. VUS
  3. Likely Pathogenic or Pathogenic
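The rule above can be illustrated with a minimal Python sketch. The label strings, the `CATEGORY_OF` mapping, and the `has_conflict` helper are assumptions made for illustration; they are not taken from the project's code or from the dataset's own labeling pipeline.

```python
# Hypothetical mapping from classification labels to the three categories
# listed above. The exact label strings are an assumption.
CATEGORY_OF = {
    "benign": "benign",
    "likely benign": "benign",
    "uncertain significance": "vus",
    "likely pathogenic": "pathogenic",
    "pathogenic": "pathogenic",
}


def has_conflict(submissions):
    """Return 1 if the submissions span at least two categories, else 0."""
    categories = {CATEGORY_OF[label.strip().lower()] for label in submissions}
    return int(len(categories) >= 2)


# Two submissions in the same category are not conflicting.
print(has_conflict(["Benign", "Likely benign"]))
# Submissions in two different categories are conflicting.
print(has_conflict(["Likely benign", "Uncertain significance"]))
```

The returned value follows the same 0/1 convention as the label: 0 for consistent, 1 for conflicting.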

The conflicting-classification label is stored in the CLASS column. It is a binary indicator of whether a variant has conflicting classifications, where 0 represents consistent classifications and 1 represents conflicting classifications.

Needs of this project

  • data exploration/descriptive statistics
  • data processing/cleaning
  • statistical modeling
  • writeup/reporting

Getting Started

  1. Clone this repo (for help see this tutorial).
  2. The raw data is kept here within this repo.
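Once the repo is cloned, a quick first check is the balance of the CLASS column. The sketch below uses only the standard library; the file name `clinvar_conflicting.csv` and the `class_balance` helper are assumptions, so adjust the path to wherever the raw data sits in your clone.

```python
import csv
import os
from collections import Counter


def class_balance(csv_path):
    """Count how many rows carry each CLASS value in the given CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row["CLASS"] for row in reader)


DATA_PATH = "clinvar_conflicting.csv"  # assumed file name, not confirmed by the repo

if os.path.exists(DATA_PATH):
    counts = class_balance(DATA_PATH)
    total = sum(counts.values())
    for label, n in sorted(counts.items()):
        print(f"CLASS={label}: {n} rows ({n / total:.1%})")
```

The values are read as strings ("0" and "1"), so convert them to integers before passing them to a model.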

Members:

  • Sümeyye ÖZTÜRK
  • Mehmet Haliloğlu
  • Mert Yıldırır
