
Exploration of the Risk Factors of Cervical Cancer dataset, using supervised and unsupervised machine learning techniques to predict cervical cancer cases that result in biopsies.


lauramann/cervicalCancerAnalysis


Predicting Cervical Cancer Cases Resulting in Biopsies Using Ensemble Methods

This is an exploration of the Kaggle dataset at https://www.kaggle.com/loveall/cervical-cancer-risk-classification using supervised and unsupervised machine learning techniques: random forest and gradient boosting classifiers, a support vector machine (SVM), and K-means clustering.
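The notebook's code is not reproduced in this README. As a rough sketch of how these four techniques could be compared under one evaluation, the example below uses scikit-learn on synthetic, imbalanced data from `make_classification` as a stand-in for the real features. The function names `compare_models` and `score`, the data, and every hyperparameter are illustrative assumptions, not the notebook's settings. Because K-means is unsupervised, each cluster is mapped to the majority training label so it can be scored alongside the classifiers.

```python
# Illustrative sketch only: synthetic data and hyperparameters are assumptions,
# not the settings used in the project's notebook.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def score(y_true, y_pred):
    """Accuracy, precision and F1 for the positive class (label 1)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


def compare_models(random_state=0):
    # Hypothetical stand-in data: roughly 10% positives, like a rare "Biopsy" label.
    X, y = make_classification(
        n_samples=800, n_features=20, n_informative=6,
        weights=[0.9, 0.1], random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    classifiers = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
        # SVMs are sensitive to feature scale, so standardize inside a pipeline.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    results = {}
    for name, model in classifiers.items():
        model.fit(X_train, y_train)
        results[name] = score(y_test, model.predict(X_test))

    # K-means ignores labels while fitting; afterwards each cluster is assigned
    # the majority training label so its predictions can be scored the same way.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(X_train)
    cluster_label = np.array([
        np.bincount(y_train[kmeans.labels_ == c], minlength=2).argmax()
        for c in range(2)
    ])
    results["kmeans"] = score(y_test, cluster_label[kmeans.predict(X_test)])
    return results
```

Calling `compare_models()` returns a dict of accuracy, precision and F1 per model. Scores on synthetic data say nothing about which model performs best on the real dataset.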

Contents

The repository includes the dataset, kag_risk_factors_cervical_cancer.csv, along with my Jupyter notebook exploring the dataset and a final report with my findings.
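A minimal standard-library sketch for reading the CSV, assuming that missing answers in kag_risk_factors_cervical_cancer.csv are written as `?` and that the label column is named `Biopsy`; both assumptions should be checked against the file's header. The helper name `load_risk_factors` is hypothetical. In pandas, `pd.read_csv(path, na_values="?")` gives the same missing-value handling.

```python
import csv
import math


def load_risk_factors(lines, target="Biopsy"):
    """Return (feature_names, rows, labels) parsed from CSV text lines.

    Assumption: missing values are marked '?' and the label column is 'Biopsy'.
    `lines` is any iterable of CSV lines, such as an open file object.
    """
    reader = csv.DictReader(lines)
    feature_names = [name for name in reader.fieldnames if name != target]
    rows, labels = [], []
    for record in reader:
        row = []
        for name in feature_names:
            value = record[name].strip()
            # '?' (or an empty cell) becomes NaN so it can be imputed or dropped later.
            row.append(math.nan if value in ("?", "") else float(value))
        rows.append(row)
        labels.append(int(float(record[target])))
    return feature_names, rows, labels
```

To read the real file, pass an open file object, for example `open("kag_risk_factors_cervical_cancer.csv", newline="")`, as `lines`.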

Introduction

Cervical cancer is a malignant tumour that starts in the cells of a woman’s cervix and may spread (metastasize) to other parts of her body. Although the number of cervical cancer cases has been declining in recent years thanks to improved screening and early detection with the Pap test, 300,000 women worldwide still die of cervical cancer each year. My investigation of the Risk Factors of Cervical Cancer Dataset focuses on predicting whether a woman’s case will result in a biopsy for cervical cancer.

Results

My analysis of the dataset found that, of the four models, the gradient boosting classifier achieved the highest accuracy, precision, and F1 score.
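The three metrics above come from counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. The standard-library sketch below makes this concrete; the function name `binary_metrics` is hypothetical, and treating label 1 as the positive (biopsy) class is an assumption. For binary labels it should agree with scikit-learn's `accuracy_score`, `precision_score` and `f1_score` called with `zero_division=0`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary label such as 'Biopsy'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Undefined ratios (no predicted or no actual positives) are reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([0,0,0,0,0,0,1,1,1,1], [0,0,0,0,0,1,1,1,0,0])` gives accuracy 0.7, precision 2/3, recall 0.5 and F1 4/7 (about 0.571). A model that predicts 0 for every case on the same labels still reaches 0.6 accuracy but has precision and F1 of 0, which is why precision and F1 are worth reporting alongside accuracy when positive cases are rare.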
