This is an exploration of the Kaggle dataset https://www.kaggle.com/loveall/cervical-cancer-risk-classification using supervised and unsupervised machine learning techniques, including random forest and gradient boosting classifiers, an SVM, and K-means clustering.
Included here are the dataset (kag_risk_factors_cervical_cancer.csv), my Jupyter notebook containing the exploration of the dataset, and a final report with my findings.
Cervical cancer is a malignant tumour that starts in the cells of a woman’s cervix and may spread, or metastasize, to other parts of her body. Although the number of cervical cancer cases has been declining in recent years thanks to more advanced screening and early detection with the Pap test, 300,000 women worldwide still die of cervical cancer each year. My investigation of the Risk Factors of Cervical Cancer dataset focuses on predicting whether a woman will need a biopsy due to cervical cancer.
My analysis of the dataset found that the gradient boosting classifier achieved the highest accuracy, precision, and F1 score of the four models.
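Because the models are ranked by accuracy, precision, and F1, it may help to see how those metrics relate for a binary target. The following is a minimal sketch, assuming the label is encoded as 0/1 with 1 as the positive (biopsy) class. The function name `binary_metrics` is hypothetical and is not taken from the notebook; in practice scikit-learn's `accuracy_score`, `precision_score`, and `f1_score` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary label.

    Assumes 0/1 labels with `positive` marking the class of interest
    (hypothetically, a positive biopsy).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the dataset.
    print(binary_metrics([0, 0, 0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 0, 1, 0]))
    # A model that always predicts "negative" can still score high accuracy.
    print(binary_metrics([0, 0, 0, 1], [0, 0, 0, 0]))
```

The second call illustrates why precision and F1 are reported alongside accuracy: when positive cases are rare, a classifier that never predicts the positive class still looks accurate while its precision and F1 are zero.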