BREAST CANCER PREDICTION

Table of Contents
Libraries
Data Source
Project Workflow
Training
Test
Confusion Matrix
Conclusion

1. Libraries

numpy
pandas
matplotlib.pyplot
seaborn
scikit-learn (sklearn)

2. Data source

This breast cancer database was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison.

O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.

It can be downloaded from the UCI Machine Learning Repository.
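As a rough sketch, the data file could be loaded with pandas as shown below. The raw file has no header row, so column names must be supplied. The snake_case column names, the helper `load_wbcd`, and the two sample rows are assumptions made for illustration and are not taken from the project notebook. Per the dataset documentation, missing values are written as `?` and the class is coded 2 (benign) or 4 (malignant).

```python
import io

import pandas as pd

# Our own snake_case names for the 11 columns
# (1 id + 9 features + 1 class); the raw file has no header row.
COLUMNS = [
    "sample_code_number",
    "clump_thickness",
    "uniformity_of_cell_size",
    "uniformity_of_cell_shape",
    "marginal_adhesion",
    "single_epithelial_cell_size",
    "bare_nuclei",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
    "class",
]


def load_wbcd(source):
    """Read the comma-separated data from a path or file-like object.

    '?' is parsed as a missing value (NaN).
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Two illustrative rows in the same layout, used instead of the real file.
sample = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,4,5,1,2,?,7,3,1,4\n"
)
df = load_wbcd(sample)
print(df.shape)
print(df["bare_nuclei"].isna().sum(), "missing value(s) in bare_nuclei")
```

With the real file, the same helper would be called with the path to the downloaded data file in place of `sample`.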

3. Project Workflow

The workflow for this project is listed below:

Pre-processing
Exploratory Data Analysis (EDA)
Data Visualization
Feature Selection
Train-Test
Confusion Matrix
Conclusion
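The pre-processing and train-test steps of this workflow might look roughly like the sketch below. It runs on a synthetic stand-in table, not the real data. The column names, dropping incomplete rows, the 0/1 label mapping, and the stratified split are all assumptions, since the project's exact choices are not described here. EDA, visualization, and feature selection are left out.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: an id column, nine features with values in 1..10,
# and a class column coded 2 / 4. Feature names are hypothetical.
rng = np.random.default_rng(0)
n = 100
feature_names = [f"f{i}" for i in range(1, 10)]
df = pd.DataFrame(
    rng.integers(1, 11, size=(n, 9)).astype(float), columns=feature_names
)
df.insert(0, "id", np.arange(n))
df["class"] = rng.choice([2, 4], size=n)
df.loc[[3, 7], "f6"] = np.nan  # pretend two values are missing

# Pre-processing: drop incomplete rows and the id column, map labels to 0/1.
clean = df.dropna().drop(columns="id")
X = clean[feature_names]
y = (clean["class"] == 4).astype(int)  # 1 = malignant, 0 = benign

# 70% / 30% split, stratified so both parts keep a similar class mix.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
print(len(X_train), "training rows,", len(X_test), "test rows")
```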

4. Training

We trained four machine learning models and evaluated their performance on our dataset.

4.1. Training Results

The models are listed below with their training accuracy:

Decision Tree: Mean accuracy = 0.954974, Std accuracy = 0.020103

Support Vector Machine: Mean accuracy = 0.971386, Std accuracy = 0.013512

Gaussian Naive Bayes: Mean accuracy = 0.963223, Std accuracy = 0.025463

K-Nearest Neighbors (KNN): Mean accuracy = 0.969345, Std accuracy = 0.016428
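A mean and standard deviation of accuracy usually come from repeated evaluation such as k-fold cross-validation, although the procedure is not stated here. The sketch below shows one way such numbers could be produced with scikit-learn. The 10-fold setting, the default hyperparameters, and the synthetic nine-feature data are assumptions, so its output will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with nine features; this is not the Wisconsin data.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    # 10-fold cross-validation; the fold count is an assumption.
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(
        f"{name}: Mean accuracy = {scores.mean():.6f}, "
        f"Std accuracy = {scores.std():.6f}"
    )
```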

5. Test

Training showed that the Support Vector Machine model had the highest mean training accuracy, so we evaluated it (scikit-learn's SVC) on the test set. The test accuracy was 0.9714285714285714 (97.14%).
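A minimal sketch of this kind of held-out evaluation is shown below, again on synthetic data. The default `SVC()` hyperparameters and the `random_state` values are assumptions, so the printed accuracy is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in with nine features; this is not the Wisconsin data.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Default hyperparameters; the project's actual settings are not stated.
clf = SVC()
clf.fit(X_train, y_train)

# Accuracy = fraction of test rows whose predicted label matches the truth.
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.4f} ({test_accuracy:.2%})")
```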

5.1. Random Testing

We passed a list of nine numbers, one per feature, to the trained model as a single input, and it predicted the correct breast cancer class.
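The sketch below illustrates how a single list of nine values can be passed to a scikit-learn classifier, which expects a 2-D array of shape (1, 9). The tiny training set, the helper `predict_one`, and the input values are made up for illustration. The 2/4 label coding follows the dataset documentation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny made-up training set: low scores -> class 2, high scores -> class 4.
X_train = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 2, 2, 2, 2, 2],
    [1, 2, 1, 1, 2, 1, 2, 1, 1],
    [9, 9, 9, 9, 9, 9, 9, 9, 9],
    [8, 8, 8, 8, 8, 8, 8, 8, 8],
    [10, 9, 8, 9, 10, 8, 9, 10, 9],
], dtype=float)
y_train = np.array([2, 2, 2, 4, 4, 4])
clf = SVC().fit(X_train, y_train)

LABELS = {2: "benign", 4: "malignant"}


def predict_one(model, features):
    """Predict the class code for one sample given as a list of 9 numbers."""
    if len(features) != 9:
        raise ValueError("expected exactly nine feature values")
    # Reshape the flat list into one row, as scikit-learn expects.
    sample = np.asarray(features, dtype=float).reshape(1, -1)
    return int(model.predict(sample)[0])


label = predict_one(clf, [1, 1, 2, 1, 2, 1, 1, 1, 1])
print(LABELS[label])
```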

6. Confusion Matrix

The confusion matrix from our validation is:
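For readers unfamiliar with how such a matrix is computed, here is a minimal sketch using scikit-learn. The labels below are made up for illustration and are not the project's validation results.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 0 = benign, 1 = malignant.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy = {(tn + tp) / cm.sum():.2f}")
```

In this toy example the matrix is [[5, 1], [1, 3]]: five benign and three malignant samples are classified correctly, with one false positive and one false negative.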

7. Conclusion

We used the Wisconsin Breast Cancer Database, which has 699 records and 11 columns.

The attributes at column indices 2 through 10 are the features used to represent each instance.

Each instance belongs to one of 2 possible classes: benign or malignant.

The class is stored as the attribute at column index 11.

Class distribution was:

Benign: 458 (65.5%)

Malignant: 241 (34.5%)
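The percentages follow directly from the counts, as this short check shows. The variable names are our own.

```python
# Class counts as reported above.
counts = {"Benign": 458, "Malignant": 241}
total = sum(counts.values())  # 699 records

for name, count in counts.items():
    print(f"{name}: {count} ({count / total:.1%})")
# Benign: 458 (65.5%)
# Malignant: 241 (34.5%)
```

One consequence of this imbalance is that a model that always predicts benign would already reach about 65.5% accuracy, which is a useful baseline when reading the accuracy figures.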

After pre-processing and analysing the data, we split the dataset into 70% training and 30% test sets and trained four models. The SVM model had the highest mean training accuracy (0.971386), narrowly ahead of KNN (0.969345), and reached a test accuracy of 97.14%.


kashifm777/BreastCancerPrediction
