Breast Cancer Wisconsin (Diagnostic) Prediction

Predict whether the cancer is benign or malignant

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

In this repository I train several machine learning algorithms from scratch to find which one performs best on this dataset. The main file is an IPython notebook that contains my analysis of the dataset, much of it done with the seaborn library, which is well suited to data visualization in Python.

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Real-valued features describing the cell nuclei
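To make the column layout concrete, here is a minimal sketch of parsing one record. It assumes a comma-separated row with no header, holding the ID, the diagnosis letter, and then 30 numeric features. The `parse_record` helper and the sample row are hypothetical; the row uses placeholder values, not real measurements.

```python
import csv
import io

N_FEATURES = 30  # columns 3-32 of each record


def parse_record(row):
    """Split one raw row into (id, diagnosis, features).

    Hypothetical helper: assumes the row holds an ID, a diagnosis
    letter ("M" or "B"), and then 30 numeric feature values.
    """
    if len(row) != 2 + N_FEATURES:
        raise ValueError(f"expected {2 + N_FEATURES} columns, got {len(row)}")
    record_id, diagnosis = row[0], row[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unknown diagnosis label: {diagnosis!r}")
    features = [float(value) for value in row[2:]]
    return record_id, diagnosis, features


# Placeholder row for illustration only: the ID and feature values are made up.
sample_line = "000001,M," + ",".join(str(float(i)) for i in range(N_FEATURES))

reader = csv.reader(io.StringIO(sample_line))
record_id, diagnosis, features = parse_record(next(reader))
print(record_id, diagnosis, len(features))
```

Keeping the ID as a string avoids treating it as a numeric feature by accident, and the explicit column check fails early on malformed rows.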

How to use

  1. Clone the repo
git clone https://github.com/suvhradipghosh07/Breast-Cancer-prediction-using-Machine-Learning-various-Algorithm.git
  2. cd into the repo
cd Breast-Cancer-prediction-using-Machine-Learning-various-Algorithm
  3. Run the main file
python3 _main_.py

Have fun!

Random Forest Algorithm

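# Imports needed by this snippet
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Create the classifier and fit it on the training split
# (x_train, x_test, y_train and y_test must already be defined)
algo_one = RandomForestClassifier()
algo_one.fit(x_train, y_train)
# Predict on the test set, which the model has not seen during training
prediction = algo_one.predict(x_test)
# accuracy_score takes the true labels first, then the predictions
metrics.accuracy_score(y_test, prediction)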

0.956140350877193
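The snippet above relies on `x_train`, `x_test`, `y_train`, and `y_test` already being defined. A self-contained sketch follows, assuming scikit-learn's bundled copy of this dataset (`load_breast_cancer`) and an 80/20 split. The split ratio, the stratification, and the `random_state` values are assumptions rather than the repository's confirmed settings, so the score may differ from the one shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# scikit-learn ships a copy of the Wisconsin Diagnostic dataset:
# 569 samples with 30 numeric features each.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: hold out 20% of the samples for testing, with a fixed seed
# and stratification so both classes keep their proportions.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the forest on the training split only.
model = RandomForestClassifier(random_state=42)
model.fit(x_train, y_train)

# Score on the held-out samples.
prediction = model.predict(x_test)
accuracy = accuracy_score(y_test, prediction)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing `random_state` makes the run repeatable; without it, both the split and the forest change between runs and the accuracy moves by a point or two.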

Observation


Here are the test-set results for the five algorithms:

| Model | Algorithm | Test Accuracy |
|---|---|---|
| Model 1 | Random Forest | 95% |
| Model 2 | Support Vector Machine (SVM) | 90% |
| Model 3 | Decision Tree Classifier | 92% |
| Model 4 | K-Nearest Neighbors Classifier | 94.7% |
| Model 5 | Gaussian Naive Bayes (GaussianNB) | 93.8% |
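A comparison like this one can be produced by fitting each model on the same train/test split. The following is a sketch under stated assumptions (scikit-learn's bundled `load_breast_cancer` data, an 80/20 stratified split, default hyperparameters), not the repository's exact procedure, so the numbers it prints will not necessarily match the results listed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Assumption: one shared 80/20 split so every model sees the same data.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The five model families from the comparison, with default settings.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit each model and record its accuracy on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(x_test))

# Print the models from most to least accurate.
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

SVM and k-nearest neighbors are sensitive to feature scale, so standardizing the features (for example with `StandardScaler`) may change their ranking relative to the tree-based models.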