# Breast Cancer Detection Using Machine Learning

In this notebook, I'll use Python to build a Naive Bayes classifier that detects breast cancer from a dataset of breast tumor measurements.

#### Importing the dependencies

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
```

Now that we have imported the required libraries, let's load the dataset and assign it to a variable:

```python
breast_cancer_data = load_breast_cancer()
```

Let's check which attributes are present in the object returned by load_breast_cancer():

```python
dir(breast_cancer_data)
```

```
['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']
```
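The object returned by load_breast_cancer() is a scikit-learn Bunch, a dictionary subclass whose keys can also be read as attributes. The minimal sketch below checks this and prints the array shapes; the shapes in the comments are those of the 569-sample, 30-feature Wisconsin diagnostic dataset bundled with scikit-learn.

```python
from sklearn.datasets import load_breast_cancer

breast_cancer_data = load_breast_cancer()

# A Bunch is a dict subclass, so key access and attribute access return the same object
print(isinstance(breast_cancer_data, dict))                   # True
print(breast_cancer_data.data is breast_cancer_data["data"])  # True

# 569 tumors, each described by 30 numeric features
print(breast_cancer_data.data.shape)    # (569, 30)
print(breast_cancer_data.target.shape)  # (569,)
```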

Next, we create a separate variable for each important field we just explored:

```python
label_names = breast_cancer_data["target_names"]
labels = breast_cancer_data["target"]
feature_names = breast_cancer_data["feature_names"]
features = breast_cancer_data["data"]
```
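If only the feature matrix and the labels are needed, load_breast_cancer() can also return them directly through its return_X_y parameter. A short sketch showing that this yields the same arrays as the "data" and "target" fields extracted above:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

breast_cancer_data = load_breast_cancer()

# return_X_y=True skips the Bunch and returns the (features, labels) pair directly
X, y = load_breast_cancer(return_X_y=True)

# Same arrays as the "data" and "target" fields of the Bunch
print(np.array_equal(X, breast_cancer_data["data"]))    # True
print(np.array_equal(y, breast_cancer_data["target"]))  # True
```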


Each important field of the dataset now has its own variable. To better understand the data, let's print the class labels, the label of the first data instance, the feature names, and the feature values of the first data instance:

```python
print(label_names)
print("Class label :", labels[0])
print(feature_names)
print(features[0], "\n")
```

```
['malignant' 'benign']
Class label : 0
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01]
```
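The output shows that label 0 means malignant and label 1 means benign. Before training, it helps to know how the two classes are balanced, because that sets a baseline for any accuracy figure. A minimal sketch using NumPy's bincount:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

breast_cancer_data = load_breast_cancer()
labels = breast_cancer_data["target"]
label_names = breast_cancer_data["target_names"]

# Count how many samples fall into each class (0 = malignant, 1 = benign)
counts = np.bincount(labels)
for name, count in zip(label_names, counts):
    print(f"{name}: {count} ({count / len(labels):.1%})")
```

For the dataset bundled with scikit-learn this reports 212 malignant and 357 benign samples (about 37% and 63%), so a model that always predicted "benign" would already reach roughly 63% accuracy on data with this class mix.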



Now that the data is loaded, we can use it to build a machine learning model for breast cancer detection with the Naive Bayes algorithm.

#### Splitting the Dataset

To evaluate the performance of a classifier, you should always test it on unseen data. Therefore, before building the breast cancer detection model, I will split the data into two parts: an 80% training set and a 20% test set:

```python
train, test, train_labels, test_labels = train_test_split(features, labels, test_size=0.2, random_state=42)
```
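With 569 samples and test_size=0.2, train_test_split puts 114 samples in the test set and 455 in the training set. The split above is purely random; a common alternative, not used in this notebook, is to pass stratify=labels so that both parts keep the same malignant/benign ratio. A sketch of that variant:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

features, labels = load_breast_cancer(return_X_y=True)

# stratify=labels keeps the malignant/benign proportions equal in both parts
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

print(train.shape, test.shape)  # (455, 30) (114, 30)

# Fraction of each class (0 = malignant, 1 = benign) in the training and test sets
print(np.bincount(train_labels) / len(train_labels))
print(np.bincount(test_labels) / len(test_labels))
```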

#### Using Naive Bayes for Breast Cancer Detection

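There are many machine learning models, each with its own strengths and weaknesses. For the breast cancer detection task, I'll use a simple technique called the Gaussian Naive Bayes classifier. It assumes that the features are independent of each other within each class and models each feature with a normal distribution; it is fast to train and often makes a strong baseline for binary classification: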

```python
gnb = GaussianNB()
gnb.fit(train, train_labels)
```

```
GaussianNB()
```
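To make the fit step less of a black box, here is a minimal NumPy sketch of the Gaussian Naive Bayes decision rule: for each class, estimate a mean and variance per feature plus a class prior, then predict the class with the highest log prior plus summed per-feature Gaussian log densities. fit_gaussian_nb and predict_gaussian_nb are hypothetical helper names, not scikit-learn API, and the variance smoothing assumes scikit-learn's default (var_smoothing=1e-9 times the largest feature variance).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: per-class feature means, variances and log priors."""
    classes = np.unique(y)
    # Small constant added to every variance for numerical stability
    epsilon = var_smoothing * np.var(X, axis=0).max()
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + epsilon
    log_priors = np.log(np.array([np.mean(y == c) for c in classes]))
    return classes, means, variances, log_priors


def predict_gaussian_nb(model, X):
    """Hypothetical helper: pick the class with the highest log posterior."""
    classes, means, variances, log_priors = model
    # Shape (n_samples, n_classes): sum of per-feature Gaussian log densities
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances).sum(axis=1)
        + (((X[:, None, :] - means) ** 2) / variances).sum(axis=2)
    )
    return classes[np.argmax(log_priors + log_likelihood, axis=1)]


features, labels = load_breast_cancer(return_X_y=True)
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

model = fit_gaussian_nb(train, train_labels)
manual_preds = predict_gaussian_nb(model, test)

# Compare with scikit-learn's GaussianNB trained on the same split
sk_preds = GaussianNB().fit(train, train_labels).predict(test)
print("Agreement with scikit-learn:", np.mean(manual_preds == sk_preds))
```

Summing per-feature log densities is exactly where the "naive" independence assumption enters: the joint likelihood of the 30 features is treated as the product of 30 separate one-dimensional likelihoods.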

Now that we have trained the model, we can use it to make predictions on our test set:

```python
preds = gnb.predict(test)
print(preds, '\n')
```

```
[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0]
```
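Besides hard 0/1 labels, GaussianNB also exposes predict_proba, which returns one probability per class for each sample, with columns ordered as in gnb.classes_. A short sketch that rebuilds the model from the steps above and checks that predict simply picks the most probable class:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

features, labels = load_breast_cancer(return_X_y=True)
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42
)
gnb = GaussianNB().fit(train, train_labels)

# One row per test sample, one column per class in gnb.classes_ (0 = malignant, 1 = benign)
probs = gnb.predict_proba(test)
print(gnb.classes_)
print(np.round(probs[:5], 3))

# predict() returns the class with the highest probability in each row
preds = gnb.predict(test)
print(np.array_equal(preds, gnb.classes_[probs.argmax(axis=1)]))
```

Naive Bayes probabilities are often pushed very close to 0 or 1 because the independence assumption multiplies many per-feature likelihoods together, so they are better read as a ranking than as calibrated risks.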



Now that we have an array of predicted class labels, we can compare it with the true labels (test_labels vs. preds) to see how accurate the model is:

```python
print(accuracy_score(test_labels, preds))
```

```
0.9736842105263158
```
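Accuracy is simply the fraction of positions where the prediction matches the true label. With 114 test samples, 0.9737 corresponds to 111 correct predictions (111 / 114 ≈ 0.9737). A tiny sketch on hypothetical toy labels showing that accuracy_score computes the same thing as a direct comparison:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels, only to illustrate the computation
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Accuracy = number of matching positions / total number of samples
manual = np.mean(y_true == y_pred)
print(manual)                          # 0.8
print(accuracy_score(y_true, y_pred))  # 0.8
```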


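As the output above shows, our breast cancer detection model reaches an accuracy of about 97.4% on the test set: it correctly classifies 111 of the 114 test samples. Because this figure comes from a single 80/20 split, it is an estimate of how often the classifier will be right on new, unseen data.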
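Accuracy alone does not show which mistakes the model makes, and in cancer detection a malignant tumor classified as benign (a false negative) is usually the costlier error. A sketch using scikit-learn's confusion_matrix, recall_score and classification_report on the same split; pos_label=0 makes recall_score measure how many truly malignant tumors were caught:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
preds = GaussianNB().fit(train, train_labels).predict(test)

# Rows = true class, columns = predicted class, in label order [0, 1] = [malignant, benign]
cm = confusion_matrix(test_labels, preds)
print(cm)

# Share of truly malignant tumors that the model flagged as malignant
malignant_recall = recall_score(test_labels, preds, pos_label=0)
print(f"Malignant recall: {malignant_recall:.3f}")

# Precision, recall and F1 score for both classes
print(classification_report(test_labels, preds, target_names=data.target_names))
```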