# Breast Cancer Prediction



## Introduction
In this exercise we'll work with the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. We'll predict whether a tumor is malignant or benign. The exercise highlights two features, the mean radius of the tumor (`radius_mean`) and its mean number of concave points (`concave points_mean`), although the cells below fit the trees on every column that remains after dropping `diagnosis`.
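
To restrict a model to only `radius_mean` and `concave points_mean`, the columns can be selected by name. The following is a minimal sketch rather than part of the original notebook: `sample`, `two_features` and `X_two` are hypothetical names, and the small frame is a stand-in built from the first rows printed by `df_cancer.head()` in the Dataset section.

```python
import pandas as pd

# Stand-in frame using the first five rows shown by df_cancer.head()
sample = pd.DataFrame({
    "diagnosis": ["M", "M", "M", "M", "M"],
    "radius_mean": [17.99, 20.57, 19.69, 11.42, 20.29],
    "texture_mean": [10.38, 17.77, 21.25, 20.38, 14.34],
    "concave points_mean": [0.14710, 0.07017, 0.12790, 0.10520, 0.10430],
})

# "concave points_mean" contains a space, so it has to be selected
# with a string key rather than attribute access (sample.radius_mean style)
two_features = ["radius_mean", "concave points_mean"]
X_two = sample[two_features]

print(X_two.shape)
```

The same selection applied to the full `df_cancer` frame would give a two-column feature matrix that could be passed to `train_test_split` in place of `X`.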



![breast cancer](https://media.istockphoto.com/id/1337299869/vector/breast-cancer-awareness-month-and-diverse-ethnic-women-with-pink-support-ribbon.jpg?s=612x612&w=0&k=20&c=HDCgjAueaKKxOP1cZUbOrCVi_1I6Cp67VksS0-ymYBw=)

In [None]:
from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split

import pandas as pd
import numpy as np

from sklearn.metrics import accuracy_score

## Dataset

In [None]:
df_cancer = pd.read_csv('wisconsin_breast_cancer.csv')

In [None]:
print(df_cancer.head())

         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \
0    842302         M        17.99         10.38          122.80     1001.0   
1    842517         M        20.57         17.77          132.90     1326.0   
2  84300903         M        19.69         21.25          130.00     1203.0   
3  84348301         M        11.42         20.38           77.58      386.1   
4  84358402         M        20.29         14.34          135.10     1297.0   

   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \
0          0.11840           0.27760          0.3001              0.14710   
1          0.08474           0.07864          0.0869              0.07017   
2          0.10960           0.15990          0.1974              0.12790   
3          0.14250           0.28390          0.2414              0.10520   
4          0.10030           0.13280          0.1980              0.10430   

   ...  texture_worst  perimeter_worst  area_worst  smoothness

In [None]:
# Encode the target: malignant (M) -> 1, benign (B) -> 0
mapping = {'M': 1, 'B': 0}
df_cancer['diagnosis'] = df_cancer['diagnosis'].map(mapping)
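
One thing to watch with `Series.map` and a dictionary is that any label missing from the dictionary becomes `NaN` instead of raising an error. The sketch below illustrates this on made-up labels (the stray lowercase `"m"` is hypothetical and not taken from the dataset). Counting the `NaN` values after mapping is a cheap sanity check.

```python
import pandas as pd

# Hypothetical labels; the last one is deliberately not in the mapping
labels = pd.Series(["M", "B", "B", "M", "m"])
mapping = {"M": 1, "B": 0}

encoded = labels.map(mapping)

# Labels missing from the dict are mapped to NaN, so count them
n_unmapped = int(encoded.isna().sum())
print(n_unmapped)  # 1
```

On `df_cancer`, a count of zero after mapping would confirm that the column held only `M` and `B`.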

In [None]:
df_cancer.isna().sum()

id                           0
diagnosis                    0
radius_mean                  0
texture_mean                 0
perimeter_mean               0
area_mean                    0
smoothness_mean              0
compactness_mean             0
concavity_mean               0
concave points_mean          0
symmetry_mean                0
fractal_dimension_mean       0
radius_se                    0
texture_se                   0
perimeter_se                 0
area_se                      0
smoothness_se                0
compactness_se               0
concavity_se                 0
concave points_se            0
symmetry_se                  0
fractal_dimension_se         0
radius_worst                 0
texture_worst                0
perimeter_worst              0
area_worst                   0
smoothness_worst             0
compactness_worst            0
concavity_worst              0
concave points_worst         0
symmetry_worst               0
fractal_dimension_worst      0
Unnamed:

In [None]:
df_cancer = df_cancer.drop(['Unnamed: 32'], axis=1)
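
If the dropped column is entirely empty (an assumption here, since the printed `isna()` output is cut off before its count), an alternative is to drop every all-`NaN` column without naming it. A minimal sketch on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column that contains only missing values
frame = pd.DataFrame({
    "radius_mean": [17.99, 20.57],
    "texture_mean": [10.38, 17.77],
    "Unnamed: 32": [np.nan, np.nan],
})

# how="all" removes a column only when every value in it is NaN
cleaned = frame.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['radius_mean', 'texture_mean']
```

This avoids hard-coding the column name, at the cost of silently removing any other column that happens to be fully empty.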

In [None]:
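# Split into features (X) and target (y)
# Note: X still includes the 'id' column, which is an identifier rather than a measurement
X = df_cancer.drop(['diagnosis'], axis=1)
y = df_cancer[['diagnosis']]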

## Train/Test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
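
The split above is purely random, so the share of malignant cases can differ between the training and test sets. `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both parts. The sketch below uses synthetic data (the `_demo` names and the 30/70 class balance are assumptions for illustration, not properties of the dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 samples, 30 positives and 70 negatives
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo preserves the 30% positive rate in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=4, stratify=y_demo
)

print(y_tr.mean(), y_te.mean())
```

Applied to this notebook, the equivalent change would be passing `stratify=y` to the existing call; note that this would alter the split and therefore the reported accuracies.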

## Train the tree model

In [None]:
# Instantiate a DecisionTreeClassifier (default Gini criterion)
dt = DecisionTreeClassifier(max_depth=6, random_state=123)

# Fit to the training set
dt.fit(X_train, y_train)

# Predict test set labels
y_pred = dt.predict(X_test)
print(y_pred[0:5])

[1 0 1 1 1]


## Evaluate classification

In [None]:
# Compute test set accuracy
acc = accuracy_score(y_test, y_pred)
print("Test set accuracy: {:.2f}".format(acc))

Test set accuracy: 0.90
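
Accuracy alone does not show which kind of mistake the model makes, and for a diagnosis task a missed malignant tumor (false negative) is usually the costlier error. A confusion matrix and the recall for the malignant class make that visible. The labels below are made up for illustration; they are not the notebook's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical true and predicted labels (1 = malignant, 0 = benign)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Recall for class 1: tp / (tp + fn)
recall = recall_score(y_true_demo, y_pred_demo)
accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(tn, fp, fn, tp)      # 3 1 1 3
print(recall, accuracy)    # 0.75 0.75
```

The same calls on `y_test` and `y_pred` would report how many of the test set's malignant cases the tree missed.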


## Using entropy criterion



In [None]:
# Instantiate model with entropy criterion
dt_entropy = DecisionTreeClassifier(max_depth=8, criterion='entropy', random_state=1)

# Fit the model to the training set
dt_entropy.fit(X_train, y_train)
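
For intuition on what the `criterion` argument changes: both Gini impurity and entropy measure how mixed the classes are in a node, and the tree picks splits that reduce that measure the most. The sketch below computes both for a two-class node in plain Python; the function names are illustrative and this is not scikit-learn's internal code.

```python
import math

def gini(p):
    """Gini impurity of a binary node where p is the share of class 1."""
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def entropy(p):
    """Shannon entropy in bits of a binary node with class-1 share p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# Both measures are 0 for a pure node and peak at a 50/50 mix
for p in (0.0, 0.1, 0.5, 0.9):
    print(f"p={p:.1f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

Both functions have the same shape, which is why the two criteria often lead to similar trees in practice.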

## Predict and score

In [None]:
# Predict test set labels
y_pred = dt_entropy.predict(X_test)

# Evaluate accuracy
accuracy_entropy = accuracy_score(y_test, y_pred)

print(f'Accuracy achieved by using entropy: {accuracy_entropy:.3f}')


Accuracy achieved by using entropy: 0.860
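
The two models above differ in more than the criterion: the first uses `max_depth=6` and `random_state=123`, the second `max_depth=8` and `random_state=1`. The gap between 0.90 and 0.860 therefore cannot be attributed to entropy alone, and a single 80/20 split adds noise of its own. A fairer comparison holds everything else fixed and averages over several folds. The sketch below assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) instead of the CSV; note that its target is encoded with 0 for malignant and 1 for benign, the reverse of the mapping used in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Bundled copy of the Wisconsin diagnostic data (no 'id' column)
X_demo, y_demo = load_breast_cancer(return_X_y=True)

scores = {}
for criterion in ("gini", "entropy"):
    # Same depth and seed for both models, so only the criterion differs
    model = DecisionTreeClassifier(
        max_depth=6, criterion=criterion, random_state=123
    )
    # Mean accuracy over 5 cross-validation folds
    scores[criterion] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

for criterion, score in scores.items():
    print(f"{criterion}: mean CV accuracy = {score:.3f}")
```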
