<div style="color:white;display:fill;border-radius:8px;font-size:200%; letter-spacing:1.0px;"><p style="padding: 5px;color:white;text-align:left;"><b><span style='color:#fc6603'>AUTHOR: SOBIA ALAMGIR</span></b></p></div>

<a id="13"></a>
<h1 style="background-color:#435420;font-family:newtimeroman;font-size:300%;text-align:center;border-radius: 15px 50px;color:#FF9900;">Breast Cancer Dataset using Random Forest Classifiers with Hyperparameters</h1>

  - The Breast Cancer dataset in scikit-learn (`load_breast_cancer`) is a binary classification dataset with 569 tumor samples, each described by 30 numeric features computed from cell nuclei. Measurements such as radius, texture, and perimeter are each reported as a mean, an error, and a "worst" value. The target has two classes: malignant (0) and benign (1). Because the positive label (1) is the benign class, the default precision and recall scores reported later describe benign tumors, not malignant ones. The dataset is commonly used as a benchmark for medical classification tasks that predict tumor type.
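The figures quoted above can be checked directly against the loader. This is a minimal sketch that only assumes scikit-learn and NumPy are installed; the variable names (`data`, `class_counts`) are illustrative and are not reused by the rest of the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Number of samples (rows) and features (columns)
n_samples, n_features = data.data.shape
print(f"samples: {n_samples}, features: {n_features}")

# target_names is ordered by label value: index 0 is malignant, index 1 is benign
class_counts = {
    str(name): int(count)
    for name, count in zip(data.target_names, np.bincount(data.target))
}
print(class_counts)
```

The class counts show that the data is moderately imbalanced toward the benign class, which is worth keeping in mind when interpreting accuracy.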

## Step-01 Load Libraries

In [29]:
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    classification_report,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

## Step-02 Load Dataset

In [None]:
lbc = load_breast_cancer()

In [6]:
# Build a DataFrame so the data is easier to inspect
df = pd.DataFrame(lbc.data, columns=lbc.feature_names)
df['target'] = lbc.target

In [7]:
df.head()

|  | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


## Step-03 Data Preprocessing

In [8]:
df.shape

(569, 31)

In [37]:
df['target'].isnull().sum()

0
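The check above covers only the `target` column. A slightly broader check, sketched here under the assumption that the DataFrame is built the same way as in Step-02, counts missing values across every column and confirms that all columns are numeric, so no imputation or encoding step is needed. The names `total_missing` and `non_numeric` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

lbc = load_breast_cancer()
df = pd.DataFrame(lbc.data, columns=lbc.feature_names)
df["target"] = lbc.target

# Total number of missing values across all 31 columns, not just the target
total_missing = int(df.isnull().sum().sum())
print("missing values:", total_missing)

# Columns whose dtype is not numeric; an empty list means no encoding is needed
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("non-numeric columns:", non_numeric)
```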

In [12]:
df.columns

Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension',
       'target'],
      dtype='object')

## Step-04 Data Splitting

In [None]:
X = lbc.data
y = lbc.target

In [17]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
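The split above is random but not stratified, so the class ratio in the test set can drift slightly from the ratio in the full data. The sketch below compares it with a stratified split. Passing `stratify=y` is an alternative shown for illustration, not the choice made in this notebook, and the variable names are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Split as used in the notebook: random, not stratified
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=42)

# Alternative (assumption, not the notebook's choice): preserve the class ratio
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# y holds 0/1 labels, so the mean is the share of benign (label 1) samples
print(f"benign share, full data : {y.mean():.3f}")
print(f"benign share, plain     : {y_test_plain.mean():.3f}")
print(f"benign share, stratified: {y_test_strat.mean():.3f}")
```

With a moderately imbalanced dataset like this one the difference is usually small, but stratifying makes test metrics a little easier to compare across different seeds.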

## Step-05 Hyperparameter Tuning with `GridSearchCV`

In [22]:
%%time
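# No random_state is set, so the best parameters can vary between runs
model = RandomForestClassifier()

# Search space: 2 * 4 * 4 * 3 * 3 = 288 parameter combinations
params = {
    'criterion': ['entropy', 'gini'],
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 5, 10]
}

# 5-fold cross-validation over every combination, using all CPU cores
grid = GridSearchCV(
    estimator=model,
    param_grid=params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid.fit(X_train, y_train)
print('Best Parameters', grid.best_params_)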

Best Parameters {'criterion': 'entropy', 'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 10}
CPU times: total: 5.25 s
Wall time: 2min 3s
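The two-minute wall time follows from the size of the grid. The sketch below counts the candidates with `ParameterGrid` and then runs `RandomizedSearchCV` (imported in Step-01) over the same search space as a cheaper alternative. The values `n_iter=10` and `random_state=42` are assumptions chosen for illustration, and this is not the search the notebook actually ran.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {
    "criterion": ["entropy", "gini"],
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 5, 10],
}

# Exhaustive grid: every combination is fitted once per cross-validation fold
n_candidates = len(ParameterGrid(params))
print(f"grid candidates: {n_candidates}, fits with cv=5: {n_candidates * 5}")

# Randomized search: fit only n_iter sampled combinations from the same space
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=params,
    n_iter=10,          # assumption: 10 sampled candidates instead of all of them
    cv=5,
    scoring="accuracy",
    random_state=42,    # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Randomized search trades completeness for speed: it may miss the single best combination, but on a small dataset like this one the cross-validated accuracy is often close to what the full grid finds.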


## Step-06 Model Prediction

In [25]:
y_pred = grid.predict(X_test)
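Calling `predict` on the fitted search object works because `GridSearchCV` uses `refit=True` by default: after the search it retrains the best configuration on the whole training set and delegates `predict` to that model. The sketch below illustrates the equivalence on a deliberately small grid; the reduced grid and `random_state=0` are assumptions made to keep the example fast.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Small illustrative grid (assumption), not the grid used in Step-05
small_grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
small_grid.fit(X_train, y_train)

# predict on the search object is forwarded to the refitted best estimator
pred_from_search = small_grid.predict(X_test)
pred_from_best = small_grid.best_estimator_.predict(X_test)
same = np.array_equal(pred_from_search, pred_from_best)
print("identical predictions:", same)
```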

## Step-07 Model Evaluation

In [34]:
# Precision, recall, and F1 use pos_label=1 by default, i.e. the benign class
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy Score: {accuracy:.2f}')
print(f'Precision Score: {precision:.2f}')
print(f'Recall Score: {recall:.2f}')
print(f'f1 Score: {f1:.2f}')

Accuracy Score: 0.96
Precision Score: 0.96
Recall Score: 0.99
f1 Score: 0.97
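The scores above treat benign (label 1) as the positive class. In a screening context the more important question is usually how many malignant tumors are caught, which corresponds to recall with `pos_label=0`. The sketch below computes it alongside the confusion matrix. It trains a stand-in `RandomForestClassifier(n_estimators=100, random_state=42)` rather than the tuned model from Step-05, so its numbers will not necessarily match the notebook's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in model (assumption): not the tuned estimator from the grid search
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels, ordered [malignant, benign]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# Treat malignant (label 0) as the positive class
recall_malignant = recall_score(y_test, y_pred, pos_label=0)
precision_malignant = precision_score(y_test, y_pred, pos_label=0)
print(f"malignant recall   : {recall_malignant:.2f}")
print(f"malignant precision: {precision_malignant:.2f}")
```

The entry `cm[0, 1]` counts malignant tumors predicted as benign, which is typically the most costly kind of error for this task.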


* Let's check the classification report, which breaks the metrics down per class (0 = malignant, 1 = benign).

In [33]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
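The report labels its rows `0` and `1`. Passing `target_names` replaces them with the class names, which makes the report easier to read. This is a small sketch that again uses a stand-in model (an assumption), so the figures may differ from the report above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

lbc = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    lbc.data, lbc.target, test_size=0.2, random_state=42
)

# Stand-in model (assumption): not the tuned estimator from the grid search
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# target_names follows label order: 0 is malignant, 1 is benign
report = classification_report(
    y_test, clf.predict(X_test), target_names=list(lbc.target_names)
)
print(report)
```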



<a id="13"></a>
<h1 style="background-color:#435420;font-family:newtimeroman;font-size:300%;text-align:center;border-radius: 15px 50px;color:#FF9900;">Thanks For Reading My Notebook!</h1>