
# Dataset Content

1) ID number

2) Diagnosis (M = malignant, B = benign)

3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension ("coastline approximation" - 1)
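
To make the geometric definitions above concrete, here is a minimal pure-Python sketch that computes radius, perimeter, area and compactness for a polygon given as a list of (x, y) boundary points. Taking the centroid of the boundary points as the "center", and the function name `nucleus_shape_features`, are assumptions for illustration. This is not the image-processing pipeline that produced the dataset.

```python
import math


def nucleus_shape_features(contour):
    """Illustrative shape features for a closed contour of (x, y) points.

    Follows the textual definitions above; the original pipeline and the
    scaling used for the published dataset are NOT reproduced here.
    """
    n = len(contour)
    # Assumption: the "center" is the centroid of the boundary points.
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # a) radius: mean distance from the center to the perimeter points
    radius = sum(math.hypot(x - cx, y - cy) for x, y in contour) / n
    # c) perimeter: total length of the closed polygon's edges
    perimeter = sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))
    # d) area: shoelace formula for a simple (non-self-intersecting) polygon
    area = abs(sum(
        contour[i][0] * contour[(i + 1) % n][1] - contour[(i + 1) % n][0] * contour[i][1]
        for i in range(n)
    )) / 2
    # f) compactness, using the formula exactly as written above
    compactness = perimeter ** 2 / area - 1.0
    return {"radius": radius, "perimeter": perimeter, "area": area, "compactness": compactness}


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(nucleus_shape_features(square))
```

For a 2 x 2 square the function gives radius √2 ≈ 1.414, perimeter 8, area 4 and compactness 64 / 4 - 1 = 15. By the isoperimetric inequality, perimeter² / area is at least 4π for any closed shape, so the formula as written never drops below about 11.6; the description does not say how the dataset's compactness column is scaled, so treat this only as a way to compare shapes with one another.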

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
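
The field numbering follows a simple pattern: after the two leading fields (ID and diagnosis) come three blocks of ten features each, in the order mean, SE, worst. A small helper (hypothetical, not part of the notebook) reproduces the examples in the text:

```python
# The ten features in the order listed above
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal dimension",
]
# Block order described above: mean, standard error, worst
STATS = ["mean", "se", "worst"]


def field_number(feature, stat):
    """1-based field number in the original file (fields 1-2 are ID and diagnosis)."""
    return 3 + 10 * STATS.index(stat) + FEATURES.index(feature)


print(field_number("radius", "mean"))   # 3
print(field_number("radius", "se"))     # 13
print(field_number("radius", "worst"))  # 23
```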

All feature values are recorded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
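
The class counts also give a baseline for judging the model later: always predicting the majority class (benign) would already be correct for 357 of the 569 cases. A quick check of that arithmetic:

```python
benign, malignant = 357, 212
total = benign + malignant            # 569 samples in total
majority_baseline = benign / total    # accuracy of always predicting "benign"
print(total, round(majority_baseline, 3))  # 569 0.627
```

Any accuracy reported at the end of the notebook should therefore be read against roughly 0.627 rather than 0.5.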

# Import Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split

# Read Data

In [None]:
data = pd.read_csv("/kaggle/input/breast-cancer-wisconsin-data/data.csv")
data

In [None]:
data.head(5)

# Describe the Data

In [None]:
data.info()

In [None]:
data.isnull().sum()

In [None]:
sns.heatmap(data.isnull(), cmap=sns.cubehelix_palette(as_cmap=True))

In [None]:
df=data.copy()
df

# Data Preprocessing & Cleaning

In [None]:
df.info()

**Drop columns with more than 80% missing values** (here only `Unnamed: 32`, which is completely empty)

In [None]:
df.drop(columns=["Unnamed: 32"], inplace=True)
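
The cell above drops `Unnamed: 32` by name. A minimal sketch of the threshold rule stated in the heading, assuming a hypothetical helper `drop_mostly_missing` and a small toy frame in place of the real data, computes each column's missing fraction with `isnull().mean()` and drops the columns above the threshold:

```python
import numpy as np
import pandas as pd


def drop_mostly_missing(frame, threshold=0.8):
    """Return a copy of `frame` without columns whose share of missing values exceeds `threshold`."""
    missing_fraction = frame.isnull().mean()  # per-column fraction of NaNs
    to_drop = missing_fraction[missing_fraction > threshold].index
    return frame.drop(columns=to_drop)


# Toy frame (illustrative values): the last column is entirely empty,
# like "Unnamed: 32" in this CSV; feature_b is 2/3 missing.
toy = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [1.0, np.nan, np.nan],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})
print(drop_mostly_missing(toy).columns.tolist())  # ['feature_a', 'feature_b']
```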

In [None]:
df.drop(columns=["id"], inplace=True)

In [None]:
df.info()

In [None]:
df['diagnosis'].value_counts()

In [None]:
df['diagnosis'] = df['diagnosis'].replace({'B': 'benign', 'M': 'malignant'})

In [None]:
df['diagnosis'].value_counts()

I replaced the single-letter codes with readable labels to make the dataset easier to interpret.

# Encoding the String Data

In [None]:
df.describe(include=object)

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [None]:
# Reuse the encoder created above; it assigns codes in sorted order of the labels,
# so 'benign' -> 0 and 'malignant' -> 1
df['diagnosis'] = le.fit_transform(df['diagnosis'])
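
`LabelEncoder` assigns integer codes in sorted order of the class names, so `benign` becomes 0 and `malignant` becomes 1. An explicit dictionary with `Series.map` is an alternative that makes the encoding visible in the code; the name `DIAGNOSIS_CODES` and the toy series below are illustrative:

```python
import pandas as pd

# Explicit mapping matching LabelEncoder's sorted order (benign < malignant),
# with malignant as the positive class 1
DIAGNOSIS_CODES = {"benign": 0, "malignant": 1}

labels = pd.Series(["malignant", "benign", "benign", "malignant"])
encoded = labels.map(DIAGNOSIS_CODES)
print(encoded.tolist())  # [1, 0, 0, 1]
```

Unlike `LabelEncoder`, `map` yields `NaN` for any label missing from the dictionary, so an unexpected value (for example a differently capitalized label) shows up immediately instead of silently receiving its own code.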

In [None]:
df.info()

# Train-Test Split

In [None]:
x = df.drop('diagnosis', axis=1).values
y = df['diagnosis'].values

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [None]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
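
The printed shapes follow from how `train_test_split` sizes the parts: with a float `test_size`, scikit-learn rounds `test_size * n_samples` up for the test set and gives the rest to training. For the 569 rows and 30 features here, that arithmetic is:

```python
import math

n_samples = 569   # rows in the dataset (357 benign + 212 malignant)
n_features = 30   # columns left after dropping id, diagnosis and Unnamed: 32
test_size = 0.2

# Test set size is rounded up; training gets the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print((n_train, n_features), (n_train,), (n_test, n_features), (n_test,))
# (455, 30) (455,) (114, 30) (114,)
```

Passing `stratify=y` to `train_test_split` would additionally keep the benign/malignant ratio the same in both parts; the notebook does not do this.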

# Scaling the Data

In [None]:
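# Standardize each feature to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit on the training split only, then reuse the same mean/std for the test split
# so that no information from the test data leaks into preprocessing
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)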
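
`StandardScaler` subtracts each column's mean and divides by its standard deviation, both estimated from the training data. This NumPy sketch (the function `standardize` and the random toy arrays are illustrative) shows the same computation and why the test split reuses the training statistics:

```python
import numpy as np


def standardize(train, test):
    """Mimic StandardScaler: mean and std come from the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # leave constant columns unscaled instead of dividing by zero
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
test = rng.normal(loc=10.0, scale=3.0, size=(20, 2))
train_z, test_z = standardize(train, test)
print(train_z.mean(axis=0).round(6), train_z.std(axis=0).round(6))
```

The training columns come out with mean 0 and standard deviation 1; the test columns are only approximately standardized, which is expected and is exactly what keeps test information out of preprocessing.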

# Model Structure

In [None]:
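from tensorflow.keras.layers import Input

model = Sequential()
model.add(Input(shape=(30,)))  # 30 scaled input features
model.add(Dense(10, activation='relu'))
model.add(Dense(20, activation='relu'))
# Sigmoid squashes the output to a probability in (0, 1), which binary_crossentropy
# and the 0.5 threshold used later expect; ReLU output is unbounded and can be exactly 0
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])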
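
The output layer has to produce a probability because `binary_crossentropy` computes -[y·log(p) + (1 - y)·log(1 - p)]. A sigmoid always returns a value strictly between 0 and 1, whereas ReLU returns 0 for negative inputs and is unbounded above. The NumPy sketch below shows both activations and the loss; clipping `p` with `eps=1e-7` mirrors what Keras does internally, but the exact constant here is an assumption:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


logits = np.array([-3.0, 0.0, 2.0])
print(sigmoid(logits))            # every value strictly between 0 and 1
print(np.maximum(logits, 0.0))    # ReLU: 0 for negatives, unbounded above
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ln(2), about 0.693
```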

In [None]:
model.summary()
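
`model.summary()` reports the trainable parameters of each `Dense` layer, which are `inputs * units` weights plus one bias per unit. Checking the expected totals by hand for the 30-10-20-1 architecture (the helper `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


layers = [(30, 10), (10, 20), (20, 1)]  # (inputs, units) for each Dense layer
counts = [dense_params(i, o) for i, o in layers]
print(counts, sum(counts))  # [310, 220, 21] 551
```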

**Train model**

In [None]:
# Hold out 20% of the training data for validation so the test set stays untouched until evaluation
history = model.fit(x_train, y_train, batch_size=32, validation_split=0.2, epochs=100)
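
`model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch metrics; with `metrics=['accuracy']` and validation data, the keys are `loss`, `accuracy`, `val_loss` and `val_accuracy`. A sketch of a helper (hypothetical name `plot_history`) for plotting training against validation curves with matplotlib:

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot training vs. validation loss and accuracy from a Keras `history.history` dict."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((ax_loss, "loss"), (ax_acc, "accuracy")):
        ax.plot(history_dict[metric], label=f"train {metric}")
        ax.plot(history_dict[f"val_{metric}"], label=f"validation {metric}")
        ax.set_xlabel("epoch")
        ax.set_title(metric)
        ax.legend()
    fig.tight_layout()
    return fig
```

In the notebook this would be called as `plot_history(history.history)`. Validation loss that rises while training loss keeps falling is a typical sign of overfitting over the 100 epochs.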

# Evaluate Model

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

In [None]:
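y_prob = model.predict(x_test)                # predicted probability of malignant (class 1)
y_pred = (y_prob > 0.5).astype(int).ravel()   # threshold at 0.5 and flatten to shape (n,)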

In [None]:
from sklearn.metrics import accuracy_score

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
score = accuracy_score(y_test, y_pred)
print(cm)
print('Accuracy:', score)
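
`confusion_matrix` puts true classes in rows and predicted classes in columns, with labels sorted, so for this encoding (benign = 0, malignant = 1) the layout is [[TN, FP], [FN, TP]] with malignant as the positive class. Accuracy is the diagonal sum divided by the total. A small NumPy sketch on made-up labels (not results from this notebook; `confusion_counts` is a hypothetical helper):

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """2x2 matrix laid out like sklearn's confusion_matrix for labels [0, 1]:
    rows = true class, columns = predicted class -> [[TN, FP], [FN, TP]]."""
    y_true = np.asarray(y_true).astype(int).ravel()
    y_pred = np.asarray(y_pred).astype(int).ravel()
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
cm = confusion_counts(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()  # (TN + TP) / all samples
print(cm)
print(accuracy)
```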

In [None]:
print(classification_report(y_test,y_pred))
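
`classification_report` lists precision, recall and F1 per class. For a binary problem these follow directly from the four confusion-matrix counts; the helper `binary_report` and the counts used below are hypothetical, and returning 0.0 for an undefined ratio is a choice made for this sketch:

```python
def binary_report(tn, fp, fn, tp):
    """Per-class (precision, recall, F1) for classes 0 and 1 from confusion-matrix counts."""
    def prf(tp_c, fp_c, fn_c):
        precision = tp_c / (tp_c + fp_c) if tp_c + fp_c else 0.0
        recall = tp_c / (tp_c + fn_c) if tp_c + fn_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    return {
        # Class 1 (malignant): the usual TP / FP / FN roles
        1: prf(tp, fp, fn),
        # Class 0 (benign): roles swap, so TN counts as that class's "true positives"
        0: prf(tn, fn, fp),
    }


report = binary_report(tn=50, fp=2, fn=5, tp=30)
for label, (p, r, f) in sorted(report.items()):
    print(label, round(p, 3), round(r, 3), round(f, 3))
```

For class 1, recall is TP / (TP + FN), so it directly reflects how many malignant cases the model misses.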