<h1 style="margin:auto;width:50%;">Breast Cancer Prediction</h1>
<br>
<div>
    <img  src="https://www.rd.com/wp-content/uploads/2018/10/50-Everyday-Habits-That-Reduce-Your-Risk-of-Breast-Cancer-15-760x506.jpg" />
</div>


**Breast cancer:** Breast cancer is cancer that develops from breast tissue. Signs of breast cancer may include a lump in the breast, a change in breast shape, dimpling of the skin, fluid coming from the nipple, a newly inverted nipple, or a red or scaly patch of skin. In those with distant spread of the disease, there may be bone pain, swollen lymph nodes, shortness of breath, or yellow skin.
`From Wikipedia`



- **Attribute Information:**

 * 1) ID number
 * 2) Diagnosis (M = malignant, B = benign)


- **Ten real-valued features are computed for each cell nucleus:**

  * a) radius (mean of distances from center to points on the perimeter)
  * b) texture (standard deviation of gray-scale values)
  * c) perimeter
  * d) area
  * e) smoothness (local variation in radius lengths)
  * f) compactness (perimeter^2 / area - 1.0)
  * g) concavity (severity of concave portions of the contour)
  * h) concave points (number of concave portions of the contour)
  * i) symmetry
  * j) fractal dimension ("coastline approximation" - 1)
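
As a worked example of the compactness formula in item (f), the sketch below evaluates perimeter^2 / area - 1.0 for two idealized shapes. The helper name `compactness` and the two shapes are illustrative assumptions; the values stored in the dataset may be scaled differently, so this only illustrates the formula as written.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as written in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# A circle of radius r has perimeter 2*pi*r and area pi*r^2, so
# perimeter^2 / area equals 4*pi regardless of r.
r = 3.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)

# A square of side s has perimeter 4*s and area s^2, so the ratio is 16.
s = 3.0
square = compactness(4 * s, s ** 2)

print(f"circle: {circle:.3f}")
print(f"square: {square:.3f}")
```

Under this formula a circle gives the smallest possible value for a given area, and more irregular outlines score higher.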

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('../input/breast-cancer-wisconsin-data/data.csv')

In [None]:
pd.set_option("display.max_columns", None)

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df = df.drop(['id','Unnamed: 32'],axis=1)

In [None]:
df['diagnosis'].value_counts()

In [None]:
df['diagnosis'] = df['diagnosis'].map({'B':0,'M':1})
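
`Series.map` returns `NaN` for any label that is not a key of the dictionary, so it can be worth checking the encoding before modeling. A minimal sketch on a small hypothetical frame (`demo` and the label `'X'` are made up for illustration and are not part of the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the diagnosis column; 'X' is a deliberately bad label.
demo = pd.DataFrame({"diagnosis": ["M", "B", "B", "X"]})

encoded = demo["diagnosis"].map({"B": 0, "M": 1})

# Labels that are not keys of the mapping come back as NaN.
unmapped = demo.loc[encoded.isna(), "diagnosis"].unique().tolist()
print(unmapped)
```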

In [None]:
df.describe()

In [None]:
df.isna().sum()

In [None]:
df.duplicated().sum()

## Exploratory Data Analysis (EDA)

In [None]:
df.head()

In [None]:
palette_color  = ['black','red']
cmap = ['gray','black']

In [None]:
# Features plotted against radius_mean, one per subplot (row-major order).
y_features = [
    'texture_mean', 'perimeter_mean', 'radius_se',
    'concave points_mean', 'smoothness_worst', 'area_mean',
    'area_se', 'smoothness_se', 'radius_worst',
    'concavity_se', 'concave points_se', 'symmetry_worst',
]

fig, axs = plt.subplots(nrows=4, ncols=3, figsize=(14, 18))

for ax, feature in zip(axs.flat, y_features):
    sns.scatterplot(data=df, x='radius_mean', y=feature, hue='diagnosis',
                    palette=palette_color, ax=ax)

plt.show()

In [None]:
# Features whose distributions are shown, one violin per subplot.
violin_features = [
    'texture_mean', 'perimeter_mean', 'radius_se',
    'concave points_mean', 'smoothness_worst', 'area_mean',
    'area_se', 'smoothness_se', 'radius_worst',
    'concavity_se', 'concave points_se', 'symmetry_worst',
]

fig, axs = plt.subplots(nrows=4, ncols=3, figsize=(14, 18))

for ax, feature in zip(axs.flat, violin_features):
    # No hue variable is assigned, so a single color is passed instead of a palette.
    sns.violinplot(data=df, y=feature, color='gray', ax=ax)

plt.show()

____

## Build Model

In [None]:
from sklearn.preprocessing   import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model    import LogisticRegression
from sklearn.metrics         import confusion_matrix, accuracy_score

In [None]:
X = df.drop('diagnosis',axis=1)
y = df['diagnosis']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=40)

In [None]:
sc = StandardScaler()

In [None]:
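# Fit the scaler on the training split only, then reuse its statistics
# for the test split so no test information leaks into preprocessing.
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)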
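
A scaler's mean and standard deviation are usually learned from the training split only and then reused for the test split, so that no information from the test data reaches the model. A minimal sketch on synthetic data (the arrays `train` and `test` are made-up stand-ins, not the notebook's splits):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real splits (assumption: 2 features).
train = rng.normal(loc=10.0, scale=2.0, size=(100, 2))
test = rng.normal(loc=10.0, scale=2.0, size=(30, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# transform() applies the training mean and standard deviation, so the
# test split ends up close to, but generally not exactly, zero mean.
manual = (test - scaler.mean_) / scaler.scale_
print(np.allclose(test_scaled, manual))
```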

In [None]:
log_reg = LogisticRegression(penalty='l1', solver='liblinear', random_state=40)

In [None]:
log_reg.fit(X_train, y_train)
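
An L1 penalty tends to push some coefficients to exactly zero, which acts as a form of feature selection. The sketch below illustrates this on synthetic data from `make_classification`; the dataset shape and `C=0.1` are arbitrary assumptions chosen for the illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 features, of which only 5 carry signal (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=30, n_informative=5, n_redundant=0,
    random_state=40,
)
X_demo = StandardScaler().fit_transform(X_demo)

# Smaller C means stronger regularization.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                           random_state=40)
model.fit(X_demo, y_demo)

# Count the coefficients that the penalty shrank all the way to zero.
n_zero = int(np.sum(model.coef_ == 0))
print(f"{n_zero} of {model.coef_.size} coefficients are exactly zero")
```

The same count can be taken on a fitted model such as `log_reg` by inspecting its `coef_` attribute.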

In [None]:
y_predict = log_reg.predict(X_test)

In [None]:
confusion_matrix(y_test, y_predict)
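
For binary labels, scikit-learn lays out the confusion matrix with true classes as rows and predicted classes as columns, so `ravel()` yields `tn, fp, fn, tp`. A minimal sketch on hypothetical labels (`y_true` and `y_pred` are made up for illustration) showing how sensitivity and specificity follow from those four counts:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = benign, 1 = malignant.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # share of malignant cases that were caught
specificity = tn / (tn + fp)  # share of benign cases correctly cleared
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"accuracy={accuracy:.2f}")
```

In a screening setting, false negatives (malignant cases predicted benign) are usually the costlier error, so sensitivity is worth reading alongside accuracy.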

In [None]:
acc_score = np.around(accuracy_score(y_test, y_predict) * 100, 2)
print('accuracy score :',acc_score, '%')

In [None]:
# Score against the true test labels, not the model's own predictions.
test_score = np.round(log_reg.score(X_test, y_test) * 100, 2)
print('Test score :',test_score,'%')

In [None]:
train_score = np.round(log_reg.score(X_train, y_train) * 100, 2)
print('Train score :',train_score,'%')

In [None]:
iterations = log_reg.n_iter_[0]
print('Number of iterations : ',iterations)