![](https://i.imgur.com/zRTtnT8.png)

# *EDA on CBIS-DDSM Breast Cancer Dataset*

Breast cancer is a disease in which cells in the breast grow out of control. There are different kinds of breast cancer. The kind of breast cancer depends on which cells in the breast turn into cancer.

Breast cancer can begin in different parts of the breast. A breast is made up of three main parts: lobules, ducts, and connective tissue. The lobules are the glands that produce milk. The ducts are tubes that carry milk to the nipple. The connective tissue (which consists of fibrous and fatty tissue) surrounds and holds everything together. Most breast cancers begin in the ducts or lobules.

Breast cancer can spread outside the breast through blood vessels and lymph vessels. When breast cancer spreads to other parts of the body, it is said to have metastasized.

<br><br>
**Diagnosis of breast cancer**

To determine whether symptoms are caused by breast cancer or a benign breast condition, a doctor will perform a thorough physical exam in addition to a breast exam. They may also request one or more diagnostic tests to help understand what’s causing the symptoms.


Tests that can help your doctor diagnose breast cancer include:

- Mammogram - The most common way to see below the surface of your breast is with an imaging test called a mammogram. Many women ages 40 and older get annual mammograms to check for breast cancer. If your doctor suspects you may have a tumor or suspicious spot, they will also request a mammogram. If an atypical area is seen on your mammogram, your doctor may request additional tests.

- Ultrasound - A breast ultrasound uses sound waves to create a picture of the tissues deep in your breast. An ultrasound can help your doctor distinguish between a solid mass, such as a tumor, and a benign cyst.


<br><br>
**Pathology**
1. **Mass**<br>
A mass is a breast lump or thickening that feels different from the surrounding tissue. It may come with a change in the size, shape, or appearance of the breast, or with changes to the skin over the breast, such as dimpling (sometimes looking like an orange peel).
A painless, hard mass with irregular edges is more likely to be cancer, but breast cancers can also be soft, round, tender, or even painful. Swelling of all or part of a breast is another possible symptom, even if no lump is felt.


2. **Calcification**<br>
Although breast calcifications are usually noncancerous (benign), certain patterns of calcifications — such as tight clusters with irregular shapes and fine appearance — may indicate breast cancer or precancerous changes to breast tissue.

<br><br>
**Image view**

In screening digital mammography, each breast is typically imaged in two different views:
- The mediolateral oblique (MLO) view, taken from the center of the chest outward.
- The craniocaudal (CC) view, taken from above the breast.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df_calc = pd.read_csv("/kaggle/input/breast-cancer-jpg-image-dataset-of-cbisddsm/k_CBIS-DDSM/calc_case(with_jpg_img).csv")
df_mass = pd.read_csv("/kaggle/input/breast-cancer-jpg-image-dataset-of-cbisddsm/k_CBIS-DDSM/mass_case(with_jpg_img).csv")

In [None]:
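# Stack both case tables; ignore_index avoids duplicate row labels in the combined frame
df_all_cases = pd.concat([df_calc, df_mass], axis=0, ignore_index=True)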
df_all_cases.sample(5)

In [None]:
df_all_cases.shape

In [None]:
df_all_cases.info()

In [None]:
df_all_cases.isna().sum()
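
Because `pd.concat` stacks two tables, any column that exists in only one of the source files is filled with `NaN` for rows coming from the other file. The sketch below shows one way to turn the raw `isna().sum()` output into a compact table of counts and percentages. The helper name `missing_summary` and the toy values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy stand-in for df_all_cases (values are illustrative, not from the dataset).
toy = pd.DataFrame({
    "pathology": ["MALIGNANT", "BENIGN", "BENIGN", "MALIGNANT"],
    "calc type": ["PLEOMORPHIC", "AMORPHOUS", None, None],
    "mass shape": [None, None, "OVAL", None],
})
print(missing_summary(toy))
```

On the real data, the same call would be `missing_summary(df_all_cases)`.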

## Data Visualization 📊

### Visualize Abnormality

In [None]:
plt.figure(figsize=(25, 10))
palette = sns.color_palette("rainbow", 8)


# Count the cases per abnormality type
abnormality_insight = df_all_cases["abnormality type"].value_counts()

# Left panel: absolute counts
plt.subplot(1, 2, 1)
sns.barplot(x=abnormality_insight.index, y=abnormality_insight.values, palette=palette)
plt.xlabel('Abnormality')
plt.ylabel('Cases count')

# Right panel: percentage share of each type
plt.subplot(1, 2, 2)
plt.pie(x=abnormality_insight.values, labels=abnormality_insight.index, autopct='%.0f%%', colors=palette, explode=[0.05, 0])
plt.title("Abnormality")

plt.show()
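
The pie chart rounds each share to a whole percent. If exact figures are useful, a small table can be built with `value_counts(normalize=True)`. This is a minimal sketch on a toy series. The values stand in for `df_all_cases["abnormality type"]` and are not real counts.

```python
import pandas as pd

# Toy stand-in for df_all_cases["abnormality type"] (illustrative values only).
abnormality = pd.Series(["calcification", "mass", "mass", "calcification", "mass"])

counts = abnormality.value_counts()
shares = abnormality.value_counts(normalize=True).mul(100).round(1)

# Combine both views in one table: absolute counts and percentage share.
summary = pd.DataFrame({"cases": counts, "percent": shares})
print(summary)
```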

### Visualize the position and image view of the breast

In [None]:
plt.figure(figsize=(25, 10))
palette = sns.color_palette("Spectral", 8)

plt.subplot(1, 2, 1)
# Count the cases per breast side
breast_side = df_all_cases['left or right breast'].value_counts()

sns.barplot(x=breast_side.index, y=breast_side.values, palette=palette)
plt.xlabel('Position')
plt.ylabel('count')
plt.title("Position of Breast")


plt.subplot(1, 2, 2)
palette = sns.color_palette("icefire", 8)

image_view = df_all_cases['image view'].value_counts()

sns.barplot(y=image_view.index, x=image_view.values, palette=palette)
plt.xlabel('count')
plt.ylabel('Image View')
plt.title("Image View of Breast")

plt.show()
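
The two bar charts show breast side and image view separately. To check how they combine (for example, whether each side has roughly as many CC as MLO images), a cross-tabulation is one option. The sketch below uses a toy frame with the notebook's column names. The rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df_all_cases (illustrative rows only).
toy = pd.DataFrame({
    "left or right breast": ["LEFT", "LEFT", "RIGHT", "RIGHT", "LEFT"],
    "image view": ["CC", "MLO", "CC", "MLO", "MLO"],
})

# Rows: breast side, columns: image view; margins=True adds the "All" totals.
table = pd.crosstab(toy["left or right breast"], toy["image view"], margins=True)
print(table)
```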

### Visualize Benign vs Benign without Callback vs Malignant

In [None]:
plt.figure(figsize=(25, 10))
palette = sns.color_palette("terrain", 8)


# Count the cases per pathology class
pathology_insight = df_all_cases["pathology"].value_counts()

# Left panel: absolute counts
plt.subplot(1, 2, 1)
sns.barplot(x=pathology_insight.index, y=pathology_insight.values, palette=palette)
plt.xlabel('Pathology')
plt.ylabel('Cases count')

# Right panel: percentage share of each class
plt.subplot(1, 2, 2)
plt.pie(x=pathology_insight.values, labels=pathology_insight.index, autopct='%.0f%%', colors=palette, explode=[0.05, 0, 0])
plt.title("Pathology")


plt.show()
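
The overall pathology split can hide differences between calcification and mass cases. One way to look at that is a row-normalized cross-tabulation, which gives the percentage of each pathology class within each abnormality type. This is a sketch on toy rows. The label spellings below are assumptions and should be checked against the actual values in `df_all_cases["pathology"]`.

```python
import pandas as pd

# Toy stand-in for df_all_cases (illustrative rows; label spellings are assumed).
toy = pd.DataFrame({
    "abnormality type": ["calcification", "calcification", "calcification",
                         "mass", "mass", "mass"],
    "pathology": ["MALIGNANT", "BENIGN", "BENIGN_WITHOUT_CALLBACK",
                  "MALIGNANT", "MALIGNANT", "BENIGN_WITHOUT_CALLBACK"],
})

# normalize="index" makes every row sum to 1; scale to percent for readability.
shares = (
    pd.crosstab(toy["abnormality type"], toy["pathology"], normalize="index")
    .mul(100)
    .round(1)
)
print(shares)
```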

### Visualize different Types of Calcification

In [None]:
plt.figure(figsize=(35, 30))
palette = sns.color_palette("rainbow", 8)

calc_type = df_all_cases['calc type'].value_counts()

sns.barplot(y=calc_type.index, x=calc_type.values, palette=palette)

# Set each axis label once, with a font size that matches the large figure
plt.xlabel('count', fontsize=25)
plt.ylabel('calc type', fontsize=25)
plt.title("Types of calcification", fontsize=25)
plt.yticks(fontsize=20)
plt.show()
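
When a column has many rare categories, the bar chart gets a long tail that is hard to read. One possible remedy is to keep the most frequent categories and sum the rest into a single bucket before plotting. The helper `top_n_with_other` and the `"OTHER"` label are hypothetical names introduced for this sketch, and the toy values are illustrative.

```python
import pandas as pd

def top_n_with_other(values: pd.Series, n: int) -> pd.Series:
    """Count categories, keeping the n most frequent and summing the rest as 'OTHER'."""
    counts = values.value_counts()  # sorted by frequency, NaN dropped
    top = counts.iloc[:n]
    rest = counts.iloc[n:].sum()
    if rest > 0:
        top = pd.concat([top, pd.Series({"OTHER": rest})])
    return top

# Toy stand-in for df_all_cases["calc type"] (illustrative values only).
toy = pd.Series(
    ["PLEOMORPHIC"] * 4 + ["AMORPHOUS"] * 2 + ["PUNCTATE", "VASCULAR", None]
)
grouped = top_n_with_other(toy, n=2)
print(grouped)
```

The result can be passed to `sns.barplot` in the same way as `calc_type` above, using `grouped.index` and `grouped.values`.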

### Visualize different Shapes of Mass

In [None]:
plt.figure(figsize=(35, 30))
palette = sns.color_palette("rainbow", 8)

mass_shape = df_all_cases['mass shape'].value_counts()

sns.barplot(y=mass_shape.index, x=mass_shape.values, palette=palette)
plt.ylabel('mass shape', fontsize=25)
plt.xlabel('count', fontsize=25)
plt.title("Shapes of mass", fontsize=25)
plt.yticks(fontsize=20)
plt.show()

## Mammography Visualization

In [None]:
import os

# Root folder that holds the JPEG images
img_root = "/kaggle/input/breast-cancer-jpg-image-dataset-of-cbisddsm/k_CBIS-DDSM/jpg_img/"

plt.figure(figsize=(30, 20))  # define the plot size

for i in range(8):
    # Drop the leading folder from the CSV path and join the rest onto the image root
    rel_path = df_all_cases["jpg_fullMammo_img_path"].iloc[i].split("/", 1)[1]
    img = plt.imread(os.path.join(img_root, rel_path))

    plt.subplot(2, 4, i + 1)  # 2 rows x 4 columns; subplot indices run from 1 to 8
    plt.xticks([])  # hide the axis ticks
    plt.yticks([])
    plt.xlabel("\n" + str(df_all_cases["pathology"].iloc[i]) + "\n\n", fontsize=20)  # class label, e.g. malignant or benign

    plt.imshow(img, cmap='gray')  # show the image
plt.tight_layout()
plt.show()  
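
The path handling in the loop above (split off the first folder, then join the remainder onto the image root) can be pulled into a small helper so that a malformed entry fails with a clear message. This is a minimal sketch: `resolve_image_path` is a hypothetical name, and the example entry is made up, since real values in `jpg_fullMammo_img_path` may look different.

```python
import os

# Image root as used in the notebook's plotting cell.
IMG_ROOT = "/kaggle/input/breast-cancer-jpg-image-dataset-of-cbisddsm/k_CBIS-DDSM/jpg_img/"

def resolve_image_path(csv_path: str, root: str = IMG_ROOT) -> str:
    """Drop the first path component of csv_path and join the rest onto root."""
    parts = csv_path.split("/", 1)
    if len(parts) < 2:
        raise ValueError(f"expected at least one '/' in {csv_path!r}")
    return os.path.join(root, parts[1])

# Hypothetical CSV entry, for illustration only.
example = "jpg_img/some_case_folder/1-1.jpg"
print(resolve_image_path(example))
```

Inside the loop, the image could then be read with `plt.imread(resolve_image_path(df_all_cases["jpg_fullMammo_img_path"].iloc[i]))`.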