###### Description: 
The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

###### Attribute Information:
* Age of patient at the time of operation (numerical)
* Patient’s year of operation (year minus 1900, numerical)
* Number of positive axillary nodes detected (numerical)
* **Survival status (class attribute)** :
    - 1 = the patient survived 5 years or longer
    - 2 = the patient died within 5 years
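Because the year of operation is stored as an offset from 1900, the calendar year can be recovered by adding 1900 back. A minimal sketch, assuming a few sample `Year` values as they appear in the dataset (the names `year_offset` and `full_year` are illustrative only):

```python
import pandas as pd

# Sample Year values as stored in the dataset (year minus 1900)
year_offset = pd.Series([64, 62, 65, 59, 65], name="Year")

# Add 1900 to recover the four-digit calendar year
full_year = year_offset + 1900
print(full_year.tolist())  # [1964, 1962, 1965, 1959, 1965]
```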

#### Note: About Axillary Lymph Nodes
The body has about 20 to 40 bean-shaped axillary lymph nodes located in the underarm area. These lymph nodes are responsible for draining lymph (a clear or white fluid made up of white blood cells) from the breasts and surrounding areas, including the neck, the upper arms, and the underarm area.

##### Importing Libraries

In [27]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Read the CSV file into a DataFrame
data = pd.read_csv("haberman.csv")
data.head(10)

Unnamed: 0,Age,Year,axillary nodes,status
0,30,64,1,1
1,30,62,3,1
2,30,65,0,1
3,31,59,2,1
4,31,65,4,1
5,33,58,10,1
6,33,60,0,1
7,34,59,0,2
8,34,66,9,2
9,34,58,30,1


In [18]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306 entries, 0 to 305
Data columns (total 4 columns):
Age                306 non-null int64
Year               306 non-null int64
 axillary nodes    306 non-null int64
status             306 non-null int64
dtypes: int64(4)
memory usage: 9.7 KB
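The non-null counts above can also be checked directly, and the `info()` output suggests that the `axillary nodes` column name may carry a leading space. The following is a minimal sketch of both checks. It rebuilds the first ten rows of the table inline so it runs without `haberman.csv`; the leading space in the column name is an assumption based on the output above, and `missing_per_column` is an illustrative name.

```python
import pandas as pd

# First ten rows of the dataset, rebuilt inline for a self-contained example.
# The leading space in " axillary nodes" is an assumption based on info().
data = pd.DataFrame({
    "Age": [30, 30, 30, 31, 31, 33, 33, 34, 34, 34],
    "Year": [64, 62, 65, 59, 65, 58, 60, 59, 66, 58],
    " axillary nodes": [1, 3, 0, 2, 4, 10, 0, 0, 9, 30],
    "status": [1, 1, 1, 1, 1, 1, 1, 2, 2, 1],
})

# Count missing values per column; all zeros means nothing is missing
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Strip stray whitespace from the column names
data.columns = data.columns.str.strip()
print(list(data.columns))
```

Stripping the names once up front avoids having to remember the extra space every time the column is accessed.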


#### Observations:
* There are no missing values in this data set: every column has 306 non-null entries.
* The `status` column is stored as an integer and can be mapped to a readable label:
    - 1 maps to ‘Yes’, meaning the patient survived 5 years or longer
    - 2 maps to ‘No’, meaning the patient died within 5 years

In [28]:
data['Survived'] = data['status'].map({1:'Yes', 2:'No'})
data.head() 

Unnamed: 0,Age,Year,axillary nodes,status,Survived
0,30,64,1,1,Yes
1,30,62,3,1,Yes
2,30,65,0,1,Yes
3,31,59,2,1,Yes
4,31,65,4,1,Yes
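Note that `map` on its own produces plain string labels. If a true pandas categorical column is wanted, one option is to chain `astype("category")`. This is a sketch rather than part of the original analysis, and it uses the first ten rows of the dataset rebuilt inline:

```python
import pandas as pd

# First ten rows of the dataset, rebuilt inline for a self-contained example
data = pd.DataFrame({
    "Age": [30, 30, 30, 31, 31, 33, 33, 34, 34, 34],
    "Year": [64, 62, 65, 59, 65, 58, 60, 59, 66, 58],
    "axillary nodes": [1, 3, 0, 2, 4, 10, 0, 0, 9, 30],
    "status": [1, 1, 1, 1, 1, 1, 1, 2, 2, 1],
})

# Map the integer codes to labels, then convert to a categorical dtype
data["Survived"] = data["status"].map({1: "Yes", 2: "No"}).astype("category")

print(data["Survived"].dtype)
print(list(data["Survived"].cat.categories))
```

A categorical column records the fixed set of allowed labels, which can be convenient for grouping and plotting.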


In [29]:
data.describe()

Unnamed: 0,Age,Year,axillary nodes,status
count,306.0,306.0,306.0,306.0
mean,52.457516,62.852941,4.026144,1.264706
std,10.803452,3.249405,7.189654,0.441899
min,30.0,58.0,0.0,1.0
25%,44.0,60.0,0.0,1.0
50%,52.0,63.0,1.0,1.0
75%,60.75,65.75,4.0,2.0
max,83.0,69.0,52.0,2.0


#### Observations:
* **Average Age :** about 52.5 yrs
* **Maximum Age :** 83 yrs
* **Minimum Age :** 30 yrs

In [30]:
# Count the number of patients in each status class
data["status"].value_counts()

1    225
2     81
Name: status, dtype: int64

#### Observations:
* **Survival status (class attribute)** :
    - 1 = the patient survived 5 years or longer = **225**
    - 2 = the patient died within 5 years = **81**
* Out of 306 patients, 225 (about 73.5%) survived 5 years or longer and 81 (about 26.5%) did not, so the two classes are imbalanced
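
The class balance can also be expressed as proportions with `value_counts(normalize=True)`. The sketch below rebuilds the `status` column from the counts reported above (225 and 81) so that it runs without the CSV file; `status` and `proportions` are illustrative names.

```python
import pandas as pd

# Rebuild the status column from the reported counts: 225 ones and 81 twos
status = pd.Series([1] * 225 + [2] * 81, name="status")

# normalize=True returns the fraction of rows in each class instead of counts
proportions = status.value_counts(normalize=True)
print(proportions.round(3))  # class 1: 0.735, class 2: 0.265
```

With roughly three survivors for every non-survivor, later comparisons between the two classes are easier to read as densities or proportions than as raw counts.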