> # Heart Disease UCI dataset Analysis
<a href="https://www.kaggle.com/ronitf/heart-disease-uci">GET HERE</a>

-----------------

### What type of data?

**Size :** 303 rows, 14 columns

**Target column :** data["target"]

**Column types :** 13 features + 1 target


| Name     | Type    |
| :------: | :-----: |
| age      | int64   |
| sex      | int64   |
| cp       | int64   |
| trestbps | int64   |
| chol     | int64   |
| fbs      | int64   |
| restecg  | int64   |
| thalach  | int64   |
| exang    | int64   |
| oldpeak  | float64 |
| slope    | int64   |
| ca       | int64   |
| thal     | int64   |
| target   | int64   |

**NaN  :** 0 null values


**Attributes information** : 

   * **sex** = 1: male; 0: female
   * **cp** = chest pain type. The UCI documentation codes it as 1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic; in this CSV the column takes the values 0 to 3 (see the `cp` subsets further down)
   * **trestbps** = resting blood pressure
   * **chol** = serum cholesterol in mg/dl
   * **fbs** = fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
   * **restecg** = resting electrocardiographic results (values 0, 1, 2)
   * **thalach** = maximum heart rate achieved
   * **exang** = exercise-induced angina (1 = yes; 0 = no)
   * **oldpeak** = ST depression induced by exercise relative to rest
   * **slope** = the slope of the peak exercise ST segment
   * **ca** = number of major vessels (0-3) colored by fluoroscopy
   * **thal** = thalassemia (a blood disorder): 3 = normal; 6 = fixed defect; 7 = reversible defect
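The coded columns are easier to read in tables and plots once the integer codes are mapped to labels. A minimal sketch, assuming the `sex` and `fbs` codes listed above; the `toy` frame and the label dictionaries are made up for illustration and are not rows from `heart.csv`:

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from heart.csv.
toy = pd.DataFrame({"sex": [1, 0, 1], "fbs": [0, 1, 0]})

# Code-to-label mappings copied from the attribute list.
sex_labels = {1: "male", 0: "female"}
fbs_labels = {1: "> 120 mg/dl", 0: "<= 120 mg/dl"}

# assign() returns a new frame, so the original integer codes stay untouched.
decoded = toy.assign(
    sex=toy["sex"].map(sex_labels),
    fbs=toy["fbs"].map(fbs_labels),
)
print(decoded)
```

Keeping the integer version around is useful because `data.corr()` and the t-tests later on need numeric columns.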


-----------------

### Data analysis 

* **Univariate** : 

    * Target -> positive = 165, negative = 138. The classes are reasonably balanced
    * Age    -> overall mean = 54.3; mean for women: 55.68; mean for men: 53.76
    * Chest pain type -> most rows are type 0; worth checking how it correlates with the target
    * thalach -> mean = 149 beats per minute; max = 202

* **Bivariate** : 
    * Create subsets of the data: positive / negative, blood (columns: 'trestbps', 'chol', 'thal', 'fbs')
    * Correlation matrix: the strongest correlations with target are cp (0.43) and thalach (0.42)
    * Age / Target
    * Sex / Target
    * Chest pain type / Target
    * Blood / Target: thal = 5 -> the highest number of positive results
    * Blood / Blood
    * Blood / Age: the blood columns are not strongly correlated with age (highest: trestbps = 0.279351)
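The correlation figures in the list above are read off the full heatmap. One way to get them directly is to take the `target` column of the correlation matrix and sort it by absolute value. The sketch below runs on synthetic data (the `related` and `noise` columns are invented for the example), so its numbers say nothing about the real dataset:

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; not drawn from heart.csv.
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "target": target,
    "related": target + rng.normal(0, 0.5, size=n),  # built to follow the target
    "noise": rng.normal(0, 1, size=n),               # independent of the target
})

# Correlation of every feature with the target, strongest first.
corr_with_target = (
    toy.corr()["target"]
    .drop("target")
    .sort_values(key=np.abs, ascending=False)
)
print(corr_with_target)
```

On the notebook's frame the same three chained calls would be applied to `data.corr()`.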


### Hypothesis test
* H0 = Chest pain type seems to play a role in heart disease

* H1 = thalach seems to be correlated with heart disease

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data = pd.read_csv("../input/heart-disease-uci/heart.csv")
data

In [None]:
data.shape

### Types

In [None]:
data.dtypes

In [None]:
data.dtypes.value_counts().plot.pie()

### NaN

In [None]:
(data.isna().sum()/data.shape[0]).sort_values(ascending = True)

### Describe data

In [None]:
data.describe()

### Univariate 

#### Target

In [None]:
data['target'].value_counts()

#### Distribution plots of the features

In [None]:
sns.set()  # apply the default seaborn theme once, not on every iteration
for col in data.select_dtypes('int64'):
    sns.displot(data[col], kde = True)

In [None]:
# distplot is deprecated in recent seaborn releases; displot is its replacement
sns.displot(data['oldpeak'], kde = True)

In [None]:
sns.boxplot(x = "sex", y = "age", data = data)

In [None]:
data.groupby('sex').age.plot(kind='kde')

In [None]:
data[["sex", "age"]].groupby("sex").mean()

## Create Subset

### Negative / positive subset

In [None]:
# positive results
positive_result = data[data['target'] == 1]

# negative results
negative_result = data[data['target'] == 0]

In [None]:
blood = data[['trestbps','chol', 'thal','fbs']]
blood

## Bivariate / multi Analysis

#### Create the pair plot and the correlation matrix

In [None]:
sns.pairplot(data)

In [None]:
data_corr = data.corr()
sns.heatmap(data_corr, annot = True)

### Sex / Target

In [None]:
sns.displot(data, x = 'sex', hue='target', alpha = 0.9)

### Age / Target

In [None]:
data.groupby('target').age.plot(kind='kde')

In [None]:
sns.countplot(x="age", hue="target", data=data)

### Chest value / Target

#### Chest 0 / target value subset

In [None]:
cp_zero_pos_target = len(data[(data['cp'] == 0) & (data['target'] == 1)])
cp_zero_neg_target = len(data[(data['cp'] == 0) & (data['target'] == 0)])

In [None]:
sns.barplot(x = ['cp = zero and negative target', 'cp = zero and positive target'], 
            y = [cp_zero_neg_target, cp_zero_pos_target])
plt.show()

#### Chest 1 / target value subset

In [None]:
cp_one_pos_target = len(data[(data['cp'] == 1) & (data['target'] == 1)])
cp_one_neg_target = len(data[(data['cp'] == 1) & (data['target'] == 0)])

In [None]:
sns.barplot(x = ['cp = One and negative target', 'cp = one and positive target'], 
            y = [cp_one_neg_target, cp_one_pos_target])
plt.show()

#### Chest 2 / target value subset

In [None]:
cp_two_pos_target = len(data[(data['cp'] == 2) & (data['target'] == 1)])
cp_two_neg_target = len(data[(data['cp'] == 2) & (data['target'] == 0)])

In [None]:
sns.barplot(x = ['cp = two and negative target', 'cp = two and positive target'], 
            y = [cp_two_neg_target, cp_two_pos_target])
plt.show()

#### Chest 3 / target value subset

In [None]:
cp_three_pos_target = len(data[(data['cp'] == 3) & (data['target'] == 1)])
cp_three_neg_target = len(data[(data['cp'] == 3) & (data['target'] == 0)])

In [None]:
sns.barplot(x = ['cp = three and negative target', 'cp = three and positive target'], 
            y = [cp_three_neg_target, cp_three_pos_target])
plt.show()
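The four chest pain subsets above repeat the same filter-and-count pattern. As a possible shortcut, `pd.crosstab` builds all eight counts in one call. The sketch uses a small invented frame (`toy`), not the real data:

```python
import pandas as pd

# Toy rows for illustration only; not taken from heart.csv.
toy = pd.DataFrame({
    "cp":     [0, 0, 0, 1, 1, 2, 2, 3],
    "target": [0, 0, 1, 1, 1, 1, 0, 1],
})

# Rows: chest pain type; columns: target class; cells: number of patients.
counts = pd.crosstab(toy["cp"], toy["target"])

# Share of each target class within every chest pain type (each row sums to 1).
shares = pd.crosstab(toy["cp"], toy["target"], normalize="index")

print(counts)
print(shares.round(2))
```

With the notebook's frame, `pd.crosstab(data['cp'], data['target']).plot.bar()` should give one grouped bar chart in place of the four separate ones.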

### Blood / Target

In [None]:
for col in blood:
    plt.figure()
    negative_result[col].hist(label = "negative")
    positive_result[col].hist(label = "positive", alpha = 0.7)
    plt.legend()
    plt.xlabel(col)

### thalach (maximum heart rate achieved) / Target

In [None]:
sns.displot(data, x='thalach', hue='target', alpha = 0.8)

In [None]:
data.groupby('target').thalach.plot(kind='kde')

## Blood / Blood 

In [None]:
sns.pairplot(blood)

### Blood / Age

In [None]:
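for col in blood:
    # lmplot is a figure-level function and creates its own figure,
    # so calling plt.figure() first would only leave an empty one behind
    sns.lmplot(x="age", y=col, hue="target", data=data)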

In [None]:
data.corr()["age"].sort_values()

## Test hypothesis

In [None]:
from scipy.stats import ttest_ind

Check the class sizes first: the two groups should be roughly balanced. If they are not, we can draw an equal-sized sample from each class with the `sample()` function.

In [None]:
positive_result.shape

In [None]:
negative_result.shape
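If the two shapes were far apart, the `sample()` approach mentioned above could look like the sketch below. It downsamples the larger class to the size of the smaller one. The `toy` frame, its 6/4 split and `random_state=0` are assumptions made for the example:

```python
import pandas as pd

# Toy frame with a deliberate 6 / 4 class imbalance; not taken from heart.csv.
toy = pd.DataFrame({"target": [1] * 6 + [0] * 4, "thalach": range(10)})
positive = toy[toy["target"] == 1]
negative = toy[toy["target"] == 0]

# Downsample both classes to the size of the smaller one.
n = min(len(positive), len(negative))
balanced = pd.concat([
    positive.sample(n=n, random_state=0),  # fixed seed keeps the draw reproducible
    negative.sample(n=n, random_state=0),
])
print(balanced["target"].value_counts())
```

With 165 positive and 138 negative rows the imbalance here is mild, so this step is probably optional for this dataset.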

#### t_test function

In [None]:
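def t_test(col):
    # Two-sample t-test comparing the negative and positive groups on `col`.
    # "H0" here is the test's null hypothesis: both groups share the same mean.
    alpha = 0.02  # significance level
    stat, p = ttest_ind(negative_result[col], positive_result[col])
    if p < alpha:
        return "H0 rejected"
    else:
        return "H0 not rejected"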
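The `t_test` function above calls `ttest_ind` with its default setting, which assumes both groups have the same variance. When that is doubtful, Welch's variant (`equal_var=False`) is a common alternative. This is a sketch on synthetic groups whose means and spreads are invented for the example:

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups with different means AND different spreads (illustration only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=150, scale=10, size=100)
group_b = rng.normal(loc=130, scale=25, size=100)

alpha = 0.02  # same significance level as in t_test

# equal_var=False switches ttest_ind to Welch's t-test.
stat, p = ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {stat:.2f}, p = {p:.3g}")
print("H0 rejected" if p < alpha else "H0 not rejected")
```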

In [None]:
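# Skip the target itself: it is constant within each group, so the test is undefined for it
for col in data.drop(columns='target'):
    print(f'{col :-<30} {t_test(col)}')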
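The loop above applies a t-test to every column, including coded categories such as `cp`, where a difference in means is hard to interpret. For those columns a chi-square test of independence is a different technique that may fit better. A sketch on an invented contingency table (the counts below are made up, and `alpha` simply reuses the notebook's 0.02):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented counts for illustration only; not taken from heart.csv.
toy = pd.DataFrame({
    "cp":     [0] * 40 + [1] * 40,
    "target": [0] * 30 + [1] * 10 + [0] * 10 + [1] * 30,
})

# Contingency table: rows are categories, columns are target classes.
table = pd.crosstab(toy["cp"], toy["target"])

# Null hypothesis: the category and the target are independent.
chi2, p, dof, expected = chi2_contingency(table)

alpha = 0.02
print(f"chi2 = {chi2:.2f}, p = {p:.3g}, dof = {dof}")
print("independence rejected" if p < alpha else "independence not rejected")
```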