<a href="https://colab.research.google.com/github/jean-pierre-gergie/AlertingAPI/blob/main/Week4/Notebook4.1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Statement

With medical data growing and data science maturing, many efforts now aim to build predictive indicators for disease. Cardiovascular diseases (CVDs) are the leading cause of death globally, claiming around 17.9 million lives annually and accounting for 31% of all deaths worldwide. Among CVDs, heart failure is the most frequently occurring. People with cardiovascular disease, or at heightened cardiovascular risk due to factors such as hypertension, diabetes, hyperlipidemia, or pre-existing conditions, need early identification and intervention. This is where machine learning models can play an immensely important, potentially life-saving role. In this notebook, we work toward automating that early identification with AI techniques, so that attention can then shift to the challenges that follow.

#### * We continue to build and explain parts of this notebook as we progress through our Programming for Applied AI course. The Objectives section states the goal of the current version of the notebook.

#### * These notebooks are meant to give you hands-on experience with the different concepts covered throughout the course lessons.

#### * For practice, we advise you to run all the cells. You can do this by selecting "Run All" from the "Cell" tab in the navigation bar at the top. To run an individual cell, select the cell you want, then click "Run" in the navigation bar. Please refer to the video in Week 3 / Practical Exercises for more information on how to use this notebook.

## Objectives

- Understand the heart failure prediction dataset and its content.
- Apply Lists and use the Iterators and Loops covered in week 3.

## Dataset Description

A dataset is made up of columns, which we refer to as attributes. One instance of these attributes taken together forms a row, and a collection of rows forms the dataset. Each attribute gives us additional information about the patient's state, which can help identify whether the patient possibly has heart disease. This data is the core of the machine learning model that we are going to build throughout the lessons. A "prediction", also called a "label", is the classification of each row as heart disease or not, expressed as yes/no or 1/0.
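
As a rough illustration of these terms, the sketch below represents a tiny dataset as a Python list of rows, where each row is a dictionary mapping attribute names to values. Only four attributes are shown, the two rows use values from heart.csv converted to integers for readability, and the names `dataset`, `attributes`, and `labels` are chosen here for illustration only.

```python
# A tiny dataset: a list of rows, each row a dict of attribute -> value.
# Only four attributes are shown to keep the example short.
dataset = [
    {'Age': 40, 'Sex': 'M', 'Cholesterol': 289, 'HeartDisease': 0},
    {'Age': 54, 'Sex': 'M', 'Cholesterol': 603, 'HeartDisease': 1},
]

# The attributes (columns) are the keys shared by every row.
attributes = list(dataset[0].keys())

# The label is the HeartDisease value of each row (1: heart disease, 0: normal).
labels = [row['HeartDisease'] for row in dataset]

print(attributes)
print(labels)
```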

## Dataset Attributes Description

In the attached documents, you can find 'heart.csv', which contains the heart disease dataset.

Below is the list of columns/attributes in the dataset, with an explanation of what each attribute means.

- Age : age of the patient [years]
- Sex : sex of the patient [M: Male, F: Female]
- ChestPainType : chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
- RestingBP : resting blood pressure [mm Hg]
- Cholesterol : serum cholesterol [mg/dl]
- FastingBS : fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
- RestingECG : resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
- MaxHR : maximum heart rate achieved [Numeric value between 60 and 202]
- ExerciseAngina : exercise-induced angina [Y: Yes, N: No]
- Oldpeak : ST depression induced by exercise relative to rest [Numeric value]
- ST_Slope : the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
- HeartDisease : output class [1: heart disease, 0: Normal]
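
The coded values listed above lend themselves to lookup dictionaries. The sketch below is one possible way to translate a code into its description. The dictionary names and the `describe` helper are hypothetical, not part of the notebook, and only two of the coded attributes are covered.

```python
# Lookup tables built from the attribute descriptions above.
CHEST_PAIN_TYPES = {
    'TA': 'Typical Angina',
    'ATA': 'Atypical Angina',
    'NAP': 'Non-Anginal Pain',
    'ASY': 'Asymptomatic',
}
ST_SLOPES = {'Up': 'upsloping', 'Flat': 'flat', 'Down': 'downsloping'}


def describe(code, lookup):
    """Return the description for a code, or the code itself if it is unknown."""
    return lookup.get(code, code)


print(describe('ATA', CHEST_PAIN_TYPES))
print(describe('Flat', ST_SLOPES))
```

Falling back to the raw code for unknown values is a design choice made for this sketch; raising an error instead would surface unexpected codes earlier.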

# <center><div style="font-family: Trebuchet MS; background-color: #000000; color: #FFFF; padding: 10px; line-height: 1;">Do not worry about the following code cells; we are going to cover them later. For now, after you run all the cells, scroll down to "Dictionaries Practice" and attempt to solve the exercises described in the lessons.</div></center>

#### Import libraries/modules

In [1]:
# Install Packages
!pip install pandas numpy matplotlib seaborn



In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
pd.options.display.float_format = '{:.2f}'.format
import warnings
warnings.filterwarnings('ignore')

### Reading Data

In [3]:
import csv

data = []

with open('./heart.csv', mode='r', newline='') as csv_file:
    # Create a CSV reader object
    csv_reader = csv.DictReader(csv_file)

    # Loop through each row in the CSV file
    for row in csv_reader:
        # Append each row (as a dictionary) to the data list
        data.append(row)

### Here is the first row of the dataset. The keys are the attributes, and each key is associated with a value. Note that every value is read as a string

In [4]:
data[0]

{'Age': '40',
 'Sex': 'M',
 'ChestPainType': 'ATA',
 'RestingBP': '140',
 'Cholesterol': '289',
 'FastingBS': '0',
 'RestingECG': 'Normal',
 'MaxHR': '172',
 'ExerciseAngina': 'N',
 'Oldpeak': '0',
 'ST_Slope': 'Up',
 'HeartDisease': '0'}
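
Because `csv.DictReader` returns every value as a string, numeric attributes have to be converted before doing arithmetic or comparisons on them. The sketch below shows one way to do that for a single row. The `NUMERIC_COLUMNS` mapping and the `parse_row` function are hypothetical helpers introduced here, and using `float` for Oldpeak is an assumption that the column can hold decimal values.

```python
# Assumed numeric type of each numeric column; other columns stay as strings.
NUMERIC_COLUMNS = {
    'Age': int,
    'RestingBP': int,
    'Cholesterol': int,
    'FastingBS': int,
    'MaxHR': int,
    'Oldpeak': float,
    'HeartDisease': int,
}


def parse_row(row):
    """Return a copy of row with numeric columns converted from strings."""
    parsed = {}
    for key, value in row.items():
        convert = NUMERIC_COLUMNS.get(key)
        parsed[key] = convert(value) if convert else value
    return parsed


# The first row of the dataset, exactly as csv.DictReader returns it.
first_row = {'Age': '40', 'Sex': 'M', 'ChestPainType': 'ATA', 'RestingBP': '140',
             'Cholesterol': '289', 'FastingBS': '0', 'RestingECG': 'Normal',
             'MaxHR': '172', 'ExerciseAngina': 'N', 'Oldpeak': '0',
             'ST_Slope': 'Up', 'HeartDisease': '0'}

typed_row = parse_row(first_row)
print(typed_row['Age'] + 1)  # arithmetic now works on the integer value
print(typed_row['Sex'])      # non-numeric values are left unchanged
```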

# <center><div style="font-family: Trebuchet MS; background-color: #F93822; color: #FDD20E; padding: 12px; line-height: 1;">Lesson 2 - Dictionaries Practice</div></center>

# <center><div style="font-family: Trebuchet MS; background-color: #F93822; color: #FDD20E; padding: 12px; line-height: 1;">Lesson 3 - Functions Practice</div></center>

In [6]:
count_distinct_chest_pain_type = {}

# Count how many patients fall under each chest pain type
for r in data:
    ct = r['ChestPainType']
    # get() returns 0 the first time a type is seen
    count_distinct_chest_pain_type[ct] = count_distinct_chest_pain_type.get(ct, 0) + 1
count_distinct_chest_pain_type

{'ATA': 173, 'NAP': 203, 'ASY': 496, 'TA': 46}
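
The same kind of tally can also be produced with `collections.Counter` from the standard library, which handles the "first time seen" case automatically. The sketch below uses a small made-up list of rows (`sample_rows`, not taken from heart.csv) so that it runs on its own; in the notebook, `data` would take its place.

```python
from collections import Counter

# Made-up rows for illustration only; in the notebook, use `data` instead.
sample_rows = [
    {'ChestPainType': 'ATA'},
    {'ChestPainType': 'ASY'},
    {'ChestPainType': 'ASY'},
    {'ChestPainType': 'NAP'},
]

# Counter tallies each distinct value produced by the generator expression.
counts = Counter(row['ChestPainType'] for row in sample_rows)

print(dict(counts))
```

A `Counter` also returns 0 for a key it has never seen, so looking up a chest pain type that is absent from the rows does not raise a `KeyError`.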

# <center><div style="font-family: Trebuchet MS; background-color: #F93822; color: #FDD20E; padding: 12px; line-height: 1;">Lesson 4 - File Reading</div></center>

In [7]:
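high_ch = {}

# Collect patients whose cholesterol is above 360.
# Note: the dictionary is keyed by age, so if several such patients
# share an age, only the last one is kept.
for r in data:
    cl = int(r['Cholesterol'])
    if cl > 360:
        high_ch[r['Age']] = r
high_ch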

{'53': {'Age': '53',
  'Sex': 'M',
  'ChestPainType': 'NAP',
  'RestingBP': '145',
  'Cholesterol': '518',
  'FastingBS': '0',
  'RestingECG': 'Normal',
  'MaxHR': '130',
  'ExerciseAngina': 'N',
  'Oldpeak': '0',
  'ST_Slope': 'Flat',
  'HeartDisease': '1'},
 '54': {'Age': '54',
  'Sex': 'M',
  'ChestPainType': 'ASY',
  'RestingBP': '130',
  'Cholesterol': '603',
  'FastingBS': '1',
  'RestingECG': 'Normal',
  'MaxHR': '125',
  'ExerciseAngina': 'Y',
  'Oldpeak': '1',
  'ST_Slope': 'Flat',
  'HeartDisease': '1'},
 '44': {'Age': '44',
  'Sex': 'M',
  'ChestPainType': 'ASY',
  'RestingBP': '135',
  'Cholesterol': '491',
  'FastingBS': '0',
  'RestingECG': 'Normal',
  'MaxHR': '135',
  'ExerciseAngina': 'N',
  'Oldpeak': '0',
  'ST_Slope': 'Flat',
  'HeartDisease': '1'},
 '32': {'Age': '32',
  'Sex': 'M',
  'ChestPainType': 'ASY',
  'RestingBP': '118',
  'Cholesterol': '529',
  'FastingBS': '0',
  'RestingECG': 'Normal',
  'MaxHR': '130',
  'ExerciseAngina': 'N',
  'Oldpeak': '0',
  'ST_

In [9]:
# Running sum and count of MaxHR for male (m) and female (f) patients
m = {'sum': 0, 'count': 0}
f = {'sum': 0, 'count': 0}

for r in data:
    if r['Sex'] == 'M':
        m['sum'] += int(r['MaxHR'])
        m['count'] += 1
    else:
        f['sum'] += int(r['MaxHR'])
        f['count'] += 1

# Average maximum heart rate per sex
avg_m = m['sum'] / m['count']
avg_f = f['sum'] / f['count']

print(avg_m)
print(avg_f)



134.32551724137932
146.13989637305698
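
Since the two branches above differ only in which dictionary they update, the computation can be wrapped in a function that groups by any attribute. This is a sketch under assumptions: `average_by` is a hypothetical helper, and `sample_rows` holds made-up values rather than rows from heart.csv.

```python
def average_by(rows, group_key, value_key):
    """Return {group: average of value_key} for the rows in each group."""
    totals = {}  # group -> {'sum': ..., 'count': ...}
    for row in rows:
        group = row[group_key]
        # setdefault creates the running totals the first time a group is seen
        stats = totals.setdefault(group, {'sum': 0, 'count': 0})
        stats['sum'] += int(row[value_key])
        stats['count'] += 1
    return {group: s['sum'] / s['count'] for group, s in totals.items()}


# Made-up rows for illustration only; in the notebook, pass `data` instead.
sample_rows = [
    {'Sex': 'M', 'MaxHR': '120'},
    {'Sex': 'M', 'MaxHR': '140'},
    {'Sex': 'F', 'MaxHR': '150'},
]

averages = average_by(sample_rows, 'Sex', 'MaxHR')
print(averages)
```

Unlike the if/else version, this does not assume that every value other than 'M' means female; each distinct value gets its own group.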


In [11]:
mx_ch = float('-inf')
r_mx_ch = None

# Find the patient record with the highest cholesterol value
for r in data:
    tmp = int(r['Cholesterol'])
    if tmp > mx_ch:
        mx_ch = tmp
        r_mx_ch = r
r_mx_ch

{'Age': '54',
 'Sex': 'M',
 'ChestPainType': 'ASY',
 'RestingBP': '130',
 'Cholesterol': '603',
 'FastingBS': '1',
 'RestingECG': 'Normal',
 'MaxHR': '125',
 'ExerciseAngina': 'Y',
 'Oldpeak': '1',
 'ST_Slope': 'Flat',
 'HeartDisease': '1'}
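
Python's built-in `max` accepts a `key` function, which offers a more compact way to find the same record. The sketch below uses three rows reduced to two attributes, with values taken from outputs shown in this notebook; `sample_rows` and `highest` are illustrative names.

```python
# Rows reduced to two attributes; in the notebook, use `data` instead.
sample_rows = [
    {'Age': '40', 'Cholesterol': '289'},
    {'Age': '54', 'Cholesterol': '603'},
    {'Age': '53', 'Cholesterol': '518'},
]

# key converts each row's cholesterol to an int so rows are compared numerically
highest = max(sample_rows, key=lambda row: int(row['Cholesterol']))

print(highest)
```

The `int` conversion matters: raw strings are compared character by character, so '99' would rank above '603'.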

In [12]:
age_groups = {'18-30': [], '31-40': [], '41-50': [], '51-60': [], '61+': []}

# Loop through the dataset
for data_point in data:
    age = int(data_point['Age'])
    if age <= 30:
        age_groups['18-30'].append(data_point)
    elif age <= 40:
        age_groups['31-40'].append(data_point)
    elif age <= 50:
        age_groups['41-50'].append(data_point)
    elif age <= 60:
        age_groups['51-60'].append(data_point)
    else:
        age_groups['61+'].append(data_point)

print("Patients in Age Group 18-30:", len(age_groups['18-30']))
print("Patients in Age Group 31-40:", len(age_groups['31-40']))

Patients in Age Group 18-30: 5
Patients in Age Group 31-40: 88


In [14]:
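count = 0
total = 0

# Count patients with heart disease and the total number of patients
for r in data:
    if r['HeartDisease'] == '1':
        count += 1
    total += 1

# Percentage of patients with heart disease
rate = (count / total) * 100
rate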

55.33769063180828
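
The same percentage can be packaged as a reusable function, in the spirit of the functions lesson. The sketch below is one possible version: `heart_disease_rate` is a hypothetical name, returning 0.0 for an empty list is a choice made here, and `sample_rows` holds made-up values rather than rows from heart.csv.

```python
def heart_disease_rate(rows):
    """Return the percentage of rows whose HeartDisease value is '1'."""
    if not rows:
        return 0.0  # avoid dividing by zero on an empty dataset
    positives = sum(1 for row in rows if row['HeartDisease'] == '1')
    return positives / len(rows) * 100


# Made-up rows for illustration only; in the notebook, pass `data` instead.
sample_rows = [
    {'HeartDisease': '1'},
    {'HeartDisease': '0'},
    {'HeartDisease': '1'},
    {'HeartDisease': '1'},
]

print(heart_disease_rate(sample_rows))
```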