# Hypertension Prediction Project - Midterm Notebook

## 1. Introduction

In this notebook, we predict an individual's risk of hypertension from a set of health parameters. Hypertension, also known as high blood pressure, significantly increases the risk of heart disease, stroke, and other health problems. The goal of this study is to build a machine learning model that classifies individuals as at risk or not at risk of hypertension. We follow the standard data analysis workflow: data preparation, exploratory data analysis (EDA), feature engineering, model training, evaluation, and model selection.

## 2. Data Preparation

import pickle
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, classification_report, f1_score, mutual_info_score,
    precision_score, recall_score, roc_auc_score, roc_curve,
)
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from tqdm.auto import tqdm

warnings.filterwarnings("ignore", category=FutureWarning)
%matplotlib inline

data = './hypertension_dataset.csv'
!head $data

df = pd.read_csv(data)
df.head(10)

NumMedicalVisits,Cholesterol,BloodPressure,PhysicalActivity,SodiumIntake,BMI,HypertensionPedigreeFunction,Age,Outcome
6,290,109,125,3361,27.5,0.83,31,0
3,286,172,233,4356,32.8,0.14,74,0
10,176,170,7,2705,24.1,0.93,74,0
7,271,151,128,2185,24.7,2.36,25,1
4,154,162,263,3351,35.4,1.6,25,0
6,178,115,156,4935,27.8,1.68,44,0
9,232,155,4,1687,35.6,1.58,79,1
2,285,162,85,3119,36.1,0.5,29,1
6,258,162,174,3187,36.9,1.05,18,0


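|   | NumMedicalVisits | Cholesterol | BloodPressure | PhysicalActivity | SodiumIntake | BMI | HypertensionPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 290 | 109 | 125 | 3361 | 27.5 | 0.83 | 31 | 0 |
| 1 | 3 | 286 | 172 | 233 | 4356 | 32.8 | 0.14 | 74 | 0 |
| 2 | 10 | 176 | 170 | 7 | 2705 | 24.1 | 0.93 | 74 | 0 |
| 3 | 7 | 271 | 151 | 128 | 2185 | 24.7 | 2.36 | 25 | 1 |
| 4 | 4 | 154 | 162 | 263 | 3351 | 35.4 | 1.6 | 25 | 0 |
| 5 | 6 | 178 | 115 | 156 | 4935 | 27.8 | 1.68 | 44 | 0 |
| 6 | 9 | 232 | 155 | 4 | 1687 | 35.6 | 1.58 | 79 | 1 |
| 7 | 2 | 285 | 162 | 85 | 3119 | 36.1 | 0.5 | 29 | 1 |
| 8 | 6 | 258 | 162 | 174 | 3187 | 36.9 | 1.05 | 18 | 0 |
| 9 | 10 | 214 | 129 | 289 | 2484 | 38.2 | 1.88 | 60 | 1 |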
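
Before moving on to EDA, a few quick sanity checks on the loaded frame can be useful. The sketch below is not part of the original notebook. It rebuilds the ten rows displayed above into a small DataFrame and checks its shape, column types, missing values, and the balance of `Outcome`. The names `sample_csv`, `sample`, `n_missing`, and `class_share` are illustrative, and treating `Outcome` as the target column is an assumption based on the column name.

```python
import io

import pandas as pd

# The ten rows shown by df.head(10) above, inlined so the sketch is
# self-contained. In the notebook itself the same checks would run on `df`.
sample_csv = """NumMedicalVisits,Cholesterol,BloodPressure,PhysicalActivity,SodiumIntake,BMI,HypertensionPedigreeFunction,Age,Outcome
6,290,109,125,3361,27.5,0.83,31,0
3,286,172,233,4356,32.8,0.14,74,0
10,176,170,7,2705,24.1,0.93,74,0
7,271,151,128,2185,24.7,2.36,25,1
4,154,162,263,3351,35.4,1.6,25,0
6,178,115,156,4935,27.8,1.68,44,0
9,232,155,4,1687,35.6,1.58,79,1
2,285,162,85,3119,36.1,0.5,29,1
6,258,162,174,3187,36.9,1.05,18,0
10,214,129,289,2484,38.2,1.88,60,1
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# Shape and column types: every column should parse as numeric.
print(sample.shape)
print(sample.dtypes)

# Missing values per column.
n_missing = sample.isnull().sum()
print(n_missing)

# Share of each class in the assumed target column `Outcome`.
class_share = sample["Outcome"].value_counts(normalize=True)
print(class_share)
```

With only ten rows these numbers say little about the full dataset; the point is the set of checks, which apply unchanged to `df`.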


## 3. Exploratory Data Analysis (EDA)