## Analyzing Heart Data

### Importing Libraries

In [1]:
import numpy as np

### Loading Heart Data from CSV File

In [2]:
# skip_header takes the number of header lines to skip
data = np.genfromtxt("heart.csv", delimiter=",", skip_header=1)

In [3]:
data

array([[63.,  1.,  3., ...,  0.,  1.,  1.],
       [37.,  1.,  2., ...,  0.,  2.,  1.],
       [41.,  0.,  1., ...,  0.,  2.,  1.],
       ...,
       [68.,  1.,  0., ...,  2.,  3.,  0.],
       [57.,  1.,  0., ...,  1.,  3.,  0.],
       [57.,  0.,  1., ...,  1.,  2.,  0.]])
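Before extracting columns, it can help to confirm the array's shape and check for missing values, since `np.genfromtxt` fills fields it cannot parse with `nan`. The following is a minimal sketch on a tiny in-memory CSV; the three-column text is an illustrative stand-in, not the contents of heart.csv.

```python
import io

import numpy as np

# Tiny stand-in for a CSV file (illustrative values, not the real heart.csv).
# The second data row has an empty last field on purpose.
csv_text = "age,sex,chol\n63,1,233\n37,1,\n41,0,204\n"

sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Shape is (rows, columns).
print(sample.shape)

# Count missing values per column; empty fields are loaded as nan.
nan_per_column = np.isnan(sample).sum(axis=0)
print(nan_per_column)
```

The same two checks can be run on `data` directly; a nonzero count would mean that `np.mean` and `np.std` return `nan` for that column.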

### Extracting Relevant Columns

- CP: Chest Pain
- Trestbps: Resting Blood Pressure
- Chol: Cholesterol
- FBS: Fasting Blood Sugar
- Rest ECG: Resting Electrocardiograms
- Thalach: Maximum Heart Rate Achieved
- Exang: Exercise Induced Angina
- Oldpeak: ST Depression Induced by Exercise Relative to Rest
- Slope: Slope of the Peak Exercise ST Segment
- Ca: Number of Major Vessels
- Thal: Blood Disorder Called Thalassemia
- Target: Heart Disease

In [46]:
# select a subset of columns by index and preview the first five values of each
age=data[:,0]
print(f"age: {age[:5]}")
sex=data[:,1]
print(f"sex: {sex[:5]}")
cp=data[:,2]
print(f"cp: {cp[:5]}")
chol=data[:,4]
print(f"chol: {chol[:5]}")
thalach=data[:,7]
print(f"thalach: {thalach[:5]}")
slope=data[:,10]
print(f"slope: {slope[:5]}")
target=data[:,13]
print(f"target: {target[:5]}")

age: [63. 37. 41. 56. 57.]
sex: [1. 1. 0. 1. 0.]
cp: [3. 2. 1. 1. 0.]
chol: [233. 250. 204. 236. 354.]
thalach: [150. 187. 172. 178. 163.]
slope: [0. 0. 2. 2. 2.]
target: [1. 1. 1. 1. 1.]
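Hard-coded column indices are easy to mistype. One alternative is to look indices up by name, as in the sketch below. The full column order in `COLUMNS` is an assumption based on the list above and on the indices used in the previous cell (age 0, sex 1, cp 2, chol 4, thalach 7, slope 10, target 13); verify it against the header row of heart.csv before relying on it. The `demo` array is a placeholder, not real data.

```python
import numpy as np

# Assumed column order (check against the CSV header).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

# Map each column name to its index.
COL = {name: index for index, name in enumerate(COLUMNS)}

# Placeholder array with 14 columns, standing in for `data`.
demo = np.arange(28, dtype=float).reshape(2, 14)

# Select a column by name instead of by a bare number.
chol_demo = demo[:, COL["chol"]]
print(chol_demo)
```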


### Basic Descriptive Statistics

Mean, median, standard deviation, minimum, and maximum for each extracted column.

In [47]:
# age
print("---Basic Descriptive Statistics on Age---")
print(f"Mean: {round(np.mean(age))}")
print(f"Median: {np.median(age)}")
print(f"Standard Deviation: {round(np.std(age))}")
print(f"Min: {np.min(age)}")
print(f"Max: {np.max(age)}")

---Basic Descriptive Statistics on Age---
Mean: 54
Median: 55.0
Standard Deviation: 9
Min: 29.0
Max: 77.0


In [48]:
# sex
print("---Basic Descriptive Statistics on Sex---")
print(f"Mean: {round(np.mean(sex))}")
print(f"Median: {np.median(sex)}")
print(f"Standard Deviation: {round(np.std(sex))}")
print(f"Min: {np.min(sex)}")
print(f"Max: {np.max(sex)}")

---Basic Descriptive Statistics on Sex---
Mean: 1
Median: 1.0
Standard Deviation: 0
Min: 0.0
Max: 1.0
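For a 0/1 variable such as sex, rounding to whole numbers hides most of the information: the mean of a binary column is the proportion of ones, and it rounds to 1 for any value above 0.5. A small sketch using only the first five sex values printed earlier (not the full column):

```python
import numpy as np

# First five sex values from the preview above (not the full column).
sex_demo = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

proportion_of_ones = np.mean(sex_demo)
spread = np.std(sex_demo)

# Keep two decimals instead of rounding to an integer.
print(f"Mean (proportion of ones): {proportion_of_ones:.2f}")
print(f"Standard Deviation: {spread:.2f}")

# Rounding to an integer collapses the values to 1 and 0.
print(round(proportion_of_ones), round(spread))
```

The same formatting (`:.2f`) could be applied to the other low-range columns, such as cp and slope.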


In [49]:
# cp
print("---Basic Descriptive Statistics on CP---")
print(f"Mean: {round(np.mean(cp))}")
print(f"Median: {np.median(cp)}")
print(f"Standard Deviation: {round(np.std(cp))}")
print(f"Min: {np.min(cp)}")
print(f"Max: {np.max(cp)}")

---Basic Descriptive Statistics on CP---
Mean: 1
Median: 1.0
Standard Deviation: 1
Min: 0.0
Max: 3.0


In [50]:
# Chol
print("---Basic Descriptive Statistics on Chol---")
print(f"Mean: {round(np.mean(chol))}")
print(f"Median: {np.median(chol)}")
print(f"Standard Deviation: {round(np.std(chol))}")
print(f"Min: {np.min(chol)}")
print(f"Max: {np.max(chol)}")

---Basic Descriptive Statistics on Chol---
Mean: 246
Median: 240.0
Standard Deviation: 52
Min: 126.0
Max: 564.0


In [51]:
# thalach
print("---Basic Descriptive Statistics on Thalach---")
print(f"Mean: {round(np.mean(thalach))}")
print(f"Median: {np.median(thalach)}")
print(f"Standard Deviation: {round(np.std(thalach))}")
print(f"Min: {np.min(thalach)}")
print(f"Max: {np.max(thalach)}")

---Basic Descriptive Statistics on Thalach---
Mean: 150
Median: 153.0
Standard Deviation: 23
Min: 71.0
Max: 202.0


In [52]:
# slope
print("---Basic Descriptive Statistics on Slope---")
print(f"Mean: {round(np.mean(slope))}")
print(f"Median: {np.median(slope)}")
print(f"Standard Deviation: {round(np.std(slope))}")
print(f"Min: {np.min(slope)}")
print(f"Max: {np.max(slope)}")

---Basic Descriptive Statistics on Slope---
Mean: 1
Median: 1.0
Standard Deviation: 1
Min: 0.0
Max: 2.0
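The six cells above repeat the same five statistics. A small helper can remove the duplication; `describe` below is a hypothetical name introduced for illustration, and the input is the first five age values from the preview, not the full column.

```python
import numpy as np


def describe(values):
    """Return mean, median, standard deviation, min, and max as a dict."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),  # population std (ddof=0), as in the cells above
        "min": float(np.min(values)),
        "max": float(np.max(values)),
    }


# First five age values from the preview above (not the full column).
age_demo = np.array([63.0, 37.0, 41.0, 56.0, 57.0])
stats = describe(age_demo)

print("---Basic Descriptive Statistics on Age (first five rows)---")
for label, value in stats.items():
    print(f"{label}: {value:.2f}")
```

With the real arrays, a loop over `{"age": age, "chol": chol, ...}` would print every column's summary from one cell.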


### Data Filtering

In [53]:
print(f"Number of rows before filtering: {len(data)}")

Number of rows before filtering: 303


In [54]:
# Count people older than 50 whose cholesterol is above 240
filtered_data=data[(age>50) & (chol>240)]
print(f"Number of rows after filtering: {len(filtered_data)}")

Number of rows after filtering: 111
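When only the count is needed, the boolean mask can be counted directly without building the filtered array. The sketch below uses the first five age and chol values from the earlier preview. Note that each comparison needs its own parentheses, because `&` binds more tightly than `>` in Python.

```python
import numpy as np

# First five values of each column from the preview above.
age_demo = np.array([63.0, 37.0, 41.0, 56.0, 57.0])
chol_demo = np.array([233.0, 250.0, 204.0, 236.0, 354.0])

# Element-wise AND of two conditions.
mask = (age_demo > 50) & (chol_demo > 240)

matching_rows = np.count_nonzero(mask)  # number of True entries
matching_share = mask.mean()            # fraction of rows that match

print(f"Matching rows: {matching_rows}")
print(f"Share of rows: {matching_share:.2f}")
```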


### How Many Entries in the Dataset Where:
* Chol is at least 240 and
* Age is at least 50 and
* CP is at least 2

In [55]:
filtered_data1=data[(age>=50) & (chol>=240)&(cp>=2)]
print(f"Number of rows after filtering: {len(filtered_data1)}")

Number of rows after filtering: 37


### Number of Unique Categories
* np.unique works best for discrete variables

In [56]:
print(f"Unique Ages: {np.unique(age)}")
print(f"How Many Unique Ages: {len(np.unique(age))}")

Unique Ages: [29. 34. 35. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51.
 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69.
 70. 71. 74. 76. 77.]
How Many Unique Ages: 41


In [57]:
print(f"Unique Cholesterol Values: {np.unique(chol)}")
print(f"How Many Unique Cholesterol Values: {len(np.unique(chol))}")

Unique Cholesterol Values: [126. 131. 141. 149. 157. 160. 164. 166. 167. 168. 169. 172. 174. 175.
 176. 177. 178. 180. 182. 183. 184. 185. 186. 187. 188. 192. 193. 195.
 196. 197. 198. 199. 200. 201. 203. 204. 205. 206. 207. 208. 209. 210.
 211. 212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224.
 225. 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 239.
 240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 252. 253. 254.
 255. 256. 257. 258. 259. 260. 261. 262. 263. 264. 265. 266. 267. 268.
 269. 270. 271. 273. 274. 275. 276. 277. 278. 281. 282. 283. 284. 286.
 288. 289. 290. 293. 294. 295. 298. 299. 300. 302. 303. 304. 305. 306.
 307. 308. 309. 311. 313. 315. 318. 319. 321. 322. 325. 326. 327. 330.
 335. 340. 341. 342. 353. 354. 360. 394. 407. 409. 417. 564.]
How Many Unique Cholesterol Values: 152


In [58]:
print(f"Unique CP Values: {np.unique(cp)}")
print(f"How Many Unique CP Values: {len(np.unique(cp))}")

Unique CP Values: [0. 1. 2. 3.]
How Many Unique CP Values: 4
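`np.unique` can also report how often each value occurs via `return_counts=True`, which gives the mode of a discrete column without extra libraries. A minimal sketch on the first five cp values from the earlier preview (not the full column):

```python
import numpy as np

# First five cp values from the preview above (not the full column).
cp_demo = np.array([3.0, 2.0, 1.0, 1.0, 0.0])

# values is sorted; counts[i] is the number of occurrences of values[i].
values, counts = np.unique(cp_demo, return_counts=True)

for value, count in zip(values, counts):
    print(f"cp={value:.0f}: {count} rows")

# The mode is the value with the highest count (first one wins on ties).
mode_value = values[np.argmax(counts)]
print(f"Mode: {mode_value}")
```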


### Correlation
Default rowvar=True: each row represents a variable, with observations in the columns.
rowvar=False: each column represents a variable, with observations in the rows.
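The effect of `rowvar` is easiest to see on a 2-D array. The sketch below uses two short made-up vectors (illustrative values only) and stacks them both ways; the array names are hypothetical.

```python
import numpy as np

# Two made-up variables with four observations each (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

as_rows = np.vstack([x, y])        # shape (2, 4): one variable per row
as_cols = np.column_stack([x, y])  # shape (4, 2): one variable per column

r_rows = np.corrcoef(as_rows)                # default rowvar=True
r_cols = np.corrcoef(as_cols, rowvar=False)  # columns are the variables

# Both give the same 2x2 correlation matrix.
print(np.allclose(r_rows, r_cols))

# Forgetting rowvar=False on column-oriented data treats each of the
# four rows as a variable and yields a 4x4 matrix instead.
mismatched = np.corrcoef(as_cols)
print(mismatched.shape)
```

Since `data` stores one variable per column, passing several columns at once (for example `data[:, [0, 2, 4]]`) would need `rowvar=False` to get one correlation per pair of columns.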

In [59]:
# age and chol correlation
corr_age_chol=np.corrcoef(age,chol,rowvar=False)
print(f"Corr age and chol {corr_age_chol}")

Corr age and chol [[1.         0.21367796]
 [0.21367796 1.        ]]


In [60]:
# age and cp correlation
corr_age_cp=np.corrcoef(age,cp,rowvar=False)
print(f"Corr age and cp {corr_age_cp}")

Corr age and cp [[ 1.         -0.06865302]
 [-0.06865302  1.        ]]


In [61]:
# cp and chol correlation
corr_cp_chol=np.corrcoef(cp,chol,rowvar=False)
print(f"Corr cp and chol {corr_cp_chol}")

Corr cp and chol [[ 1.         -0.07690439]
 [-0.07690439  1.        ]]


### Correlation Analysis
- Age and Chol: r ≈ 0.21 (weak positive)
- Age and CP: r ≈ -0.07 (very weak negative)
- Chol and CP: r ≈ -0.08 (very weak negative)

### Summary

Age and cholesterol have a weak positive correlation (r ≈ 0.21): older people in this dataset tend to have slightly higher cholesterol. Age and cp (r ≈ -0.07) and cp and cholesterol (r ≈ -0.08) are essentially uncorrelated, so chest pain type does not tend to change as age or cholesterol rises. These correlations describe relationships among the features only. Because the target column was not part of the correlation analysis, these results do not by themselves show whether older people with higher cholesterol are more likely to have heart disease.
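To say anything about heart disease itself, a feature has to be compared against the target column. One simple approach is to correlate the feature with the 0/1 target and compare group means. The sketch below uses made-up values (not taken from heart.csv), so its numbers say nothing about the real dataset; with the notebook's arrays, `age` and `target` would take the place of the demo arrays.

```python
import numpy as np

# Made-up values for illustration only; not taken from heart.csv.
age_demo = np.array([45.0, 52.0, 61.0, 58.0, 39.0, 66.0])
target_demo = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])

# Pearson correlation between a feature and the binary target.
r_age_target = np.corrcoef(age_demo, target_demo)[0, 1]

# Mean of the feature within each target group.
mean_age_target_1 = age_demo[target_demo == 1].mean()
mean_age_target_0 = age_demo[target_demo == 0].mean()

print(f"Corr age and target: {r_age_target:.2f}")
print(f"Mean age, target=1: {mean_age_target_1:.1f}")
print(f"Mean age, target=0: {mean_age_target_0:.1f}")
```

Even then, a correlation with the target describes association in this sample, not causation.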