# **Body Data**



## **Importing Necessary Python Modules**

Python has a large ecosystem of open-source add-ons called **modules**. The cell below imports the modules this notebook uses to load, summarize, and plot the data.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from IPython.display import Image
import warnings
warnings.simplefilter('ignore', FutureWarning)

## **Context**


The National Center for Health Statistics (NCHS) offers downloadable public-use data files through the Centers for Disease Control and Prevention's (CDC) FTP file server. Users of this service have access to data sets, documentation, and questionnaires from NCHS surveys and data collection systems.

Public-use data files are prepared and disseminated to provide access to the full scope of the data. This allows researchers to manipulate the data in a format appropriate for their analyses. NCHS makes every effort to release data collected through its surveys and data systems in a timely manner.

More information can be found at https://www.cdc.gov/nchs/data_access/ftp_data.htm.

In [2]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://media.istockphoto.com/id/456054995/photo/dna-molecules-and-virtuvian-man.jpg?s=612x612&w=0&k=20&c=v5qZJ5Ty4RwDbyGRx_v-tYd1-LfTZwTi-Aend5Q_sqA='

# Display the image
Image(url=image_url)


## **About the Dataset**
This dataset contains 300 rows, each corresponding to one person in a sample of Americans. A total of 16 variables are provided, as listed below:



**Variables**

| Column     | Description                                                                 |
|------------|-----------------------------------------------------------------------------|
| AGE        | Age in years|
| GENDER     | Gender: 0=female, 1=male (the column is named `GENDER (1=M)` in the data file)|
| PULSE      | Pulse rate in beats per minute (bpm)|
| SYSTOLIC   | Systolic blood pressure (mm Hg)|
| DIASTOLIC  | Diastolic blood pressure (mm Hg)|
| CATEGORY   | Blood Pressure Category based on the table below from the American Heart Association|
| HDL        | HDL cholesterol (mg/dL)|
| LDL        | LDL cholesterol (mg/dL)|
| WHITE      | White blood cell count (1000 cells/µL) |
| RED        | Red blood cell count (million cells/µL)|
| PLATE      | Platelet count (1000 cells/µL)|
| WEIGHT     | Weight (kg)|
| HEIGHT     | Height (cm)|
| WAIST      | Waist circumference (cm)|
| ARM CIRC   | Arm circumference (cm)|
| BMI        | Body mass index (kg/m²)|
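The BMI column can be reproduced from WEIGHT and HEIGHT. The sketch below is a minimal illustration, assuming the usual definition of BMI as weight in kilograms divided by the square of height in meters; the function name `bmi` is hypothetical and is not part of the notebook.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0  # convert centimeters to meters
    return weight_kg / height_m ** 2


# Values taken from the first row of the data preview: WEIGHT = 98.6, HEIGHT = 172.0
print(round(bmi(98.6, 172.0), 1))  # 33.3, matching the BMI column for that row
```

Note that HEIGHT is recorded in centimeters, so it has to be converted to meters before squaring.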

## **Blood Pressure Category Table**

The table below from the American Heart Association classifies blood pressure into five (5) categories based on a combination of the individual's systolic and diastolic blood pressure.

In [3]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://www.heart.org/-/media/Images/Health-Topics/High-Blood-Pressure/Rainbow-Chart/blood-pressure-readings-chart.jpg?h=294&iar=0&mw=440&w=440&sc_lang=en'
# Display the image
Image(url=image_url)
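The chart's logic can also be written as a small function. This is a sketch only, assuming the thresholds printed on the American Heart Association chart (for example, Stage 1 is a systolic reading of 130-139 or a diastolic reading of 80-89). The function name `bp_category` is hypothetical, and whether the CATEGORY column was produced with exactly these rules is an assumption.

```python
def bp_category(systolic: float, diastolic: float) -> str:
    """Classify one blood pressure reading (mm Hg), checking the most severe category first."""
    if systolic > 180 or diastolic > 120:
        return "HYPERTENSIVE CRISIS"
    if systolic >= 140 or diastolic >= 90:
        return "HYPERTENSION STAGE 2"
    if systolic >= 130 or diastolic >= 80:
        return "HYPERTENSION STAGE 1"
    if systolic >= 120:  # diastolic is already known to be below 80 here
        return "ELEVATED"
    return "NORMAL"


# Readings taken from the data preview further down in this notebook
print(bp_category(100, 70))  # NORMAL
print(bp_category(134, 94))  # HYPERTENSION STAGE 2
print(bp_category(138, 80))  # HYPERTENSION STAGE 1
```

Checking the most severe category first means a reading is placed in the highest category that either of its two numbers qualifies for.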


## **A Snippet of the Data**

First, let's load the data by importing it directly from the URL below.


In [4]:
#Check the file path for any errors
file_path = "https://raw.githubusercontent.com/ksuaray/STAT108F24_Projects_Jupyter/main/Project0/Body%20Data.csv"
df = pd.read_csv(file_path)
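The data preview in this notebook shows a first column called `Unnamed: 0`, which pandas typically creates when a CSV file stores row numbers under a blank header. The sketch below illustrates this on a tiny in-memory CSV; the three rows and the reduced set of columns are copied from the preview, and the blank first header is an assumption about how the file is laid out.

```python
import io

import pandas as pd

# A small stand-in for the real file (assumed layout: blank header for the row numbers)
csv_text = """,AGE,GENDER (1=M),PULSE,SYSTOLIC,DIASTOLIC,CATEGORY
0,43,0,80,100,70,NORMAL
1,38,0,94,134,94,HYPERTENSION STAGE 2
2,69,0,58,138,80,HYPERTENSION STAGE 1
"""

# Default read: the blank header becomes a regular column named 'Unnamed: 0'
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.columns[0])

# Optional: treat the first column as the row index instead
sample_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample_indexed.shape)  # (rows, columns)
```

Either form works for this assignment, because the questions only use the named columns.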


Next, we can display the data by typing the name of the DataFrame. To make sure all columns are visible, we first call the *pd.set_option* function.

In [5]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

Unnamed: 0,AGE,GENDER (1=M),PULSE,SYSTOLIC,DIASTOLIC,CATEGORY,HDL,LDL,WHITE,RED,PLATE,WEIGHT,HEIGHT,WAIST,ARM CIRC,BMI
0,43,0,80,100,70,NORMAL,73,68,8.7,4.80,319,98.6,172.0,120.4,40.7,33.3
1,38,0,94,134,94,HYPERTENSION STAGE 2,36,223,6.9,4.47,297,108.2,154.4,120.3,44.3,45.4
2,69,0,58,138,80,HYPERTENSION STAGE 1,40,140,8.1,4.60,286,79.2,155.7,103.5,34.2,32.7
3,44,0,66,114,66,NORMAL,45,136,8.0,4.09,263,64.2,157.6,89.7,32.5,25.8
4,72,0,56,110,72,NORMAL,53,102,6.9,4.15,215,98.2,168.6,115.3,38.5,34.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,18,1,72,106,44,NORMAL,40,124,4.0,5.17,221,71.6,172.8,78.1,31.0,24.0
296,67,1,62,136,82,HYPERTENSION STAGE 1,39,62,7.7,3.90,305,110.2,169.1,125.5,39.0,38.5
297,24,1,94,96,62,NORMAL,43,102,7.0,5.29,260,56.3,162.7,78.4,27.9,21.3
298,53,1,86,132,74,HYPERTENSION STAGE 1,42,112,8.4,4.07,75,102.6,181.0,117.7,36.5,31.3


# **ASSIGNMENT 0 - Learning How To Use the Jupyter Notebook**

**INSTRUCTIONS**
Complete the following questions.

## **1. Categorize Variables**
Determine whether the variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable         | Qual or Quant | Disc., Cont., or Neither |
|------------------|---------------|--------------------------|
| **Age**          |               |                          |
| **Gender**       |               |                          |
| **Waist**        |               |                          |
| **Category**     |               |                          |

## **2. Frequency Table**

Construct a frequency table for the CATEGORY variable.

In [22]:
#Frequency table
mc = df['CATEGORY'] #SOLUTION
freq_table = mc.value_counts()  # Series.value_counts replaces the deprecated pd.value_counts
freq_table

CATEGORY
NORMAL                  119
HYPERTENSION STAGE 1     74
ELEVATED                 57
HYPERTENSION STAGE 2     49
HYPERTENSIVE CRISIS       1
Name: count, dtype: int64

In [23]:
len(freq_table)==5

True

In [24]:
# HIDDEN
freq_table.iloc[0]==119  # .iloc selects by position; the index labels are category names

True

## **3. Relative Frequency Table**
Create a relative frequency table for the CATEGORY variable.

In [29]:
#Relative frequency table
rel_freq_table = freq_table/len(df) #SOLUTION
rel_freq_table

CATEGORY
NORMAL                  0.396667
HYPERTENSION STAGE 1    0.246667
ELEVATED                0.190000
HYPERTENSION STAGE 2    0.163333
HYPERTENSIVE CRISIS     0.003333
Name: count, dtype: float64
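pandas can also compute relative frequencies in one step with `value_counts(normalize=True)`. The sketch below is an alternative to dividing by `len(df)`, not a replacement for the solution above. It rebuilds a stand-in CATEGORY column from the counts in the frequency table so that it runs without downloading the data; the variable names are hypothetical.

```python
import pandas as pd

# Counts copied from the frequency table for CATEGORY
counts = {
    "NORMAL": 119,
    "HYPERTENSION STAGE 1": 74,
    "ELEVATED": 57,
    "HYPERTENSION STAGE 2": 49,
    "HYPERTENSIVE CRISIS": 1,
}

# Stand-in column: each category name repeated as many times as it was observed
category = pd.Series(
    [name for name, n in counts.items() for _ in range(n)], name="CATEGORY"
)

# normalize=True divides each count by the total number of observations
rel_freq = category.value_counts(normalize=True)
print(rel_freq.round(4))
```

With the real data, the equivalent call would be `df['CATEGORY'].value_counts(normalize=True)`.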

In [38]:
sum(rel_freq_table)==1

True
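The exact comparison above happens to return `True` here, but sums of decimal fractions are not always exactly equal to the value we expect, because of floating-point rounding. A more forgiving check compares within a small tolerance. The sketch below uses only the standard library and the counts from the frequency table; the variable names are hypothetical.

```python
import math

# Counts copied from the frequency table for CATEGORY
counts = [119, 74, 57, 49, 1]
n = sum(counts)

rel = [c / n for c in counts]  # relative frequencies
total = sum(rel)

# Tolerance-based check: True whenever total is within rounding error of 1
print(math.isclose(total, 1.0))

# A classic case where exact equality fails even though the math is "obviously" right
print(0.1 + 0.2 == 0.3)  # False
```

`np.isclose` and `np.allclose` follow the same idea for NumPy and pandas values.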

In [36]:
# HIDDEN
np.allclose(rel_freq_table.iloc[1],0.24666666666666667, rtol=1e-03, atol=1e-03)

True

## **4. Frequency Distribution Bar Chart**
Create a frequency distribution bar chart for the CATEGORY variable.

In [18]:
dfrf = pd.DataFrame(freq_table)
fig = px.bar(x=dfrf.index, y=dfrf['count'], barmode='group',
             title='Frequency Distribution Bar Chart')
fig.update_layout(xaxis_title="Blood Pressure Category")
fig.update_layout(yaxis_title="Frequency")
fig.show()
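By default the bars appear in the order of the frequency table, which is sorted from most to least common. If you would rather show the categories from least to most severe, you can reorder the table before plotting. The sketch below is one possible approach using `Series.reindex`; the list `severity_order` is an assumption based on the category chart, and the counts are copied from the frequency table.

```python
import pandas as pd

# Frequency table rebuilt from the counts shown earlier
freq_table = pd.Series(
    {
        "NORMAL": 119,
        "HYPERTENSION STAGE 1": 74,
        "ELEVATED": 57,
        "HYPERTENSION STAGE 2": 49,
        "HYPERTENSIVE CRISIS": 1,
    },
    name="count",
)

# Assumed ordering from least to most severe
severity_order = [
    "NORMAL",
    "ELEVATED",
    "HYPERTENSION STAGE 1",
    "HYPERTENSION STAGE 2",
    "HYPERTENSIVE CRISIS",
]

# reindex returns the same counts, arranged in the requested order
ordered = freq_table.reindex(severity_order)
print(ordered)
```

The reordered Series can then be passed to `px.bar` in the same way as before, using `ordered.index` for the x-axis and `ordered.values` for the y-axis.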


In [None]:
len(freq_table)==5

True

In [None]:
# HIDDEN
freq_table.iloc[0]==119  # .iloc selects by position; the index labels are category names

True

## **5. Describe the Distribution**
Write a few sentences about the distribution of blood pressure category for these individuals.

# **ON YOUR OWN**

## **6. Frequency Table**

Change the variable name in the code below to construct a frequency table for the GENDER variable.

In [44]:
#Frequency table
mc_1 = df['GENDER (1=M)'] #SOLUTION
freq_table_1 = mc_1.value_counts()  # Series.value_counts replaces the deprecated pd.value_counts
freq_table_1

GENDER (1=M)
1    153
0    147
Name: count, dtype: int64

In [49]:
len(freq_table_1)==2

True

In [48]:
# HIDDEN
freq_table_1[0]==147

True

## **7. Relative Frequency Table**
Run the code below to construct a relative frequency table for the GENDER variable.

In [57]:
#Relative frequency table
rel_freq_table_1 = freq_table_1/len(df) #SOLUTION
rel_freq_table_1

GENDER (1=M)
1    0.51
0    0.49
Name: count, dtype: float64

In [51]:
sum(rel_freq_table_1)==1

True

In [56]:
# HIDDEN
rel_freq_table_1[1]==0.51

True

## **8. Relative Frequency Distribution Bar Chart**
Create a RELATIVE frequency distribution bar chart for the GENDER variable.

In [59]:
dfrf_1 = pd.DataFrame(rel_freq_table_1)
fig = px.bar(x=dfrf_1.index, y=dfrf_1['count'], barmode='group',
             title='Relative Frequency Distribution Bar Chart')
fig.update_layout(xaxis_title="Gender (1=Male)")
#For your solution, label the vertical axis correctly on the line below
fig.update_layout(yaxis_title="Relative Frequency") #SOLUTION
fig.show()
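Because GENDER is stored as 0 and 1, the bars are labeled with numbers. One optional way to make the chart easier to read is to rename the codes before plotting. The sketch below assumes the coding from the variable table (0=female, 1=male) and uses the relative frequencies shown above; the name `labeled` is hypothetical.

```python
import pandas as pd

# Relative frequency table rebuilt from the output shown earlier
rel_freq_table_1 = pd.Series({1: 0.51, 0: 0.49}, name="count")

# Replace the numeric codes in the index with readable labels
labeled = rel_freq_table_1.rename(index={0: "Female", 1: "Male"})
print(labeled)
```

Using `labeled.index` for the x-axis would then show "Male" and "Female" under the bars instead of 1 and 0.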

In [61]:
len(dfrf_1)==2

True

## **9. Describe the Distribution**
Write a few sentences about the distribution of gender for these individuals.

## **10. Save and Turn In**
* After completing questions 1-9 above, save this notebook as a PDF. Note where you save it.
* Go to Canvas, find the Project 0 assignment, upload the PDF, then Submit.
