In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("Project_0_Body_Data.ipynb")

# **Body Data**



## **Importing Necessary Python Modules**

Python can be extended with open-source add-ons called **modules**. The cell below imports the modules this notebook needs to run its code.

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
from IPython.display import Image
import warnings
warnings.simplefilter('ignore', FutureWarning)

## **Context**


The National Center for Health Statistics (NCHS) offers downloadable public-use data files through the Centers for Disease Control and Prevention's (CDC) FTP file server. Users of this service have access to data sets, documentation, and questionnaires from NCHS surveys and data collection systems.

Public-use data files are prepared and disseminated to provide access to the full scope of the data. This allows researchers to manipulate the data in a format appropriate for their analyses. NCHS makes every effort to release data collected through its surveys and data systems in a timely manner.

More information can be found at https://www.cdc.gov/nchs/data_access/ftp_data.htm.

In [None]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://media.istockphoto.com/id/456054995/photo/dna-molecules-and-virtuvian-man.jpg?s=612x612&w=0&k=20&c=v5qZJ5Ty4RwDbyGRx_v-tYd1-LfTZwTi-Aend5Q_sqA='

# Display the image
Image(url=image_url)


## **About the Dataset**
This dataset contains 301 rows corresponding to a sample of Americans. A total of 16 variables are provided as listed below:



**Variables**

| Column     | Description                                                                 |
|------------|-----------------------------------------------------------------------------|
| AGE        | Age in years|
| GENDER     | Gender: 0=female, 1=male|
| PULSE      | Pulse rate in beats per minute (bpm)|
| SYSTOLIC   | Systolic blood pressure (mm Hg)|
| DIASTOLIC  | Diastolic blood pressure (mm Hg)|
| CATEGORY   | Blood Pressure Category based on the table below from the American Heart Association|
| HDL        | HDL cholesterol (mg/dL)|
| LDL        | LDL cholesterol (mg/dL)|
| WHITE      | White blood cell count (1000 cells/µL) |
| RED        | Red blood cell count (million cells/µL)|
| PLATE      | Platelet count (1000 cells/µL)|
| WEIGHT     | Weight (kg)|
| HEIGHT     | Height (cm)|
| WAIST      | Waist circumference (cm)|
| ARM CIRC   | Arm circumference (cm)|
| BMI        | Body mass index (kg/m²)|
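
The BMI row combines two other columns: body mass index is weight in kilograms divided by the square of height in meters. Because HEIGHT is recorded in centimeters, it has to be converted first. The sketch below assumes the BMI column follows this standard definition; the function name `bmi` and the example values are made up for illustration and are not taken from the dataset.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0  # HEIGHT is recorded in centimeters
    return weight_kg / height_m ** 2

# Hypothetical person (not a row from the dataset): 70 kg, 175 cm
example_bmi = bmi(70.0, 175.0)
print(round(example_bmi, 1))  # 22.9
```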

## **Blood Pressure Category Table**

The table below from the American Heart Association classifies blood pressure into five (5) categories based on a combination of the individual's systolic and diastolic blood pressure.

In [None]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://www.heart.org/-/media/Images/Health-Topics/High-Blood-Pressure/Rainbow-Chart/blood-pressure-readings-chart.jpg?h=294&iar=0&mw=440&w=440&sc_lang=en'
# Display the image
Image(url=image_url)
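
To make the "combination of systolic and diastolic" rule concrete, here is a minimal sketch of how a reading could be mapped to a category. It assumes the thresholds commonly published on the American Heart Association chart; check them against the image above. The function name `bp_category` and the label strings are hypothetical, and the labels stored in the CATEGORY column may be worded differently.

```python
def bp_category(systolic: float, diastolic: float) -> str:
    """Classify one blood pressure reading (mm Hg) into an AHA-style category."""
    # Test the most severe category first: the higher categories apply
    # when EITHER number crosses its threshold.
    if systolic > 180 or diastolic > 120:
        return "Hypertensive Crisis"
    if systolic >= 140 or diastolic >= 90:
        return "Hypertension Stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "Hypertension Stage 1"
    # From here on diastolic is below 80, so only systolic matters.
    if systolic >= 120:
        return "Elevated"
    return "Normal"

# Made-up readings, not rows from the dataset
print(bp_category(118, 76))  # Normal
print(bp_category(125, 78))  # Elevated
print(bp_category(118, 85))  # Hypertension Stage 1
```

Note that a single high number is enough to move a reading into a hypertension category, which is why the third example is Stage 1 even though its systolic value is in the normal range.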


## **A Snippet of the Data**

First, let's load the data by reading it directly from the URL below.


In [None]:
#Check the file path for any errors
file_path = "https://raw.githubusercontent.com/ksuaray/STAT108F24_Projects_Jupyter/main/Project0/Body%20Data.csv"
df = pd.read_csv(file_path)


Next, we can display the data by typing the name of the DataFrame. To ensure we can see all columns, we'll first call the *pd.set_option* function.

In [None]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

# **ASSIGNMENT 0 - Learning How To Use the Jupyter Notebook**

**INSTRUCTIONS**
Complete the following questions.

## **1. Categorize Variables**
Determine whether the variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable         | Qual or Quant | Disc., Cont., or Neither |
|------------------|---------------|--------------------------|
| **Age**          |               |                          |
| **Gender**       |               |                          |
| **Waist**        |               |                          |
| **Category**     |               |                          |

## **2. Frequency Table**

Construct a frequency table for the CATEGORY variable.

In [None]:
#Frequency table
mc = ...
# Series.value_counts() replaces the deprecated top-level pd.value_counts()
freq_table = mc.value_counts()
freq_table

In [None]:
grader.check("q2")

## **3. Relative Frequency Table**
Create a relative frequency table for the CATEGORY variable.
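
A relative frequency is a category's count divided by the total number of observations, so the relative frequencies always sum to 1. The sketch below illustrates the idea on a small made-up Series; `toy`, `toy_counts`, and `toy_rel` are hypothetical names and the values are not from the Body Data file. It is one possible approach, not necessarily the one the grader expects.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the Body Data file
toy = pd.Series(["A", "B", "A", "C", "A", "B"])

toy_counts = toy.value_counts()          # frequency of each label
toy_rel = toy_counts / toy_counts.sum()  # divide each count by the total

print(toy_rel)
```

Here "A" appears 3 times out of 6, so its relative frequency is 0.5.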

In [None]:
#Relative frequency table
rel_freq_table = ...
rel_freq_table

In [None]:
grader.check("q3")

## **4. Frequency Distribution Bar Chart**
Create a frequency distribution bar chart for the CATEGORY variable.

In [None]:
dfrf = pd.DataFrame(freq_table)
fig = px.bar(x=dfrf.index, y=dfrf['count'], barmode='group',
             title='Frequency Distribution Bar Chart')
fig.update_layout(xaxis_title="Blood Pressure Category")
fig.update_layout(yaxis_title="Frequency")
fig.show()


In [None]:
grader.check("q4")

## **5. Describe the Distribution**
Write a few sentences about the distribution of blood pressure category for these individuals.

# **ON YOUR OWN**

## **6. Frequency Table**

Change the variable name in the code below to construct a frequency table for the GENDER variable.

In [None]:
#Frequency table
mc_1 = ...
# Series.value_counts() replaces the deprecated top-level pd.value_counts()
freq_table_1 = mc_1.value_counts()
freq_table_1

In [None]:
grader.check("q6")

## **7. Relative Frequency Table**
Complete the code below to construct a relative frequency table for the GENDER variable.

In [None]:
#Relative frequency table
rel_freq_table_1 = ...
rel_freq_table_1

In [None]:
grader.check("q7")

## **8. Relative Frequency Distribution Bar Chart**
Create a RELATIVE frequency distribution bar chart for the GENDER variable.

In [None]:
dfrf_1 = pd.DataFrame(rel_freq_table_1)
fig = px.bar(x=dfrf_1.index, y=dfrf_1['count'], barmode='group',
             title='Relative Frequency Distribution Bar Chart')
fig.update_layout(xaxis_title="Gender (1=Male)")
#For your solution, label the vertical axis correctly on the line below
...
fig.show()

In [None]:
grader.check("q8")

## **9. Describe the Distribution**
Write a few sentences about the distribution of gender for these individuals.

## **10. Save and Turn In**
* After completing questions 1-9 above, save this notebook as a PDF. Note where you save it.
* Go to Canvas, find the Project 0 assignment, upload the PDF, then Submit.


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)