In [46]:
%%html
<style>
.toc-item > li {
    list-style-type: upper-alpha;
}
</style>

## Kamal Shaham         
## D210: Data Dashboard and Storytelling

<div>
    <h2>Table of Contents</h2>
    <ul class="toc-item">
        <li><a href="#datasets">Interactive Data Dashboard</a>
            <ul>
                <li><a href="#datasets">Dashboard Datasets</a></li>
                <li><a href="#installation">Dashboard Installation</a></li>
                <li><a href="#usage">Dashboard Usage</a></li>
            </ul>
        </li>
        <li><a href="#story">Storytelling with Data - Panopto</a>
        </li>
        <li><a href="#dashboard-purpose">Reflection Paper</a>
        <ul>
                <li><a href="#dashboard-purpose">Dashboard Purpose Alignment</a></li>
                <li><a href="#dataset-insights">Additional Dataset Insights</a></li>
                <li><a href="#data-representations">Data Representations</a></li>
            <li><a href="#dashboard-controls">Dashboard Controls</a></li>
            <li><a href="#accessibility">Accessibility Support</a></li>
            <li><a href="#data-rep-insights">Data Representation Insights</a></li>
            <li><a href="#audience">Audience Analysis</a></li>
            <li><a href="#universal-access">Universal Access</a></li>
            <li><a href="#storytelling">Storytelling Elements</a></li>
            </ul>
        </li>
        <li><a href="#thirdparty">Third-party code references</a></li>
        <li><a href="#references">References</a></li>
    </ul>
</div>

## A. Dashboard Datasets <a id="datasets"></a>

For convenience, the dashboard used for this analysis is published via Tableau Public for universal access (https://public.tableau.com/app/profile/kamal.shaham/viz/MedicalDataDashboardWGUD210/Story1?publish=yes). This allows easy viewing of the analysis dashboard without the need to install Tableau or any other programs. The analysis utilizes both the WGU medical dataset and the 2017-2018 National Health & Nutrition Examination Survey from the Centers for Disease Control and Prevention (CDC). Throughout this data analysis, we refer to the WGU medical dataset as WGU patients and the NHANES dataset as NHS patients.

Using code from the previous D208 and D209 courses, the WGU dataset is reduced to only the columns relevant to this analysis. The WGU dataset was checked for both missing and duplicate values. Patient diagnosis columns were converted from 'Yes' and 'No' strings to booleans. According to the NHANES data dictionary, ages of 80 and over are recorded as 80, so patients over the age of 80 in the WGU dataset were also recorded as 80 to match the NHANES dataset.

For the NHANES dataset, the column names are coded in a format used by the CDC. The referenced CDC data dictionaries were used to decode the column names of both the demographic and medical questionnaires into a readable format. The dataset was checked for missing values; for any patient diagnosis relevant to this analysis that did not have a value, it was assumed the patient tested negative, and the value was set to False. The WGU dataset does not include patients under 18, while the NHANES dataset does. All patients under the age of 18 were filtered out of the NHANES dataset to match the WGU dataset. After filtering, the NHANES dataset consists of 5,856 rows, a little over half (about 59%) of the 10,000 rows in the WGU dataset.

In [47]:
%matplotlib inline

# importing our statistical libraries
import pandas as pd

# importing our initial dataset
wgu=pd.read_csv('medical_clean.csv')

#viewing first 5 rows and column information
print(wgu.head())
print(wgu.columns)

#checking for missing/null values
print(wgu.isnull().sum())

#checking for duplicate values of any rows
print(wgu.duplicated().any())

# checking for duplicate values based on customer_id unique key
print(wgu.duplicated('Customer_id').any())

# encoding categorical binary columns as booleans.
bin_cols = ['ReAdmis', 'HighBlood', 'Stroke', 'Arthritis', 'Diabetes', 
            'Anxiety', 'Asthma', 'Soft_drink', 'Overweight', 'Allergic_rhinitis', 'BackPain', 'Hyperlipidemia', 'Reflux_esophagitis']
bin_dict = {'Yes': True, 'No': False}
for col in bin_cols:
    wgu[col] = wgu[col].replace(bin_dict)

# converting column to category from string
wgu["Marital"] = wgu["Marital"].astype("category")

# creating a dataframe with only the columns needed for comparison with the additional dataset
# (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
wgu = wgu[["Age", "Gender", "Children", "Arthritis", "Diabetes", "HighBlood", "Hyperlipidemia", "Overweight", "Stroke"]].copy()

# since the additional dataset caps ages at 80, reduce any age over 80 to 80 instead
wgu.loc[wgu["Age"] > 80, "Age"] = 80

# adding a source column to WGU dataset
wgu["Source"] = "WGU"

# view new dataset after selecting needed columns
print(wgu.head())
print(wgu.columns)
print(wgu.info())

# variable statistics to check distributions
print(wgu.describe(include='all'))

   CaseOrder Customer_id                           Interaction  \
0          1     C412403  8cd49b13-f45a-4b47-a2bd-173ffa932c2f   
1          2     Z919181  d2450b70-0337-4406-bdbb-bc1037f1734c   
2          3     F995323  a2057123-abf5-4a2c-abad-8ffe33512562   
3          4     A879973  1dec528d-eb34-4079-adce-0d7a40e82205   
4          5     C544523  5885f56b-d6da-43a3-8760-83583af94266   

                                UID          City State        County    Zip  \
0  3a83ddb66e2ae73798bdf1d705dc0932           Eva    AL        Morgan  35621   
1  176354c5eef714957d486009feabf195      Marianna    FL       Jackson  32446   
2  e19a0fa00aeda885b8a436757e889bc9   Sioux Falls    SD     Minnehaha  57110   
3  cd17d7b6d152cb6f23957346d11c3f07  New Richland    MN        Waseca  56072   
4  d2f0425877b10ed6bb381f3e2579424a    West Point    VA  King William  23181   

        Lat       Lng  ...  TotalCharge Additional_charges Item1 Item2  Item3  \
0  34.34960 -86.72508  ...  3726.702860  

In [4]:
import pandas as pd

# importing additional dataset from the NHANES survey
df = pd.read_csv('./demographics.csv')

# Get the column names
column_names = df.columns.tolist()

# Print the column names in a comma-separated list
print(', '.join(column_names))
print(df.describe())

SEQN, SDDSRVYR, RIDSTATR, RIAGENDR, RIDAGEYR, RIDAGEMN, RIDRETH1, RIDRETH3, RIDEXMON, RIDEXAGM, DMQMILIZ, DMQADFC, DMDBORN4, DMDCITZN, DMDYRSUS, DMDEDUC3, DMDEDUC2, DMDMARTL, RIDEXPRG, SIALANG, SIAPROXY, SIAINTRP, FIALANG, FIAPROXY, FIAINTRP, MIALANG, MIAPROXY, MIAINTRP, AIALANGA, DMDHHSIZ, DMDFMSIZ, DMDHHSZA, DMDHHSZB, DMDHHSZE, DMDHRGND, DMDHRAGZ, DMDHREDZ, DMDHRMAZ, DMDHSEDZ, WTINT2YR, WTMEC2YR, SDMVPSU, SDMVSTRA, INDHHIN2, INDFMIN2, INDFMPIR
                SEQN  SDDSRVYR     RIDSTATR     RIAGENDR     RIDAGEYR  \
count    9254.000000    9254.0  9254.000000  9254.000000  9254.000000   
mean    98329.500000      10.0     1.940566     1.507564    34.334234   
std      2671.544029       0.0     0.236448     0.499970    25.500280   
min     93703.000000      10.0     1.000000     1.000000     0.000000   
25%     96016.250000      10.0     2.000000     1.000000    11.000000   
50%     98329.500000      10.0     2.000000     2.000000    31.000000   
75%    100642.750000      10.0     2.00

In [49]:
# based on the NHANES data dictionary, children in the household are split into 5 years or younger (DMDHHSZA) and 6-17 years old (DMDHHSZB)
# creating a new column that adds both together for analysis
nhs_demographic["NHS_Children"] = nhs_demographic["DMDHHSZB"] + nhs_demographic["DMDHHSZA"]

In [50]:
# renaming columns to a readable format for analysis
nhs_demographic.rename(columns={"NHS_Children" : "Children", "RIAGENDR" : "Gender", "RIDAGEYR" : "Age"}, inplace = True)

In [51]:
# remapping gender to string values
gender = {2 : "Female", 1: "Male"}
nhs_demographic["Gender"] = nhs_demographic["Gender"].map(gender)

In [52]:
# filtering dataset to only the columns needed for analysis from demographics dataset
nhs_demographic = nhs_demographic[["Age", "Gender", "Children"]]

In [6]:
import pandas as pd

# importing the questionnaire dataset to select relevant diagnosis columns
df = pd.read_csv('./questionnaire.csv')

# Get the column names
column_names = df.columns.tolist()

# Print the column names in a comma-separated list
print(', '.join(column_names))
print(df.describe())

SEQN, ACD011A, ACD011B, ACD011C, ACD040, ACD110, ALQ111, ALQ121, ALQ130, ALQ142, ALQ270, ALQ280, ALQ290, ALQ151, ALQ170, AUQ054, AUQ060, AUQ070, AUQ080, AUQ090, AUQ400, AUQ410A, AUQ410B, AUQ410C, AUQ410D, AUQ410E, AUQ410F, AUQ410G, AUQ410H, AUQ410I, AUQ410J, AUQ156, AUQ420, AUQ430, AUQ139, AUQ144, AUQ147, AUQ149A, AUQ149B, AUQ149C, AUQ153, AUQ630, AUQ440, AUQ450A, AUQ450B, AUQ450C, AUQ450D, AUQ450E, AUQ450F, AUQ460, AUQ470, AUQ101, AUQ110, AUQ480, AUQ490, AUQ191, AUQ250, AUQ255, AUQ260, AUQ270, AUQ280, AUQ500, AUQ300, AUQ310, AUQ320, AUQ330, AUQ340, AUQ350, AUQ360, AUQ370, AUQ510, AUQ380, BPQ020, BPQ030, BPD035, BPQ040A, BPQ050A, BPQ080, BPQ060, BPQ070, BPQ090D, BPQ100D, CDQ001, CDQ002, CDQ003, CDQ004, CDQ005, CDQ006, CDQ009A, CDQ009B, CDQ009C, CDQ009D, CDQ009E, CDQ009F, CDQ009G, CDQ009H, CDQ008, CDQ010, CBD071, CBD091, CBD111, CBD121, CBD131, CBQ502, CBQ503, CBQ506, CBQ536, CBQ541, CBQ551, CBQ581, CBQ586, CBQ830, CBQ835, CBQ840, CBQ845, CBQ850, CBQ855, CBQ860, CBQ865, CBQ870, CBQ875, 

In [54]:
# renaming columns into a readable format for analysis
nhs_questionnaire.rename(columns={"MCQ160A" : "Arthritis", "BPQ080" : "Hyperlipidemia", "MCQ160F" : "Stroke", "MCQ080" : "Overweight", "DIQ010" : "Diabetes", "BPQ020": "HighBlood"}, inplace = True)

# remapping health condition codes to booleans for analysis
# (1 = Yes, 2 = No, 7 = Refused, 9 = Don't know; any other code becomes NaN)
health_codes = {1: True, 2: False, 7: False, 9: False}
condition_cols = ["Overweight", "Stroke", "Hyperlipidemia", "Arthritis", "Diabetes", "HighBlood"]
for col in condition_cols:
    nhs_questionnaire[col] = nhs_questionnaire[col].map(health_codes)

# selecting only columns needed for this analysis
nhs_questionnaire = nhs_questionnaire[["Diabetes", "HighBlood", "Arthritis", "Stroke", "Hyperlipidemia", "Overweight"]]
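
To make the effect of this remapping concrete, here is a minimal sketch on a made-up Series of response codes (the values are hypothetical and not taken from the NHANES file). It illustrates that any code missing from `health_codes`, as well as any missing answer, becomes `NaN` under `Series.map`, which is why the later `fillna(False)` step treats those responses as negative too.

```python
import pandas as pd

# same mapping as in the cell above
health_codes = {1: True, 2: False, 7: False, 9: False}

# hypothetical response codes, including one unmapped code (3) and one missing answer
codes = pd.Series([1, 2, 3, 7, 9, None])

# codes absent from the dictionary and missing answers both become NaN
mapped = codes.map(health_codes)

# treating every NaN as "no diagnosis", mirroring the notebook's later fillna(False)
flags = mapped.fillna(False).astype(bool)

print(mapped.isna().sum())  # how many responses ended up as NaN
print(flags.tolist())
```

Under this convention only code 1 ever yields True, so the result matches `codes.eq(1)`. Whether unmapped or missing responses should count as negative is an analysis assumption rather than something the data itself establishes.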

In [55]:
# merging the NHANES demographics and medical questionnaires into one single dataframe
# (SEQN is the shared index level of both dataframes, so it is used as the merge key)
nhanes = nhs_demographic.merge(nhs_questionnaire, on = "SEQN")
nhanes.head()

          Age  Gender  Children Diabetes HighBlood Arthritis Stroke Hyperlipidemia Overweight
SEQN
93703.0   2.0  Female       3.0    False       NaN       NaN    NaN            NaN        NaN
93704.0   2.0    Male       2.0    False       NaN       NaN    NaN            NaN        NaN
93705.0  66.0  Female       0.0    False      True      True  False          False      False
93706.0  18.0    Male       0.0    False     False       NaN    NaN          False      False
93707.0  13.0    Male       3.0    False       NaN       NaN    NaN            NaN        NaN
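
The cells that create `nhs_demographic` and `nhs_questionnaire` are not shown in this notebook. Because SEQN appears as the index of the merged output, the sketch below assumes both frames were indexed by SEQN (for example via `set_index("SEQN")`). The tiny frames `demo` and `quest` and their values are made up for illustration. Under that assumption, `merge(on="SEQN")` matches rows on the shared index level and keeps it as the index of the result.

```python
import pandas as pd

# hypothetical stand-ins for the two NHANES frames, both indexed by SEQN (assumption)
demo = pd.DataFrame(
    {"SEQN": [1, 2, 3], "Age": [66, 18, 13], "Gender": ["Female", "Male", "Male"]}
).set_index("SEQN")
quest = pd.DataFrame(
    {"SEQN": [1, 2, 3], "Diabetes": [False, False, True]}
).set_index("SEQN")

# `on` may name an index level shared by both frames; that level stays as the index
merged = demo.merge(quest, on="SEQN")

print(merged)
```

If SEQN were an ordinary column instead, the same call would work, but the column would then have to be kept when selecting the analysis columns.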


In [56]:
nhanes.info()

#check for missing/null values
print(nhanes.isnull().sum())

<class 'pandas.core.frame.DataFrame'>
Index: 9254 entries, 93703.0 to 102956.0
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             9254 non-null   float64
 1   Gender          9254 non-null   object 
 2   Children        9254 non-null   float64
 3   Diabetes        8713 non-null   object 
 4   HighBlood       6161 non-null   object 
 5   Arthritis       5569 non-null   object 
 6   Stroke          5569 non-null   object 
 7   Hyperlipidemia  6161 non-null   object 
 8   Overweight      6161 non-null   object 
dtypes: float64(2), object(7)
memory usage: 723.0+ KB
Age                  0
Gender               0
Children             0
Diabetes           541
HighBlood         3093
Arthritis         3685
Stroke            3685
Hyperlipidemia    3093
Overweight        3093
dtype: int64


In [57]:
# based on the output above, there are no missing values for Children or Age, only for conditions
# if a condition value is missing, we assume the patient does not have that condition and set it to False
nhanes.fillna(False, inplace=True)
# WGU dataset doesn't include minors, NHANES dataset does, so excluding patients below 18 from NHANES for accurate comparison
# (.copy() so that adding the Source column next does not trigger SettingWithCopyWarning)
nhanes = nhanes[nhanes['Age'] > 17].copy()

In [58]:
# adding a source column to NHANES data
nhanes.loc[:, 'Source'] = "NHS"
nhanes.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5856 entries, 93705.0 to 102956.0
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             5856 non-null   float64
 1   Gender          5856 non-null   object 
 2   Children        5856 non-null   float64
 3   Diabetes        5856 non-null   bool   
 4   HighBlood       5856 non-null   bool   
 5   Arthritis       5856 non-null   bool   
 6   Stroke          5856 non-null   bool   
 7   Hyperlipidemia  5856 non-null   bool   
 8   Overweight      5856 non-null   bool   
 9   Source          5856 non-null   object 
dtypes: bool(6), float64(2), object(2)
memory usage: 263.1+ KB


In [59]:
# saving dataframe to CSV
nhanes.to_csv('NHANES_cleaned.csv', index=False)

In [60]:
# merging datasets into a single one so that it can be imported into Tableau
final_data = pd.concat([wgu, nhanes], ignore_index=True)
print(final_data.head())

# save combined datasets to a CSV
final_data.to_csv('combined_WGU_NHANES.csv', index=False)

    Age  Gender  Children  Arthritis  Diabetes  HighBlood  Hyperlipidemia  \
0  53.0    Male       1.0       True      True       True           False   
1  51.0  Female       3.0      False     False       True           False   
2  53.0  Female       3.0      False      True       True           False   
3  78.0    Male       0.0       True     False      False           False   
4  22.0  Female       1.0      False     False      False            True   

   Overweight  Stroke Source  
0       False   False    WGU  
1        True   False    WGU  
2        True   False    WGU  
3       False    True    WGU  
4       False   False    WGU  


### A2. Dashboard Installation Instructions <a id="installation"></a>

The data dashboard is available via Tableau Public, so no installation is required to view it from the published link: https://public.tableau.com/app/profile/kamal.shaham/viz/MedicalDataDashboardWGUD210/Story1?publish=yes. Because no additional software is needed, the dashboard is accessible to any user with a web browser.

### A3. Dashboard Usage <a id="usage"></a>

Using the Tableau Public link, you can navigate the dashboard through the four tabs located at the top of the page: Introduction, Patient Demographics, Condition Metrics, and Conclusions. The Introduction tab introduces the analyst who created this dashboard and provides a brief overview of the narrative of this analysis.

The Patient Demographics tab offers visualizations about the WGU and NHANES patients, including patient ages, children counts, and gender breakdowns. These visualizations can be filtered using the age and gender filters at the top of each tab. Modifying the age filter allows the visualizations to apply to patients within a specific age range. Setting the age filter between 20 and 30 will only display demographics for patients within this age group. A similar scenario applies to the gender filter; selecting all or specific genders will display visualizations for those selected. The visualizations themselves also have built-in filters. Selecting any of the demographic visualizations will filter the rest of the visualizations—for example, selecting the female portion of the pie chart will cause the rest of the visualizations to display only female patients. Deselecting the visualization will revert the rest of the demographics back to their initial state.

The Condition Metrics tab is also an interactive dashboard where visualizations for each medical condition are displayed. The condition rates for each source of patients are shown as key performance indicators (KPIs) at the top of this dashboard, giving the percentage of patients with each condition. Heatmaps of the condition rates are displayed in 5-year age groups for readability. The NHANES survey contains a little over half as many patients as the WGU dataset (5,856 versus 10,000). Filtering of these visualizations and KPIs is available via the gender and age filters at the top of this dashboard. Selecting any of the visualizations or KPIs also filters the rest of the metrics in the same way as the Patient Demographics tab.

The Conclusions tab provides key takeaways from the previous two tabs and insights backed by evidence from both data sources. These conclusions are intended to present potential areas for initiatives to improve patient care. Hospital executives can make decisions on resources pertaining to diabetes care and care for elderly patients, which are highlighted in this tab to support the narrative of WGU's hospital resource planning. By enhancing care in these two areas, patients are likely to have lower readmission rates as well.

## B. Storytelling with Data - Panopto<a id="story"></a>

https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=5182102c-ac0c-4902-8a71-b17200161555

## C. Dashboard Purpose<a id="dashboard-purpose"></a>

In the previous courses D208 and D209, the WGU medical dataset was used to analyze readmission rates among patients. This dashboard does not aim to analyze readmissions but rather looks at specific trends and demographics within both datasets. By taking a broader approach and examining medical condition rates and demographics among patients, it provides a clearer view of the dataset while highlighting additional insights. These insights, in turn, can lead to initiatives that improve patient care, thus reducing overall readmission rates for WGU's hospital.

For the purpose of this analysis, WGU has a hospital that conducted a study with patients across the country. The demographics tab on the dashboard aims to show the differences and similarities in the WGU study and the NHANES study among patient demographics. By providing gender, age, and children breakdowns for each dataset, WGU can analyze trends among specific demographics for their hospital research. For example, the NHANES study shows a lower count of older patients compared with the WGU patients. This may indicate a need to focus additional resources on caring for older patients based on this data.

The condition metrics tab on this dashboard highlights patient health conditions. By visualizing the breakdown of significant health conditions, leaders at WGU's hospital can gain important insights into where to allocate resources. Patients who are overweight, have diabetes, high blood pressure, or arthritis are displayed in 5-year age groups. Key Performance Indicators (KPIs) are displayed at the top of this dashboard to show the rates of each of these diseases for each dataset. An example from this tab shows that WGU's patients are diagnosed with diabetes at a much higher rate compared with the NHANES patients. This insight can lead to an increase in screenings and preventative care for these patients.

By analyzing trends and insights from this dashboard as highlighted in the conclusions tab, the WGU hospital can gain a better understanding of their patients. While this analysis does not directly address readmission rates, the demographics and critical condition rates of their patients are important to any hospital. Ideally, enhancing patient care and focusing resources based on this analysis will lead to a decrease in hospital readmission rates overall.

### C2. Additional Dataset Insights<a id="dataset-insights"></a>

According to the WGU data dictionary, patient conditions are noted at the time of admission. However, it is not clear from this dataset whether these condition rates are normal or average for a patient sample of this size. Based on the scenario, the hospital chain has patients in almost every state. Ideally, this is a consistent sample that allows analysts to gain insights into this data without having doubts about its accuracy.

The National Health & Nutrition Examination Survey (NHANES) 2017-2018 is the most recent NHANES cycle whose data collection was not affected by COVID-19, which could otherwise have influenced the results. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The survey examines a nationally representative sample of about 5,000 persons each year.

The NHANES study collects hundreds of columns of data from participants. Using the data dictionary provided by the CDC, the following columns were converted to a readable format and compared with the results from the WGU study:

- Patient Gender
- Patient Age
- Number of Children
- Patient Overweight
- Patient High Blood Pressure
- Patient Hyperlipidemia
- Patient Stroke
- Patient Arthritis
- Patient Diabetes

By combining WGU's hospital study dataset with the National Health & Nutrition Examination Survey (NHANES) 2017-2018 results, analysts can compare and contrast patient condition rates and demographics. As the NHANES study indicates, patients have much lower rates of diabetes when surveyed across the U.S. compared to the results from WGU's clinical study. From the condition metrics tab, we see that 27.38% of WGU patients aged 18-80 have diabetes, while the NHANES study shows that 14.99% of patients within the same age group have diabetes. Since the NHANES data is collected from across the U.S., the results from WGU appear to be higher than normal for this particular medical condition. This discrepancy can lead to resources being allocated towards patient screenings and care for conditions that are seen more frequently in WGU clinics.
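
The dashboard computes these rates inside Tableau. As a rough cross-check, the same kind of figure could be reproduced in pandas along the following lines. This is a minimal sketch on a made-up miniature of the combined dataset: the `Source` and `Diabetes` column names follow `combined_WGU_NHANES.csv`, but the rows are hypothetical and the printed rates are not the dashboard's.

```python
import pandas as pd

# hypothetical miniature of the combined dataset (values are made up)
combined = pd.DataFrame({
    "Source":   ["WGU", "WGU", "WGU", "WGU", "NHS", "NHS", "NHS", "NHS"],
    "Diabetes": [True, False, True, False, False, False, True, False],
})

# the mean of a boolean column is the share of True values, i.e. the condition rate
rates = combined.groupby("Source")["Diabetes"].mean().mul(100).round(2)

print(rates)
```

Because the rate is a proportion within each source, it can be compared directly even though the two sources differ in size.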

### C3. Data Representations<a id="data-representations"></a>

The Patient Demographics tab displays visualizations of patient ages, gender, and number of children for each patient. The patient age visualizations break down the age groups into 5-year increments for enhanced readability. The height of each bar highlights the distribution of ages across the dataset. A user can quickly compare visualizations from the datasets to see how evenly the age groups are distributed.

The Condition Metrics tab contains metrics of the patients, with the Key Performance Indicators (KPIs) at the top of the dashboard. Patient ages are again broken down into 5-year groups starting from 15-19 (only including patients 18 and older) and increasing by five years for each group. The table at the top displays the condition rates for both datasets for each significant condition as KPIs. Below the condition rates table, heatmaps display the breakdown of conditions by age, so any differences between age groups are visible there. Because the NHANES dataset is a little over half the size of the WGU dataset, raw patient counts will naturally be higher for WGU, so the two sources are best compared on rates rather than counts. The filters at the top of this dashboard allow users to break down these condition rates and heatmaps by both age and gender, providing additional insights.
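
The 5-year grouping itself is done in Tableau. A comparable grouping in pandas might look like the sketch below, which assumes left-closed bins starting at 15 and labels the last bin "80+" because ages are capped at 80 in both datasets. The sample ages, bin edges, and labels are illustrative choices, not the dashboard's exact configuration.

```python
import pandas as pd

# hypothetical sample of patient ages
ages = pd.Series([18, 19, 20, 44, 79, 80])

# bin edges 15, 20, ..., 85 give the groups 15-19, 20-24, ..., 80-84
edges = list(range(15, 90, 5))
labels = [f"{lo}-{lo + 4}" for lo in edges[:-1]]
labels[-1] = "80+"  # ages are capped at 80, so the last group means 80 and over

# right=False makes each bin include its lower edge and exclude its upper edge
age_group = pd.cut(ages, bins=edges, labels=labels, right=False)

print(age_group.astype(str).tolist())
```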

### C4. Dashboard Controls<a id="dashboard-controls"></a>

Both the demographics and condition metrics tabs within the dashboard are interactive, and visualizations on both can be filtered by age or gender via controls at the top of each tab. Modifying the age slider to a specific range will display the visualizations for only that age group. Selecting one or more genders from the gender filter will perform similar filters on the visualizations. By providing these filters, the dashboard offers users a more granular view of the condition rates and demographics for a particular subset of patients. For example, selecting the Male gender within the 40-60 age group can reveal the condition rates for these patients without the need to write a SQL query or generate new graphs.
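
For readers who do want to reproduce such a filter outside Tableau, the selection corresponds roughly to a boolean mask in pandas. The sketch below uses a made-up frame with the combined dataset's column names and assumes the age range is inclusive at both ends, which may differ from how the dashboard slider treats its endpoints.

```python
import pandas as pd

# hypothetical rows using the combined dataset's column names (values are made up)
combined = pd.DataFrame({
    "Source":   ["WGU", "WGU", "NHS", "NHS", "WGU"],
    "Gender":   ["Male", "Female", "Male", "Male", "Male"],
    "Age":      [45, 50, 60, 30, 39],
    "Diabetes": [True, False, False, True, True],
})

# Male patients aged 40 to 60; Series.between is inclusive at both ends by default
mask = (combined["Gender"] == "Male") & combined["Age"].between(40, 60)
subset = combined[mask]

print(subset)
```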

The visualizations themselves are also interactive; selecting any bar within the graphs will filter the rest of the visualizations. For example, selecting patients who have 0 children on the bar graph will filter the other visualizations to show only patients without children. These filters can also be used in combination with the gender and age filters to provide a more comprehensive view of a specific set of patients.

### C5. Accessibility Support<a id="accessibility"></a>

According to the article "The Best Charts for Color Blind Viewers" from Datylon, two main colors, red and blue, can still be distinguished by users with colorblindness. These colors have been primarily selected for visualizations throughout this analysis to support accessibility. Both tabs in the dashboard make use of red and blue in the visualizations, with orange also used as part of the legend for gender visualizations. Based on the graphics from Datylon, users with specific types of colorblindness, such as green-blind or red-blind, will perceive red as a shade of brown. For these users, the blue visualizations should remain the same color.


### C6. Data Representation Insights<a id="data-rep-insights"></a>

For this analysis, WGU has conducted a study with patient medical data at its hospital. The visualizations that support this story by providing interesting insights include the age visualizations and diabetes condition metrics. For WGU, the number of patients at or over the age of 80 is vastly greater than in the NHANES study. By creating an age comparison bar graph between the two sources, the age distributions are clearly shown for each group. Patients aged 80 and over represent the largest number of patients in both datasets, as highlighted in this bar graph.

According to the data dictionary from the CDC, anyone over the age of 80 was categorized as age 80 for the NHANES study. To align with this collection method, patients over age 80 in the WGU study were also categorized as age 80. Despite this distinction, the number of patients aged 80 and over that WGU hospital has cared for is still significantly higher than the samples taken from the NHANES study. This difference raises an interesting question of whether additional resources for elderly patient care should be implemented. By increasing resources for these patients, readmission rates are likely to decrease as well.

Another visualization that supports the story of WGU's hospital clinic is the condition metrics. By comparing condition rates from the NHANES study, WGU can analyze whether these rates or Key Performance Indicators (KPIs) are below or above normal levels. Without a dataset that samples patients from across the country for comparison, this information lacked a reliable frame of reference. The table of condition rates for both sources shows the diabetes metric as being much higher for the WGU hospital compared to the NHANES data. The diabetes heatmap displays these rates by age group and further highlights this particular condition by showing which age groups had the highest levels of diabetes. This insight can potentially influence the support for directing additional resources to diabetes screenings and care to ideally prevent patient readmissions.

### C7. Audience Analysis<a id="audience"></a>

Using the scenario mentioned in the WGU data dictionary, the target audience, including the Senior Vice President of Hospital Operations (SVP), Vice President of Research (VP), a panel of Regional Vice Presidents (Regional VPs), and Data Analytics peers, aligns well with the story presented in this analysis. By focusing on demographics and significant conditions, the comparison of trends against a national sample provided by the CDC provides a concrete frame of reference for this analysis. The insight of higher diabetes rates compared to the NHANES sample would be of interest to the SVP and VP, as they are involved in developing or researching initiatives to improve patient outcomes. A majority of the audience, particularly the SVP, would also likely be interested to note the high number of elderly patients that WGU's hospital is serving. These executives may be able to make new decisions on initiatives and resource allocations from this analysis, specifically within diabetes research or elderly patient care. The data analytics peers will also benefit from visualizations from multiple data sources that support users with colorblindness, along with the various filters and controls that this dashboard supports.

### C8. Universal Access<a id="universal-access"></a>

The combination of Tableau Public and Panopto allows anyone to view either of these presentations with a simple web browser. Expensive or resource-intensive software is not required to access these presentations, making them widely accessible. Panopto provides both visual and audio formats for users with specific impairments. The dashboard also caters to users with colorblindness by intentionally using specific colors. Every visualization includes titles and descriptive labels to support text-to-audio software.


### C9. Storytelling Elements<a id="storytelling"></a>

According to the article "What Is Data Storytelling? Components, Benefits, & Examples" from Unite.AI, data storytelling has three major components: data, narrative, and visuals. The data from both the NHANES and WGU hospital datasets was cleaned and refined and then woven into a narrative that provides a frame of reference for WGU's executives. This narrative focuses on highlighting the increased medical condition rates compared to the NHANES patients, and the large number of elderly patients at WGU when compared to a national sample. Visualizations created to highlight this narrative provide a clearer picture for the average user, who may not have additional research to explain the differences and similarities between the two sources. Key Performance Indicators and visualizations supporting these metrics were also provided as evidence for this narrative’s intended conclusion. These visuals, along with the demographic breakdowns, enhance the dashboard's storytelling purpose and assist in presenting the conclusions of the narrative to the hospital executives.

<a id="thirdparty"></a>
## D. Third-party Code References

Kamal Shaham D209 Task 1 
https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=7cdebfd5-4471-454b-9f5c-b1400072c423

Kamal Shaham D208 Task 2
https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=28e9a423-53ea-4275-a20a-b105003aef5d


<a id="references"></a>
## E. References

Build a bar chart. (n.d.). Tableau.
https://help.tableau.com/current/pro/desktop/en-us/buildexamples_bar.htm

D210 Datasets. (n.d.). WGU Performance Assessment. Tasks.wgu.edu. Retrieved from
https://lrps.wgu.edu/provision/227080088

National Health & Nutrition Exam Survey 2017-2018. (2024, January 12). Kaggle.
https://www.kaggle.com/datasets/rileyzurrin/national-health-and-nutrition-exam-survey-2017-2018

NHANES 2017-2018 Demographics Variable list. (n.d.).
https://wwwn.cdc.gov/nchs/nhanes/search/variablelist.aspx?Component=Demographics&Cycle=2017-2018

Sajid, H. (2023, March 18). What Is Data Storytelling? Components, Benefits, & Examples. Unite.AI. https://www.unite.ai/what-is-data-storytelling-components-benefits-examples/

The best charts for color blind viewers | Blog | Datylon. (n.d.).
https://www.datylon.com/blog/data-visualization-for-colorblind-readers#what-colors-can-they-see