# Visualizing a COVID-19 Hospital Dataset with Seaborn

**Pre-Work:**
1. Ensure that Jupyter Notebook, Python 3, and seaborn are installed. Installing seaborn also installs its dependency libraries if they are missing. (See the resources below for installation instructions.)

### **Instructions:**
1. Using Python, import the main visualization library, `seaborn`, and its dependencies: `pandas`, `numpy`, and `matplotlib`.
2. Define the dataset and read in the data using the pandas function `read_json()`. [Notes: a) we're reading the data from an API endpoint; for more about this, see the associated workshop slides or the resources at the bottom of the notebook. b) If you prefer to use your own data, see the comment with an alternative that uses the `read_csv()` function.]
3. Check that the data has been read in as expected using the `head()` function.
4. Graph two variables with `seaborn` as a line plot using the `lineplot()` function.
5. Graph these same variables, plus a third, from the source dataset with `seaborn` as a scatterplot using the `relplot()` function.
6. See additional methods, using filtered data and other graphs. Feel free to add new cells (or open a new notebook), and try out your own ideas, using different variables or charts.
7. When ready, save the figure using `matplotlib`'s `savefig`.

**Note:**
*If you're new to Jupyter Notebook, see resources below.*

### **Data source:**

[COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh), created by the U.S. Department of Health & Human Services, on [HealthData.gov](https://healthdata.gov/).

In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# read JSON data in via healthdata.gov's API endpoint - https://healthdata.gov/resource/g62h-syeh.json?$limit=50000
# because the SODA API defaults to 1,000 rows, we're going to change that with the $limit parameter
# first, we'll write out the read function as is and look at its output
# then, we'll assign the result of the read function to a variable named 'covid', which we'll use below

# if you want to read in your own data, see resources below, or if you have a CSV, try: mydata = pd.read_csv('')
# and add data filepath inside ''
# be sure to change covid to mydata in code below
# note: when you see In [*] to the left of a cell, that means it's still running; functions like head()
# or charts can sometimes take a while to load
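
A minimal sketch of the read step described in the comments above, assuming the endpoint still returns a JSON array of records. The `load_covid` helper and the inline sample rows are hypothetical, added only so the example runs without a network connection; the sample values are made up and are not real HHS figures.

```python
import io
import pandas as pd

# Endpoint from the comment above; $limit raises the SODA default of 1,000 rows.
URL = "https://healthdata.gov/resource/g62h-syeh.json?$limit=50000"


def load_covid(source=URL):
    """Hypothetical helper: read JSON records from a URL or a file-like object."""
    return pd.read_json(source)


# Offline stand-in with made-up values, shaped like a list of API records.
sample = io.StringIO(
    '[{"state": "CT", "inpatient_beds_used_covid": 120,'
    ' "critical_staffing_shortage_today_yes": 2, "deaths_covid": 1},'
    ' {"state": "CT", "inpatient_beds_used_covid": 340,'
    ' "critical_staffing_shortage_today_yes": 5, "deaths_covid": 4},'
    ' {"state": "NY", "inpatient_beds_used_covid": 560,'
    ' "critical_staffing_shortage_today_yes": 9, "deaths_covid": 7}]'
)

covid = load_covid(sample)  # in the notebook, try: covid = load_covid()
print(covid.head())         # check that the data was read in as expected
```

Calling `load_covid()` with no argument would request the live endpoint instead of the sample, which may take a while for 50,000 rows.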

In [None]:
# use seaborn to plot inpatient beds used versus whether a critical staffing shortage is occurring
# for this instance, we're going to do just a simple line plot
# we also need to tell seaborn what dataset to use; in this case it's 'covid' as defined above
# variables: inpatient_beds_used_covid; critical_staffing_shortage_today_yes


# save and name fig; uncomment below to run
# plt.savefig('covid_lineplot.png')

In [None]:
# now we're going to try another graph type, a relational plot drawn as a scatterplot, with the same variables
# and add one more variable, deaths_covid, to color dots based on prevalence of COVID-19 deaths by setting hue
# though feel free to try new variables by browsing them here (scroll down to Columns in this Dataset): https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh
# variables: inpatient_beds_used_covid; critical_staffing_shortage_today_yes; deaths_covid
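
A sketch of how this cell might look, again using made-up rows in place of the real `covid` data. `relplot()` draws a scatterplot by default, and `hue` maps the third variable to dot color. The axis assignment is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in rows (made-up values); replace with the real `covid` data.
covid = pd.DataFrame({
    "inpatient_beds_used_covid": [120, 340, 560, 910],
    "critical_staffing_shortage_today_yes": [2, 5, 9, 14],
    "deaths_covid": [1, 4, 7, 12],
})

# relplot() defaults to kind="scatter"; hue colors each dot by deaths_covid
g = sns.relplot(
    data=covid,
    x="inpatient_beds_used_covid",
    y="critical_staffing_shortage_today_yes",
    hue="deaths_covid",
)
```

Unlike `lineplot()`, `relplot()` returns a seaborn `FacetGrid` rather than a single `Axes`; the underlying axes are available as `g.ax` when there is only one facet.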



# save and name fig; uncomment below to run
# plt.savefig('covid_scatterplot.png')
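
The instructions also mention working with filtered data. One possible approach, sketched here with made-up rows and assuming the dataset has a `state` column holding two-letter abbreviations, is boolean indexing in pandas before plotting. The names `ct_only` and `busy` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values); replace with the real `covid` data.
covid = pd.DataFrame({
    "state": ["CT", "CT", "NY", "NY"],
    "inpatient_beds_used_covid": [120, 340, 560, 910],
    "critical_staffing_shortage_today_yes": [2, 5, 9, 14],
})

# keep only rows for one state
ct_only = covid[covid["state"] == "CT"]

# keep only rows above a threshold of inpatient beds used
busy = covid[covid["inpatient_beds_used_covid"] > 300]

print(ct_only)
print(busy)
```

Either filtered DataFrame can then be passed as `data=` to `lineplot()` or `relplot()` in place of `covid`.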

If you'd like to see a completed version of this notebook, visit the [GitHub repo](https://github.com/kthrog/dataviz_workshop/tree/main/materials).

#### Code/Tools Resources:
- Jupyter notebook - about: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#introduction
- Jupyter notebook - how to use this tool: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
- Python: https://www.python.org/
- Seaborn: https://seaborn.pydata.org/index.html
- Seaborn tutorial: https://seaborn.pydata.org/tutorial.html
- Seaborn gallery: https://seaborn.pydata.org/examples/index.html
- Seaborn `lineplot()` function: https://seaborn.pydata.org/generated/seaborn.lineplot.html#seaborn.lineplot + https://seaborn.pydata.org/examples/errorband_lineplots.html
- Seaborn `relplot()` function: https://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot + https://seaborn.pydata.org/examples/faceted_lineplot.html
- Pandas: https://pandas.pydata.org/
- Pandas - how to read / write tabular data: https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
- Pandas `read_json()` function: https://pandas.pydata.org/docs/reference/api/pandas.io.json.read_json.html?highlight=read_json#pandas.io.json.read_json
- Pandas `head()` function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html?highlight=head#pandas.DataFrame.head
- Matplotlib: https://matplotlib.org/
- Matplotlib `savefig` function: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
- Socrata Open Data API (SODA) Docs: https://dev.socrata.com/
- SODA Docs for [Dataset](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh): https://dev.socrata.com/foundry/healthdata.gov/g62h-syeh
- SODA Docs - what is an endpoint: https://dev.socrata.com/docs/endpoints.html

#### Visualization Resources:
- 10 Simple Rules for Better Figures | *PLOS Comp Bio*: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833

- How to Choose the Right Data Visualization | *Chartio*: https://chartio.com/learn/charts/how-to-choose-data-visualization/

#### Additional Note:
This notebook was created by Kaitlin Throgmorton for a data analysis workshop, as part of an interview for Yale University.