<a href="https://colab.research.google.com/github/forzana/Traccinate/blob/main/traccinate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pearl Hacks 2021
## Project: Visualizing Vaccination Data
### Tora Mullings and Forzana Rime

First, install the packages we need. We recommend conda (`conda install -c plotly plotly=4.14.3` and `conda install -c conda-forge ipywidgets`), but `pip` works as well and is what the cell below uses.

In [3]:
%pip install plotly==4.14.3
%pip install ipywidgets



## I. Get updated vaccination data
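US vaccination data is published as a CSV file in the Our World in Data `covid-19-data` GitHub repository and is updated daily. The cell below downloads it only if the `vaccination_data` directory does not already exist, so delete that directory to fetch a fresh copy.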

In [15]:
%%bash
DATADIR="vaccination_data"
if test ! -d "$DATADIR";then
    echo "Creating $DATADIR dir"
    mkdir "$DATADIR"
    cd "$DATADIR"
    wget https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv
fi
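
Because the download is skipped whenever the directory already exists, a later run reuses whatever file is on disk. A minimal sketch of a freshness check in Python follows. The helper name `needs_refresh` and the 24-hour threshold are assumptions for illustration and are not part of the original notebook.

```python
import os
import time

def needs_refresh(path, max_age_hours=24, now=None):
    """Return True if `path` is missing or older than `max_age_hours`.

    Hypothetical helper, not part of the original notebook.
    """
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > max_age_hours * 3600

csv_path = os.path.join("vaccination_data", "us_state_vaccinations.csv")
if needs_refresh(csv_path):
    print("Data is missing or stale; download it again.")
else:
    print("Data is fresh enough; reuse the local copy.")
```

The check only looks at the file's modification time, so it says nothing about whether the upstream file has actually changed.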

## II. Load Python modules and data
Import libraries we want to use.

In [16]:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

Select the fields we want to look at and print out the first ten rows.

In [17]:
selected_variables = ['date',
                      'location',
                      'people_vaccinated',
                      'people_fully_vaccinated_per_hundred',
                      'people_fully_vaccinated',
                      'people_vaccinated_per_hundred',
                      'daily_vaccinations_per_million']

selected_data = pd.read_csv('vaccination_data/us_state_vaccinations.csv', 
                                dtype=str, encoding='unicode_escape',
                                usecols=selected_variables)

# Preview the first 10 rows of the data subset.

selected_data.head(10)

| | date | location | people_vaccinated | people_fully_vaccinated_per_hundred | people_fully_vaccinated | people_vaccinated_per_hundred | daily_vaccinations_per_million |
|---|---|---|---|---|---|---|---|
| 0 | 2021-01-12 | Alabama | 70861.0 | 0.15 | 7270.0 | 1.44 | |
| 1 | 2021-01-13 | Alabama | 74792.0 | 0.19 | 9245.0 | 1.52 | 1205.0 |
| 2 | 2021-01-14 | Alabama | 80480.0 | | | 1.64 | 1445.0 |
| 3 | 2021-01-15 | Alabama | 86956.0 | 0.27 | 13488.0 | 1.77 | 1525.0 |
| 4 | 2021-01-16 | Alabama | | | | | 1529.0 |
| 5 | 2021-01-17 | Alabama | | | | | 1531.0 |
| 6 | 2021-01-18 | Alabama | | | | | 1533.0 |
| 7 | 2021-01-19 | Alabama | 114319.0 | 0.33 | 16346.0 | 2.33 | 1534.0 |
| 8 | 2021-01-20 | Alabama | 121113.0 | 0.37 | 17956.0 | 2.47 | 1607.0 |
| 9 | 2021-01-21 | Alabama | 144429.0 | 0.43 | 21345.0 | 2.95 | 2145.0 |
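
To illustrate what `usecols` and `dtype=str` do in the `read_csv` call, here is a small sketch that reads an in-memory CSV instead of the downloaded file. The sample rows are illustrative, and `extra_column` is a hypothetical column that exists only to be skipped.

```python
import io
import pandas as pd

# Tiny illustrative sample; `extra_column` is hypothetical and only there to be skipped.
sample_csv = io.StringIO(
    "date,location,people_vaccinated,extra_column\n"
    "2021-01-12,Alabama,70861.0,a\n"
    "2021-01-13,Alabama,74792.0,b\n"
    "2021-01-14,Alabama,,c\n"
)

# usecols keeps only the listed columns; dtype=str reads every value as text,
# so nothing is converted to a number yet and empty cells show up as missing.
sample = pd.read_csv(sample_csv, dtype=str,
                     usecols=["date", "location", "people_vaccinated"])

print(sample.columns.tolist())
print(sample["people_vaccinated"].tolist())
```

Reading everything as text defers type decisions to a later, explicit conversion step.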


Check the length of the DataFrame and print a summary of its columns.

In [18]:
len(selected_data)

2623

In [19]:
selected_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2623 entries, 0 to 2622
Data columns (total 7 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   date                                 2623 non-null   object
 1   location                             2623 non-null   object
 2   people_vaccinated                    2295 non-null   object
 3   people_fully_vaccinated_per_hundred  2075 non-null   object
 4   people_fully_vaccinated              2213 non-null   object
 5   people_vaccinated_per_hundred        2152 non-null   object
 6   daily_vaccinations_per_million       2367 non-null   object
dtypes: object(7)
memory usage: 143.6+ KB
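
The non-null counts reported by `info()` imply missing values in several columns. A short sketch of counting them directly, shown on a made-up three-row frame rather than the real data:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "location": ["Alabama", "Alabama", "Alabama"],
    "people_vaccinated": [70861.0, np.nan, 80480.0],
    "people_fully_vaccinated": [7270.0, np.nan, np.nan],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the notebook's data, `selected_data.isna().sum()` should match what you get by subtracting each non-null count from the 2623 total rows.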


## III. Change data types of columns
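Every column was loaded as a string, so the numeric ones need converting. `pd.to_numeric` returns `float64` for all five, including the whole-number counts, because the columns contain missing values.

people vaccinated --> float (whole-number count)

people fully vaccinated per hundred (percentage of population) --> float

people fully vaccinated --> float (whole-number count)

people vaccinated per hundred (percentage of population) --> float

daily vaccinations per million --> float (whole-number count)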


In [20]:
selected_data['people_vaccinated'] = pd.to_numeric(selected_data['people_vaccinated'])
selected_data['people_fully_vaccinated_per_hundred'] = pd.to_numeric(selected_data['people_fully_vaccinated_per_hundred'])
selected_data['people_fully_vaccinated'] = pd.to_numeric(selected_data['people_fully_vaccinated'])
selected_data['people_vaccinated_per_hundred'] = pd.to_numeric(selected_data['people_vaccinated_per_hundred'])
selected_data['daily_vaccinations_per_million'] = pd.to_numeric(selected_data['daily_vaccinations_per_million'])

# print data types to see changes
selected_data.dtypes

date                                    object
location                                object
people_vaccinated                      float64
people_fully_vaccinated_per_hundred    float64
people_fully_vaccinated                float64
people_vaccinated_per_hundred          float64
daily_vaccinations_per_million         float64
dtype: object
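
The count columns show up as `float64` in the output above. If whole-number storage matters, pandas' nullable `Int64` dtype is one option. The sketch below uses made-up values and is an alternative, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up string values with one missing entry, mimicking a column read with dtype=str.
raw = pd.Series(["70861.0", "74792.0", np.nan], dtype=object, name="people_vaccinated")

as_float = pd.to_numeric(raw)               # NaN forces a float dtype
as_nullable_int = as_float.astype("Int64")  # nullable integer; NaN becomes <NA>

print(as_float.dtype)
print(as_nullable_int.dtype)
```

The cast to `Int64` only succeeds when every non-missing value is a whole number, which is worth checking before applying it to real data.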

Add a column for daily vaccinations per hundred, which reads as a percentage of the population: a rate per million divided by 10,000 is a rate per hundred.

In [21]:
# A rate per million divided by 10,000 is a rate per hundred.
# Vectorized division propagates NaN, so no row loop or try/except is needed.
selected_data['daily_vaccinations_per_hundred'] = selected_data['daily_vaccinations_per_million'] / 10000

new_data = selected_data.drop(columns=['daily_vaccinations_per_million'])
new_data.head(5)

| | date | location | people_vaccinated | people_fully_vaccinated_per_hundred | people_fully_vaccinated | people_vaccinated_per_hundred | daily_vaccinations_per_hundred |
|---|---|---|---|---|---|---|---|
| 0 | 2021-01-12 | Alabama | 70861.0 | 0.15 | 7270.0 | 1.44 | |
| 1 | 2021-01-13 | Alabama | 74792.0 | 0.19 | 9245.0 | 1.52 | 0.1205 |
| 2 | 2021-01-14 | Alabama | 80480.0 | | | 1.64 | 0.1445 |
| 3 | 2021-01-15 | Alabama | 86956.0 | 0.27 | 13488.0 | 1.77 | 0.1525 |
| 4 | 2021-01-16 | Alabama | | | | | 0.1529 |


## IV. Export as new .csv
This will be useful for anyone who wants to use the data we extracted.

In [11]:
new_data.to_csv(r'us_vaccinations_new.csv', index=False, encoding='utf-8')
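
The `index=False` argument matters for anyone reading the file back. As a rough illustration on made-up rows, written to in-memory buffers rather than to disk, the sketch below compares the columns seen on reload with and without the index.

```python
import io
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "date": ["2021-01-12", "2021-01-13"],
    "location": ["Alabama", "Alabama"],
    "daily_vaccinations_per_hundred": [np.nan, 0.1205],
})

# Without the index: the reloaded frame has exactly the original columns.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())

# With the index: the reloaded frame gains an extra unnamed first column.
buffer_with_index = io.StringIO()
toy.to_csv(buffer_with_index)
buffer_with_index.seek(0)
restored_with_index = pd.read_csv(buffer_with_index)
print(restored_with_index.columns.tolist())
```

Missing values are written as empty fields and come back as `NaN`, so the gaps in the data survive the round trip.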

## V. Graph It!

Create an interactive graph showing the percentage of the population that is fully vaccinated (2 doses) in each state or territory. The dropdown menu changes which location is displayed on the graph.

In [35]:
jurisdictions=["Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", 
               "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", 
               "Federated States of Micronesia", "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", 
               "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Marshall Islands", "Maryland",
               "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", 
               "Nebraska", "New Hampshire", "New Jersey", "New Mexico", "New York State", "Nevada", 
               "North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", 
               "Pennsylvania", "Puerto Rico", "Republic of Palau", "Rhode Island", "South Carolina", 
               "South Dakota", "Tennessee", "Texas", "United States", "Utah", "Vermont", "Virginia", "Virgin Islands", 
               "Washington", "West Virginia", "Wisconsin", "Wyoming"]

# Default graph is of entire United States
def FullyVaccinated(Location="United States"):
    date = new_data.loc[new_data["location"]== Location, "date"]
    percentageVaccinatedTotal = new_data.loc[new_data["location"]== Location, "people_fully_vaccinated_per_hundred"]

    fig = go.Figure()
    # Create and style traces
    fig.add_trace(go.Scatter(x=date, y=percentageVaccinatedTotal, name=Location,
                             connectgaps=True,
                             line=dict(color='firebrick', width=2)))

    # Edit the layout
    fig.update_layout(title='Total Percentage of Population Vaccinated Against Covid in ' + Location,
                       xaxis_title='Date',
                       yaxis_title='Percentage of Population')


    fig.show()

# drop down menu for location
interact(FullyVaccinated, Location=jurisdictions)

interactive(children=(Dropdown(description='Location', index=51, options=('Alabama', 'Alaska', 'American Samoa…

<function __main__.FullyVaccinated>

This graph shows the percentage of the population vaccinated per day. We wanted to see whether the number of vaccines administered per day increased over time.

In [37]:
# Default graph is of entire United States
def DailyVaccinations(Location="United States"):
    date = new_data.loc[new_data["location"]== Location, "date"]
    percentageDaily = new_data.loc[new_data["location"]== Location, "daily_vaccinations_per_hundred"]

    fig = go.Figure()
    # Create and style traces
    fig.add_trace(go.Scatter(x=date, y=percentageDaily, name=Location,
                             connectgaps=True,
                             line=dict(color='firebrick', width=2)))

    # Edit the layout
    fig.update_layout(title='Percentage of Population Vaccinated Against Covid per Day in ' + Location,
                       xaxis_title='Date',
                       yaxis_title='Percentage of Population')


    fig.show()

# drop down menu for location
interact(DailyVaccinations, Location=jurisdictions)

interactive(children=(Dropdown(description='Location', index=51, options=('Alabama', 'Alaska', 'American Samoa…

<function __main__.DailyVaccinations>

This graph shows the percentage of the population that has been given at least one dose of the vaccine.

In [44]:
# Default graph is of entire United States
def OneDoseOrMore(Location="United States"):
    date = new_data.loc[new_data["location"]== Location, "date"]
    percentageOneDose = new_data.loc[new_data["location"]== Location, "people_vaccinated_per_hundred"]

    fig = go.Figure()
    # Create and style traces
    fig.add_trace(go.Scatter(x=date, y=percentageOneDose, name=Location,
                             connectgaps=True,
                             line=dict(color='firebrick', width=2)))

    # Edit the layout
    fig.update_layout(title='Total Percentage of Population Vaccinated (At Least 1 Dose) in ' + Location,
                       xaxis_title='Date',
                       yaxis_title='Percentage of Population')


    fig.show()

# drop down menu for location
interact(OneDoseOrMore, Location=jurisdictions)

interactive(children=(Dropdown(description='Location', index=51, options=('Alabama', 'Alaska', 'American Samoa…

<function __main__.OneDoseOrMore>

Finally, we want to display some raw data. This graph shows the number of people who have been fully vaccinated with 2 doses of the COVID-19 vaccine.

In [46]:
# Default graph is of entire United States
def PeopleFullyVacc(Location="United States"):
    date = new_data.loc[new_data["location"]== Location, "date"]
    numberFullyVaccinated = new_data.loc[new_data["location"]== Location, "people_fully_vaccinated"]

    fig = go.Figure()
    # Create and style traces
    fig.add_trace(go.Scatter(x=date, y=numberFullyVaccinated, name=Location,
                             connectgaps=True,
                             line=dict(color='firebrick', width=2)))

    # Edit the layout
    fig.update_layout(title='Total Number of People Fully Vaccinated with 2 Doses in ' + Location,
                       xaxis_title='Date',
                       yaxis_title='Number of People')


    fig.show()

# drop down menu for location
interact(PeopleFullyVacc, Location=jurisdictions)

interactive(children=(Dropdown(description='Location', index=51, options=('Alabama', 'Alaska', 'American Samoa…

<function __main__.PeopleFullyVacc>
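
The hard-coded `jurisdictions` list may not match the `location` values in the CSV exactly, in which case a dropdown choice would produce an empty graph. One alternative, sketched here on made-up rows, derives the dropdown options from the data itself.

```python
import pandas as pd

# Made-up rows for illustration only, including a duplicate and a missing value.
toy = pd.DataFrame({
    "location": ["Texas", "Alabama", "Texas", "United States", None],
})

# Unique, non-missing location names in alphabetical order.
options = sorted(toy["location"].dropna().unique())
print(options)  # ['Alabama', 'Texas', 'United States']
```

In the notebook this would read `sorted(new_data["location"].dropna().unique())`, and the result could be passed to `interact` in place of `jurisdictions`.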