<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Population" data-toc-modified-id="Population-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Population</a></span></li><li><span><a href="#Reported-tests" data-toc-modified-id="Reported-tests-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Reported tests</a></span><ul class="toc-item"><li><span><a href="#Clackamas,-Multnomah,-Washington-and-Yamhill-counties" data-toc-modified-id="Clackamas,-Multnomah,-Washington-and-Yamhill-counties-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Clackamas, Multnomah, Washington and Yamhill counties</a></span></li><li><span><a href="#State" data-toc-modified-id="State-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>State</a></span></li></ul></li><li><span><a href="#Cases-and-deaths" data-toc-modified-id="Cases-and-deaths-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Cases and deaths</a></span><ul class="toc-item"><li><span><a href="#Counties" data-toc-modified-id="Counties-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Counties</a></span></li><li><span><a href="#State" data-toc-modified-id="State-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>State</a></span><ul class="toc-item"><li><span><a href="#Check-whether-state-level-data-equals-the-sum-of-the-county-level-data" data-toc-modified-id="Check-whether-state-level-data-equals-the-sum-of-the-county-level-data-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Check whether state-level data equals the sum of the county-level data</a></span></li></ul></li></ul></li><li><span><a href="#Calculating-test-and-case-rates" data-toc-modified-id="Calculating-test-and-case-rates-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Calculating test and case rates</a></span><ul class="toc-item"><li><span><a href="#Combine-case,-death-and-test-data" data-toc-modified-id="Combine-case,-death-and-test-data-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Combine case, death and test 
data</a></span></li><li><span><a href="#Integrate-population-data-to-calculate-rates" data-toc-modified-id="Integrate-population-data-to-calculate-rates-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Integrate population data to calculate rates</a></span></li></ul></li><li><span><a href="#Appendix-A:-Mobility-data" data-toc-modified-id="Appendix-A:-Mobility-data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Appendix A: Mobility data</a></span><ul class="toc-item"><li><span><a href="#Pull-Oregon-data-and-restructure" data-toc-modified-id="Pull-Oregon-data-and-restructure-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Pull Oregon data and restructure</a></span></li><li><span><a href="#Describe-my-understanding-of-the-data" data-toc-modified-id="Describe-my-understanding-of-the-data-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Describe my understanding of the data</a></span><ul class="toc-item"><li><span><a href="#Variables" data-toc-modified-id="Variables-5.2.1"><span class="toc-item-num">5.2.1&nbsp;&nbsp;</span>Variables</a></span><ul class="toc-item"><li><span><a href="#Categorical" data-toc-modified-id="Categorical-5.2.1.1"><span class="toc-item-num">5.2.1.1&nbsp;&nbsp;</span>Categorical</a></span></li><li><span><a href="#Numerical" data-toc-modified-id="Numerical-5.2.1.2"><span class="toc-item-num">5.2.1.2&nbsp;&nbsp;</span>Numerical</a></span></li></ul></li><li><span><a href="#m50-index" data-toc-modified-id="m50-index-5.2.2"><span class="toc-item-num">5.2.2&nbsp;&nbsp;</span>m50 index</a></span></li></ul></li></ul></li></ul></div>

In [1]:
import datetime
import glob
import os
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()

# Population

In [2]:
us_pop = pd.read_csv(
    'data/input/co-est2019-alldata.csv',
    dtype={
        'REGION': str,
        'DIVISION': str,
        'STATE': str,
        'COUNTY': str
    },
    encoding="ISO-8859-1")

In [3]:
us_pop['county_fips'] = us_pop['STATE'] + us_pop['COUNTY']

In [4]:
or_population_df = us_pop[(us_pop['STNAME'] == 'Oregon') & (us_pop['COUNTY'] != '000')][[
    'county_fips', 'CTYNAME', 'POPESTIMATE2019'
]].copy()

For expediency, I'm shortening the field name to just `2019pop`. Note that although this comes directly from the U.S. Census Bureau and is a conventional data set for per capita analyses, it's an estimate rather than an actual count (unlike the decennial census), and it describes the population in 2019, the year before the case data.

In [5]:
or_population_df.rename(
    columns={'POPESTIMATE2019': '2019pop',
             'CTYNAME': 'county'}, inplace=True)

In [6]:
or_population_df['county'] = or_population_df['county'].str.replace(
    ' County', '')

As a quality check, we should confirm that this data contains 36 distinct county identifiers, one for each Oregon county.

In [7]:
or_population_df['county_fips'].nunique() == 36

True

In [8]:
or_population_df['county'].nunique() == 36

True
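
The count check above could be extended with a few format checks on the FIPS codes themselves. This is a minimal sketch on a toy frame; `check_county_ids` and `toy_population` are hypothetical names, not part of the notebook, and the three rows stand in for `or_population_df`.

```python
import pandas as pd

# Toy stand-in for or_population_df; only three of Oregon's counties.
toy_population = pd.DataFrame({
    'county_fips': ['41005', '41051', '41067'],
    'county': ['Clackamas', 'Multnomah', 'Washington'],
})


def check_county_ids(df, state_fips, expected_count):
    """Return a dict of boolean checks on county identifiers (hypothetical helper)."""
    fips = df['county_fips']
    return {
        'expected_count': bool(fips.nunique() == expected_count),
        'no_duplicates': bool(not fips.duplicated().any()),
        # County FIPS codes are five digits: two for the state, three for the county.
        'five_digits': bool(fips.str.fullmatch(r'\d{5}').all()),
        'state_prefix': bool(fips.str.startswith(state_fips).all()),
    }


checks = check_county_ids(toy_population, state_fips='41', expected_count=3)
print(checks)
```

For the real data, the call would presumably use `expected_count=36`. Keeping the codes as strings, as the `dtype` argument in the population load does, is what preserves leading zeros for other states.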

# Reported tests

## Clackamas, Multnomah, Washington and Yamhill counties

Weekly positive and negative test counts. So far this relies only on the four counties in the Multnomah County dashboard.

Source: https://multco.us/novel-coronavirus-covid-19/regional-covid-19-data-dashboard

In [9]:
reported_tests = pd.read_csv('data/input/oregon_counties - tests.csv',
                             parse_dates=['date'])

In [10]:
reported_tests

Unnamed: 0,date,county,tests_neg,tests_pos,tests_total
0,2020-02-02,Clackamas,0,0,0
1,2020-02-02,Multnomah,1,0,1
2,2020-02-02,Washington,0,0,0
3,2020-02-02,Yamhill,0,0,0
4,2020-02-09,Clackamas,0,0,0
...,...,...,...,...,...
63,2020-05-17,Yamhill,359,5,364
64,2020-05-24,Clackamas,4,0,4
65,2020-05-24,Multnomah,4,0,4
66,2020-05-24,Washington,1,0,1


## State

Source: The Covid Tracking Project API

To pull the most recent data, run the API call below.

To limit hitting their API when I inevitably make a mess of my environment and restart the kernel, and in case people want to work from a CSV without dealing with this notebook, I'll output the data to a CSV and load it again.
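
One way to fold the "fetch once, then work from the CSV" pattern into a single step is sketched below. `load_or_fetch` is a hypothetical helper, not something the notebook defines, and `fake_fetch` stands in for the API call so the example runs offline.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(csv_path, fetch, refresh=False, **read_csv_kwargs):
    """Read csv_path if it exists; otherwise call fetch() and save the result first.

    Hypothetical helper: `fetch` is any zero-argument function returning a DataFrame.
    """
    if refresh or not os.path.exists(csv_path):
        fetch().to_csv(csv_path, index=False)
    return pd.read_csv(csv_path, **read_csv_kwargs)


# Stand-in for the API call; records how many times it was invoked.
calls = []


def fake_fetch():
    calls.append(1)
    return pd.DataFrame({'fips': ['41'], 'positive': [3949]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'covidtracking_or.csv')
    first = load_or_fetch(path, fake_fetch, dtype={'fips': str})
    second = load_or_fetch(path, fake_fetch, dtype={'fips': str})

print(len(calls))
```

In the notebook, `fetch` could be a `lambda` wrapping the `pd.read_json` call, and passing `refresh=True` would force a new download.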

In [11]:
ctp_case_api = pd.read_json(
    'https://covidtracking.com/api/v1/states/OR/daily.json',
    dtype={'fips': str})

In [12]:
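def output(df, filename, subdir='input', archive=False):
    """Write df to data/<subdir>/<filename>.csv, optionally keeping an archive copy."""
    if archive:
        # %p is AM or PM, so the archive keeps at most two snapshots per day.
        timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%p')
        df.to_csv(f'data/archive/{filename}_{timestamp}.csv', index=False)
    df.to_csv(f'data/{subdir}/{filename}.csv', index=False)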

In [13]:
output(ctp_case_api, 'covidtracking_or', archive=True)

In [14]:
ctp_case_df = pd.read_csv('data/input/covidtracking_or.csv',
                          dtype={'fips': str},
                          parse_dates=['date', 'lastUpdateEt', 'dateChecked'])

In [15]:
ctp_case_df.head()

Unnamed: 0,date,state,positive,negative,pending,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inIcuCumulative,onVentilatorCurrently,...,hospitalized,total,totalTestResults,posNeg,fips,deathIncrease,hospitalizedIncrease,negativeIncrease,positiveIncrease,totalTestResultsIncrease
0,2020-05-25,OR,3949,109909,,125.0,747.0,35.0,,23.0,...,747.0,113858,113858,113858,41,0.0,5.0,1641.0,22.0,1663.0
1,2020-05-24,OR,3927,108268,,117.0,742.0,35.0,,18.0,...,742.0,112195,112195,112195,41,1.0,2.0,1949.0,39.0,1988.0
2,2020-05-23,OR,3888,106319,,102.0,740.0,41.0,,15.0,...,740.0,110207,110207,110207,41,0.0,3.0,2344.0,24.0,2368.0
3,2020-05-22,OR,3864,103975,,146.0,737.0,47.0,,16.0,...,737.0,107839,107839,107839,41,2.0,5.0,2568.0,47.0,2615.0
4,2020-05-21,OR,3817,101407,,140.0,732.0,40.0,,14.0,...,732.0,105224,105224,105224,41,1.0,9.0,3059.0,16.0,3075.0


# Cases and deaths

## Counties

Same deal as above re: using a local CSV vs. grabbing the data from the source URL.

In [16]:
nyt_county_url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
nyt_counties = pd.read_csv(nyt_county_url,
                           dtype={
                               'fips': str,
                               'cases': int,
                               'deaths': int
                           },
                           parse_dates=['date'])

In [17]:
output(nyt_counties, 'nyt_us-counties', archive=True)

In [18]:
nyt_counties = pd.read_csv('data/input/nyt_us-counties.csv',
                           dtype={'fips': str},
                           parse_dates=['date'])

In [19]:
nyt_counties.rename(columns={'fips': 'county_fips'}, inplace=True)

In [20]:
or_counties_df = nyt_counties[nyt_counties['state'] == 'Oregon'].copy()

In [21]:
or_counties_df.head()

Unnamed: 0,date,county,state,county_fips,cases,deaths
369,2020-02-28,Washington,Oregon,41067,1,0
391,2020-02-29,Washington,Oregon,41067,1,0
417,2020-03-01,Washington,Oregon,41067,2,0
449,2020-03-02,Washington,Oregon,41067,2,0
484,2020-03-03,Washington,Oregon,41067,2,0


Check the number of unique counties in this data.

In [22]:
or_counties_df['county_fips'].nunique()

33

Determine which counties do not exist in the case data.

In [23]:
missing = set(or_population_df['county_fips']) - \
    set(or_counties_df['county_fips'])
or_population_df[or_population_df['county_fips'].isin(missing)]

Unnamed: 0,county_fips,county,2019pop
2256,41021,Gilliam,1912
2264,41037,Lake,7869
2280,41069,Wheeler,1332


We should check regardless, but these counties' absence calls for a comparison of the state- and county-level datasets we're pulling from NYT's repository.
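
The set difference used to find the missing counties can also be written as a merge with `indicator=True`, which keeps the result in a DataFrame. A minimal sketch on toy data; the frames below are small stand-ins for `or_population_df` and `or_counties_df`.

```python
import pandas as pd

# Toy stand-ins for the population and case data.
population = pd.DataFrame({
    'county_fips': ['41021', '41051', '41067'],
    'county': ['Gilliam', 'Multnomah', 'Washington'],
})
cases = pd.DataFrame({
    'county_fips': ['41051', '41067', '41067'],
    'cases': [1, 1, 2],
})

# Left-join population onto the distinct FIPS codes seen in the case data.
seen = cases[['county_fips']].drop_duplicates()
flagged = population.merge(seen, on='county_fips', how='left', indicator=True)

# 'left_only' marks counties with population data but no case rows.
missing_counties = flagged.loc[flagged['_merge'] == 'left_only', 'county'].tolist()
print(missing_counties)
```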

In [24]:
or_counties_aggregate = or_counties_df.groupby(
    ['date'])[['cases', 'deaths']].sum().reset_index()

In [25]:
or_counties_aggregate.head()

Unnamed: 0,date,cases,deaths
0,2020-02-28,1,0
1,2020-02-29,1,0
2,2020-03-01,2,0
3,2020-03-02,2,0
4,2020-03-03,2,0


## State

Same deal as above.

In [26]:
nyt_state_url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'

In [27]:
nyt_states = pd.read_csv(nyt_state_url,
                         dtype={
                             'fips': str,
                             'cases': int,
                             'deaths': int
                         },
                         parse_dates=['date'])

In [28]:
output(nyt_states, 'nyt_us-states', archive=True)

In [29]:
nyt_states = pd.read_csv('data/input/nyt_us-states.csv',
                         dtype={'fips': str},
                         parse_dates=['date'])

In [30]:
nyt_states.rename(columns={'state': 'name'}, inplace=True)

In [31]:
nyt_or = nyt_states[nyt_states['name'] == 'Oregon'].copy()

In [32]:
nyt_or

Unnamed: 0,date,name,fips,cases,deaths
225,2020-02-28,Oregon,41,1,0
235,2020-02-29,Oregon,41,1,0
247,2020-03-01,Oregon,41,2,0
262,2020-03-02,Oregon,41,2,0
278,2020-03-03,Oregon,41,2,0
...,...,...,...,...,...
4343,2020-05-20,Oregon,41,3801,144
4398,2020-05-21,Oregon,41,3817,145
4453,2020-05-22,Oregon,41,3864,147
4508,2020-05-23,Oregon,41,3888,147


### Check whether state-level data equals the sum of the county-level data

In [33]:
or_merged = pd.merge(
    nyt_or[['date', 'cases', 'deaths']],
    or_counties_aggregate,
    on='date',
    how='outer',
    suffixes=('_st', '_co'))

In [34]:
or_merged['cases_diff'] = or_merged['cases_st'] - or_merged['cases_co']

In [35]:
or_merged['deaths_diff'] = or_merged['deaths_st'] - or_merged['deaths_co']

On most days they are identical, but the state count of cases is lower than the county total by 1 on seven days between March 27 and April 4, by 52 on May 13 and by 121 on May 14. The death counts match throughout. Next I might check whether the `cases_diff` value matches any value of a particular county on that day, which could be evidence (but not proof) that this is the origin of the difference.

From now on, I'll proceed with analyses of Oregon on both the county and the state level with the _New York Times_' county-level data, aggregating where I need to.
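
The follow-up check floated above, whether a day's `cases_diff` matches a single county's count, might look something like this. It's a sketch on invented numbers: the county names and counts below are not real data, and a match would only be suggestive.

```python
import pandas as pd

# Toy data: cumulative cases per county on one day (invented values).
county_cases = pd.DataFrame({
    'date': pd.to_datetime(['2020-05-13'] * 3),
    'county': ['A', 'B', 'C'],
    'cases': [52, 300, 7],
})
# Toy state-minus-county difference for the same day.
diffs = pd.DataFrame({
    'date': pd.to_datetime(['2020-05-13']),
    'cases_diff': [-52],
})

# Keep days with a nonzero difference and compare its size with each county's count.
nonzero = diffs[diffs['cases_diff'] != 0]
candidates = county_cases.merge(nonzero, on='date')
matches = candidates[candidates['cases'] == candidates['cases_diff'].abs()]
print(matches[['date', 'county', 'cases']])
```

The same comparison could be run against each county's daily increase rather than its cumulative count, since either could plausibly explain a gap.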

In [36]:
or_merged[(or_merged['cases_diff'] != 0) | (or_merged['deaths_diff'] != 0)]

Unnamed: 0,date,cases_st,deaths_st,cases_co,deaths_co,cases_diff,deaths_diff
28,2020-03-27,414,12,415,12,-1,0
29,2020-03-28,479,13,480,13,-1,0
30,2020-03-29,548,13,549,13,-1,0
31,2020-03-30,606,16,607,16,-1,0
33,2020-04-01,736,19,737,19,-1,0
34,2020-04-02,826,21,827,21,-1,0
36,2020-04-04,999,26,1000,26,-1,0
75,2020-05-13,3416,134,3468,134,-52,0
76,2020-05-14,3479,137,3600,137,-121,0


# Calculating test and case rates

## Combine case, death and test data

First, because the county test data is at this point a weekly number, we'll pull the data for the same days as the county reports (Sundays).

A previous version of this code erroneously aggregated by week, which is inappropriate for data reporting cumulative counts.
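
To see why summing by week goes wrong for cumulative counts, here is a small sketch with an invented series that grows by one case per day. Selecting the Sunday rows gives the running total as of each Sunday, while a weekly sum counts the same cases over and over.

```python
import pandas as pd

# Toy cumulative series for one county, Monday 2020-03-02 through Sunday 2020-03-15.
dates = pd.date_range('2020-03-02', '2020-03-15', freq='D')
toy = pd.DataFrame({'date': dates, 'cases': range(1, 15)})  # cumulative: 1, 2, ..., 14

# Appropriate for cumulative data: take the value reported on each Sunday (weekday 6).
sundays = toy[toy['date'].dt.weekday == 6]

# Inappropriate for cumulative data: summing within each week re-counts earlier cases.
weekly_sum = toy.resample('W', on='date')['cases'].sum()

print(sundays['cases'].tolist())  # [7, 14]
print(weekly_sum.tolist())        # [28, 77]
```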

In [37]:
# Monday is 0 and Sunday is 6.
or_counties_df['weekday'] = or_counties_df['date'].dt.weekday

In [38]:
or_co_weekly_df = or_counties_df[or_counties_df['weekday']==6].copy()

In [39]:
or_co_weekly_df.tail()

Unnamed: 0,date,county,state,county_fips,cases,deaths,weekday
172913,2020-05-24,Union,Oregon,41061,6,0,6
172914,2020-05-24,Wallowa,Oregon,41063,2,0,6
172915,2020-05-24,Wasco,Oregon,41065,18,1,6
172916,2020-05-24,Washington,Oregon,41067,690,17,6
172917,2020-05-24,Yamhill,Oregon,41071,65,7,6


## Integrate population data to calculate rates

In [40]:
weekly_df_with_population = pd.merge(
    or_co_weekly_df, or_population_df, how='left')

In [41]:
rate_df = pd.merge(weekly_df_with_population, reported_tests,
                   on=['date', 'county'], how='outer')

In [42]:
rate_df

Unnamed: 0,date,county,state,county_fips,cases,deaths,weekday,2019pop,tests_neg,tests_pos,tests_total
0,2020-03-01,Washington,Oregon,41067,2.0,0.0,6.0,601592.0,20.0,8.0,28.0
1,2020-03-08,Douglas,Oregon,41019,1.0,0.0,6.0,110980.0,,,
2,2020-03-08,Jackson,Oregon,41029,2.0,0.0,6.0,220944.0,,,
3,2020-03-08,Klamath,Oregon,41035,1.0,0.0,6.0,68238.0,,,
4,2020-03-08,Marion,Oregon,41047,1.0,0.0,6.0,347818.0,,,
...,...,...,...,...,...,...,...,...,...,...,...
327,2020-03-01,Multnomah,,,,,,,36.0,1.0,37.0
328,2020-03-01,Yamhill,,,,,,,10.0,0.0,10.0
329,2020-03-08,Clackamas,,,,,,,111.0,8.0,119.0
330,2020-03-08,Multnomah,,,,,,,288.0,5.0,293.0


In [43]:
def calculate_rate(field, population):
    per_capita = field/population
    rounded_rate = round(per_capita * 100000, 2)
    return rounded_rate
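
As a worked example of the same arithmetic, the sketch below plugs in the Washington County row for 2020-03-01 from the merged table (2 cases, 28 tests, 2019 population estimate of 601,592). `rate_per_100k` is a hypothetical scalar version of `calculate_rate`.

```python
def rate_per_100k(count, population):
    """Same arithmetic as calculate_rate, written for plain numbers."""
    return round(count / population * 100000, 2)


# Washington County on 2020-03-01: 2 cases, 28 tests, population estimate 601,592.
cases_rate = rate_per_100k(2, 601592)
tests_rate = rate_per_100k(28, 601592)
print(cases_rate, tests_rate)  # 0.33 4.65
```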

In [44]:
rate_df['cases_per_100k'] = calculate_rate(
    rate_df['cases'], rate_df['2019pop'])

In [45]:
rate_df['deaths_per_100k'] = calculate_rate(
    rate_df['deaths'], rate_df['2019pop'])

In [46]:
rate_df['tests_per_100k'] = calculate_rate(
    rate_df['tests_total'], rate_df['2019pop'])

In [47]:
output(rate_df, 'oregon_rates_by_county', subdir='output', archive=False)

In [48]:
reported_by_multco = rate_df[rate_df['county'].isin(
    ['Clackamas', 'Multnomah', 'Washington', 'Yamhill'])]

In [49]:
reported_by_multco.head()

Unnamed: 0,date,county,state,county_fips,cases,deaths,weekday,2019pop,tests_neg,tests_pos,tests_total,cases_per_100k,deaths_per_100k,tests_per_100k
0,2020-03-01,Washington,Oregon,41067,2.0,0.0,6.0,601592.0,20.0,8.0,28.0,0.33,0.0,4.65
5,2020-03-08,Washington,Oregon,41067,8.0,0.0,6.0,601592.0,170.0,9.0,179.0,1.33,0.0,29.75
6,2020-03-15,Clackamas,Oregon,41005,1.0,0.0,6.0,418187.0,516.0,20.0,536.0,0.24,0.0,128.17
13,2020-03-15,Multnomah,Oregon,41051,1.0,1.0,6.0,812855.0,1730.0,41.0,1771.0,0.12,0.12,217.87
16,2020-03-15,Washington,Oregon,41067,13.0,0.0,6.0,601592.0,697.0,64.0,761.0,2.16,0.0,126.5


# Appendix A: Mobility data

I put this in an appendix because it's fascinating data, but I don't know enough about the data capture or analysis methods by the company producing it to integrate it into the information above.

That said, I think it's worth noting that I learned about this data set from an excellent _[Tampa Bay Times](https://tampabay.com/news/health/2020/05/10/how-florida-slowed-coronavirus-everyone-stayed-home-before-they-were-told-to/)_ analysis of evidence that Florida may not have had an infection rate as bad as people anticipated in part because people stayed home before the statewide stay-at-home order. The authors ran their findings by experts.

## Pull Oregon data and restructure

In [50]:
mobility_data = pd.read_csv(
    'https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv',
    dtype={'fips': str},
    parse_dates=['date']
)

In [51]:
mobility_data

Unnamed: 0,date,country_code,admin_level,admin1,admin2,fips,samples,m50,m50_index
0,2020-03-01,US,1,Alabama,,01,133826,8.331,79
1,2020-03-02,US,1,Alabama,,01,143632,10.398,98
2,2020-03-03,US,1,Alabama,,01,146009,10.538,100
3,2020-03-04,US,1,Alabama,,01,149352,10.144,96
4,2020-03-05,US,1,Alabama,,01,144109,10.982,104
...,...,...,...,...,...,...,...,...,...
222937,2020-05-19,US,2,Wyoming,Uinta County,56041,726,4.451,142
222938,2020-05-20,US,2,Wyoming,Uinta County,56041,794,4.378,139
222939,2020-05-21,US,2,Wyoming,Uinta County,56041,757,5.642,180
222940,2020-05-22,US,2,Wyoming,Uinta County,56041,742,6.247,199


In [52]:
or_state_mobility = mobility_data[mobility_data['fips'] == '41'].copy()

In [53]:
or_state_mobility.drop(
    labels=['country_code', 'admin_level', 'admin1', 'admin2'], axis=1, inplace=True)

In [54]:
output(or_state_mobility, 'oregon_mobility_state', subdir='appendix', archive=True)

In [55]:
or_county_mobility = mobility_data[(mobility_data['admin_level'] == 2) & (
    mobility_data['admin1'] == 'Oregon')].copy()

In [56]:
or_county_mobility.drop(
    labels=['country_code', 'admin_level', 'admin1'], axis=1, inplace=True)

In [57]:
or_county_mobility.rename(
    columns={'admin2': 'county', 'fips': 'county_fips'}, inplace=True)

In [58]:
or_county_mobility['county'] = or_county_mobility['county'].str.replace(
    ' County', '')

In [59]:
output(or_county_mobility, 'oregon_mobility_county', subdir='appendix', archive=True)

## Describe my understanding of the data

In [60]:
sample = or_county_mobility[or_county_mobility['county'] == 'Multnomah']

### Variables
#### Categorical

- **Admin**

  - “For a canonical position of each processed node (such as the first or last report of the day in local time, or a derived location such as the centroid of node locations) we reverse geocode the location to a country and administrative region.”

  - **Admin1** is the first-order administrative division, being a primary administrative division of a country, such as a state in the United States.

  - **Admin2** is a second-order administrative division, such as a county or borough in the United States.
  
  

- **Name**
  - "Name is a populated place feature, representing a city, town, village, or other agglomeration of buildings where people live and work."

#### Numerical

- **$m_{max}$**
  - “The maximum Haversine (great circle) distance (in units of km) from the initial location report of the day.”
  - “Outliers (which can occur with bad position reports caused by poor GPS fixes or other errors) are trimmed by eliminating the top 10% of the distribution.”


- **$m50$**
  - The median of **$m_{max}$** values for each area of interest, e.g. **Admin1** or **Admin2**.


- **$m50_{norm}$**

  - “A ‘normal’ value of $m50$ in a region, defined as the median $m50$ in that region during a designated earlier time period.”
  - “We use the median weekday value of $m50$ between the dates of 2020-02-17 and 2020-03-07 as $m50_{norm}$ to investigate COVID-19 related changes in the US.”
  

- **$m50_{index}$**

  - $m50_{index} = 100\frac{m50}{m50_{norm}}$
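
The chain of definitions above, from $m_{max}$ to $m50$ to $m50_{norm}$ to $m50_{index}$, can be sketched on toy data. The distances below are invented, the baseline window is assumed to run from 2020-02-17 through 2020-03-07, and the outlier trimming the source describes is left out, so this shows the arithmetic rather than reproducing the published method.

```python
import pandas as pd

# Toy per-device maximum distances (km) for one region; values are invented.
m_max = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-17'] * 3 + ['2020-02-18'] * 3 + ['2020-03-21'] * 3),
    'm_max_km': [4.0, 5.0, 6.0, 3.0, 5.0, 9.0, 1.0, 2.0, 6.0],
})

# m50: the median of m_max for each day.
m50 = m_max.groupby('date')['m_max_km'].median()

# m50_norm: the median weekday m50 over the assumed baseline window.
baseline = m50.loc['2020-02-17':'2020-03-07']
baseline_weekdays = baseline[baseline.index.weekday < 5]
m50_norm = baseline_weekdays.median()

# m50_index: each day's m50 as a percentage of the baseline.
m50_index = 100 * m50 / m50_norm
print(m50_index.round().astype(int).tolist())  # [100, 100, 40]
```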

### m50 index

Just to repeat, this is the equation from the paper:

$$m50_{index} = 100\frac{m50}{m50_{norm}}$$

Here's an example of its being applied to real data.

In [61]:
sample.head()

Unnamed: 0,date,county,county_fips,samples,m50,m50_index
159008,2020-03-01,Multnomah,41051,8938,2.939,58
159009,2020-03-02,Multnomah,41051,9142,4.482,89
159010,2020-03-03,Multnomah,41051,9440,4.783,95
159011,2020-03-04,Multnomah,41051,9637,5.023,100
159012,2020-03-05,Multnomah,41051,9275,5.354,106


`samples` for March 1, 2020 equals 8938, meaning that this day's data corresponds to movements of 8,938 anonymized devices located in the Multnomah County area _at the beginning of the day_, which is taken to mean that the device belongs to a person living or staying in Multnomah County at the time.

`m50` starts with the farthest distance, in kilometers, that each device got from its first reported location that day, $m_{max}$. The researchers then calculated the median of all of these values.

  - On March 1, 2020, `m50` is 2.939, which means that half of the devices this day -- 4,469 -- traveled less than 2.939 kilometers (or about 1.8 miles) from where they started.
  
  
**Important**

$m50_{norm}$ is a median of _these medians_ over a time period that the researchers understand to correspond to typical movement among devices in a particular region (in this case, Multnomah County).

Again, $m50_{index}$ equals the following, multiplied by 100:

$$\frac{m50}{m50_{norm}}$$

or

$$\frac{\mathrm{median\ movement,\ e.g.\ on\ March\ 1}}{\mathrm{typical\ daily\ movement\ before\ the\ pandemic}}$$

So if the $m50_{index}$ for March 1 is 58, this means that the median of the distances traveled (2.939 kilometers) is 0.58 times the typical distance prior to the pandemic, which puts that typical distance at about 5.1 kilometers.
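
The inversion behind that estimate can be written out as follows. The point estimate uses only the March 1 row; the range additionally assumes that the published index is rounded to the nearest integer, which is my assumption rather than something the source states.

```python
m50_march_1 = 2.939   # km, from the sample table
index_march_1 = 58    # published m50_index for the same day

# Point estimate of the baseline implied by the index.
implied_norm = m50_march_1 / (index_march_1 / 100)

# If the index is rounded to the nearest integer (an assumption),
# the unrounded ratio lies somewhere between 0.575 and 0.585.
norm_low = m50_march_1 / 0.585
norm_high = m50_march_1 / 0.575

print(round(implied_norm, 2), round(norm_low, 2), round(norm_high, 2))  # 5.07 5.02 5.11
```

Under that assumption, any baseline between roughly 5.02 and 5.11 kilometers would produce an index of 58 for this day.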

We can test this understanding by selecting a day (like March 4) with an index of 100, which would mean that the $m50$ that day is about what was typical:

In [62]:
sample[sample['date'] == '2020-03-04']

Unnamed: 0,date,county,county_fips,samples,m50,m50_index
159011,2020-03-04,Multnomah,41051,9637,5.023,100


In [63]:
typical = 5.023

In [64]:
sample[sample['date'] == '2020-03-01']

Unnamed: 0,date,county,county_fips,samples,m50,m50_index
159008,2020-03-01,Multnomah,41051,8938,2.939,58


In [65]:
march_1 = 2.939

In [66]:
march_1 - typical * .58

0.025660000000000238

This is a difference of about 0.026 kilometers, or a bit under 1% of the March 1 value, which is not surprising given rounding errors!

Okay.

If you got this far, here's code you can run to output this data to something manageable in visualization software. The data as of 5/24 is in the `appendix` subdirectory.

In [67]:
# m50 is a median distance in kilometers, so despite the column name this is a median in miles.
or_county_mobility['avg_miles_traveled'] = or_county_mobility['m50']/1.609

In [68]:
or_county_mobility['pct_change'] = (or_county_mobility['m50_index']-100)/100

In [69]:
output(or_county_mobility, 'ocm_processed', subdir='appendix', archive=True)