# COVID Weekly

author: camen piho  
last run: 28 December 2020

---

1. [Description](#Description)
1. [Installation](#Installation)
1. [COVID-19 by Week-Zip](#COVID-19-by-Week-Zip)

## Description

## Installation
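
The notebook imports `pandas` and `requests` and reads a Socrata app token from the `SOCRATICA_APP_TOKEN` environment variable. The preflight check below can report what is missing before the download cell runs. It is a minimal sketch that assumes those three items are the only requirements. `missing_requirements` is a hypothetical helper, not part of any library.

```python
import importlib.util
import os
from typing import List, Mapping, Optional

# Assumed requirements, taken from the notebook's imports and its token lookup
REQUIRED_MODULES = ("pandas", "requests")
TOKEN_VAR = "SOCRATICA_APP_TOKEN"


def missing_requirements(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return uninstalled module names, plus the token variable if it is unset or empty."""
    env = os.environ if environ is None else environ
    # find_spec returns None when a top-level module cannot be located
    missing = [name for name in REQUIRED_MODULES if importlib.util.find_spec(name) is None]
    if not env.get(TOKEN_VAR):
        missing.append(TOKEN_VAR)
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "all requirements found")
```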

## COVID-19 by Week-Zip

[Published](https://dev.socrata.com/foundry/data.cityofchicago.org/yhhz-zm2v) by the City of Chicago through [Socrata](https://dev.socrata.com/).

Uses version 2.1 of the Socrata Open Data API (SODA).
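
SODA endpoints take SoQL query parameters such as `$limit`, `$offset`, and `$order` in the query string. The download cell below sends the app token in the `X-App-Token` header. The sketch below assembles such a request with the standard library. `soda_csv_url` and `soda_headers` are hypothetical names. Ordering by `row_id` is an assumption, chosen so that pages come back in a stable order.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://data.cityofchicago.org/resource"


def soda_csv_url(dataset_id: str, limit: int, offset: int = 0, order: str = "row_id") -> str:
    """Build a CSV endpoint URL carrying SoQL paging parameters."""
    params = {"$limit": limit, "$offset": offset, "$order": order}
    # safe="$" keeps the SoQL parameter names readable instead of encoding "$" as %24
    return f"{BASE_URL}/{dataset_id}.csv?{urlencode(params, safe='$')}"


def soda_headers(app_token: Optional[str]) -> Dict[str, str]:
    """Send the app token header only when a token is available."""
    return {"X-App-Token": app_token} if app_token else {}


print(soda_csv_url("yhhz-zm2v", limit=1000))
# https://data.cityofchicago.org/resource/yhhz-zm2v.csv?$limit=1000&$offset=0&$order=row_id
```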

### Columns

- zip_code
    - (string) Home ZIP Code of the people tested as given by the medical provider.
    - Will only include Chicago ZIP codes.
- week_number
    - (number) Sequential count of weeks, starting at the beginning of 2020.
- week_start
    - (floating timestamp) First date of the week.
- week_end
    - (floating timestamp) Last date of the week.
- cases_weekly
    - (number) Total number of cases in the week.
    - If the cumulative total (cases_cumulative) for the ZIP code is fewer than 5, then this will be NULL to protect privacy.
- cases_cumulative
    - (number) Total number of cases up to and including the week.
    - If the cumulative total for the ZIP code is fewer than 5, then this will be NULL to protect privacy. Therefore, a blank indicates a number between 0 and 4.
- case_rate_weekly
    - (number) Case rate per 100,000 population in the week.
- case_rate_cumulative
    - (number) Total case rate per 100,000 population for all cases up to and including the week.
- tests_weekly
    - (number) Number of tests in the week.
    - Tests performed prior to March 1, 2020 are not included.
    - Each test is counted separately, so one person can account for multiple tests.
    - Only molecular (PCR) tests reported to the Chicago Department of Public Health (CDPH) through electronic lab reporting are included.
    - Due to procedural difficulties, the number of tests reported prior to May 1, 2020 will be an undercount of community infection.
    - This should be treated as a low-end estimate.
- tests_cumulative
    - (number) Total number of tests up to and including the week.
    - Tests performed prior to March 1, 2020 are not included.
    - Each test is counted separately, so one person can account for multiple tests.
    - Only molecular (PCR) tests reported to CDPH through electronic lab reporting are included.
    - Due to procedural difficulties, the number of tests reported prior to May 1, 2020 will be an undercount of community infection.
    - This should be treated as a low-end estimate.
- test_rate_weekly
    - (number) Test rate per 100,000 population in the week.
    - Tests performed prior to March 1, 2020 are not included.
    - Each test is counted separately, so one person can account for multiple tests.
    - Only molecular (PCR) tests reported to CDPH through electronic lab reporting are included.
    - Due to procedural difficulties, the number of tests reported prior to May 1, 2020 will be an undercount of community infection.
    - This should be treated as a low-end estimate.
- test_rate_cumulative
    - (number) Test rate per 100,000 population up to and including the week.
    - Tests performed prior to March 1, 2020 are not included.
    - Each test is counted separately, so one person can account for multiple tests.
    - Only molecular (PCR) tests reported to CDPH through electronic lab reporting are included.
    - Due to procedural difficulties, the number of tests reported prior to May 1, 2020 will be an undercount of community infection.
    - This should be treated as a low-end estimate.
- percent_tested_positive_weekly
    - (number) Percentage of tests returning positive results in the week, based on specimen collection date.
    - If the cumulative total (cases_cumulative) for the ZIP code is fewer than 5, then this will be NULL to protect privacy.
    - Calculated by dividing the number of positive tests by the total number of tests.
    - Note that "total tests" includes individuals being tested multiple times.
    - This should be treated as a high-end estimate.
- percent_tested_positive_cumulative
    - (number) Percentage of tests returning positive results up to and including the week, based on specimen collection date.
    - If the cumulative total (cases_cumulative) for the ZIP code is fewer than 5, then this will be NULL to protect privacy.
    - Calculated by dividing the number of positive tests by the total number of tests.
    - Note that "total tests" includes individuals being tested multiple times.
    - This should be treated as a high-end estimate.
- deaths_weekly
    - (number) Number of deaths in the week.
- deaths_cumulative
    - (number) Number of deaths up to and including the week.
- death_rate_weekly
    - (number) Death rate per 100,000 population in the week.
- death_rate_cumulative
    - (number) Death rate per 100,000 population up to and including the week.
- population
    - (number) ZIP code population.
- row_id
    - (string) Unique identifier for the row.
    - Combination of the ZIP code and the week number.
- zip_code_location
    - (point) A point within the ZIP code to allow for geographic analysis.
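
Several columns are derived from others. The sketch below assumes that the rate columns are plain per-100,000 scalings of the counts by `population`. It also assumes that the percent-positive columns are positive tests divided by total tests, expressed on a 0 to 100 scale. Under those assumptions, it shows the arithmetic and the small-count suppression rule described above. Rounding to one decimal place mirrors the rate values shown later in the notebook. The function names are hypothetical, and the portal's exact computation is not documented here.

```python
from typing import Optional


def rate_per_100k(count: Optional[float], population: Optional[float]) -> Optional[float]:
    """Scale a count to a rate per 100,000 residents (assumed one-decimal rounding)."""
    if count is None or not population:
        return None
    return round(count / population * 100_000, 1)


def percent_positive(positive_tests: Optional[float], total_tests: Optional[float]) -> Optional[float]:
    """Share of tests that came back positive, as a percentage (assumed 0 to 100 scale)."""
    if positive_tests is None or not total_tests:
        return None
    return round(positive_tests / total_tests * 100, 1)


def suppress_small(value: Optional[float], cases_cumulative: Optional[float], threshold: int = 5) -> Optional[float]:
    """Blank out a value when the ZIP code's cumulative case count is below the threshold."""
    if cases_cumulative is None or cases_cumulative < threshold:
        return None
    return value


print(rate_per_100k(10, 20_000))              # 50.0
print(percent_positive(12, 150))              # 8.0
print(suppress_small(3, cases_cumulative=4))  # None
```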

```python
import io
import os

import pandas as pd
import requests

# Socrata app token read from the environment (raises KeyError if unset)
SOCRATICA_APP_TOKEN = os.environ["SOCRATICA_APP_TOKEN"]
```

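```python
# SODA returns at most 1,000 rows unless $limit is set, so ask for far more rows than the dataset holds
response = requests.get(
    "https://data.cityofchicago.org/resource/yhhz-zm2v.csv",
    headers={"X-App-Token": SOCRATICA_APP_TOKEN},
    params={"$limit": int(1e9)},
)
```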

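```python
# Show the error body, then stop rather than parse an error message as CSV
if not response.ok:
    print(response.content)
    response.raise_for_status()
```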
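
The request above gets everything in one response by setting `$limit` very high. An alternative that keeps each response bounded is to page through the results. Each request sets `$limit` and `$offset`, ideally with a fixed `$order`, and the loop stops when a page comes back short. The sketch below shows only the loop. `fetch_page` is a hypothetical callable that would wrap `requests.get` with those parameters. Here it is replaced by an in-memory stand-in so the logic runs offline.

```python
from typing import Callable, Iterator, List, TypeVar

Row = TypeVar("Row")


def fetch_all(fetch_page: Callable[[int, int], List[Row]], page_size: int = 1000) -> Iterator[Row]:
    """Yield rows page by page until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for the API: 2520 fake rows served in slices, recording each requested offset
fake_rows = [{"row_id": i} for i in range(2520)]
calls = []


def fake_fetch(limit: int, offset: int) -> List[dict]:
    calls.append(offset)
    return fake_rows[offset:offset + limit]


rows = list(fetch_all(fake_fetch))
print(len(rows), calls)  # 2520 [0, 1000, 2000]
```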

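```python
# Parse the CSV body; keep ZIP codes as strings (the column also holds "Unknown")
with io.BytesIO(response.content) as buffer:
    df = pd.read_csv(buffer, dtype={"zip_code": str}, parse_dates=["week_start", "week_end"])

# Reduce the floating timestamps to ISO date strings (e.g. "2020-03-01") for readable column labels
df["week_start"] = df["week_start"].dt.date.astype(str)
df["week_end"] = df["week_end"].dt.date.astype(str)
```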
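
The `.dt.date.astype(str)` step turns each floating timestamp into a plain `YYYY-MM-DD` label. Outside pandas, the same reduction can be done with the standard library. This sketch assumes the raw values arrive in ISO 8601 form, such as `2020-03-01T00:00:00.000`. `to_iso_date` is a hypothetical helper.

```python
from datetime import datetime


def to_iso_date(floating_timestamp: str) -> str:
    """Drop the time part of an ISO 8601 timestamp and keep YYYY-MM-DD."""
    # fromisoformat (Python 3.7+) accepts 3- or 6-digit fractional seconds
    return datetime.fromisoformat(floating_timestamp).date().isoformat()


print(to_iso_date("2020-03-01T00:00:00.000"))  # 2020-03-01
```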

```python
len(df)
```

```
2520
```
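
The `df.pivot` call below requires each (`zip_code`, `week_start`) pair to appear at most once. Counting the pairs confirms that. The count also shows whether every ZIP code has a row for every week. The sketch below runs on hypothetical sample rows. On the full data, the pairs would come from `zip(df.zip_code, df.week_start)`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def panel_summary(pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Count rows against distinct ZIP codes x weeks, and flag duplicate (zip, week) pairs."""
    counts = Counter(pairs)
    zip_codes = {zip_code for zip_code, _ in counts}
    weeks = {week for _, week in counts}
    return {
        "rows": sum(counts.values()),
        # one row per ZIP code per week would give exactly this many rows
        "expected_rows": len(zip_codes) * len(weeks),
        "duplicates": sorted(pair for pair, n in counts.items() if n > 1),
    }


sample = [("60601", "2020-03-01"), ("60601", "2020-03-08"), ("60602", "2020-03-01"), ("60602", "2020-03-08")]
print(panel_summary(sample))
# {'rows': 4, 'expected_rows': 4, 'duplicates': []}
```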

```python
# Distinct ZIP code values, including an "Unknown" bucket
print(df.zip_code.unique())
```

```
['60603' '60604' '60602' '60611' '60601' '60619' '60606' '60607' '60609'
 '60608' '60605' '60610' '60621' '60633' '60636' '60612' '60625' '60626'
 '60613' '60614' '60615' '60622' '60640' '60642' '60643' '60618' '60645'
 '60630' '60631' '60616' '60617' '60646' '60647' '60639' '60649' '60644'
 '60827' '60620' '60623' '60656' '60629' '60624' '60637' '60655' '60666'
 '60628' '60638' '60652' '60634' '60632' '60654' '60657' '60659' '60660'
 'Unknown' '60641' '60651' '60653' '60661' '60707']
```

```python
print(df.week_start.min(), "to", df.week_end.max())
```

```
2020-03-01 to 2020-12-19
```
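
Counting whole weeks in that range gives 42. The ZIP code list above has 60 values, including `Unknown`. Since 42 × 60 = 2520, which matches `len(df)`, the data appears to hold one row per ZIP code per week. The arithmetic is sketched below with the standard library. `count_weeks` is a hypothetical helper, and the 60 is hard-coded from the output above.

```python
from datetime import date


def count_weeks(first_start: str, last_end: str) -> int:
    """Whole 7-day weeks from the first week_start to the last week_end, inclusive."""
    span_days = (date.fromisoformat(last_end) - date.fromisoformat(first_start)).days + 1
    if span_days % 7:
        raise ValueError(f"{span_days} days is not a whole number of weeks")
    return span_days // 7


weeks = count_weeks("2020-03-01", "2020-12-19")
print(weeks, weeks * 60)  # 42 2520
```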


```python
df.columns
```

```
Index(['zip_code', 'week_number', 'week_start', 'week_end', 'cases_weekly',
       'cases_cumulative', 'case_rate_weekly', 'case_rate_cumulative',
       'tests_weekly', 'tests_cumulative', 'test_rate_weekly',
       'test_rate_cumulative', 'percent_tested_positive_weekly',
       'percent_tested_positive_cumulative', 'deaths_weekly',
       'deaths_cumulative', 'death_rate_weekly', 'death_rate_cumulative',
       'population', 'row_id', 'zip_code_location', 'tested_positive_weekly',
       'tested_positive_cumulative'],
      dtype='object')
```
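
The API returns two columns that the column list at the top does not describe: `tested_positive_weekly` and `tested_positive_cumulative`. A set difference makes this kind of schema drift easy to spot. The sketch below hard-codes both lists from this notebook rather than reading them from the API. On live data, `df.columns` could be passed as `returned`.

```python
DOCUMENTED = [
    "zip_code", "week_number", "week_start", "week_end",
    "cases_weekly", "cases_cumulative", "case_rate_weekly", "case_rate_cumulative",
    "tests_weekly", "tests_cumulative", "test_rate_weekly", "test_rate_cumulative",
    "percent_tested_positive_weekly", "percent_tested_positive_cumulative",
    "deaths_weekly", "deaths_cumulative", "death_rate_weekly", "death_rate_cumulative",
    "population", "row_id", "zip_code_location",
]

# As printed by df.columns: the documented columns followed by two extra ones
RETURNED = DOCUMENTED + ["tested_positive_weekly", "tested_positive_cumulative"]


def schema_drift(documented, returned):
    """Columns present in only one of the two lists."""
    return {
        "undocumented": sorted(set(returned) - set(documented)),
        "missing": sorted(set(documented) - set(returned)),
    }


print(schema_drift(DOCUMENTED, RETURNED))
# {'undocumented': ['tested_positive_cumulative', 'tested_positive_weekly'], 'missing': []}
```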

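```python
# One row per ZIP code, one column per (measure, week_start) pair
parsed = df.pivot(
    index="zip_code",
    columns="week_start",
    values=["cases_weekly", "cases_cumulative", "case_rate_weekly"],
)
# Flatten the MultiIndex columns into labels such as "cases_weekly_2020-03-01"
parsed.columns = ["_".join(col) for col in parsed.columns]
```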
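
`df.pivot` reshapes the long table, with one row per ZIP code and week, into a wide one with one row per ZIP code. The list comprehension then flattens each `(measure, week_start)` column pair into a single label. The sketch below does the same reshaping in plain Python on two hypothetical rows modeled on the `60601` output. Like `pivot`, it raises an error on duplicate (index, column) entries rather than silently overwriting them. `pivot_wide` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Mapping


def pivot_wide(rows: Iterable[Mapping], index: str, columns: str, values: List[str]) -> Dict[str, Dict[str, object]]:
    """Long-to-wide reshape with flattened '<value>_<column>' labels."""
    wide: Dict[str, Dict[str, object]] = {}
    for row in rows:
        out = wide.setdefault(row[index], {})
        for value in values:
            label = f"{value}_{row[columns]}"
            if label in out:
                raise ValueError(f"duplicate entry for {row[index]!r}, {label!r}")
            out[label] = row[value]
    return wide


rows = [
    {"zip_code": "60601", "week_start": "2020-03-08", "cases_weekly": None},
    {"zip_code": "60601", "week_start": "2020-03-15", "cases_weekly": 10.0},
]
print(pivot_wide(rows, "zip_code", "week_start", ["cases_weekly"]))
# {'60601': {'cases_weekly_2020-03-08': None, 'cases_weekly_2020-03-15': 10.0}}
```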

```python
parsed.head()
```

| zip_code | cases_weekly_2020-03-01 | cases_weekly_2020-03-08 | cases_weekly_2020-03-15 | cases_weekly_2020-03-22 | cases_weekly_2020-03-29 | cases_weekly_2020-04-05 | cases_weekly_2020-04-12 | cases_weekly_2020-04-19 | cases_weekly_2020-04-26 | cases_weekly_2020-05-03 | ... | case_rate_weekly_2020-10-11 | case_rate_weekly_2020-10-18 | case_rate_weekly_2020-10-25 | case_rate_weekly_2020-11-01 | case_rate_weekly_2020-11-08 | case_rate_weekly_2020-11-15 | case_rate_weekly_2020-11-22 | case_rate_weekly_2020-11-29 | case_rate_weekly_2020-12-06 | case_rate_weekly_2020-12-13 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 60601 |  |  | 10.0 | 9.0 | 5.0 | 5.0 | 3.0 | 7.0 | 13.0 | 7.0 | ... | 245.3 | 190.8 | 258.9 | 402.0 | 497.4 | 354.3 | 327.1 | 313.5 | 177.2 | 374.8 |
| 60602 |  |  |  |  |  |  | 2.0 | 0.0 | 0.0 | 3.0 | ... | 482.3 | 160.8 | 241.2 | 643.1 | 321.5 | 401.9 | 562.7 | 160.8 | 80.4 | 160.8 |
| 60603 |  |  |  |  |  |  |  |  |  |  | ... | 170.4 | 0.0 | 255.5 | 0.0 | 425.9 | 766.6 | 425.9 | 255.5 | 85.2 | 170.4 |
| 60604 |  |  |  |  |  |  |  | 2.0 | 5.0 | 2.0 | ... | 127.9 | 127.9 | 767.3 | 1150.9 | 895.1 | 1023.0 | 767.3 | 255.8 | 255.8 | 0.0 |
| 60605 |  |  | 20.0 | 16.0 | 13.0 | 13.0 | 28.0 | 25.0 | 17.0 | 14.0 | ... | 159.9 | 265.3 | 290.7 | 381.6 | 425.2 | 570.5 | 210.8 | 276.2 | 268.9 | 221.7 |


```python
parsed.to_csv("example_output.csv")
```
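
`to_csv` writes the index (`zip_code`) as the first column, followed by one column per flattened label. By default it writes missing values as empty fields. If the wide table were held as plain dictionaries instead, the standard `csv` module could produce a file of the same shape. The sketch below writes to an in-memory buffer and writes `None` as an empty field. `write_wide_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_wide_csv(wide: Dict[str, Dict[str, object]], labels: List[str], stream: TextIO, index_name: str = "zip_code") -> None:
    """Write one header row, then one row per index key in sorted order."""
    writer = csv.writer(stream)
    writer.writerow([index_name, *labels])
    for key in sorted(wide):
        cells = wide[key]
        # None (suppressed or absent) becomes an empty field
        writer.writerow([key, *("" if cells.get(label) is None else cells[label] for label in labels)])


buffer = io.StringIO()
write_wide_csv(
    {"60601": {"cases_weekly_2020-03-08": None, "cases_weekly_2020-03-15": 10.0}},
    ["cases_weekly_2020-03-08", "cases_weekly_2020-03-15"],
    buffer,
)
print(buffer.getvalue())
```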