# Data from HTML tables: L.A. County COVID cases

In [1]:
import pandas as pd  # pd.read_html also needs lxml (or beautifulsoup4 + html5lib) installed

#### L.A. County COVID cases, deaths by community

In [2]:
url = 'http://publichealth.lacounty.gov/media/Coronavirus/locations.htm'

#### Read the "CITY/COMMUNITY" cases table into a dataframe

In [3]:
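# read_html returns a list of DataFrames, one per <table> on the page;
# index 1 was the CITY/COMMUNITY table when this notebook was run
df = pd.read_html(url)[1]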
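Because the `[1]` index depends on how many tables the page happens to contain, it can silently point at the wrong table if the layout changes. A minimal sketch of a more defensive lookup, assuming the wanted table is the one whose header mentions `CITY/COMMUNITY`. The helper `find_table` and the two toy frames are hypothetical stand-ins for what `read_html` returns. (`pd.read_html` also accepts a `match=` argument that keeps only tables containing matching text; the result is still a list.)

```python
import pandas as pd

def find_table(tables, keyword):
    """Return the first DataFrame whose column labels mention `keyword`.

    Hypothetical helper: `tables` is a list like the one pd.read_html returns.
    """
    for table in tables:
        # Column labels may be tuples (multi-row headers), so stringify them
        header_text = ' '.join(str(col) for col in table.columns)
        if keyword.lower() in header_text.lower():
            return table
    raise ValueError(f'no table with a header containing {keyword!r}')

# Toy stand-ins for the list that read_html would return
tables = [
    pd.DataFrame({'Total cases': [1]}),
    pd.DataFrame({'CITY/COMMUNITY': ['City of Avalon'], 'Cases': [62]}),
]
df = find_table(tables, 'CITY/COMMUNITY')
print(df)
```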

#### Define cleaner column names

In [4]:
# Rename the five columns by position (the rates are per 100,000 residents)
df.columns = ['place', 'cases', 'case_rate', 'deaths', 'death_rate']

In [5]:
df.head(10)

| | place | cases | case_rate | deaths | death_rate |
|---|---|---|---|---|---|
| 0 | City of Agoura Hills | 4088 | 19576.0 | 22 | 105.0 |
| 1 | City of Alhambra | 16488 | 19012.0 | 249 | 287.0 |
| 2 | City of Arcadia | 7867 | 13622.0 | 163 | 282.0 |
| 3 | City of Artesia | 4109 | 24466.0 | 88 | 524.0 |
| 4 | City of Avalon | 62 | 1602.0 | 0 | 0.0 |
| 5 | City of Azusa | 13590 | 27158.0 | 155 | 310.0 |
| 6 | City of Baldwin Park | 22676 | 29538.0 | 383 | 499.0 |
| 7 | City of Bell | 13697 | 37700.0 | 138 | 380.0 |
| 8 | City of Bell Gardens | 14013 | 32535.0 | 134 | 311.0 |
| 9 | City of Bellflower | 23061 | 29666.0 | 252 | 324.0 |


---

#### How many deaths in Los Angeles - Del Rey? Encino?

In [6]:
df[df['place'].str.contains('Del Rey')].sort_values('deaths', ascending=False).head(10)

| | place | cases | case_rate | deaths | death_rate |
|---|---|---|---|---|---|
| 112 | Los Angeles - Del Rey | 5852 | 19548.0 | 42 | 140.0 |
| 172 | Los Angeles - Playa Del Rey | 397 | 12422.0 | 3 | 94.0 |
| 246 | Unincorporated - Del Rey | 87 | 27358.0 | 1 | 314.0 |
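`str.contains('Del Rey')` is a substring match, which is why Playa Del Rey and unincorporated Del Rey come back too. A minimal sketch of an exact lookup with `isin`, plus a regex alternation (`|`) that would also cover the Encino half of the question. It runs on the three rows printed above; the label `'Los Angeles - Encino'` is a hypothetical guess at how the page names that neighborhood.

```python
import pandas as pd

# The three rows returned by the Del Rey search above
df = pd.DataFrame({
    'place': ['Los Angeles - Del Rey', 'Los Angeles - Playa Del Rey',
              'Unincorporated - Del Rey'],
    'cases': [5852, 397, 87],
    'deaths': [42, 3, 1],
})

# Exact match: keep only rows whose name is exactly one of these labels
wanted = ['Los Angeles - Del Rey', 'Los Angeles - Encino']  # Encino label assumed
exact = df[df['place'].isin(wanted)]
print(exact[['place', 'deaths']])

# Substring match with regex alternation: any name containing either term
either = df[df['place'].str.contains('Del Rey|Encino')]
print(len(either))
```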


#### How many cases were there in places outside the city of Los Angeles? [Hint](https://www.google.com/search?q=Python+pandas+string+does+not+contain&oq=Python+pandas+string+does+not+contain)

In [7]:
df[~df['place'].str.contains('Los Angeles')]['cases'].sum()

1454356
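The `~` operator flips a boolean mask elementwise, so `~df['place'].str.contains(...)` selects every row that does *not* match. A minimal sketch on five rows from the outputs above; passing `na=False` is an added safeguard (not in the original cell) that makes a missing place name count as a non-match, so the mask stays purely boolean before it is negated.

```python
import pandas as pd

# A few rows from the outputs above
df = pd.DataFrame({
    'place': ['City of Agoura Hills', 'City of Alhambra',
              'Los Angeles - Del Rey', 'Los Angeles - Playa Del Rey',
              'Unincorporated - Del Rey'],
    'cases': [4088, 16488, 5852, 397, 87],
})

# True where the name mentions Los Angeles; missing names become False
in_la = df['place'].str.contains('Los Angeles', na=False)

# Sum cases over the rows where the mask is False
outside_total = df.loc[~in_la, 'cases'].sum()
print(outside_total)  # 20663
```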

#### Which cities/neighborhoods with more than 10,000 cases have the highest case rates?

In [8]:
df[df['cases'] > 10000].sort_values('case_rate', ascending=False).head(10)

| | place | cases | case_rate | deaths | death_rate |
|---|---|---|---|---|---|
| 220 | Los Angeles - Wholesale District* | 16159 | 44726.0 | 134 | 371.0 |
| 66 | City of San Fernando | 10367 | 42122.0 | 80 | 325.0 |
| 113 | Los Angeles - Downtown* | 11530 | 41917.0 | 65 | 236.0 |
| 166 | Los Angeles - Pacoima | 31875 | 41407.0 | 279 | 362.0 |
| 208 | Los Angeles - Vernon Central | 21527 | 41400.0 | 194 | 373.0 |
| 125 | Los Angeles - Florence-Firestone | 19303 | 40685.0 | 154 | 325.0 |
| 191 | Los Angeles - Sylmar* | 33213 | 40308.0 | 305 | 370.0 |
| 260 | Unincorporated - Florence-Firestone | 26055 | 40267.0 | 254 | 393.0 |
| 104 | Los Angeles - Century Palms/Cove | 13406 | 39703.0 | 130 | 385.0 |
| 90 | Los Angeles - Arleta | 13560 | 39453.0 | 133 | 387.0 |
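pandas also has `nlargest`, which expresses "sort descending, keep the first n" in one call. A minimal sketch on a few rows from the outputs above; City of Avalon (62 cases) is included only to show the size filter removing it.

```python
import pandas as pd

# A few rows from the outputs above
df = pd.DataFrame({
    'place': ['City of San Fernando', 'Los Angeles - Pacoima',
              'Los Angeles - Arleta', 'City of Avalon',
              'Los Angeles - Wholesale District*'],
    'cases': [10367, 31875, 13560, 62, 16159],
    'case_rate': [42122.0, 41407.0, 39453.0, 1602.0, 44726.0],
})

# Keep places with more than 10,000 cases, then take the 3 highest case rates
top = df[df['cases'] > 10000].nlargest(3, 'case_rate')
print(top[['place', 'case_rate']])
```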


---

#### Bonus: Create a true/false column for Los Angeles

In [9]:
df['in_la'] = df['place'].str.contains('Los Angeles')
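`str.contains('Los Angeles')` flags any name containing the phrase anywhere. If the table also lists an unincorporated area whose name includes it (the row `Unincorporated - East Los Angeles` below is a hypothetical example; check the page), that area would be counted as part of the city, here and in the outside-Los Angeles sum earlier. A stricter sketch matches the `Los Angeles - ` prefix that city neighborhoods carry in the outputs above:

```python
import pandas as pd

df = pd.DataFrame({'place': [
    'City of Alhambra',
    'Los Angeles - Pacoima',
    'Unincorporated - Florence-Firestone',
    'Unincorporated - East Los Angeles',  # hypothetical row, for illustration
]})

# Substring test: also catches the unincorporated East Los Angeles row
df['contains_la'] = df['place'].str.contains('Los Angeles', na=False)

# Prefix test: only rows named like 'Los Angeles - <neighborhood>'
df['in_la'] = df['place'].str.startswith('Los Angeles - ', na=False)

print(df)
```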

#### Bonus: Were there more cases in the city of Los Angeles or in the rest of the county?

In [10]:
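# value_counts() counts rows, so these are the shares of listed places
# (not of cases) outside (False) and inside (True) the city of Los Angeles
(df['in_la'].value_counts(normalize=True)*100).round()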

False    58.0
True     42.0
Name: in_la, dtype: float64
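Because `value_counts` tallies rows, answering the question about cases means summing the `cases` column within each `in_la` group and dividing by the overall total. A minimal sketch on four rows from the previews above, so its percentages are illustrative only:

```python
import pandas as pd

# A few rows from the outputs above
df = pd.DataFrame({
    'place': ['City of Alhambra', 'City of Bellflower',
              'Los Angeles - Pacoima', 'Los Angeles - Sylmar*'],
    'cases': [16488, 23061, 31875, 33213],
})
df['in_la'] = df['place'].str.contains('Los Angeles')

# Sum cases within each group, then express each sum as a percent of all cases
case_share = (df.groupby('in_la')['cases'].sum() / df['cases'].sum() * 100).round()
print(case_share)
```

On the full table the same expression gives the case split; the row split above and the case split can differ a lot, because places vary widely in size.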

#### Discussion: How would you determine whether the rest of the county had disproportionately more cases than the city of Los Angeles, relative to population?
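One possible approach, sketched under stated assumptions: the rates are cases per 100,000 residents (4,088 cases at a rate of 19,576 works out to about 20,900 residents in Agoura Hills), so each place's population can be estimated as `cases / case_rate * 100_000`. Comparing each group's share of cases with its share of estimated population then shows whether it carries more than its proportional share. The `est_pop` and `ratio` columns are hypothetical additions, the rows come from the previews above, and rounding in the published rates makes the population estimates approximate.

```python
import pandas as pd

# A few rows from the outputs above
df = pd.DataFrame({
    'place': ['City of Alhambra', 'City of Bellflower',
              'Los Angeles - Pacoima', 'Los Angeles - Sylmar*'],
    'cases': [16488, 23061, 31875, 33213],
    'case_rate': [19012.0, 29666.0, 41407.0, 40308.0],
})
df['in_la'] = df['place'].str.contains('Los Angeles')

# Back out population from the rate (assumes rates are per 100,000 residents)
df['est_pop'] = df['cases'] / df['case_rate'] * 100_000

# Group totals, then each group's percent of all cases and all residents
totals = df.groupby('in_la')[['cases', 'est_pop']].sum()
shares = totals / totals.sum() * 100

# Ratio above 1: the group has more than its population share of cases
shares['ratio'] = shares['cases'] / shares['est_pop']
print(shares.round(2))
```

The ratio equals the group's pooled case rate divided by the overall rate, so comparing `totals['cases'] / totals['est_pop'] * 100_000` across groups would answer the same question.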