# Data from HTML tables: L.A. County COVID cases

In [5]:
import pandas as pd

#### L.A. County COVID cases, deaths by community

In [6]:
url = 'http://publichealth.lacounty.gov/media/Coronavirus/locations.htm'

#### Read the "CITY/COMMUNITY" cases table into a dataframe

In [26]:
cases_df = pd.read_html(url)[1]
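`pd.read_html` returns a list with one dataframe per `<table>` on the page, so the hard-coded `[1]` will pick the wrong table if the page layout changes. A minimal sketch of choosing the table by its header text instead is shown here. The `tables` list is a hand-built stand-in for the `read_html` result, and `find_table` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Stand-in for the list that pd.read_html(url) returns: one DataFrame per <table>.
# Both frames are made up for illustration; they are not the real page contents.
tables = [
    pd.DataFrame({"Metric": ["Example total"], "Value": [100]}),
    pd.DataFrame({"CITY/COMMUNITY**": ["City of Agoura Hills"], "Cases": [4094]}),
]


def find_table(frames, header_text):
    """Return the first DataFrame with a column name containing header_text."""
    for frame in frames:
        if any(header_text in str(col) for col in frame.columns):
            return frame
    raise ValueError(f"No table with a column containing {header_text!r}")


community_df = find_table(tables, "CITY/COMMUNITY")
print(community_df)
```

With the real page, the same helper could be called as `find_table(pd.read_html(url), "CITY/COMMUNITY")`, assuming the header text still appears on the page.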

#### Preview the table, then define cleaner column names

In [27]:
cases_df.head()

Unnamed: 0,CITY/COMMUNITY**,Cases,Case Rate1,Deaths,Death Rate2
0,City of Agoura Hills,4094,19604.0,22,105.0
1,City of Alhambra,16498,19024.0,249,287.0
2,City of Arcadia,7869,13625.0,163,282.0
3,City of Artesia,4109,24466.0,88,524.0
4,City of Avalon,62,1602.0,0,0.0


In [28]:
cases_df.columns = ["place", "cases", "cases_rate", "deaths", "death_rate"]
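Assigning a list to `.columns` relies on the columns arriving in exactly this order and number. As an alternative sketch, `DataFrame.rename` with an explicit mapping ties each new name to its original header. The small `raw_df` below is rebuilt from the first two rows of the preview above, and `column_map` is an illustrative name.

```python
import pandas as pd

# Two rows copied from the preview output, with the original headers.
raw_df = pd.DataFrame({
    "CITY/COMMUNITY**": ["City of Agoura Hills", "City of Alhambra"],
    "Cases": [4094, 16498],
    "Case Rate1": [19604.0, 19024.0],
    "Deaths": [22, 249],
    "Death Rate2": [105.0, 287.0],
})

# Map each original header to its cleaner name; order no longer matters.
column_map = {
    "CITY/COMMUNITY**": "place",
    "Cases": "cases",
    "Case Rate1": "cases_rate",
    "Deaths": "deaths",
    "Death Rate2": "death_rate",
}

renamed_df = raw_df.rename(columns=column_map)
print(renamed_df.columns.tolist())
```

Note that `rename` silently skips headers it does not find, so it is worth checking the resulting column list.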

---

In [29]:
cases_df.head()

Unnamed: 0,place,cases,cases_rate,deaths,death_rate
0,City of Agoura Hills,4094,19604.0,22,105.0
1,City of Alhambra,16498,19024.0,249,287.0
2,City of Arcadia,7869,13625.0,163,282.0
3,City of Artesia,4109,24466.0,88,524.0
4,City of Avalon,62,1602.0,0,0.0


#### How many deaths were there in Los Angeles - Del Rey? In Encino?

In [30]:
cases_df[cases_df["place"].str.contains("Del Rey")]

Unnamed: 0,place,cases,cases_rate,deaths,death_rate
112,Los Angeles - Del Rey,5857,19565.0,42,140.0
172,Los Angeles - Playa Del Rey,399,12484.0,3,94.0
246,Unincorporated - Del Rey,87,27358.0,1,314.0


In [31]:
cases_df[cases_df["place"].str.contains("Encino")]

Unnamed: 0,place,cases,cases_rate,deaths,death_rate
120,Los Angeles - Encino,10075,22304.0,87,193.0
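`str.contains("Del Rey")` returns three rows because it matches any place name containing that text. To pull a single number for one specific place, an exact match on the full name may be clearer. The sketch below uses a small dataframe rebuilt from the rows shown above; `deaths_for` is a hypothetical helper.

```python
import pandas as pd

# Rows copied from the outputs above (place and deaths columns only).
sample_df = pd.DataFrame({
    "place": [
        "Los Angeles - Del Rey",
        "Los Angeles - Playa Del Rey",
        "Unincorporated - Del Rey",
        "Los Angeles - Encino",
    ],
    "deaths": [42, 3, 1, 87],
})


def deaths_for(df, place_name):
    """Return the deaths value for the one row whose place equals place_name."""
    matches = df.loc[df["place"] == place_name, "deaths"]
    if len(matches) != 1:
        raise ValueError(f"Expected one row for {place_name!r}, found {len(matches)}")
    return int(matches.iloc[0])


del_rey_deaths = deaths_for(sample_df, "Los Angeles - Del Rey")
encino_deaths = deaths_for(sample_df, "Los Angeles - Encino")
print(del_rey_deaths, encino_deaths)
```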


#### How many cases were there in places outside Los Angeles? [Hint](https://www.google.com/search?q=Python+pandas+string+does+not+contain&oq=Python+pandas+string+does+not+contain)

In [32]:
new_cases = cases_df[~cases_df["place"].str.contains("Los Angeles")]

In [33]:
new_cases

Unnamed: 0,place,cases,cases_rate,deaths,death_rate
0,City of Agoura Hills,4094,19604.0,22,105.0
1,City of Alhambra,16498,19024.0,249,287.0
2,City of Arcadia,7869,13625.0,163,282.0
3,City of Artesia,4109,24466.0,88,524.0
4,City of Avalon,62,1602.0,0,0.0
...,...,...,...,...,...
337,Unincorporated - Whittier,724,19133.0,10,264.0
338,Unincorporated - Whittier Narrows,64,533333.0,1,8333.0
339,Unincorporated - Willowbrook,13618,39006.0,128,367.0
340,Unincorporated - Wiseburn,1398,23196.0,14,232.0
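The filter above lists the places outside Los Angeles; answering "how many cases" takes one more step, summing the `cases` column of the filtered rows. A minimal sketch on four rows copied from the outputs in this notebook (the variable names are illustrative):

```python
import pandas as pd

# Four rows copied from earlier outputs (place and cases columns only).
sample_df = pd.DataFrame({
    "place": [
        "City of Agoura Hills",
        "City of Avalon",
        "Los Angeles - Encino",
        "Unincorporated - Wiseburn",
    ],
    "cases": [4094, 62, 10075, 1398],
})

# Keep rows whose place does NOT contain "Los Angeles", then total their cases.
outside_la = sample_df[~sample_df["place"].str.contains("Los Angeles")]
outside_total = int(outside_la["cases"].sum())
print(outside_total)
```

On the full table, the equivalent call would be `new_cases["cases"].sum()`.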


#### Which cities and neighborhoods with more than 10,000 cases have the highest case rates?

In [38]:
cases_df[cases_df["cases"] > 10000].sort_values("cases_rate", ascending=False).head()

Unnamed: 0,place,cases,cases_rate,deaths,death_rate
220,Los Angeles - Wholesale District*,16161,44731.0,134,371.0
66,City of San Fernando,10368,42126.0,80,325.0
113,Los Angeles - Downtown*,11540,41953.0,65,236.0
166,Los Angeles - Pacoima,31883,41418.0,279,362.0
208,Los Angeles - Vernon Central,21536,41417.0,194,373.0


---

#### Bonus: Create a true/false column for Los Angeles

In [41]:
cases_df["is_la"] = cases_df["place"].str.contains("Los Angeles")

In [42]:
cases_df.sample(5)

Unnamed: 0,place,cases,cases_rate,deaths,death_rate,is_la
299,Unincorporated - Roosevelt,217,23308.0,2,215.0,False
332,Unincorporated - West Rancho Dominguez,384,28256.0,8,589.0,False
130,Los Angeles - Hancock Park,3267,19174.0,10,59.0,True
43,City of Lakewood,18740,23319.0,164,204.0,False
264,Unincorporated - Harbor Gateway,4,400000.0,0,0.0,False
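One caveat with `str.contains("Los Angeles")`: it flags any place whose name includes that text anywhere, not only city neighborhoods. The sketch below compares it with `str.startswith("Los Angeles - ")`, which matches only names that begin with the city prefix. The third place name is an assumed example of an unincorporated area with "Los Angeles" in its name; check whether names like it appear in the real table before relying on either flag.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "place": [
        "Los Angeles - Hancock Park",        # from the sample output above
        "City of Lakewood",                  # from the sample output above
        "Unincorporated - East Los Angeles", # assumed name, for illustration
    ],
})

# contains(): True wherever the text appears anywhere in the name.
sample_df["is_la_contains"] = sample_df["place"].str.contains("Los Angeles")

# startswith(): True only when the name begins with the city prefix.
sample_df["is_la_prefix"] = sample_df["place"].str.startswith("Los Angeles - ")

print(sample_df)
```

If the two flags disagree on any rows of the real data, the city and county totals computed from `is_la` would shift accordingly.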


#### Bonus: Were there more cases in the city of Los Angeles or in the rest of the county?

In [44]:
cases_df.loc[cases_df["is_la"], "cases"].sum()

2357279

In [45]:
cases_df.loc[~cases_df["is_la"], "cases"].sum()

1454948

#### Discussion: How would you determine whether there were disproportionately more cases in the rest of the county than in the city?
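One possible approach is to compare each group's share of cases with its share of population, or equivalently to compute one overall rate per group. The table has no population column, so the sketch below backs one out from `cases` and `cases_rate`. This assumes the rate is expressed per 100,000 residents, which the notebook does not state, so treat the result as a rough illustration. The four rows are copied from earlier outputs; `implied_pop` and `group` are illustrative names.

```python
import pandas as pd

# Four rows copied from earlier outputs.
sample_df = pd.DataFrame({
    "place": [
        "City of Agoura Hills",
        "City of Alhambra",
        "Los Angeles - Encino",
        "Los Angeles - Pacoima",
    ],
    "cases": [4094, 16498, 10075, 31883],
    "cases_rate": [19604.0, 19024.0, 22304.0, 41418.0],
})

# Label each row as city or rest of county.
is_city = sample_df["place"].str.startswith("Los Angeles - ")
sample_df["group"] = is_city.map({True: "city", False: "rest_of_county"})

# ASSUMPTION: cases_rate is cases per 100,000 residents.
sample_df["implied_pop"] = sample_df["cases"] / sample_df["cases_rate"] * 100_000

# Total cases and implied population per group, then an overall rate and shares.
totals = sample_df.groupby("group")[["cases", "implied_pop"]].sum()
totals["rate_per_100k"] = totals["cases"] / totals["implied_pop"] * 100_000
totals["case_share"] = totals["cases"] / totals["cases"].sum()
totals["pop_share"] = totals["implied_pop"] / totals["implied_pop"].sum()

print(totals)
```

A group whose `case_share` is noticeably larger than its `pop_share` has disproportionately more cases. Rows with very few cases or a rate of zero would need separate handling before dividing.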