# Web Scraping the LA County COVID-19 Site and Creating a Plot
- Patty Jula, <pattyjula@gmail.com>

LA County Public Health has been publishing daily COVID-19 case counts and rates for Los Angeles County. This script downloads the day's counts, saves them to a data store (in this case a CSV file), and creates a plot.

Source: <http://publichealth.lacounty.gov/media/Coronavirus/locations.htm>
## Note:
This type of web scraping only works on sites whose data is present in the page's HTML source. The best practice is to ask permission before scraping, so that an organization's servers are not overloaded.

In [82]:
# Dependencies
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests

In [83]:
# Download the page
url = 'http://publichealth.lacounty.gov/media/Coronavirus/locations.htm'
res = requests.get(url, timeout=30)
res.raise_for_status()  # stop here if the request failed
html_page = res.content

## Parse the HTML

In [84]:
soup = BeautifulSoup(html_page, 'html.parser')
#print(soup.prettify())


## Load data to a Pandas dataframe

In [95]:
# column names
column = ['location', 'count', 'rate']

# one list of cell texts per table row
data = []
# match the table by its class attribute
table = soup.find("table", class_="table table-striped table-bordered table-sm")
for row_number, element in enumerate(table.find_all("tr"), start=1):
    if row_number > 29:  # first 29 rows are not needed
        cells = element.find_all("td")  # data cells in this row
        info = [cell.text for cell in cells]  # get the cell text
        data.append(info)  # append to data list
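
The row loop above can be tried offline on a small table. The following is a minimal sketch, not the notebook's own code: the sample HTML, the place names, and the helper `extract_rows` are all made up for illustration, and only the BeautifulSoup calls match what the cell above uses.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one header row followed by two data rows.
SAMPLE_HTML = """
<table class="table table-sm">
  <tr><th>Location</th><th>Count</th><th>Rate</th></tr>
  <tr><td>Place A</td><td>10</td><td>11.53</td></tr>
  <tr><td>Place B</td><td>5</td><td>--</td></tr>
</table>
"""


def extract_rows(html, skip_rows=0):
    """Return the text of every <td> cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for row_number, tr in enumerate(table.find_all("tr"), start=1):
        if row_number <= skip_rows:
            continue  # leading rows the caller does not want
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows made only of <th> cells yield nothing
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML, skip_rows=1)
print(rows)
```

Skipping rows by position, as the notebook does with its threshold of 29, is fragile if the page layout changes; checking that a row actually contains `td` cells is one way to make the loop a little more tolerant.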


In [111]:
df = pd.DataFrame(data, columns=column)  # convert to dataframe
df = df[df['rate'] != '--']  # drop records whose rate is the '--' placeholder

In [115]:
# handle empty cells, they are not read as NaN by default
df['rate'] = df['rate'].replace('', np.nan)
# now they can be deleted
df = df.dropna(subset=['rate'])
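
To see why the empty strings need converting first, here is a small sketch on made-up rows (the place names and numbers are placeholders, not scraped values). It assumes the same three columns as the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; the real values come from the scraped table.
sample = pd.DataFrame(
    {
        "location": ["Place A", "Place B", "Place C"],
        "count": ["10", "5", "0"],
        "rate": ["11.53", "--", ""],
    }
)

# An empty string is an ordinary value, so pandas does not count it as missing.
missing_before = int(sample["rate"].isna().sum())

cleaned = sample[sample["rate"] != "--"]  # drop the '--' placeholder
cleaned = cleaned.assign(rate=cleaned["rate"].replace("", np.nan))  # '' -> NaN
cleaned = cleaned.dropna(subset=["rate"])  # now the blank rows can be dropped

print(missing_before)
print(cleaned)
```

Assigning the result back, rather than calling `replace(..., inplace=True)` on a selected column, avoids depending on whether pandas hands back a view or a copy.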

In [117]:
df.to_csv("county.csv", encoding='utf-8', index=False)
df = pd.read_csv('county.csv', index_col=False)
# keep only the City of Los Angeles neighborhoods
df2 = df[df['location'].str.startswith('Los Angeles -')].copy()
df2['rate'] = df2['rate'].astype(float)  # astype returns a new Series, so assign it
df2 = df2.sort_values(by='rate', ascending=False)

df2.head(10)

| | location | count | rate |
|---|---|---|---|
| 114 | Los Angeles - Hancock Park | 33 | 200.91 |
| 82 | Los Angeles - Beverly Crest | 23 | 185.56 |
| 88 | Los Angeles - Carthay | 24 | 178.86 |
| 97 | Los Angeles - Crestview | 20 | 174.73 |
| 158 | Los Angeles - South Carthay | 17 | 160.44 |
| 81 | Los Angeles - Bel Air | 13 | 158.19 |
| 90 | Los Angeles - Century City | 19 | 156.87 |
| 136 | Los Angeles - Melrose | 121 | 155.86 |
| 77 | Los Angeles - Adams-Normandie | 12 | 141.39 |
| 85 | Los Angeles - Brentwood | 44 | 140.72 |
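
The sort only ranks neighborhoods correctly when `rate` is numeric. The sketch below, on made-up rows, shows how the order differs when the column is still text. The place names and values are placeholders chosen to make the difference visible.

```python
import pandas as pd

# Made-up rows; rates are stored as text, as they would be straight from HTML.
sample = pd.DataFrame(
    {
        "location": ["Los Angeles - A", "City of B", "Los Angeles - C"],
        "rate": ["9.5", "50.0", "100.25"],
    }
)

# Keep the "Los Angeles -" rows; copy so later assignments do not touch sample.
la = sample[sample["location"].str.startswith("Los Angeles -")].copy()

# Sorting text compares character by character, so "9.5" ranks above "100.25".
as_text = la.sort_values(by="rate", ascending=False)["location"].tolist()

# astype returns a new Series; it must be assigned back to take effect.
la["rate"] = la["rate"].astype(float)
as_number = la.sort_values(by="rate", ascending=False)["location"].tolist()

print(as_text)
print(as_number)
```

In the notebook itself, `pd.read_csv` will usually infer a numeric type for `rate` once the placeholder and empty values are gone, so the explicit conversion mainly guards against the column staying as text.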


## To be continued

## Explicitly convert Date field to date
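
The cell for this step is not shown here, so what follows is only a sketch of one common way to do it with `pd.to_datetime`. The `Date` column name, its `YYYY-MM-DD` format, and the rows are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily snapshots; column names and values are made up.
history = pd.DataFrame(
    {
        "Date": ["2020-04-01", "2020-04-02"],
        "count": [1, 2],
    }
)

# Read from CSV, dates arrive as plain text unless they are parsed explicitly.
history["Date"] = pd.to_datetime(history["Date"], format="%Y-%m-%d")

print(history.dtypes)
```

Passing an explicit `format` makes the conversion fail loudly on unexpected input instead of guessing.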

## Again, explicitly set date

# Create plot with seaborn
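
The plotting cell is not included in this section. As a rough sketch of what a seaborn bar chart of the top rates could look like, the block below uses the first three rows of the `df2.head(10)` output shown earlier. The figure size, color, and title are arbitrary choices, and the `Agg` backend is set only so the sketch runs without a display; it is not needed inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First three rows of the df2.head(10) output shown earlier.
top = pd.DataFrame(
    {
        "location": [
            "Los Angeles - Hancock Park",
            "Los Angeles - Beverly Crest",
            "Los Angeles - Carthay",
        ],
        "rate": [200.91, 185.56, 178.86],
    }
)

fig, ax = plt.subplots(figsize=(8, 3))
# Horizontal bars keep the long neighborhood names readable.
sns.barplot(data=top, x="rate", y="location", color="steelblue", ax=ax)
ax.set_xlabel("rate")
ax.set_ylabel("")
ax.set_title("Highest rates among Los Angeles neighborhoods")
fig.tight_layout()
```

With the full data, passing `df2.head(10)` as `data` would plot the ten neighborhoods from the table.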