# Exercise - Gathering Multiple Datasets

In this exercise, you will gather the hospital building data using three different gathering methods. The data includes information about hospital buildings, such as height and number of stories.

Ensure you programmatically load your dataset(s) into the notebook.

In [1]:
#Imports - can be modified
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

## 1. Extract a dataset via API

### 1.1 Extract a dataset via API
You may use the requests library to do so programmatically, or manually access the dataset via an API:

https://data.chhs.ca.gov/api/3/action/datastore_search?resource_id=d97adf28-ebaf-4204-a29e-bb6bdb7f96b9

In [2]:
#FILL IN
#Extract data via API
url = 'https://data.chhs.ca.gov/api/3/action/datastore_search?resource_id=d97adf28-ebaf-4204-a29e-bb6bdb7f96b9'
api_metadata = requests.get(url)

#Raise an exception if we made a request resulting in an error
api_metadata.raise_for_status()

#Get the JSON
api_text = api_metadata.json()
print(api_text)

{'help': 'https://data.chhs.ca.gov/api/3/action/help_show?name=datastore_search', 'success': True, 'result': {'include_total': True, 'resource_id': 'd97adf28-ebaf-4204-a29e-bb6bdb7f96b9', 'fields': [{'type': 'int', 'id': '_id'}, {'type': 'text', 'id': 'County Code'}, {'type': 'text', 'id': 'Perm ID'}, {'type': 'text', 'id': 'Facility Name'}, {'type': 'text', 'id': 'City'}, {'type': 'text', 'id': 'Building Nbr'}, {'type': 'text', 'id': 'Building Name'}, {'type': 'text', 'id': 'Building Status'}, {'type': 'text', 'id': 'SPC Rating *'}, {'type': 'text', 'id': 'Building URL'}, {'type': 'text', 'id': 'Height (ft)'}, {'type': 'text', 'id': 'Stories'}, {'type': 'text', 'id': 'Building Code'}, {'type': 'text', 'id': 'Building Code Year'}, {'type': 'text', 'id': 'Year Completed'}, {'type': 'text', 'id': 'AB 1882 Notice'}, {'type': 'text', 'id': 'Latitude'}, {'type': 'text', 'id': 'Longitude'}, {'type': 'text', 'id': 'Count'}], 'records_format': 'objects', 'records': [{'Count': '1', 'Facility Na
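A single call like the one above may return only the first page of records. The sketch below shows one way to collect every page, assuming the endpoint follows the CKAN `datastore_search` convention of `limit` and `offset` query parameters and reports a `total` count in `result`. The helper names (`fetch_page_from_api`, `collect_records`, `fake_fetch_page`) are hypothetical, and the paging loop is demonstrated against a small fake fetcher so it runs without network access.

```python
API_URL = "https://data.chhs.ca.gov/api/3/action/datastore_search"
RESOURCE_ID = "d97adf28-ebaf-4204-a29e-bb6bdb7f96b9"


def fetch_page_from_api(offset, limit):
    """Fetch one page of results (assumes `limit`/`offset` are supported)."""
    import requests  # imported lazily so the paging logic can be tried offline

    response = requests.get(
        API_URL,
        params={"resource_id": RESOURCE_ID, "limit": limit, "offset": offset},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]


def collect_records(fetch_page, page_size=100):
    """Call fetch_page(offset, limit) repeatedly until no records remain."""
    records = []
    offset = 0
    while True:
        result = fetch_page(offset, page_size)
        batch = result["records"]
        records.extend(batch)
        offset += len(batch)
        total = result.get("total")
        # Stop on an empty or short page, or once the reported total is reached.
        if not batch or len(batch) < page_size:
            break
        if total is not None and offset >= total:
            break
    return records


def fake_fetch_page(offset, limit):
    """Offline stand-in for the API: seven made-up rows, served in pages."""
    rows = [{"_id": i} for i in range(1, 8)]
    return {"records": rows[offset:offset + limit], "total": len(rows)}


demo_records = collect_records(fake_fetch_page, page_size=3)
print(len(demo_records))
```

To run it against the live API, pass `fetch_page_from_api` instead of `fake_fetch_page`.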

### 1.2 Parse the obtained data
Parse the obtained data to get the **first** relevant data value or record from your JSON file. 

**Note:** Please ensure the result you obtain is in text and is relevant to hospital building data.

In [3]:
#Fill in - get the first data record/value from the JSON results
api_text['result']['records'][0]

{'Count': '1',
 'Facility Name': 'Alameda Hospital',
 'County Code': '01 - Alameda',
 'Height (ft)': '44.17',
 'Building Code Year': None,
 'City': 'Alameda',
 'Building Code': 'Unknown',
 'Building Nbr': 'BLD-01278',
 'Building URL': 'https://esp.oshpd.ca.gov/CitizenAccess/Cap/CapDetail.aspx?Module=Permits&TabName=Permits&capID1=26HIS&capID2=00000&capID3=00002&agencyCode=OSHPD',
 'Year Completed': '1926',
 'Building Status': 'No Gen Acute Care - OSHPD Bldg',
 'SPC Rating *': 'N/A',
 'Building Name': 'Original Hospital',
 'Longitude': '-122.2538986',
 'Latitude': '37.7626572',
 'Stories': '4',
 'Perm ID': '11210',
 '_id': 1,
 'AB 1882 Notice': None}

*Fill in*: What data did you see in the output value?  
Answer:  
I see data related to a hospital building in Alameda. The facility name is Alameda Hospital and the building has four stories. The Building Nbr is `BLD-01278`.
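The `fields` list in the API response marks every column except `_id` as `text`, so numeric values such as `'44.17'` arrive as strings. The following is a minimal sketch of loading the records into a DataFrame and converting a few columns to numbers. The record is a hand-copied subset of the output above, and the choice of which columns to treat as numeric is an assumption.

```python
import pandas as pd

# Subset of the first record shown above, copied by hand for illustration.
records = [
    {
        "_id": 1,
        "Facility Name": "Alameda Hospital",
        "Building Nbr": "BLD-01278",
        "Height (ft)": "44.17",
        "Stories": "4",
        "Year Completed": "1926",
        "Building Code Year": None,
    }
]

api_df = pd.DataFrame(records)

# Assumed numeric columns; errors="coerce" turns unparseable values into NaN.
numeric_columns = ["Height (ft)", "Stories", "Year Completed", "Building Code Year"]
api_df[numeric_columns] = api_df[numeric_columns].apply(pd.to_numeric, errors="coerce")

print(api_df.dtypes)
```

With the live response, `records` would be `api_text['result']['records']`.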

## 2. Extract a dataset via manual download

### 2.1 Download a dataset manually
We have provided a CSV file, `hospital_building_data.csv`. You can treat it as data that was downloaded manually ahead of time.

Load the dataset into this notebook.

In [4]:
#FILL IN - load a dataset that was downloaded manually into a dataframe
cal_hhs = pd.read_csv('hospital_building_data.csv', encoding='utf-8')
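If the CSV was saved with its row index, pandas loads that index as an extra column named `Unnamed: 0`. The sketch below shows how `index_col=0` avoids this. It uses a tiny in-memory CSV with made-up layout (a blank first header cell and three of the real column names) rather than the provided file, so whether the provided file has the same layout is an assumption.

```python
import io

import pandas as pd

# Tiny stand-in for the provided file: the first header cell is blank,
# which is what a saved DataFrame index looks like in a CSV.
sample_csv = io.StringIO(
    ",County Code,Facility Name,Stories\n"
    "0,01 - Alameda,Alameda Hospital,4.0\n"
)

# index_col=0 tells pandas to use the first column as the row index
# instead of loading it as a data column called "Unnamed: 0".
cal_hhs_sample = pd.read_csv(sample_csv, index_col=0)

print(cal_hhs_sample.columns.tolist())
```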

### 2.2 Parse the obtained data
Parse the obtained data to get the **first** relevant data value or record from your manually downloaded dataset.

**Note:** Please ensure the result you obtain is relevant to the hospital building data.

In [5]:
#Fill in - get the first data record/value from the manually downloaded file
cal_hhs.head(1)

Unnamed: 0,County Code,Perm ID,Facility Name,City,Building Nbr,Building Name,Building Status,SPC Rating *,Building URL,Height (ft),Stories,Building Code,Building Code Year,Year Completed,AB 1882 Notice,Latitude,Longitude,Count
0,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01278,Original Hospital,No Gen Acute Care - OSHPD Bldg,,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,44.17,4.0,Unknown,,1926.0,,37.762657,-122.253899,1


*Fill in*: What data did you see in the output? Is it the same as the data you gathered from the API?
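One way to answer this is to compare the two records field by field after allowing for type differences, since the API returns strings while `read_csv` infers numbers. The sketch below does that with values copied by hand from the two outputs above. It assumes the empty CSV cells are `NaN`, and the helper names are hypothetical.

```python
import math

# Values copied by hand from the API record and the CSV row shown above.
api_record = {
    "Facility Name": "Alameda Hospital",
    "Height (ft)": "44.17",
    "Stories": "4",
    "Year Completed": "1926",
    "Building Code Year": None,
    "SPC Rating *": "N/A",
}
csv_record = {
    "Facility Name": "Alameda Hospital",
    "Height (ft)": 44.17,
    "Stories": 4.0,
    "Year Completed": 1926.0,
    "Building Code Year": float("nan"),  # assumed: empty cell loaded as NaN
    "SPC Rating *": float("nan"),        # assumed: empty cell loaded as NaN
}


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def same_value(a, b):
    """Compare two values numerically when possible, otherwise as text."""
    if is_missing(a) or is_missing(b):
        return is_missing(a) and is_missing(b)
    try:
        return math.isclose(float(a), float(b))
    except (TypeError, ValueError):
        return str(a) == str(b)


differences = {
    key: (api_record[key], csv_record[key])
    for key in api_record
    if not same_value(api_record[key], csv_record[key])
}
print(differences)
```

In this sample only `SPC Rating *` is reported as different: the API holds the literal string `'N/A'`, while the CSV cell is empty, which is consistent with `read_csv` treating `N/A` as a missing value by default.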

## 3. Extract a dataset via scraping

### 3.1 Extract your dataset via scraping
Data webpage url:

https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9

Extract your dataset via scraping using `requests`, and `BeautifulSoup`.

In [6]:
##FILL IN 
#Extract a dataset via scraping
url = 'https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9'
data = requests.get(url)

#Raise an exception if we made a request resulting in an error
data.raise_for_status()

#Access the content of the response in Unicode
data_txt = data.text
print(data_txt)

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="https://data.chhs.ca.gov/datastore/odata3.0/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Hospital Building Data (CSV)</title>
  <id>https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9</id>
  <updated>2023-06-16T20:58:49.601875Z</updated>
  <link rel="self" title="d97adf28-ebaf-4204-a29e-bb6bdb7f96b9" href="https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9" />
  
  <entry>
    <id>https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9(1)</id>
    <title type="text">Row 1</title>
    <updated>2023-06-16T20:58:49.601875Z</updated>
    <author>
      <name>ckan</name>
    </author>
    <category term="ckan.odata.d97adf28-ebaf-4204-a29e-bb6bdb7f96b9" scheme="http://schemas.microsoft.com

In [7]:
#FILL IN
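# Use BeautifulSoup to parse the result.
# Naming the parser explicitly keeps the behavior the same across machines;
# html.parser lowercases tag names, which matches the lowercase tags used in 3.2.
soup_text = BeautifulSoup(data_txt, 'html.parser')

# Print the prettified version
print(soup_text.prettify())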

### 3.2 Parse the obtained data 
**Note:** Please ensure the result you obtain is plain text (without HTML tags) and is relevant to the hospital building data. Hint: you can use the `find_all()` method with tags like `d:buildingname`, `d:buildingcode`, etc.

In [8]:
#FILL IN - parse specific records from scraped data
[hs.get_text() for hs in soup_text.find_all('d:buildingname')]
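To keep several fields from the same building together, you can loop over each `entry` and look up the tags inside it. The sketch below runs on a hypothetical feed fragment, not the real response: the nesting under `m:properties` and the second entry's values are assumptions made up for illustration, and only the two tag names from the hint are used.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the feed; the second entry is a placeholder.
sample_feed = """
<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <m:properties>
      <d:BuildingName>Original Hospital</d:BuildingName>
      <d:BuildingCode>Unknown</d:BuildingCode>
    </m:properties>
  </entry>
  <entry>
    <m:properties>
      <d:BuildingName>Placeholder Building</d:BuildingName>
      <d:BuildingCode>Placeholder Code</d:BuildingCode>
    </m:properties>
  </entry>
</feed>
"""

# html.parser lowercases tag names, so lookups use lowercase names.
soup = BeautifulSoup(sample_feed, "html.parser")

buildings = []
for entry in soup.find_all("entry"):
    name_tag = entry.find("d:buildingname")
    code_tag = entry.find("d:buildingcode")
    buildings.append(
        {
            "building_name": name_tag.get_text(strip=True) if name_tag else None,
            "building_code": code_tag.get_text(strip=True) if code_tag else None,
        }
    )

print(buildings)
```

Checking for a missing tag before calling `get_text()` avoids an `AttributeError` when an entry lacks one of the fields.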

*FILL IN*: Brief description of specific data parsed.  
Answer:  
I parsed the building names of all the hospital buildings in the scraped response.