# Exercise - Gathering Multiple Datasets

In this exercise, you will gather hospital building data using three different gathering methods. The data includes information on hospital buildings, such as height and number of stories.

Ensure you programmatically load your dataset(s) into the notebook.

In [1]:
#Imports - can be modified
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

## 1. Extract a dataset via API

### 1.1 Extract a dataset via API
You may use the requests library to do so programmatically, or manually access the dataset via an API:

https://data.chhs.ca.gov/api/3/action/datastore_search?resource_id=d97adf28-ebaf-4204-a29e-bb6bdb7f96b9

In [4]:
#FILL IN
#Extract data via API
data_api = requests.get('https://data.chhs.ca.gov/api/3/action/datastore_search?resource_id=d97adf28-ebaf-4204-a29e-bb6bdb7f96b9')
#Raise an exception if we made a request resulting in an error
data_api.raise_for_status()
#Get the JSON
data_api_json = data_api.json()
data_api_json

{'help': 'https://data.chhs.ca.gov/api/3/action/help_show?name=datastore_search',
 'success': True,
 'result': {'include_total': True,
  'limit': 100,
  'records_format': 'objects',
  'resource_id': 'd97adf28-ebaf-4204-a29e-bb6bdb7f96b9',
  'total_estimation_threshold': None,
  'records': [{'_id': 1,
    'County Code': '01 - Alameda',
    'Perm ID': '11210',
    'Facility Name': 'Alameda Hospital',
    'City': 'Alameda',
    'Building Nbr': 'BLD-01278',
    'Building Name': 'Original Hospital',
    'Building Status': 'No Gen Acute Care - OSHPD Bldg',
    'SPC Rating *': 'N/A',
    'Building URL': 'https://esp.oshpd.ca.gov/CitizenAccess/Cap/CapDetail.aspx?Module=Permits&TabName=Permits&capID1=26HIS&capID2=00000&capID3=00002&agencyCode=OSHPD',
    'Height (ft)': '44.17',
    'Stories': '4',
    'Building Code': 'Unknown',
    'Building Code Year': None,
    'Year Completed': '1926',
    'AB 1882 Notice': None,
    'Latitude': '37.7626572',
    'Longitude': '-122.2538986',
    'Count': '1
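
The response above reports `'limit': 100`, so one request may return only the first 100 records. CKAN's `datastore_search` action accepts `limit` and `offset` parameters, which allows paging. The following is a minimal sketch, assuming this endpoint follows that CKAN behaviour; `get_page_over_http` and `fetch_all_records` are hypothetical helper names, not part of the exercise.

```python
API_URL = "https://data.chhs.ca.gov/api/3/action/datastore_search"
RESOURCE_ID = "d97adf28-ebaf-4204-a29e-bb6bdb7f96b9"


def get_page_over_http(params):
    """Fetch one page of results from the API and return the decoded JSON."""
    import requests  # imported lazily so the paging logic can be tried offline

    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_records(get_page=get_page_over_http, page_size=100):
    """Request pages of `page_size` records until an empty page comes back."""
    records = []
    offset = 0
    while True:
        payload = get_page(
            {"resource_id": RESOURCE_ID, "limit": page_size, "offset": offset}
        )
        page = payload["result"]["records"]
        if not page:
            break
        records.extend(page)
        offset += len(page)
    return records
```

Calling `fetch_all_records()` with no arguments would query the live API. Passing a different `get_page` callable lets you try the loop without network access.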

### 1.2 Parse the obtained data
Parse the obtained data to get the **first** relevant data value or record from your JSON file. 

**Note:** Please ensure the result you obtain is in text and is relevant to hospital building data.

In [10]:
#Fill in - get a data record/value from the JSON results
#Note: index 1 selects the second record (_id 2); use index 0 for the first record
data_api_json['result']['records'][1]

{'_id': 2,
 'County Code': '01 - Alameda',
 'Perm ID': '11210',
 'Facility Name': 'Alameda Hospital',
 'City': 'Alameda',
 'Building Nbr': 'BLD-01279',
 'Building Name': 'Stephens Wing',
 'Building Status': 'In Service',
 'SPC Rating *': '2',
 'Building URL': 'https://esp.oshpd.ca.gov/CitizenAccess/Cap/CapDetail.aspx?Module=Permits&TabName=Permits&capID1=56HIS&capID2=00000&capID3=00012&agencyCode=OSHPD',
 'Height (ft)': '35',
 'Stories': '3',
 'Building Code': '1952 Uniform Building Code (UBC)',
 'Building Code Year': '1952',
 'Year Completed': '1956',
 'AB 1882 Notice': 'This building does not significantly jeopardize life, but may not be repairable or functional following an earthquake.',
 'Latitude': '37.7626572',
 'Longitude': '-122.2538986',
 'Count': '1'}

What data did you see in the output value?
The record describes a building at a facility named Alameda Hospital, with details such as location, year completed, building number, and building name.
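
The API output shown earlier returns every field as a string or `None` (for example `'Height (ft)': '44.17'`). A minimal sketch of loading such records into a dataframe and converting selected columns to numbers follows. The two sample records are abbreviated copies of the output above, and the choice of columns to convert is an assumption.

```python
import pandas as pd

# Two abbreviated records copied from the API output shown earlier.
sample_records = [
    {"_id": 1, "Building Name": "Original Hospital",
     "Height (ft)": "44.17", "Stories": "4", "Year Completed": "1926"},
    {"_id": 2, "Building Name": "Stephens Wing",
     "Height (ft)": "35", "Stories": "3", "Year Completed": "1956"},
]

api_df = pd.DataFrame(sample_records)

# Columns assumed to hold numeric data.
numeric_columns = ["Height (ft)", "Stories", "Year Completed"]
for column in numeric_columns:
    # errors="coerce" turns unparseable strings and None into NaN
    api_df[column] = pd.to_numeric(api_df[column], errors="coerce")

print(api_df.dtypes)
```

With the live response, `pd.DataFrame(data_api_json['result']['records'])` would take the place of the sample list.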

## 2. Extract a dataset via manual download

### 2.1 Download a dataset manually
We have provided a CSV file, `hospital_building_data.csv`. You can treat it as data that was downloaded manually for you.

Load the dataset into this notebook.

In [11]:
#FILL IN - load a dataset that was downloaded manually into a dataframe
df = pd.read_csv('hospital_building_data.csv')
df.head()

Unnamed: 0,County Code,Perm ID,Facility Name,City,Building Nbr,Building Name,Building Status,SPC Rating *,Building URL,Height (ft),Stories,Building Code,Building Code Year,Year Completed,AB 1882 Notice,Latitude,Longitude,Count
0,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01278,Original Hospital,No Gen Acute Care - OSHPD Bldg,,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,44.17,4.0,Unknown,,1926.0,,37.762657,-122.253899,1
1,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01279,Stephens Wing,In Service,2,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,35.0,3.0,1952 Uniform Building Code (UBC),1952.0,1956.0,This building does not significantly jeopardiz...,37.762657,-122.253899,1
2,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01280,West Wing,In Service,2,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,,2.0,1964 Uniform Building Code (UBC),1964.0,1968.0,This building does not significantly jeopardiz...,37.762657,-122.253899,1
3,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01281,South Wing,In Service,3s,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,,3.0,1976 California Building Code (CBC),1976.0,1983.0,,37.762657,-122.253899,1
4,01 - Alameda,11210,Alameda Hospital,Alameda,BLD-01282,Radiology Addition,In Service,5s,https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...,,2.0,1985 California Building Code (CBC),1985.0,1995.0,,37.762657,-122.253899,1
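
The `Unnamed: 0` column in the output above is what pandas produces when the first CSV column has an empty header, which typically happens when a dataframe index was saved to the file. Assuming that is the case here, passing `index_col=0` would use that column as the index instead. The sketch below uses a small in-memory stand-in for the file, not the real data.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for hospital_building_data.csv (values abbreviated
# from the output above; the empty first header mimics a saved index).
csv_text = """,County Code,Perm ID,Facility Name,Building Nbr,Stories
0,01 - Alameda,11210,Alameda Hospital,BLD-01278,4.0
1,01 - Alameda,11210,Alameda Hospital,BLD-01279,3.0
"""

# Default behaviour: the unnamed first column becomes 'Unnamed: 0'.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column is used as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.columns.tolist())
```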


### 2.2 Parse the obtained data
Parse the obtained data to get the **first** relevant data value or record from your manually downloaded dataset.

**Note:** Please ensure the result you obtain is relevant to the hospital building data.

In [12]:
#Fill in - get a data record/value from the manually downloaded file
#Note: iloc[1] selects the second row; use iloc[0] for the first row
df.iloc[1]

County Code                                                01 - Alameda
Perm ID                                                           11210
Facility Name                                          Alameda Hospital
City                                                            Alameda
Building Nbr                                                  BLD-01279
Building Name                                             Stephens Wing
Building Status                                              In Service
SPC Rating *                                                          2
Building URL          https://esp.oshpd.ca.gov/CitizenAccess/Cap/Cap...
Height (ft)                                                        35.0
Stories                                                             3.0
Building Code                          1952 Uniform Building Code (UBC)
Building Code Year                                               1952.0
Year Completed                                                  

What data did you see in the output? Is it the same as the data you gathered from the API?
Yes, it is the same Alameda Hospital record with the same data.
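
The comparison can also be made programmatically. The API returns strings (`'35'`) while pandas parses the CSV into numbers (`35.0`), so a plain `==` on raw values would report differences that are only a matter of type. The following is a minimal sketch with hypothetical helpers `values_match` and `differing_fields`. The sample values are abbreviated from the outputs above.

```python
def values_match(api_value, csv_value):
    """Compare an API value (string or None) with a CSV value of any type."""
    api_missing = api_value is None
    csv_missing = csv_value is None or csv_value != csv_value  # NaN != NaN
    if api_missing or csv_missing:
        return api_missing and csv_missing
    try:
        # Compare numerically when both values can be read as numbers.
        return float(api_value) == float(csv_value)
    except (TypeError, ValueError):
        # Otherwise fall back to a plain text comparison.
        return str(api_value) == str(csv_value)


def differing_fields(api_record, csv_row, fields):
    """Return the field names whose values differ between the two sources."""
    return [
        field
        for field in fields
        if not values_match(api_record.get(field), csv_row.get(field))
    ]


api_record = {"Building Nbr": "BLD-01279", "Building Name": "Stephens Wing",
              "Height (ft)": "35", "Stories": "3", "Year Completed": "1956"}
csv_row = {"Building Nbr": "BLD-01279", "Building Name": "Stephens Wing",
           "Height (ft)": 35.0, "Stories": 3.0, "Year Completed": 1956.0}

print(differing_fields(api_record, csv_row, list(api_record)))  # prints []
```

In the notebook, `csv_row` could come from `df.iloc[1].to_dict()` and `api_record` from `data_api_json['result']['records'][1]`.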

## 3. Extract a dataset via scraping

### 3.1 Extract your dataset via scraping
Data webpage url:

https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9

Extract your dataset via scraping using `requests` and `BeautifulSoup`.

In [17]:
##FILL IN 
#Extract a dataset via scraping
response = requests.get('https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9')
#Raise an exception if we made a request resulting in an error
response.raise_for_status()
#Access the content of the response in Unicode
response_text = response.text

In [18]:
#FILL IN
# Use BeautifulSoup to parse the result; naming the parser explicitly avoids a "no parser specified" warning
bs = BeautifulSoup(response_text, 'html.parser')
# Display the prettified version
bs.prettify()



'<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n<feed xml:base="https://data.chhs.ca.gov/datastore/odata3.0/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">\n <title type="text">\n  Hospital Building Data (CSV)\n </title>\n <id>\n  https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9\n </id>\n <updated>\n  2024-05-23T18:17:35.464040Z\n </updated>\n <link href="https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9" rel="self" title="d97adf28-ebaf-4204-a29e-bb6bdb7f96b9"/>\n <entry>\n  <id>\n   https://data.chhs.ca.gov/datastore/odata3.0/d97adf28-ebaf-4204-a29e-bb6bdb7f96b9(1)\n  </id>\n  <title type="text">\n   Row 1\n  </title>\n  <updated>\n   2024-05-23T18:17:35.464040Z\n  </updated>\n  <author>\n   <name>\n    ckan\n   </name>\n  </author>\n  <category scheme="http://schemas.microsoft.com/ado/2

### 3.2 Parse the obtained data 
**Note:** Please ensure the result you obtain is in text (not with HTML tags) and is relevant to the hospital building data. Hint: you can use the `find_all()` method with tags like `d:buildingname`, `d:buildingcode`, etc.

In [29]:
#FILL IN - parse specific records from scraped data

for buildingname in bs.find_all("d:buildingname"):
    print(buildingname.text)

Original Hospital
Stephens Wing
West Wing
South Wing
Radiology Addition
Medical Gas Storage
Compactor Shed
Emergency Room Relocation
LOX Tank
Ehman Building
North Wing
East Wing
Original West Wing
West Service Wing - Building 1
Physio-Therapy Building
Original Emergency Wing
Special Procedures Addition
Emergency Department Expansion
Cogeneration Building
Emergency Generator Building
Transformer Building
South Wing - Phase 2
West Service Wing - Building 2 - Structurally connected to BLD-00699
Orig Emergency Wing - East Wing - Structurally connected to BLD-00702
Patient Care Pavilion
New LOX Tank
Emergency Generator Yard
Main Hospital Building
Main Entrance Canopy
Bridge at Link
Admin.
Laundry/Hall of Health
Boiler
1968 Building
1975 Building
1985 Building
Electric Service Building
Oxygen Tank Enclosure
Replacement Hospital
Chiller Building
Main Entrance Canopy
Emergency Entrance Canopy
Building B
Building H
Critical Care Building
New Acute Care Building
Critical Care Parking Structure
O

Brief description of specific data parsed.
The parsed data contains the names of hospital buildings.
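
To keep several fields of one building together, you can iterate over each `<entry>` element and look up the tags inside it. This is a minimal sketch on a hand-written fragment. The nesting of `content` and `m:properties` and the exact field tags are assumptions modelled on the prettified output above, not a copy of the real feed.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the feed's structure (an assumption; the
# real feed carries many more fields per entry).
sample_feed = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:BuildingName>Original Hospital</d:BuildingName>
        <d:BuildingCode>Unknown</d:BuildingCode>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:BuildingName>Stephens Wing</d:BuildingName>
        <d:BuildingCode>1952 Uniform Building Code (UBC)</d:BuildingCode>
      </m:properties>
    </content>
  </entry>
</feed>"""

# html.parser lowercases tag names, so tags are looked up in lowercase.
soup = BeautifulSoup(sample_feed, "html.parser")

rows = []
for entry in soup.find_all("entry"):
    name_tag = entry.find("d:buildingname")
    code_tag = entry.find("d:buildingcode")
    rows.append({
        "building_name": name_tag.get_text(strip=True) if name_tag else None,
        "building_code": code_tag.get_text(strip=True) if code_tag else None,
    })

for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` if a tabular view is needed.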