# Capstone Project - The Battle of Neighborhoods (Week 1) - Visit Hawaii

**Applied Data Science Capstone by IBM/Coursera**

The purpose of this Capstone assignment is to showcase students' skills and tools by using location data to explore a geographical location. Students have the opportunity to be as *creative* as they want and come up with an idea that leverages the *Foursquare* location data to explore or compare neighborhoods or cities of their choice, or to come up with a problem that the *Foursquare* location data can help solve.

In [9]:
# global setting for verbose
verbose = False

In [10]:
# The code was removed by Watson Studio for sharing.
google_api_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" # TODO: REMOVE THE KEY

In [11]:
# The code was removed by Watson Studio for sharing.
foursquare_client_id = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' # your Foursquare ID
foursquare_client_secret = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' # your Foursquare Secret
foursquare_version = '20180724' # Foursquare API version

# Foursquare category - outdoors
outdoor_categoryId = '4d4b7105d754a06377d81259'
outdoor_categoryName = 'outdoors'

# Foursquare category - art
art_categoryId = '4d4b7104d754a06370d81259'
art_categoryName = 'art'

# Foursquare category - food
food_categoryId = '4d4b7105d754a06374d81259'
food_categoryName = 'food'

# Foursquare category - bars
bars_categoryId = '4d4b7105d754a06376d81259'
bars_categoryName = 'bars'

## Section 2: Data

### Data Description

There are 6 major islands to visit in Hawaii: Kauai, Oahu, Molokai, Lanai, Maui, and the island of Hawaii. Each island has its own distinct personality, characteristics, adventures, activities and sights. A brief understanding of each of these Hawaiian islands will help visitors to experience *Hawaii*, and it will also help the data science consultant to better prepare for the project.

As mentioned previously, due to resource constraints, only the following datasets will be explored and presented in the report:
1. Create a *custom* Hawaiian islands dataset. Wikipedia.org and State of Hawaii websites will be the primary source of the data.
2. Query each Hawaiian island's 20 top sites using the **Foursquare Explore API**.
3. Use the **Foursquare Search API** to retrieve the following information about each site:
    - Arts & entertainment
    - Restaurants
    - Events
4. Finally, use the open source dataset to check on the COVID-19 statistics in Hawaii. This critical information will help visitors to time and plan for the exciting trip.

### Hawaiian Islands Dataset

We start by collecting data about these 6 major islands in Hawaii. Here is the list of islands, ordered from northwest to southeast:
- Kauai
- Oahu
- Molokai
- Lanai
- Maui
- Big Island

In [12]:
# import pandas library for DataFrame support
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [13]:
# import requests and sys libraries
import requests
import sys

# method: Use Google API to retrieve coordinates using the address information
def get_coordinates(api_key, address):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # pass the query through "params" so the address is URL-encoded
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:
        # any request, parsing, or lookup failure yields an empty coordinate pair
        return [None, None]

# test: "get_coordinates"
hawaii_address = 'Honolulu, Hawaii, USA'
hawaii_coordinates = get_coordinates(google_api_key, hawaii_address)

if verbose:
    print('Coordinate of {}: {}'.format(hawaii_address, hawaii_coordinates))

In [14]:
# method: Use Google API to retrieve neighborhood information using the address information
def get_neighborhood(api_key, address):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        results = response['results']
        neighborhood_data = results[0]['address_components'] # get neighborhood
        for item in neighborhood_data:
            if 'neighborhood' in item['types']:
                return item['long_name']
        # return None (not a list) so callers can test the result with "is None"
        return None
    except Exception:
        return None

# test: "get_neighborhood"
hawaii_address = '96830' # Zip code for Waikiki
hawaii_neighborhood = get_neighborhood(google_api_key, hawaii_address)

if verbose:
    print('Neighborhood of {}: {}'.format(hawaii_address, hawaii_neighborhood))

In [15]:
# import folium for map
import folium

# once we have the Honolulu coordinate (almost center-located in Hawaii), we can use it to display the Hawaii map
hawaii_map = folium.Map(location=hawaii_coordinates, zoom_start=8)
hawaii_map

While researching the availability of data, the consultant realized that granular neighborhood data can be obtained by collecting zip code information. Due to the size of the islands and local transportation, most of the neighborhoods have their own local post offices, and each post office has its own unique 5-digit zip code assigned. Collecting the neighborhood zip codes is therefore a crucial first step of this data science project, and data cleansing and wrangling are required before any analysis.

With a strong background in database architecture, the data science consultant came up with the diagram below, which illustrates the relationship of the data entities to the business and also helps him to focus on the data gathering process.

<img src="https://raw.githubusercontent.com/finesketch/Coursera_Capstone/master/images/data_relationship.png" alt="Data Relationship" width="1000" />

Let's follow the numbers to review each data entity.

1. `State of Hawaii`: This is the starting point and the focus of the consultant's work: Hawaii.
2. `Counties`: At a high level, there are 4 counties in Hawaii, so the consultant will use Wikipedia to collect information on these counties: Kauai County, Honolulu County, Maui County, and Hawaii County.
3. `6 Major Hawaii Islands`: A county can contain one or more islands. For example, the map of Kauai County shows an island called Niihau to the southwest of Kauai island. The focus of this project, however, is only on the 6 major Hawaiian islands. There is a "one-to-many" relationship between county and islands, which is important to understand. For example, Maui County includes three of the major islands: Maui, Molokai, and Lanai.
4. `Zip Codes`: Now we get to the fun part of the data: zip codes. There is a "one-to-many" relationship between island and zip codes. In other words, an island can have more than one zip code associated with it. In Hawaii, there is usually a post office building for each assigned zip code.
5. `Neighborhoods`: Finally, these are the neighborhoods to which we apply the data science study. There is a "one-to-one" relationship between a zip code and a neighborhood. Understanding this relationship will help us call the Foursquare API to retrieve venue information and explore the neighborhood.

> Note: Kalawao County on Molokai island may be listed as the fifth county in Hawaii. However, it is very small, with a population of fewer than 100 according to the 2017 reported number. The county is under the sole jurisdiction and control of the state health department, owing to the county's history as a treatment colony for individuals suffering from Hansen's disease.

> Note: "Hawaii County" or "Island of Hawaii" is for the Big Island. If you read "Hawaii", then it is for the whole State of Hawaii.
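
The county-to-island relationship described above can be sketched with two small tables. This is a minimal illustration, not the notebook's actual dataset: the table contents are typed in by hand from the lists in this section, and the names `counties`, `islands`, and `county_islands` are chosen here for the example.

```python
import pandas as pd

# County list from the text (Kalawao County is excluded from this project).
counties = pd.DataFrame({
    'county': ['Kauai County', 'Honolulu County', 'Maui County', 'Hawaii County']})

# The 6 major islands and the county each one belongs to.
islands = pd.DataFrame({
    'island': ['Kauai', 'Oahu', 'Molokai', 'Lanai', 'Maui', 'Hawaii'],
    'county': ['Kauai County', 'Honolulu County', 'Maui County',
               'Maui County', 'Maui County', 'Hawaii County']})

# validate='one_to_many' raises a MergeError if a county name repeats on the left side.
county_islands = counties.merge(islands, on='county', validate='one_to_many')

# Group the islands under their county to make the relationship visible.
islands_per_county = county_islands.groupby('county')['island'].apply(list).to_dict()
print(islands_per_county['Maui County'])
```

The same pattern could be repeated one level down (island to zip codes) once the zip code table is available.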

## Data Acquisition and Cleaning

Understanding the **Hawaiian Islands Dataset** section helps the consultant to focus and to break the larger task down into smaller, achievable activities.

In [16]:
# import BeautifulSoup
!conda install -c anaconda beautifulsoup4 --yes
from bs4 import BeautifulSoup


In [17]:
# install lxml
!conda install -c anaconda lxml --yes


### Data Acquisition

You may have noticed that references were added to the bottom of the diagram above. Let's go through each of them to see how the consultant plans to acquire the data from the various sources.

#### (1) State of Hawaii & (2) Counties

The list of counties in Hawaii is available on Wikipedia. The consultant plans to use a web page scraping tool to retrieve these data points:
- Counties (Note: Kalawao County will be excluded from this project.)
- Capital cities
- Population, based on the US Census report in 2007
- Area in square miles

In [18]:
# screen-scraping for counties in Hawaii
url = 'https://en.wikipedia.org/wiki/List_of_counties_in_Hawaii'

# read the data into dataframe format using pandas and lxml parser
county_dfs = pd.read_html(url)
county_dfs = county_dfs[1].head(10)

# raw data of county_dfs (a bare expression inside "if" is not displayed, so print it)
if verbose:
    print(county_dfs.head())

#### (3) Islands

Here is the list of data points that will be screen-scraped from the Wikipedia and GoHawaii websites. Below is the first dataset the consultant will build:

| Island Dataset | Data Point Description |
|----------------|-------------|
| `island` | Name of the Hawaiian Islands. |
| `nickname` | Each island has its own nickname, which may be helpful to learn as a tourist. |
| `county` | Name of the county associated with the island. |
| `capital_city` | Capital city of the county. |
| `population` | Island population. |
| `density` | Population density. |
| `area_size` | Area of the island. |
| `latitude` | Island latitude. |
| `longitude` | Island longitude. |

> Note: To keep the data consistent, the *okina* character (ʻ) will be removed from the data.
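
The name clean-up mentioned in the note can be sketched as a small helper. This is an illustrative sketch only: the function name `normalize_hawaiian_name` and the set of okina look-alike characters are assumptions made for this example, and stripping macrons through Unicode decomposition is a generalization of replacing individual letters one at a time.

```python
import unicodedata

# Characters commonly typed in place of the okina; this set is an assumption.
OKINA_VARIANTS = ('\u02bb', '\u2018', '\u2019', '`')

def normalize_hawaiian_name(name):
    """Return the name without okina marks and without macrons."""
    for mark in OKINA_VARIANTS:
        name = name.replace(mark, '')
    # NFD splits a macron vowel into a base letter plus a combining mark,
    # and the combining mark is then dropped.
    decomposed = unicodedata.normalize('NFD', name)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize_hawaiian_name('O\u02bbahu'))        # Oahu
print(normalize_hawaiian_name('L\u0101na\u02bbi'))  # Lanai
```

Normalizing names this way keeps later string comparisons (for example, matching an island name to its county) independent of how the source page spells the name.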

In [19]:
# screen-scraping for the Hawaiian islands using Wikipedia
url = 'https://en.wikipedia.org/wiki/Hawaiian_Islands'

# load content into memory
page = requests.get(url)

# use BeautifulSoup to parse the content
soup = BeautifulSoup(page.content, 'html.parser')

# only grab the content in "wikitable sortable" table
main_islands = soup.find_all('table', class_='wikitable sortable')
island = main_islands[0]

# raw data of island, just to confirm
if verbose:
    for header in island.find_all('th'):
        print(header.get_text())

In [20]:
# Data Wrangling: Combining datasets from "county_dfs" and "island"

# the column names for the island dataframe
island_columns = [
    'island', 
    'nickname', 
    'county', 
    'capital_city',
    'population',
    'density',
    'area_size',
    'latitude', 
    'longitude']

# create the empty island dataframe
island_df = pd.DataFrame(columns=island_columns)

island_name = ""
nickname = ""
county = ""
capital_city_name = ""
population = ""
density = ""
area = ""

for row in island.find_all('tr'):
    col_index = 0
    for col in row.find_all('td'):
        # island name
        if col_index == 0:
            island_name = col.get_text()
            if len(island_name) > 3 and island_name.find('[') > 0:  # find() returns -1 instead of raising
                island_name = island_name[0:island_name.index('[')]
                island_name = island_name.replace("ʻ", '')
                island_name = island_name.replace("ā", 'a')
            else:
                island_name = "no island_name"
            # county
            if island_name == 'Molokai' or island_name == 'Lanai':
                county = "Maui County"
            elif island_name == 'Oahu':
                county = "Honolulu County"                
            else:
                county = island_name + " County"
        # nickname
        if col_index == 1:
            nickname = col.get_text().strip()
        # area
        if col_index == 2:
            area = col.get_text()
            if len(area) > 3 and area.find('(') > 0:
                area = area[0:area.index('(')]
        # population
        if col_index == 3:
            population = col.get_text().strip()
        # density
        if col_index == 4:
            density = col.get_text()
            if len(density) > 3 and density.find('(') > 0:
                density = density[0:density.index('(')]
        col_index += 1
        
    # Add islands to dataframe
    capital_city_name = ""
    if island_name not in island_df.values:
        island_coordinates = get_coordinates(google_api_key, island_name)
        capital_city = county_dfs.loc[county_dfs['County'] == county, "County seat[7]"]
        for city in capital_city:
            capital_city_name = city
        # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
        new_row = pd.DataFrame([{'island': island_name,
                                 'nickname': nickname,
                                 'county': county,
                                 'capital_city': capital_city_name,
                                 'population': population,
                                 'density': density,
                                 'area_size': area,
                                 'latitude': island_coordinates[0],
                                 'longitude': island_coordinates[1]}])
        island_df = pd.concat([island_df, new_row], ignore_index=True)
    print(' .', end='')


# Remove other islands or empty row
drop_index = island_df[(island_df.island == '')].index
island_df.drop(drop_index, inplace=True)
drop_index = island_df[(island_df.island == 'Niihau')].index
island_df.drop(drop_index, inplace=True)
drop_index = island_df[(island_df.island == 'Kahoolawe')].index
island_df.drop(drop_index, inplace=True)

# reset index
island_df.reset_index(drop=True, inplace=True)

# persist the data into local file system
island_df.to_pickle('./island_df.pkl')

# output updated dataframe
if verbose:
    print(island_df)

 . . . . . . . . .

#### (4) Zip Codes & (5) Neighborhoods

Since each neighborhood is assigned a zip code, the zip code will be used to query neighborhood information using the Foursquare API. To retrieve all the zip codes in Hawaii, the consultant will use ZipCodesToGo.com to gather the data.

Here is the second dataset the consultant will need to create:

| Neighborhood Dataset | Data Point Description |
|----------------------|-------------|
| `island` | Name of the Hawaiian island. |
| `county` | Name of the county associated with the island. |
| `zipcode` | 5-digit zip code of the neighborhood. |
| `neighborhood` | Name of the neighborhood associated with the zip code. |
| `latitude` | Neighborhood latitude. |
| `longitude` | Neighborhood longitude. |
| `crime_incidents` | Public records of crimes committed, from the CrimeMapping data source (captured offline and saved as CSV). |
| `cost_of_living` | Cost of living for a given zip code or neighborhood. The US standard is 100 (the base); the state of Hawaii is 176. |
| `jan_avg_temperature` | Average temperature in January. |
| `feb_avg_temperature` | Average temperature in February. |
| `mar_avg_temperature` | Average temperature in March. |
| `apr_avg_temperature` | Average temperature in April. |
| `may_avg_temperature` | Average temperature in May. |
| `jun_avg_temperature` | Average temperature in June. |
| `jul_avg_temperature` | Average temperature in July. |
| `aug_avg_temperature` | Average temperature in August. |
| `sep_avg_temperature` | Average temperature in September. |
| `oct_avg_temperature` | Average temperature in October. |
| `nov_avg_temperature` | Average temperature in November. |
| `dec_avg_temperature` | Average temperature in December. |
| `housing_rent` | The lower-end rent for a smaller unit in a given neighborhood. Rent can go as high as several thousand dollars for a larger unit. |
| `housing_real_estate` | Average home price in a neighborhood. |
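
To make the `cost_of_living` index concrete, here is a small worked example. It assumes the index scales costs linearly relative to the base of 100, which is a simplification; the helper name `scale_by_cost_of_living` and the 1,000 baseline amount are made up for illustration.

```python
def scale_by_cost_of_living(baseline_cost, index, base=100):
    """Scale a cost at the US baseline by a cost-of-living index."""
    return baseline_cost * index / base

# A basket costing 1,000 at the US baseline, using the statewide index of 176.
hawaii_cost = scale_by_cost_of_living(1000, 176)
print(hawaii_cost)  # 1760.0
```

Under this reading, an index of 176 means the same basket costs about 76% more than the US baseline.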


Finally, once the neighborhood dataset is collected, the consultant can proceed with venue research using Foursquare API.
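
The rule that assigns a zip code to an island can be isolated as a small function. This sketch mirrors the zip codes and county names used in the wrangling code of this section; the function name `island_for_zipcode` and the two constant sets are introduced here for illustration only.

```python
# Zip codes that belong to Maui County but sit on Lanai or Molokai,
# as listed in the wrangling code of this section.
LANAI_ZIPCODES = {'96763'}
MOLOKAI_ZIPCODES = {'96770', '96729', '96757', '96748'}

def island_for_zipcode(zipcode, county):
    """Map a zip code and its county name (without 'County') to an island."""
    if zipcode in LANAI_ZIPCODES:
        return 'Lanai'
    if zipcode in MOLOKAI_ZIPCODES:
        return 'Molokai'
    if county == 'Honolulu':
        return 'Oahu'
    # Kauai, Maui, and Hawaii counties share their name with their main island.
    return county

print(island_for_zipcode('96763', 'Maui'))      # Lanai
print(island_for_zipcode('96830', 'Honolulu'))  # Oahu
```

Keeping the rule in one function makes it easy to test on its own, separately from the geocoding and scraping calls.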

In [21]:
# screen-scraping Hawaii zip codes
url = 'https://www.zipcodestogo.com/Hawaii/'

# load the data into a dataframe
zipcode_df = pd.read_html(url)
zipcode_df = zipcode_df[1]

# drop first two unused rows (headers) and reset the dataframe index
zipcode_df = zipcode_df.iloc[2:].reset_index(drop=True)

# rename the column
zipcode_df.columns = ['zipcode', 'neighborhood', 'county', 'map']

# drop unused column
del zipcode_df['map']

# review the dataframe
if verbose:
    print(zipcode_df.head())

In [22]:
# load crime data CSV into dataframe
crime_df = pd.read_csv('hawaii_crime_zipcodes.csv')

if verbose:
    print(crime_df.head())

In [23]:
# Data Wrangling: Combining datasets from zip codes, neighborhood, plus others

# The column names for the neighborhood dataframe
neighborhood_columns = [
    'island', 
    'county', 
    'zipcode', 
    'neighborhood',
    'latitude', 
    'longitude']

# Create the empty neighborhood dataframe
neighborhood_df = pd.DataFrame(columns=neighborhood_columns)

# additional columns
neighborhood_df['crime_incidents'] = 0
neighborhood_df['cost_of_living'] = 0
neighborhood_df['jan_avg_temperature'] = 0
neighborhood_df['feb_avg_temperature'] = 0
neighborhood_df['mar_avg_temperature'] = 0
neighborhood_df['apr_avg_temperature'] = 0
neighborhood_df['may_avg_temperature'] = 0
neighborhood_df['jun_avg_temperature'] = 0
neighborhood_df['jul_avg_temperature'] = 0
neighborhood_df['aug_avg_temperature'] = 0
neighborhood_df['sep_avg_temperature'] = 0
neighborhood_df['oct_avg_temperature'] = 0
neighborhood_df['nov_avg_temperature'] = 0
neighborhood_df['dec_avg_temperature'] = 0
neighborhood_df['housing_rent'] = 0
neighborhood_df['housing_real_estate'] = 0

for i, data in zipcode_df.iterrows():
    # island
    if data[0] == "96763": # Lanai
        island = "Lanai"
    elif data[0] == "96770" or data[0] == "96729" or data[0] == "96757" or data[0] == "96748": # Molokai
        island = "Molokai"
    elif data[2] == "Honolulu":
        island = "Oahu"
    else:
        island = data[2]
    
    # Get Honolulu neighborhood
    if data[1] == "Honolulu":
        neighborhood = get_neighborhood(google_api_key, data[0])
        if neighborhood is None:
            neighborhood = data[1]
    else:
        neighborhood = data[1]
        
    # Update neighborhood if available
    if neighborhood is None:
        neighborhood = crime_df.loc[crime_df['zipcode'] == int(data[0]), 'neighborhood'].values[0]        
        
    # Get coordinates
    zipcode_coordinates = get_coordinates(google_api_key, data[0])
    
    # Get crime information
    crime_incidents = crime_df.loc[crime_df['zipcode'] == int(data[0]), 'crime_incidents'].values[0]
    
    # Insert data (DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead)
    new_row = pd.DataFrame([{
                                  'island': island,
                                  'county': data[2] + " County",
                                  'zipcode': data[0],
                                  'neighborhood': neighborhood,
                                  'latitude': zipcode_coordinates[0],
                                  'longitude': zipcode_coordinates[1],
                                  'crime_incidents': crime_incidents
                                  }])
    neighborhood_df = pd.concat([neighborhood_df, new_row], ignore_index=True)
    
    print(' .', end='')

if verbose:
    print(neighborhood_df.head())

 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In [24]:
# Screen-scraping cost of living information

# iterate through the zip codes and query for additional information on cost of living, weather, and rent
for i, data in zipcode_df.iterrows():
    
    # search for location using zip code
    url = 'https://www.areavibes.com/ax_search_full/?query={}'.format(data[0])
    response = requests.get(url).json()    
    results = response
    if verbose:
        print('')
        print('query URL =>', url)    
        print(results)
    
    if results:
        try:
            state_abbr = results[0]['state_abbr']
            city = results[0]['city']
            hood = results[0]['hood']
            
            # cost of living
            if hood:
                cost_of_living_url = 'https://www.areavibes.com/{}-{}/{}/cost-of-living/'.format(city, state_abbr, hood).replace(' ', '+')
            else:
                cost_of_living_url = 'https://www.areavibes.com/{}-{}/cost-of-living/'.format(city, state_abbr).replace(' ', '+')
            if verbose:
                print('cost_of_living_url URL =>', cost_of_living_url)
            cost_of_living_df = pd.read_html(cost_of_living_url)
            cost_of_living = cost_of_living_df[0][1][1]
            # update
            neighborhood_df.loc[neighborhood_df['zipcode'] == data[0], 'cost_of_living'] = cost_of_living

            # weather
            if hood:
                weather_url = 'https://www.areavibes.com/{}-{}/{}/weather/'.format(city, state_abbr, hood).replace(' ', '+')
            else:
                weather_url = 'https://www.areavibes.com/{}-{}/weather/'.format(city, state_abbr).replace(' ', '+')
            if verbose:
                print('weather_url URL =>', weather_url)
            weather_df = pd.read_html(weather_url)
            # update: rows 1-12 of column 3 in the second table hold the January-December values
            months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
            for row, month in enumerate(months, start=1):
                temperature = weather_df[1][3][row]
                neighborhood_df.loc[neighborhood_df['zipcode'] == data[0], month + '_avg_temperature'] = temperature
            
            # real estate and rent
            if hood:
                real_estate_rent_url = 'https://www.areavibes.com/{}-{}/{}/apartments-for-rent/'.format(city, state_abbr, hood).replace(' ', '+')
            else:
                real_estate_rent_url = 'https://www.areavibes.com/{}-{}/apartments-for-rent/'.format(city, state_abbr).replace(' ', '+')
            if verbose:
                print('real_estate_rent_url URL =>', real_estate_rent_url)
            real_estate_rent_df = pd.read_html(real_estate_rent_url)
            # update
            housing_price = real_estate_rent_df[0][1][1]
            neighborhood_df.loc[neighborhood_df['zipcode'] == data[0], 'housing_real_estate'] = housing_price
            rent = real_estate_rent_df[0][1][2]
            neighborhood_df.loc[neighborhood_df['zipcode'] == data[0], 'housing_rent'] = rent
            
            print(' .', end='')
        except Exception as e:
            if verbose:
                print('Error: {}'.format(e))

if verbose:
    print(neighborhood_df.head())
    
# persist the data into local file system
neighborhood_df.to_pickle('./neighborhood_df.pkl')

 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

## Island Exploration using the Foursquare API

At this point, we should have the basic neighborhood information for each of the 6 major Hawaiian islands. This island and neighborhood information should help visitors decide in which neighborhood to plan an extended stay. Now we use the Foursquare API to explore each island.

Looking at the categories documented on Foursquare.com, we will explore venues in these categories:
- `Outdoors & Recreation (4d4b7105d754a06377d81259)`: This is the most important category for what Hawaii has to offer, since most visitors come to the islands to experience the beauty of nature. This global category should include places like beaches, parks, mountains, and botanical gardens.
- `Art & Entertainment (4d4b7104d754a06370d81259)`: This should include places like museums, music venues, performing arts venues, zoos, and theme parks.
- `Food & Restaurants (4d4b7105d754a06374d81259)`: This is a global category for all kinds of food.
- `Nightlife & Bars (4d4b7105d754a06376d81259)`: This query may be very important for visitors who decide to stay in the downtown Honolulu area.

Due to the size of the data, the API should retrieve only the top 20 most popular venues in each of the above categories.
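
Before calling the API, it can help to see how an Explore-style response is flattened into rows. The sketch below works on a hand-written payload, so it runs without credentials; the helper name `venues_from_items` and the sample venue are hypothetical, and the field layout (`venue`, `categories`, `location`, `formattedAddress`) is taken from the way this notebook reads the response.

```python
import pandas as pd

def venues_from_items(items, category_name):
    """Flatten Explore-style items into one row per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        location = venue['location']
        rows.append({
            'category': category_name,
            'name': venue['name'],
            'subcategory': venue['categories'][0]['name'],
            'latitude': location['lat'],
            'longitude': location['lng'],
            'address': ', '.join(location.get('formattedAddress', [])),
        })
    return pd.DataFrame(rows)

# Hypothetical payload for illustration only; not real Foursquare data.
sample_items = [{
    'venue': {
        'name': 'Example Beach Park',
        'categories': [{'name': 'Beach'}],
        'location': {'lat': 21.0, 'lng': -157.0,
                     'formattedAddress': ['123 Example Rd', 'Honolulu, HI']},
    }
}]

venues_df = venues_from_items(sample_items, 'outdoors')
print(venues_df[['name', 'subcategory', 'address']])
```

Collecting rows in a list and building the dataframe once avoids growing a dataframe inside the loop.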

In [25]:
# method: toppicks
def get_toppicks_venues(island, categoryId, categoryName, limit=20):
    venues_columns = [
        'category', 
        'categoryId', 
        'name', 
        'subcategory',
        'latitude', 
        'longitude',
        'address']

    # pass the query through "params" so values such as the island name are URL-encoded
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': foursquare_client_id,
        'client_secret': foursquare_client_secret,
        'v': foursquare_version,
        'near': island,
        'categoryId': categoryId,
        'limit': limit,
        'sortByPopularity': 1,
        'section': 'topPicks'}
    results = requests.get(url, params=params).json()['response']['groups'][0]['items']

    # collect one row per venue, then build the venues dataframe in a single step
    rows = []
    for item in results:
        venue = item['venue']
        rows.append({
            'category': categoryName,
            'categoryId': venue['id'],  # note: this is the Foursquare venue id
            'name': venue['name'],
            'subcategory': venue['categories'][0]['name'],
            'latitude': venue['location']['lat'],
            'longitude': venue['location']['lng'],
            'address': ', '.join(venue['location']['formattedAddress'])
        })
    return pd.DataFrame(rows, columns=venues_columns)


In [35]:
# method: get_island_map with venue information
def get_island_map(island, zoom_level=11):
    island_address = island
    island_coordinates = get_coordinates(google_api_key, island_address)  

    # each category is paired with the marker color used on the map
    category_layers = [
        (outdoor_categoryId, outdoor_categoryName, 'green'),
        (art_categoryId, art_categoryName, 'brown'),
        (food_categoryId, food_categoryName, 'red'),
        (bars_categoryId, bars_categoryName, 'blue')]

    island_map = folium.Map(location=island_coordinates, zoom_start=zoom_level)
    folium.Marker(island_coordinates, popup=island_address).add_to(island_map)

    for category_id, category_name, color in category_layers:
        venues_df = get_toppicks_venues(island_address, category_id, category_name)
        for i, data in venues_df.iterrows():
            folium.CircleMarker([data['latitude'], data['longitude']], radius=3, color=color, fill=True, fill_color=color, fill_opacity=0.5).add_to(island_map)
    
    return island_map

In [27]:
# Kauai with venues ("island_map" avoids shadowing the built-in map function)
island_map = get_island_map('Kauai, Hawaii, USA')
island_map

In [28]:
# Oahu with venues
island_map = get_island_map('Oahu, Hawaii, USA')
island_map

In [29]:
# Molokai with venues
island_map = get_island_map('Molokai, Hawaii, USA')
island_map

In [30]:
# Lanai with venues
island_map = get_island_map('Lanai, Hawaii, USA')
island_map

In [31]:
# Maui with venues
island_map = get_island_map('Maui, Hawaii, USA')
island_map

In [45]:
# Big Island with venues
island_map = get_island_map('Island of Hawaii, Hawaii, USA', 10)
island_map

## COVID-19 in Hawaii

Finally, we cannot talk about tourism without bringing up COVID-19, which is a very real concern. **Capstone Travel** wants to keep all its employees and customers safe wherever they travel. Generally speaking, the State of Hawaii has done an excellent job of keeping COVID-19 cases under control.

Let the data speak for itself. The most up-to-date information is available on the Hawaii Department of Health website: https://health.hawaii.gov/coronavirusdisease2019/.

In [49]:
# screen-scraping for COVID-19 information
url = 'https://health.hawaii.gov/coronavirusdisease2019'

# load content into memory
page = requests.get(url)

# use BeautifulSoup to parse the content
soup = BeautifulSoup(page.content, 'html.parser')

# only grab the content in the "data_list" definition lists
health_info = soup.find_all('dl', class_='data_list')
report_cases = health_info[0]

# case(s) reported today
for item in report_cases.find_all('dd'):
    print(item.get_text())

Total cases: 728 (5 newly reported)
Hawai’i County: 82
Honolulu County: 493
Kaua’i County: 21
Maui County: 120†
Pending: 0
Residents diagnosed outside of Hawai‘i: 12


In [56]:
# additional screen-scraping Hawaii Department of Health website
url = 'https://health.hawaii.gov/coronavirusdisease2019/what-you-should-know/current-situation-in-hawaii/'

# load the tables into a list of dataframes
health_df = pd.read_html(url)

In [51]:
# total of cases in Hawaii
health_df[0]

| | 0 | 1 |
|---|---|---|
| 0 | Total Cases | 728 (5 newly reported) |
| 1 | Released from Isolation† | 629 |
| 2 | Required Hospitalization | 91 |
| 3 | Deaths | 17 |


In [52]:
# total of cases on Big Island
health_df[1]

| | HAWAII COUNTY | Unnamed: 1 |
|---|---|---|
| 0 | Total Cases | 82 total |
| 1 | Released from Isolation† | 81 |
| 2 | Required Hospitalization | 1 |
| 3 | Deaths | 0 |


In [53]:
# total of cases in Honolulu County
health_df[2]

| | HONOLULU COUNTY | Unnamed: 1 |
|---|---|---|
| 0 | Total Cases | 493 total |
| 1 | Released from Isolation† | 415 |
| 2 | Required Hospitalization | 66 |
| 3 | Deaths | 11 |


In [54]:
# total of cases in Kauai County
health_df[3]

| | KAUAI COUNTY | Unnamed: 1 |
|---|---|---|
| 0 | Total Cases | 21 total |
| 1 | Released from Isolation† | 20 |
| 2 | Required Hospitalization | 1 |
| 3 | Deaths | 0 |


In [55]:
# total of cases in Maui County (Maui, Molokai, and Lanai)
health_df[4]

| | MAUI COUNTY | Unnamed: 1 |
|---|---|---|
| 0 | Total Cases | 120‡ total |
| 1 | Released from Isolation† | 113 |
| 2 | Required Hospitalization | 22 |
| 3 | Deaths | 6 |


At this point, we have mapped out the venues for each island and gathered the neighborhood information documented in the first part of this section, so visitors should have good information to decide which island to visit. In addition, we reviewed the COVID-19 data available on the Hawaii Department of Health website.
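
The scraped case counts arrive as text such as `728 (5 newly reported)` or `120‡ total`, so they need to be converted to numbers before any analysis. The sketch below is one possible approach, assuming each value starts with the count; the helper name `parse_case_count` is made up for this example, and the input strings are copied from the output shown earlier in this section.

```python
import re

def parse_case_count(text):
    """Return the leading integer of a count string such as '120 total'."""
    match = re.match(r'\s*([\d,]+)', text)
    if match is None:
        raise ValueError('no leading number in {!r}'.format(text))
    return int(match.group(1).replace(',', ''))

# Values copied from the reported output: the statewide total, followed by the
# four counties, pending cases, and residents diagnosed outside of Hawaii.
total_cases = parse_case_count('728 (5 newly reported)')
breakdown = ['82', '493', '21', '120\u2020', '0', '12']
breakdown_sum = sum(parse_case_count(value) for value in breakdown)
print(total_cases, breakdown_sum)  # 728 728
```

Checking that the breakdown adds up to the reported total is a cheap sanity test on the scraped values.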

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on how to visit the island.