# Capstone Project - The Battle of Neighborhoods (Week 1) - Visit Hawaii



## Introduction

### Background

After COVID-19 shelter-in-place policies end, a Seattle-based travel agency, **Aloha Capstone Travel Inc.** (**Capstone Travel** for short), wants to run a promotion called **Visit Hawaii** to bring tourists to Hawaii for short-term vacations or stays lasting anywhere from 14 days to several months. Because Hawaii has six major islands, one of the first questions **Capstone Travel** needs to help customers answer is: *Which island should customers visit and stay on, based on their preferences or needs?*

### Problem

Plenty of information about Hawaii is already available online, so **Capstone Travel** needs to stay ahead by providing *useful* and *decisive* information to customers. It plans to do so by aggregating information on island activities, places of interest, local culture, food, accommodation, weather, local events, and so on.

To get started with the **Visit Hawaii** project, **Capstone Travel** will hire an outside data science consultant to help put together a proposal for this new initiative. The proposal will explore the 10 most common venues on each of the six major islands in Hawaii, so the final report should include venues for Kauai, Maui, Oahu, Molokai, Lanai, and Big Island. The report should also include a k-means clustering analysis, presented with visual elements suitable for business proposal presentations.

> Note: Big Island is also called "Island of Hawaii".
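
As a rough illustration of the k-means clustering step mentioned above, here is a minimal pure-Python sketch. The feature vectors (share of beach venues, share of restaurant venues), the sample values, and the function name `kmeans` are all hypothetical. A real analysis would more likely rely on a library implementation.

```python
import math

def kmeans(points, k, max_iter=20):
    """Cluster points with Lloyd's algorithm.

    For reproducibility, the first k points serve as the initial centroids
    (an assumption of this sketch; libraries usually randomize this).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical features per neighborhood: (beach share, restaurant share).
features = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Neighborhoods that receive the same label have a similar venue mix, which is the property the clustering analysis is meant to surface.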

### Interest

Around the world, public places, schools, workplaces, and more have been shut down to keep people safe. Previously planned, non-essential travel has been either cancelled or postponed. In some areas, people have been cooped up under this *lockdown* for at least two months. **Capstone Travel** wants to be ready when governments lift their *shelter-in-place* policies once the pandemic is under control. Eventually, people will want to plan an *escape* to recover from the physiological and emotional impact of this extended *lockdown*.

Finally, this data science report is targeted at stakeholders interested in *visiting Hawaii* for a *short-term stay* of 14 days to several months *after the lockdown has been lifted*.

### Scope of the Project

The project will focus on these areas:
- Use the *Foursquare API* to retrieve location data
- Discuss data acquisition and cleaning
- Detail the methodology in the report
- Use visual tools to illustrate and present findings and to support any decisions made
- Limit data gathering to the six major Hawaiian islands: Kauai, Oahu, Molokai, Lanai, Maui, and Big Island
- Exclude the economic aspects of local business activities
- Exclude Kalawao County

### Assumptions

One assumption needs to be stated to set expectations:
- Pricing and cost will not be discussed or included in the report

### Success Criteria

For this project to be successful, it needs to meet the following criteria:
- It needs to help **Capstone Travel** provide useful information to customers before they travel to Hawaii
- It should provide a good island recommendation based on each customer's preferences

### Thought Process

The diagram below illustrates the consultant's thought process and how he plans to tackle the problem.

<img src="images/thought_process.png" alt="Thought Process" width="1000" />



## Brief Overview of Islands in Hawaii

There are six major islands to visit in Hawaii: Kauai, Oahu, Molokai, Lanai, Maui, and the island of Hawaii. Each has its own distinct personality, adventures, activities, and sights. A brief understanding of each island will help tourists get the most out of Hawaii.

Wikipedia.org and the State of Hawaii websites will be the primary data sources for the overview of these six Hawaiian islands.

| Island | Nickname | Island Capital | Population | # of Zipcodes | Area (sq mi) | Latitude | Longitude |
|--------|----------|----------------|------------|---------------|--------------|----------|-----------|
| Kauai  | Garden Isle | Lihue | 72,000 | 10 | 1000 | 50.10 | -172.30 |

### Data Availability

We need to start by collecting data about the six major islands in Hawaii. Here they are, listed from northwest to southeast:
- Kauai
- Oahu
- Molokai
- Lanai
- Maui
- Big Island

< display Hawaii map here >

While researching data availability, the consultant realized that granular neighborhood data can be obtained by collecting zipcode information. Because of the size of the islands and the state of local transportation, most neighborhoods have their own local post office, and each post office is assigned a unique 5-digit zipcode. Collecting the neighborhoods' zipcodes is therefore the crucial first step of this data science project, and data cleaning and wrangling will be required before the analysis can begin.

Drawing on a strong background in database architecture, the data science consultant came up with the diagram below, which illustrates how the data entities relate to the business and helps focus the data gathering process.
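
To make the cleaning step concrete, here is a minimal sketch of how raw zipcode values might be normalized before analysis. The function name `clean_zipcodes`, the handling of ZIP+4 suffixes, and the sample inputs are assumptions for illustration, not the consultant's actual procedure.

```python
import re

ZIP_PATTERN = re.compile(r"\d{5}")

def clean_zipcodes(raw_values):
    """Return unique, valid 5-digit zipcodes in first-seen order."""
    seen = set()
    cleaned = []
    for value in raw_values:
        text = str(value).strip()
        # Drop a ZIP+4 suffix such as "96813-1234" (assumed possible in raw data).
        text = text.split("-")[0]
        if ZIP_PATTERN.fullmatch(text) and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

# Hypothetical raw values mixing whitespace, duplicates, and invalid entries.
raw = [" 96766", "96766", "96813-1234", "9670", "Lihue", 96720]
print(clean_zipcodes(raw))  # ['96766', '96813', '96720']
```

Storing zipcodes as strings rather than integers keeps them usable as lookup keys and avoids accidental arithmetic or formatting changes.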

<img src="images/data_relationship.png" alt="Data Relationship" width="1000" />

Let's follow the numbers to review each data entity.

#### (1) State of Hawaii

This is the starting point for the consultant's work.

#### (2) Counties

At a high level, there are four counties in Hawaii:
- Kauai County
- Honolulu County
- Maui County
- Hawaii County

*Note: Kalawao County on Molokai island may be listed as the fifth county in Hawaii. It is very small, with a population of fewer than 100 according to figures reported in 2017. The county is under the sole jurisdiction and control of the state health department, owing to its history as a treatment colony for individuals suffering from Hansen's disease.*

#### (3) Six Major Hawaii Islands

To be clear, a county can contain one or more islands. For example, if you look at a map of Kauai County, you will notice an island called Niihau to the southwest of Kauai's main island. This project, however, focuses only on the six major Hawaiian islands.

So there is a "one-to-many" relationship between a county and its islands, which is important to understand. For example, Maui County includes three major islands: Maui, Molokai, and Lanai.

> Note: "Hawaii County" or "Island of Hawaii" refers to the Big Island. "Hawaii" on its own refers to the whole State of Hawaii.
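
The one-to-many relationship between counties and islands can be sketched as a simple mapping. The variable names `COUNTY_ISLANDS` and `ISLAND_COUNTY` are illustrative choices, and only the six major islands in scope are included (Niihau and Kalawao County are left out).

```python
# One county maps to one or more of the six major islands.
COUNTY_ISLANDS = {
    "Kauai County": ["Kauai"],
    "Honolulu County": ["Oahu"],
    "Maui County": ["Maui", "Molokai", "Lanai"],
    "Hawaii County": ["Big Island"],
}

# Invert the mapping so each island points back to its single county.
ISLAND_COUNTY = {
    island: county
    for county, islands in COUNTY_ISLANDS.items()
    for island in islands
}

print(ISLAND_COUNTY["Molokai"])  # Maui County
```

Keeping both directions available makes it easy to label zipcode records by island and then roll them up by county.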

#### (4) Zipcodes

Now we get to the fun part of the data: zipcodes. There is a "one-to-many" relationship between an island and its zipcodes. In other words, an island can have more than one zipcode associated with it. In Hawaii, there is usually a post office building for each assigned zipcode.

#### (5) Neighborhoods

Finally, there are the neighborhoods to which we need to apply the data science analysis. There is a "one-to-one" relationship between a zipcode and a neighborhood. Understanding this relationship will help us call the Foursquare API to retrieve venue information and explore each neighborhood.
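
As a sketch of how one zipcode record could drive a Foursquare request, the snippet below only builds the request URL and makes no network call. It assumes the legacy Foursquare v2 `venues/explore` endpoint with its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters. The helper name `build_explore_url`, the credentials, the default values, and the approximate Lihue coordinates are placeholders.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumption: the project targets this API version).
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200501", radius=1000, limit=50):
    """Build a venues/explore URL around one neighborhood's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighborhood
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Hypothetical record: one zipcode maps to one neighborhood.
neighborhood = {"zipcode": "96766", "name": "Lihue", "lat": 21.98, "lng": -159.37}
url = build_explore_url(neighborhood["lat"], neighborhood["lng"],
                        "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

Because each zipcode corresponds to exactly one neighborhood, looping over the zipcode table yields one request per neighborhood.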



## Data Acquisition and Cleaning

Understanding the **Brief Overview of Islands in Hawaii** section will help the consultant focus, narrow his research, and break the larger tasks down into smaller, achievable ones.

### Data Acquisition

You may have noticed that references were added at the bottom of the diagram above. Since the numbering is already in place, let's go through each item in order and see how the consultant plans to acquire the data from the various sources.

#### (1) State of Hawaii & (2) Counties

The list of counties in Hawaii is available on Wikipedia. The consultant plans to use a web page scraping tool to retrieve these data points:
- Counties (Note: Kalawao County will be excluded from this project.)
- Capital cities
- Population based on the 2007 US Census report
- Area in square miles

Additional data needed to make this dataset complete:
- None

> References:
> - List of counties in Hawaii, https://en.wikipedia.org/wiki/List_of_counties_in_Hawaii
> - List of counties in the United States, https://en.wikipedia.org/wiki/List_of_United_States_counties_and_county_equivalents
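
To illustrate the scraping step, here is a minimal standard-library sketch that extracts rows from an HTML table and drops Kalawao County. The HTML snippet is a hand-written stand-in for the real Wikipedia page, and the class name `TableParser` is hypothetical. Since the worksheet installs `lxml`, the actual notebook may well use `pandas.read_html` instead, but that is an inference.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Stand-in for the downloaded page (assumption: the real table has more columns).
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>Seat</th></tr>
  <tr><td>Hawaii County</td><td>Hilo</td></tr>
  <tr><td>Honolulu County</td><td>Honolulu</td></tr>
  <tr><td>Kalawao County</td><td></td></tr>
  <tr><td>Kauai County</td><td>Lihue</td></tr>
  <tr><td>Maui County</td><td>Wailuku</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
# Kalawao County is out of scope for this project.
counties = [dict(zip(header, row)) for row in records if row[0] != "Kalawao County"]
print(counties)
```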

#### (3) Islands

Here is the list of data points that will be scraped from the Wikipedia and GoHawaii websites:
- Island
- Nickname
- Area
- Population (Reported in 2010)
- Density
- Latitude
- Longitude

Additional data needed to make this dataset complete:
- None

> References:
> - Wikipedia - Hawaiian Islands, https://en.wikipedia.org/wiki/Hawaiian_Islands
> - GoHawaii - Hawaiian Islands, https://www.gohawaii.com/islands

> Note: To keep the data consistent, the **okina** character (\`) will be removed from the data.
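
The okina removal described in the note could look like the sketch below. It assumes the okina may appear as the Unicode letter U+02BB, as a left single quotation mark (U+2018), or as a backtick. The exact set of variants in the scraped data is an assumption, and `strip_okina` is a hypothetical helper name.

```python
# Characters commonly used to write the okina (assumed set of variants).
OKINA_VARIANTS = "\u02bb\u2018`"
_REMOVE_OKINA = str.maketrans("", "", OKINA_VARIANTS)

def strip_okina(text):
    """Return text with every okina variant removed."""
    return text.translate(_REMOVE_OKINA)

print(strip_okina("Kaua\u02bbi"))  # Kauai
print(strip_okina("Lana`i"))       # Lanai
```

Applying the same helper to every dataset keeps island and neighborhood names identical across sources, so they can be joined reliably.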

#### (4) Zipcodes

Since each neighborhood is assigned a zipcode, the zipcode will be used to query neighborhood information through the Foursquare API. To retrieve all the zipcodes in Hawaii, the consultant will use ZipcodesToGo.com. Here is the list of data points that will be collected:
- Zipcode
- City / Neighborhood
- County
- Island
- Latitude
- Longitude

Additional data needed to make this dataset complete:
- None

> References:
> - ZipcodesToGo.com, https://www.zipcodestogo.com/Hawaii/
> - Google Maps, http://maps.google.com
> - UnitedStatesZipCodes.org, https://unitedstateszipcodes.org

> Note: To keep the data consistent, the **okina** character (\`) will be removed from the data.

#### (5) Neighborhoods

Finally, once the neighborhood information is collected in step (4), the consultant can proceed with venue research using Foursquare.

> References:
> - Foursquare.com, https://foursquare.com/



### Data Cleaning




### Brief Overview of each Island in Hawaii
- Wikipedia.org

### Location (Exploration)
- Folium map

### Weather


### Flight
- OpenFlight.org, https://openflights.org/data.html
- FlightStats.com, https://developer.flightstats.com/api-docs/scheduledFlights/v1
- https://www.hawaiitourismauthority.org

### Accommodation
- Airbnb listings, https://github.com/nderkach/airbnb-python
- Pricing prediction, https://github.com/samuelklam/airbnb-pricing-prediction
- Good sample report, https://towardsdatascience.com/airbnb-rental-listings-dataset-mining-f972ed08ddec
- Airbnb data download, http://insideairbnb.com/get-the-data.html

### Coronavirus Cases in Hawaii
- https://www.nytimes.com/interactive/2020/us/hawaii-coronavirus-cases.html
- https://www.hawaii-aloha.com/podcast/2020/05/04/next-steps-for-re-opening-hawaii/
- https://www.gohawaii.com/special-alerts-information
- https://health.hawaii.gov/coronavirusdisease2019/#situation
- https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

### Activities
- Calendar
- https://www.gohawaii.com/experiences



# References
- https://en.wikipedia.org/wiki/List_of_counties_in_Hawaii
- List of zipcodes in Hawaii, https://www.zipcodestogo.com/Hawaii/
- Detailed zipcode lookup, https://www.melissa.com/v2/lookups/mapzip/zipcode/?zipcode=96850
- Detailed zipcode lookup in JSON, https://www.melissa.com/v2/lookups/mapzip/zipcode/?zipcode=96850&fmt=json&id=
- https://www.unitedstateszipcodes.org/96850/

### Python Libraries
- https://www.w3resource.com/python-exercises/geopy/python-geopy-nominatim_api-exercise-3.php
- https://pypi.org/project/uszipcode/

# Worksheet

In [1]:
import pandas as pd

In [2]:
!conda install -c anaconda lxml --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - lxml


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.1.1   |                0         132 KB  anaconda
    certifi-2020.4.5.1         |           py36_0         159 KB  anaconda
    libxslt-1.1.33             |       h7d1a2b0_0         577 KB  anaconda
    lxml-4.5.0                 |   py36hefd8a0e_0         1.6 MB  anaconda
    openssl-1.1.1g             |       h7b6447c_0         3.8 MB  anaconda
    ------------------------------------------------------------
                                           Total:         6.2 MB

The following NEW packages will be INSTALLED:

  libxslt            anaconda/linux-64::libxslt-1.1.33-h7d1a2b0_0
  lxml               anaconda/linu