# Capstone Project - The Battle of the Neighborhoods (Week 1)
# Vacation Planner using K-Means Clustering
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)



## Introduction: Business Problem <a name="introduction"></a>


### Background

Planning a trip to a destination with as many choices as Los Angeles can be intimidating, especially when it involves traveling to unfamiliar places.

There are so many websites and guidebooks to wade through that the amount of information is overwhelming. To take some of the stress out of the process, I have tried to organize the best planning resources in one place. This Los Angeles Vacation Planner will guide you through the decisions you need to make to get the most out of your Los Angeles vacation, whether you are traveling on a tight budget or looking for a luxury getaway.

### Problem

In this project we will try to outline optimal travel plans for a city. Specifically, this report is aimed at travelers interested in finding **popular spots** to visit during their trip to **Los Angeles**, California.

Since there are lots of scenic places and spots to visit in Los Angeles, we will try to pick **venues that are most popular in Los Angeles based on Foursquare data**. 

We will use our data science powers to **generate an itinerary** based on this criterion.

### Interest
This will be of interest to travelers who have never been to Los Angeles and who would like to make the best use of their time. We will help them get the most out of their vacation with an itinerary filled with a wide variety of attractions, activities, and dining options that showcase the best of L.A.

We will help travelers to plan their travel itinerary. The algorithm should be able to provide an **optimal itinerary recommendation** in terms of *distance* and *popular spots* in the city. 

This can be extended to any place of interest (city or country). The user provides a desired place of interest, and we generate a travel plan based on popular venues in and around it.

## Data <a name="data"></a>

Considering the problem, we need to generate a travel itinerary for a trip to Los Angeles. 
We can obtain data from the following sources:

 * Popular venues in Los Angeles, California **(from Foursquare API)**

Foursquare is a company best known for its eponymous city guide, which offers crowdsourced recommendations for the best things to eat, see, and do nearby.

We need to obtain and clean the data to get a list of popular venues in Los Angeles that contains the following columns:
 * Venue name, 
 * Venue category, 
 * Venue latitude, and 
 * Venue longitude.

### Import libraries

In [14]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# transforming a JSON file into a pandas dataframe
# (importing json_normalize from pandas.io.json is deprecated since pandas 1.0)
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Libraries imported.')

Libraries imported.


### Obtaining data from Foursquare

In [15]:
# @hidden_cell

CLIENT_ID = 'JX1EFM0ONRLEAUWS4G32ESMTOXODGMK50FZ4GEE3PY4MA01V' # your Foursquare ID
CLIENT_SECRET = 'LUGQWQZRJIC4QBVPD3Z2JIJCQ0QBU1O1MBYXJTPRIQ3R0Q3G' # your Foursquare Secret
VERSION = '20200119'

In [16]:
city = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print("The coordinates of {} are {}, {}".format(city,latitude, longitude))

The coordinates of Los Angeles, CA are 34.0536909, -118.2427666


#### Define the corresponding URL to retrieve a list of popular spots

In [17]:
base_url = 'https://api.foursquare.com/v2/'
url = base_url + 'venues/explore?near={}&sortByPopularity=1&client_id={}&client_secret={}&v={}'.format(city,CLIENT_ID,CLIENT_SECRET,VERSION)
print(url)

https://api.foursquare.com/v2/venues/explore?near=Los Angeles, CA&sortByPopularity=1&client_id=JX1EFM0ONRLEAUWS4G32ESMTOXODGMK50FZ4GEE3PY4MA01V&client_secret=LUGQWQZRJIC4QBVPD3Z2JIJCQ0QBU1O1MBYXJTPRIQ3R0Q3G&v=20200119
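
String formatting leaves the space and comma in `Los Angeles, CA` unescaped in the query string. A minimal sketch of building the same query from a dictionary so that every value is percent-encoded is shown here; `build_explore_url` and the placeholder credentials are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/'

def build_explore_url(city, client_id, client_secret, version):
    """Return the venues/explore URL with all query values URL-encoded."""
    params = {
        'near': city,
        'sortByPopularity': 1,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
    }
    return BASE_URL + 'venues/explore?' + urlencode(params)

# placeholder credentials, for illustration only
example_url = build_explore_url('Los Angeles, CA', 'YOUR_CLIENT_ID',
                                'YOUR_CLIENT_SECRET', '20200119')
print(example_url)
```

Passing the same dictionary through the `params` argument of `requests.get` should have the same effect, since `requests` encodes query parameters itself.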


#### Send the GET request and parse the results

In [18]:
results = requests.get(url).json()
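
The call above assumes the request succeeded. Foursquare v2 responses wrap the payload in a `meta` object that carries a status `code`, so a small check before parsing can surface problems such as invalid credentials early. A minimal sketch follows; `check_foursquare_response` is a hypothetical helper, and the sample payloads are made up to mimic the shape of a v2 response.

```python
def check_foursquare_response(payload):
    """Return payload['response'] if meta.code is 200, otherwise raise ValueError."""
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        # errorDetail is assumed to be present on failures; fall back if it is not
        detail = meta.get('errorDetail', 'no error detail provided')
        raise ValueError('Foursquare request failed with code {}: {}'.format(code, detail))
    return payload['response']

# made-up payloads, for illustration only
ok_payload = {'meta': {'code': 200}, 'response': {'groups': [{'items': []}]}}
bad_payload = {'meta': {'code': 400, 'errorDetail': 'Missing access credentials.'}}

print(check_foursquare_response(ok_payload))
try:
    check_foursquare_response(bad_payload)
except ValueError as err:
    print(err)
```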

Let's define a function to get the category type.

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Get relevant part of JSON and transform it into a *pandas* dataframe

Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [20]:
# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']

# transform venues into a dataframe
nearby_venues = json_normalize(venues)

# filter columns that include venue name, categories and anything that is associated with location (lat/long)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.rename(columns={'categories': 'category'}, inplace=True)
nearby_venues.head()

|   | name | category | lat | lng |
|---|------|----------|-----|-----|
| 0 | Universal Studios Hollywood | Theme Park | 34.136999 | -118.355473 |
| 1 | Griffith Park | Park | 34.135117 | -118.304965 |
| 2 | Third Street Promenade | Shopping Plaza | 34.015445 | -118.496161 |
| 3 | Venice Beach | Beach | 33.986001 | -118.475833 |
| 4 | The Grove | Shopping Mall | 34.071964 | -118.357365 |
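
To make the transformation concrete, here is a minimal pure-Python sketch of what the cleaning cell does to each item: it pulls out the name, the first category name, and the coordinates. The `sample_items` list is hand-written to mimic the nested shape of the `items` array (the first two entries reuse values from the table, the third is invented to show the empty-category case), and `flatten_venue` is a hypothetical helper, not a function from the notebook.

```python
def flatten_venue(item):
    """Turn one nested item into a flat row with name, category, lat and lng."""
    venue = item['venue']
    categories = venue.get('categories', [])
    return {
        'name': venue['name'],
        # use the first category name, or None when the list is empty
        'category': categories[0]['name'] if categories else None,
        'lat': venue['location']['lat'],
        'lng': venue['location']['lng'],
    }

# hand-written sample that mimics the nested response shape
sample_items = [
    {'venue': {'name': 'Universal Studios Hollywood',
               'categories': [{'name': 'Theme Park'}],
               'location': {'lat': 34.136999, 'lng': -118.355473}}},
    {'venue': {'name': 'Griffith Park',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 34.135117, 'lng': -118.304965}}},
    {'venue': {'name': 'Example Venue',  # invented entry with no category
               'categories': [],
               'location': {'lat': 34.0, 'lng': -118.0}}},
]

rows = [flatten_venue(item) for item in sample_items]
for row in rows:
    print(row)
```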


In [21]:
print('{} venues near {} were returned by Foursquare.'.format(nearby_venues.shape[0],city))

30 venues near Los Angeles, CA were returned by Foursquare.


## Methodology <a name="methodology"></a>

In this project we will focus on preparing an itinerary with the most popular venues. We will limit our analysis to the 30 most popular venues around Los Angeles.

In the first step, we collected the required **data: location and type (category) of popular venues near Los Angeles**. We also **identified the most popular venues** (according to Foursquare's popularity ranking).

The second step in our analysis will be the exploration of '**venues**' across different areas of Los Angeles. We will use **Folium maps** to visualize the popular venues.

In the third and final step, we will focus on creating **clusters of venues that are close to each other** (using **k-means clustering**) and generate a travel itinerary based on the number of travel days (vacation period).
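
As a rough illustration of the third step, the sketch below runs a plain k-means (Lloyd's algorithm) on the five venues shown in the Data section and groups them into two "days". It is a toy under stated assumptions, not the notebook's implementation: it is written in pure Python rather than with a clustering library, it seeds the centroids with the first `k` points so the result is reproducible, and it treats latitude and longitude as planar coordinates, which is only a rough approximation at city scale.

```python
def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = [points[i] for i in range(k)]  # deterministic seed: first k points
    labels = None
    for _ in range(max_iter):
        # assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # no change means the algorithm has converged
            break
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels

# name, latitude, longitude of the five venues listed in the Data section
venues = [
    ('Universal Studios Hollywood', 34.136999, -118.355473),
    ('Griffith Park', 34.135117, -118.304965),
    ('Third Street Promenade', 34.015445, -118.496161),
    ('Venice Beach', 33.986001, -118.475833),
    ('The Grove', 34.071964, -118.357365),
]
points = [(lat, lng) for _, lat, lng in venues]
labels = kmeans(points, k=2)  # k stands in for the number of vacation days

# group venue names by cluster to form a per-day plan
itinerary = {}
for (name, _, _), label in zip(venues, labels):
    itinerary.setdefault(label, []).append(name)
for day, names in sorted(itinerary.items()):
    print('Day {}: {}'.format(day + 1, ', '.join(names)))
```

With the full list of 30 venues, `k` would be set to the number of vacation days, and a library implementation such as scikit-learn's `KMeans` would likely be a more robust choice than this hand-rolled version.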