## Week 4 assignment: Defining problems and relevant data

### Question 1

**Instruction**: Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

In this project, I will implement a system that recommends neighborhoods to move to based on the preferences of a person looking for an apartment. People pick a neighborhood for different reasons (affordability, distance to work or school, etc.), but one factor that many of them consider is whether the neighborhood has the types of stores or places they like nearby. For instance, one person might want to live in a neighborhood with lots of good restaurants, while another might want a park close to home. I am thus planning to create a system that could be used on an online apartment marketplace (like [Zillow](https://www.zillow.com/)). It gathers information about different neighborhoods in a city via Foursquare location data, asks the customer what kinds of stores or places they want to live close to, and recommends neighborhoods based on those preferences. Although realistically customers should be able to choose which city they live in, I will focus on the city of Seattle, where I spent a year and would LOVE to go back one day!

### Question 2

**Instruction**: Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

To retrieve information about neighborhoods in Seattle, I will use [Apartment List](https://www.apartmentlist.com/renter-life/average-rent-in-seattle), a website that lists the neighborhoods in a city along with the average rent (and average rent per 750 sqft) for each neighborhood. To tie each neighborhood to a specific latitude and longitude, I will look up each neighborhood on Google Maps and write down its coordinates. Although this is a tedious step, I believe it will not be too time-consuming for one city. Besides, I could not find any dataset that provides geographical information (such as zip codes) and neighborhood names in Seattle together with average rent.

To collect information about businesses and amenities in different neighborhoods, I will use the Foursquare location data. The data enables us to search for a specific type of venue around a given location, learn more about specific venues (such as the tips provided by Foursquare users), and explore trending venues around a given location. I will mostly use the last feature to figure out which types of venues are most common in each neighborhood, and make recommendations accordingly.
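
To make the idea concrete, here is a minimal sketch of how venue categories per neighborhood could be turned into recommendations. The function name `rank_neighborhoods`, the scoring rule (the share of a neighborhood's venues that fall in the preferred categories), and the toy venue lists are all assumptions for illustration, not the final method.

```python
from collections import Counter


def rank_neighborhoods(venues_by_neighborhood, preferred_categories):
    """Rank neighborhoods by the share of venues in preferred categories.

    Hypothetical helper: the scoring rule is an assumption for illustration.
    """
    preferred = set(preferred_categories)
    scores = {}
    for neighborhood, categories in venues_by_neighborhood.items():
        counts = Counter(categories)
        total = sum(counts.values())
        matching = sum(counts[c] for c in preferred)
        # Avoid dividing by zero for neighborhoods with no venues
        scores[neighborhood] = matching / total if total else 0.0
    # Highest share first; ties are broken alphabetically for stable output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, made up for illustration only
toy_venues = {
    "Neighborhood A": ["Park", "Café", "Park", "Gym"],
    "Neighborhood B": ["Bar", "Restaurant", "Café", "Restaurant"],
}
ranking = rank_neighborhoods(toy_venues, ["Park"])
print(ranking)
```

A share is used rather than a raw count so that neighborhoods with many venues overall do not automatically win; other scoring rules would be equally reasonable.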

### Example of datasets

Below are the first three rows of what my Seattle dataset (neighborhood, rent, latitude and longitude) should look like. Rents are in US dollars.

Neighborhood  | Average_Rent  | Average_Rent_750     | Latitude           | Longitude
------------- | ------------- | -------------------  | ------------------ | -------------
Belltown      | 2,245         | 2,359                | 47.614709015632435 | -122.34526800898152
Lake Union    | 2,146         | 2,275                | 47.64139838880749  | -122.3329762894856
Downtown      | 2,119         | 2,301                | 47.60619289008479  | -122.33253325887549

## Week 5: Implementation, analysis and write-up

### Step 1: Preparing a Seattle neighborhood dataset with longitudes and latitudes

First of all, I will generate a dataset of Seattle neighborhoods and their average rent by scraping [this website](https://www.zumper.com/rent-research/seattle-wa). It is different from the website I said I would use in last week's assignment: this one covers many more neighborhoods.

Let's start with a cell importing the libraries we will need.

In [153]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

In [145]:
source = requests.get('https://www.zumper.com/rent-research/seattle-wa').text
soup = BeautifulSoup(source, 'html5lib')

In [131]:
My_table = soup.find('table', {'class': 'NeighborhoodRent_table__2AiTW'})

In [111]:
tds = My_table.find_all('td')

# Table cells alternate: neighborhood name, then its average rent
Neighborhoods = [td.text.strip() for td in tds[0::2]]
Average_Rent = [td.text.strip() for td in tds[1::2]]

In [150]:
dat = {'Neighborhood':Neighborhoods, 'Average_Rent':Average_Rent}
df_Seattle = pd.DataFrame(dat)
df_Seattle.head()

Unnamed: 0,Neighborhood,Average_Rent
0,Broadway,"$1,495"
1,Belltown,"$1,935"
2,University District,"$1,253"
3,Lower Queen Anne,"$1,669"
4,First Hill,"$1,581"
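
Note that the scraped rents are strings such as `"$1,495"`, so they cannot be compared or averaged directly. A minimal sketch of a conversion step follows; the helper name `parse_rent` is an assumption and is not part of the notebook above.

```python
def parse_rent(text):
    """Convert a rent string such as '$1,495' to an integer number of dollars."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return int(cleaned)


# The first three rent values from the output above
rents = [parse_rent(r) for r in ["$1,495", "$1,935", "$1,253"]]
print(rents)
```

With pandas, the same helper could be applied to the whole column, for example `df_Seattle['Average_Rent'].map(parse_rent)`.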


Now I have to add the latitude and longitude of each neighborhood. Though this is not ideal, I will do so by copying the coordinates from Google Maps and adding them to the `df_Seattle` dataset.
After that, let's import the updated dataset (`df_Seattle_lonlat.csv`) and make sure everything looks good.

In [164]:
df_Seattle_lonlat = pd.read_csv('df_Seattle_lonlat.csv')
df_Seattle_lonlat.head()

Unnamed: 0.1,Unnamed: 0,Neighborhood,Average_Rent,Latitude,Longitude
0,0,Broadway,"$1,495",47.625314,-122.324739
1,1,Belltown,"$1,935",47.614646,-122.344796
2,2,University District,"$1,253",47.663065,-122.314097
3,3,Lower Queen Anne,"$1,669",47.624803,-122.356765
4,4,First Hill,"$1,581",47.609792,-122.323554
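
As a possible alternative to copying coordinates by hand, the lookup could be scripted with a geocoder. The sketch below is an assumption-laden illustration: `geocode_neighborhoods` is a hypothetical helper, and a geocoder may not recognize every neighborhood name or may place it differently than Google Maps does, so the results would still need a manual check.

```python
def geocode_neighborhoods(names, geocode, city="Seattle, WA"):
    """Look up (latitude, longitude) for each neighborhood name.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found
    (geopy's `Nominatim(...).geocode` behaves this way).
    """
    rows = []
    for name in names:
        location = geocode("{}, {}".format(name, city))
        if location is None:
            rows.append((name, None, None))  # leave a gap for manual lookup
        else:
            rows.append((name, location.latitude, location.longitude))
    return rows


# Possible usage with geopy (needs network access, so it is left commented out).
# Nominatim's usage policy asks for at most one request per second.
# from geopy.geocoders import Nominatim
# from geopy.extra.rate_limiter import RateLimiter
# geolocator = Nominatim(user_agent="wa_explorer")
# geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# rows = geocode_neighborhoods(df_Seattle['Neighborhood'], geocode)
```

Passing the geocoder in as an argument keeps the helper easy to try out with a stand-in function before making real requests.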


Let's plot the neighborhoods from the table above on a map to see where they are in Seattle.

In [165]:
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="wa_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [166]:
map_Seattle = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighborhood in zip(df_Seattle_lonlat['Latitude'], df_Seattle_lonlat['Longitude'], df_Seattle_lonlat['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Seattle)

map_Seattle

### Step 2: Combining the dataset with Foursquare venue info

I'll first define Foursquare credentials and version.

In [168]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # never publish real credentials
VERSION = '20201212'
LIMIT = 100

Let's create a function that fetches up to 100 venues within a radius of 500 meters of each neighborhood in Seattle.

In [169]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
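
One fragile spot in the function is `v['venue']['categories'][0]['name']`, which would raise an `IndexError` if a venue came back with an empty `categories` list. I have not checked whether the API guarantees at least one category, so the following is only a defensive sketch; `venue_row` is a hypothetical helper, and the example item only mimics the response fields used above.

```python
def venue_row(name, lat, lng, item):
    """Build one output row from a single item of an explore response.

    Falls back to None when a venue has no category, instead of raising
    IndexError on `categories[0]`.
    """
    venue = item['venue']
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else None
    return (name, lat, lng, venue['name'],
            venue['location']['lat'], venue['location']['lng'], category)


# Example item with an empty category list (constructed for illustration)
item = {'venue': {'name': 'Top Pot Doughnuts',
                  'location': {'lat': 47.62463, 'lng': -122.32547},
                  'categories': []}}
row = venue_row('Broadway', 47.625314, -122.324739, item)
print(row)
```

Inside the function, the list comprehension could then become `[venue_row(name, lat, lng, v) for v in results]`.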

Run `getNearbyVenues` on each neighborhood of Seattle, and create a new dataframe.

In [171]:
Seattle_venues = getNearbyVenues(df_Seattle_lonlat['Neighborhood'], df_Seattle_lonlat['Latitude'], df_Seattle_lonlat['Longitude'])
Seattle_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Broadway,47.625314,-122.324739,Top Pot Doughnuts,47.62463,-122.32547,Donut Shop
1,Broadway,47.625314,-122.324739,Harry's Fine Foods,47.624402,-122.326771,Restaurant
2,Broadway,47.625314,-122.324739,Sol Liquor Lounge,47.624605,-122.325434,Cocktail Bar
3,Broadway,47.625314,-122.324739,Barjot,47.625701,-122.326748,Café
4,Broadway,47.625314,-122.324739,Single Shot,47.624688,-122.325431,American Restaurant
