## Week 4 assignment: Defining problems and relevant data

### Question 1

**Instruction**: Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

In this project, I will implement a system that recommends neighborhoods to move into based on the preferences of a person looking for an apartment. People pick a neighborhood for different reasons (affordability, distance to work or school, etc.), but one factor many of them consider is whether the neighborhood has certain types of stores or places nearby that they like. For instance, one person might want to live in a neighborhood with lots of good restaurants, while another might want a park close to home. I am thus planning to create a system for an online apartment marketplace (like [Zillow](https://www.zillow.com/)) that gathers information about the neighborhoods in a city from Foursquare location data, asks customers what kinds of stores or places they want to live close to, and recommends neighborhoods based on those preferences. Although in practice customers should be able to choose their city, I will focus on Seattle, where I spent a year and where I would LOVE to go back one day!
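
A minimal sketch of the ranking step, assuming each neighborhood has already been summarized as the share of its Foursquare venues that fall in each category. The neighborhood names, categories, and numbers below are hypothetical placeholders, and summing the frequencies of the preferred categories is just one simple scoring choice, not a fixed design.

```python
from typing import Sequence

import pandas as pd

# Hypothetical venue-category frequencies: for each neighborhood, the share
# of its returned venues that belong to each category.
venue_freq = pd.DataFrame(
    {
        "Park": [0.05, 0.20, 0.02],
        "Restaurant": [0.40, 0.15, 0.35],
        "Coffee Shop": [0.10, 0.05, 0.20],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C"],
)


def recommend(freq: pd.DataFrame, preferences: Sequence[str], top_n: int = 3) -> pd.Series:
    """Rank neighborhoods by the summed frequency of the preferred categories."""
    # Ignore preferences that never appear in the venue data.
    cols = [c for c in preferences if c in freq.columns]
    scores = freq[cols].sum(axis=1)
    return scores.sort_values(ascending=False).head(top_n)


print(recommend(venue_freq, ["Park", "Coffee Shop"]))
```

A customer who wants a park and a coffee shop nearby would see Neighborhood B first here, since it has the largest combined share of those two categories.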

### Question 2

**Instruction**: Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

For retrieving information about neighborhoods in Seattle, I will use [Apartment List](https://www.apartmentlist.com/renter-life/average-rent-in-seattle), a website that lists the neighborhoods in a city along with the average rent (and average rent per 750 sq ft) for each one. To tie each neighborhood to a specific latitude and longitude, I will look up each neighborhood in Google Maps and record its coordinates. Although this step is tedious, it should not be too time-consuming for a single city. Besides, I could not find any dataset for Seattle that combines geographical information (such as zip codes or neighborhood names) with average rent.
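
One way to keep the hand-collected coordinates manageable is to store them in a dictionary and join them onto the rent table, so that any neighborhood still missing coordinates shows up immediately. This is a sketch under assumptions: the rent strings and coordinates are copied from the example rows in this write-up, and Downtown is deliberately left out of the lookup to show how a gap is reported.

```python
import pandas as pd

# Average rents as they appear on the rent website (kept as strings here).
rent = pd.DataFrame(
    {
        "Neighborhood": ["Belltown", "Lake Union", "Downtown"],
        "Average_Rent": ["2,245", "2,146", "2,119"],
    }
)

# Coordinates looked up by hand in Google Maps, as (latitude, longitude).
# Seattle sits near 47.6 degrees N, 122.3 degrees W.
coords = {
    "Belltown": (47.614709015632435, -122.34526800898152),
    "Lake Union": (47.64139838880749, -122.3329762894856),
}
coords_df = pd.DataFrame(
    [(name, lat, lon) for name, (lat, lon) in coords.items()],
    columns=["Neighborhood", "Latitude", "Longitude"],
)

# A left merge keeps every neighborhood that has a rent figure; indicator=True
# adds a `_merge` column that marks rows with no matching coordinates.
merged = rent.merge(coords_df, on="Neighborhood", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print("Still need coordinates for:", missing)
```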

For collecting information about businesses and amenities in different neighborhoods, I will use Foursquare location data. It lets us search for specific types of venues around a given location, learn more about individual venues (for example, the tips left by Foursquare users), and explore trending venues around a given location. I will mostly use the last feature to figure out which types of venues are most common in each neighborhood, and make recommendations accordingly.
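
A sketch of how venues around one neighborhood could be pulled and flattened into rows. It assumes Foursquare's legacy v2 `venues/explore` endpoint (which returns recommended venues near a point) and the response layout `response -> groups -> items -> venue` that this endpoint used; Foursquare has since introduced a newer Places API with a different URL and authentication, so treat the endpoint, version date, and parameter values as assumptions to check against the current documentation. The credentials are placeholders, and the `sample` payload is hypothetical so the parser can be tried offline.

```python
import requests

# Legacy v2 endpoint (assumption: may be deprecated or changed).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_venues(lat, lng, client_id, client_secret,
                   radius=500, limit=50, version="20180605"):
    """Request venues around one point and return the decoded JSON."""
    params = {
        "client_id": client_id,        # placeholder credentials
        "client_secret": client_secret,
        "v": version,                  # API version date (example value)
        "ll": f"{lat},{lng}",          # "latitude,longitude"
        "radius": radius,              # search radius in meters
        "limit": limit,
    }
    resp = requests.get(EXPLORE_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()


def parse_venues(payload, neighborhood):
    """Flatten an explore response into (neighborhood, venue, category, lat, lng) rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Some venues have no category; keep them with a None category.
            categories = venue.get("categories") or [{"name": None}]
            rows.append((
                neighborhood,
                venue["name"],
                categories[0]["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
            ))
    return rows


# Hypothetical payload in the assumed response shape, for offline testing.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Park",
               "location": {"lat": 47.61, "lng": -122.34},
               "categories": [{"name": "Park"}]}},
]}]}}
print(parse_venues(sample, "Belltown"))
```

Counting the category column of these rows per neighborhood gives the "most common venue types" the recommendation step needs.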

### Example dataset

Below are the first three rows of what my Seattle neighborhood + rent + coordinates dataset should look like. Rents are monthly, in US dollars.

Neighborhood  | Average_Rent  | Average_Rent_750     | Latitude           | Longitude
------------- | ------------- | -------------------  | ------------------ | -------------------
Belltown      | 2,245         | 2,359                | 47.614709015632435 | -122.34526800898152
Lake Union    | 2,146         | 2,275                | 47.64139838880749  | -122.3329762894856
Downtown      | 2,119         | 2,301                | 47.60619289008479  | -122.33253325887549
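
Because latitude and longitude are easy to swap when copying them by hand, a quick range check can catch mistakes before any Foursquare calls are made. This sketch uses the three example rows; the bounding box around Seattle is a rough, deliberately loose assumption rather than an official boundary.

```python
import pandas as pd

# The three example rows, with coordinates in explicitly named columns.
df = pd.DataFrame(
    {
        "Neighborhood": ["Belltown", "Lake Union", "Downtown"],
        "Latitude": [47.614709015632435, 47.64139838880749, 47.60619289008479],
        "Longitude": [-122.34526800898152, -122.3329762894856, -122.33253325887549],
    }
)

# Rough bounding box around Seattle (assumption; loose on purpose).
LAT_RANGE = (47.4, 47.8)
LON_RANGE = (-122.5, -122.2)

ok = df["Latitude"].between(*LAT_RANGE) & df["Longitude"].between(*LON_RANGE)
# Neighborhoods whose coordinates fall outside the box (empty if all look fine).
print(df.loc[~ok, "Neighborhood"].tolist())
```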

## Week 5: Implementation, analysis and write-up

### Step 1: Preparing a dataset

First, I will generate a dataset of Seattle neighborhoods and their average rent by scraping [this website](https://www.rentcafe.com/average-rent-market-trends/us/wa/seattle/). This is a different website from the one I mentioned in last week's assignment: it covers many more neighborhoods.

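```python
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

# Download the RentCafe page for Seattle and parse it.
# The 'html5lib' parser requires the html5lib package to be installed.
URL = 'https://www.rentcafe.com/average-rent-market-trends/us/wa/seattle/'
response = requests.get(URL, timeout=30)
response.raise_for_status()  # stop early on HTTP errors such as 403 or 404
soup = BeautifulSoup(response.text, 'html5lib')
```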

Next, I extract the neighborhood table from the page.

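```python
# The rent table sits inside <div class="table-neighborhood">.
My_table = soup.find('div', {'class': 'table-neighborhood'})
if My_table is None:
    raise ValueError('Neighborhood table not found; the page layout may have changed.')

tds = My_table.find_all('td')  # all data cells, read row by row
Average_Rent = [td.text.strip() for td in tds]
ths = My_table.find_all('th')  # header cells
Neighborhoods = [th.text.strip() for th in ths[2:]]  # skip the first two header cells
```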

```python
# Each table row contributes three consecutive <td> cells
# (neighborhood, average rent, average rent per 750 sq ft),
# so every third cell belongs to the same column.
# The values are read from each cell's `value` attribute.
Neighborhoods = [td.get('value') for td in tds[0::3]]
Average_Rent = [td.get('value') for td in tds[1::3]]
Average_Rent_750 = [td.get('value') for td in tds[2::3]]

print(Neighborhoods)
```

```
['Belltown', 'Lake Union', 'Downtown', 'Lower Queen Anne', 'Atlantic', 'Wallingford', 'Interbay', 'Capitol Hill', 'Roosevelt', 'Central District']
```
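
A possible next step is to combine the three lists into one DataFrame and turn the rent strings into numbers. This is a sketch, not the author's code: the helper name `to_rent_frame` is hypothetical, and the input format (strings such as `"$2,245"` or `"2,245"`) is an assumption about what the scraped values look like. Taking the lists as arguments avoids overwriting the scraped variables when trying it out.

```python
import pandas as pd


def to_rent_frame(neighborhoods, rents, rents_750):
    """Combine the three scraped lists into one DataFrame with numeric rents."""
    # pd.DataFrame raises ValueError if the lists differ in length, which is
    # a cheap check that the every-third-cell split lined up.
    df = pd.DataFrame(
        {
            "Neighborhood": neighborhoods,
            "Average_Rent": rents,
            "Average_Rent_750": rents_750,
        }
    )
    for col in ["Average_Rent", "Average_Rent_750"]:
        # Drop "$" and thousands separators, then parse; values that are
        # still not numeric become NaN instead of raising an error.
        cleaned = df[col].astype(str).str.replace(r"[$,]", "", regex=True)
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    return df


# Hypothetical sample input using the example rents from this write-up.
sample = to_rent_frame(["Belltown", "Lake Union"],
                       ["$2,245", "$2,146"],
                       ["$2,359", "2,275"])
print(sample)
```

In the notebook this would be called as `to_rent_frame(Neighborhoods, Average_Rent, Average_Rent_750)`.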
