# Capstone Project: Relocating to Chittenden County, Vermont

### Author M.Yu

### 1. Discussion/Background

**CommunityDataHelp** is a volunteer-based non-profit group to promote the use of data science to help 
community members in variety of issues. Lately, a client contacted us. The client's family 
wanted to relocate to Chittenden County, Vermont. 

After a long discussion with our client, we gathered the client's requirements as follows:
1. Need to buy a house, but do not want to pay real estate agent fees.
2. Price range \$350,000 to \$450,000.
3. Need to have 3 or 4 bedrooms.
4. Need to have minimum 2 baths.
5. Need to have a reputable supermarket or grocery store within an mile from the candidate location.
6. Need to have a reputable pizza restraunt within an mile from the candidate location.

Now we have client's requirement list. Let's address how to resolve them.

***First***, we searched online websites, and found that there exists a **For-Sale-Buy-Owner** 
website https://www.picketfencepreview.com/, where home sellers advertise their home 
with property location, address, price, and features including the number of bedrooms and baths. 
Potential buyers can contact sellers directly without a real estate agent being involved. 
This way, both buys and sellers do not need to pay any real estate agent fees. 

***Second***, we found that https://www.picketfencepreview.com has an extensive sale-list coverage
for Vermont counties including **Chitteden**: 
https://www.picketfencepreview.com/buy-a-home/homes-for-sale-by-county/state/VT/county/Chittenden, 
where we can use BeautifulSoup to scrape address, price, address, baths information. 
Then we will perform query to obtain the list of home-for-sale which meet client's requirement
for price, bedrooms, baths requirements.
              
***Third***, we shall obtain geographical coordinate location data of each home-for-sale in 
Chittenden county. Then we will obtain Foursquare location data for nearby venues and 
their corresponding ratings and tips to determine the venue quality.

***Finaly***, based on all above, we shall pick a neighborhood that meets our client's requirements.


**Important Note**: 
For privacy concerns, contact information is not scraped for this project.

### 2. Data Section

In this section, we shall gather data as follows:
##### 1. Home-for-sale data (address, price, bedrooms, baths) in Chittenden Count, Vermont from website
https://www.picketfencepreview.com/buy-a-home/homes-for-sale-by-county/state/VT/county/Chittenden
##### 2. Obtain geo-location data (latitude and longitude) for each house for sale.
##### 3. Using Foursquare API (https://developer.foursquare.com/) to obtain Foursquare data of nearby venues for name, category, location, and ratings etc.

First let's import all neccessary libraries.

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup as BS
import requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
from geopy.geocoders import Nominatim
import folium
from folium import IFrame
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
print('libraries imported')

libraries imported


##### 1, get Home-for-sale data (address, price, bedrooms, baths) in Chittenden Count, Vermont 
from website https://www.picketfencepreview.com/buy-a-home/homes-for-sale-by-county/state/VT/county/Chittenden

Define url, request, and create a soup object, echo request status

In [2]:
url = 'https://www.picketfencepreview.com/buy-a-home/homes-for-sale-by-county/state/vt/county/chittenden'
rq = requests.get(url)
soup = BS (rq.text, 'html5lib')
rq.status_code

200

Obtain home-sale href link for each home

In [3]:
# extract Chittenden County home list from soup object
pd.options.display.max_colwidth = 100
l =[]
for div1 in soup.find_all('div', class_='content'):
     for div2 in div1.find_all('div', class_='listing-list'):
            for link in div2.find_all('a'):
                if link.has_attr('href'):
                    l.append(link['href'])
                  
l

['/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden',
 '/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden',
 '/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden',
 '/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden',
 '/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden',
 '/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden',
 '/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden',
 '/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden',
 '/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-05452-Chittenden',
 '/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-05452-Chittenden',
 '/buy-a-home/view-property/id/7-Wilkinson-Dr-Essex-Junction-VT-05452-Chittenden',
 '/buy-a-home/view-property/id/7-Wilkinson-Dr-Essex-Junction-VT-05452-Chittenden',
 '/buy-a-h

Append above href link to home root page https://www.picketfencepreview.com

In [4]:
homepg ='https://www.picketfencepreview.com'
for i in range(len(l)):
    l[i]=homepg+l[i]
l

['https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden',
 'https://www.picketfencepreview.com/buy-a-home/view-property/

Store above link in a dataframe

In [5]:
column_names = ['link']
df = pd.DataFrame(l, columns=column_names)
df

Unnamed: 0,link
0,https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-...
1,https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-...
2,https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05...
3,https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05...
4,https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchest...
5,https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchest...
6,https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-054...
7,https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-054...
8,https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-0545...
9,https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-0545...


Make list consists of unique list of home sale web link

In [6]:
lsale = pd.unique(df['link'])
print(lsale)

['https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-05452-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Wilkinson-Dr-Essex-Junction-VT-05452-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-G-Raceway-Rd-Jericho-VT-05465-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/192-Nashville-Road-Jericho-VT-05489-Chittenden'
 'https://www.picketfencepreview.com/buy-a-home/view-property/id/140A-VT-Route-15-Jer

Examine the home sale web link and drop the last one which has no address

In [7]:
lsale=lsale[:-1]
lsale

array(['https://www.picketfencepreview.com/buy-a-home/view-property/id/51-Cumberland-Road-Burlington-VT-05408-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/74-Overlake-Park-Burlington-VT-05401-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/558-West-Lakeshore-Drive-Colchester-VT-05446-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/208-Whitecap-Rd-Colchester-VT-05446-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Saxonhollow-Drive-Essex-VT-05452-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-Wilkinson-Dr-Essex-Junction-VT-05452-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/7-G-Raceway-Rd-Jericho-VT-05465-Chittenden',
       'https://www.picketfencepreview.com/buy-a-home/view-property/id/192-Nashville-Road-Jericho-VT-05489-Chittenden',
       'https://www.picketfen

Extract address, price, bedrooms, baths from above links and store in dataframe ***house***

In [8]:
col_name = ['Neighborhood', 'Price ($)', 'Bedrooms', 'Baths']
house = pd.DataFrame(columns = col_name)

for j in range(len(lsale)): 
    rq5 = requests.get(lsale[j])
    soup5 = BS (rq5.text, 'html5lib')
    
    for div1 in soup5.find_all('div', class_='content-wide'):
        
        # get sale price
        for i in div1.find_all('div', class_='content-row gray'):
            for h in i.find('h1'):
                h = h.split('$')[1].strip()
                h = h.split(',')
                h = list(map(int, h))
                p=0
                for i in range(len(h)):
                    p=p+h[i]*np.power(1000,(len(h)-i-1))
                    
        # get house-for-sale address, and the number of bedrooms and baths          
        for i in div1.find_all('div', class_='content-row'):
            
            # get house-for-sale address
            for n in i.find_all('div', class_='address'):
                n=n.text.strip()
                n= n.split('\n')[1].strip()
                
            # get the number of bedrooms and baths
            for m in i.find_all('ul', class_='features-list'):
                l = [o.strip() for o in m.text.split('\n')]
                bd = l[l.index('Bedrooms')+1]
                br = l[l.index('Baths')+1]
                
    house.at[j, 'Neighborhood'] = n            
    house.at[j, 'Price ($)'] = p
    house.at[j, 'Bedrooms'] = bd
    house.at[j, 'Baths'] = br


house

Unnamed: 0,Neighborhood,Price ($),Bedrooms,Baths
0,"51 Cumberland Road Burlington, VT",319900,3,2.5
1,"74 Overlake Park Burlington, VT",1400000,5,4.0
2,"558 West Lakeshore Drive Colchester, VT",408000,2,1.0
3,"208 Whitecap Rd Colchester, VT",469800,4,2.0
4,"7 Saxonhollow Drive Essex, VT",250000,3,2.5
5,"7 Wilkinson Dr Essex Junction, VT",410000,4,2.5
6,"7 G Raceway Rd Jericho, VT",197900,2,1.5
7,"192 Nashville Road Jericho, VT",295000,3,1.5
8,"140A VT Route 15 Jericho, VT",433000,5,3.5
9,"65 Bear Trap Road Milton, VT",449900,3,2.5


Replace non-standard address 'VT Route 15' with standard address 'VT-15'

In [9]:
house = house.copy()     # make a copy to suppress warning message before string replacement operation
house['Neighborhood'] = house['Neighborhood'].str.replace('VT Route 15', 'VT-15')
house

Unnamed: 0,Neighborhood,Price ($),Bedrooms,Baths
0,"51 Cumberland Road Burlington, VT",319900,3,2.5
1,"74 Overlake Park Burlington, VT",1400000,5,4.0
2,"558 West Lakeshore Drive Colchester, VT",408000,2,1.0
3,"208 Whitecap Rd Colchester, VT",469800,4,2.0
4,"7 Saxonhollow Drive Essex, VT",250000,3,2.5
5,"7 Wilkinson Dr Essex Junction, VT",410000,4,2.5
6,"7 G Raceway Rd Jericho, VT",197900,2,1.5
7,"192 Nashville Road Jericho, VT",295000,3,1.5
8,"140A VT-15 Jericho, VT",433000,5,3.5
9,"65 Bear Trap Road Milton, VT",449900,3,2.5


##### 2. Obtain geo-location data (latitude and longitude) for each house for sale.

Append latitude and longitude of each house to the dataframe

In [10]:
m = 0

for i in house['Neighborhood']:
    address = i
    geolocator = Nominatim(user_agent="vt_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
      
    # print('The geograpical coordinate of', address, 'are {}, {}.'.format(latitude, longitude))
    house.at[m, 'Latitude'] = latitude
    house.at[m, 'Longitude'] = longitude
    m = m+1
    
house

Unnamed: 0,Neighborhood,Price ($),Bedrooms,Baths,Latitude,Longitude
0,"51 Cumberland Road Burlington, VT",319900,3,2.5,44.506136,-73.264664
1,"74 Overlake Park Burlington, VT",1400000,5,4.0,44.467523,-73.201929
2,"558 West Lakeshore Drive Colchester, VT",408000,2,1.0,44.549261,-73.224043
3,"208 Whitecap Rd Colchester, VT",469800,4,2.0,44.554249,-73.296052
4,"7 Saxonhollow Drive Essex, VT",250000,3,2.5,44.489599,-73.053225
5,"7 Wilkinson Dr Essex Junction, VT",410000,4,2.5,44.492927,-73.129137
6,"7 G Raceway Rd Jericho, VT",197900,2,1.5,44.511314,-72.968638
7,"192 Nashville Road Jericho, VT",295000,3,1.5,44.451174,-72.937363
8,"140A VT-15 Jericho, VT",433000,5,3.5,44.509031,-72.982474
9,"65 Bear Trap Road Milton, VT",449900,3,2.5,44.631695,-73.174255


Get the geograpical coordinates of **Chittenden County, Vermont**

In [11]:
address = 'Chittenden County, Vermont'
geolocator = Nominatim(user_agent="vt_explorer")
location = geolocator.geocode(address)
cclat = location.latitude
cclng = location.longitude
print('The geograpical coordinate of', address, 'are {}, {}.'.format(cclat, cclng))

The geograpical coordinate of Chittenden County, Vermont are 44.4531756, -73.0673673.


Create a house-for-sale map using latitude and longitude coordinates from house dataframe. Each blue dot below represents the location of each house for-sale-by-owner

In [12]:
# create house-for-sale map using latitude and longitude values in Chittenden County, Vermont
map_house = folium.Map(location=[cclat, cclng], zoom_start=10)

# add markers to map
for lat, lng, address in zip(house['Latitude'], house['Longitude'], house['Neighborhood']):
    label = '{}'.format(address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_house)  
    
map_house

##### 3. Using Foursquare API (https://developer.foursquare.com/) to obtain Foursquare data of nearby venues for name, category, location, and ratings etc

First set Foursquare credentials

In [None]:
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [13]:
CLIENT_ID = 'FUENPRKFXZY4NMJGDI5PX2XEBSLM2KDOEBYVSYDVLIG1HMYM' # your Foursquare ID
CLIENT_SECRET = 'EXZQBX0NXQ3TJ0RVHZ1C5KI4ZBDW2TGAMK4MXEKBCX5K05GF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Then define a function retrieve Fourquare json file into dataframe.
Set radius within a mile range (1600 meters)

In [14]:
def getVenues(names, latitudes, longitudes, radius=1600, LIMIT = 50):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue_Latitude', 
                  'Venue_Longitude', 
                  'Venue_Category',
                  'Venue_ID']
    
    return(nearby_venues)

Get venues for each neighborhood. Take a look at first five venues

In [15]:
venues = getVenues(names=house['Neighborhood'],
                    latitudes=house['Latitude'],
                    longitudes=house['Longitude'])
                                
venues.head()

51 Cumberland Road Burlington, VT
74 Overlake Park Burlington, VT
558 West Lakeshore Drive Colchester, VT
208 Whitecap Rd Colchester, VT
7 Saxonhollow Drive Essex, VT
7 Wilkinson Dr Essex Junction, VT
7 G Raceway Rd Jericho, VT
192 Nashville Road Jericho, VT
140A VT-15 Jericho, VT
65 Bear Trap Road Milton, VT
420 Thompson Rd Shelburne, VT
113 Covington Lane Shelburne, VT
444 Poker Hill Road Underhill, VT
339 Huntley Rd Westford, VT


Unnamed: 0,Neighborhood,Latitude,Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Venue_ID
0,"51 Cumberland Road Burlington, VT",44.506136,-73.264664,ArtsRiot Food Truck Rally,44.50844,-73.265353,Food Truck,51bb8f01498ee2e3f0a43dff
1,"51 Cumberland Road Burlington, VT",44.506136,-73.264664,Starr Farm Dog Park,44.512719,-73.267168,Dog Run,4beec1212c082d7f99cc3042
2,"51 Cumberland Road Burlington, VT",44.506136,-73.264664,Leddy Park,44.503319,-73.251482,Beach,4c17bbbe834e2d7fbdce2780
3,"51 Cumberland Road Burlington, VT",44.506136,-73.264664,Bessery's Quality Market,44.512613,-73.251729,Butcher,51880051498ef49c9967ba36
4,"51 Cumberland Road Burlington, VT",44.506136,-73.264664,Snap Fitness Burlington,44.507809,-73.247012,Gym,4c77e02a947ca1cde4b34837
