This notebook documents and codes a solution to a business problem, defined below, that is addressed with location data.

# IBM Data Science Professional Certification Capstone 


# by Jeanna Stewart

## Introduction/Business Problem

As an experienced real estate investor, I know that many factors go into choosing a property to purchase. Location is a primary consideration, and what makes a location attractive differs from investor to investor. For example, in one area of Michigan many investors in Class C properties look for conveniences that residents can walk to, such as an ATM, food, and lottery. In Michigan these businesses are called 'party stores', otherwise known as convenience stores. This project attempts to narrow down specific locations near party stores in which to explore properties and prepare a marketing campaign.

## Data

The following data sources will be used to locate target real estate investment areas close to party stores in Ingham County, Michigan.

The Ingham County, Michigan ZIP code listing will be used to find all cities and ZIP codes in the target area:
https://www.zip-codes.com/county/mi-ingham.asp


The Foursquare API will be used to find location data for party stores in the Ingham County/City of Lansing area.




## Methodology



This analysis set out to use public data to identify the cities in Ingham County, Michigan by web scraping with Python and the BeautifulSoup library, and to obtain each city's geographic coordinates with the geopy Nominatim geocoder. The coordinates are then used to query the Foursquare API, which holds user-generated data about events and venues.

The extracted data was visualized with Python and the Folium mapping library. The first visualization shows the 7 main cities in Ingham County. Although this project targets only the City of Lansing, code was included to map the other 6 cities for future use and analysis. This map is the base layer for helping a real estate investor in Ingham County target investment locations, based on the hypothesis that prospective residents of rental housing prefer to live within walking distance of conveniences such as fast food restaurants, party stores, and grocery stores.

After the geographic data was collected, it was used to query Foursquare with several search terms to try to find these party stores and similar businesses. Initial searches with terms such as 'convenience' and 'party' returned a response from the API saying there were too few results and that the criteria should be broadened. I ended up using the search term 'fast' to obtain a result set large enough to analyze further. This was a major setback to the intended analysis. The limitation of Foursquare is that it does not appear to be used heavily in Michigan, particularly in a smaller market such as Ingham County, whose population is much smaller than that of larger metros such as Detroit and Grand Rapids. More robust APIs for finding target businesses in smaller metros should be explored in future work.

For example, Ingham County has a chain of stores called Quality Dairy. These stores are found only in this part of Michigan and are what we referred to above as 'party stores'. From experience, I know that a real estate investment near a Quality Dairy is ideal because these stores are usually in safe areas, are clean, and offer ATMs, grocery items, lottery, liquor, snacks, and other items that make them an ideal place for residents to walk to.

Despite having to shift the analysis from 'party stores' to 'fast food', we made some progress. The fast food result set allowed us to perform k-means clustering to see which cities had a greater density of these businesses. Since we are targeting the City of Lansing, we again used Folium to map the locations of fast food establishments in the area so that an investor can zero in on locations to seek out investments.




# The CODE

### Install Libraries and Packages Needed for This Project

In [None]:
## First, install the libraries and packages needed for this project
!pip install beautifulsoup4 # install BeautifulSoup (HTML parsing)
!pip install lxml # install lxml (HTML/XML parser used by pandas.read_html)
!pip install html5lib # install html5lib (parser passed to BeautifulSoup below)
!conda install -c conda-forge geopy --yes # install geopy
!conda install -c conda-forge folium=0.5.0 --yes # install folium

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0           conda-forge
    geopy:         1.22.0-pyh9f0ad1d_0 conda-forge


Downloading and Extracting Packages
geopy-1.22.0         | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

In [4]:
import json # library to handle JSON files
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
import requests # library to handle requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
try:
    from pandas import json_normalize # pandas >= 1.0: transform JSON into a DataFrame
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
# k-means for the clustering stage
from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist
import folium # map rendering library

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('Libraries imported.')

Libraries imported.


### Gather Public Location Data on Ingham County Cities

In [36]:
# Get Ingham County city data
# Web-scrape the ZIP code table from zip-codes.com with BeautifulSoup
req = requests.get("https://www.zip-codes.com/county/mi-ingham.asp")
soup = BeautifulSoup(req.content, 'html5lib')

# The ZIP code listing sits in <table class="statTable"> elements
table = soup.find_all('table', class_='statTable')

# pd.read_html parses the table HTML and returns a list of DataFrames;
# the first one holds the ZIP code listing
df = pd.read_html(str(table))
ingham_zip = df[0]

ingham_zip


Unnamed: 0,0,1,2,3,4,5
0,ZIP Code,Classification,City,Population,Timezone,Area Code(s)
1,ZIP Code 48805,P.O. Box,Okemos,0,Eastern,517
2,ZIP Code 48819,General,Dansville,2701,Eastern,517
3,ZIP Code 48823,General,East Lansing,51302,Eastern,517
4,ZIP Code 48824,General,East Lansing,0,Eastern,517
5,ZIP Code 48825,General,East Lansing,12596,Eastern,517
6,ZIP Code 48826,P.O. Box,East Lansing,0,Eastern,517
7,ZIP Code 48840,General,Haslett,12501,Eastern,517
8,ZIP Code 48842,General,Holt,20432,Eastern,517
9,ZIP Code 48854,General,Mason,18598,Eastern,517


In [35]:


# Check the dimensions of the scraped table (rows, columns)
ingham_zip.shape



(35, 6)

In [37]:
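# Remove rows with a population of 0 (P.O. Box and other unpopulated ZIP codes).
# Column 3 holds Population as text, so it is compared against the string '0'.
# Row 0 (the scraped header row) is kept by this filter.
ingham_zip_filtered = ingham_zip[ingham_zip[3] != '0']
ingham_zip_filtered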




Unnamed: 0,0,1,2,3,4,5
0,ZIP Code,Classification,City,Population,Timezone,Area Code(s)
2,ZIP Code 48819,General,Dansville,2701,Eastern,517
3,ZIP Code 48823,General,East Lansing,51302,Eastern,517
5,ZIP Code 48825,General,East Lansing,12596,Eastern,517
7,ZIP Code 48840,General,Haslett,12501,Eastern,517
8,ZIP Code 48842,General,Holt,20432,Eastern,517
9,ZIP Code 48854,General,Mason,18598,Eastern,517
10,ZIP Code 48864,General,Okemos,20148,Eastern,517
11,ZIP Code 48892,General,Webberville,4382,Eastern,517
12,ZIP Code 48895,General,Williamston,11189,Eastern,517


In [38]:
ingham_zip_filtered.shape

(18, 6)
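The filter above compares text against `'0'` and keeps row 0, the scraped header row, which is why the filtered output still starts with `ZIP Code,Classification,City,...`. A minimal sketch of a more explicit cleanup follows, assuming the raw table has the same six columns shown above; the function name `clean_zip_table` and the column names are illustrative choices, not part of the original notebook. It drops the header row, strips the `ZIP Code` prefix, and converts Population to an integer before filtering.

```python
import pandas as pd

# Column names chosen for this sketch (hypothetical, not taken from the scraped page)
ZIP_COLUMNS = ['Zip', 'Classification', 'City', 'Population', 'Timezone', 'AreaCode']


def clean_zip_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop the scraped header row, tidy the ZIP column, and keep populated ZIP codes."""
    table = raw.iloc[1:].copy()  # row 0 repeats the header text
    table.columns = ZIP_COLUMNS
    # "ZIP Code 48819" -> "48819"
    table['Zip'] = table['Zip'].astype(str).str.replace('ZIP Code', '', regex=False).str.strip()
    # Population arrives as text; non-numeric values become 0
    table['Population'] = pd.to_numeric(table['Population'], errors='coerce').fillna(0).astype(int)
    return table[table['Population'] > 0].reset_index(drop=True)


# Example with a few rows copied from the scraped output above
raw_sample = pd.DataFrame([
    ['ZIP Code', 'Classification', 'City', 'Population', 'Timezone', 'Area Code(s)'],
    ['ZIP Code 48805', 'P.O. Box', 'Okemos', '0', 'Eastern', '517'],
    ['ZIP Code 48819', 'General', 'Dansville', '2701', 'Eastern', '517'],
    ['ZIP Code 48823', 'General', 'East Lansing', '51302', 'Eastern', '517'],
])
print(clean_zip_table(raw_sample))
```

Applied to the notebook's table, this would be called as `clean_zip_table(ingham_zip)`; filtering on a numeric column avoids relying on how the population text happens to be formatted.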

### Add Latitude and Longitude to Area Data

In [57]:
# Geocode the target city (Lansing) and, for future use, the other Ingham County cities
import time
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="jeannamstew@gmail.com")

cities = ['Lansing', 'Dansville', 'East Lansing', 'Haslett', 'Mason',
          'Okemos', 'Webberville', 'Williamston']
for city in cities:
    location = geolocator.geocode(city + ", MI")
    print(location.address)
    print((location.latitude, location.longitude))
    time.sleep(1) # Nominatim's usage policy allows at most one request per second

# Note: "Mason, MI" resolves to Mason County (see the output below), not the
# city of Mason in Ingham County, so Mason is not included in the table below.

# Create a data frame of the cities for future use
df_ingham_cities3 = pd.DataFrame({
    'City': ['Lansing', 'Dansville', 'East Lansing', 'Haslett', 'Okemos', 'Webberville', 'Williamston'],
    'Latitude': [42.7337712, 42.555869, 42.7355416, 42.7467159, 42.7220755, 42.6668257, 42.689151],
    'Longitude': [-84.5553805, -84.3033013, -84.4852469, -84.4011807, -84.4275232, -84.174432, -84.2832798],
})
print(df_ingham_cities3)

Lansing, Ingham County, Michigan, United States of America
(42.7337712, -84.5553805)
Dansville, Ingham Township, Ingham County, Michigan, United States of America
(42.555869, -84.3033013)
East Lansing, Ingham County, Michigan, United States of America
(42.7355416, -84.4852469)
Haslett, Meridian Charter Township, Ingham County, Michigan, 48840, United States of America
(42.7467159, -84.4011807)
Mason County, Michigan, United States of America
(43.9778831, -86.246291)
Okemos, Meridian Charter Township, Ingham County, Michigan, 48864-1146, United States of America
(42.7220755, -84.4275232)
Webberville, Leroy Township, Ingham County, Michigan, United States of America
(42.6668257, -84.174432)
Williamston, Ingham County, Michigan, United States of America
(42.689151, -84.2832798)
           City    Latitude    Longitude
0       Lansing  42.7337712  -84.5553805
1     Dansville   42.555869  -84.3033013
2  East Lansing  42.7355416  -84.4852469
3       Haslett  42.7467159  -84.4011807
4        
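The geocoding output shows that `Mason, MI` resolved to Mason County rather than the city of Mason in Ingham County, a mismatch that is easy to miss when coordinates are copied by hand into a table. The sketch below flags any result whose address does not mention the expected county. The helper name `flag_outside_county` is hypothetical, and matching on the address string is a simple heuristic rather than a true geographic test.

```python
def flag_outside_county(geocoded, county='Ingham County'):
    """Return the cities whose geocoded address does not mention the expected county.

    geocoded: iterable of (city, address) pairs, e.g. collected as
    (city, location.address) inside the geocoding loop.
    """
    return [city for city, address in geocoded if county not in address]


# Addresses copied from the geocoding output above
results = [
    ('Lansing', 'Lansing, Ingham County, Michigan, United States of America'),
    ('Dansville', 'Dansville, Ingham Township, Ingham County, Michigan, United States of America'),
    ('Mason', 'Mason County, Michigan, United States of America'),
    ('Williamston', 'Williamston, Ingham County, Michigan, United States of America'),
]
print(flag_outside_county(results))  # -> ['Mason']
```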

## First Visualization of the City Locations in Ingham County

In [60]:
address = 'Lansing, MI'

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="jeannamstew@gmail.com")
location_lansing = geolocator.geocode(address)
latitude = location_lansing.latitude
longitude = location_lansing.longitude
print('The geographical coordinates of Lansing, MI are {}, {}.'.format(latitude, longitude))

# create a map of Ingham County centred on Lansing
map_ingham = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker with a pop-up label for each city
for lat, lng, city in zip(df_ingham_cities3['Latitude'], df_ingham_cities3['Longitude'], df_ingham_cities3['City']):
    label = folium.Popup(city, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ingham)

map_ingham

The geographical coordinates of Lansing, MI are 42.7337712, -84.5553805.


## Foursquare API

In [39]:
import os

# Read the Foursquare credentials from environment variables instead of
# hard-coding (and printing) them in a shared notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [62]:
df_ingham_cities3.loc[0]

City         Lansing
Latitude     42.7338
Longitude   -84.5554
Name: 0, dtype: object

In [171]:
# Query Foursquare for Lansing venues matching the search term 'Fast'
city_latitude = df_ingham_cities3.loc[0, 'Latitude'] # Lansing latitude value
city_longitude = df_ingham_cities3.loc[0, 'Longitude'] # Lansing longitude value

city_name = df_ingham_cities3.loc[0, 'City'] # city name

print('Latitude and longitude values of {} are {}, {}.'.format(city_name,
                                                               city_latitude,
                                                               city_longitude))

radius = 10000 # search radius in metres
LIMIT = 200 # maximum number of venues requested from the Foursquare API
search_query = 'Fast'
print(search_query + ' .... OK!')


url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    city_latitude,
    city_longitude,
    radius,
    LIMIT,
    search_query)
url


Latitude and longitude values of Lansing are 42.7337712, -84.5553805.
Fast .... OK!


'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=42.7337712,-84.5553805&radius=10000&limit=200&query=Fast'
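The methodology describes trying several search terms ('convenience', 'party', 'fast') before settling on 'Fast'. Building the URL with `str.format` works for single words, but a multi-word query such as `party store`, or a value containing `&` (like the venue name `Fish & Chips`), would corrupt the query string. The sketch below uses the standard library's `urllib.parse.urlencode` to escape each parameter. `build_explore_url` and `total_results` are hypothetical helper names; `total_results` reads the `totalResults` field from the same place it appears in the JSON response shown in the next cell.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, query,
                      radius=10000, limit=200):
    """Build a Foursquare v2 'explore' URL with every parameter safely encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'query': query,
    }
    return EXPLORE_URL + '?' + urlencode(params)


def total_results(response_json):
    """Return the 'totalResults' count from an explore response, or 0 if it is missing."""
    return response_json.get('response', {}).get('totalResults', 0)


# Build one URL per candidate search term (placeholder credentials)
for term in ['convenience', 'party store', 'fast']:
    print(build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                            42.7337712, -84.5553805, term))
```

With real credentials, each URL could be fetched with `requests.get(url).json()` and passed to `total_results` to compare how many venues each search term returns. `requests.get` can also take the parameter dictionary directly through its `params` argument.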

Send the GET request and examine the results

In [172]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed80424b1cac0001b62cb3c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Lansing',
  'headerFullLocation': 'Lansing',
  'headerLocationGranularity': 'city',
  'query': 'fast',
  'totalResults': 118,
  'suggestedBounds': {'ne': {'lat': 42.82377129000009,
    'lng': -84.43307928027484},
   'sw': {'lat': 42.64377110999991, 'lng': -84.67768171972516}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b59e3f7f964a520ab9e28e3',
       'name': 'Olympic Broil',
       'location': {'address': '1320 N Grand River Ave',
        'lat': 42.749413615398836,
        'lng': -84.55466603032504,
        'labeledLatLngs':

In [173]:
# function that extracts the primary category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean the JSON and structure it in a pandas DataFrame

In [164]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# keep only the name, category, and coordinate columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each venue's category list with its primary category name
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean column names: 'venue.location.lat' -> 'lat'
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()




Unnamed: 0,name,categories,lat,lng
0,Olympic Broil,Fast Food Restaurant,42.749414,-84.554666
1,Zoup!,Soup Place,42.731927,-84.552346
2,Dairy Queen,Fast Food Restaurant,42.740497,-84.593521
3,Culver's,Fast Food Restaurant,42.764863,-84.518232
4,SUBWAY,Fast Food Restaurant,42.712147,-84.538584


In [165]:
nearby_venues.name


0                        Olympic Broil
1                                Zoup!
2                          Dairy Queen
3                             Culver's
4                               SUBWAY
5                          Dairy Queen
6                         Jimmy John's
7                             Taco 911
8                    Rally's Hamburger
9                               SUBWAY
10                        Fish & Chips
11                              SUBWAY
12              Big John Steak & Onion
13                              SUBWAY
14                              Arby's
15                           Taco Bell
16                             Rally's
17                              SUBWAY
18                             Wendy’s
19                     Los Tres Amigos
20                       Bangkok House
21                     Fleetwood Diner
22                             Blimpie
23    DeLuca's Restaurant and Pizzeria
24                              Arby's
25                       

## Visualize Data On Map Where Fast Food/Conveniences are Located

In [174]:
venues2_map = folium.Map(location=[latitude, longitude], zoom_start=12) # generate map centred around Lansing

# add a red circle marker to represent Lansing
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Lansing',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues2_map)

# add the fast food as blue circle markers
for lat, lng, label in zip(nearby_venues.lat, nearby_venues.lng, nearby_venues.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues2_map)

# display map
venues2_map
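The hypothesis in the methodology is that renters prefer to live within walking distance of conveniences. With venue coordinates in `nearby_venues`, a candidate property could be scored by how many venues lie within walking distance of it. The sketch below uses the haversine great-circle formula; the 800 m (about half a mile) walking threshold, the helper names, and the example property location are assumptions for illustration, not part of the original analysis.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venues_within_walk(prop_lat, prop_lng, venues, max_m=800):
    """Return (name, metres) for venues within max_m of the property, nearest first.

    max_m=800 (about half a mile) is an assumed walking distance.
    """
    hits = []
    for name, lat, lng in venues:
        d = haversine_m(prop_lat, prop_lng, lat, lng)
        if d <= max_m:
            hits.append((name, round(d)))
    return sorted(hits, key=lambda hit: hit[1])


# Venues copied from nearby_venues.head() above; the property location is hypothetical
sample_venues = [
    ('Olympic Broil', 42.749414, -84.554666),
    ('Zoup!', 42.731927, -84.552346),
    ('Dairy Queen', 42.740497, -84.593521),
    ('SUBWAY', 42.712147, -84.538584),
]
print(venues_within_walk(42.7337712, -84.5553805, sample_venues))
```

With the notebook's DataFrame, `zip(nearby_venues.name, nearby_venues.lat, nearby_venues.lng)` supplies tuples in the same shape.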

## Clustering and Analysis
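The `ingham_venues` table used in this section is not built anywhere in the notebook as shown. One way it could be assembled, sketched here under assumptions, is to run an explore query around each city in `df_ingham_cities3` and flatten every returned venue into a row with the seven columns printed below. The function names and the `fetch_items` callable (which would wrap `requests.get(...).json()['response']['groups'][0]['items']` for one location) are hypothetical; the nesting of `venue`, `location`, and `categories` follows the JSON response and `get_category_type` above.

```python
import pandas as pd

VENUE_COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']


def explore_items_to_rows(city, city_lat, city_lng, items):
    """Flatten Foursquare 'explore' items for one city into row lists."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Keep only the primary category name, as get_category_type does
        category = categories[0]['name'] if categories else None
        rows.append([city, city_lat, city_lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category])
    return rows


def get_city_venues(cities, fetch_items):
    """Build a venues DataFrame from (city, lat, lng) triples.

    fetch_items(lat, lng) is a hypothetical callable returning the
    response['groups'][0]['items'] list of an explore query at that point.
    """
    rows = []
    for city, lat, lng in cities:
        rows.extend(explore_items_to_rows(city, lat, lng, fetch_items(lat, lng)))
    return pd.DataFrame(rows, columns=VENUE_COLUMNS)


# Offline example with a stand-in for the API call
def fake_fetch(lat, lng):
    return [{'venue': {'name': 'Olympic Broil',
                       'location': {'lat': 42.749414, 'lng': -84.554666},
                       'categories': [{'name': 'Fast Food Restaurant'}]}}]


demo = get_city_venues([('Lansing', 42.7337712, -84.5553805)], fake_fetch)
print(demo)
```

In the notebook, the city triples could come from `zip(df_ingham_cities3['City'], df_ingham_cities3['Latitude'], df_ingham_cities3['Longitude'])`. Passing the fetch step in as a function keeps the flattening logic testable without network access.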

In [75]:
print(ingham_venues.shape)
ingham_venues.head()

(140, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lansing,42.733771,-84.55538,Weston's Kewpee Sandwich,42.73281,-84.55274,Burger Joint
1,Lansing,42.733771,-84.55538,Peanut Shop,42.732883,-84.552206,Snack Place
2,Lansing,42.733771,-84.55538,BIGGBY COFFEE,42.734663,-84.553379,Coffee Shop
3,Lansing,42.733771,-84.55538,FLEXCity Fitness,42.734392,-84.552275,Gym
4,Lansing,42.733771,-84.55538,Tavern & Tap,42.733489,-84.551934,Restaurant


In [76]:
# check venues per Neighborhood
ingham_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dansville,5,5,5,5,5,5
East Lansing,43,43,43,43,43,43
Haslett,8,8,8,8,8,8
Lansing,37,37,37,37,37,37
Okemos,18,18,18,18,18,18
Webberville,9,9,9,9,9,9
Williamston,20,20,20,20,20,20


In [77]:
#unique categories from above result set
print('There are {} unique categories.'.format(len(ingham_venues['Venue Category'].unique())))

There are 70 unique categories.


Analyze each city

In [79]:
# one hot encoding
ingham_onehot = pd.get_dummies(ingham_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ingham_onehot['Neighborhood'] = ingham_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ingham_onehot.columns[-1]] + list(ingham_onehot.columns[:-1])
ingham_onehot = ingham_onehot[fixed_columns]

ingham_onehot

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Garden,Bike Shop,Bistro,Bookstore,Burger Joint,Clothing Store,Coffee Shop,Comic Shop,Convenience Store,Creperie,Department Store,Dessert Shop,Diner,Dry Cleaner,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Frozen Yogurt Shop,Gas Station,Gluten-free Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hookah Bar,Ice Cream Shop,Indie Movie Theater,Insurance Office,Intersection,Irish Pub,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Nightclub,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Restaurant,Salon / Barbershop,Sandwich Place,Sculpture Garden,Snack Place,Soup Place,Supplement Shop,Sushi Restaurant,Thai Restaurant,Theater,Wings Joint,Yoga Studio
0,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
5,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
6,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Lansing,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Lansing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group the data by city, using the mean of each one-hot column as that category's frequency

In [80]:
ingham_grouped = ingham_onehot.groupby('Neighborhood').mean().reset_index()
ingham_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Beer Garden,Bike Shop,Bistro,Bookstore,Burger Joint,Clothing Store,Coffee Shop,Comic Shop,Convenience Store,Creperie,Department Store,Dessert Shop,Diner,Dry Cleaner,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Frozen Yogurt Shop,Gas Station,Gluten-free Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hookah Bar,Ice Cream Shop,Indie Movie Theater,Insurance Office,Intersection,Irish Pub,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Nightclub,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Restaurant,Salon / Barbershop,Sandwich Place,Sculpture Garden,Snack Place,Soup Place,Supplement Shop,Sushi Restaurant,Thai Restaurant,Theater,Wings Joint,Yoga Studio
0,Dansville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,East Lansing,0.069767,0.023256,0.023256,0.0,0.0,0.0,0.0,0.116279,0.023256,0.0,0.0,0.023256,0.023256,0.0,0.023256,0.069767,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.046512,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.046512,0.023256,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.023256,0.046512,0.023256,0.0,0.023256,0.023256,0.023256,0.023256,0.046512,0.0,0.0,0.0,0.023256,0.023256,0.023256,0.0,0.023256,0.0
2,Haslett,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25
3,Lansing,0.0,0.0,0.027027,0.0,0.027027,0.054054,0.054054,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.081081,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.027027,0.0,0.0,0.027027,0.054054,0.054054,0.027027,0.027027,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.081081,0.027027,0.027027,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0
4,Okemos,0.0,0.0,0.0,0.0,0.0,0.111111,0.166667,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Webberville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.222222,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Williamston,0.15,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.1,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0


In [114]:
# Print the top 5 most common venue categories for each city
num_top_venues = 5

for hood in ingham_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ingham_grouped[ingham_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Dansville----
               venue  freq
0     Ice Cream Shop   0.2
1  Convenience Store   0.2
2        Coffee Shop   0.2
3                Bar   0.2
4               Food   0.2


----East Lansing----
                 venue  freq
0                  Bar  0.12
1  American Restaurant  0.07
2          Coffee Shop  0.07
3       Sandwich Place  0.05
4         Noodle House  0.05


----Haslett----
              venue  freq
0       Yoga Studio  0.25
1      Intersection  0.25
2  Department Store  0.12
3    Hardware Store  0.12
4          Pharmacy  0.12


----Lansing----
                venue  freq
0         Coffee Shop  0.08
1      Sandwich Place  0.08
2                Bank  0.05
3  Mexican Restaurant  0.05
4           Juice Bar  0.05


----Okemos----
            venue  freq
0            Bank  0.17
1  Sandwich Place  0.11
2          Bakery  0.11
3     Coffee Shop  0.11
4       Pet Store  0.06


----Webberville----
                  venue  freq
0                 Diner  0.22
1        Ice Cream S

Put the top 10 venue categories for each city into a DataFrame

In [103]:
# return a row's category names sorted by descending frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

# build a DataFrame with the top 10 venue categories for each city
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ingham_grouped['Neighborhood']

for ind in np.arange(ingham_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ingham_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dansville,Ice Cream Shop,Bar,Coffee Shop,Food,Convenience Store,Grocery Store,Gluten-free Restaurant,Gas Station,Frozen Yogurt Shop,Gym
1,East Lansing,Bar,American Restaurant,Coffee Shop,Korean Restaurant,Noodle House,Ice Cream Shop,Sandwich Place,Art Gallery,Farmers Market,Irish Pub
2,Haslett,Yoga Studio,Intersection,Department Store,Diner,Pharmacy,Hardware Store,Gas Station,Frozen Yogurt Shop,Food,Convenience Store
3,Lansing,Coffee Shop,Sandwich Place,Middle Eastern Restaurant,Mexican Restaurant,Bank,Bakery,Juice Bar,Lounge,Men's Store,Gym
4,Okemos,Bank,Coffee Shop,Sandwich Place,Bakery,Jewelry Store,Japanese Restaurant,Pet Store,Pharmacy,Bike Shop,Electronics Store
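The methodology mentions k-means clustering, and `KMeans` is imported at the top of the notebook, but the clustering call itself does not appear above. A minimal sketch follows, assuming the city-by-category frequency table has the shape of `ingham_grouped` (a `Neighborhood` column followed by numeric frequency columns). The helper name, the value of `k`, and the toy data are illustrative choices; with only seven cities, `k` should stay small.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_cities(grouped: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.DataFrame:
    """Label each city with a k-means cluster based on its venue-category frequencies."""
    features = grouped.drop(columns=['Neighborhood'])
    model = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
    labeled = grouped[['Neighborhood']].copy()
    labeled['Cluster Labels'] = model.labels_
    return labeled


# Toy frequency table with hypothetical cities, two of which lean toward fast food
toy = pd.DataFrame({
    'Neighborhood': ['City A', 'City B', 'City C', 'City D'],
    'Fast Food Restaurant': [0.50, 0.45, 0.00, 0.05],
    'Coffee Shop': [0.00, 0.05, 0.50, 0.45],
})
print(cluster_cities(toy, k=2))
```

On the notebook's data this would be `cluster_cities(ingham_grouped, k=3)`; the labels could then be merged into `neighborhoods_venues_sorted` on `Neighborhood` and used to colour the markers on the Folium map.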


## Results

The results of this project are high-level data that an investor could use only in conjunction with another visual source, such as Zillow or an MLS software package. The analysis mainly shows on the map which general areas of Lansing have the kinds of businesses near which the investor should look for property. It would be more useful if a paid API could be used to add another layer to the visualizations produced in this notebook.

## Discussion

As pointed out in the Methodology section, the Foursquare API is not the best source of data for a smaller-tier metro or market. For this project, a paid, real-estate-specific data source would have been more appropriate for a more robust analysis.

## Conclusion

In conclusion, based on the data obtained through this methodology, I recommend continuing to research and extend this code and project. Because this project was a class assignment, there was no budget for enhanced data sets or APIs that could make the data more useful in solving the business problem. Combining paid real estate APIs, such as Estated, Zillow, and CoStar, and other real-estate-specific sources with the methodology here could produce an application or data source that might generate income when sold to other investors.