# Data sources for answering: which neighborhood is the most 'Minnesota Nice'?

## The hypothetical data science firm MCG has researched the 'Minnesota Nice' business problem and determined that a variety of data needs to be gathered. In particular, geolocation data from the Foursquare API will be critical.

### A more detailed description of data requirements follows:

* Neighborhood names, along with census data from the American Community Survey, will be pulled from the Minnesota open data website:
  * https://www.mncompass.org/profiles/neighborhoods/minneapolis-saint-paul#!community-areas
* Neighborhood names will be associated with central latitude/longitude coordinates using the method described in this StackOverflow post:
  * https://stackoverflow.com/questions/44616592/search-google-geocoding-api-by-neighborhood
    * This uses the Google Geocoding API to search for a neighborhood + city combination and then pulls the lat-long coordinates.

* Foursquare data will be obtained in the same way as in the Toronto neighborhood analysis. We plan to look at restaurants, parks, schools, and spiritual centers.
  * https://developer.foursquare.com/docs/resources/categories

* Walk Scores for the neighborhoods will be obtained from the Walk Score API:
  * https://www.walkscore.com/professional/api.php
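The Foursquare step is not implemented yet in this notebook. Below is a minimal sketch of building a venue-search request URL, assuming the v2 `venues/search` endpoint with `client_id`/`client_secret` authentication, a `v` version-date parameter, and `ll`, `radius`, `limit` and `categoryId` query parameters. The helper name, the default radius and limit, and the version date are illustrative choices, not values taken from the Toronto analysis. Foursquare has since introduced a newer Places API, so check the endpoint and parameter names against the current documentation before relying on this.

```python
from urllib.parse import urlencode

# Assumed endpoint (Foursquare v2 venue search); verify against current docs.
FOURSQUARE_SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_foursquare_search_url(client_id, client_secret, lat, lon,
                                category_id=None, radius=500, limit=50,
                                version="20180605"):
    """Hypothetical helper: return the GET URL for venues near (lat, lon)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (illustrative value)
        "ll": f"{lat},{lon}",    # search center as "lat,lon"
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    if category_id is not None:
        # IDs come from the Foursquare categories page linked above
        params["categoryId"] = category_id
    return FOURSQUARE_SEARCH_URL + "?" + urlencode(params)
```

Building the URL separately from sending it makes it easy to check, one category at a time (restaurants, parks, schools, spiritual centers), exactly what will be requested for each neighborhood center.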

## First, we import the packages we'll need

In [1]:
import pandas as pd
import numpy as np
import googlemaps
import requests
import urllib.parse  # importing the submodule explicitly makes urllib.parse.urlencode available
import uszipcode

## Now we import two .csv files downloaded from the mncompass.org website, combine them, and keep just the neighborhood names.

In [2]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

In [3]:
# The code was removed by Watson Studio for sharing.

In [4]:
body = client_a28f8de00eed48e5bb907b36c94b68c9.get_object(Bucket='minnesotanice-donotdelete-pr-m1b1j2ihuwlryd',Key='MSP Neighborhoods_2010.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body, skiprows = 1)

body = client_a28f8de00eed48e5bb907b36c94b68c9.get_object(Bucket='minnesotanice-donotdelete-pr-m1b1j2ihuwlryd',Key='MSP Neighborhoods_2013-2017.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_2 = pd.read_csv(body, skiprows = 1)

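# Keep only the neighborhood name and city columns from each table
df1 = df_data_1[['geography', 'City']]
df2 = df_data_2[['geography', 'City']]

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two tables instead
TwinCityHoods = pd.concat([df1, df2])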

### A quick sanity check on the import

In [5]:
TwinCityHoods.head()

|   | geography | City |
|---|---|---|
| 0 | Mid-City Industrial | Minneapolis |
| 1 | University of Minnesota | Minneapolis |
| 2 | Northeast Park | Minneapolis |
| 3 | Beltrami | Minneapolis |
| 4 | Downtown East | Minneapolis |


In [6]:
TwinCityHoods.shape

(113, 2)

### The website lists 102 neighborhoods, so let's drop duplicate rows (and any rows with missing values).

In [7]:
TwinCityHoods = TwinCityHoods.drop_duplicates().dropna()
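Since the website lists 102 neighborhoods, it can be worth asserting that count after deduplicating rather than assuming it. A minimal sketch, assuming both source tables have `geography` and `City` columns as above; `combine_neighborhoods` and the toy tables are hypothetical and only illustrate the check.

```python
import pandas as pd


def combine_neighborhoods(*frames, expected=None):
    """Stack neighborhood tables, keep geography/City, drop duplicate and
    incomplete rows, and optionally check the resulting row count."""
    combined = pd.concat([f[["geography", "City"]] for f in frames], ignore_index=True)
    combined = combined.drop_duplicates().dropna().reset_index(drop=True)
    if expected is not None and len(combined) != expected:
        raise ValueError(f"expected {expected} neighborhoods, found {len(combined)}")
    return combined


# Toy tables for illustration only (names taken from the notebook output).
a = pd.DataFrame({"geography": ["Beltrami", "Downtown East"],
                  "City": ["Minneapolis", "Minneapolis"]})
b = pd.DataFrame({"geography": ["Beltrami", "Northeast Park", None],
                  "City": ["Minneapolis", "Minneapolis", "Minneapolis"]})
hoods = combine_neighborhoods(a, b, expected=3)
```

In the notebook this would be `combine_neighborhoods(df_data_1, df_data_2, expected=102)`. If the check fails, the mismatch itself is worth a look, for example a name spelled differently in the 2010 and 2013-2017 files.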

### Since we need lat-longs, we'll build the list of neighborhood queries to send to the Google API.

In [8]:
TwinCityHoods['neighborhood'] = TwinCityHoods.geography + ", " + TwinCityHoods.City

In [9]:
TwinCityHoods.head()

|   | geography | City | neighborhood |
|---|---|---|---|
| 0 | Mid-City Industrial | Minneapolis | Mid-City Industrial, Minneapolis |
| 1 | University of Minnesota | Minneapolis | University of Minnesota, Minneapolis |
| 2 | Northeast Park | Minneapolis | Northeast Park, Minneapolis |
| 3 | Beltrami | Minneapolis | Beltrami, Minneapolis |
| 4 | Downtown East | Minneapolis | Downtown East, Minneapolis |


In [10]:
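# .copy() makes an independent DataFrame, so adding columns later doesn't raise SettingWithCopyWarning
NeighborhoodList = TwinCityHoods[['neighborhood']].copy()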


In [11]:
# The code was removed by Watson Studio for sharing.

In [12]:
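from functools import lru_cache

@lru_cache(maxsize=None)
def _geocode_location(loc):
    # One Geocoding API call per query string; cached so the lat and lon helpers share it
    geocode_result = gmaps.geocode(loc)
    if not geocode_result:
        return (np.nan, np.nan)  # no match for this neighborhood
    location = geocode_result[0]["geometry"]["location"]
    return (location["lat"], location["lng"])

def geocode_address_lat(loc):
    return _geocode_location(loc)[0]

def geocode_address_lon(loc):
    return _geocode_location(loc)[1]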
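The lookup functions in the previous cell index into the Geocoding API reply as `result[0]["geometry"]["location"]`. A minimal sketch of a single extractor that returns both coordinates from one reply, so each neighborhood needs only one request, and that tolerates an empty reply. `extract_lat_lng` is a hypothetical name, and `sample_reply` is a hand-written, heavily abbreviated stand-in for a real `geocode()` result (real replies carry many more fields).

```python
def extract_lat_lng(geocode_result):
    """Return (lat, lng) from the first result of a geocode() reply,
    or (None, None) when the API found no match."""
    if not geocode_result:
        return (None, None)
    location = geocode_result[0]["geometry"]["location"]
    return (location["lat"], location["lng"])


# Abbreviated stand-in for a googlemaps geocode() reply: a list of result dicts,
# each with geometry -> location -> lat/lng. The coordinates are the Beltrami
# values reported later in this notebook.
sample_reply = [{"geometry": {"location": {"lat": 44.994943, "lng": -93.2416}}}]
lat, lng = extract_lat_lng(sample_reply)

# In the notebook (not run here; needs the gmaps client), one request per row:
# coords = NeighborhoodList["neighborhood"].map(lambda q: extract_lat_lng(gmaps.geocode(q)))
# NeighborhoodList["latitude"] = coords.map(lambda c: c[0])
# NeighborhoodList["longitude"] = coords.map(lambda c: c[1])
```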


In [13]:
NeighborhoodList['latitude'] = NeighborhoodList['neighborhood'].apply(geocode_address_lat);
NeighborhoodList['longitude'] = NeighborhoodList['neighborhood'].apply(geocode_address_lon);




In [14]:
NeighborhoodList.head()

|   | neighborhood | latitude | longitude |
|---|---|---|---|
| 0 | Mid-City Industrial, Minneapolis | 44.998862 | -93.217771 |
| 1 | University of Minnesota, Minneapolis | 44.97399 | -93.227728 |
| 2 | Northeast Park, Minneapolis | 45.00312 | -93.241263 |
| 3 | Beltrami, Minneapolis | 44.994943 | -93.2416 |
| 4 | Downtown East, Minneapolis | 44.975911 | -93.254587 |


## To use the Walk Score API, we also need an address for each lat-long pair. We'll now reverse geocode each point to get a human-readable address.
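`googlemaps.Client` also has a dedicated `reverse_geocode(latlng)` method that takes a (lat, lng) pair directly, instead of passing a "lat,lon" string to `geocode()`. A minimal sketch, assuming you want to try that route. The helper `reverse_geocode_address` and the `StubClient` class are hypothetical: the stub only lets the helper run offline, and the addresses a real client returns may differ from what `geocode()` produced.

```python
def reverse_geocode_address(client, lat, lon):
    """Return the first formatted address for (lat, lon), or None if the
    API returns no results. `client` is expected to be a googlemaps.Client."""
    results = client.reverse_geocode((lat, lon))
    return results[0]["formatted_address"] if results else None


class StubClient:
    """Hypothetical offline stand-in for googlemaps.Client. It knows only one
    point (Beltrami, with the address shown in the notebook output)."""

    def reverse_geocode(self, latlng):
        if latlng == (44.994943, -93.2416):
            return [{"formatted_address": "453 Fillmore St NE, Minneapolis, MN 55413, USA"}]
        return []


address = reverse_geocode_address(StubClient(), 44.994943, -93.2416)
missing = reverse_geocode_address(StubClient(), 0.0, 0.0)
```

With a real client, the call would be `reverse_geocode_address(gmaps, row.latitude, row.longitude)` for each row.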

In [15]:
# The code was removed by Watson Studio for sharing.

In [16]:
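def geocode_address(loc):
    # loc is a "lat,lon" string; geocode() resolves it to a nearby street address
    geocode_result = gmaps.geocode(loc)
    if not geocode_result:
        return np.nan  # no address found for this point
    return geocode_result[0]['formatted_address']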

In [17]:
NeighborhoodList['lat-lon'] = NeighborhoodList.latitude.map(str) + "," + NeighborhoodList.longitude.map(str);
NeighborhoodList['address'] = NeighborhoodList['lat-lon'].apply(geocode_address);

In [18]:
NeighborhoodList.head()

|   | neighborhood | latitude | longitude | lat-lon | address |
|---|---|---|---|---|---|
| 0 | Mid-City Industrial, Minneapolis | 44.998862 | -93.217771 | 44.9988622,-93.2177712 | Broadway St NE & Hoover St, Minneapolis, MN 55... |
| 1 | University of Minnesota, Minneapolis | 44.97399 | -93.227728 | 44.97399,-93.2277285 | Oak St SE & Washington Ave SE, Minneapolis, MN... |
| 2 | Northeast Park, Minneapolis | 45.00312 | -93.241263 | 45.0031203,-93.2412634 | 1653 Fillmore St NE, Minneapolis, MN 55413, USA |
| 3 | Beltrami, Minneapolis | 44.994943 | -93.2416 | 44.994943,-93.2415998 | 453 Fillmore St NE, Minneapolis, MN 55413, USA |
| 4 | Downtown East, Minneapolis | 44.975911 | -93.254587 | 44.9759107,-93.25458719999999 | 1001 S Washington Ave, Minneapolis, MN 55415, USA |


In [19]:
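def walkscore(address, latitude, longitude, walk_key = walk_key):
    # Look up the Walk Score for one point; walk_key must be defined before this cell
    params = {
        'format': 'json',
        'address': str(address),
        'lat': str(latitude),
        'lon': str(longitude),
        'wsapikey': walk_key,
        'transit': 1
    }
    # Let requests build and URL-encode the query string
    response = requests.get('http://api.walkscore.com/score', params=params)
    response.raise_for_status()
    results = response.json()
    # Return NaN instead of raising KeyError when a reply carries no score
    return results.get('walkscore', np.nan)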
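To inspect exactly what the `walkscore` function above sends, and to handle replies that carry no score, URL construction and response parsing can be split into two small functions that run without a network call. Note that `import urllib` on its own does not guarantee that `urllib.parse` is loaded; importing from `urllib.parse` explicitly avoids relying on another package having imported it. A minimal sketch: the function names are hypothetical, and treating `status == 1` as the success code is an assumption to confirm against the Walk Score API documentation.

```python
from urllib.parse import urlencode  # import the submodule explicitly

WALKSCORE_URL = "http://api.walkscore.com/score"


def build_walkscore_url(address, lat, lon, api_key, transit=True):
    """Build the GET URL for one Walk Score lookup without sending it."""
    params = {
        "format": "json",
        "address": address,
        "lat": lat,
        "lon": lon,
        "wsapikey": api_key,
    }
    if transit:
        params["transit"] = 1  # same flag the notebook sends
    return WALKSCORE_URL + "?" + urlencode(params)


def parse_walkscore(payload):
    """Return the score from a decoded JSON reply, or None when the reply's
    status is not 1 (assumed here to be the API's success code)."""
    if payload.get("status") != 1:
        return None
    return payload.get("walkscore")


url = build_walkscore_url("453 Fillmore St NE, Minneapolis, MN 55413, USA",
                          44.994943, -93.2416, "YOUR_KEY")
```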

In [20]:
NeighborhoodList['walkscore'] = NeighborhoodList.apply(lambda x: walkscore(x.address, x.latitude, x.longitude), axis = 1);
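`apply` fires one Walk Score request per row with no pause between them. If you run into rate limits, a small wrapper that spaces calls out can help. The sketch below is generic Python; the name `throttled` and the 0.5 s default delay are illustrative, and the right delay depends on your Walk Score plan.

```python
import time
from functools import wraps


def throttled(func, delay_seconds=0.5):
    """Wrap func so consecutive calls start at least delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    @wraps(func)
    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# Offline demonstration with a trivial function and a short delay.
double = throttled(lambda x: x * 2, delay_seconds=0.05)
start = time.monotonic()
doubled = [double(i) for i in range(3)]
elapsed = time.monotonic() - start
```

In the notebook, wrap once outside the lambda, e.g. `slow_walkscore = throttled(walkscore, 0.5)`, and call `slow_walkscore(x.address, x.latitude, x.longitude)` inside `apply`.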

In [21]:
NeighborhoodList.head()

|   | neighborhood | latitude | longitude | lat-lon | address | walkscore |
|---|---|---|---|---|---|---|
| 0 | Mid-City Industrial, Minneapolis | 44.998862 | -93.217771 | 44.9988622,-93.2177712 | Broadway St NE & Hoover St, Minneapolis, MN 55... | 34 |
| 1 | University of Minnesota, Minneapolis | 44.97399 | -93.227728 | 44.97399,-93.2277285 | Oak St SE & Washington Ave SE, Minneapolis, MN... | 77 |
| 2 | Northeast Park, Minneapolis | 45.00312 | -93.241263 | 45.0031203,-93.2412634 | 1653 Fillmore St NE, Minneapolis, MN 55413, USA | 62 |
| 3 | Beltrami, Minneapolis | 44.994943 | -93.2416 | 44.994943,-93.2415998 | 453 Fillmore St NE, Minneapolis, MN 55413, USA | 59 |
| 4 | Downtown East, Minneapolis | 44.975911 | -93.254587 | 44.9759107,-93.25458719999999 | 1001 S Washington Ave, Minneapolis, MN 55415, USA | 90 |


# Note to self (Mike): this is where I left off; still need to decide what to pull from the uszipcode database

In [40]:
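from uszipcode import SearchEngine

# simple_zipcode=False selects the rich-info database (450+ MB download on first use)
search = SearchEngine(simple_zipcode=False)
# Test query: these coordinates are in Maryland, not the Twin Cities
result = search.by_coordinates(39.122229, -77.133578, radius=3, returns=1)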

Start downloading data for rich info zipcode database, total size 450+MB ...

In [51]:
result[0].year_housing_was_built

[{'key': 'Data',
  'values': [{'x': '1939 Or Earlier', 'y': 74},
   {'x': '1940s', 'y': 25},
   {'x': '1950s', 'y': 38},
   {'x': '1960s', 'y': 1099},
   {'x': '1970s', 'y': 1111},
   {'x': '1980s', 'y': 2197},
   {'x': '1990s', 'y': 238},
   {'x': '2000s', 'y': 137},
   {'x': '2010 Or Later', 'y': 10}]}]
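One option for what to pull: `year_housing_was_built` comes back as a nested list of bins, which is awkward to compare across neighborhoods. A minimal sketch, assuming the structure shown above (bins listed oldest first, `y` presumably counting housing units), that reduces it to three numbers. The function name and the choice of summary statistics are illustrative, not something the notebook has settled on, and the sample data is the Maryland test point from the cell above.

```python
# Copied from the output of result[0].year_housing_was_built above
# (the Maryland test point, not a Twin Cities neighborhood).
year_built = [{'key': 'Data',
               'values': [{'x': '1939 Or Earlier', 'y': 74},
                          {'x': '1940s', 'y': 25},
                          {'x': '1950s', 'y': 38},
                          {'x': '1960s', 'y': 1099},
                          {'x': '1970s', 'y': 1111},
                          {'x': '1980s', 'y': 2197},
                          {'x': '1990s', 'y': 238},
                          {'x': '2000s', 'y': 137},
                          {'x': '2010 Or Later', 'y': 10}]}]


def summarize_year_built(series):
    """Reduce a year-built series to (total, most common bin, bin containing
    the median). Assumes bins are listed oldest first."""
    values = series[0]["values"]
    total = sum(v["y"] for v in values)
    most_common = max(values, key=lambda v: v["y"])["x"] if values else None
    median_bin = None
    running = 0
    for v in values:
        running += v["y"]
        if running >= total / 2:  # first bin reaching half of all units
            median_bin = v["x"]
            break
    return total, most_common, median_bin


total, most_common, median_bin = summarize_year_built(year_built)
```

A single "median decade built" per neighborhood would slot into `NeighborhoodList` as one more column next to `walkscore`.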
