# Battle of Neighborhoods, Week 2: The Ongoing Restaurant Problem in The Greater Youngstown Area
### Applied Data Science Capstone by IBM/Coursera

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## 1. Introduction <a name="introduction"></a>

To begin, let's run the required imports.

In [2]:
import pandas as pd
import numpy as np
import json
from bs4 import BeautifulSoup
import requests
import io

Now we need to scrape a website and parse the HTML to get Mahoning County zip code data via BeautifulSoup.

In [3]:
url = 'http://www.ciclt.net/sn/clt/capitolimpact/gw_ziplist.aspx?FIPS=39099'
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

In [4]:
table = soup.find('table')

In [5]:
column_names = ['PostalCode','City','County']
df = pd.DataFrame(columns = column_names)
ytown_data = df

for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data

We now have a complete list of the zip codes used in Mahoning County.

In [5]:
ytown_data

,PostalCode,City,County
0,44401,Berlin Center,Mahoning County
1,44405,Campbell,Mahoning County
2,44406,Canfield,Mahoning County
3,44416,Ellsworth,Mahoning County
4,44422,Greenford,Mahoning County
5,44429,Lake Milton,Mahoning County
6,44436,Lowellville,Mahoning County
7,44442,New Middletown,Mahoning County
8,44443,New Springfield,Mahoning County
9,44449,North Benton,Mahoning County
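
Writing into `df.loc[len(df)]` grows the DataFrame one row at a time. A common alternative is to collect the rows in a list and build the DataFrame once. Here is a minimal sketch of that pattern. A small inline HTML table stands in for the scraped page, and its markup is an assumption rather than the real page's. BeautifulSoup's built-in `html.parser` is used in place of `html5lib`:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in for the scraped page; the real page's markup may differ
html = """
<table>
  <tr><th>Zip Code</th><th>City</th><th>County</th></tr>
  <tr><td>44401</td><td>Berlin Center</td><td>Mahoning County</td></tr>
  <tr><td>44405</td><td>Campbell</td><td>Mahoning County</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')  # built-in parser, no html5lib needed
table = soup.find('table')

rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if len(cells) == 3:  # the header row uses <th>, so it yields no <td> cells
        rows.append(cells)

# Build the DataFrame in one step instead of appending row by row
zip_df = pd.DataFrame(rows, columns=['PostalCode', 'City', 'County'])
print(zip_df)
```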


Now it's time to import the population data for Mahoning County.

In [19]:
youngstown_data_url = 'https://raw.githubusercontent.com/ernestperrin/Coursera_Capstone/main/ytowndata.csv'
content = requests.get(youngstown_data_url).content
youngstown_data = pd.read_csv(io.StringIO(content.decode('utf-8')))
youngstown_data

,PostalCode,City,County,Population
0,44401,Berlin Center,Mahoning County,2892
1,44405,Campbell,Mahoning County,8229
2,44406,Canfield,Mahoning County,22054
3,44416,Ellsworth,Mahoning County,0
4,44422,Greenford,Mahoning County,0
5,44429,Lake Milton,Mahoning County,2576
6,44436,Lowellville,Mahoning County,3836
7,44442,New Middletown,Mahoning County,4006
8,44443,New Springfield,Mahoning County,1704
9,44449,North Benton,Mahoning County,1323


### Cleaning the data

However, the table contains both zero-population ZIP codes (which represent P.O. boxes) and duplicate ZIP codes (which represent overlapping area borders within a ZIP code). Both should be dropped from the table to avoid skewing the data.

In [20]:
ytown_data = df.drop([3, 4, 14, 25, 26, 27, 28, 30, 33], axis=0)
ytown_data

,PostalCode,City,County
0,44401,Berlin Center,Mahoning County
1,44405,Campbell,Mahoning County
2,44406,Canfield,Mahoning County
5,44429,Lake Milton,Mahoning County
6,44436,Lowellville,Mahoning County
7,44442,New Middletown,Mahoning County
8,44443,New Springfield,Mahoning County
9,44449,North Benton,Mahoning County
10,44451,North Jackson,Mahoning County
11,44452,North Lima,Mahoning County
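
The row labels passed to `df.drop` above were picked by inspection. The following sketch does the same cleaning by rule instead: keep only populated ZIP codes, then keep one row per ZIP code. The first three sample rows come from the population table above. The repeated Campbell row is invented to illustrate a duplicate, and keeping the first occurrence is an assumption:

```python
import pandas as pd

# Sample rows: Ellsworth (0 population, a P.O. Box ZIP) is from the table above;
# the second Campbell row is a hypothetical duplicate
raw = pd.DataFrame({
    'PostalCode': [44401, 44405, 44416, 44405],
    'City': ['Berlin Center', 'Campbell', 'Ellsworth', 'Campbell'],
    'Population': [2892, 8229, 0, 8229],
})

cleaned = (
    raw[raw['Population'] > 0]                            # drop P.O. Box ZIP codes
    .drop_duplicates(subset='PostalCode', keep='first')   # one row per ZIP code
    .reset_index(drop=True)
)
print(cleaned)
```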


In [22]:
youngstown_data2_url = 'https://raw.githubusercontent.com/ernestperrin/Coursera_Capstone/main/ytowndata2.csv'
content = requests.get(youngstown_data2_url).content
youngstown_data2 = pd.read_csv(io.StringIO(content.decode('utf-8')))

We now have a much cleaner table with P.O. Box and duplicate zip codes removed!

In [23]:
youngstown_data2

,PostalCode,City,County,Population
0,44401,Berlin Center,Mahoning County,2892
1,44405,Campbell,Mahoning County,8229
2,44406,Canfield,Mahoning County,22054
3,44429,Lake Milton,Mahoning County,2576
4,44436,Lowellville,Mahoning County,3836
5,44442,New Middletown,Mahoning County,4006
6,44443,New Springfield,Mahoning County,1704
7,44449,North Benton,Mahoning County,1323
8,44451,North Jackson,Mahoning County,3032
9,44452,North Lima,Mahoning County,3167


In [24]:
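import geocoder  # third-party package; geocoder.google needs a Google API key

def get_geocode(postal_code, max_tries=5):
    # Retry a few times instead of looping forever when the lookup returns nothing
    for _ in range(max_tries):
        g = geocoder.google('{}, Youngstown, Ohio'.format(postal_code))
        if g.latlng is not None:
            latitude, longitude = g.latlng
            return latitude, longitude
    return None, None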

A few more imports are needed for geocoding, clustering, and map visualization. Let's install and import them.

In [13]:
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip install folium
import folium



Testing the geocoder on Boardman. These coordinates are also used later to center the map.

In [25]:
address = 'Boardman, OH'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Boardman are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Boardman are 41.024597400000005, -80.65752960822488.


In [26]:
geo_data_url = 'https://raw.githubusercontent.com/ernestperrin/Coursera_Capstone/main/ytowncoords.csv'
content = requests.get(geo_data_url).content
geo_data = pd.read_csv(io.StringIO(content.decode('utf-8')))

In [14]:
geo_data

,Postal Code,Latitude,Longitude
0,44401,41.034545,-80.928626
1,44405,41.076142,-80.593256
2,44406,41.025059,-80.760912
3,44429,41.099502,-80.970368
4,44436,41.035942,-80.536226
5,44442,40.961171,-80.557571
6,44443,40.917283,-80.606183
7,44449,40.985058,-81.012593
8,44451,41.100058,-80.857307
9,44452,40.948393,-80.658963


Using **df.rename** to change `Postal Code` to `PostalCode`, so the key columns match for the upcoming merge.

In [27]:
geo_data.rename(columns={'Postal Code':'PostalCode'}, inplace=True)

Using **pd.merge** to combine the two tables on `PostalCode` into a single table.

In [28]:
youngstown_data = pd.merge(youngstown_data2,geo_data,on='PostalCode',how='left')

In [29]:
youngstown_data

,PostalCode,City,County,Population,Latitude,Longitude
0,44401,Berlin Center,Mahoning County,2892,41.034545,-80.928626
1,44405,Campbell,Mahoning County,8229,41.076142,-80.593256
2,44406,Canfield,Mahoning County,22054,41.025059,-80.760912
3,44429,Lake Milton,Mahoning County,2576,41.099502,-80.970368
4,44436,Lowellville,Mahoning County,3836,41.035942,-80.536226
5,44442,New Middletown,Mahoning County,4006,40.961171,-80.557571
6,44443,New Springfield,Mahoning County,1704,40.917283,-80.606183
7,44449,North Benton,Mahoning County,1323,40.985058,-81.012593
8,44451,North Jackson,Mahoning County,3032,41.100058,-80.857307
9,44452,North Lima,Mahoning County,3167,40.948393,-80.658963
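
With `how='left'`, a ZIP code missing from the coordinates file keeps its row but gets NaN coordinates. Passing `indicator=True` (adds a `_merge` column) and `validate='one_to_one'` (raises if either side repeats a key) makes such gaps visible. A small sketch with sample rows from the tables above, where Canfield is deliberately left out of the coordinates:

```python
import pandas as pd

population = pd.DataFrame({
    'PostalCode': [44401, 44405, 44406],
    'City': ['Berlin Center', 'Campbell', 'Canfield'],
    'Population': [2892, 8229, 22054],
})
coords = pd.DataFrame({
    'Postal Code': [44401, 44405],          # Canfield (44406) omitted on purpose
    'Latitude': [41.034545, 41.076142],
    'Longitude': [-80.928626, -80.593256],
})

coords = coords.rename(columns={'Postal Code': 'PostalCode'})
merged = pd.merge(population, coords, on='PostalCode', how='left',
                  indicator=True, validate='one_to_one')

# 'left_only' rows found no coordinates and would be left with NaN Latitude/Longitude
missing = merged.loc[merged['_merge'] == 'left_only', 'City'].tolist()
print(missing)  # ['Canfield']
```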


Time to use **Folium** to visualize the geographic data we have so far...

In [102]:
map_youngstown = folium.Map(location=[latitude, longitude], zoom_start=10)
neighborhoods = youngstown_data

for lat, lng, city, county in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['City'], neighborhoods['County']):
    label = '{}, {}'.format(county, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_youngstown)  
    
map_youngstown

### Utilizing the **Foursquare API** to analyze Youngstown/Mahoning County...

In [32]:
import os

# Read credentials from (hypothetically named) environment variables instead of hard-coding them
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET')
VERSION = '20180605'
LIMIT = 100

In [33]:
youngstown_data.loc[0, 'City']

'Berlin Center'

In [34]:
city_latitude = youngstown_data.loc[0, 'Latitude']
city_longitude = youngstown_data.loc[0, 'Longitude']

city_name = youngstown_data.loc[0, 'City']

print('Latitude and longitude values of {} are {}, {}.'.format(city_name, 
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Berlin Center are 41.034545, -80.928626.


In [35]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT)
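
Building the query string by hand with `.format()` works, but `urllib.parse.urlencode` handles the escaping. Here is a minimal sketch with a hypothetical helper `build_explore_url` and placeholder credentials. `safe=','` keeps the comma in the `ll` parameter unescaped, matching the hand-built URL. Passing the same dict as `requests.get(base_url, params=params)` would be an equivalent option:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble a Foursquare v2 /venues/explore URL (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' leaves the comma in "lat,lng" as-is instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        41.034545, -80.928626)
print(url)
```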

In [37]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60a2763dc7b9444485bd89f3'},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 0,
  'suggestedBounds': {'ne': {'lat': 41.039045004500004,
    'lng': -80.92267144641144},
   'sw': {'lat': 41.0300449955, 'lng': -80.93458055358855}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': []}]}}

In [38]:
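def get_category_type(row):
    # Raw venue rows use 'categories'; rows flattened by json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']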

Flattening the JSON response and structuring it into a pandas DataFrame.

In [40]:
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

In [42]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



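KeyError: "None of [Index(['venue.name', 'venue.categories', 'venue.location.lat',\n       'venue.location.lng'],\n      dtype='object')] are in the [columns]"

The `KeyError` is expected here: Foursquare returned no venues for Berlin Center (`'totalResults': 0` above), so `json_normalize` produced a DataFrame with no `venue.*` columns to select.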

In [43]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

0 venues were returned by Foursquare.
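
To keep an empty response from raising, the flattening step can check for an empty item list first. Below is a minimal sketch using a hypothetical helper `venues_to_frame`, which returns an empty DataFrame with the expected columns when nothing came back. It uses `pd.json_normalize` (pandas 1.0+) and an inline lambda in place of `get_category_type`. The sample item reuses the Dairy Queen venue that Foursquare returned for Canfield:

```python
import pandas as pd

def venues_to_frame(items):
    """Flatten Foursquare 'explore' items into name/categories/lat/lng columns.

    Returns an empty frame with the expected columns when no venues came back.
    """
    columns = ['name', 'categories', 'lat', 'lng']
    if not items:
        return pd.DataFrame(columns=columns)

    flat = pd.json_normalize(items)
    flat = flat[['venue.name', 'venue.categories',
                 'venue.location.lat', 'venue.location.lng']].copy()
    # Keep only the first category name, or None if the category list is empty
    flat['venue.categories'] = flat['venue.categories'].apply(
        lambda cats: cats[0]['name'] if cats else None)
    flat.columns = columns
    return flat

sample_items = [{'venue': {'name': 'Dairy Queen',
                           'categories': [{'name': 'Ice Cream Shop'}],
                           'location': {'lat': 41.022921, 'lng': -80.760137}}}]

print(venues_to_frame([]))           # the Berlin Center case: no venues
print(venues_to_frame(sample_items))
```

In the notebook it would be called as `venues_to_frame(results['response']['groups'][0]['items'])`.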


In [44]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [52]:
m_name = youngstown_data['City']
m_lat = youngstown_data['Latitude']
m_lng = youngstown_data['Longitude']
youngstown_venues = getNearbyVenues(m_name,m_lat,m_lng)

Berlin Center
Campbell
Canfield
Lake Milton
Lowellville
New Middletown
New Springfield
North Benton
North Jackson
North Lima
Petersburg
Struthers
Youngstown
Youngstown
Youngstown
Youngstown
Youngstown
Youngstown
Youngstown
Youngstown
Youngstown
Boardman
Poland
Austintown
Beloit
Sebring


## Analysis <a name="analysis"></a>

Time to analyze the data from Foursquare's API. Checking the shape shows 168 venues returned by Foursquare within the 500-meter search radius around the Mahoning County ZIP code coordinates.

In [53]:
print(youngstown_venues.shape)
youngstown_venues.head()

(168, 7)


,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Campbell,41.076142,-80.593256,Clarencedale Cake,41.075808,-80.595431,Bakery
1,Campbell,41.076142,-80.593256,Waffle wagon,41.077684,-80.595213,Food Truck
2,Campbell,41.076142,-80.593256,Nightly Drinks,41.07917,-80.591819,Bagel Shop
3,Canfield,41.025059,-80.760912,Chase Bank,41.025913,-80.76162,Bank
4,Canfield,41.025059,-80.760912,Dairy Queen,41.022921,-80.760137,Ice Cream Shop


In [54]:
youngstown_venues

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Campbell,41.076142,-80.593256,Clarencedale Cake,41.075808,-80.595431,Bakery
1,Campbell,41.076142,-80.593256,Waffle wagon,41.077684,-80.595213,Food Truck
2,Campbell,41.076142,-80.593256,Nightly Drinks,41.079170,-80.591819,Bagel Shop
3,Canfield,41.025059,-80.760912,Chase Bank,41.025913,-80.761620,Bank
4,Canfield,41.025059,-80.760912,Dairy Queen,41.022921,-80.760137,Ice Cream Shop
...,...,...,...,...,...,...,...
163,Beloit,40.917414,-80.990071,Whistle Stop Pizza,40.920845,-80.993299,Pizza Place
164,Sebring,40.920003,-81.025268,American Legion Post 76,40.920515,-81.024890,Bar
165,Sebring,40.920003,-81.025268,Sebring Mansion,40.923014,-81.024544,Bed & Breakfast
166,Sebring,40.920003,-81.025268,Sebring Southside Park,40.917265,-81.021831,Park


Not every venue is relevant to a potential pizza place. As stated previously, we are looking specifically for **Pizza Places**, as well as **Italian Restaurants, Sandwich Places, and Hot Dog Joints**. Pizza places compete directly with the client, while the other three categories are minor competitors. Time to search the Youngstown venues table for each of these categories...

In [73]:
pizza_competition = youngstown_venues[youngstown_venues['Venue Category'].str.contains('Pizza Place')]
pizza_competition

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
5,Canfield,41.025059,-80.760912,The Original Oven Fresh Pizza,41.02543,-80.762143,Pizza Place
6,Canfield,41.025059,-80.760912,Cocca's Pizza,41.027058,-80.76156,Pizza Place
119,Boardman,41.024597,-80.65753,Little Caesars Pizza,41.024885,-80.659022,Pizza Place
142,Boardman,41.024597,-80.65753,Sbarro,41.020965,-80.660212,Pizza Place
156,Austintown,41.102896,-80.761275,Pizza Hut,41.099969,-80.758953,Pizza Place
163,Beloit,40.917414,-80.990071,Whistle Stop Pizza,40.920845,-80.993299,Pizza Place


In [77]:
italian_competition = youngstown_venues[youngstown_venues['Venue Category'].str.contains('Italian Restaurant')]
italian_competition

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
14,Lowellville,41.035942,-80.536226,Carchedis Restaurant,41.036739,-80.536285,Italian Restaurant
25,North Jackson,41.100058,-80.857307,Dino's Italian Restaurant,41.099817,-80.856055,Italian Restaurant
39,Struthers,41.054357,-80.591771,Dona Vito's,41.056493,-80.588241,Italian Restaurant
52,Youngstown,41.10024,-80.646822,Cassese's MVR,41.103234,-80.644002,Italian Restaurant
58,Youngstown,41.10024,-80.646822,Roberto's Italian Ristorante,41.100621,-80.650739,Italian Restaurant
152,Austintown,41.102896,-80.761275,Marinos Italian Restaurant,41.099172,-80.760367,Italian Restaurant


In [79]:
sandwich_competition = youngstown_venues[youngstown_venues['Venue Category'].str.contains('Sandwich Place')]
sandwich_competition

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
18,New Middletown,40.961171,-80.557571,Subway,40.963325,-80.559392,Sandwich Place
26,North Jackson,41.100058,-80.857307,SUBWAY,41.100634,-80.857522,Sandwich Place
63,Youngstown,41.10024,-80.646822,SUBWAY,41.100837,-80.649687,Sandwich Place
114,Boardman,41.024597,-80.65753,SUBWAY,41.022506,-80.658608,Sandwich Place
148,Austintown,41.102896,-80.761275,Sandwich Factory,41.100602,-80.76352,Sandwich Place
150,Austintown,41.102896,-80.761275,Jimmy John's,41.099952,-80.761715,Sandwich Place


In [80]:
hotdog_competition = youngstown_venues[youngstown_venues['Venue Category'].str.contains('Hot Dog Joint')]
hotdog_competition

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
50,Youngstown,41.10024,-80.646822,Susie's Double D's,41.101273,-80.650128,Hot Dog Joint


**Concatenating the tables above** with `pd.concat` puts all of the competing venues in one place.

In [84]:
competition_venues = pd.concat([pizza_competition,italian_competition,sandwich_competition,hotdog_competition])
competition_venues

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
5,Canfield,41.025059,-80.760912,The Original Oven Fresh Pizza,41.02543,-80.762143,Pizza Place
6,Canfield,41.025059,-80.760912,Cocca's Pizza,41.027058,-80.76156,Pizza Place
119,Boardman,41.024597,-80.65753,Little Caesars Pizza,41.024885,-80.659022,Pizza Place
142,Boardman,41.024597,-80.65753,Sbarro,41.020965,-80.660212,Pizza Place
156,Austintown,41.102896,-80.761275,Pizza Hut,41.099969,-80.758953,Pizza Place
163,Beloit,40.917414,-80.990071,Whistle Stop Pizza,40.920845,-80.993299,Pizza Place
14,Lowellville,41.035942,-80.536226,Carchedis Restaurant,41.036739,-80.536285,Italian Restaurant
25,North Jackson,41.100058,-80.857307,Dino's Italian Restaurant,41.099817,-80.856055,Italian Restaurant
39,Struthers,41.054357,-80.591771,Dona Vito's,41.056493,-80.588241,Italian Restaurant
52,Youngstown,41.10024,-80.646822,Cassese's MVR,41.103234,-80.644002,Italian Restaurant
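
The four `str.contains` filters plus `pd.concat` can be collapsed into one boolean mask with `Series.isin`, which matches category names exactly rather than as substrings. Here is a small sketch on a handful of rows copied from the venue tables above:

```python
import pandas as pd

venues = pd.DataFrame({
    'City': ['Canfield', 'Canfield', 'Lowellville', 'Youngstown', 'Beloit'],
    'Venue': ["Cocca's Pizza", 'Chase Bank', 'Carchedis Restaurant',
              "Susie's Double D's", 'Whistle Stop Pizza'],
    'Venue Category': ['Pizza Place', 'Bank', 'Italian Restaurant',
                       'Hot Dog Joint', 'Pizza Place'],
})

competitor_categories = ['Pizza Place', 'Italian Restaurant',
                         'Sandwich Place', 'Hot Dog Joint']

# One exact-match filter replaces the four separate filters and the concat
competition = venues[venues['Venue Category'].isin(competitor_categories)]
print(competition)
```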


**Sorting venues by city** will make the final choice of where to open a pizza place easier, since cities with heavy competition can be ruled out quickly.

In [85]:
competition_venues.sort_values('City')

,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
156,Austintown,41.102896,-80.761275,Pizza Hut,41.099969,-80.758953,Pizza Place
150,Austintown,41.102896,-80.761275,Jimmy John's,41.099952,-80.761715,Sandwich Place
148,Austintown,41.102896,-80.761275,Sandwich Factory,41.100602,-80.76352,Sandwich Place
152,Austintown,41.102896,-80.761275,Marinos Italian Restaurant,41.099172,-80.760367,Italian Restaurant
163,Beloit,40.917414,-80.990071,Whistle Stop Pizza,40.920845,-80.993299,Pizza Place
119,Boardman,41.024597,-80.65753,Little Caesars Pizza,41.024885,-80.659022,Pizza Place
142,Boardman,41.024597,-80.65753,Sbarro,41.020965,-80.660212,Pizza Place
114,Boardman,41.024597,-80.65753,SUBWAY,41.022506,-80.658608,Sandwich Place
5,Canfield,41.025059,-80.760912,The Original Oven Fresh Pizza,41.02543,-80.762143,Pizza Place
6,Canfield,41.025059,-80.760912,Cocca's Pizza,41.027058,-80.76156,Pizza Place
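
To compare competition across cities at a glance, a cross-tabulation counts competing venues per city and category. The sketch below applies `pd.crosstab` to the Austintown, Beloit, and Boardman rows from the sorted table above. It covers only those three cities, not the full competition table:

```python
import pandas as pd

competition = pd.DataFrame({
    'City': ['Austintown', 'Austintown', 'Austintown', 'Austintown',
             'Beloit', 'Boardman', 'Boardman', 'Boardman'],
    'Venue Category': ['Pizza Place', 'Sandwich Place', 'Sandwich Place',
                       'Italian Restaurant', 'Pizza Place', 'Pizza Place',
                       'Pizza Place', 'Sandwich Place'],
})

# Rows = cities, columns = competitor categories, values = number of venues
counts = pd.crosstab(competition['City'], competition['Venue Category'])
counts['Total'] = counts.sum(axis=1)
print(counts.sort_values('Total', ascending=False))
```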


**Sorting by population** will be another step in determining if a location would be a good spot to open a new pizza place.

In [115]:
youngstown_data = youngstown_data.replace(',','', regex=True)
youngstown_data

,PostalCode,City,County,Population,Latitude,Longitude
0,44401,Berlin Center,Mahoning County,2892,41.034545,-80.928626
1,44405,Campbell,Mahoning County,8229,41.076142,-80.593256
2,44406,Canfield,Mahoning County,22054,41.025059,-80.760912
3,44429,Lake Milton,Mahoning County,2576,41.099502,-80.970368
4,44436,Lowellville,Mahoning County,3836,41.035942,-80.536226
5,44442,New Middletown,Mahoning County,4006,40.961171,-80.557571
6,44443,New Springfield,Mahoning County,1704,40.917283,-80.606183
7,44449,North Benton,Mahoning County,1323,40.985058,-81.012593
8,44451,North Jackson,Mahoning County,3032,41.100058,-80.857307
9,44452,North Lima,Mahoning County,3167,40.948393,-80.658963


In [116]:
youngstown_data.sort_values(by='Population', ascending=False)

,PostalCode,City,County,Population,Latitude,Longitude
1,44405,Campbell,Mahoning County,8229,41.076142,-80.593256
17,44507,Youngstown,Mahoning County,5863,41.074229,-80.655251
14,44504,Youngstown,Mahoning County,5262,41.124017,-80.655218
25,44672,Sebring,Mahoning County,4874,40.920003,-81.025268
5,44442,New Middletown,Mahoning County,4006,40.961171,-80.557571
24,44609,Beloit,Mahoning County,3987,40.917414,-80.990071
4,44436,Lowellville,Mahoning County,3836,41.035942,-80.536226
21,44512,Boardman,Mahoning County,34424,41.024597,-80.65753
9,44452,North Lima,Mahoning County,3167,40.948393,-80.658963
8,44451,North Jackson,Mahoning County,3032,41.100058,-80.857307
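
The sorted output above is in text order, not numeric order. Boardman's 34424 sits below Lowellville's 3836 because `Population` is still stored as strings after the comma replacement, and strings compare character by character. Here is a minimal sketch of converting the column with `pd.to_numeric` before sorting, using four cities from the table. The comma in Canfield's value is hypothetical, added only to show the separator being stripped:

```python
import pandas as pd

towns = pd.DataFrame({
    'City': ['Campbell', 'Lowellville', 'Boardman', 'Canfield'],
    'Population': ['8229', '3836', '34424', '22,054'],
})

# As text, '3836' > '34424' because '8' > '4' at the second character
print(towns.sort_values('Population', ascending=False)['City'].tolist())
# ['Campbell', 'Lowellville', 'Boardman', 'Canfield']

# Strip thousands separators, convert to integers, then sort numerically
towns['Population'] = pd.to_numeric(towns['Population'].str.replace(',', '', regex=False))
ranked = towns.sort_values('Population', ascending=False)
print(ranked['City'].tolist())
# ['Boardman', 'Canfield', 'Campbell', 'Lowellville']
```

In the notebook, the equivalent step would be `youngstown_data['Population'] = pd.to_numeric(youngstown_data['Population'])` after the comma replacement.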


In [55]:
youngstown_venues.groupby('City').count()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Austintown,14,14,14,14,14,14
Beloit,2,2,2,2,2,2
Boardman,55,55,55,55,55,55
Campbell,3,3,3,3,3,3
Canfield,5,5,5,5,5,5
Lake Milton,4,4,4,4,4,4
Lowellville,6,6,6,6,6,6
New Middletown,4,4,4,4,4,4
New Springfield,2,2,2,2,2,2
North Jackson,5,5,5,5,5,5


In [56]:
print('There are {} unique categories.'.format(len(youngstown_venues['Venue Category'].unique())))

There are 83 unique categories.


In [57]:
youngstown_onehot = pd.get_dummies(youngstown_venues[['Venue Category']], prefix="", prefix_sep="")

youngstown_onehot['City'] = youngstown_venues['City'] 

fixed_columns = [youngstown_onehot.columns[-1]] + list(youngstown_onehot.columns[:-1])
youngstown_onehot = youngstown_onehot[fixed_columns]

youngstown_onehot.head()

,City,Accessories Store,American Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Sporting Goods Shop,Supermarket,Supplement Shop,Tex-Mex Restaurant,Thrift / Vintage Store,Toy / Game Store,Video Store,Wine Bar,Wings Joint,Women's Store
0,Campbell,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Campbell,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Campbell,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Canfield,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,Canfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [58]:
youngstown_onehot.shape

(168, 84)

In [90]:
youngstown_grouped = youngstown_onehot.groupby('City').mean().reset_index()
youngstown_grouped

,City,Accessories Store,American Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Sporting Goods Shop,Supermarket,Supplement Shop,Tex-Mex Restaurant,Thrift / Vintage Store,Toy / Game Store,Video Store,Wine Bar,Wings Joint,Women's Store
0,Austintown,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.071429,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0
1,Beloit,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Boardman,0.018182,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,...,0.036364,0.0,0.018182,0.018182,0.0,0.018182,0.018182,0.0,0.018182,0.018182
3,Campbell,0.0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Canfield,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Lake Milton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Lowellville,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,New Middletown,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,New Springfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,North Jackson,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
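
Taking the mean of the one-hot columns per city gives the share of each city's venues that falls in each category, which is what the frequency table above reports. Here is a small worked example on the first five rows of `youngstown_venues` (three Campbell venues and two of Canfield's):

```python
import pandas as pd

venues = pd.DataFrame({
    'City': ['Campbell', 'Campbell', 'Campbell', 'Canfield', 'Canfield'],
    'Venue Category': ['Bakery', 'Food Truck', 'Bagel Shop', 'Bank', 'Ice Cream Shop'],
})

# One 0/1 column per category, with the city name in front
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'City', venues['City'])

# Mean of 0/1 indicators = fraction of that city's venues in each category
freq = onehot.groupby('City').mean()
print(freq.round(2))
```

Each Campbell category comes out at 0.33, matching the Campbell row above.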


In [91]:
num_top_venues = 5

for hood in youngstown_grouped['City']:
    print("----"+hood+"----")
    temp = youngstown_grouped[youngstown_grouped['City'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Austintown----
                  venue  freq
0        Sandwich Place  0.14
1           Video Store  0.14
2  Fast Food Restaurant  0.07
3   American Restaurant  0.07
4    Mexican Restaurant  0.07


----Beloit----
         venue  freq
0         Park   0.5
1  Pizza Place   0.5
2   Kids Store   0.0
3    Nightclub   0.0
4        Motel   0.0


----Boardman----
                 venue  freq
0       Clothing Store  0.20
1     Department Store  0.05
2        Jewelry Store  0.05
3           Shoe Store  0.05
4  Sporting Goods Shop  0.04


----Campbell----
               venue  freq
0         Bagel Shop  0.33
1             Bakery  0.33
2         Food Truck  0.33
3  Accessories Store  0.00
4         Kids Store  0.00


----Canfield----
                 venue  freq
0          Pizza Place   0.4
1  American Restaurant   0.2
2                 Bank   0.2
3       Ice Cream Shop   0.2
4           Kids Store   0.0


----Lake Milton----
              venue  freq
0              Lake  0.25
1              Pa

In [92]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [93]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['City']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['City'] = youngstown_grouped['City']

for ind in np.arange(youngstown_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(youngstown_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted.head()

,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Austintown,Video Store,Sandwich Place,Mexican Restaurant,Italian Restaurant,Pharmacy,Coffee Shop,Ice Cream Shop,Supermarket,Fast Food Restaurant,Gas Station
1,Beloit,Park,Pizza Place,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
2,Boardman,Clothing Store,Shoe Store,Jewelry Store,Department Store,Lingerie Store,Kids Store,Japanese Restaurant,Pharmacy,Sporting Goods Shop,Pizza Place
3,Campbell,Food Truck,Bagel Shop,Bakery,Women's Store,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
4,Canfield,Pizza Place,Ice Cream Shop,American Restaurant,Bank,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner


In [94]:
city_venues_sorted

,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Austintown,Video Store,Sandwich Place,Mexican Restaurant,Italian Restaurant,Pharmacy,Coffee Shop,Ice Cream Shop,Supermarket,Fast Food Restaurant,Gas Station
1,Beloit,Park,Pizza Place,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
2,Boardman,Clothing Store,Shoe Store,Jewelry Store,Department Store,Lingerie Store,Kids Store,Japanese Restaurant,Pharmacy,Sporting Goods Shop,Pizza Place
3,Campbell,Food Truck,Bagel Shop,Bakery,Women's Store,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
4,Canfield,Pizza Place,Ice Cream Shop,American Restaurant,Bank,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
5,Lake Milton,Park,Campground,Business Service,Lake,Coffee Shop,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
6,Lowellville,Bar,Italian Restaurant,American Restaurant,Deli / Bodega,Lake,Women's Store,Cosmetics Shop,Credit Union,Department Store,Diner
7,New Middletown,Grocery Store,American Restaurant,Diner,Sandwich Place,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
8,New Springfield,Garden Center,Discount Store,Women's Store,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
9,North Jackson,Furniture / Home Store,Home Service,Sandwich Place,Ice Cream Shop,Italian Restaurant,Fast Food Restaurant,Food Truck,Convenience Store,Cosmetics Shop,Credit Union
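
The `In [139]` cell that follows reads `kmeans.labels_`, but the cell that fits `kmeans` doesn't appear in this section. Here is a minimal sketch of that step using scikit-learn's `KMeans` (imported earlier), assuming the model was fit on the per-city category frequencies in `youngstown_grouped` minus the `City` column. The toy frequencies and `kclusters = 2` are illustrative assumptions. The labels in the merged table run up to 3, so the real notebook used at least four clusters:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical category frequencies standing in for youngstown_grouped
grouped = pd.DataFrame({
    'City': ['Beloit', 'Campbell', 'Canfield', 'Lake Milton'],
    'Park': [0.5, 0.0, 0.0, 0.25],
    'Pizza Place': [0.5, 0.0, 0.4, 0.0],
    'Bakery': [0.0, 0.333333, 0.0, 0.0],
})

kclusters = 2  # assumption: the notebook's actual cluster count is not shown
features = grouped.drop(columns='City')

kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(features)

# labels_ follows the row order of grouped, i.e. one label per city
grouped_labels = pd.Series(kmeans.labels_, index=grouped['City'], name='Cluster Labels')
print(grouped_labels)
```

Because `labels_` follows the row order of the grouped frame, it lines up with `city_venues_sorted`, whose `City` column was copied from `youngstown_grouped`.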


In [139]:
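# Drop labels left over from a previous run so the cell can be re-executed
# (re-running the bare insert raised "ValueError: cannot insert Cluster Labels, already exists")
if 'Cluster Labels' in city_venues_sorted.columns:
    city_venues_sorted = city_venues_sorted.drop(columns='Cluster Labels')

# kmeans is the fitted KMeans model from the clustering step (that cell is not shown here)
city_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

youngstown_merged = youngstown_data
youngstown_merged = youngstown_merged.join(city_venues_sorted.set_index('City'), on='City')

youngstown_merged.head()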

In [144]:
youngstown_merged.head()

,PostalCode,City,County,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,44401,Berlin Center,Mahoning County,2892,41.034545,-80.928626,,,,,,,,,,,
1,44405,Campbell,Mahoning County,8229,41.076142,-80.593256,3.0,Food Truck,Bagel Shop,Bakery,Women's Store,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
2,44406,Canfield,Mahoning County,22054,41.025059,-80.760912,1.0,Pizza Place,Ice Cream Shop,American Restaurant,Bank,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
3,44429,Lake Milton,Mahoning County,2576,41.099502,-80.970368,1.0,Park,Campground,Business Service,Lake,Coffee Shop,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
4,44436,Lowellville,Mahoning County,3836,41.035942,-80.536226,1.0,Bar,Italian Restaurant,American Restaurant,Deli / Bodega,Lake,Women's Store,Cosmetics Shop,Credit Union,Department Store,Diner


In [124]:
youngstown_merged

,PostalCode,City,County,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,44401,Berlin Center,Mahoning County,2892,41.034545,-80.928626,,,,,,,,,,,
1,44405,Campbell,Mahoning County,8229,41.076142,-80.593256,3.0,Food Truck,Bagel Shop,Bakery,Women's Store,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
2,44406,Canfield,Mahoning County,22054,41.025059,-80.760912,1.0,Pizza Place,Ice Cream Shop,American Restaurant,Bank,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
3,44429,Lake Milton,Mahoning County,2576,41.099502,-80.970368,1.0,Park,Campground,Business Service,Lake,Coffee Shop,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
4,44436,Lowellville,Mahoning County,3836,41.035942,-80.536226,1.0,Bar,Italian Restaurant,American Restaurant,Deli / Bodega,Lake,Women's Store,Cosmetics Shop,Credit Union,Department Store,Diner
5,44442,New Middletown,Mahoning County,4006,40.961171,-80.557571,1.0,Grocery Store,American Restaurant,Diner,Sandwich Place,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
6,44443,New Springfield,Mahoning County,1704,40.917283,-80.606183,2.0,Garden Center,Discount Store,Women's Store,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
7,44449,North Benton,Mahoning County,1323,40.985058,-81.012593,,,,,,,,,,,
8,44451,North Jackson,Mahoning County,3032,41.100058,-80.857307,1.0,Furniture / Home Store,Home Service,Sandwich Place,Ice Cream Shop,Italian Restaurant,Fast Food Restaurant,Food Truck,Convenience Store,Cosmetics Shop,Credit Union
9,44452,North Lima,Mahoning County,3167,40.948393,-80.658963,1.0,Donut Shop,American Restaurant,Hardware Store,Fast Food Restaurant,Discount Store,Women's Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store


With each zip code assigned a cluster label, let's examine the clusters one at a time. Each cluster defines a zone of candidate locations, and these zones are the final result of our analysis.
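
The clustering step itself is not shown in this part of the notebook. As an illustration only, here is plain Lloyd's k-means in NumPy, assuming each zip code is described by a numeric feature vector (for example, venue-category frequencies) and that k = 5 to match the five clusters below. In a notebook like this, scikit-learn's `KMeans(n_clusters=5)` would be the more usual choice; the hand-rolled version just makes the assign-then-update loop visible.

```python
import numpy as np

def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centers). A sketch, not the notebook's code."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members; keep it in place if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so the labels always match the returned centers.
    return _assign(X, centers), centers
```

Usage, with a hypothetical feature matrix `features` (one row per zip code): `labels, centers = kmeans(features, k=5)`. Because k-means depends on its starting centers, fixing `seed` keeps the cluster numbering reproducible between runs.
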

### Cluster 1

In [134]:
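# Keep column 1 (City) plus column 5 onward (Longitude, Cluster Labels, and the ten ranked venues).
youngstown_merged.loc[youngstown_merged['Cluster Labels'] == 0, youngstown_merged.columns[[1] + list(range(5, youngstown_merged.shape[1]))]]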

Unnamed: 0,City,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Struthers,-80.591771,0.0,Bookstore,Italian Restaurant,American Restaurant,Bowling Alley,Credit Union,Deli / Bodega,Department Store,Diner,Discount Store,Dive Bar


### Cluster 2

In [128]:
youngstown_merged.loc[youngstown_merged['Cluster Labels'] == 1, youngstown_merged.columns[[1] + list(range(5, youngstown_merged.shape[1]))]]

Unnamed: 0,City,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Canfield,-80.760912,1.0,Pizza Place,Ice Cream Shop,American Restaurant,Bank,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
3,Lake Milton,-80.970368,1.0,Park,Campground,Business Service,Lake,Coffee Shop,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
4,Lowellville,-80.536226,1.0,Bar,Italian Restaurant,American Restaurant,Deli / Bodega,Lake,Women's Store,Cosmetics Shop,Credit Union,Department Store,Diner
5,New Middletown,-80.557571,1.0,Grocery Store,American Restaurant,Diner,Sandwich Place,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
8,North Jackson,-80.857307,1.0,Furniture / Home Store,Home Service,Sandwich Place,Ice Cream Shop,Italian Restaurant,Fast Food Restaurant,Food Truck,Convenience Store,Cosmetics Shop,Credit Union
9,North Lima,-80.658963,1.0,Donut Shop,American Restaurant,Hardware Store,Fast Food Restaurant,Discount Store,Women's Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store
10,Petersburg,-80.529573,1.0,Grocery Store,BBQ Joint,Business Service,Farm,Convenience Store,Food Truck,Food Court,Food,Fast Food Restaurant,Furniture / Home Store
12,Youngstown,-80.642058,1.0,Bar,Bank,Automotive Shop,Café,Cocktail Bar,Italian Restaurant,Burger Joint,Gourmet Shop,Gym / Fitness Center,Construction & Landscaping
13,Youngstown,-80.646822,1.0,Bar,Bank,Automotive Shop,Café,Cocktail Bar,Italian Restaurant,Burger Joint,Gourmet Shop,Gym / Fitness Center,Construction & Landscaping
14,Youngstown,-80.655218,1.0,Bar,Bank,Automotive Shop,Café,Cocktail Bar,Italian Restaurant,Burger Joint,Gourmet Shop,Gym / Fitness Center,Construction & Landscaping


### Cluster 3

In [129]:
youngstown_merged.loc[youngstown_merged['Cluster Labels'] == 2, youngstown_merged.columns[[1] + list(range(5, youngstown_merged.shape[1]))]]

Unnamed: 0,City,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,New Springfield,-80.606183,2.0,Garden Center,Discount Store,Women's Store,Dive Bar,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner


### Cluster 4

In [130]:
youngstown_merged.loc[youngstown_merged['Cluster Labels'] == 3, youngstown_merged.columns[[1] + list(range(5, youngstown_merged.shape[1]))]]

Unnamed: 0,City,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Campbell,-80.593256,3.0,Food Truck,Bagel Shop,Bakery,Women's Store,Donut Shop,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner


### Cluster 5

In [131]:
youngstown_merged.loc[youngstown_merged['Cluster Labels'] == 4, youngstown_merged.columns[[1] + list(range(5, youngstown_merged.shape[1]))]]

Unnamed: 0,City,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Beloit,-80.990071,4.0,Park,Pizza Place,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Deli / Bodega,Department Store,Diner
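
The five cells above differ only in the cluster label they filter on. A small helper, sketched below, could produce every table in one pass; it reuses the same positional column selection, so it assumes the merged frame keeps the column order shown earlier (City at position 1, Longitude at position 5). The helper names are hypothetical.

```python
import pandas as pd

def cluster_table(merged, label):
    """City, Longitude, Cluster Labels and the ranked venue columns for one cluster label."""
    cols = merged.columns[[1] + list(range(5, merged.shape[1]))]
    return merged.loc[merged['Cluster Labels'] == label, cols]

def cluster_tables(merged):
    """Map the 1-based cluster number used in the headings to that cluster's rows.

    Rows without a cluster label (NaN) are skipped.
    """
    labels = sorted(merged['Cluster Labels'].dropna().unique())
    return {int(label) + 1: cluster_table(merged, label) for label in labels}

# Hypothetical usage in the notebook:
# for number, table in cluster_tables(youngstown_merged).items():
#     print(f'Cluster {number}')
#     display(table)
```
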


## Results and Discussion <a name="results"></a>

From the data above, a new pizza place would face the most competition in Austintown, Boardman, Canfield, and Youngstown, all of which fall within Cluster 2. This makes sense: these are high-population areas and therefore host more venues. Poland, however, also has a large population but shows little competition among the venue types analyzed.

Some clusters feature very little competition, but they also have low populations and are far from the rest of the zip codes, which makes them poor areas overall for starting a new business.

As previously stated, Youngstown is a recovering city with a growing economy. It offers a low cost of living as well as low startup costs for new businesses. Its large Italian population also means there is no shortage of Italian restaurants and pizza places.

When venue data (from the Foursquare API) and population data are analyzed together, the highest competition appears in the most densely populated parts of the Greater Youngstown area. Although the Foursquare queries were based on geospatial data (and therefore missed some known venues), the results suggest that, despite their large populations, Austintown, Boardman, Canfield, and Youngstown would be unlikely places for a new pizza business to succeed.

By contrast, Poland is also population-rich, with a total population exceeding Canfield's, and the data obtained show no venues that would compete with a new pizza business. Poland also falls within Cluster 2, the densest area analyzed.
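
The competition argument above reads directly off the ranked venue columns. One rough way to make it explicit is to count, for each zip code, how many of its ranked venue categories are direct competitors, then prefer zip codes with few competitors and large populations. Which categories count as competitors is an assumption here (Pizza Place and Italian Restaurant, the two types named in the discussion), and so is the ranking rule. Because a top-10 list records only which categories appear, not how many venues each has, this is a coarse proxy rather than a venue count.

```python
import pandas as pd

# Assumption: the categories treated as direct competition for a new pizza place.
COMPETITOR_CATEGORIES = ['Pizza Place', 'Italian Restaurant']

def venue_columns(merged):
    """The ranked venue columns ('1st Most Common Venue', '2nd ...', and so on)."""
    return [c for c in merged.columns if c.endswith('Most Common Venue')]

def competitor_hits(merged, categories=COMPETITOR_CATEGORIES):
    """Per zip code, how many ranked venue columns name a competitor category."""
    return merged[venue_columns(merged)].isin(categories).sum(axis=1)

def rank_candidates(merged):
    """Hypothetical ranking: fewest competitor hits first, then largest population.

    Zip codes with no venue data at all are dropped; otherwise they would score
    zero hits simply because nothing was returned for them.
    """
    has_venues = merged.dropna(subset=venue_columns(merged), how='all')
    ranked = has_venues.assign(competitor_hits=competitor_hits(has_venues))
    return ranked.sort_values(['competitor_hits', 'Population'], ascending=[True, False])
```

Applied to the notebook's frame, `rank_candidates(youngstown_merged)[['City', 'Population', 'competitor_hits']]` would list the least contested, most populous zip codes first.
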


## Conclusion <a name="conclusion"></a>

### Recommendation

The purpose of this project was to determine where an entrepreneur could open a new pizza place with the highest chance of success. Taking competing venues and population into account, we performed clustering to generate areas of interest. While Cluster 2 features the most competition, it also has by far the highest population. Poland, an area close to the central hubs of the Greater Youngstown Area, combines that population base with little direct competition and appears to be a safe bet.
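
The recommendation leans on Cluster 2 having by far the largest population. That claim can be checked by summing `Population` per cluster label. The sketch below assumes the merged frame carries the `Population` and `Cluster Labels` columns shown earlier, and drops zip codes that have no label; since each row is one zip code, a city spanning several zip codes (such as Youngstown) is counted once per zip code.

```python
import pandas as pd

def population_by_cluster(merged):
    """Total population and number of zip codes per cluster label, largest total first."""
    clustered = merged.dropna(subset=['Cluster Labels'])
    return (clustered.groupby('Cluster Labels')['Population']
            .agg(['sum', 'count'])
            .sort_values('sum', ascending=False))
```

Usage: `population_by_cluster(youngstown_merged)`. Adding 1 to each index value gives the cluster numbers used in the headings above.
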


### Future Considerations

As this project provides only a basic analysis of location data, there is plenty of room for future investigation. For example, the Greater Youngstown Area (Mahoning County) is only one part of the Youngstown-Warren-Boardman, OH-PA Metropolitan Statistical Area (the “Mahoning Valley”). Further analysis could cover the other counties in this metro area: Trumbull County in Ohio and Mercer County in Pennsylvania.
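
Extending the analysis to the rest of the metro area would start by repeating the zip-code scrape for each county. The page scraped at the start of the notebook takes a county FIPS code in its query string (39099 is Mahoning County). The sketch below assumes the same page also serves Trumbull County, OH (FIPS 39155) and Mercer County, PA (FIPS 42085) in the same table layout; that is an assumption. To stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser` while keeping the notebook's rule of accepting only rows with exactly three cells. Unlike html5lib, it expects well-formed markup with explicit closing tags, and it scans every table on the page rather than only the first.

```python
from html.parser import HTMLParser

# Same zip-list page as the Mahoning County scrape; only the FIPS code changes.
# Assumption: Trumbull (39155) and Mercer, PA (42085) pages use the same layout.
BASE_URL = 'http://www.ciclt.net/sn/clt/capitolimpact/gw_ziplist.aspx?FIPS={}'
MSA_COUNTIES = {'Mahoning County, OH': '39099',
                'Trumbull County, OH': '39155',
                'Mercer County, PA': '42085'}

class ZipRowParser(HTMLParser):
    """Collect the stripped text of every <tr> that has exactly three <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if len(self._row) == 3:  # PostalCode, City, County
                self.rows.append(self._row)
            self._row = None

def parse_zip_rows(html):
    """Return [[PostalCode, City, County], ...] from one zip-list page."""
    parser = ZipRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows

def county_urls():
    """One zip-list URL per county in the metro area."""
    return {name: BASE_URL.format(fips) for name, fips in MSA_COUNTIES.items()}
```

Fetching can reuse the notebook's existing call, e.g. `parse_zip_rows(requests.get(url).text)` for each URL in `county_urls()`, and the resulting rows can be concatenated into one DataFrame with the same `PostalCode`, `City`, `County` columns.
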
	
Additionally, a deeper look into population demographics, such as age and income, could give an entrepreneur even more useful information for the decision-making process.