## **Opening a New Business in San Antonio, TX Informed by Foursquare Data**

### **D. Risius**  
### *1/5/2020*

### **Introduction:**
San Antonio, Texas is one of the fastest growing cities in the United States.  According to the [United States Census Bureau](https://www.census.gov/newsroom/press-releases/2018/estimates-cities.html), San Antonio topped the list of cities with the largest population gains for 2017.  In a previous analysis, we clustered and segmented neighborhoods in [Toronto](https://github.com/risiud/Coursera_Capstone/blob/master/Clustering%20Toronto%20Neighborhoods.ipynb) and New York City based on Foursquare venue data.  San Antonio is a very different city from either New York or Toronto.  For one, it covers a large area and is sparsely populated compared to the other two.  According to [Wikipedia](https://en.wikipedia.org/wiki/San_Antonio), the city of San Antonio has around 1.5 million people within a land area of 461 square miles, compared to 8.5 million within 303 square miles in [New York City](https://en.wikipedia.org/wiki/New_York_City) and 2.7 million within 243 square miles in [Toronto](https://en.wikipedia.org/wiki/Toronto).  The ethnic makeup of the three cities also differs.  San Antonio has a large Hispanic influence, with around 63% of residents of Hispanic or Latino origin.  New York is around 28% Hispanic, while Toronto is around 4% Hispanic and has a much larger proportion of residents of Asian (40%) and European (48%) origin than San Antonio or New York.  These differences may help us determine which types of venues are more likely to succeed in different neighborhoods.
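
To make the density comparison concrete, the short sketch below divides the population figures quoted above by the corresponding land areas.  The inputs are the rounded values from this introduction, so the results are approximate; the names `cities` and `density_per_sq_mile` are introduced here for illustration only.

```python
# Population and land area (square miles) as quoted in the introduction.
# These are rounded figures, so the densities are approximate.
cities = {
    "San Antonio": (1_500_000, 461),
    "New York City": (8_500_000, 303),
    "Toronto": (2_700_000, 243),
}


def density_per_sq_mile(population, area_sq_miles):
    """Return the number of residents per square mile."""
    return population / area_sq_miles


densities = {
    name: density_per_sq_mile(population, area)
    for name, (population, area) in cities.items()
}

# Print the cities from least to most dense.
for name, density in sorted(densities.items(), key=lambda item: item[1]):
    print(f"{name}: {density:,.0f} people per square mile")
```

With these inputs San Antonio comes out at roughly 3,250 people per square mile, against about 11,100 for Toronto and about 28,050 for New York City, which is the sense in which San Antonio is sparsely populated.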

### **Problem Statement:** 
Given our previous analysis clustering and segmenting neighborhoods in New York City and Toronto using Foursquare data, how does San Antonio, Texas compare in terms of the most popular types of venues?  If we wanted to open a new business in San Antonio, could we use the Foursquare data for the different clusters to inform a decision on what type of business to open and where in the city to open it?

### **Data:** 
New York City and Toronto have well-defined neighborhoods that helped us cluster the data.  San Antonio has some established neighborhoods; however, many areas within the city do not belong to any particular neighborhood.  We therefore can't use the same approach as we did with New York and Toronto, because we would omit large portions of the city.  San Antonio consists of 87 separate zip codes.  For analyzing San Antonio we will use these zip codes instead, mapping and clustering them by the geographic center of each zip code.  To get the geographic coordinates we used the website [San Antonio AreaConnect](https://sanantonio.areaconnect.com/zip2.htm?city=San%20Antonio&search=zip), which provides latitude/longitude coordinates for the zip codes around San Antonio.  We will cluster these zip codes using the Foursquare location data, similar to the analysis in New York and Toronto.  Based on the cluster analysis, we will recommend ideas for new business venue(s) in particular zip codes.  The excerpt of the San Antonio zip code data below shows the data we will use for our analysis.  First we import all the packages needed to read the data as a pandas dataframe and plot the geographic data on a map.

In [8]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [9]:
url = "https://raw.githubusercontent.com/risiud/Coursera_Capstone/master/SanAntonioZips.csv"

neighborhoods = pd.read_csv(url)
neighborhoods.head()

| Unnamed: 0 | Zipcode | City | State | AreaCode | County | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 0 | 78201 | San Antonio | TX | 210 | Bexar | 29.472 | -98.537 |
| 1 | 78202 | San Antonio | TX | 210 | Bexar | 29.422 | -98.466 |
| 2 | 78203 | San Antonio | TX | 210 | Bexar | 29.415 | -98.462 |
| 3 | 78204 | San Antonio | TX | 210 | Bexar | 29.397 | -98.5 |
| 4 | 78205 | San Antonio | TX | 210 | Bexar | 29.424 | -98.487 |


In [4]:
print('The dataframe has {} Zip Codes.'.format(
        len(neighborhoods['Zipcode'].unique())
    )
)

The dataframe has 87 Zip Codes.
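
Because each zip code is represented by a single center point, it can help to know how far apart those points are before choosing a search radius for any venue query around them.  The sketch below is an illustration only, not part of the original notebook: it applies the haversine formula to the five centroids from the excerpt above and reports the distance from each one to its nearest neighbor.  The helper names `haversine_km` and `nearest_neighbor_km` are hypothetical, and the use of a fixed search radius in later steps is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# (zip code, latitude, longitude) copied from the data excerpt above.
zip_centroids = [
    ("78201", 29.472, -98.537),
    ("78202", 29.422, -98.466),
    ("78203", 29.415, -98.462),
    ("78204", 29.397, -98.500),
    ("78205", 29.424, -98.487),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_neighbor_km(centroids):
    """Map each zip code to the distance (km) to its closest other centroid."""
    nearest = {}
    for code, lat, lon in centroids:
        nearest[code] = min(
            haversine_km(lat, lon, other_lat, other_lon)
            for other_code, other_lat, other_lon in centroids
            if other_code != code
        )
    return nearest


nearest = nearest_neighbor_km(zip_centroids)
for code, distance in sorted(nearest.items()):
    print(f"{code}: nearest centroid is {distance:.2f} km away")
```

Centroids that sit well under a kilometer apart, as 78202 and 78203 do here, would return largely overlapping venue lists under a wide search radius, which is worth keeping in mind when interpreting the clusters.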


We use the geopy and folium packages to map the geographic position of each of the 87 zip codes around San Antonio.  We will compare and cluster these in the follow-on analysis to find recommended locations for opening a new business.

![San Antonio Zip Codes](https://github.com/risiud/Coursera_Capstone/blob/master/SanAntonioNeighborhoods.JPG?raw=true)