# Background

## Introduction/Business Problem

I live in San Diego, CA, one of the ten most populous US cities, with roughly 1.5 million people. Because of its Spanish and Mexican historical roots and its proximity to the US/Mexico border, the city has a large Latino population, predominantly of Mexican origin. In addition to residents, a large number of workers cross over from Tijuana and back each day for jobs in the US, and they often need somewhere to grab lunch. Mexican eateries seem to be abundant wherever you drive, and many are family-owned and well known to San Diegans. Still, some neighborhoods are known as the go-to places for the best Mexican food, while others have few or no options. In this project I explore where a budding entrepreneur who cooks the best Mexican food in the neighborhood should open a new restaurant, given a market that looks saturated yet has a large total addressable market.

Using geospatial and demographic data, I will explore neighborhoods in the city of San Diego, cluster them, and identify the demographic characteristics that distinguish each cluster. Then, using venue data from the Foursquare API, I will identify communities that are underserved by Mexican restaurants based on various factors, and provide recommendations for different strategic options.
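As a rough illustration of what "underserved" could mean in practice, the sketch below ranks neighborhoods by Mexican venues per 10,000 Hispanic residents. This is only one possible factor, and the metric itself is my assumption rather than a fixed part of the method. The Hispanic counts come from the race and ethnicity table shown later in this section; the `mexican_venues` counts are made-up placeholders, not Foursquare results.

```python
import pandas as pd

# Hispanic counts are copied from the race and ethnicity dataset.
# The venue counts are MADE-UP placeholders, not real Foursquare data.
neighborhoods = pd.DataFrame({
    "Geography": ["Central San Diego", "Mid-City", "Southeastern San Diego"],
    "Hispanic": [58777, 71326, 87569],
    "mexican_venues": [40, 25, 10],  # hypothetical values
})

# Venues per 10,000 Hispanic residents: a lower value suggests fewer
# options relative to one plausible customer base.
neighborhoods["venues_per_10k"] = (
    neighborhoods["mexican_venues"] / neighborhoods["Hispanic"] * 10_000
)

# Sort ascending so the least-served neighborhood comes first.
ranked = neighborhoods.sort_values("venues_per_10k").reset_index(drop=True)
print(ranked[["Geography", "venues_per_10k"]])
```

A real analysis would likely combine several such ratios (income, household size, daytime worker population) before clustering.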

## Data Sources and Use

Using San Diego County's open data portal (https://data.sandiegocounty.gov/browse), I identified several datasets containing different demographic data points, such as '2017 San Diego County Demographics - Race and Ethnicity', '2016 San Diego County Demographics - Age and Gender', and '2016 San Diego County Demographic Profiles - Median Income and Persons Per Household', among others. These datasets contain various demographic data broken down by neighborhood. 'Neighborhood' refers to census-designated Subregional Areas (SRAs), which have remained consistent for many years. Geospatial data for San Diego SRAs came from the San Diego ArcGIS dataset MSA_SRA (https://sdgis-sandag.opendata.arcgis.com/datasets/msa-sra), with help from the online virtual mapping application. Venue data will be obtained using the Foursquare API, which will assist in creating clusters of likely locations for a new restaurant.
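To make the Foursquare step more concrete, here is a minimal sketch of how a venue search URL could be built. It assumes the legacy v2 `venues/explore` endpoint; the text above does not say which endpoint or API version is used, and newer Foursquare accounts may need the v3 Places API instead. The helper name `build_explore_url`, the search radius, the version date, and the credentials are all placeholders of mine.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" call.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      query="Mexican", radius_m=2000, limit=50,
                      version="20180605"):
    """Return a venues/explore URL for venues matching `query` near (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date in YYYYMMDD form
        "ll": f"{lat},{lng}",   # latitude,longitude of the search center
        "query": query,
        "radius": radius_m,     # search radius in meters
        "limit": limit,         # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and an approximate downtown San Diego coordinate.
url = build_explore_url(32.7157, -117.1611, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The URL is only constructed here, not requested, so the sketch runs without credentials. In practice it would be called once per SRA centroid and the JSON responses collected into a DataFrame.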

In [22]:
# import all dependencies

import pandas as pd
import numpy as np

# Load the race and ethnicity dataset and preview the first rows
data = pd.read_csv('2017_San_Diego_County_Demographics_-_Race_and_Ethnicity.csv')
data.head(30)

|   | Geography | Hispanic | Percent Hispanic | White | Percent White | Black | Percent Black | API | Percent API | AIAN | Percent AIAN | Other | Percent Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Region | 217672 | 0.4 | 153475 | 0.3 | 56358 | 0.1 | 67943 | 0.1 | 1248 | 0.0 | 14964 | 0.0 |
| 1 | Central San Diego | 58777 | 0.3 | 87442 | 0.5 | 10628 | 0.1 | 9826 | 0.1 | 537 | 0.0 | 4956 | 0.0 |
| 2 | Mid-City | 71326 | 0.4 | 49962 | 0.3 | 22166 | 0.1 | 25756 | 0.1 | 256 | 0.0 | 5231 | 0.0 |
| 3 | Southeastern San Diego | 87569 | 0.5 | 16071 | 0.1 | 23564 | 0.1 | 32361 | 0.2 | 455 | 0.0 | 4777 | 0.0 |
| 4 | East Region | 135614 | 0.3 | 277918 | 0.6 | 27259 | 0.1 | 24458 | 0.0 | 2708 | 0.0 | 20373 | 0.0 |
| 5 | Alpine | 1770 | 0.1 | 12923 | 0.8 | 172 | 0.0 | 324 | 0.0 | 75 | 0.0 | 534 | 0.0 |
| 6 | El Cajon | 38158 | 0.3 | 75168 | 0.6 | 6648 | 0.1 | 4918 | 0.0 | 233 | 0.0 | 6268 | 0.0 |
| 7 | Harbison-Crest | 2561 | 0.2 | 11718 | 0.8 | 37 | 0.0 | 250 | 0.0 | 185 | 0.0 | 413 | 0.0 |
| 8 | Jamul | 6452 | 0.3 | 9500 | 0.5 | 1203 | 0.1 | 1953 | 0.1 | 124 | 0.0 | 511 | 0.0 |
| 9 | La Mesa | 15762 | 0.3 | 34587 | 0.6 | 4021 | 0.1 | 3714 | 0.1 | 209 | 0.0 | 3145 | 0.1 |
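
The output above mixes region-level totals (for example 'Central Region') with the individual SRAs that make them up, so the aggregate rows would double-count population if left in. The sketch below shows one way to separate them, assuming that aggregate rows are exactly those whose name ends in "Region". That naming rule is inferred from the rows shown here and may not hold for the whole file. The frame `sample` is a hand-copied subset of the output.

```python
import pandas as pd

# First four rows of the race and ethnicity output, Hispanic counts only.
sample = pd.DataFrame({
    "Geography": ["Central Region", "Central San Diego",
                  "Mid-City", "Southeastern San Diego"],
    "Hispanic": [217672, 58777, 71326, 87569],
})

# Assumption: aggregate rows are those whose name ends in "Region".
is_region = sample["Geography"].str.endswith("Region")
regions = sample[is_region]
sras = sample[~is_region]

# The three SRA rows add up to the Central Region total.
print(sras["Hispanic"].sum())  # 217672
```

Keeping `regions` as a separate frame makes it possible to check each region's SRAs against its total before discarding the aggregates.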


In [18]:
data.dtypes

Geography            object
Hispanic              int64
Percent Hispanic    float64
White                 int64
Percent White       float64
Black                 int64
Percent Black       float64
API                   int64
Percent API         float64
AIAN                  int64
Percent AIAN        float64
Other                 int64
Percent Other       float64
New_Column           object
dtype: object
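
The percent columns are stored as floats but appear rounded to one decimal place in the output above, which is coarse for clustering. Since the counts are integers, the shares can be recomputed at full precision. The sketch below does this for two rows copied from the output. It assumes the six groups are mutually exclusive and together cover the whole population; the `Share ` column prefix is my own naming.

```python
import pandas as pd

# Two rows copied from the race and ethnicity output.
sample = pd.DataFrame({
    "Geography": ["Central San Diego", "Alpine"],
    "Hispanic": [58777, 1770],
    "White": [87442, 12923],
    "Black": [10628, 172],
    "API": [9826, 324],
    "AIAN": [537, 75],
    "Other": [4956, 534],
})

count_cols = ["Hispanic", "White", "Black", "API", "AIAN", "Other"]

# Assumption: the six counts sum to the total population of each row.
total = sample[count_cols].sum(axis=1)

# Divide each count by its row total to get unrounded shares.
shares = sample[count_cols].div(total, axis=0).add_prefix("Share ")
result = pd.concat([sample[["Geography"]], shares], axis=1)
print(result.round(3))
```

Rounded to one decimal, these shares agree with the `Percent` columns shown for the same rows.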