# Final Capstone Project
## Find the best city in and around Cincinnati, OH to launch a new Asian Grocery Supermarket.

### **1. Introduction:**
#### **Client and Background:** Our client is a business chain that runs Asian grocery markets in several states across the USA. They want to expand into the Midwest and are considering the Cincinnati area as their destination. We will help them research and analyze neighborhoods in the region and find the best location to launch their new store.

#### **Objective:** The project is to **research the main cities within the Greater Cincinnati Area of Ohio (OH)** and present their potential to be a good fit **for launching a new Asian Grocery Supermarket**.
#### After a few initial discussions with the client, we agreed to explore the cities in terms of **Restaurants** and **Schools** to get an idea of the Asian population in these cities. The client would primarily sell Indian and Chinese groceries and other festive items, so we will focus on Indian and Chinese restaurants, along with other Asian restaurants. The school analysis presumes that the current and future immigrant Asian population may concentrate in cities with a high standard of education.

#### We will present each city in terms of the number of Indian/Chinese/Asian restaurants, the number of schools, and school district details.

### **2. Data:**
#### We will use the following datasets:

#### **Cities:** For cities, we start with the Wikipedia page.
##### https://en.wikipedia.org/wiki/Cincinnati_metropolitan_area
#### The page lists all the counties in the Cincinnati metro area. We take the Ohio (OH) counties from it and follow the links to the wiki pages for those counties to get the main cities. Those pages also list each city's population, which we will use.
#### For **location** data of cities and other venues, we will use `geolocator`.
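#### As a minimal sketch of the lookup step, assuming `geolocator` refers to a geopy `Nominatim` geocoder (the text does not name the library): the helper below accepts any geocoding function, so it can be tried with a stand-in before wiring in a real service. The names `city_coordinates` and `fake_geocode` and the approximate coordinates are illustrative only.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def city_coordinates(
    names: Iterable[str],
    geocode: Callable[[str], Optional[object]],
) -> Dict[str, Tuple[float, float]]:
    """Return {city name: (latitude, longitude)} for every name that resolves."""
    coords = {}
    for name in names:
        location = geocode(name)  # a geocoder returns None when nothing matches
        if location is not None:
            coords[name] = (location.latitude, location.longitude)
    return coords


# With geopy installed, a real geocoder could be wired in like this (assumption):
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="cincinnati_capstone")
#   coords = city_coordinates(["Mason, OH"], geolocator.geocode)


class FakeLocation:
    """Stand-in for a geocoder result; coordinates are approximate."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(name):
    """Offline stand-in so the sketch runs without network access."""
    known = {"Mason, OH": FakeLocation(39.36, -84.31)}
    return known.get(name)


coords = city_coordinates(["Mason, OH", "Nowhere, OH"], fake_geocode)
```

#### Names that do not resolve are skipped rather than raising, which makes it easy to spot cities that need a manual lookup.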

#### **Restaurants:** For restaurants analysis, we will use the FOURSQUARE database and APIs.
##### https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}
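#### The URL template above can be filled in as sketched below. This is an illustration rather than the notebook's actual code: the function name `explore_url`, the placeholder credentials, and the default `limit` are assumptions, and the sketch assumes the `radius` parameter is expressed in meters, so the 5-mile search radius is converted first.

```python
from urllib.parse import urlencode

METERS_PER_MILE = 1609.344


def explore_url(client_id, client_secret, version, query, lat, lng,
                radius_miles=5, limit=100):
    """Build a venues/explore request URL from the parameters in the template."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                      # API version date, e.g. "20190101"
        "query": query,
        "ll": f"{lat},{lng}",              # latitude,longitude of the city
        "radius": round(radius_miles * METERS_PER_MILE),
        "limit": limit,
    }
    # urlencode escapes spaces and commas so the query string is always valid
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials and approximate coordinates, for illustration only
url = explore_url("ID", "SECRET", "20190101", "Indian Restaurant", 39.36, -84.31)
```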

#### **Schools and School Districts:** The state education website publishes this information every year. We will use the enrollment data for the current year (2019) and the rankings available from last year.
>#### i) Fall Enrollment (Headcount) - October 2018 Public Districts and Buildings
> ##### http://education.ohio.gov/Topics/Data/Frequently-Requested-Data/Enrollment-Data
>#### ii) Performance Index Score Rankings
> ##### http://education.ohio.gov/lists_and_rankings

#### Our plan is to explore the sources above and see how the available data can best be leveraged to serve our objective. The methodology is described in the next section.

### **3. Methodology and Steps:** 
#### After some initial exploratory analysis, we concluded that the following information about each city best represents its suitability for the new store that our client wants to open:

> - density of Indian/Chinese/other Asian restaurants within 5 miles of the cities
> - population of the cities
> - number of Pre/Elementary/Middle/High schools within 5 miles of those cities
> - grade/rating of the main school district serving each city
> - total enrollment in the school district for the current year and possibly the number of Asian students enrolled
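#### The "within 5 miles" criterion can be checked with a great-circle distance. The sketch below uses the haversine formula. The helper names and the sample coordinates are made up for illustration, and the notebook may instead rely on the search radius of the API.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_radius(center, venues, radius_miles=5.0):
    """Keep the venues located at most radius_miles from center (lat, lon)."""
    return [
        v for v in venues
        if miles_between(center[0], center[1], v["lat"], v["lon"]) <= radius_miles
    ]


# Hypothetical data: one venue close to the city center, one far away
city = (39.36, -84.31)
venues = [
    {"name": "near", "lat": 39.38, "lon": -84.30},
    {"name": "far", "lat": 39.10, "lon": -84.51},
]
nearby = within_radius(city, venues)
```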

#### Here are the steps we will follow to gather and analyze the above information:
#### **1. Gather the Cities and visualize their locations:**
#### We explored the wiki pages for the cities and found that the information is spread across so many pages that it makes more sense to consolidate it offline into a CSV file than to scrape the web. That is what we did.

#### This step reads the first CSV file that has basic information on cities (Name and Geo Coordinates) within Greater Cincinnati.
#### We will also view them on the map to get a good view of their relative locations using FOLIUM.
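#### A minimal sketch of this step, using only the standard library so it stays self-contained. The column names (`City`, `Latitude`, `Longitude`) and the sample rows are assumptions, since the layout of the real CSV file is not shown here.

```python
import csv
import io

# Hypothetical CSV layout with approximate coordinates; the real file may differ
SAMPLE_CSV = """City,Latitude,Longitude
"Mason, OH",39.36,-84.31
"Blue Ash, OH",39.23,-84.38
"""


def load_cities(handle):
    """Read city rows from an open CSV file and convert coordinates to floats."""
    return [
        {
            "city": row["City"],
            "lat": float(row["Latitude"]),
            "lon": float(row["Longitude"]),
        }
        for row in csv.DictReader(handle)
    ]


def map_center(cities):
    """Average the coordinates to get a sensible starting point for the map."""
    n = len(cities)
    return (sum(c["lat"] for c in cities) / n, sum(c["lon"] for c in cities) / n)


cities = load_cities(io.StringIO(SAMPLE_CSV))
center = map_center(cities)
```

#### The resulting `center` could then be passed as the `location` of a `folium.Map`, with one marker added per city row.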

#### **2. Gather Restaurant Data:**
#### We will use the FOURSQUARE APIs and BeautifulSoup methods to get the restaurant information for those cities.

>##### - 2.a) Start with exploring one city: We will explore one city first to understand FOURSQUARE results
>##### - 2.b) Gather Information for all cities: Repeat the above process to get restaurants data for all the cities in question
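#### Turning an API response into rows, one city at a time, might look like the sketch below. It assumes the explore response nests venues under `response` → `groups` → `items` → `venue`. The sample payload is invented for illustration, and `extract_venues` is a hypothetical helper name.

```python
def extract_venues(payload, city):
    """Flatten one explore response into a list of venue rows for a city."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Some venues may come back without a category; fall back to a label
            categories = venue.get("categories") or [{"name": "Unknown"}]
            rows.append({
                "city": city,
                "name": venue["name"],
                "category": categories[0]["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            })
    return rows


# Invented payload that mimics the assumed response structure
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Curry House",
               "categories": [{"name": "Indian Restaurant"}],
               "location": {"lat": 39.36, "lng": -84.31}}},
    {"venue": {"name": "Example Noodle Bar",
               "categories": [],
               "location": {"lat": 39.35, "lng": -84.30}}},
]}]}}

# Repeating the call per city and concatenating gives the all-cities table
all_rows = []
for city_name, payload in {"Mason, OH": sample}.items():
    all_rows.extend(extract_venues(payload, city_name))
```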

#### **3. Gather Schools Data:**
#### Similarly, we will use the FOURSQUARE APIs to get information on schools around those cities.
>##### 3.a) Explore the schools data for one city like we did for restaurants
>##### 3.b) Get schools locations data for all the cities using Foursquare APIs.
>##### 3.c) Also get Schools demographic and performance data (grades, population etc.) available at the state education departments sites.
>##### -- This (3.c) information is also somewhat spread across multiple sites. So, as with the city names, we did some offline work to consolidate the school district information and save effort.
                 
#### **4. Combine all information for final analysis and presentation**
>##### 4.a) Combine the schools location data (3.b) with the restaurants location data (2) and group them for a city-wise representation (one row for each city)
>##### 4.b) Then add the school grade and population information (3.c) for each city
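#### The combination step can be sketched in plain Python as follows. The field names (`restaurants`, `schools`, `enrollment`) and all sample values are placeholders, not figures from the project.

```python
from collections import defaultdict


def summarize_by_city(restaurants, schools, districts):
    """Count venues per city and attach district details: one row per city."""
    counts = defaultdict(lambda: {"restaurants": 0, "schools": 0})
    for r in restaurants:
        counts[r["city"]]["restaurants"] += 1
    for s in schools:
        counts[s["city"]]["schools"] += 1

    table = []
    for city in sorted(counts):
        row = {"city": city, **counts[city]}
        # District details are optional; cities without a match keep counts only
        row.update(districts.get(city, {}))
        table.append(row)
    return table


# Placeholder inputs for illustration
restaurants = [{"city": "A"}, {"city": "A"}, {"city": "B"}]
schools = [{"city": "A"}]
districts = {"A": {"enrollment": 1000}}

table = summarize_by_city(restaurants, schools, districts)
```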

#### Now that we have all the information in the dataframe for each city, we will assign a score to each city for each criterion and use the final score to rank the cities. We will also plot them using FOLIUM to show the final index score and location on the map for easy comprehension. For the best visualization, the circle size for each city will be based on its final index ranking.
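#### The scoring formula is not spelled out here, so the sketch below uses equal-weight min-max normalization as one plausible choice, not necessarily the one used in the project. The criteria names, the sample values, and the radius range for the map circles are all assumptions.

```python
def min_max(values):
    """Scale values to the 0..1 range; constant columns score 0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_cities(rows, criteria):
    """Sum the normalized criteria per city and sort from best to worst."""
    scores = [0.0] * len(rows)
    for key in criteria:
        for i, scaled in enumerate(min_max([row[key] for row in rows])):
            scores[i] += scaled
    ranked = sorted(zip(rows, scores), key=lambda pair: pair[1], reverse=True)
    return [{"city": row["city"], "score": score} for row, score in ranked]


def circle_radius(score, max_score, smallest=5, largest=25):
    """Map a final score to a circle radius so higher scores draw larger."""
    if max_score == 0:
        return smallest
    return smallest + (largest - smallest) * score / max_score


# Placeholder rows for illustration
rows = [
    {"city": "A", "restaurants": 10, "schools": 5},
    {"city": "B", "restaurants": 2, "schools": 6},
    {"city": "C", "restaurants": 6, "schools": 4},
]
ranked = rank_cities(rows, ["restaurants", "schools"])
top_radius = circle_radius(ranked[0]["score"], ranked[0]["score"])
```

#### Each radius could then be passed to a `folium.CircleMarker` placed at the city's coordinates.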

### **4. Results:** 
#### Finally, we are able to rank all the cities based on their final index. We see that cities like Mason, Blue Ash, Cincinnati, Westwood, and Liberty Township rank higher, while cities like Franklin and Oxford rank at the bottom.
#### But we do not want to make the decision based on rank alone. It may not make sense to open the new store in the highest-ranked city if that city is isolated from all the others. For this reason, it is also very important to see the reach of the higher-ranking cities to the other cities in question.
#### To achieve this, we again plotted the final list of cities on the map using FOLIUM, this time with the circle size set according to the final index. The higher the rank, the larger the circle.
#### That made it much more convenient for our client to see that cities like Mason, OH, Blue Ash, OH, Cincinnati, OH, and Westwood, OH are the most promising candidates for the new market.
#### Cities like Oxford, Franklin, Amelia, Hamilton, and South Lebanon are ruled out because of their low Asian Index and their distance from other highly ranked cities.
#### Our client could finally use this information, along with a few other offline checks (such as competition, competitor reviews, and real estate rates), to decide that they would open their new Asian Supermarket in **Mason, OH.**

### **5. Discussion and Conclusion:** 
#### So we helped our client choose the best-fit city to launch the new Asian Grocery Supermarket. We achieved this by collecting, cleansing, preparing, analyzing, summarizing, and presenting the data in a way that lets them visualize the cities and their information.
#### The above work shows, in a rather simple way, how data science can be used to make such important decisions. 
#### This work can be extended or modified to include more features and criteria, such as *Real Estate prices* and *# of Competitors*, to make it more exhaustive. It can also be extended with other data science and machine learning techniques, such as *Classification* and *Clustering*, to serve other clients and their specific needs. For example:
>##### i) An individual customer who wants to relocate from another US city to Greater Cincinnati can use us to find out which city is best to rent or buy a home for the family. We can analyze their original city and, using clustering techniques, find a city with similar characteristics from the list that we gathered within the Cincinnati area.
>##### ii) Any other new business venture that wants to come to Cincinnati can use this information for demographic analysis of the cities, for example to screen Korean movies in theaters or open an Italian restaurant.
#### And of course, this entire analysis can be easily replicated for any other city or metropolitan area across the globe with a little tweaking!