# <center>Capstone Project - The Battle of Neighborhoods</center> 

## 1. Introduction/Business Problem

### Background: Foursquare

Foursquare Labs Inc., commonly known as [Foursquare](https://foursquare.com/), is an American technology company. The company's location platform is the foundation of several business and consumer products, including the [Foursquare City Guide](https://foursquare.com/city-guide) and [Foursquare Swarm](https://www.swarmapp.com/) apps.
Foursquare built a massive location dataset through crowd-sourcing: people using its apps add venues and fill in missing information. Communicating with the Foursquare
database is straightforward thanks to its RESTful API. You create a uniform resource identifier (URI) and append extra parameters depending on the data you are seeking. Every call request starts from the same base URI, _api.foursquare.com/v2_, from which you can request data about venues, users, or tips. Every request must also include your developer account credentials, i.e. your Client ID and Client Secret, as well as the version of the API, which is simply a date.
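As an illustration of how such a request URI is put together, the sketch below appends the credentials, the version date and query parameters to the base URI. The credentials are placeholders, the version date is an example value, and the `venues/search` endpoint with its `ll` and `limit` parameters is assumed from the v2 conventions; the current Foursquare documentation should be checked before relying on it. The coordinates are an approximate location for central Nottingham.

```python
from urllib.parse import urlencode

# Placeholder credentials: replace with your own developer account values
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20210401'  # API version, given as a date (YYYYMMDD); example value

BASE_URI = 'https://api.foursquare.com/v2'


def build_request_uri(endpoint, **params):
    """Append credentials, version and query parameters to the base URI."""
    query = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION}
    query.update(params)
    # urlencode percent-encodes reserved characters such as the comma in 'll'
    return f'{BASE_URI}/{endpoint}?{urlencode(query)}'


# Assumed example: search for up to 50 venues near central Nottingham
uri = build_request_uri('venues/search', ll='52.9548,-1.1581', limit=50)
print(uri)
```

The resulting URI can then be passed to any HTTP client (for example `requests.get(uri)`), and the JSON response parsed into a dataframe.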

### Background: Nottingham, UK

[Nottingham](https://en.wikipedia.org/wiki/Nottingham) is a city and unitary authority area in Nottinghamshire, England. Part of the East Midlands region, it is 128 miles (206 km) north of London and 45 miles (72 km) northeast of Birmingham.
<img src="https://previews.123rf.com/images/yurkaimmortal/yurkaimmortal1303/yurkaimmortal130300023/18386260-nottingham-england-skyline-city-silhouette.jpg"  width=400 height = 400/>

Nottingham has links to the legend of Robin Hood and to the lace-making, bicycle (notably Raleigh bikes) and tobacco industries. Nottingham is a popular tourist destination. In 2020, Nottingham had an estimated population of 330,000. The wider conurbation, which includes many of the city's suburbs, has a population of 768,638.

### Business Problem

Nottingham city center is very lively, with a wide range of food and entertainment venues, while other areas of the city are also becoming more popular. However, historical data and observations in the city center suggest a very high turnover among newly opened venues, many of which shut relatively soon after opening. At the same time, there are several examples of venues in different categories that have become very successful and established.<br><br>
This poses a critical question for potential new owners and investors: **Is there any link between the area and the type of venue that will be successful?**

The target audience of this case study is existing and prospective owners of restaurants, bars, arcades, food vans, snack shops and many other types of venues in the food and leisure industry. This case study can show them which areas of Nottingham might best fit their business ideas, as well as where there is already a high concentration of similar businesses and therefore potentially stronger competition.

As a data science project, this case study may also reveal new insights from the data that were previously unknown or unnoticed.

## 2. Data

A combination of different datasets will be used to explore the suggested business problem. In order to address the questions outlined in the previous section, a collection of data is required about:
- venue category or type
- its location
- how long it operated
- how popular it was or is

It is proposed to use Foursquare location, review and check-in data to derive the features above. The location data associated with venues can be obtained directly from the Foursquare dataset and then used for spatial clustering. It is more challenging to evaluate how long a venue operated, as it must be recognised that the venue may be __currently open and active__ or __already out of business and shut down__.

To deal with this problem, the following method is proposed:
1. Obtain the oldest review/check-in date for the venue - this is the **START DATE**
2. Obtain the most recent review/check-in date for the venue - this is the **END DATE**
3. If the end date is within the last 3 months, the venue is labelled as **ACTIVE**. If it is more than 3 months old, the venue is labelled as **INACTIVE**
4. Calculate the difference between the end date and the start date - this is the **OPERATION LENGTH**
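The date-based steps can be sketched in pandas as follows, using the same convention as the example dataframe in this section (StartDate is the earliest activity, EndDate the latest). The function name `operation_features`, the `as_of` reference date and the sample dates are hypothetical, and "within the last 3 months" is interpreted here as a calendar-month offset via `pd.DateOffset(months=3)`.

```python
import pandas as pd


def operation_features(checkin_dates, as_of, active_window_months=3):
    """Derive START DATE, END DATE, STATUS and OPERATION LENGTH for one venue.

    `checkin_dates` holds the dates of all reviews/check-ins for the venue;
    `as_of` is the (hypothetical) date on which the analysis is run.
    """
    dates = pd.to_datetime(pd.Series(checkin_dates))
    start_date = dates.min()  # oldest review/check-in
    end_date = dates.max()    # most recent review/check-in

    # ACTIVE if the latest activity falls within the last N calendar months
    cutoff = pd.Timestamp(as_of) - pd.DateOffset(months=active_window_months)
    status = 'ACTIVE' if end_date >= cutoff else 'INACTIVE'

    return {
        'StartDate': start_date,
        'EndDate': end_date,
        'Status': status,
        'OperationLength': end_date - start_date,  # a pandas Timedelta
    }


features = operation_features(['2017-05-04', '2019-08-20', '2021-03-15'], as_of='2021-04-30')
print(features['Status'], features['OperationLength'].days)  # ACTIVE 1411
```

Applying the function to each venue's check-in history and collecting the returned dictionaries gives the Status, StartDate and EndDate columns directly.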


The final part is to evaluate how popular the venue was or is. For this, the following values should be extracted:
1. Mean review score - this is the **MEAN SCORE**
2. Total number of reviews/check-ins - this is the **VISITS** number
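Both popularity values are simple per-venue aggregations. The sketch below assumes a hypothetical table with one row per review/check-in and a `Score` column that is empty for check-ins without a rating; the actual shape of the Foursquare data may differ.

```python
import pandas as pd

# Hypothetical per-visit records: one row per review/check-in.
# 'Score' is missing (NaN) for check-ins that carry no rating.
visits_log = pd.DataFrame({
    'Name': ['Example A', 'Example A', 'Example A', 'Example B', 'Example B'],
    'Score': [5.0, 4.0, None, 2.0, 3.0],
})

popularity = visits_log.groupby('Name').agg(
    MeanScore=('Score', 'mean'),  # mean skips missing scores
    Visits=('Score', 'size'),     # size counts every row, rated or not
)
print(popularity)
```

Here Example A ends up with a MeanScore of 4.5 and 3 Visits: the unrated check-in counts as a visit but does not affect the mean.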

The other features are:
- **NAME**
- venue **TYPE**
- **LONGITUDE**
- **LATITUDE**

Therefore, the objective is to populate a dataframe, which will look like this:

```python
# Code to generate an example dataframe
import pandas as pd

column_names = ['Name', 'Type', 'Latitude', 'Longitude', 'Status', 'StartDate', 'EndDate', 'MeanScore', 'Visits']

# Build the dataframe from a list of records; DataFrame.append was deprecated
# in pandas 1.4 and removed in pandas 2.0
records = [
    {'Name': 'Example A', 'Type': 'Restaurant', 'Latitude': 52.125, 'Longitude': -1.487,
     'Status': 'ACTIVE', 'StartDate': pd.Timestamp(2017, 5, 4), 'EndDate': pd.Timestamp(2021, 3, 15),
     'MeanScore': 4.6, 'Visits': 241},
    {'Name': 'Example B', 'Type': 'Bar', 'Latitude': 52.130, 'Longitude': -1.975,
     'Status': 'INACTIVE', 'StartDate': pd.Timestamp(2018, 10, 1), 'EndDate': pd.Timestamp(2019, 11, 30),
     'MeanScore': 2.2, 'Visits': 148},
]
venues = pd.DataFrame(records, columns=column_names)
venues
```


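|   | Name | Type | Latitude | Longitude | Status | StartDate | EndDate | MeanScore | Visits |
|---|------|------|----------|-----------|--------|-----------|---------|-----------|--------|
| 0 | Example A | Restaurant | 52.125 | -1.487 | ACTIVE | 2017-05-04 | 2021-03-15 | 4.6 | 241 |
| 1 | Example B | Bar | 52.130 | -1.975 | INACTIVE | 2018-10-01 | 2019-11-30 | 2.2 | 148 |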


The dataframe above shows two examples of the data that needs to be extracted from the Foursquare dataset for Nottingham.
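The Latitude and Longitude columns are the input for the spatial clustering mentioned earlier in this section, for which no specific method has been fixed yet. As one possible choice, the sketch below implements plain k-means (Lloyd's algorithm) in NumPy; the function name, the deterministic farthest-point initialisation and the toy coordinates are illustrative assumptions, and a library implementation such as scikit-learn's `KMeans` would usually be preferred in practice. Degrees are treated as planar coordinates, which distorts distances (a degree of longitude is shorter than a degree of latitude at Nottingham's latitude) but is tolerable for a rough city-scale sketch.

```python
import numpy as np


def kmeans(coords, k, n_iter=100):
    """Cluster (latitude, longitude) pairs with Lloyd's k-means algorithm."""
    coords = np.asarray(coords, dtype=float)

    def sq_dists(points, centres):
        # Squared Euclidean distance between every point and every centre
        return ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)

    # Farthest-point initialisation: start from the first point, then keep
    # adding the point farthest from all centres chosen so far
    centroids = coords[[0]]
    for _ in range(1, k):
        far = sq_dists(coords, centroids).min(axis=1).argmax()
        centroids = np.vstack([centroids, coords[far]])

    labels = np.full(len(coords), -1)
    for _ in range(n_iter):
        # Assign each point to its nearest centre
        new_labels = sq_dists(coords, centroids).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centre to the mean of its points (keep it if it has none)
        centroids = np.array([
            coords[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


# Toy coordinates forming two well-separated groups
toy_coords = [[52.950, -1.150], [52.951, -1.151], [52.952, -1.149],
              [52.900, -1.200], [52.901, -1.201], [52.899, -1.199]]
labels, centres = kmeans(toy_coords, k=2)
print(labels)  # [0 0 0 1 1 1]
```

With the populated dataframe, the input would be `venues[['Latitude', 'Longitude']].to_numpy()`, and the resulting labels could be stored as an extra column for comparing venue types and outcomes across areas.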