### Capstone project - Applied Data Science Capstone by IBM/Coursera
___
# The best inner London areas to launch new coffee shops

![coffee cup](coffeecup.jpg)

### Table of contents:
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

___
## 1. Introduction: Business Problem <a name="introduction"></a>

My friend has a successful inner London coffee shop and has been approached by an investor keen to develop his cafe concept into a small franchise and expand across inner London.

Both he and the investor believe a key component of the cafe's success has been its location close to a new high-rise apartment development. By providing a friendly, buzzy cafe with great coffee, he has created a communal place for the locals, who are typically young, professional renters with high disposable income and a taste for good-quality coffee. They buy it in the morning on their way to work, on weekends with a pastry, and increasingly during the day while they work from home during the Covid lockdown.

He has asked me to help him identify other new high-rise apartment developments that lack good coffee shops in the immediate vicinity. He will then propose these to the investor as potential sites for expanding the coffee franchise.

___
## 2. Data <a name="data"></a>

1. The London Development Database (LDD), which records significant planning permissions in London:
   https://data.london.gov.uk/download/planning-permissions-on-the-london-development-database--ldd-/966b9309-3969-417e-b1d0-b97cfe42404a/LDD%20-%20Housing%20Completions%20unit%20level%20%28final%29.xlsx

2. A spreadsheet, `Geospatial_Coordinates.xlsx`, containing postcodes with latitude and longitude coordinates for the building developments in the final refined dataframe.

3. The Foursquare API, which provides information on cafes. Requests are built from the following URL template:
   `'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighbourhood_latitude, neighbourhood_longitude, radius, LIMIT)`
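
As a minimal sketch of how the Foursquare request URL above could be assembled, the helper below builds the same query string with `urllib.parse.urlencode`. The function name `build_explore_url`, the default `radius` and `limit` values, and the placeholder credentials are assumptions for illustration only. It only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL template in the Data section.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return an explore URL for venues around (lat, lng).

    radius (metres) and limit defaults are illustrative assumptions.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,              # API version date, e.g. "20180605"
        "ll": f"{lat},{lng}",      # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped.
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"


# Placeholder credentials; real values come from a Foursquare developer account.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 51.5074, -0.1278)
print(url)
```

Using `urlencode` rather than a bare `.format` call means any special characters in the parameters are escaped automatically.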


___
## 3. Methodology <a name="methodology"></a>

1. Identify the areas in London with the highest number of new flats and houses recently approved.
   - Target developments approved in the past three years.
   - Restrict the data to inner London postcode districts: a London postcode area (E, EC, N, NW, SE, SW, W or WC) followed by a number from 1 to 9. Districts numbered above 9 fall outside the perimeter we're interested in.
   - Use the 'Post Code' field as the location reference.
   - Use 'Total proposed units' and restrict the values to >6 to eliminate town house conversions.
   - Rename columns so that they are more user-friendly.
   - Reformat the date completed field to show only the year ('YYYY'), which makes it easier to select developments by completion year.
   - Create a new dataframe with only the columns required.
   - Remove duplicate rows so that there is only one row per development.
   - Restrict the dataframe to developments with more than 100 and fewer than 300 units that were completed after 2016.
   - Remove incomplete postcodes.
   - Merge in the Excel file containing the latitude and longitude of each postcode, joining on the postcode field.
   - Use Folium to create a map of London and plot the results on the map.

2. Once we have identified some candidate sites, we will count the coffee shops in the local area, look at the mix of small independent cafes versus well-known chains (e.g. Costa, Pret, Starbucks), and compare the independent cafes' ratings with those of the chains. We will use the Foursquare API to complete this step.

3. Finally, we will compare the most popular businesses in the vicinity of each candidate site against the types of businesses near our current site. This should indicate whether the candidate site has clientele with similar tastes, and is therefore likely to be receptive to our product as is, or whether some tailoring to local tastes might be needed.
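
The chain-versus-independent split in step 2 could be sketched as below. This is an illustration only: the venue names are made up, the helper names `is_chain` and `cafe_mix` are hypothetical, and the chain list contains just the three brands named above. Venues returned by the Foursquare API would replace the sample list.

```python
from collections import Counter

# Chains named in the methodology; extend this set as needed.
KNOWN_CHAINS = {"costa", "pret", "starbucks"}


def is_chain(venue_name, chains=KNOWN_CHAINS):
    """True if any whole word in the venue name is a known chain brand."""
    words = set(venue_name.casefold().split())
    return bool(words & chains)


def cafe_mix(venue_names):
    """Count chain and independent cafes in a list of venue names."""
    counts = Counter(
        "chain" if is_chain(name) else "independent" for name in venue_names
    )
    return {"chain": counts["chain"], "independent": counts["independent"]}


# Made-up venue names standing in for API results.
sample = ["Costa Coffee", "Pret A Manger", "Corner Roasters",
          "Starbucks", "The Daily Grind"]
mix = cafe_mix(sample)
print(mix)
```

Matching on whole words rather than substrings avoids flagging an independent cafe whose name merely contains a chain's letters.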

___
## 4. Results <a name="results"></a>

Starting from the Greater London Authority data set of 149,856 records, we identified eight developments that fit the criteria: more than 100 and fewer than 300 units, completed after 2016.
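
A minimal sketch of the units and completion-year filter, run on a toy dataframe rather than the real LDD extract. 'Post Code' and 'Total proposed units' are the field names mentioned in the methodology; 'Scheme name', 'Date completed', the renamed columns, and the helper `shortlist` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the LDD extract; values are invented.
raw = pd.DataFrame({
    "Scheme name": ["A", "B", "C", "D"],
    "Post Code": ["E1 6AN", "SW8 5BN", "N1 9GU", "W2 1AA"],
    "Total proposed units": [4, 150, 250, 450],
    "Date completed": ["2018-03-31", "2019-06-30", "2015-09-30", "2018-12-31"],
})


def shortlist(df, min_units=100, max_units=300, after_year=2016):
    """Keep developments with min_units < units < max_units completed after after_year."""
    out = df.rename(columns={
        "Scheme name": "development",
        "Post Code": "postcode",
        "Total proposed units": "units",
        "Date completed": "completed",
    })
    # Reduce the completion date to a plain year for easy comparison.
    out["year"] = pd.to_datetime(out["completed"]).dt.year
    out = out[["development", "postcode", "units", "year"]]
    mask = (
        (out["units"] > min_units)
        & (out["units"] < max_units)
        & (out["year"] > after_year)
    )
    return out[mask].reset_index(drop=True)


candidates = shortlist(raw)
print(candidates)
```

Only development B survives here: A is too small, C was completed too early, and D is too large.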

We're seeing a lot of duplicate rows, which are likely amendments to the original planning permission. We only want one record per development, so we will cleanse the data using the pandas `drop_duplicates()` method. We're also seeing a lot of outer London postcodes, so we will restrict the dataframe to inner London postcodes only.
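
The two clean-up steps above might look like the sketch below on a toy dataframe. The column names, the choice of `keep="first"`, and the regular expression are assumptions: the pattern accepts the standard London postcode areas followed by a single digit from 1 to 9, which may not match the exact filter used in the notebook.

```python
import pandas as pd

# London postcode area, then one digit 1-9 that is not followed by another
# digit, so E1 and EC1A pass while E14 and CR0 do not.
INNER_LONDON = r"^(?:EC|WC|NW|SE|SW|E|N|W)[1-9](?!\d)"

# Invented records; the repeated Block A row mimics an amended permission.
records = pd.DataFrame({
    "development": ["Block A", "Block A", "Block B", "Block C", "Block D"],
    "postcode": ["E1 6AN", "E1 6AN", "E14 9SH", "CR0 2XX", "SW8 5BN"],
    "units": [150, 150, 200, 120, 220],
})

# One row per development; keeping the first occurrence is an arbitrary choice.
deduped = records.drop_duplicates(subset=["development", "postcode"], keep="first")

# str.match anchors at the start of each postcode; missing values count as no match.
inner = deduped[deduped["postcode"].str.match(INNER_LONDON, na=False)]
print(inner)
```

If amendments carry a date, sorting by that date before dropping duplicates would make it possible to keep the latest version instead of the first.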

Another problem is that some developments lack a full postcode. Without one, the blocks cannot be mapped, so we will remove these rows.
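
One way to drop the incomplete postcodes is to keep only values that look like a full UK postcode (outward code plus inward code). The pattern below is a simplified approximation, not the official specification, and the column name is an assumption.

```python
import pandas as pd

# Simplified shape of a full postcode: area letters, district digit,
# optional extra character, then the inward code (digit + two letters).
FULL_POSTCODE = r"^[A-Z]{1,2}[0-9][A-Z0-9]?\s*[0-9][A-Z]{2}$"

# Invented values, including partial and missing postcodes.
df = pd.DataFrame({"postcode": ["E1 6AN", "SW8", "N1 9", None, "ec1a 1bb"]})

# Normalise whitespace and case before testing the pattern.
df = df.assign(postcode=df["postcode"].str.strip().str.upper())

# Missing values are treated as incomplete (na=False).
complete = df[df["postcode"].str.match(FULL_POSTCODE, na=False)]
print(complete)
```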

This is our final clean list of inner London developments. We now need to import a dataframe containing latitude and longitude coordinates and then merge the two dataframes on the postcode field.
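
The merge could be sketched as follows. Both dataframes are toy stand-ins with rough, illustrative coordinates; the column names `Postcode`, `Latitude` and `Longitude` are assumed for the `Geospatial_Coordinates.xlsx` sheet, which would normally be loaded with `pd.read_excel`.

```python
import pandas as pd

# Stand-in for the cleaned developments dataframe.
developments = pd.DataFrame({
    "postcode": ["E1 6AN", "SW8 5BN"],
    "units": [150, 220],
})

# Stand-in for the coordinates spreadsheet (illustrative values only).
coords = pd.DataFrame({
    "Postcode": ["E1 6AN", "SW8 5BN", "N1 9GU"],
    "Latitude": [51.52, 51.48, 51.53],
    "Longitude": [-0.07, -0.14, -0.12],
})

# Left join keeps every development; validate raises an error if a
# postcode appears more than once in the coordinates table.
merged = developments.merge(
    coords,
    left_on="postcode",
    right_on="Postcode",
    how="left",
    validate="many_to_one",
).drop(columns="Postcode")

# Developments without coordinates would show up here.
missing = merged[merged["Latitude"].isna()]
print(merged)
```

A left join is used so that a development with no matching coordinates stays visible as a row with missing values rather than silently disappearing.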

I'll then use Folium to create a map of London and plot the developments using their latitude and longitude coordinates.



___
## 5. Discussion <a name="discussion"></a>

London is a big city with a significant number of large residential development projects completed in the past three years. I used some of the more basic data science techniques to cleanse and refine the data and decided against using machine learning techniques, as they weren't required for the business problem I was trying to solve.

For the analysis, I added the longitude and latitude coordinates as static data on GitHub. In future studies, this data could instead be retrieved dynamically from a geocoding platform or package.

I experienced significant environment issues with this project and gave up on identifying coffee shops in the vicinity using the Foursquare API, which is a shame, as I had enjoyed using this API successfully in the labs.

I ended the study by visualizing the data on a map of London.

___
## 6. Conclusion <a name="conclusion"></a>

There are eight developments that we should investigate further as potential locations for a new coffee shop.

___