# Where to open a restaurant in Glasgow

## Introduction / Business Problem 

<p>Glasgow is the largest and most populous city in Scotland. </p>

<p>It is known for its vibrant character and welcomes new businesses, restaurants in particular. As it is touted as a business capital of Scotland, there is no shortage of tourists and business visitors. </p>

<p>However, not all constituencies have quality restaurants. This is an attempt to find out where restaurants are available across the Glasgow constituencies, to work out the ratio of restaurants to population, and to suggest which constituency would be the better place to open a restaurant in Glasgow. </p>

<img src="https://kali-capstone-assignment.s3.eu-gb.cloud-object-storage.appdomain.cloud/Glasgow.PNG" alt="Glasgow"></img>

## Data Section

<p> I found the postcodes of Scotland in CSV format at the following location: </p>
<a href="https://www.doogal.co.uk/PostcodeDownloads.php">Postal Codes in UK</a>

<p> I have uploaded the CSV to IBM Cloud Object Storage, as I am going to use IBM Watson Studio for this exercise. </p>
<p> Here is the link to the file in Object Storage: </p>
<a href="https://kali-capstone-assignment.s3.eu-gb.cloud-object-storage.appdomain.cloud/scotland.csv">https://kali-capstone-assignment.s3.eu-gb.cloud-object-storage.appdomain.cloud/scotland.csv</a>


<p> Let's explore the data and see how it can be used for this purpose. I will start by loading the data into a Pandas DataFrame. </p>

In [62]:
import pandas as pd
import numpy as np

In [63]:
scotland_df = pd.read_csv("https://kali-capstone-assignment.s3.eu-gb.cloud-object-storage.appdomain.cloud/scotland.csv")
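
Wide CSV files with sparsely populated columns can make `read_csv` guess mixed types and slow down the load. A minimal sketch, assuming only the columns needed later in this analysis are loaded: the small in-memory CSV, `sample_csv`, `wanted_columns` and `sample_df` are illustrative names with made-up rows, not part of the original notebook.

```python
import io

import pandas as pd

# Tiny stand-in for scotland.csv; the rows are made up for illustration only.
sample_csv = io.StringIO(
    "Postcode,In Use?,Latitude,Longitude,District,Constituency,Population\n"
    "G1 1AA,Yes,55.8600,-4.2500,Glasgow City,Glasgow Central,12\n"
    "G1 1AB,No,55.8610,-4.2510,Glasgow City,Glasgow Central,\n"
    "AB1 0AA,No,57.1015,-2.2429,Aberdeen City,Aberdeen South,\n"
)

# Columns used by the later filtering and aggregation steps.
wanted_columns = [
    "Postcode", "In Use?", "Latitude", "Longitude",
    "District", "Constituency", "Population",
]

sample_df = pd.read_csv(
    sample_csv,
    usecols=wanted_columns,   # skip the columns that are never used
    dtype={"Postcode": str},  # pin a text column instead of letting pandas guess
    low_memory=False,         # infer each column's type from the whole file
)
```

With the real file, the same keyword arguments could be passed alongside the Object Storage URL. Whether that is worthwhile depends on how many of the 47 columns are needed later.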



### Print the data to see how it appears

In [64]:
scotland_df.head(5)

Unnamed: 0,Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,...,User Type,Last updated,Nearest station,Distance to station,Postcode area,Postcode district,Police force,Water company,Plus Code,Average Income
0,AB1 0AA,No,57.101474,-2.242851,385386.0,801193.0,NJ853011,,Aberdeen City,Lower Deeside,...,0,2020-02-19,Portlethen,8.31408,AB,AB1,Scotland,Scottish Water,9C9V4Q24+HV,
1,AB1 0AB,No,57.102554,-2.246308,385177.0,801314.0,NJ851013,,Aberdeen City,Lower Deeside,...,0,2020-02-19,Portlethen,8.55457,AB,AB1,Scotland,Scottish Water,9C9V4Q33+2F,
2,AB1 0AD,No,57.100556,-2.248342,385053.0,801092.0,NJ850010,,Aberdeen City,Lower Deeside,...,0,2020-02-19,Portlethen,8.54352,AB,AB1,Scotland,Scottish Water,9C9V4Q22+6M,
3,AB1 0AE,No,57.084444,-2.255708,384600.0,799300.0,NO845992,,Aberdeenshire,North Kincardine,...,0,2020-02-19,Portlethen,8.20809,AB,AB1,Scotland,Scottish Water,9C9V3PMV+QP,
4,AB1 0AF,No,57.096656,-2.258102,384460.0,800660.0,NJ844006,,Aberdeen City,Lower Deeside,...,1,2020-02-19,Portlethen,8.85583,AB,AB1,Scotland,Scottish Water,9C9V3PWR+MQ,


In [66]:
scotland_df.shape

(224804, 47)

In [67]:
scotland_df.dtypes

Postcode                           object
In Use?                            object
Latitude                          float64
Longitude                         float64
Easting                           float64
Northing                          float64
Grid Ref                           object
County                            float64
District                           object
Ward                               object
District Code                      object
Ward Code                          object
Country                            object
County Code                        object
Constituency                       object
Introduced                         object
Terminated                         object
Parish                            float64
National Park                      object
Population                        float64
Households                        float64
Built up area                     float64
Built up sub-division             float64
Lower layer super output area     

#### As the data contains very granular detail, we need to narrow it down further

<ul>
<li>First, extract the Glasgow-only data from the Scotland dataset.</li>
<li>This can be done using the District column. Let's see which values the District column takes.</li>
</ul>

In [69]:
scotland_df['District'].value_counts()

Glasgow City             25406
Aberdeenshire            21424
City of Edinburgh        18948
Aberdeen City            14307
Fife                     12705
South Lanarkshire        10557
Highland                 10165
North Lanarkshire         9739
Dumfries and Galloway     7851
Perth and Kinross         6890
Moray                     6626
Renfrewshire              6191
Dundee City               6106
Scottish Borders          5176
West Lothian              5092
Falkirk                   5069
North Ayrshire            4995
East Ayrshire             4745
Angus                     4540
South Ayrshire            4454
East Dunbartonshire       4256
Argyll and Bute           4136
Stirling                  3709
East Lothian              3572
Midlothian                3367
East Renfrewshire         3132
West Dunbartonshire       3058
Inverclyde                3002
Na h-Eileanan Siar        2003
Clackmannanshire          1595
Orkney Islands             831
Shetland Islands           749
Name: Di

##### Let's obtain the dataframe for Glasgow only

In [70]:
glasgow_only = scotland_df[scotland_df['District'] == 'Glasgow City']

In [71]:
glasgow_only.shape

(25406, 47)

###### Further data processing steps

<ul>
<li> Keep only the rows where "In Use?" is "Yes" </li>
<li> Take only the columns Constituency, Latitude, Longitude, Population </li>
<li> Group by Constituency and create a new dataframe that holds the aggregated values </li>
</ul>

In [72]:
glasgow_only_active = glasgow_only[ glasgow_only["In Use?"] == 'Yes']
glasgow_only_active.shape

(15413, 47)

In [73]:
glasgow_working_df = glasgow_only_active[["Constituency","Latitude","Longitude", "Population"]].reset_index()
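
The district filter, the "In Use?" filter and the column selection above can also be combined into one `.loc` call. This is a sketch on a made-up three-row frame (`toy`, `mask`, `cols` and `working` are illustrative names), not a replacement for the cells in this notebook.

```python
import pandas as pd

# Made-up rows that mimic the relevant columns of the postcode data.
toy = pd.DataFrame({
    "District": ["Glasgow City", "Glasgow City", "Fife"],
    "In Use?": ["Yes", "No", "Yes"],
    "Constituency": ["Glasgow East", "Glasgow East", "Glenrothes"],
    "Latitude": [55.85, 55.86, 56.20],
    "Longitude": [-4.20, -4.21, -3.17],
    "Population": [40.0, None, 25.0],
})

# Both row conditions in a single boolean mask.
mask = (toy["District"] == "Glasgow City") & (toy["In Use?"] == "Yes")
cols = ["Constituency", "Latitude", "Longitude", "Population"]

# Select rows and columns together; drop=True discards the old row labels.
working = toy.loc[mask, cols].reset_index(drop=True)
```

Note that `reset_index()` without `drop=True`, as used in the cell above, keeps the old row labels as an extra `index` column. That is harmless here because only the named columns are aggregated afterwards.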

In [75]:
glasgow_cons_population = glasgow_working_df.groupby("Constituency")["Population"].sum().reset_index()

In [76]:
glasgow_cons_latitude = glasgow_working_df.groupby("Constituency")["Latitude"].max().reset_index()

In [77]:
glasgow_cons_longitude = glasgow_working_df.groupby("Constituency")["Longitude"].min().reset_index()

In [78]:
glasgow_cons_latlong = pd.merge(glasgow_cons_latitude, glasgow_cons_longitude, on="Constituency")

In [80]:
glasgow_df = pd.merge(glasgow_cons_latlong, glasgow_cons_population, on="Constituency")

In [81]:
glasgow_df.head(10)

Unnamed: 0,Constituency,Latitude,Longitude,Population
0,Glasgow Central,55.871472,-4.31386,88872.0
1,Glasgow East,55.883648,-4.222633,86052.0
2,Glasgow North,55.92641,-4.322833,71435.0
3,Glasgow North East,55.917998,-4.271293,81568.0
4,Glasgow North West,55.918846,-4.387423,84955.0
5,Glasgow South,55.84389,-4.335717,87132.0
6,Glasgow South West,55.868672,-4.379913,83174.0
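
The three separate group-bys and two merges used to build this table can be sketched as a single named aggregation. The frame below is made-up toy data (`toy` and `summary` are illustrative names); the aggregation functions mirror the ones used above (maximum latitude, minimum longitude, summed population).

```python
import pandas as pd

# Made-up postcode-level rows for two fictional constituencies.
toy = pd.DataFrame({
    "Constituency": ["A", "A", "B", "B"],
    "Latitude": [55.80, 55.90, 55.70, 55.75],
    "Longitude": [-4.30, -4.20, -4.40, -4.35],
    "Population": [10.0, 20.0, 5.0, 7.0],
})

# One pass over the groups, one output column per named aggregation.
summary = (
    toy.groupby("Constituency")
    .agg(
        Latitude=("Latitude", "max"),
        Longitude=("Longitude", "min"),
        Population=("Population", "sum"),
    )
    .reset_index()
)
```

One design point to keep in mind: the maximum latitude paired with the minimum longitude marks the north-west corner of each constituency's bounding box rather than its centre. If a central point suits the later venue search better, swapping both functions for `"mean"` would give a rough centroid of the postcodes.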


#### This is the data that will be used with the Foursquare API to explore venues in each of these constituencies. Clustering will then be applied to the venues to see where restaurants rank in each constituency. This will help decide which areas, if any, are good candidates for opening a restaurant.
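
As a rough sketch of the restaurant-to-population ratio mentioned in the introduction: the populations below are taken from the table above, while the restaurant counts are HYPOTHETICAL placeholders, since no venue data has been collected at this stage. The names `restaurant_counts` and `ratio_df` are illustrative.

```python
import pandas as pd

# Populations copied from the aggregated table (first three constituencies).
glasgow_population = pd.DataFrame({
    "Constituency": ["Glasgow Central", "Glasgow East", "Glasgow North"],
    "Population": [88872.0, 86052.0, 71435.0],
})

# HYPOTHETICAL restaurant counts, standing in for results of a venue search.
restaurant_counts = pd.DataFrame({
    "Constituency": ["Glasgow Central", "Glasgow East", "Glasgow North"],
    "Restaurants": [120, 30, 45],
})

# A left join keeps every constituency, even one with no restaurants found.
ratio_df = glasgow_population.merge(restaurant_counts, on="Constituency", how="left")
ratio_df["Restaurants"] = ratio_df["Restaurants"].fillna(0)

# Restaurants per 10,000 residents; a lower value suggests an under-served area.
ratio_df["Restaurants per 10k"] = (
    ratio_df["Restaurants"] / ratio_df["Population"] * 10_000
)
ratio_df = ratio_df.sort_values("Restaurants per 10k").reset_index(drop=True)
```

With real counts, the constituencies at the top of `ratio_df` would be the first candidates to look at more closely.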

### This is the End of Week 4 submission of Capstone