# Capstone Project - Compatible Neighborhoods for Indian Restaurants
### Prakirth Govardhanam
### Applied Data Science Capstone by IBM/Coursera

## Introduction/Business-Problem

In this project, I look for potentially beneficial locations within the Neighborhoods of Helsinki, Finland, for establishing a chain of **Indian Restaurants**. The conditions to fulfill, in order, are:
* CONDITION 1 - Distance from the **_Popularity Centre (Assumption)_** of the Neighborhood - for popularity
* CONDITION 2 - Absence of other **Indian restaurants** in the Neighborhood - to limit competition
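
As a rough illustration of how the two conditions could be checked later on, here is a minimal sketch. The helper names `haversine_km` and `has_competitor` are hypothetical, and the category label `"Indian Restaurant"` is an assumption about how venues are tagged in the venue data; adjust it to whatever the actual data contains.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def has_competitor(venue_categories, target="Indian Restaurant"):
    """True if any venue category in the Neighborhood equals the target (assumed label)."""
    return any(category == target for category in venue_categories)
```

CONDITION 1 would then compare `haversine_km(candidate, popularity_centre)` across candidate locations, and CONDITION 2 would keep only the Neighborhoods where `has_competitor(...)` is `False`.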

## Data

The Neighborhoods within the city of Helsinki are determined from the following data sources:
* **Wikipedia** _(https://en.wikipedia.org/wiki/Subdivisions_of_Helsinki)_ - for listing the Neighborhoods of Helsinki
* **The City of Helsinki** _(https://kartta.hel.fi/avoindata)_ - for the districts' labels and geospatial data
* **Foursquare API** - for popular venues, restaurants and their respective geospatial data


### _Project Assumption_

* **_Popularity Centre_** = the centroid of the top-10 venues (ranked by rating) in a Neighborhood is taken as that Neighborhood's "popularity centre"
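
A minimal sketch of this assumption, under the hypothetical premise that each venue is a dict with `rating`, `lat` and `lng` keys (the real Foursquare response will need to be flattened into this shape first). The centroid is taken as the plain arithmetic mean of latitude and longitude, which should be a reasonable approximation at Neighborhood scale.

```python
def popularity_centre(venues, top_n=10):
    """Return the (lat, lng) centroid of the top_n highest-rated venues, or None if no venue is rated."""
    # keep only venues that actually carry a rating
    rated = [v for v in venues if v.get("rating") is not None]
    # highest ratings first, then keep the top_n
    top = sorted(rated, key=lambda v: v["rating"], reverse=True)[:top_n]
    if not top:
        return None
    lat = sum(v["lat"] for v in top) / len(top)
    lng = sum(v["lng"] for v in top) / len(top)
    return lat, lng
```

A Neighborhood with fewer than ten rated venues simply uses all of them; whether that is acceptable is a design choice left open here.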

# PART 1 - Data Preparation

## Part 1.1 - Data Extraction

### Import necessary libraries

In [6]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

### Clarification:
* Places in Finland have names in two languages, **_Finnish & Swedish_**
* Hence, the names of Neighborhoods & Districts follow the same pattern: **_Postal-Code Finnish-name (Swedish-name)_**

### _Assumption_ #1
* In the extracted labels data, _Finnish-names_ are **available for every place**, whereas _Swedish-names_ are **not**.
* Hence, we will extract and work only with the _Finnish-names_ of the Neighborhoods & Districts

### _Straightforward approach for Finnish District/Neighborhood names_

In [30]:
url = 'https://en.wikipedia.org/wiki/Names_of_places_in_Finland_in_Finnish_and_in_Swedish#Municipalities'

html = requests.get(url).text
soup = BeautifulSoup(html, features='html.parser')

# Select Helsinki Districts div/span-tag
# Extract a-href tag titles --> only gives Finnish names
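
The two commented steps above can be sketched as follows. This is only an illustration: the HTML fragment is a hypothetical stand-in for the real page markup (which has not been inspected here), and the standard-library `HTMLParser` is used instead of BeautifulSoup so the example runs without extra dependencies.

```python
from html.parser import HTMLParser


class AnchorTitleCollector(HTMLParser):
    """Collect the `title` attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)


# Hypothetical fragment: the link title carries the Finnish name, the Swedish name is plain text
SAMPLE_HTML = (
    '<ul>'
    '<li><a href="/wiki/Kruununhaka" title="Kruununhaka">Kruununhaka</a> (Kronohagen)</li>'
    '<li><a href="/wiki/Kluuvi" title="Kluuvi">Kluuvi</a> (Gloet)</li>'
    '</ul>'
)

collector = AnchorTitleCollector()
collector.feed(SAMPLE_HTML)
print(collector.titles)
```

With the `soup` object from the cell above, a comparable call would be `[a["title"] for a in soup.find_all("a", title=True)]`, restricted to the section that lists the Helsinki districts.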

### _Roundabout and Complex Approach_

In [7]:
# url for Labels of Helsinki neighborhoods from Wikipedia page
url = 'https://en.wikipedia.org/wiki/Subdivisions_of_Helsinki'

#scraping html data using requests and BeautifulSoup
html = requests.get(url).text
soup = BeautifulSoup(html, features='html.parser')

#extracting labels without html-tags and separating with ','
labels = [label.get_text(",", strip=True) for label in soup.find_all(class_ = 'div-col columns column-width')]

In [16]:
#splitting the neighborhood and district labels on the comma separator
hoods = re.split(r",", labels[0])
dists = re.split(r",", labels[1])

In [9]:
print(f"List of Neighborhoods:\n{hoods[0:9]}\nList of Districts:\n{dists[0:12]}")

List of Neighborhoods:
['01', 'Kruununhaka', '(Kronohagen)', '02', 'Kluuvi', '(Gloet)', '03', 'Kaartinkaupunki', '(Gardestaden)']
List of Districts:
['1', 'Helsinki southern major district', '101', 'Vironniemi', '(', 'Estnäs', ')', '102', 'Ullanlinna', '(', 'Ulrikasborg', ')']


In [10]:
print(hoods[0:10])

['01', 'Kruununhaka', '(Kronohagen)', '02', 'Kluuvi', '(Gloet)', '03', 'Kaartinkaupunki', '(Gardestaden)', '04']


In [11]:
#extracting codes
scraped_codes = []
for num in hoods:
    scraped_codes.append(re.findall(r"(\d+)", num))
codes = [code for scraped_code in scraped_codes for code in scraped_code if code != '']
print(codes)
# NOTE: the codes are not needed in the later steps

# TODO: extract the Finnish-names
# Idea: split each neighborhood into its own array -> np.array_split(hoods, len(hoods) // 3, axis=0)

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '102', '11', '111', '112', '113', '12', '121', '122', '13', '14', '15', '16', '161', '1', '17', '171', '172', '173', '174', '18', '19', '20', '201', '202', '203', '204', '21', '22', '23', '231', '232', '24', '25', '26', '27', '28', '281', '282', '283', '284', '285', '286', '287', '29', '291', '292', '293', '294', '30', '301', '302', '303', '304', '305', '306', '31', '32', '33', '331', '332', '333', '334', '335', '34', '341', '342', '35', '351', '352', '353', '354', '36', '361', '362', '363', '364', '37', '38', '381', '382', '383', '384', '385', '386', '39', '391', '392', '40', '401', '402', '403', '41', '411', '412', '413', '414', '42', '43', '431', '432', '433', '434', '44', '45', '451', '452', '453', '454', '455', '456', '457', '46', '461', '462', '463', '464', '465', '47', '471', '472', '473', '474', '475', '48', '49', '491', '492', '493', '494', '495', '50', '51', '52', '53', '531', '532', '533', '54', '541', '542', '543'

In [12]:
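# collect every run of letters (including Finnish/Swedish characters) from each token
scraped_names = []
for name in hoods:
    scraped_names.append(re.findall(r"[a-zA-ZÄäÖöÅå\s-]+", name))
print(scraped_names[:10])
# flatten the list of lists; empty lists (from code-only tokens) drop out automatically
names = [name for scraped_name in scraped_names for name in scraped_name]
print(names[:10])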

[[], ['Kruununhaka'], ['Kronohagen'], [], ['Kluuvi'], ['Gloet'], [], ['Kaartinkaupunki'], ['Gardestaden'], []]
['Kruununhaka', 'Kronohagen', 'Kluuvi', 'Gloet', 'Kaartinkaupunki', 'Gardestaden', 'Kamppi', 'Kampen', 'Punavuori', 'Rödbergen']
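
The flattened list above still interleaves Finnish and Swedish names, while Assumption #1 calls for the Finnish ones only. A possible way to keep just those is sketched below; `finnish_names` is a hypothetical helper that skips numeric codes and anything inside parentheses, which also handles the district tokens where `(` and `)` arrive as separate items.

```python
def finnish_names(tokens):
    """Keep tokens that are neither numeric codes nor inside parentheses (the Swedish names)."""
    names = []
    depth = 0  # > 0 while we are between a lone "(" token and its closing ")"
    for token in tokens:
        token = token.strip()
        if token.isdigit():
            continue  # postal/district code
        opens, closes = token.count("("), token.count(")")
        if token and depth == 0 and opens == 0 and closes == 0:
            names.append(token)
        depth = max(depth + opens - closes, 0)
    return names


hoods_sample = ['01', 'Kruununhaka', '(Kronohagen)', '02', 'Kluuvi', '(Gloet)']
dists_sample = ['101', 'Vironniemi', '(', 'Estnäs', ')', '102', 'Ullanlinna', '(', 'Ulrikasborg', ')']
print(finnish_names(hoods_sample))
print(finnish_names(dists_sample))
```

Note that headings such as 'Helsinki southern major district' are not in parentheses, so they would pass through this filter and need separate handling.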


In [14]:
# Test run: extract coordinates from the District/Neighborhood names

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent='Helsinki_zipcodes')
for name in names[:10]:
    print(geolocator.geocode(name).latitude)

60.1728702
60.1728702
60.1707783
60.1707783
60.1652138


AttributeError: 'NoneType' object has no attribute 'latitude'
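
The `AttributeError` above occurs because `geolocator.geocode(...)` returns `None` when it finds no match, so `.latitude` is read from `None`. A guarded version could look like the sketch below. The helper `geocode_names` and the `", Helsinki, Finland"` query suffix are assumptions added for illustration (the suffix is meant to reduce ambiguous matches), and a stand-in geocoder with dummy coordinates is used so the example runs offline.

```python
from types import SimpleNamespace


def geocode_names(names, geocode, suffix=", Helsinki, Finland"):
    """Map each name to (latitude, longitude), or to None when the geocoder finds nothing."""
    coords = {}
    for name in names:
        location = geocode(f"{name}{suffix}")  # may be None when there is no match
        coords[name] = None if location is None else (location.latitude, location.longitude)
    return coords


def fake_geocode(query):
    """Offline stand-in for a real geocoder; the coordinates are dummy values."""
    if query.startswith("Kruununhaka"):
        return SimpleNamespace(latitude=1.0, longitude=2.0)
    return None


coords = geocode_names(["Kruununhaka", "Gloet"], fake_geocode)
print(coords)
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`; names that come back as `None` can then be reviewed or retried instead of stopping the loop.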

In [None]:
#https://gis.stackexchange.com/questions/342855/reading-geopackage-geometries-in-python
import geopandas as gpd
data = gpd.read_file("path.mygeopackage.gpkg")
data.head() 
