# Data Ninja - Secret places to open a restaurant (Part I)
**Fernanda Oliveira**  
Data Scientist

## Table of contents
1. [Introduction](#introduction)
2. [Data acquisition](#data)
3. [Methodology](#methodology)
4. [Analysis](#Analysis)
5. [Results and Discussion](#results)
6. [Conclusion](#conclusion)

## 1. Introduction  <a name="introduction"></a>

### 1.1 Background

Berlin is well known as a cosmopolitan city where you can find people from all around the world. It offers a wide commercial variety, especially in gastronomy. One trend that has come to stay is Asian restaurants, particularly Japanese restaurants. Although there are already many of them spread across the city, new ones open all the time. Analyzing the locations, types, and number of these restaurants is therefore valuable for anyone who wants to open a new restaurant in the city.

### 1.2 Problem

Finding an optimal location to open a Japanese restaurant in Berlin can be challenging. One might think that the best location is a place where there is no Japanese restaurant yet. The problem is that most interested customers, instead of going to an isolated neighborhood, probably prefer a popular neighborhood with more options and more foot traffic. Competition in these areas is strong, but the flow of interested customers is correspondingly high. Many people, for example, go to a specific Japanese restaurant on the weekend and find a long line when they arrive. This often happens because some popular restaurants in Berlin have adopted a new trend of not taking reservations. The good news is that some of these customers, those who do not want to wait too long in line, might look for similar options in the neighborhood.

### 1.3 Interest

This project is aimed at a person or a business interested in opening a Japanese restaurant.

In [40]:
import numpy as np # library to handle data in a vectorized manner

In [41]:
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files


In [42]:
import requests # library to handle requests

In [43]:
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [44]:
#!pip install seaborn
import seaborn as sns

In [45]:
import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter  
from matplotlib.ticker import PercentFormatter  

In [46]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [47]:
import geocoder
# from geopy.geocoders import Nominatim
# !pip install pygeocoder
# from pygeocoder import Geocoder

In [48]:
#!conda install -c conda-forge geopy --yes 
#from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [49]:
#!conda install -c conda-forge folium=0.5.0 --yes 
#import folium # map rendering library

In [50]:
#print('Libraries imported.')

## 2. Data acquisition <a name="data"></a>

### 2.1 Data source 

The data and tools that I will use are the following:

* **Foursquare API** to select the number of restaurants and their location in some neighborhoods of Berlin
* **Geocoder** to get the latitudes and longitudes of places to rent, together with information from https://www.sebuyo.com

### 2.2 Feature selection

* I will first create a dataset through the Foursquare API, collecting several attributes of each venue: ID, name, category (Japanese restaurant), latitude, longitude, neighborhood, and distance (in meters) to Charlottenburg, a borough of Berlin that is well known for its Japanese restaurants. Then I will repeat the search with the Foursquare API for the public transportation categories, city train (S-Bahn) and metro (U-Bahn), in Berlin.

* I will save the data collected using Foursquare API to a CSV file and then read them with `Pandas`.

* Then, I will create another dataset with information about places available to rent in Berlin. First, I will create the features "postal code" and "price" for these places, and then, with the help of Geocoder, I will obtain the latitude and longitude features. Finally, I will save the result to a CSV file and read it with `Pandas`.
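
The "distance (in meters)" feature in the list above comes straight from the API. To illustrate how a distance between two coordinate pairs can be computed, here is a minimal haversine sketch. The function name `haversine_m` and the Earth radius constant are assumptions made for this illustration; this is not necessarily how Foursquare derives its own value.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Search center used in this notebook (center of Charlottenburg)
charlottenburg = (52.50333132, 13.308665432)
# Alternative coordinate pair that appears in the notebook's comments
other_point = (52.520008, 13.404954)

distance_m = haversine_m(*charlottenburg, *other_point)
print(round(distance_m), "m")
```

A spherical model like this is usually accurate enough for ranking venues by distance within a city.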

In [51]:
CLIENT_ID = 'myclientID' # your Foursquare ID
CLIENT_SECRET = 'myclientsecret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [52]:
# define Berlin's geolocation coordinates (center of Charlottenburg)
berlin_latitude = 52.50333132 #52.520008
berlin_longitude = 13.308665432 #13.404954

In [53]:
LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 15000 # define radius
category = '4bf58dd8d48988d111941735' # Japanese restaurants 
#category = '4bf58dd8d48988d1fc931735' # S-Bahnhof
#category = '4bf58dd8d48988d1fd931735' #U-Bahnhof

In [54]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    berlin_latitude,
    berlin_longitude,
    category,
    radius,
    LIMIT)
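
As an alternative to the positional `format` call above, the query string can be assembled from a dictionary so that each value sits next to its parameter name. The sketch below uses only the standard library; the helper name `build_explore_url` is hypothetical, and the credentials are the same placeholders used in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius, limit):
    """Build a venues/explore URL from named parameters."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes values, so the comma in `ll` becomes %2C
    return base + '?' + urlencode(params)

example_url = build_explore_url(
    'myclientID', 'myclientsecret', '20180605',
    52.50333132, 13.308665432,
    '4bf58dd8d48988d111941735', 15000, 500)
print(example_url)
```

Naming each parameter makes it harder to swap two values by accident, which is easy to do when eight positional arguments fill eight `{}` slots.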

In [55]:
results = requests.get(url).json()
results.values();
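
The following cells index directly into `results['response']['groups'][0]['items']`, which raises an error if the request failed or returned no groups. Below is a minimal defensive sketch, assuming the parsed response carries a `meta.code` status field next to `response`. The helper name `extract_items` and the sample payload are made up for illustration and are not real API output.

```python
def extract_items(payload):
    """Return the venue items from a parsed explore response, or [] if none."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise ValueError('request failed with meta.code={}'.format(code))
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

# Hand-made payload that mimics the structure accessed in this notebook
sample_ok = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Example Sushi'}}]}]},
}

items = extract_items(sample_ok)
print(len(items), items[0]['venue']['name'])
```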

In [56]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [57]:
venues = results['response']['groups'][0]['items']
# formatted address lines and distance (in meters) for each venue
venues_neighborhood = [item['venue']['location']['formattedAddress'] for item in venues]
venues_distance = [item['venue']['location']['distance'] for item in venues]
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# use the first line of each formatted address as the neighborhood label
nearby_venues['neighborhood'] = [address[0] for address in venues_neighborhood]
nearby_venues['distance [m]'] = venues_distance

df = nearby_venues
df.head()

import os

outname = 'japanesecategory.csv'

outdir = '/'  # output directory; adjust to a writable location
os.makedirs(outdir, exist_ok=True)  # create the directory if it does not exist

fullname = os.path.join(outdir, outname)    

df.to_csv(fullname)
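
The column names filtered in the cell above, such as `venue.location.lat`, come from flattening the nested JSON: each nested key is joined to its parent with a dot. The sketch below imitates that naming scheme with a small hypothetical helper, `flatten`, on a hand-made item. It is only meant to show where the dotted names come from and is not a reimplementation of `json_normalize`.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as-is
    return flat

# Hand-made item with the same nesting as the fields used in this notebook
item = {
    'venue': {
        'id': 'abc123',
        'name': 'Example Sushi',
        'categories': [{'name': 'Japanese Restaurant'}],
        'location': {'lat': 52.5, 'lng': 13.3},
    }
}

flat_item = flatten(item)
print(sorted(flat_item))
```

Note that `venue.categories` stays a list after flattening, which is why the notebook needs `get_category_type` to pull out the category name.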

Using the website https://www.sebuyo.com, I searched for the prices and postal codes of the places available to rent in Berlin.

In [58]:
df_rent = pd.DataFrame({
    'Postcode': [10247, 10777, 10713, 10719, 12359, 12057, 10785, 12043, 13595, 12053,
                 10435, 10119, 10245, 13597, 12347, 10115, 10717, 13585, 12057, 16727],
    'Price': [2400, 1142.36, 3269, 5900, 300, 400, 3900, 10000, 0, 1600,
              2500, 3000, 1095, 0, 1000, 0, 2700, 570, 400, 0],
})

In [59]:
df_rent.head()

| Unnamed: 0 | Postcode | Price |
|---|---|---|
| 0 | 10247 | 2400.0 |
| 1 | 10777 | 1142.36 |
| 2 | 10713 | 3269.0 |
| 3 | 10719 | 5900.0 |
| 4 | 12359 | 300.0 |


Using Geocoder, I found the latitude and longitude corresponding to each postal code.

In [62]:
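def get_latlng(postal_code):
    """Return [latitude, longitude] for a postal code using the ArcGIS geocoder."""
    latlng_coords = None
    # retry until the geocoder returns coordinates
    while latlng_coords is None:
        g = geocoder.arcgis('{}, Berlin, Berlin'.format(postal_code))
        latlng_coords = g.latlng
    return latlng_coords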
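
The loop in `get_latlng` keeps retrying for as long as the geocoder returns nothing, so it never finishes if a postal code cannot be resolved. Below is a minimal sketch of a bounded variant. The names `geocode_with_retry`, `geocode_fn`, `max_attempts`, and `delay_s` are hypothetical, and a stand-in geocoder replaces the real service so the example runs offline.

```python
import time

def geocode_with_retry(geocode_fn, query, max_attempts=5, delay_s=1.0):
    """Call geocode_fn(query) until it returns coordinates or attempts run out."""
    for attempt in range(max_attempts):
        coords = geocode_fn(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_s)  # brief pause before the next attempt
    return None  # caller decides how to handle an unresolved query

# Stand-in geocoder: fails twice, then returns made-up coordinates
_responses = iter([None, None, [52.5, 13.4]])

def fake_geocoder(query):
    return next(_responses)

coords = geocode_with_retry(fake_geocoder, '10247, Berlin, Berlin',
                            max_attempts=5, delay_s=0)
print(coords)
```

With the library used in this notebook, `geocode_fn` could be something like `lambda q: geocoder.arcgis(q).latlng`; any rows that come back as `None` would then need to be handled before building the coordinates dataframe.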

In [63]:
codepost = df_rent['Postcode']
coords = [get_latlng(postal_code) 
          for postal_code 
          in codepost.tolist()]

In [64]:
df_coords = pd.DataFrame(coords, columns = ['Latitude', 'Longitude'])
df_rent['Latitude'] = df_coords['Latitude']
df_rent['Longitude'] = df_coords['Longitude']