# <font color=red>Battle of Neighborhoods _(Week 1)_</font>

## Description of the Problem and Discussion of the Background (Introduction Section)

### Prospects of a Lunch Restaurant, Close to Office Areas in Tokyo, Japan.  

Tokyo, where I am currently staying, is the most populous metropolitan area in the world. Currently ranked 3rd in the global economic power index, Tokyo is one of the best places to start a new business. 

During the daytime, especially in the morning and lunch hours, office areas provide huge opportunities for restaurants. Reasonably priced shops (about $\$8$ for a lunch meal) are usually full during the lunch hours (11 am -- 2 pm). Given this scenario, we will go through the benefits and pitfalls of opening a breakfast-and-lunch restaurant in a densely packed office area. The profit margin for a decent restaurant usually lies within the $15 - 20\%$ range, but it can go as high as $35\%$, as discussed [here](https://www.ichefpos.com/en-sg/blog/japanese-restaurants-profits). 
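
To make these margins concrete, here is a rough back-of-the-envelope sketch in Python. The \$8 meal price and the 15%, 20% and 35% margins come from the text above; the figure of 150 lunches served per day is purely a hypothetical assumption for illustration.

```python
# Back-of-the-envelope lunch profit estimate.
MEAL_PRICE_USD = 8.0          # from the text
MARGINS = (0.15, 0.20, 0.35)  # from the text
MEALS_PER_DAY = 150           # hypothetical assumption, not from the source


def daily_profit(price, margin, meals):
    """Profit per day = price * margin * number of meals sold."""
    return price * margin * meals


for margin in MARGINS:
    profit = daily_profit(MEAL_PRICE_USD, margin, MEALS_PER_DAY)
    print(f"margin {margin:.0%}: about ${profit:,.0f} per day")
```

Changing `MEALS_PER_DAY` shows how strongly the result depends on foot traffic, which is why the choice of location matters so much in this analysis.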

![Tokyo at Night](tokyo_night.jpg)



We will go through each step of this project and address them separately. This week I describe only the initial data preparation and the future steps needed to start the battle of neighborhoods in Tokyo. 

1. <font color=green> Obtain the Data </font> <br>

    1.a. Names of the 23 wards, with area and population, from web scraping <br>

    1.b. Obtain information about the best business districts. <br>
    
    1.c. Use Foursquare data to obtain information about restaurants. <br> 


2. <font color=green> Data Visualization and Some Simple Statistical Analysis. </font> 

3. <font color=green> Analysis Using Clustering, Especially K-Means Clustering. </font> <br>

    3.a. Choose the optimal number of clusters. <br>
    
    3.b. Visualization using a Choropleth Map <br>
    

4. <font color=green> Compare the Neighborhoods to Find the Best Place for Starting up a Restaurant. </font>   


5. <font color=green> Inference From these Results and related Conclusions. </font> <br>


<font color=orange>__Target Audience__</font>

1. Business people who want to invest in or open a restaurant. This analysis will be a comprehensive guide to starting or expanding restaurants that target the large pool of office workers in Tokyo during lunch hours. 
2. Freelancers who would love to have their own restaurant as a side business. This analysis will give an idea of how beneficial it is to open a restaurant and what the pros and cons of this business are. 
3. New graduates looking for a reasonable lunch/breakfast place close to the office. 
4. Budding data scientists who want to implement some of the most used exploratory data analysis techniques to obtain <br>
   the necessary data, analyze it and, finally, tell a story out of it. 

## Preparation for Data (Data Section)

### We Start Off by Importing Libraries 

In [2]:
import requests
import json

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

### First, Get The Names of Wards, Major Districts and Population from Wikipedia

In [3]:
response_obj = requests.get('https://en.wikipedia.org/wiki/Special_wards_of_Tokyo').text
print (type (response_obj))

<class 'str'>


In [4]:
soup = BeautifulSoup(response_obj,'lxml')
print (soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Special wards of Tokyo - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Special_wards_of_Tokyo","wgTitle":"Special wards of Tokyo","wgCurRevisionId":880357442,"wgRevisionId":880357442,"wgArticleId":296875,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["All articles with dead external links","Articles with dead external links from January 2018","Articles with permanently dead external links","Webarchive template wayback links","CS1 Japanese-language sources (ja)","Articles with short description","Articles containing Japan

In [5]:
Wards_Tokyo_Table = soup.find('table', {'class':'wikitable sortable'})
Wards_Tokyo_Table

<table class="wikitable sortable">
<tbody><tr>
<th>No.
</th>
<th class="unsortable">Flag
</th>
<th>Name
</th>
<th class="unsortable" width="55px">Kanji
</th>
<th>Population<br/>(as of October 2016<sup class="plainlinks noexcerpt noprint asof-tag update" style="display:none;"><a class="external text" href="//en.wikipedia.org/w/index.php?title=Special_wards_of_Tokyo&amp;action=edit">[update]</a></sup>)
</th>
<th>Density<br/><span style="font-size:90%;">(/km<sup>2</sup>)</span>
</th>
<th>Area<br/><span style="font-size:90%;">(km<sup>2</sup>)</span>
</th>
<th class="unsortable">Major districts
</th></tr>
<tr>
<td>01</td>
<td><a class="image" href="/wiki/File:Flag_of_Chiyoda,_Tokyo.svg"><img alt="Flag of Chiyoda, Tokyo.svg" class="thumbborder" data-file-height="540" data-file-width="810" decoding="async" height="33" src="//upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Flag_of_Chiyoda%2C_Tokyo.svg/50px-Flag_of_Chiyoda%2C_Tokyo.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thu

### Processing the Information From Wiki To Make Necessary Lists

In [6]:
Name=[]
Kanji = []
Pop = []
Density = []
num = []
flag = []
Area = []
Major_District = []

for row in Wards_Tokyo_Table.findAll("tr"):
    Ward = row.findAll('td')
    print (Ward)
    if len(Ward)==8: # only extract table body rows, not the heading
        print (Ward[0])
        # Column order: No., Flag, Name, Kanji, Population, Density, Area, Major districts
        num.append(Ward[0].find(text=True))
        flag.append(Ward[1].findAll('a')) # not used later
        Name.append(Ward[2])
        Kanji.append(Ward[3].find(text=True))
        Pop.append(Ward[4])
        # NOTE: index 5 is the Density (/km^2) column of the table; index 6 holds Area (km^2)
        Area.append(Ward[5].find(text=True))
        Major_District.append(Ward[7].find(text=True))
            
#print (Pop) 


#++++++++++++++++++++++++++++++++++++++++++++++
#+ Area 
#++++++++++++++++++++++++++++++++++++++++++++++

# the first element is read as '0' (a hidden padding span), so it is replaced with the true value 5100
Area = ['5100' if x=='0' else x for x in Area]
New_Area = []

# change the type of Area list 
for l in range(len(Area)):
    x=Area[l].replace(",","")
    print (x)
    New_Area.append(x)

New_Area=[int(s) for s in New_Area]

#print (New_Area) # the list elements are already in accordance with the table

#+++++++++++++++++++++++++++++++++++++++++++++++++++++
#+ Name of the Wards
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
#print (Name) # want to select only the title part

new_names = []
for n in range(len(Name)):
    print (Name[n])
    names = Name[n].findAll('a')
    new_names.append(names) 

print (new_names)

flat_new_names_list = [item for sublist in new_names for item in sublist]
print (flat_new_names_list)

Wards_names= []
#now 
for name_wards in flat_new_names_list:
        Wards_names.append(name_wards.get('title'))

print (Wards_names)

# replace the elements in the list that contains 'Tokyo' with only the ward names
replace_names={'Chiyoda, Tokyo':'Chiyoda', 'Chūō, Tokyo':'Chuo', 'Minato, Tokyo':'Minato', 
               'Sumida, Tokyo':'Sumida', 'Koto, Tokyo':'Koto', 'Ōta, Tokyo':'Ota', 'Nakano, Tokyo':'Nakano', 
               'Kita, Tokyo':'Kita', 'Arakawa, Tokyo':'Arakawa', 'Adachi, Tokyo':'Adachi', 'Edogawa, Tokyo':'Edogawa'}


Wards_names1 = [replace_names.get(n1,n1) for n1 in Wards_names]

#print (Wards_names1)

#+++++++++++++++++++++++++++++++++++++++++++++++++++++
#+ Population
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
# print (len(Pop))
# #print ((Pop[5].text))
population = []
for p in range(len(Pop)):
    print ((Pop[p]))
    pops = Pop[p].text[1:9]
    print (Pop[p].text[1:9])
    #populs = Pop[p].find('visibility:hidden;color:transparent;')
    population.append(pops) 
print (population)


New_population = []
for po in range(len(population)):
    xy=population[po].replace(",","")
    print (xy)
    New_population.append(xy)

New_population=[int(s1) for s1 in New_population]
# print (New_population)



#++++++++++++++++++++++++++++++++++++++++++++++++
#+ Major Districts
#++++++++++++++++++++++++++++++++++++++++++++++++

#print (Major_District)

replace_districts = {'Nagatachō':'Nagatacho', 'Hongō':'Hongo', 'Kinshichō':'Kinshicho', 'Ōmori': 'Omori', 
                     'Kōenji':'Koenji', 'Arakawa, Machiya, ':'Arakawa', 'Ayase, ':'Ayase', 'Kasai, Koiwa\n':'Kasai'}


Major_District_names1 = [replace_districts.get(n2,n2) for n2 in Major_District]
#print (Major_District_names1)


[]
[<td>01</td>, <td><a class="image" href="/wiki/File:Flag_of_Chiyoda,_Tokyo.svg"><img alt="Flag of Chiyoda, Tokyo.svg" class="thumbborder" data-file-height="540" data-file-width="810" decoding="async" height="33" src="//upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Flag_of_Chiyoda%2C_Tokyo.svg/50px-Flag_of_Chiyoda%2C_Tokyo.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Flag_of_Chiyoda%2C_Tokyo.svg/75px-Flag_of_Chiyoda%2C_Tokyo.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Flag_of_Chiyoda%2C_Tokyo.svg/100px-Flag_of_Chiyoda%2C_Tokyo.svg.png 2x" width="50"/></a></td>, <td><a href="/wiki/Chiyoda,_Tokyo" title="Chiyoda, Tokyo">Chiyoda</a></td>, <td>千代田区
</td>, <td><span style="visibility:hidden;color:transparent;">0</span><span style="visibility:hidden;color:transparent;">0</span>59,441</td>, <td><span style="visibility:hidden;color:transparent;">0</span>5,100</td>, <td><span style="visibility:hidden;color:transparent;">0</span>11.66
</td>, <t

#### Let's Make the Tokyo Ward and Population DataFrame 

In [7]:
df=pd.DataFrame(Wards_names1,columns=['Ward'])
df['Area_SqKm'] = New_Area
df['Population'] = New_population
df['Major_District'] = Major_District_names1
df.index = np.arange(1, len(df) + 1) # reset the index so that it starts from 1. 
#print (df)
df

| | Ward | Area_SqKm | Population | Major_District |
|---|---|---|---|---|
| 1 | Chiyoda | 5100 | 59441 | Nagatacho |
| 2 | Chuo | 14460 | 147620 | Nihonbashi |
| 3 | Minato | 12180 | 248071 | Odaiba |
| 4 | Shinjuku | 18620 | 339211 | Shinjuku |
| 5 | Bunkyō | 19790 | 223389 | Hongo |
| 6 | Taitō | 19830 | 200486 | Ueno |
| 7 | Sumida | 18910 | 260358 | Kinshicho |
| 8 | Koto | 12510 | 502579 | Kiba |
| 9 | Shinagawa | 17180 | 392492 | Shinagawa |
| 10 | Meguro | 19110 | 280283 | Meguro |
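
The list-building cell above cleans the numeric cells by slicing fixed character positions and special-casing the first row. As a hedged alternative sketch, the helper below (a hypothetical `cell_to_int`, not part of the original notebook) strips everything except digits, so the hidden zero-padding spans and thousands separators seen in the Wikipedia table are handled uniformly. It assumes the cell holds a non-negative integer.

```python
import re


def cell_to_int(cell_text):
    """Convert text such as '0059,441' or '14,460' to an int.

    Non-digit characters (commas, whitespace) are dropped; leading
    zeros coming from hidden padding spans are absorbed by int().
    """
    digits = re.sub(r"\D", "", cell_text)
    if not digits:
        raise ValueError(f"no digits found in {cell_text!r}")
    return int(digits)


# Sample strings modelled on the first rows of the table
print(cell_to_int("0059,441"))   # 59441
print(cell_to_int("05,100"))     # 5100
print(cell_to_int("14,460\n"))   # 14460
```

In the notebook, the full cell text (for example `Pop[p].text`) would be passed in, which removes the need for the `[1:9]` slice and the `'0'` to `'5100'` substitution.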


### Get the Coordinates of the Major Districts 

In [8]:
from geopy.geocoders import Nominatim
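# recent geopy versions require an explicit user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent="tokyo_wards_explorer")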
df['Major_Dist_Coord']= df['Major_District'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

df

  


| | Ward | Area_SqKm | Population | Major_District | Major_Dist_Coord |
|---|---|---|---|---|---|
| 1 | Chiyoda | 5100 | 59441 | Nagatacho | (35.675618, 139.7434685) |
| 2 | Chuo | 14460 | 147620 | Nihonbashi | (35.684058, 139.774501377979) |
| 3 | Minato | 12180 | 248071 | Odaiba | (35.61912805, 139.779403349221) |
| 4 | Shinjuku | 18620 | 339211 | Shinjuku | (35.6937632, 139.7036319) |
| 5 | Bunkyō | 19790 | 223389 | Hongo | (32.5093796, -116.2970014) |
| 6 | Taitō | 19830 | 200486 | Ueno | (35.7117877, 139.7760958) |
| 7 | Sumida | 18910 | 260358 | Kinshicho | (35.6967524, 139.8141509) |
| 8 | Koto | 12510 | 502579 | Kiba | (23.0131338, -80.8328748) |
| 9 | Shinagawa | 17180 | 392492 | Shinagawa | (35.599252, 139.73891) |
| 10 | Meguro | 19110 | 280283 | Meguro | (35.62125, 139.688014) |
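
Short place names can be ambiguous to a geocoder, and a lookup that finds nothing returns `None`, which would make the `lambda` above fail. One possible mitigation, sketched below under assumptions, is to append a qualifier to each query and to tolerate empty results. The helper `safe_geocode` and the suffix `", Tokyo, Japan"` are hypothetical choices, and whether the qualifier resolves every district correctly has not been verified here. The geocoder is passed in as a plain callable so that the sketch runs without network access.

```python
from collections import namedtuple

# Stand-in for the location objects a geocoder returns (latitude/longitude attributes)
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def safe_geocode(name, geocode, suffix=", Tokyo, Japan"):
    """Geocode `name + suffix`; return (lat, lon) or (None, None) if nothing is found."""
    location = geocode(name + suffix)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


def fake_geocode(query):
    """Offline stand-in for geolocator.geocode, for demonstration only."""
    known = {"Ueno, Tokyo, Japan": FakeLocation(35.7117877, 139.7760958)}
    return known.get(query)


print(safe_geocode("Ueno", fake_geocode))
print(safe_geocode("Nowhere", fake_geocode))
```

In the notebook, `geolocator.geocode` would take the place of `fake_geocode`, for example `df['Major_District'].apply(lambda d: safe_geocode(d, geolocator.geocode))`.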


In [9]:
df[['Latitude', 'Longitude']] = df['Major_Dist_Coord'].apply(pd.Series)
df

| | Ward | Area_SqKm | Population | Major_District | Major_Dist_Coord | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 1 | Chiyoda | 5100 | 59441 | Nagatacho | (35.675618, 139.7434685) | 35.675618 | 139.743469 |
| 2 | Chuo | 14460 | 147620 | Nihonbashi | (35.684058, 139.774501377979) | 35.684058 | 139.774501 |
| 3 | Minato | 12180 | 248071 | Odaiba | (35.61912805, 139.779403349221) | 35.619128 | 139.779403 |
| 4 | Shinjuku | 18620 | 339211 | Shinjuku | (35.6937632, 139.7036319) | 35.693763 | 139.703632 |
| 5 | Bunkyō | 19790 | 223389 | Hongo | (32.5093796, -116.2970014) | 32.50938 | -116.297001 |
| 6 | Taitō | 19830 | 200486 | Ueno | (35.7117877, 139.7760958) | 35.711788 | 139.776096 |
| 7 | Sumida | 18910 | 260358 | Kinshicho | (35.6967524, 139.8141509) | 35.696752 | 139.814151 |
| 8 | Koto | 12510 | 502579 | Kiba | (23.0131338, -80.8328748) | 23.013134 | -80.832875 |
| 9 | Shinagawa | 17180 | 392492 | Shinagawa | (35.599252, 139.73891) | 35.599252 | 139.73891 |
| 10 | Meguro | 19110 | 280283 | Meguro | (35.62125, 139.688014) | 35.62125 | 139.688014 |


In [10]:
df.drop(['Major_Dist_Coord'], axis=1, inplace=True)
df

| | Ward | Area_SqKm | Population | Major_District | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 1 | Chiyoda | 5100 | 59441 | Nagatacho | 35.675618 | 139.743469 |
| 2 | Chuo | 14460 | 147620 | Nihonbashi | 35.684058 | 139.774501 |
| 3 | Minato | 12180 | 248071 | Odaiba | 35.619128 | 139.779403 |
| 4 | Shinjuku | 18620 | 339211 | Shinjuku | 35.693763 | 139.703632 |
| 5 | Bunkyō | 19790 | 223389 | Hongo | 32.50938 | -116.297001 |
| 6 | Taitō | 19830 | 200486 | Ueno | 35.711788 | 139.776096 |
| 7 | Sumida | 18910 | 260358 | Kinshicho | 35.696752 | 139.814151 |
| 8 | Koto | 12510 | 502579 | Kiba | 23.013134 | -80.832875 |
| 9 | Shinagawa | 17180 | 392492 | Shinagawa | 35.599252 | 139.73891 |
| 10 | Meguro | 19110 | 280283 | Meguro | 35.62125 | 139.688014 |


#### We have the Dataframe with Coordinates 

#### But here we see a problem with the coordinates for some places, such as Hongo, Kiba, Omori and Kasai, so we need to replace them manually 

A Google search gives the following values: 

Hongo -- 35.7088° N, 139.7601° E <br>
Kiba  -- 35.6722° N, 139.8061° E <br>
Omori -- 35.5884° N, 139.7279° E <br>
Kasai -- 35.6634° N, 139.8731° E <br>
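
Spotting such outliers by eye is error-prone, so a simple range check can help. The sketch below flags coordinates that fall outside a rough bounding box. The box limits are an assumption chosen loosely around the 23 wards and are not an official boundary; the sample values are taken from the first rows of the table above.

```python
# Rough, assumed bounding box around Tokyo's 23 wards (not an official boundary)
LAT_RANGE = (35.5, 35.9)
LON_RANGE = (139.5, 140.0)


def outside_box(lat, lon, lat_range=LAT_RANGE, lon_range=LON_RANGE):
    """Return True when (lat, lon) lies outside the bounding box."""
    return not (lat_range[0] <= lat <= lat_range[1]
                and lon_range[0] <= lon <= lon_range[1])


districts = ["Nagatacho", "Hongo", "Ueno", "Kiba"]
lats = [35.675618, 32.5093796, 35.7117877, 23.0131338]
lons = [139.7434685, -116.2970014, 139.7760958, -80.8328748]

suspect = [d for d, la, lo in zip(districts, lats, lons) if outside_box(la, lo)]
print(suspect)  # ['Hongo', 'Kiba']
```

Applied to the full latitude and longitude columns, a check like this would surface any other out-of-range entries that a manual scan might miss.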

In [17]:
#df.dtypes
Lat_list = df['Latitude'].tolist()
Long_list = df['Longitude'].tolist()
print ("Old latitude list: ", Lat_list)
print ("Old Longitude list: ", Long_list)
replace_latitudes = {32.5093796:35.7088, 23.0131338:35.6722, -38.9047057:35.5884, -5.3498001:35.6634}
replace_longitudes = {-116.2970014:139.7601, -80.8328748:139.8061, 175.7552111:139.7279, 21.424098:139.8731}

latitudes_new = [replace_latitudes.get(n3,n3) for n3 in Lat_list]
longtitudes_new = [replace_longitudes.get(n4,n4) for n4 in Long_list]
print (latitudes_new)
print (longtitudes_new)

Tokyo_df = df.drop(['Latitude', 'Longitude'], axis=1)
# #df.drop(['Longitude'], axis=1, inplace=True)
# Tokyo_df

Old latitude list:  [35.675618, 35.684058, 35.61912805, 35.6937632, 32.5093796, 35.7117877, 35.6967524, 23.0131338, 35.599252, 35.62125, -38.9047057, 35.646096, 35.6645956, 35.718123, 35.7049419, 35.7301027, 35.7781394, 35.737529, 35.774143, 35.74836, 35.4463689, 34.1763346, -5.3498001]
Old Longitude list:  [139.7434685, 139.774501377979, 139.779403349221, 139.7036319, -116.2970014, 139.7760958, 139.8141509, -80.8328748, 139.73891, 139.688014, 175.7552111, 139.65627, 139.6987107, 139.664468, 139.649909, 139.7118843, 139.7207999, 139.78131, 139.681209, 139.638735, 139.4309254, 132.2260196, 21.424098]
[35.675618, 35.684058, 35.61912805, 35.6937632, 35.7088, 35.7117877, 35.6967524, 35.6722, 35.599252, 35.62125, 35.5884, 35.646096, 35.6645956, 35.718123, 35.7049419, 35.7301027, 35.7781394, 35.737529, 35.774143, 35.74836, 35.4463689, 34.1763346, 35.6634]
[139.7434685, 139.774501377979, 139.779403349221, 139.7036319, 139.7601, 139.7760958, 139.8141509, 139.8061, 139.73891, 139.688014, 139.72
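
The replacement above looks up exact floating-point values, which works only as long as the geocoder returns precisely the same numbers on every run. A hedged alternative is to key the corrections on the district name instead. The sketch below uses plain lists and a hypothetical `apply_overrides` helper; the corrected values are the ones quoted from the search results earlier.

```python
# Manual corrections keyed by district name (values quoted in the text)
OVERRIDES = {
    "Hongo": (35.7088, 139.7601),
    "Kiba": (35.6722, 139.8061),
    "Omori": (35.5884, 139.7279),
    "Kasai": (35.6634, 139.8731),
}


def apply_overrides(districts, lats, lons, overrides=OVERRIDES):
    """Return new latitude/longitude lists with overrides applied by name."""
    new_lats, new_lons = [], []
    for name, lat, lon in zip(districts, lats, lons):
        lat, lon = overrides.get(name, (lat, lon))
        new_lats.append(lat)
        new_lons.append(lon)
    return new_lats, new_lons


districts = ["Nagatacho", "Hongo", "Kiba"]
lats = [35.675618, 32.5093796, 23.0131338]
lons = [139.7434685, -116.2970014, -80.8328748]
print(apply_overrides(districts, lats, lons))
```

In the notebook the inputs would be `df['Major_District'].tolist()`, `Lat_list` and `Long_list`.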

### Final Data-Frame with Coordinates of the Major District

In [18]:
Tokyo_df['Dist_Latitude'] = latitudes_new
Tokyo_df['Dist_Longitude'] = longtitudes_new

#Tokyo_df

#Tokyo_df.to_csv('Tokyo_df_Coord.csv', sep='\t', encoding='utf-8')



Tokyo_df

| | Ward | Area_SqKm | Population | Major_District | Dist_Latitude | Dist_Longitude |
|---|---|---|---|---|---|---|
| 1 | Chiyoda | 5100 | 59441 | Nagatacho | 35.675618 | 139.743469 |
| 2 | Chuo | 14460 | 147620 | Nihonbashi | 35.684058 | 139.774501 |
| 3 | Minato | 12180 | 248071 | Odaiba | 35.619128 | 139.779403 |
| 4 | Shinjuku | 18620 | 339211 | Shinjuku | 35.693763 | 139.703632 |
| 5 | Bunkyō | 19790 | 223389 | Hongo | 35.7088 | 139.7601 |
| 6 | Taitō | 19830 | 200486 | Ueno | 35.711788 | 139.776096 |
| 7 | Sumida | 18910 | 260358 | Kinshicho | 35.696752 | 139.814151 |
| 8 | Koto | 12510 | 502579 | Kiba | 35.6722 | 139.8061 |
| 9 | Shinagawa | 17180 | 392492 | Shinagawa | 35.599252 | 139.73891 |
| 10 | Meguro | 19110 | 280283 | Meguro | 35.62125 | 139.688014 |


## Conclusion 
### 1st Week: Description of Problem and Data Preparation

We now have the initial DataFrame with the names of the wards, the major district in each ward, <br>
and the coordinates of those major districts. Since we want to concentrate only on lunch restaurants targeting office workers, before comparing all the wards we need an idea of the [best business areas in Tokyo](https://www.realestate-tokyo.com/office/tokyo-business-districts/). Here we want to concentrate on the best five wards: 

1. Chiyoda. Major District: _Nagatacho_<br>
2. Shinjuku. Major District: _Shinjuku_<br>
3. Shibuya. Major District: _Shibuya_<br>
4. Chuo. Major District: _Nihonbashi_<br>
5. Shinagawa. Major District: _Shinagawa_<br>


So, as the next step, we will use [Foursquare](https://developer.foursquare.com/) data to obtain information on restaurants. With this, we can start our battle of neighborhoods for opening a restaurant in Tokyo.  