# How similar are Indonesian Embassies based on their location?

### Problem Description

An <b>embassy</b> represents a country in another country. Its existence signals a relationship between the two countries and serves as a channel to communicate and strengthen ties. An embassy's location must satisfy strict and complex requirements from both the sending country and the host country, and it is usually within a special diplomatic compound or district. Despite all the careful planning and requirements, the decision on where to establish an embassy may also be driven by other needs, such as a particular neighborhood, district, or area, proximity to certain amenities, or places that could support the embassy's mission.

Knowing how embassies are similar or different could <b>give Indonesian Foreign Affairs officials a bigger picture</b> of the general environment in which their embassies are located. It could help answer questions such as: Should certain embassies be treated differently? Do staff at embassies with certain characteristics experience the same or different levels of work stress? Do certain embassies experience particular disturbances?

### What Data Do We Need?

To carry out the analysis, we need data on Indonesian embassies abroad, including their latitude and longitude. We will combine it with data from the Foursquare API to learn more about each embassy's neighborhood and the venues and places surrounding it. We will then cluster the embassies, compare them, and group similar ones together.
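
As a rough illustration of what "comparing embassies" could mean in practice, the sketch below represents each embassy by counts of nearby venue categories and scores pairs with cosine similarity. This is only one possible measure and is not necessarily the method used later in this notebook. The embassy names, categories, and counts are made up for illustration, and the helper `cosine_similarity` is a hypothetical name.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two venue-category count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an embassy with no venues is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical venue categories near three embassies (made-up values).
nearby = {
    "embassy_a": Counter({"Cafe": 4, "Hotel": 3, "Park": 1}),
    "embassy_b": Counter({"Cafe": 3, "Hotel": 2, "Museum": 1}),
    "embassy_c": Counter({"Market": 5, "Bus Station": 2}),
}

sim_ab = cosine_similarity(nearby["embassy_a"], nearby["embassy_b"])
sim_ac = cosine_similarity(nearby["embassy_a"], nearby["embassy_c"])
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

A score close to 1 means two embassies sit among a similar mix of venues, while 0 means they share no venue category at all.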

### Data Gathering

#### 1. List of Indonesian Embassies abroad

Getting a list of Indonesian embassies abroad turns out to be a bit difficult, because the Ministry of Foreign Affairs website does not present the data in a form that is easy to analyze. With the help of Google and Wikipedia, however, we found a page from which we can extract the data we need.

In [1]:
list_of_ID_embassies = 'https://id.wikipedia.org/wiki/Kedutaan_besar_Republik_Indonesia'

#### Import the libraries that we need

In [2]:
from bs4 import BeautifulSoup # this module helps with web scraping
import requests  # this module helps us to download a web page

#### Get the website page

In [5]:
page_list_of_ID_embassies  = requests.get(list_of_ID_embassies).text
page_list_of_ID_embassies

'<!DOCTYPE html>\n<html class="client-nojs" lang="id" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Kedutaan besar Republik Indonesia - Wikipedia bahasa Indonesia, ensiklopedia bebas</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":[",\\t.",".\\t,"],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","Januari","Februari","Maret","April","Mei","Juni","Juli","Agustus","September","Oktober","November","Desember"],"wgRequestId":"3a7aeeb9-cccd-4bcc-ac24-98aa49e1ba29","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Kedutaan_besar_Republik_Indonesia","wgTitle":"Kedutaan besar Republik Indonesia","wgCurRevisionId":18023174,"wgRevisionId":18023174,"wgArticleId":28406,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Artikel dengan pranala luar nonaktif","Artikel dengan prana

#### Extract the table of Indonesian embassies from the page

In [7]:
soup = BeautifulSoup(page_list_of_ID_embassies,"html5lib")

In [9]:
table = soup.find('table')
table

<table class="wikitable sortable" style="text-align: center;">

<tbody><tr>
<th>Perwakilan
</th>
<th>Duta Besar
</th>
<th>Pelantikan
</th>
<th>Website
</th>
<th>Merangkap
</th>
<th>Daftar
</th>
<th>Ref.
</th></tr>
<tr>
<td align="left"><span class="flagicon"><img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9a/Flag_of_Afghanistan.svg/23px-Flag_of_Afghanistan.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/9a/Flag_of_Afghanistan.svg/35px-Flag_of_Afghanistan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9a/Flag_of_Afghanistan.svg/45px-Flag_of_Afghanistan.svg.png 2x" width="23"/> </span><a href="/wiki/Afganistan" title="Afganistan">Afganistan</a>
</td>
<td><a href="/wiki/Arief_Rachman" title="Arief Rachman">Arief Rachman</a>
</td>
<td>13 Maret 2017
</td>
<td><a href="/wiki/Kedutaan_Besar_Republik_Indonesia_di_Kabul" title="Kedutaan Besar Republik

#### The table consists of Country, Ambassador, Inauguration, Website (Country Capital), Concurrent Accreditation, List, Ref

In [270]:
table_embassies = []  # list of Indonesian embassies
rows = table.find_all('tr')
for row in rows[1:]:  # skip the table header row
    cell = {}
    row_data = row.find_all('td')
    cell['Country'] = row_data[0].text  # "Perwakilan" column
    cell['Capital'] = row_data[3].text  # "Website" column, whose link text is the capital
    table_embassies.append(cell)
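
The positions `0` and `3` above correspond to the "Perwakilan" and "Website" headers. A minimal sketch of the same mapping by header name, on plain lists of cell texts, is shown here. It also skips rows that have too few cells, in case the table contains any. The names `HEADER`, `FIELDS`, and `rows_to_records` are hypothetical, and the second sample row is made up.

```python
# Column names as they appear in the table header.
HEADER = ["Perwakilan", "Duta Besar", "Pelantikan", "Website",
          "Merangkap", "Daftar", "Ref."]

# Which header feeds which output field.
FIELDS = {"Country": "Perwakilan", "Capital": "Website"}


def rows_to_records(rows):
    """Turn lists of cell texts into dicts, skipping incomplete rows."""
    needed = {field: HEADER.index(col) for field, col in FIELDS.items()}
    min_cells = max(needed.values()) + 1
    return [
        {field: cells[i] for field, i in needed.items()}
        for cells in rows
        if len(cells) >= min_cells
    ]


sample_rows = [
    ["\xa0Afganistan\n", "Arief Rachman\n", "13 Maret 2017\n", "Kabul [1]\n"],
    ["\xa0Incomplete row\n"],  # hypothetical short row, skipped
]
records = rows_to_records(sample_rows)
print(records)
```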

#### See the first 10 entries

In [271]:
table_embassies[:10]

[{'Country': '\xa0Afganistan\n', 'Capital': 'Kabul [1]\n'},
 {'Country': '\xa0Afrika Selatan\n', 'Capital': 'Pretoria [2]\n'},
 {'Country': '\xa0Aljazair\n', 'Capital': 'Algiers\n'},
 {'Country': '\xa0Amerika Serikat\n', 'Capital': 'Washington, D.C. [3]\n'},
 {'Country': '\xa0Arab Saudi\n', 'Capital': 'Riyadh\n'},
 {'Country': '\xa0Argentina\n', 'Capital': 'Buenos Aires [4]\n'},
 {'Country': '\xa0Australia\n', 'Capital': 'Canberra [5]\n'},
 {'Country': '\xa0Austria\n', 'Capital': 'Wina\n'},
 {'Country': '\xa0Azerbaijan\n', 'Capital': 'Baku\n'},
 {'Country': '\xa0Bahrain\n', 'Capital': 'Manama [6]\n'}]

#### Import pandas

In [272]:
import pandas as pd
pd.set_option('display.max_rows', None) #Need to see all the rows

In [273]:
df_ID_embassies=pd.DataFrame(table_embassies)

In [274]:
df_ID_embassies.head(200)

Unnamed: 0,Country,Capital
0,Afganistan\n,Kabul [1]\n
1,Afrika Selatan\n,Pretoria [2]\n
2,Aljazair\n,Algiers\n
3,Amerika Serikat\n,"Washington, D.C. [3]\n"
4,Arab Saudi\n,Riyadh\n
5,Argentina\n,Buenos Aires [4]\n
6,Australia\n,Canberra [5]\n
7,Austria\n,Wina\n
8,Azerbaijan\n,Baku\n
9,Bahrain\n,Manama [6]\n


There are 98 Indonesian Embassies according to our data

#### Clean the data to avoid problems later

1. Get rid of the '\n'
2. Get rid of the []

In [275]:
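# Remove newline characters
df_ID_embassies['Country'] = df_ID_embassies['Country'].replace(r'\n', '', regex=True)
df_ID_embassies['Capital'] = df_ID_embassies['Capital'].replace(r'\n', '', regex=True)
# Remove footnote markers such as " [1]": brackets containing one or more digits
df_ID_embassies['Capital'] = df_ID_embassies['Capital'].replace(r'\s*\[\d+\]', '', regex=True)
# Trim leftover whitespace, including the leading non-breaking space (\xa0) in Country
df_ID_embassies['Country'] = df_ID_embassies['Country'].str.strip()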

In [276]:
df_ID_embassies.head(100)

Unnamed: 0,Country,Capital
0,Afganistan,Kabul
1,Afrika Selatan,Pretoria
2,Aljazair,Algiers
3,Amerika Serikat,"Washington, D.C."
4,Arab Saudi,Riyadh
5,Argentina,Buenos Aires
6,Australia,Canberra
7,Austria,Wina
8,Azerbaijan,Baku
9,Bahrain,Manama
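
The two cleaning steps can also be expressed as one plain-Python helper, which is convenient for checking the pattern on a few raw values before applying it to the whole DataFrame. This is a minimal sketch; `clean_cell` and `FOOTNOTE` are hypothetical names, and the sample strings are taken from the raw output shown earlier.

```python
import re

# Footnote markers such as " [1]" or "[12]", with any whitespace before them.
FOOTNOTE = re.compile(r"\s*\[\d+\]")


def clean_cell(text: str) -> str:
    """Remove footnote markers, then trim surrounding whitespace."""
    # str.strip() also removes newlines and non-breaking spaces.
    return FOOTNOTE.sub("", text).strip()


for raw in ["\xa0Afganistan\n", "Kabul [1]\n", "Washington, D.C. [3]\n"]:
    print(repr(clean_cell(raw)))
```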


3. Drop Perbara, since it is located in Jakarta, the Indonesian capital, not abroad

In [277]:
df_ID_embassies = df_ID_embassies.drop(62)

4. Drop Taiwan since Indonesia doesn't have an embassy there

In [278]:
df_ID_embassies = df_ID_embassies.drop(82)

5. Change Washington, D.C. to Washington

In [279]:
df_ID_embassies.loc[3,'Capital'] ='Washington'

6. Drop Kamerun (Cameroon), since it is handled by the embassy in Nigeria

In [280]:
df_ID_embassies = df_ID_embassies.drop(34)

7. Change Kairo to Cairo

In [281]:
df_ID_embassies.loc[50,'Capital'] ='Cairo'

8. Drop the Indonesian representatives to the UN

In [282]:
df_ID_embassies = df_ID_embassies.drop(60)
df_ID_embassies = df_ID_embassies.drop(61)

9. Change Bukares to Bucharest

In [283]:
df_ID_embassies.loc[68,'Capital'] ='Bucharest'

10. Change Moskwa to Moscow

In [284]:
df_ID_embassies.loc[69,'Capital'] ='Moscow'

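11. Change Kolombo to Colombo

In [285]:
# Match on the value rather than a row label, so the fix does not depend on row positions
df_ID_embassies['Capital'] = df_ID_embassies['Capital'].str.replace('Kolombo', 'Colombo', regex=False)

12. Change Damaskus to Damascus

In [286]:
# Also matched on the value: assigning two capitals to the same row label (76) would overwrite the first fix
df_ID_embassies['Capital'] = df_ID_embassies['Capital'].str.replace('Damaskus', 'Damascus', regex=False)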

13. Change Vatikan to Vatican

In [287]:
df_ID_embassies.loc[83,'Capital'] ='Vatican'
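
The fixes in steps 3 to 13 rely on hard-coded row labels, which will shift if the Wikipedia table changes. A hedged alternative is to drop rows by country name and rename capitals by value. The sketch below does this on plain dictionaries. The helper `apply_fixes` is hypothetical, and the country labels in the sample ("Mesir", "Suriah", and so on) are assumptions about how the scraped table names them.

```python
# Spelling fixes taken from the steps above.
CAPITAL_RENAMES = {
    "Washington, D.C.": "Washington",
    "Kairo": "Cairo",
    "Bukares": "Bucharest",
    "Moskwa": "Moscow",
    "Kolombo": "Colombo",
    "Damaskus": "Damascus",
    "Vatikan": "Vatican",
}


def apply_fixes(records, drop_countries, renames=CAPITAL_RENAMES):
    """Drop unwanted rows by country name and rename capitals by value."""
    fixed = []
    for record in records:
        if record["Country"] in drop_countries:
            continue
        capital = renames.get(record["Capital"], record["Capital"])
        fixed.append({"Country": record["Country"], "Capital": capital})
    return fixed


# Small sample; the exact labels in the scraped table are assumed.
sample = [
    {"Country": "Mesir", "Capital": "Kairo"},
    {"Country": "Taiwan", "Capital": "Taipei"},
    {"Country": "Sri Lanka", "Capital": "Kolombo"},
    {"Country": "Suriah", "Capital": "Damaskus"},
]
cleaned = apply_fixes(sample, drop_countries={"Taiwan", "Kamerun"})
print(cleaned)
```

Because every fix is keyed on a value, two different capitals can never be written to the same row by mistake.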

In [288]:
df_ID_embassies = df_ID_embassies.reset_index(drop=True)
df_ID_embassies

Unnamed: 0,Country,Capital
0,Afganistan,Kabul
1,Afrika Selatan,Pretoria
2,Aljazair,Algiers
3,Amerika Serikat,Washington
4,Arab Saudi,Riyadh
5,Argentina,Buenos Aires
6,Australia,Canberra
7,Austria,Wina
8,Azerbaijan,Baku
9,Bahrain,Manama


#### Save the data

In [289]:
df_ID_embassies.to_csv('ID_Embassies.csv')

#### 2. Get the addresses of Indonesian embassies

Since there is no data on the latitude and longitude of the embassies, we need to get their addresses and convert them later. This is also a problem, since the data is not available in a form that is easy to extract and analyze. However, the embassy websites share a common URL pattern that we can exploit, where CAPITAL is the capital's name in lowercase and without spaces:

https://kemlu.go.id/CAPITAL/en
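
A minimal sketch of how a capital name maps onto this pattern, using the same string handling as the scraping loop in this section (lowercase, trimmed, spaces removed, with an `/en` suffix for the English page). The function name `embassy_url` is hypothetical, and the sketch only builds the string; it does not check that the page exists.

```python
def embassy_url(capital: str) -> str:
    """Build the embassy page URL from a capital name."""
    slug = capital.lower().strip().replace(" ", "")
    return f"https://kemlu.go.id/{slug}/en"


for name in ["Kabul", "Buenos Aires"]:
    print(embassy_url(name))
```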

In [290]:
capitals = df_ID_embassies['Capital'].values

In [291]:
for capital in capitals:
    formatted_capital = capital.lower().strip().replace(' ', '')
    url = 'https://kemlu.go.id/{}/en'.format(formatted_capital)
    embassy_website = requests.get(url).text
    soup = BeautifulSoup(embassy_website, "html5lib")
    # The bottom part of the page, which holds the embassy address
    div_address = soup.find('div', {"class": "col-12 col-md-6 text-center text-md-left"})
    if div_address is None:  # the page has no such block, e.g. a different layout
        print('No address block found for', capital)
        continue
    first_link = div_address.find('a')  # the first link holds the address text
    if first_link is not None:
        print(first_link.text)


                    
                    
                      Malalai Watt, Shah-re-Naw, Ministry of Interior Stret, Kabul, Afghanistan
                    
                

                    
                    
                      ​Embassy of the Republic of Indonesia
949 Francis Baard Street, Hatfield, Pretoria 0082
 
Postal Address
PO Box 13155 Hatfield 0082
Pretoria – South Africa
                    
                

                    
                    
                      Embassy of the Republic of Indonesia
-
61, Avenue Souidani Boudjemaa, B.P. 62, El Mouradia, Algiers, Algeria
                    
                

                    
                    
                      
                    
                

                    
                    
                      Diplomatic Quarter, P.O. Box 94343 - Riyadh 11693

                    
                

                    
                    
                      ​Mariscal Ramon Castilla 2901