### Sister Cities

#### 1. Introduction

Sister cities are cities that establish a bond of cooperation on many factors like culture, health, education, transport, and economic development. Often the cities are located in different countries, developing a paradiplomacy relation, a relationship that does not depend on federal governments (which is what designates diplomacy). Typically, to become sisters two cities need to have similar features like number of habitants, historical facts or economic sector.  
  
After the recognition, the two mayorships raise several protocols for exchanging experiences such as project investment and exchange of students or entrepreneurs. The goal of this project is to verify how similar a specific city is to its sister cities based on their top venues. Thus, in the case of a habitant of that city desires to move or visit the sister cities he/she can choose that most similar to his/her hometown.  
  
Moving to another country is a task that requires deep research about the target city and typically takes as consideration different factors such as cultural life, attractions, language, climate, jog market, etc. Even after detailed research, many expats decide to return to their home country due to difficulties adapting to the new city. Moving to a sister city can be easier, since they share some features and have political facilitators. This solution aims to provide an extra tool to help in the decision of moving abroad: a ranked list of the sister cities of a specific city having as criteria the similarity of their top venues.

#### 2. Data

The solution will take the Brasilian city of Recife as the hometown city. To obtain the Recife sister cities, Wikipedia will be consulted. Any city page in Wikipedia presents a section called “sister sisters”, however it can vary depending on the language chosen. Naturally, there is more information in the language of the country the city belongs. In the case of Recife, Wikipedia in English lists only three sister cities while the Portuguese version shows eight. Thus, this solution will scrape the Recife page in Portuguese Wikipedia to get Recife sister cities.

![Wikipedia Sister Cities](https://pbs.twimg.com/media/Cpc08EyXgAAhfuJ.jpg)

To obtain the top venues of a city, Foursquare will be used. Foursquare is a local search-and-discovery mobile app that provides personalized recommendations of places to go near a user's current location based on users' previous browsing history and check-in history [].  They actually crowd-sourced their data and had people use their app to build their dataset and add venues and complete any missing information they had in their dataset []. The app collects information about all sorts of venues: restaurants, bars, cafes, museums, art galleries, parks, clubs, universities, schools, markets, services like laundry, etc. Each venue has a page with, among other information, a rate (from 0 to 10), description, photos and user tips. 

![Foursquare](https://mspoweruser.com/wp-content/uploads/2015/02/image_thumb1.png)

Foursquare provides an API that allows application developers to interact with the Foursquare platform and to retrieve, among others contents, all sorts of information about the venues near a location. The request are made to an URL that look like this:

That url requests the top 100 venues in the radius of 200m of the provided location (latitude and longitude). The result is a json file that can be transformed into a pandas DataFrame and used for further data analysis. Finally, to obtain Recife’s latitude and longitude it will be used geopy library. The following example gets Recife location and use it to retrieve the top 5 venues in a radius of 200m.

In [5]:
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim #convert an address into latitude and longitude values

address = 'Recife, BR'

geolocator = Nominatim(user_agent="rec_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Recife are {}, {}.'.format(latitude, longitude))

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-1.22.0         | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ###############################

In [7]:
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

CLIENT_ID = 'TJNIF0UICB1EMN1LETJN0YNM5DARNO3JKGK4ZJ41RT4QRT5Q' # Foursquare ID
CLIENT_SECRET = 'RGNFFRIQEWPAQLRLTDOXQVEIISJQWRFBOTSOJWLHDBAVHVWS' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius=200
limit=5

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, limit)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed6f71f0f5968002500edbe'},
 'response': {'venues': [{'id': '51a5ee27498e66c4f282fce1',
    'name': 'Praça do Diário',
    'location': {'lat': -8.063966329105192,
     'lng': -34.87832044217551,
     'labeledLatLngs': [{'label': 'display',
       'lat': -8.063966329105192,
       'lng': -34.87832044217551}],
     'distance': 31,
     'cc': 'BR',
     'city': 'Recife',
     'state': 'PE',
     'country': 'Brasil',
     'formattedAddress': ['Recife, PE', 'Brasil']},
    'categories': [{'id': '4bf58dd8d48988d164941735',
      'name': 'Plaza',
      'pluralName': 'Plazas',
      'shortName': 'Plaza',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/plaza_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1591146491',
    'hasPerk': False},
   {'id': '4f5e0578e4b070e0493faacd',
    'name': 'Praça da Independência',
    'location': {'address': 'Av. Guararapes, 107',
     'lat': -8.06423106857362,
    