### Why select cities?
The next step (code file #2) uses web scraping to extract population and geographical data for the selected cities. Although picking cities by hand would suffice, I take an automated approach to city selection in two steps:
1. I select a list of countries where Gans is present.
2. I then use the API Ninjas City API to pick up to five large cities in each of these countries, each with a population of more than 1,000,000.

#### Import packages

In [1]:
import pandas as pd
import requests

# Import the API key from the local keys module
from keys import Ninja_API_Key
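
The `keys` module itself is not shown here. As a minimal sketch of one alternative, assuming the key is stored in an environment variable (the variable name `NINJA_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original code), the key could be read like this:

```python
import os


def load_api_key(env_var="NINJA_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

Either approach keeps the key out of the notebook itself.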

#### Select countries

In [2]:
# ISO 3166-1 alpha-2 country codes
countries = ["DE", "FR", "IT", "PL", "PT", "ES", "GB", "CH", "NL", "BE"]

#### Pick the most populous cities in the selected countries

In [3]:
# Define the endpoint and the authentication header.
url = "https://api.api-ninjas.com/v1/city"
header = {"X-Api-Key": Ninja_API_Key}

# Query the API once per country and store each response under its country code.
cities_json = {}
for country in countries:
    # Request up to 5 cities with a minimum population of 1,000,000.
    response = requests.get(
        url,
        headers=header,
        params={"country": country, "min_population": 1000000, "limit": 5},
        timeout=30,
    )
    response.raise_for_status()  # Stop early on an HTTP error (e.g. an invalid key).
    cities_json[country] = response.json()

print(cities_json.keys())
print(cities_json)
print(cities_json.values())

dict_keys(['DE', 'FR', 'IT', 'PL', 'PT', 'ES', 'GB', 'CH', 'NL', 'BE'])
{'DE': [{'name': 'Berlin', 'latitude': 52.5167, 'longitude': 13.3833, 'country': 'DE', 'population': 3644826, 'is_capital': True}, {'name': 'Hamburg', 'latitude': 53.55, 'longitude': 10.0, 'country': 'DE', 'population': 1841179, 'is_capital': False}, {'name': 'Munich', 'latitude': 48.1372, 'longitude': 11.5755, 'country': 'DE', 'population': 1471508, 'is_capital': False}, {'name': 'Cologne', 'latitude': 50.9422, 'longitude': 6.9578, 'country': 'DE', 'population': 1085664, 'is_capital': False}], 'FR': [{'name': 'Paris', 'latitude': 48.8566, 'longitude': 2.3522, 'country': 'FR', 'population': 11020000, 'is_capital': True}, {'name': 'Nice', 'latitude': 43.7034, 'longitude': 7.2663, 'country': 'FR', 'population': 1006402, 'is_capital': False}], 'IT': [{'name': 'Rome', 'latitude': 41.8931, 'longitude': 12.4828, 'country': 'IT', 'population': 2872800, 'is_capital': True}, {'name': 'Milan', 'latitude': 45.4669, 'longitude
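
To make the request easier to inspect, the sketch below shows how the endpoint and the query parameters combine into the final URL. It uses only the standard library; the helper name `build_city_url` is an assumption for illustration, and `requests` performs an equivalent encoding itself when it is given `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.api-ninjas.com/v1/city"


def build_city_url(country, min_population=1000000, limit=5):
    """Combine the endpoint with the query parameters used in the request loop."""
    query = urlencode(
        {"country": country, "min_population": min_population, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


print(build_city_url("DE"))
# https://api.api-ninjas.com/v1/city?country=DE&min_population=1000000&limit=5
```

Pasting such a URL into a tool that can send the `X-Api-Key` header is a quick way to check a single country by hand.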

#### Select the columns of interest

In [4]:
# Keep only the columns needed later; population is dropped here.
columns = ['name', 'country', 'latitude', 'longitude', 'is_capital']
cities_df = pd.concat(
    [pd.DataFrame(cities, columns=columns) for cities in cities_json.values()],
    ignore_index=True,
)
cities_df

Unnamed: 0,name,country,latitude,longitude,is_capital
0,Berlin,DE,52.5167,13.3833,True
1,Hamburg,DE,53.55,10.0,False
2,Munich,DE,48.1372,11.5755,False
3,Cologne,DE,50.9422,6.9578,False
4,Paris,FR,48.8566,2.3522,True
5,Nice,FR,43.7034,7.2663,False
6,Rome,IT,41.8931,12.4828,True
7,Milan,IT,45.4669,9.19,False
8,Warsaw,PL,52.2167,21.0333,True
9,Barcelona,ES,41.3825,2.1769,False
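
Some countries may return no city above the population threshold, and those simply vanish from the table. As a rough sketch (the helper names and the hand-made sample are assumptions for illustration, not part of the original notebook), the per-country lists can be flattened in plain Python and the empty countries reported:

```python
COLUMNS = ["name", "country", "latitude", "longitude", "is_capital"]


def flatten_cities(cities_by_country, columns=COLUMNS):
    """Turn {country: [city, ...]} into one list of rows with the chosen keys."""
    return [
        {column: city.get(column) for column in columns}
        for cities in cities_by_country.values()
        for city in cities
    ]


def countries_without_cities(cities_by_country):
    """List the country codes whose response contained no cities."""
    return [code for code, cities in cities_by_country.items() if not cities]


# Hand-made sample in the shape of the API response (the empty list is illustrative).
sample = {
    "DE": [{"name": "Berlin", "latitude": 52.5167, "longitude": 13.3833,
            "country": "DE", "population": 3644826, "is_capital": True}],
    "PT": [],
}

rows = flatten_cities(sample)
print(rows)
print(countries_without_cities(sample))
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame`.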


#### Rename columns

In [5]:
cities_df_cleaned = cities_df.rename(columns={"country": "country_2c", "name": "City"})
cities_df_cleaned

Unnamed: 0,City,country_2c,latitude,longitude,is_capital
0,Berlin,DE,52.5167,13.3833,True
1,Hamburg,DE,53.55,10.0,False
2,Munich,DE,48.1372,11.5755,False
3,Cologne,DE,50.9422,6.9578,False
4,Paris,FR,48.8566,2.3522,True
5,Nice,FR,43.7034,7.2663,False
6,Rome,IT,41.8931,12.4828,True
7,Milan,IT,45.4669,9.19,False
8,Warsaw,PL,52.2167,21.0333,True
9,Barcelona,ES,41.3825,2.1769,False


#### Save the DataFrame as a CSV file

In [6]:
cities_df_cleaned.to_csv("data/cities_df_cleaned.csv", sep=',', index=False, encoding='utf-8')
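
`to_csv` raises an error if the target directory does not exist. A minimal sketch of a guard, assuming the relative `data/` folder used in the call above (the helper name `ensure_parent_dir` is hypothetical):

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent folder of file_path if it is missing and return the path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `ensure_parent_dir("data/cities_df_cleaned.csv")` before the `to_csv` line would create `data/` when the notebook runs in a fresh working directory.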