# Mission: Impossible

The goal of this mission is to web-scrape the CIA World Factbook and to map the collected data on a cartographic dashboard (an interactive map plus a widget that presents the attribute data).

Here are some resources that will help you complete it:
- https://www.cia.gov/the-world-factbook/
- https://youtu.be/t9Ed5QyO7qY
- https://ipywidgets.readthedocs.io/
- https://public.opendatasoft.com/explore/dataset/world-administrative-boundaries/table/?sort=iso3
- https://www.cia.gov/the-world-factbook/references/country-data-codes/

This mission, should you choose to accept it, ends on **December 17, 2021 at 6 p.m.** It is up to you to recruit your team (3 people max), which must include at least one expert in Python programming. As usual, should you or any of your agents be captured or exhausted, the Institute of Urban Planning and Alpine Geography will deny any knowledge of your actions.

### Loading the libraries

In [87]:
from bs4 import BeautifulSoup
import csv
import geopandas
from ipyleaflet import Map
import leafmap
import pandas
import urllib.request

### Loading the base map

In [88]:
world = leafmap.Map(center = [45, 0], zoom = 2)
style = {
    "stroke": True,
    "color": "#0000ff",
    "weight": 2,
    "opacity": 1,
    "fill": True,
    "fillColor": "#0000ff",
    "fillOpacity": 0.1,
}
hover_style = {"fillOpacity": 0.7}
world.add_geojson("webscraping/world-administrative-boundaries.geojson", layer_name = "World", style = style, hover_style = hover_style)
world

Map(center=[45, 0], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text…

### Loading the data

In [71]:
# Load the standardized country codes
codes = pandas.read_csv("webscraping/codes.csv")
codes.head()

Unnamed: 0,Code,Name,Category,Region
0,af,Afghanistan,Countries,South Asia
1,al,Albania,Countries,Europe
2,ag,Algeria,Countries,Africa
3,an,Andorra,Countries,Europe
4,ao,Angola,Countries,Africa


In [48]:
# URL of the page listing the available themes
url_themes = "https://www.cia.gov/the-world-factbook/references/guide-to-country-comparisons/"
page = urllib.request.urlopen(url_themes)
soup = BeautifulSoup(page, 'html.parser')
theme = soup.find_all('a', attrs={'class': 'link-button bold'})
theme_links = []
for link in theme:
    theme_links.append('https://www.cia.gov' + link.get('href'))
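
Plain string concatenation works here as long as every `href` on the page is site-relative. A slightly more defensive variant, sketched below with the standard library's `urllib.parse.urljoin`, also copes with links that are already absolute. The helper name `absolute_link` and the `BASE_URL` constant are illustrative and not part of the original notebook.

```python
from urllib.parse import urljoin

# Assumed base of the site; the notebook hard-codes the same prefix.
BASE_URL = "https://www.cia.gov"


def absolute_link(href, base=BASE_URL):
    """Resolve an href against the base URL.

    Site-relative paths get the base prepended; absolute URLs
    are returned unchanged.
    """
    return urljoin(base, href)


relative = "/the-world-factbook/field/area/country-comparison"
already_absolute = "https://www.cia.gov/the-world-factbook/field/population/country-comparison"

print(absolute_link(relative))
print(absolute_link(already_absolute))
```

In the loop above, `theme_links.append(absolute_link(link.get('href')))` would be the corresponding call.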

In [59]:
theme_links

['https://www.cia.gov/the-world-factbook/field/area/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/population/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/median-age/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/population-growth-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/birth-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/death-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/net-migration-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/maternal-mortality-ratio/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/infant-mortality-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/life-expectancy-at-birth/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/total-fertility-rate/country-comparison',
 'https://www.cia.gov/the-world-factbook/field/hiv-aids-adult-preval

In [56]:
# For each link, apply the procedure seen in lab session 8 (TP8)
for link in theme_links:
    page = urllib.request.urlopen(link)
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table', attrs={'class': 'content-table table-auto'})
    results = table.find_all('tr')
    # Extract the theme name from the URL
    theme = link.split('/')[5].replace('-','_')
    rows = [] 
    rows.append(['rank', 'country', theme, 'date_of_information'])
    for result in results :
        data = result.find_all('td')
        if len(data) > 0 :
            rank = data[0].getText()
            country = data[1].getText()
            theme_data = data[2].getText().replace(',','') # remove the thousands separator
            doi = data[3].getText()
            rows.append([rank, country, theme_data, doi])
            
    # Create csv and write rows to output file
    with open('webscraping/' + theme + '.csv','w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerows(rows)
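
Two steps of the loop above lend themselves to small helper functions: deriving the theme name from the URL, and turning a cell's text into a number. The sketch below uses only the standard library. The names `theme_from_link` and `clean_number` are hypothetical, and the assumption that some cells carry extra characters (a currency symbol or a unit) around the number is ours, not something the notebook states.

```python
import re
from urllib.parse import urlparse


def theme_from_link(link):
    """Return the path segment that follows 'field', with '-' replaced by '_'.

    Raises ValueError if the URL path has no 'field' segment.
    """
    parts = [p for p in urlparse(link).path.split("/") if p]
    return parts[parts.index("field") + 1].replace("-", "_")


def clean_number(text):
    """Return the first number found in text as a float, or None.

    Commas are treated as thousands separators and removed.
    """
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


link = "https://www.cia.gov/the-world-factbook/field/median-age/country-comparison"
print(theme_from_link(link))
print(clean_number("652,230"))
print(clean_number("19.5"))
```

Compared with `link.split('/')[5]`, looking up the segment after `field` does not depend on the exact number of slashes in the URL.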

In [84]:
# Join the CSV files for a limited set of themes
selected_themes = ['area', 'population', 'median_age']
j = codes
for s in selected_themes:
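    df = pandas.read_csv("webscraping/" + s + ".csv")
    # join on the country name, keeping only the theme column so that
    # 'rank' and 'date_of_information' do not clash across successive merges
    j = j.merge(df[['country', s]], how = 'inner', left_on = 'Name', right_on = 'country').drop(columns = 'country')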
# projection
j = j[['Code', 'Name', 'Region'] + selected_themes]
j.head()

Unnamed: 0,Code,Name,Region,area,population,median_age
0,af,Afghanistan,South Asia,652230,37466414,19.5
1,al,Albania,Europe,28748,3088385,34.3
2,ag,Algeria,Africa,2381740,43576691,28.9
3,an,Andorra,Europe,468,85645,46.2
4,ao,Angola,Africa,1246700,33642646,15.9
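
An inner join silently drops every country whose name is spelled differently in the two sources. One way to spot such cases, sketched below, is an outer merge with pandas' `indicator=True` option, which adds a `_merge` column telling which side each row came from. The helper name `unmatched_names` is hypothetical, and the two small frames are toy data: the first two rows reuse values shown above, while "Examplestan" is invented purely to trigger a mismatch.

```python
import pandas

# Toy stand-ins for codes.csv and one scraped theme file (illustrative only).
codes_demo = pandas.DataFrame({
    "Code": ["af", "al", "xx"],
    "Name": ["Afghanistan", "Albania", "Examplestan"],
})
area_demo = pandas.DataFrame({
    "country": ["Afghanistan", "Albania", "Republic of Examplestan"],
    "area": [652230, 28748, 1],
})


def unmatched_names(left, right, left_on="Name", right_on="country"):
    """Return (names only in left, names only in right) after an outer merge."""
    merged = left.merge(right, how="outer", left_on=left_on,
                        right_on=right_on, indicator=True)
    left_only = merged.loc[merged["_merge"] == "left_only", left_on].tolist()
    right_only = merged.loc[merged["_merge"] == "right_only", right_on].tolist()
    return left_only, right_only


left_only, right_only = unmatched_names(codes_demo, area_demo)
print("Only in codes:", left_only)
print("Only in theme table:", right_only)
```

Running the same check on `codes` and each theme file before the inner join would show which rows, if any, are about to be lost.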


## Interactive dashboard