# XML example and exercise
****
+ study examples of accessing nodes in XML tree structure  
+ work on exercise to be completed and submitted
****
+ reference: https://docs.python.org/2.7/library/xml.etree.elementtree.html
+ data source: http://www.dbis.informatik.uni-goettingen.de/Mondial
****

In [41]:
from xml.etree import ElementTree as ET
import pandas as pd

## XML example

+ for details about tree traversal and iterators, see https://docs.python.org/2.7/library/xml.etree.elementtree.html
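
As a minimal sketch of the traversal calls used in this notebook, the snippet below parses a small inline XML string rather than the Mondial file. The tag names (`country`, `name`, `city`) mirror the ones used here; the `province` nesting is an assumption about how the real file is laid out, and the sample data and the names `sample_xml` and `cities_by_country` are made up for illustration.

```python
from xml.etree import ElementTree as ET

# hypothetical sample data, shaped like the Mondial file but not taken from it
sample_xml = """
<mondial>
  <country>
    <name>Alpha</name>
    <city><name>Alpha City</name></city>
    <province>
      <name>North</name>
      <city><name>Northport</name></city>
    </province>
  </country>
  <country>
    <name>Beta</name>
  </country>
</mondial>
"""

root = ET.fromstring(sample_xml)

cities_by_country = {}
for country in root.iterfind('country'):       # direct <country> children of the root
    country_name = country.find('name').text   # first direct <name> child
    # iter() walks the whole subtree, so cities nested inside <province> are found too
    cities_by_country[country_name] = [
        city.find('name').text for city in country.iter('city')
    ]

for country_name, cities in cities_by_country.items():
    print('* ' + country_name + ': ' + ', '.join(cities))
```

`find` and `iterfind` with a bare tag name look only at direct children, while `iter` descends through every level, which is why the two are combined here.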

In [13]:
document_tree = ET.parse( './data/mondial_database_less.xml' )

In [14]:
# print names of all countries
for child in document_tree.getroot():
    print (child.find('name').text)

Albania
Greece
Macedonia
Serbia
Montenegro
Kosovo
Andorra


In [12]:
# print names of all countries and their cities
for element in document_tree.iterfind('country'):
    print ('* ' + element.find('name').text + ':', end=' ')
    capitals_string = ''
    for subelement in element.iter('city'):
        capitals_string += subelement.find('name').text + ', '
    print (capitals_string[:-2])

****
## XML exercise

Using data in 'data/mondial_database.xml', the examples above, and referring to https://docs.python.org/2.7/library/xml.etree.elementtree.html, find

1. 10 countries with the lowest infant mortality rates
2. 10 cities with the largest population
3. 10 ethnic groups with the largest overall populations (sum of best/latest estimates over all countries)
4. name and country of a) longest river, b) largest lake and c) airport at highest elevation
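
Item 3 combines two pieces of data per country: the latest population estimate and each group's share of it. The following is a rough sketch of that arithmetic on made-up data. It assumes that each country carries `population` elements with a `year` attribute and `ethnicgroup` elements with a `percentage` attribute; those names should be checked against the real file, and all numbers and group names below are hypothetical.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# hypothetical sample; tag and attribute names are assumed, numbers are made up
sample_xml = """
<mondial>
  <country>
    <name>Alpha</name>
    <population year="2001">900</population>
    <population year="2011">1000</population>
    <ethnicgroup percentage="80">Group A</ethnicgroup>
    <ethnicgroup percentage="20">Group B</ethnicgroup>
  </country>
  <country>
    <name>Beta</name>
    <population year="2010">500</population>
    <ethnicgroup percentage="50">Group B</ethnicgroup>
  </country>
</mondial>
"""

root = ET.fromstring(sample_xml)
group_totals = defaultdict(float)

for country in root.iterfind('country'):
    populations = country.findall('population')
    if not populations:
        continue  # no estimate to scale the percentages by
    # take the most recent estimate, judged by the 'year' attribute
    latest = max(populations, key=lambda p: int(p.get('year')))
    latest_population = float(latest.text)
    for group in country.findall('ethnicgroup'):
        # head count for this group in this country = population * percentage / 100
        group_totals[group.text] += latest_population * float(group.get('percentage')) / 100

# sort groups by their summed head count, largest first
top_groups = sorted(group_totals.items(), key=lambda item: item[1], reverse=True)[:10]
print(top_groups)
```

In this toy data Group B appears in both countries, so its total is the sum of 200 (from Alpha) and 250 (from Beta).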

Exercise 1 - 10 countries with the lowest infant mortality rates

In [15]:
#import the dataset
document = ET.parse( './data/mondial_database.xml' )
root = document.getroot()

In [92]:
#create dictionary mortrate with country names as keys and infant mortality rates as values
mortrate = {}
for country in document.iterfind('country'):
    counname = country.find('name').text
    #find returns None when a country has no infant_mortality element
    mortnode = country.find('infant_mortality')
    counmort = mortnode.text if mortnode is not None else None
    if counname not in mortrate:
        mortrate[counname] = counmort
#take the dictionary and create a dataframe from it
mortrate = pd.DataFrame.from_dict(mortrate, orient='index')
#add numerical index
mortrate = mortrate.reset_index(drop=False)
#name the columns in the dataset
mortrate.columns = ['Country', 'Mortality']
#specify that mortality rates are to be treated as floats
mortrate.Mortality = mortrate.Mortality.astype(float)
#review the head of the dataset
mortrate.head(10)

Unnamed: 0,Country,Mortality
0,Albania,13.19
1,Greece,4.78
2,Macedonia,7.9
3,Serbia,6.16
4,Montenegro,
5,Kosovo,
6,Andorra,3.69
7,France,3.31
8,Spain,3.33
9,Austria,4.16


In [72]:
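#ascending=False lists the 10 highest rates; the exercise asks for the lowest, which needs ascending=True
mortrate.sort_values('Mortality', ascending = False).head(10)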

Unnamed: 0,Country,Mortality
194,Western Sahara,145.82
54,Afghanistan,117.23
189,Mali,104.34
226,Somalia,100.14
213,Central African Republic,92.86
230,Guinea-Bissau,90.92
214,Chad,90.3
192,Niger,86.27
195,Angola,79.99
201,Burkina Faso,76.8
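
Since the exercise asks for the lowest rates, the sort direction has to be ascending and countries without a rate should be left out. A minimal sketch, using only the ten rows printed by `mortrate.head(10)` earlier in this notebook; the names `sample` and `lowest` are introduced here for illustration, and the result covers just those ten rows, not the full dataset.

```python
import pandas as pd

# the ten rows shown by mortrate.head(10); None marks a missing rate
sample = pd.DataFrame({
    'Country': ['Albania', 'Greece', 'Macedonia', 'Serbia', 'Montenegro',
                'Kosovo', 'Andorra', 'France', 'Spain', 'Austria'],
    'Mortality': [13.19, 4.78, 7.9, 6.16, None, None, 3.69, 3.31, 3.33, 4.16],
})

# drop countries without a rate, then sort ascending so the lowest come first
lowest = sample.dropna(subset=['Mortality']).sort_values('Mortality').head(3)
print(lowest)
```

On the full `mortrate` frame the same chain with `head(10)` would give the ten lowest rates.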


Exercise 2 - 10 cities with the largest population
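
One possible way to collect city populations is sketched below on made-up data. It assumes that cities can sit either directly under a country or inside a `province`, and that each city may carry several `population` elements with a `year` attribute; both assumptions should be checked against the real file. All names and numbers in `sample_xml` are hypothetical.

```python
from xml.etree import ElementTree as ET

# hypothetical sample; structure is assumed, names and numbers are made up
sample_xml = """
<mondial>
  <country>
    <name>Alpha</name>
    <city>
      <name>Alpha City</name>
      <population year="2001">300</population>
      <population year="2011">350</population>
    </city>
    <province>
      <name>North</name>
      <city>
        <name>Northport</name>
        <population year="2011">900</population>
      </city>
      <city>
        <name>Hamlet</name>
      </city>
    </province>
  </country>
</mondial>
"""

root = ET.fromstring(sample_xml)
city_rows = []

for country in root.iterfind('country'):
    country_name = country.find('name').text
    for city in country.iter('city'):             # includes cities nested deeper
        populations = city.findall('population')  # findall returns a list of elements
        if not populations:
            continue                              # skip cities without any figure
        # keep the most recent figure, judged by the 'year' attribute
        latest = max(populations, key=lambda p: int(p.get('year')))
        city_rows.append((city.find('name').text, country_name, int(latest.text)))

# largest populations first
largest = sorted(city_rows, key=lambda row: row[2], reverse=True)[:10]
print(largest)
```

Because `findall` returns a list, `.text` has to be read from an individual element of that list rather than from the list itself.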

In [109]:
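mortrate = {}
for country in document.iterfind('country'):
    try:
        #'./city/name' only matches cities that are direct children of the country
        counname = country.find('./city/name').text
        print(counname)
    except AttributeError:
        print('nothing')

    try:
        populate = []
        #findall returns a list, which has no .text attribute, so this always raises and prints 'NA'
        populate.append(country.findall('./city/population').text)
        print(populate)
    except AttributeError:
        print('NA')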


Tirana
NA
nothing
NA
Skopje
NA
Beograd
NA
Podgorica
NA
Prishtine
NA
Andorra la Vella
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
Vaduz
NA
nothing
NA
Ljubljana
NA
nothing
NA
nothing
NA
Rīga
NA
Vilnius
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
Luxembourg
NA
nothing
NA
nothing
NA
Zagreb
NA
Sofia
NA
nothing
NA
nothing
NA
nothing
NA
Tallinn
NA
Tórshavn
NA
nothing
NA
nothing
NA
nothing
NA
Monaco
NA
Gibraltar
NA
Saint Peter Port
NA
Vatican City
NA
Ceuta
NA
Melilla
NA
Reykjavik
NA
Dublin
NA
San Marino
NA
Saint Helier
NA
Valletta
NA
Douglas
NA
Chişinău
NA
nothing
NA
Longyearbyen
NA
nothing
NA
Kabul
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
nothing
NA
Yerevan
NA
Tbilisi
NA
Baku
NA
Al Manāmah
NA
nothing
NA
nothing
NA
nothing
NA
Thimphu
NA
Bandar Seri Begawan
NA
nothing
NA
Vientiane
NA
Chiang Mai
NA
Phnom Penh
NA
Hanoi
NA
nothing
NA
Pyongyang
NA
Bishkek
NA
Hong Kong
NA
Macao
NA
Ulaanbaatar
NA
Kathmandu
NA
Flying Fish Cove
NA
West Island
NA
L