# XML example and exercise
****
+ study examples of accessing nodes in XML tree structure  
+ work on exercise to be completed and submitted
****
+ reference: https://docs.python.org/2.7/library/xml.etree.elementtree.html
+ data source: http://www.dbis.informatik.uni-goettingen.de/Mondial
****

In [30]:
from xml.etree import ElementTree as ET
import operator

## XML example

+ for details about tree traversal and iterators, see https://docs.python.org/2.7/library/xml.etree.elementtree.html

In [4]:
document_tree = ET.parse( './data/mondial_database_less.xml' )

In [4]:
# print names of all countries
for child in document_tree.getroot():
    print(child.find('name').text)

Albania
Greece
Macedonia
Serbia
Montenegro
Kosovo
Andorra


In [5]:
# print names of all countries and their cities
for element in document_tree.iterfind('country'):
    # iter() walks the whole subtree, so cities nested in provinces are included
    city_names = [city.find('name').text for city in element.iter('city')]
    print('* ' + element.find('name').text + ': ' + ', '.join(city_names))

* Albania: Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë
* Greece: Komotini, Kavala, Athina, Peiraias, Peristeri, Acharnes, Patra, Kozani, Kerkyra, Ioannina, Thessaloniki, Iraklio, Chania, Ermoupoli, Rhodes, Tripoli, Lamia, Chalkida, Larissa, Volos, Mytilini, Karyes
* Macedonia: Skopje, Kumanovo
* Serbia: Beograd, Novi Sad, Niš
* Montenegro: Podgorica
* Kosovo: Prishtine
* Andorra: Andorra la Vella
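
The traversal calls used above can be tried on a small inline document before running them on the full file. The following is a minimal sketch: the sample XML, its values, and the helper names `direct_cities`, `all_cities` and `latest_population` are made up for illustration and only mimic the element names seen in the examples.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document; names and numbers are invented for illustration.
SAMPLE = """
<mondial>
  <country>
    <name>Examplia</name>
    <population year="2001">100</population>
    <population year="2011">120</population>
    <city><name>Capital City</name></city>
    <province>
      <name>North</name>
      <city><name>North Town</name></city>
    </province>
  </country>
</mondial>
"""

root = ET.fromstring(SAMPLE)


def direct_cities(country):
    # findall() with a bare tag name matches direct children only
    return [city.find('name').text for city in country.findall('city')]


def all_cities(country):
    # iter() walks the whole subtree, so cities inside provinces are found too
    return [city.find('name').text for city in country.iter('city')]


def latest_population(country):
    # [last()] selects the final <population> child in document order
    node = country.find('population[last()]')
    return int(node.text) if node is not None else None


examplia = root.find('country')
print(direct_cities(examplia))
print(all_cities(examplia))
print(latest_population(examplia))
```

The difference between `findall('city')` and `iter('city')` matters whenever cities are nested one level deeper, which is why the example above uses a subtree walk.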


****
## XML exercise

Using data in 'data/mondial_database.xml', the examples above, and referring to https://docs.python.org/2.7/library/xml.etree.elementtree.html, find

1. 10 countries with the lowest infant mortality rates
2. 10 cities with the largest population
3. 10 ethnic groups with the largest overall populations (sum of best/latest estimates over all countries)
4. name and country of a) longest river, b) largest lake and c) airport at highest elevation
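
Items 1 and 2 both reduce to "collect (name, value) pairs, skip missing values, keep the N smallest or largest". One possible way to express that, sketched here on invented data, uses `heapq.nsmallest` and `heapq.nlargest` from the standard library. The helper name `named_values` and the sample document are assumptions for illustration, not part of the exercise data.

```python
import heapq
from xml.etree import ElementTree as ET

# Hypothetical sample; country B deliberately has no infant_mortality element.
SAMPLE = """<mondial>
<country><name>A</name><infant_mortality>4.5</infant_mortality></country>
<country><name>B</name></country>
<country><name>C</name><infant_mortality>2.0</infant_mortality></country>
<country><name>D</name><infant_mortality>9.1</infant_mortality></country>
</mondial>"""


def named_values(root, parent_tag, value_tag):
    """Return (name, value) pairs for parent_tag elements that carry value_tag."""
    pairs = []
    for element in root.iterfind(parent_tag):
        node = element.find(value_tag)
        # skip elements where the value is absent or empty
        if node is None or not node.text:
            continue
        pairs.append((element.find('name').text, float(node.text)))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = named_values(root, 'country', 'infant_mortality')

# N smallest / largest by the numeric part of each pair
lowest_two = heapq.nsmallest(2, pairs, key=lambda pair: pair[1])
highest_two = heapq.nlargest(2, pairs, key=lambda pair: pair[1])
print(lowest_two)
print(highest_two)
```

Sorting the whole list and slicing with `[:N]` gives the same result; `heapq` simply avoids a full sort when only a few entries are needed.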

In [25]:
document = ET.parse( './data/mondial_database.xml' )

In [55]:
#1. 10 countries with the lowest infant mortality rates
mortality_rates = []
# use the full data set loaded into `document`, not the reduced example file
for element in document.iterfind('country'):
    node = element.find('infant_mortality')
    # skip countries without a reported rate
    if node is not None:
        mortality_rates.append([element.find('name').text, float(node.text)])
# ascending sort puts the lowest rates first; [:10] keeps ten entries
print(sorted(mortality_rates, key=operator.itemgetter(1))[:10])


In [53]:
#2. 10 cities with the largest population
city_populations = []
for country in document.iterfind('country'):
    country_name = country.find('name').text
    # iter() also reaches cities nested inside provinces
    for city in country.iter('city'):
        # [last()] takes the final <population> entry, assumed to be the latest estimate
        lastpop = city.find('population[last()]')
        # skip cities without any population figure
        if lastpop is not None:
            city_populations.append([city.find('name').text, country_name, int(float(lastpop.text))])

In [54]:
# descending sort puts the largest cities first; [:10] keeps ten entries
sorted_list = sorted(city_populations, key=operator.itemgetter(2), reverse=True)[:10]
print(sorted_list)


In [50]:
#3. 10 ethnic groups with the largest overall populations (sum of best/latest estimates over all countries)
ethnic_totals = {}
for country in document.iterfind('country'):
    lastpop_node = country.find('population[last()]')
    # skip countries without any population figure
    if lastpop_node is None:
        continue
    lastpop = int(lastpop_node.text)
    # visit every ethnic group of the country, not only the first one
    for ethnic in country.iterfind('ethnicgroup[@percentage]'):
        ethnicperc = float(ethnic.get('percentage')) / 100
        # accumulate per group name so the same group is summed across countries
        ethnic_totals[ethnic.text] = ethnic_totals.get(ethnic.text, 0) + lastpop * ethnicperc

In [52]:
# descending sort puts the largest groups first; [:10] keeps ten entries
sorted_list = sorted(ethnic_totals.items(), key=operator.itemgetter(1), reverse=True)[:10]
print(sorted_list)
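
Item 4 of the exercise list (longest river, largest lake, airport at highest elevation) can be approached with one helper that scans a given element type for its largest measurement. The sketch below runs on invented inline data and rests on schema assumptions that should be checked against the real file: that `river`, `lake` and `airport` elements carry a space-separated `country` attribute of country codes, that each `country` has a matching `car_code` attribute, and that the measurements live in `length`, `area` and `elevation` children. The helper name `extreme` is hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample; attribute and element names are assumptions about the schema.
SAMPLE = """<mondial>
  <country car_code="X"><name>Xland</name></country>
  <country car_code="Y"><name>Yland</name></country>
  <river country="X Y"><name>Long River</name><length>900</length></river>
  <river country="Y"><name>Short River</name><length>120</length></river>
  <river country="X"><name>Unmeasured River</name></river>
  <lake country="X"><name>Big Lake</name><area>300</area></lake>
  <airport country="Y"><name>High Field</name><elevation>2500</elevation></airport>
  <airport country="X"><name>Low Field</name><elevation/></airport>
</mondial>"""


def extreme(root, tag, measure):
    """Return (value, name, [country names]) for the `tag` element with the largest `measure`."""
    # map country codes to readable names (assumes a car_code attribute)
    names = {c.get('car_code'): c.find('name').text for c in root.iter('country')}
    best = None
    for element in root.iter(tag):
        node = element.find(measure)
        # skip elements where the measurement is missing or empty
        if node is None or not node.text:
            continue
        value = float(node.text)
        if best is None or value > best[0]:
            codes = element.get('country', '').split()
            # fall back to the raw code if it has no matching country element
            best = (value, element.find('name').text, [names.get(code, code) for code in codes])
    return best


root = ET.fromstring(SAMPLE)
print(extreme(root, 'river', 'length'))
print(extreme(root, 'lake', 'area'))
print(extreme(root, 'airport', 'elevation'))
```

On the real data the same call would take `document.getroot()` in place of `root`, provided the schema assumptions above hold.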
