# Exercise #1: Fetching data from Wikipedia and DBpedia

Your task is to list all the Norwegian cities with a population of over 100k. Specifically:
  - Get the list of cities from a given Wikipedia category.
  - Get the population of a given city from its respective DBpedia entry.

For both subtasks, some example code is provided that you'll need to adjust.

The goal of this exercise is to help you gain a better understanding of Wikipedia categories and how entities are represented in DBpedia. Note that you can always look at the corresponding Wikipedia pages and DBpedia resources in your browser.

## Getting a list of cities from Wikipedia

In [1]:
import wikipediaapi

wiki_wiki = wikipediaapi.Wikipedia('en')

Get a list of Norwegian cities from Wikipedia by listing the members of the ['Cities_and_towns_in_Norway' category](https://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Norway).

See the "How To Get All Pages From Category" section of the [Wikipedia-API documentation](https://pypi.org/project/Wikipedia-API/).

In [2]:
cities = []
cat = wiki_wiki.page("Category:Cities_and_towns_in_Norway")
for c in cat.categorymembers.values():
    if c.ns != wikipediaapi.Namespace.CATEGORY:
        cities.append(c.title)

In [3]:
print(cities)

['Largest metropolitan areas in the Nordic countries', 'List of towns and cities in Norway', 'List of urban areas in the Nordic countries', 'List of historical capitals of Norway', 'List of urban areas in Norway by population', 'Kjøpstad', 'Ålesund (town)', 'Åndalsnes', 'Åkrehamn', 'Alleen', 'Alta (town)', 'Arendal (town)', 'Askim', 'Bergen', 'Bodø (town)', 'Brekstad', 'Brevik, Norway', 'Brønnøysund', 'Bryne', 'Drammen', 'Drøbak', 'Egersund', 'Elverum', 'Fagernes', 'Falkum', 'Farsund (town)', 'Fauske (town)', 'Finnsnes', 'Flekkefjord (town)', 'Florø', 'Førde (town)', 'Fosnavåg', 'Fredrikstad', 'Gjøvik', 'Grimstad (town)', 'Gulsvik', 'Halden', 'Hamar', 'Hammerfest (town)', 'Harstad (town)', 'Haugesund', 'Hokksund', 'Holmestrand', 'Hønefoss', 'Honningsvåg', 'Jørpeland', 'Kirkenes', 'Kolvereid', 'Kongsberg', 'Kongsvinger', 'Kopervik', 'Kragerø', 'Kristiansand', 'Kristiansund (town)', 'Langesund', 'Larvik', 'Leirvik', 'Leknes', 'Levanger (town)', 'Lillehammer', 'Lillesand (town)', 'Lillest

## Get properties of a given entity from DBpedia

In [4]:
import requests

The following example shows how to get data about a given person from DBpedia; see the [corresponding resource page](http://dbpedia.org/resource/Matteo_Donati).

In [5]:
data = requests.get("http://dbpedia.org/data/Matteo_Donati.json").json()
properties = data['http://dbpedia.org/resource/Matteo_Donati']

`properties` is a dictionary whose keys are the URIs of the entity's properties (predicates).
Each value is a list of dictionaries, one per object of that predicate; the actual literal or URI is stored under the `'value'` key.

In [6]:
print("Height: {}".format(properties['http://dbpedia.org/ontology/height'][0]['value']))
print("Birth date: {}".format(properties['http://dbpedia.org/ontology/birthDate'][0]['value']))
print("Hand: {}".format(properties['http://dbpedia.org/ontology/plays'][0]['value']))

Height: 1.88
Birth date: 1995-02-28
Hand: Right-handed (two-handed backhand)
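
Indexing with `[0]['value']` raises an exception when a predicate is missing. The sketch below is one possible way to read the first value more defensively. The helper name `first_value` and the hand-written `sample_properties` dictionary are assumptions made for illustration; the sample only mimics the shape of a DBpedia response, and its values are copied from the output above.

```python
# Hand-written sample that mimics the shape of `properties`.
# The values are copied from the printed output above; the "type"
# fields are illustrative assumptions, not fetched data.
sample_properties = {
    "http://dbpedia.org/ontology/height": [
        {"type": "literal", "value": 1.88},
    ],
    "http://dbpedia.org/ontology/birthDate": [
        {"type": "literal", "value": "1995-02-28"},
    ],
}


def first_value(properties, predicate, default=None):
    """Return the 'value' of the first object of `predicate`, or `default`."""
    objects = properties.get(predicate, [])
    if not objects:
        return default
    return objects[0].get("value", default)


print(first_value(sample_properties, "http://dbpedia.org/ontology/height"))
# The sample has no 'plays' predicate, so the default is returned.
print(first_value(sample_properties, "http://dbpedia.org/ontology/plays", "unknown"))
```

The same helper should work on the real `properties` dictionary from the cell above, since it relies only on the list-of-dictionaries layout.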


## Get populations of cities

Look up the populations of Norwegian cities and output those above 100k.

The following predicate corresponds to the total population of an entity.

In [7]:
predicate = "http://dbpedia.org/ontology/populationTotal"

In [8]:
for city in cities:
    url_name = city.replace(" ", "_")
    data = requests.get("http://dbpedia.org/data/{}.json".format(url_name)).json()
    dict_key = "http://dbpedia.org/resource/{}".format(url_name)
    if dict_key not in data:
        continue
    properties = data[dict_key]
    if predicate not in properties:  # skip non-city entities
        continue
    population = int(properties[predicate][0]['value'])
    if population > 100000:  # "over 100k" means strictly greater
        print(city, population)

Bergen 278121
Oslo 658390
Stavanger 130426
Trondheim 187353
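
The loop above mixes downloading and filtering, which makes it hard to try out without network access. As a rough sketch, the filtering can be moved into a function that receives the download step as an argument. The names `large_cities`, `fetch`, `fake_fetch` and `canned` are hypothetical; the canned responses only mimic the shape of DBpedia's JSON and reuse two population figures from the output above.

```python
PREDICATE = "http://dbpedia.org/ontology/populationTotal"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def large_cities(titles, fetch, threshold=100000):
    """Yield (title, population) pairs with population strictly above `threshold`.

    `fetch` is a hypothetical callable that takes a resource name and returns
    the parsed DBpedia JSON as a dict (an empty dict if nothing was found).
    """
    for title in titles:
        name = title.replace(" ", "_")
        properties = fetch(name).get(RESOURCE_PREFIX + name, {})
        objects = properties.get(PREDICATE, [])
        if not objects:  # no population recorded, e.g. list pages
            continue
        population = int(objects[0]["value"])
        if population > threshold:
            yield title, population


# Canned responses standing in for DBpedia; the two populations are the
# figures printed by the cell above.
canned = {
    "Bergen": {RESOURCE_PREFIX + "Bergen": {PREDICATE: [{"value": 278121}]}},
    "Stavanger": {RESOURCE_PREFIX + "Stavanger": {PREDICATE: [{"value": 130426}]}},
}


def fake_fetch(name):
    """Return the canned response for `name`, or an empty dict."""
    return canned.get(name, {})


result = list(large_cities(["Bergen", "Kjøpstad", "Stavanger"], fake_fetch, threshold=200000))
print(result)  # [('Bergen', 278121)]
```

In the notebook, `fetch` could be something like `lambda name: requests.get("http://dbpedia.org/data/{}.json".format(name)).json()`, so that the same function runs against live data.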


## Feedback

Please give (anonymous) feedback on this exercise by filling out [this form](https://forms.gle/22o3ursi5YsR1Ztb8).