# Lab 8 - Geospatial Semantics I

Th. 21.11.2024 15:00-17:00

---
## 1 GeoNames as a Data Provider

In this section, we first look at GeoNames as a global gazetteer and try to understand the data it contains.

### 1.1 Description
The GeoNames geographical database is available for download free of charge under a Creative Commons Attribution license. It contains over 25 million geographical names covering over 12 million unique features, of which 4.8 million are populated places, along with 16 million alternate names.

GeoNames integrates geographical data such as place names in various languages, elevation, population and others from various sources. All lat/long coordinates are in WGS84 (World Geodetic System 1984). Users may manually edit, correct and add new names using a user-friendly wiki interface.

![geonames](../figs/lab8_figs/geonames.png)

### 1.2 Feature codes

In GeoNames, every feature belongs to one of several feature classes and is further assigned one of 645 [feature codes](https://www.geonames.org/export/codes.html).

<strong>Question 1</strong>: How many <strong>feature classes</strong> does it have? What are they?

![geonames_feature_codes](../figs/lab8_figs/geonames_feature_codes.png)

### 1.3 Statistics
See [more statistics](https://www.geonames.org/statistics) for region/country statistics, including how many features each region/country has.

<strong>Question 2</strong>: Where is Austria? Can you list the associated statistics?
![geonames_statistics](../figs/lab8_figs/geonames_statistics.png)

### 1.4 Data Dump 
The data in GeoNames is accessible free of charge through a number of [webservices](https://www.geonames.org/export/#ws) and a daily [database export](https://www.geonames.org/export/#dump). See the [readme.txt](https://download.geonames.org/export/dump/readme.txt) about downloadable data stored on the [server](https://download.geonames.org/export/dump/).

<strong>Download the AT.zip dataset</strong>, which includes features within Austria.
![at](../figs/lab8_figs/at.png)
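
If you prefer to fetch the archive from code rather than through the browser, the following is a minimal sketch using only the Python standard library. The helper name `download_and_extract` and the file name `download.zip` are made up for this illustration; the function is only defined here, not called, so nothing is downloaded until you call it yourself.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest: Path) -> list[str]:
    """Download a zip archive from `url` and unpack it into `dest`.

    Returns the names of the files contained in the archive.
    (Hypothetical helper for this lab, not part of any GeoNames tooling.)
    """
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / "download.zip"
    # Save the response body to a local file.
    urllib.request.urlretrieve(url, archive_path)
    # Unpack everything next to the archive and report what was inside.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Assuming the archive is published as `AT.zip` directly under the server directory linked above, a call such as `download_and_extract("https://download.geonames.org/export/dump/AT.zip", Path("AT"))` should leave the data in `AT/AT.txt`, which is the path used in Section 3.1.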

---
## 2 GeoNames Ontology

GeoNames was developed during the [Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) movement.
### 2.1 The Semantic Web
The Semantic Web is a project that intends to add computer-processable meaning (semantics) to the World Wide Web.
In February 2004, the [World Wide Web Consortium (W3C)](https://www.w3.org/) released the [Resource Description Framework (RDF)](https://www.w3.org/RDF/) and the [OWL Web Ontology Language (OWL)](https://www.w3.org/OWL/) as W3C Recommendations. RDF is used to represent information and to exchange knowledge on the Web. OWL is used to publish and share sets of terms called ontologies, supporting advanced Web search, software agents and knowledge management.

![w3c](../figs/lab8_figs/w3c.png)

### 2.2 Ontology Expressed in OWL
The GeoNames Ontology makes it possible to add geospatial semantic information to the World Wide Web. Each of the more than 11 million GeoNames toponyms now has a unique URL with a corresponding RDF web service. Other services describe the relations between toponyms. The Ontology for GeoNames is available in OWL.

You can download it from [here](https://www.geonames.org/ontology/ontology_v3.3.rdf). To open it, you can use a code editor such as [Microsoft Visual Studio Code](https://code.visualstudio.com/).

<strong>Question 3</strong>: What interesting information do you see from the GeoNames ontology? List a few examples.
![geonames_ontology](../figs/lab8_figs/geonames_ontology.png)

### 2.3 Mapping GeoNames Ontology with Other Ontologies
Check also how the GeoNames Ontology can be mapped to other ontologies, e.g., of other gazetteers. You can download the mapping from [here](https://www.geonames.org/ontology/mappings_v3.01.rdf), and again, open it with the code editor.

![geonames_ontology_mapping](../figs/lab8_figs/geonames_ontology_mapping.png)

P.S.: One such gazetteer is [DBpedia](https://www.dbpedia.org/).

---
## 3 Working with GeoNames Data - Austria
In this section, we use the data downloaded in Section 1.4 and learn how to read and browse it.

### 3.1 Read Downloaded Data

This time we need to read a .txt file with Pandas to create a DataFrame. However, the file does not come with a header. How do we know what each column stands for?

Quick answer(!): Check the <strong>readme.txt</strong> in the downloaded .zip file, which tells you how to define the <strong>field names</strong> in <strong>AT.txt</strong>.

![readme](../figs/lab8_figs/readme.png)

In [None]:
import pandas as pd

In [None]:
df_at = pd.read_csv('AT/AT.txt', sep='\t', names=["geonameid", "name", "asciiname", "alternatenames",
                                                "latitude","longitude","feature class","feature code",
                                                "country code","cc2","admin1 code","admin2 code",
                                                "admin3 code","admin4 code","population",
                                                "elevation","dem","timezone","modification date"])

In [None]:
df_at.head()
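
Once the data is loaded, the feature classes and feature codes from Section 1.2 can be counted directly. The sketch below uses a small hand-made DataFrame with made-up rows (not taken from AT.txt) so that it runs on its own; only the column names follow the ones defined above.

```python
import pandas as pd

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "name": ["sample capital", "sample state", "sample district",
             "sample mountain", "sample lake"],
    "feature class": ["P", "A", "A", "T", "H"],
    "feature code": ["PPLC", "ADM1", "ADM2", "MT", "LK"],
})

# Number of rows per feature class.
class_counts = sample["feature class"].value_counts()
print(class_counts)

# Number of rows per (feature class, feature code) pair.
code_counts = sample.groupby(["feature class", "feature code"]).size()
print(code_counts)
```

To run the same counts on the Austrian data, replace `sample` with `df_at`.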

Find out what the geonameid for <strong>Wien</strong> is.

In [None]:
df_at[df_at['name'] == 'Wien']
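
An exact match on `name` only finds rows whose primary name is spelled exactly like the query. The readme describes `alternatenames` as a comma-separated list, so a broader search can check that column too. The following is a minimal sketch on made-up rows; the helper `find_by_any_name` and the sample ids are hypothetical.

```python
import pandas as pd

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "geonameid": [1, 2, 3],
    "name": ["Wien", "Vienna", "Graz"],
    "alternatenames": ["Vienna,Vienne", "Wien,Vienne", None],
})


def find_by_any_name(df, query):
    """Return rows whose name or one of the alternate names equals `query`."""
    # Missing alternate names become empty strings, then split on commas.
    alternates = df["alternatenames"].fillna("").str.split(",")
    in_alternates = alternates.apply(lambda names: query in names)
    return df[(df["name"] == query) | in_alternates]


matches = find_by_any_name(sample, "Wien")
print(matches[["geonameid", "name"]])
```

If a query returns several rows, the `feature class` and `feature code` columns help to tell them apart.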

### 3.2 Browse a GeoNames Feature with FollowYourNose Search
Next, we are going to use the retrieved geonameid to browse information about Wien.

Note that GeoNames uses [303 redirection](https://www.geonames.org/ontology/documentation.html#:~:text=GeoNames%20is%20using,for%20more%20information.) to distinguish the <strong>Concept</strong> (the thing itself) from the <strong>Document</strong> about it.

For Wien there are two URIs: one identifies Wien as a Concept, the other identifies the Document about it:
- [1] https://sws.geonames.org/2761367/
- [2] https://sws.geonames.org/2761367/about.rdf

GeoNames redirects requests for [1] to [2]. The latter, which contains the <strong>RDF descriptions</strong> of a feature, is what we will use next for <strong>[FollowYourNose](https://www.w3.org/wiki/FollowYourNose)</strong> search.
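
The relation between the two URIs can be sketched in a few lines of standard-library Python. Both helper names are made up for this illustration, and the `about.rdf` suffix rule is simply read off the two URIs listed above. Only `document_uri` is called here; `resolve` needs network access, so it is defined but not run.

```python
import urllib.request


def document_uri(concept_uri: str) -> str:
    """Build the Document URI by appending 'about.rdf' to the Concept URI."""
    return concept_uri.rstrip("/") + "/about.rdf"


def resolve(concept_uri: str) -> str:
    """Request the Concept URI and return the URL that finally answered.

    urllib follows HTTP redirects (including 303) automatically, so the
    returned URL is the one reached after any redirection.
    """
    with urllib.request.urlopen(concept_uri) as response:
        return response.geturl()


wien_concept = "https://sws.geonames.org/2761367/"
print(document_uri(wien_concept))
```

Calling `resolve(wien_concept)` in a notebook with internet access lets you compare the URL the server actually redirects to with the one built by `document_uri`.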

#### 3.2.1 RDFLib
To access and parse RDF, we need a Python package called [RDFLib](https://rdflib.readthedocs.io/en/stable/). Install the library with the following command.
```
pip install rdflib
```
![rdflib](../figs/lab8_figs/rdflib.png)

The data are in the form of <strong>triples</strong>, and we need a <strong>Graph</strong> to load and store them. A triple has the form <strong>&lt;subject, predicate, object&gt;</strong>.
![triples](../figs/lab8_figs/triples.jpg)

In [None]:
from rdflib import Graph
# Create a Wien Graph
g = Graph()

Parse the Document about Wien.

In [None]:
# Load the RDF Document about Wien (URI [2] above) into the graph
g.parse("https://sws.geonames.org/2761367/about.rdf")

Loop through each triple in the Wien Graph and verify that it is indeed contained in the graph.

In [None]:
for subj, pred, obj in g:
    # Every triple we iterate over must also pass the membership test
    if (subj, pred, obj) not in g:
        raise Exception("It better be!")

<strong>Question 4</strong>: How many triples are there in the Wien Graph?

In [None]:
print(f"The Wien Graph has {len(g)} triples.")

Print out the entire Wien Graph in the RDF <strong>[Turtle](https://en.wikipedia.org/wiki/Turtle_(syntax))</strong> format.

In [None]:
print(g.serialize(format="turtle"))

#### 3.2.2 Keep Searching?
Can we keep searching for <strong>entities linked</strong> to Wien? Try the following code.

In [None]:
from rdflib import URIRef

wien = URIRef("https://sws.geonames.org/2761367/")

predicate_chosen = URIRef("http://www.geonames.org/ontology#childrenFeatures")

for s, p, o in g.triples((wien, predicate_chosen, None)):
    print(s, p, o)

In [None]:
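# `o` still holds the object of the last triple printed above,
# i.e. the linked document that we load into the same graph
g.parse(o)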

print(g.serialize(format="turtle"))

<strong>Question 5</strong>: What does this URL - https://sws.geonames.org/2761333/ - point to? How does it differ from <strong>Wien</strong>?

---
## Submission
Run the code above and submit the .ipynb file along with your answers to Questions 1 to 5.