# Querying Wikivoyage

In the frontend we want to include a short description of the place we are showing. If we use anything from Wikivoyage, we also need to give [attribution to the writers](https://en.wikivoyage.org/wiki/Wikivoyage:How_to_re-use_Wikivoyage_guides). This means we need to add a link to the Wikivoyage page the text came from.

Check out the official API documentation [here](https://www.mediawiki.org/wiki/API:Tutorial).

### Example usage

The snippet below is copied directly from the documentation and shows how to query the API using Python. Note that it queries Wikipedia (`en.wikipedia.org`) rather than Wikivoyage.

In [None]:
"""
    opensearch.py
    MediaWiki API Demos
    Demo of `Opensearch` module: Search the wiki and obtain
    results in an OpenSearch (http://www.opensearch.org) format
    MIT License
"""

import requests

S = requests.Session()

URL = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    "action": "opensearch",
    "namespace": "0",
    "search": "Hampi",
    "limit": "5",
    "format": "json",
}

R = S.get(url=URL, params=PARAMS)
DATA = R.json()

print(DATA)

## Intro text

To fetch text, use `action=query` with `prop=extracts`. All the arguments for `extracts` are listed in the [official documentation](https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts).

To retrieve only the intro text, this [Stack Overflow answer](https://stackoverflow.com/questions/8555320/is-there-a-clean-wikipedia-api-just-for-retrieve-content-summary) suggests using the `exintro` argument. The `redirects` argument can be combined with `titles` to find a location by name, since it resolves redirects to the actual page.

In [None]:
URL = "https://en.wikivoyage.org/w/api.php"

PARAMS = {
    "action": "query",
    "redirects": 1,
    "titles": "Amsterdam",
    "format": "json",
    "prop": "extracts",  # yields extracts if available
    "exintro": "true",
    "explaintext": "true",
}

R = S.get(url=URL, params=PARAMS)
data = R.json()

print(data)

This indeed works pretty well!

Now, we have the luxury that our database is based on Wikivoyage. So why not query on `pageids` directly?

In [None]:
PARAMS = {
    "action": "query",
    "pageids": 1036, # 1036 = amsterdam, 8966 Da Nang, 16935 Karachi
    "format": "json",
    "prop": "extracts",  # yields extracts if available
    "exintro": "true",
    "explaintext": "true",
}

R = S.get(url=URL, params=PARAMS)
data = R.json()

print(data)
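
The response nests each page under its page ID (as a string key), so reading the extract means knowing that ID up front. A minimal sketch of a helper that avoids hardcoding the ID is shown below. The names `first_page` and `get_extract` are made up for this notebook, and the `sample` dict is hand-written to mimic the response shape, not real API output.

```python
def first_page(data):
    """Return the first page object of an `action=query` response, or None."""
    # In the default JSON format, `pages` is a dict keyed by page ID (string).
    pages = data.get("query", {}).get("pages", {})
    return next(iter(pages.values()), None)


def get_extract(data):
    """Return the `extract` text of the first page, or "" if there is none."""
    page = first_page(data) or {}
    return page.get("extract", "")


# Hand-written sample that only mimics the response shape.
sample = {
    "query": {
        "pages": {
            "1036": {"pageid": 1036, "title": "Amsterdam", "extract": "Sample intro."}
        }
    }
}

print(get_extract(sample))
```

With a real response, `get_extract(data)` can be called directly on the `data` returned by `R.json()`.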

There are some problematic pages:
* 8966 [Da Nang](https://en.wikivoyage.org/wiki/Da_Nang). This one doesn't return any intro text with `exintro`.
* 16935 Karachi. This one has a very long intro. Maybe keep only the first paragraph, or render `\n\n` as paragraph breaks? This can be solved in the frontend.
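
For the long Karachi intro, keeping only the first paragraph could look roughly like the sketch below. It assumes paragraphs in a plain-text extract are separated by newlines, which has not been verified for every page. `first_paragraph` is a hypothetical helper name.

```python
def first_paragraph(text):
    """Return the first non-empty paragraph of a plain-text extract."""
    # Assumption: paragraphs are separated by one or more newlines.
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if paragraph:
            return paragraph
    return ""


# Made-up text, only to show the behaviour.
print(first_paragraph("First paragraph.\nSecond paragraph.\n\nThird paragraph."))
```

Doing this in the backend keeps the payload small. Doing it in the frontend keeps the full intro available if we ever want a "read more" toggle.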

To tackle the first problem, what can we do to get the intro extract?
1. `query` Wikivoyage on `titles` -> still no extract, so this doesn't work.
2. `query` Wikipedia on `pageids`? -> there could be an extract there, but Wikipedia has a lot more non-travel-related info
3. write our own function to parse the intro from the page text? -> fetch the full extract with `exlimit=1` and cut it at the first section header
4. `opensearch` Wikivoyage on name?
5. show only the first X sentences? -> but how many to pick?
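
Option 5 could be tried with the `exsentences` argument of the `extracts` module, which asks for a fixed number of sentences. The sketch below only builds the parameters. The default of 3 sentences is an arbitrary choice, `sentence_extract_params` is a hypothetical helper name, and whether this actually returns text for Da Nang has not been tested here.

```python
def sentence_extract_params(pageid, sentences=3):
    """Build query parameters asking for the first few sentences of a page."""
    return {
        "action": "query",
        "pageids": pageid,
        "format": "json",
        "prop": "extracts",
        "explaintext": "true",
        # `exintro` is deliberately left out, since it yields nothing for Da Nang.
        "exsentences": sentences,  # arbitrary default, tune as needed
    }


print(sentence_extract_params(8966))
```

The resulting dict can be passed as `params=` to `S.get(...)` in the same way as the other cells.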

Let's explore option 3. The parameters of the `extracts` module are listed [here](https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bextracts). The idea is to leave out `exintro` so the full page text comes back (`exlimit=1` limits the response to a single extract), and then keep only the intro by splitting on the first section header, here `"== Understand =="`.

In [None]:
PARAMS = {
    "action": "query",
    "pageids": 8966, # 1036 = amsterdam, 8966 Da Nang, 16935 Karachi
    "format": "json",
    "prop": "extracts",  # yields extracts if available
#     "exintro": "true",
    "explaintext": "true",
    "exlimit": 1
}

R = S.get(url=URL, params=PARAMS)
data = R.json()

data['query']['pages']['8966']['extract'].split('== Understand ==')[0]

Perfect!
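
Not every page necessarily starts with an `Understand` section, so a slightly more general version could split on whatever level-2 heading comes first. The sketch below assumes headings appear in the plain-text extract as `== Title ==` on their own line, like in the output above. `intro_from_extract` is a hypothetical helper name.

```python
import re

# A level-2 heading on its own line, e.g. "== Understand ==".
# "=== Subsection ===" is intentionally not matched.
SECTION_HEADING = re.compile(r"^==[^=].*?==\s*$", re.MULTILINE)


def intro_from_extract(extract):
    """Return the text before the first level-2 heading of a plain-text extract."""
    match = SECTION_HEADING.search(extract)
    intro = extract[: match.start()] if match else extract
    return intro.strip()


# Made-up text that only mimics the layout of an extract.
sample_extract = "Intro text about the place.\n\n\n== Understand ==\nMore details."
print(intro_from_extract(sample_extract))
```

If a page has no headings at all, the whole extract is returned, so the result may still need trimming.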

## Link to website

To give attribution, we need to add a short text like:

"A list of contributors is available at the original Singapore article at Wikivoyage."

This text should link to both the list of contributors and the original article. Let's get the link to the article by using the [`info` prop](https://www.mediawiki.org/wiki/API:Info).

In [None]:
PARAMS = {
    "action": "query",
    "pageids": 8966, # 1036 = amsterdam, 8966 Da Nang, 16935 Karachi
    "format": "json",
    "prop": "info",
    "inprop": "url",
}

R = S.get(url=URL, params=PARAMS)
data = R.json()

data

In [None]:
data['query']['pages']['8966']['fullurl']

The revision history page, which lists the contributors, can be found by using the title `Da_Nang` in a URL of the format:

https://en.wikivoyage.org/w/index.php?title=Da_Nang&action=history

So basically, it is the `editurl` with `action=history` instead of `action=edit`.
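
Putting the pieces together, the attribution could be assembled roughly as follows. This is a sketch: `history_url` and `attribution` are hypothetical helper names, the returned dict layout is just one possible shape for the frontend, and the wording follows the example sentence above.

```python
from urllib.parse import urlencode

INDEX_URL = "https://en.wikivoyage.org/w/index.php"


def history_url(title):
    """Build the revision history URL for a Wikivoyage page title."""
    # MediaWiki URLs use underscores instead of spaces in titles.
    query = urlencode({"title": title.replace(" ", "_"), "action": "history"})
    return f"{INDEX_URL}?{query}"


def attribution(title, article_url):
    """Return the attribution text plus the two links it should point to."""
    return {
        "text": (
            "A list of contributors is available at the original "
            f"{title} article at Wikivoyage."
        ),
        "article_url": article_url,  # e.g. the `fullurl` from the `info` prop
        "history_url": history_url(title),
    }


print(attribution("Da Nang", "https://en.wikivoyage.org/wiki/Da_Nang"))
```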

### Other info

There's more that can be queried using the API. For example, get a short description of the place with:

```
     "prop": "description",  # yields 'description': 'municipality of Vietnam'
```
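
Several props can be requested in one call by joining them with `|`, so the extract, URL and description could come back together. The sketch below only builds the parameters and reads a response. `place_params` and `summarize` are hypothetical helper names, and `sample` is hand-written to mimic the response shape rather than real API output.

```python
def place_params(pageid):
    """Build parameters that request extract, URLs and description at once."""
    return {
        "action": "query",
        "pageids": pageid,
        "format": "json",
        "prop": "extracts|info|description",  # multiple props joined with "|"
        "inprop": "url",
        "explaintext": "true",
        "exlimit": 1,
    }


def summarize(data):
    """Pick the fields the frontend needs from the first page in the response."""
    page = next(iter(data["query"]["pages"].values()))
    return {
        "title": page.get("title"),
        "extract": page.get("extract", ""),
        "url": page.get("fullurl"),
        "description": page.get("description"),
    }


# Hand-written sample that only mimics the response shape.
sample = {
    "query": {
        "pages": {
            "8966": {
                "pageid": 8966,
                "title": "Da Nang",
                "extract": "Sample text.",
                "fullurl": "https://en.wikivoyage.org/wiki/Da_Nang",
                "description": "municipality of Vietnam",
            }
        }
    }
}

print(summarize(sample))
```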

Done.