# Exercise sheet \#4
## Using APIs
### Exercise 1
Write a Python script which gets the list of astronauts who are currently in space. To do so, you can use the [astros.json](http://api.open-notify.org/astros.json) OpenNotify API. 

In [1]:
import requests
response = requests.get("http://api.open-notify.org/astros.json")
data = response.json()
astronauts = list(map(lambda x: x['name'], data['people']))
print(astronauts)

['Oleg Kononenko', 'David Saint-Jacques', 'Anne McClain']
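
The one-liner above assumes that the request succeeds and that the payload has the expected shape. As a minimal sketch, the parsing step can be separated from the network call so that it can be checked offline. The helper name `extract_names` and the sample payload are illustrative assumptions; the payload only mimics the `people`/`name` structure used above.

```python
def extract_names(payload):
    """Return the astronaut names found in an Open Notify style payload."""
    # 'people' is expected to be a list of dicts with a 'name' key,
    # as in the solution above; a missing key yields an empty list.
    return [person["name"] for person in payload.get("people", []) if "name" in person]


# Hypothetical sample payload (placeholder names, not real API output).
sample = {
    "message": "success",
    "number": 2,
    "people": [
        {"name": "Astronaut A", "craft": "ISS"},
        {"name": "Astronaut B", "craft": "ISS"},
    ],
}

print(extract_names(sample))
```

With a live response, this would be called as `extract_names(response.json())`, ideally after `response.raise_for_status()`.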


### Exercise 2

#### Question 2.1
Write a Python program which, for each astronaut found in Exercise 1, retrieves their (English) Wikipedia article and extracts the article's summary and links.

In [25]:
import wikipedia

links = []
for x in astronauts:
    print(wikipedia.summary(x))
    page = wikipedia.page(x)
    links.extend(page.links)

ModuleNotFoundError: No module named 'wikipedia'

#### Question 2.2
Extend your Python program so that it only keeps links that point to Wikipedia pages (in any language).

In [21]:
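for link in links:
    try:
        # search() returns a list of matching article titles (at most one here)
        matches = wikipedia.search(link, results=1)
        print(matches)
    except wikipedia.exceptions.PageError as e:
        # PageError has no `options` attribute, so print the error itself
        print(e)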

NameError: name 'links' is not defined
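
One way to read this question is as a filter on URLs: keep a link only if its host is a Wikipedia domain, whatever the language subdomain. The sketch below assumes that the links are available as full URLs (the `page.links` attribute used above returns article titles instead, so this is an assumption), and the helper name `is_wikipedia_url` is hypothetical.

```python
from urllib.parse import urlparse


def is_wikipedia_url(url):
    """True if the URL's host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


# Illustrative URLs only.
candidates = [
    "https://en.wikipedia.org/wiki/NASA",
    "https://fr.wikipedia.org/wiki/NASA",
    "https://www.nasa.gov/",
    "https://notwikipedia.org/wiki/NASA",
]

wikipedia_links = [u for u in candidates if is_wikipedia_url(u)]
print(wikipedia_links)
```

Matching on the parsed host rather than on a substring avoids accepting look-alike domains such as the last candidate.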

#### Question 2.3
Extend your Python program so that it processes these links as follows:
 - it retrieves the corresponding article and then extracts its references


In [None]:
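refs = []
for link in links:
    try:
        page = wikipedia.page(link)
        try:
            # page.references is the list of external URLs cited by the article
            refs.extend(page.references)
            print(page.references)
        except KeyError:
            # some pages expose no external links
            pass
    except wikipedia.exceptions.DisambiguationError as e:
        # ambiguous title: show the candidate pages
        print(e.options)
    except wikipedia.exceptions.PageError as e:
        # no article matches this title
        print(e)
print(refs)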

#### Question 2.4
Extend your Python program to compute the average number of views for each astronaut's main article.

In [26]:
import wptools

total_views = 0
for x in astronauts:
    page = wptools.page(x)
    num_views = page.get_more().data['views']
    print(x, num_views)
    total_views += int(num_views)
# pass the number as a separate argument: str + float would raise a TypeError
print("Average views:", total_views / len(astronauts))

ModuleNotFoundError: No module named 'wptools'

#### Question 2.5
Export the extracted information in a CSV file having the following fields:

`Astronaut's name ; Article's summary ; links separated by commas ; number of views`

In [None]:
import wptools

csv = ''
for x in astronauts:
    page = wptools.page(x)
    summary = page.get_restbase('/page/summary/').data['exrest']
    links = page.get().data['links']
    num_views = page.get_more().data['views']
    line = x + ";" + summary.replace(";", ",") + ";" + ','.join(links) + ';' + str(num_views)
    csv += line + '\n'  # one record per line
print(csv)
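
Replacing `;` in the summary by hand works, but the standard `csv` module can handle delimiters and quoting on its own. The following is a minimal sketch using placeholder rows instead of real Wikipedia data; the function name `rows_to_csv` is an assumption made for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (name, summary, links, views) tuples as ';'-separated CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    for name, summary, links, views in rows:
        # Links are joined with commas, as required by the field layout.
        writer.writerow([name, summary, ",".join(links), views])
    return buffer.getvalue()


# Placeholder data; a summary containing ';' is quoted automatically.
rows = [("Astronaut A", "Pilot; engineer.", ["Link 1", "Link 2"], 1200)]
print(rows_to_csv(rows))
```

Quoting keeps the original summary text intact, whereas the manual replacement silently changes it.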

### Exercise 3
In this exercise, we want to build a multilingual resource.
Take the list of movies from [Exercise sheet #3](https://nbviewer.jupyter.org/urls/mastertal.gitlab.io/UE803/notebooks/Exercise_sheet_webscraping.ipynb) and build a parallel EN-FR corpus of movie titles.

In [None]:
import wptools

corpus = ''
with open('movies.txt', 'r') as f:
    for line in f:
        title_en = line.split(';')[0]
        page = wptools.page(title_en)
        try:
            languages=page.get_more().data['languages']
            title_fr = ''
            index = 0
            found = False
            while index < len(languages) and not (found):
                if languages[index]['lang'] == 'fr':
                    title_fr = languages[index]['title']
                    found = True
                index += 1
            if found:
                corpus += title_en + ";" + title_fr + '\n'
        except LookupError as e:
            pass
print(corpus)
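
The inner `while` loop is a linear search for the French entry. As a sketch, the same lookup can be written with `next` over a generator. It assumes, as in the code above, that `languages` is a list of dicts with `lang` and `title` keys; the helper name `title_in` and the sample entries are hypothetical.

```python
def title_in(languages, lang):
    """Return the title for the given language code, or None if absent."""
    return next(
        (entry["title"] for entry in languages if entry.get("lang") == lang),
        None,
    )


# Placeholder entries mimicking the structure used above.
languages = [
    {"lang": "de", "title": "Titel"},
    {"lang": "fr", "title": "Titre"},
]

print(title_in(languages, "fr"))
print(title_in(languages, "es"))
```

A `None` result plays the role of the `found` flag: the pair is added to the corpus only when a French title exists.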

### Exercise 4
In this exercise, you will extract information from the Mastodon social network.

#### Question 4.1
- Create a Mastodon account (on your preferred Mastodon instance).

- Write a Python application named "ES4" (for Exercise Sheet 4) and register it manually.


In [None]:
from mastodon import Mastodon

appId, secret = Mastodon.create_app("ES4", scopes=['read', 'write'], api_base_url="https://mastodon.social")
client = Mastodon(appId, secret, api_base_url = "https://mastodon.social")

#### Question 4.2
Extract the toots tagged #worldcup from the public timelines (we only want the toots' content).

In [None]:
posts = [x.content for x in client.timeline_hashtag('worldcup')]
print(posts)

#### Question 4.3
How many of these toots contain images?

In [None]:
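# keep a toot if at least one of its media attachments is an image
img_posts = [x.content for x in client.timeline_hashtag('worldcup', limit = 1000)
             if any(y['type'] == 'image' for y in x['media_attachments'])]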
print(len(img_posts))

#### Question 4.4
Get the list of publicly published statuses on the Mastodon flagship instance (mastodon.social). 

In [None]:
toots = client.timeline(timeline='public')
print(toots)

#### Question 4.5 (bonus)
Write a Python bot that reacts to toots from its followers.
Every time a follower mentions your bot's name in a toot which contains the word `fortune`, your bot will pick a fortune from this [list](https://www.fortunes-fr.org/data/proverbes) and toot it.

In [None]:
from mastodon import Mastodon, StreamListener
import requests, html2text, random

class theBot(StreamListener):
    def on_notification(self, notification):
        if notification['type'] == 'mention':
            html = notification['status']['content']
            text = html2text.html2text(html)
            if 'fortune' in text:
                page = requests.get('https://www.fortunes-fr.org/data/proverbes')
                page.encoding = "ISO-8859-1"
                # page.text decodes the body with the encoding set above
                fortunes = page.text.split('%')
                fortune = random.choice(fortunes)
                print(str(fortune))
                sid = notification['status']['id']
                visibility = notification['status']['visibility']
                mastodon.status_post(str(fortune), in_reply_to_id = sid, visibility = visibility)
                
with open('mastodon.secret', 'r') as f:
    credentials = f.readlines()
    
mastodon = Mastodon(
    access_token = credentials[2].strip(),
    api_base_url = 'https://botsin.space'
)
fortuneBot = theBot()
mastodon.stream_user(fortuneBot)
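
The fortune-picking step can be tested without a Mastodon account. The sketch below assumes, as the bot does, that entries in the file are separated by `%`; the helper names `parse_fortunes` and `pick_fortune` and the sample text are placeholders.

```python
import random


def parse_fortunes(raw):
    """Split a fortune file on '%' and drop empty entries."""
    return [entry.strip() for entry in raw.split("%") if entry.strip()]


def pick_fortune(raw, rng=random):
    """Return one randomly chosen fortune from the raw file content."""
    return rng.choice(parse_fortunes(raw))


# Placeholder content in the assumed '%'-separated layout.
sample = "First proverb.\n%\nSecond proverb.\n%\n"
print(pick_fortune(sample))
```

Dropping empty entries matters because a trailing `%` would otherwise let the bot post an empty toot.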

### Exercise 5
#### Question 5.1 
Using Wikipedia, compile the list of UEFA Intertoto Cup winners sorted by country.

In [None]:
import wptools
from bs4 import BeautifulSoup, SoupStrainer

page = wptools.page('intertoto')
content = page.get_restbase('/page/html').data['html']
soup = BeautifulSoup(content, 'html.parser')
tables = soup.find_all(SoupStrainer('table'))

table_body = tables[7]
data = {}

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    if len(cols) > 4:  # cols[4] is read below, so at least 5 cells are needed
        data[cols[1].text] = cols[4].text
print(data)

#### Question 5.2
Compute the number of Intertoto winners per country.

In [None]:
answer = {}
for country, teams in data.items():
    answer[country] = len(teams.split(','))
print(answer)
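
The counting step can be sketched on placeholder data, assuming as above that `data` maps each country to a comma-separated string of winning teams. Empty fragments are skipped so that a trailing comma does not inflate the count. The function name `winners_per_country` is hypothetical.

```python
def winners_per_country(data):
    """Count the non-empty, comma-separated team names for each country."""
    return {
        country: sum(1 for team in teams.split(",") if team.strip())
        for country, teams in data.items()
    }


# Placeholder entries, not actual Intertoto results.
data = {"Country A": "Team 1, Team 2, Team 3", "Country B": "Team 4,"}
print(winners_per_country(data))
```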