# Scrape Stories from Stadt Bonn

Scrape German-language stories about everyday life during the coronavirus pandemic from the website of the City of Bonn (Stadt Bonn), a city in western Germany: https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php.

The stories were written by citizens of Bonn in response to a call for stories issued in May 2020 by the city's department of sports and culture (Sport- und Kulturdezernat).

Each story is stored as a dictionary with the following fields:

- `link`: The URL of the page the story was scraped from (string; the same for all stories, since they all appear on one page)
- `title`: The title of the story (string)
- `author`: The author of the story (string; empty if no author is given)
- `date`: The publication date taken from the page metadata (string; ISO 8601 timestamp, the same for all stories)
- `intro_text`: The introductory text giving background on the collection (string; the same for all stories)
- `story_text`: The main text of the story (string)
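To make the record layout explicit, the fields can be written down as a `typing.TypedDict`. This is only a sketch: the name `StoryDoc` is hypothetical and does not appear in the notebook, and the example values are copied from the first scraped story printed further down, with the long texts elided as "...".

```python
from typing import TypedDict


class StoryDoc(TypedDict):
    """Hypothetical type for one scraped story (field names as listed above)."""
    link: str        # URL of the page the story was scraped from
    title: str       # title of the individual story
    author: str      # author name, or "" if none was found
    date: str        # ISO 8601 timestamp from the page metadata
    intro_text: str  # introductory text shared by all stories on the page
    story_text: str  # main text of the story


# Values taken from the first story in the notebook output; long texts elided
example: StoryDoc = {
    "link": "https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php",
    "title": "Anno 2020",
    "author": "Eva Maria Keuchel",
    "date": "2020-07-20T12:48:00Z",
    "intro_text": "...",
    "story_text": "...",
}
```

A `TypedDict` is still a plain `dict` at runtime, so it documents the structure for readers and type checkers without changing how the scraper builds its records.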

The fetched web pages are cached as `.html` files in `DATA_DIR`, so later runs load them from disk instead of downloading them again. The directory is created if it does not exist.
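`extract_text_from_url` (defined below) names each cache file after the last `-`-separated piece of the URL plus `.html`, so the main page ends up as `bonn.php.html`. That works for a single page, but two URLs ending in the same piece would overwrite each other's cache file. The sketch below reproduces that scheme and, as an alternative that the notebook does not use, a name derived from a SHA-256 hash of the full URL. Both function names are hypothetical.

```python
import hashlib
import os


def cache_path_from_suffix(url: str, data_dir: str) -> str:
    """Scheme used in this notebook: last "-"-separated URL piece + ".html"."""
    return os.path.join(data_dir, url.split("-")[-1] + ".html")


def cache_path_from_hash(url: str, data_dir: str) -> str:
    """Hypothetical alternative: name the file after a hash of the full URL,
    so URLs that share a suffix get different cache files."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return os.path.join(data_dir, digest + ".html")


url = "https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php"
print(cache_path_from_suffix(url, "../bonn.de/scraped/"))  # ../bonn.de/scraped/bonn.php.html
print(cache_path_from_hash(url, "../bonn.de/scraped/"))
```

The hash-based name is stable across runs but no longer human-readable, which is the trade-off to weigh if more pages are scraped.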

```python
""" Scrape stories from bonn.de """

import os
import json
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
```

```python
# URL of the page that contains all the stories

URL = "https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php"

# Directory for caching the downloaded web pages

DATA_DIR = "../bonn.de/scraped/"
```

```python
def check_dir(data_dir):
    """ Create the directory for saving web pages if it does not exist """
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
        print(f"Created saving directory at {data_dir}")
```

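`check_dir` first tests for the directory and then creates it. `os.makedirs` also accepts `exist_ok=True`, which makes the call safe when the directory already exists, including the case where another process creates it between the check and the call. A sketch of that variant; the name `ensure_dir` and its boolean return value are hypothetical.

```python
import os
import tempfile


def ensure_dir(data_dir: str) -> bool:
    """Create data_dir (and missing parents) if needed; return True if it was new."""
    existed = os.path.isdir(data_dir)
    # exist_ok=True: no FileExistsError if the directory is already there
    os.makedirs(data_dir, exist_ok=True)
    if not existed:
        print(f"Created saving directory at {data_dir}")
    return not existed


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "bonn.de", "scraped")
    first = ensure_dir(target)   # creates both missing levels
    second = ensure_dir(target)  # already exists, nothing to do

print(first, second)  # True False
```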
```python
def load_web_page(url, file_name, data_dir):
    """ Load the web page from disk if it was saved before;
    otherwise fetch it, save it as .html and return it (always as str) """
    check_dir(data_dir)
    if os.path.exists(file_name):
        with open(file_name, "r", encoding="utf-8") as file:
            page = file.read()
        print(f"Loaded web page from {file_name}")
    else:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        # Close the connection after reading; decode so both branches return str
        # (assumes the page is UTF-8 encoded)
        with urlopen(req) as response:
            page = response.read().decode("utf-8")
        with open(file_name, "w", encoding="utf-8") as file:
            file.write(page)
        print(f"Saved web page to {file_name}")
    return page
```

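The caching idea in `load_web_page` can be separated from the network request by passing the fetch step in as a function. This is a sketch, not the notebook's code: `load_cached` and `fake_fetch` are hypothetical, and the fake fetcher stands in for `urlopen` so the cache behaviour can be checked offline.

```python
import os
import tempfile
from typing import Callable


def load_cached(url: str, file_name: str, fetch: Callable[[str], str]) -> str:
    """Return the page from file_name if it exists; otherwise call fetch(url),
    save the result to file_name and return it."""
    if os.path.exists(file_name):
        with open(file_name, "r", encoding="utf-8") as file:
            return file.read()
    page = fetch(url)
    with open(file_name, "w", encoding="utf-8") as file:
        file.write(page)
    return page


calls = []


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a network request; records each call."""
    calls.append(url)
    return "<html><body>Erzähl uns, Bonn</body></html>"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    first = load_cached("https://example.org/page", path, fake_fetch)
    second = load_cached("https://example.org/page", path, fake_fetch)

print(len(calls))       # 1
print(first == second)  # True
```

The second call is served from disk, so the fetcher runs only once; in the notebook, the fetcher would be the `Request`/`urlopen` code.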
```python
def extract_text_from_url(url):
    """ Extract all stories from the page at url """
    new_file_name = os.path.join(DATA_DIR, url.split("-")[-1] + ".html")
    new_page = BeautifulSoup(load_web_page(
        url, new_file_name, DATA_DIR), "html.parser")
    # The page date is stored as JSON in the data-content attribute of a <meta> tag
    new_date = json.loads(new_page.find(
        "meta", attrs={"name": "application-name"})["data-content"])["date"]
    # The introductory text is shared by all stories on the page
    new_intro_text = new_page.find(
        "div", attrs={"class": "SP-Text"}).get_text(separator="\n")
    # Each story sits in its own collapsible section
    sections = new_page.find_all(
        "section", attrs={"class": "SP-Section SP-Collapsible"})

    docs = []

    print(f"Extracting text from: {url}")

    for section in sections:
        # get_text() also works when a tag has nested children, where .string returns None
        new_title = section.find("span", attrs={
            "class": "SP-Collapsible__trigger__text SP-Iconized__text"}).get_text(strip=True)
        new_author = section.find(
            "h2", attrs={"class": "SP-Headline--paragraph"})
        # Replace non-breaking spaces with normal spaces so adjacent words stay separated
        new_story_text = section.find("div", attrs={
            "class": "SP-Paragraph"}).get_text(separator="\n").replace("\xa0", " ")
        new_doc = {
            "link": url,
            "title": new_title,
            "author": new_author.get_text(strip=True) if new_author is not None else "",
            "date": new_date,
            "intro_text": new_intro_text,
            "story_text": new_story_text
        }
        docs.append(new_doc)
        print(f"Extracted text from story: {new_title}")
    print("Done.")

    return docs
```

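Two fields usually need some post-processing. The story text from bonn.de contains non-breaking spaces (`\xa0`); deleting them outright can glue neighbouring words together. The `date` field is an ISO 8601 string ending in `Z`, which `datetime.fromisoformat` only accepts from Python 3.11 on. A sketch of both steps; `clean_text`, `parse_date` and the sample string (modelled on the first story) are hypothetical and not part of the notebook.

```python
import re
from datetime import datetime, timezone


def clean_text(text: str) -> str:
    """Replace non-breaking spaces with ordinary spaces, collapse runs of spaces
    and tabs, and strip each line, while keeping the line breaks."""
    text = text.replace("\xa0", " ")
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.split("\n")).strip()


def parse_date(value: str) -> datetime:
    """Turn an ISO 8601 string such as "2020-07-20T12:48:00Z" into an aware datetime."""
    # Rewrite a trailing "Z" as an explicit UTC offset for Python < 3.11
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


raw = "Die\xa0fest gebuchte  Flusskreuzfahrt\n \xa0wurde gestrichen. "
print(repr(raw.replace("\xa0", "")))  # 'Diefest gebuchte  Flusskreuzfahrt\n wurde gestrichen. '
print(repr(clean_text(raw)))          # 'Die fest gebuchte Flusskreuzfahrt\nwurde gestrichen.'
print(parse_date("2020-07-20T12:48:00Z") == datetime(2020, 7, 20, 12, 48, tzinfo=timezone.utc))  # True
```

Note that `clean_text` also strips leading spaces on each line, which would remove any intentional indentation, for example in poems; drop that step if the original layout matters.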
```python
def print_doc(doc):
    """ Print each field of a document """
    for field, value in doc.items():
        print(f"{field}: {value}\n")
```

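`print_doc` is handy for inspecting a single story. To keep the results beyond the notebook session, the list of dictionaries can be written to a JSON file. A minimal sketch; `save_docs`, `load_docs` and the file name `bonn_stories.json` are hypothetical, and the sample record reuses values from the first story printed below. `ensure_ascii=False` keeps umlauts such as "ö" readable instead of escaping them.

```python
import json
import os
import tempfile


def save_docs(docs: list, file_name: str) -> None:
    """Write the list of story dicts to a UTF-8 JSON file."""
    with open(file_name, "w", encoding="utf-8") as file:
        json.dump(docs, file, ensure_ascii=False, indent=2)


def load_docs(file_name: str) -> list:
    """Read the list of story dicts back from a JSON file."""
    with open(file_name, "r", encoding="utf-8") as file:
        return json.load(file)


sample = [{
    "title": "Anno 2020",
    "author": "Eva Maria Keuchel",
    "date": "2020-07-20T12:48:00Z",
    "intro_text": "Das Sport- und Kulturdezernat der Stadt Bonn rief im Mai 2020 "
                  "Bonner*innen dazu auf, ihre persönlichen Geschichten aus dem "
                  "Corona-Alltag zu teilen.",
}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bonn_stories.json")
    save_docs(sample, path)
    with open(path, encoding="utf-8") as file:
        raw = file.read()
    restored = load_docs(path)

print("persönlichen" in raw)  # True (not escaped as \u00f6)
print(restored == sample)     # True
```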
```python
docs = extract_text_from_url(URL)
```

Loaded web page from ../bonn.de/scraped/bonn.php.html
Extracting text from: https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php
Extracted text from story: Anno 2020
Extracted text from story: Corona 2020
Extracted text from story: Einbrechers Hilferuf
Extracted text from story: Am Alten Zoll
Extracted text from story: Unwirklich
Extracted text from story: Der Rabe und die Winzlinge
Extracted text from story: Leben in Zeiten des Coronavirus
Extracted text from story: An den Fenstern, Applaus.
Extracted text from story: Coronatus
Extracted text from story: Corona.
Extracted text from story: Corona-Spaziergang
Extracted text from story: Im Corona-Supermarkt
Extracted text from story: Das Coronavirus
Extracted text from story: Nur eine Pandemie
Extracted text from story: Die Krise als Chance? Corona in meinem Umfeld ...
Extracted text from story: Mallorca, meine Liebe, du wirst vermisst!
Extracted text from story: Coronakirsche
Extracted text from

```python
print_doc(docs[0])
```

link: https://www.bonn.de/bonn-erleben/besichtigen-entdecken/aktion-coronavirus-erzaehl-uns-bonn.php

title: Anno 2020

author: Eva Maria Keuchel

date: 2020-07-20T12:48:00Z

intro_text: Das Sport- und Kulturdezernat der Stadt Bonn rief im Mai 2020 Bonner*innen dazu auf, ihre persönlichen Geschichten aus dem Corona-Alltag zu teilen.
So entsteht ein kollektives Gedächtnis dieser unerwarteten Ausnahmesituation, welches für die Zukunft bewahrt wird und die Stimmung in der Stadt wiedergibt.
Die Stadt Bonn bedankt sich bei allen Autor*innen und freut sich, die Geschichten zu teilen.

story_text: Dass mir so etwas passiert, hätte ich nie gedacht. Ich bin eine Geisel geworden und lebe unter Hausarrest. Denn ich bin eine gefährdete Person, und ich könnte anderen gefährlich werden.
Nun sitze ich fast immer allein in meiner Wohnung und muss Verzichten lernen. Die fest gebuchte Flusskreuzfahrt zusammen mit meinen Freundinnen wurde gestrichen. Meine Opernkarte für „Fidelio“ hat sich in einen Gutsc