# Classical Receptions Corpus based on Stable DraCor workflow

Relevant notebooks:

- https://github.com/dracor-org/dracor-notebooks/blob/docker/docker/local-dracor-with-docker.ipynb
- https://github.com/dracor-org/vebidracor/blob/main/vebidracor-workflow.ipynb

To cite the workflow, please use:

Boerner, Ingo; Trilcke, Peer; Milling, Carsten; Fischer, Frank; Sluyter-Gäthje, Henny (2023): "Dockerizing DraCor – A Container-based Approach to Reproducibility in Computational Literary Studies". *DH2023 Book of Abstracts*. https://dh2023.adho.org




```
services:
  api:
    image: ingoboerner/dracor-api:v0.86.3_local
    ports:
      - "8080:8080"
    depends_on:
      - metrics
      - fuseki
  metrics:
    image: ingoboerner/dracor-metrics:v1.2.1
    ports:
      - "8030:8030"
  frontend:
    image: ingoboerner/dracor-frontend:v1.4.3_local
    ports:
      - "8088:80"
    depends_on:
      - api
  fuseki:
    image: "stain/jena-fuseki"
    environment:
      - ADMIN_PASSWORD=qwerty
      - FUSEKI_DATASET_1=dracor
    ports:
      - "3030:3030"
    expose:
      - "3030"
```

This compose file should eventually be generated by a function rather than written by hand.
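
A minimal sketch of such a function could look like the following. The names `COMPOSE_TEMPLATE` and `write_compose_file` and the choice to parameterize the image tags and the Fuseki password are assumptions made for illustration; the template itself is copied from the listing above.

```python
from pathlib import Path

# Template copied from the compose listing above.
# Assumption: only the image tags and the Fuseki password need to vary.
COMPOSE_TEMPLATE = """\
services:
  api:
    image: ingoboerner/dracor-api:{api_tag}
    ports:
      - "8080:8080"
    depends_on:
      - metrics
      - fuseki
  metrics:
    image: ingoboerner/dracor-metrics:{metrics_tag}
    ports:
      - "8030:8030"
  frontend:
    image: ingoboerner/dracor-frontend:{frontend_tag}
    ports:
      - "8088:80"
    depends_on:
      - api
  fuseki:
    image: "stain/jena-fuseki"
    environment:
      - ADMIN_PASSWORD={fuseki_password}
      - FUSEKI_DATASET_1=dracor
    ports:
      - "3030:3030"
    expose:
      - "3030"
"""


def write_compose_file(path="docker-compose.empty.yml",
                       api_tag="v0.86.3_local",
                       metrics_tag="v1.2.1",
                       frontend_tag="v1.4.3_local",
                       fuseki_password="qwerty"):
    """Render the template and write it to `path`; return the path written."""
    content = COMPOSE_TEMPLATE.format(
        api_tag=api_tag,
        metrics_tag=metrics_tag,
        frontend_tag=frontend_tag,
        fuseki_password=fuseki_password,
    )
    target = Path(path)
    target.write_text(content, encoding="utf-8")
    return target
```

Calling `write_compose_file()` without arguments reproduces the versions pinned above; passing other tags would pin a different state of the services.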

In [1]:
!ls

classical_receptions_corpus.ipynb stable-dracor-workflow.ipynb
docker-compose.empty.yml


Start the containers:

`docker-compose -f docker-compose.empty.yml up`

Then open http://localhost:8088/ in a browser. It should show an empty DraCor instance.
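
The containers may need a moment before the API answers. The following sketch polls the `info` endpoint of the local API until it responds with status 200. The function names, the number of attempts, and the delay are assumptions; the sketch uses only the standard library so that it can run before any other cell.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status
        return error.code
    except OSError:
        # Connection refused, timeout, DNS failure (URLError is a subclass of OSError)
        return None


def wait_for_api(url="http://localhost:8088/api/info",
                 attempts=30, delay=2.0, fetch=fetch_status):
    """Poll `url` until it returns 200; return True on success, False otherwise."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        time.sleep(delay)
    return False
```

`wait_for_api()` returns `False` instead of raising, so a notebook cell can decide whether to continue or stop.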

Helper function for using the API. It might become obsolete once pydracor is ready.

In [2]:
# import libraries json and requests
import json
import requests

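def get(**kwargs):
    """
    Send a GET request to the DraCor API and return the response body.

    Optional keyword arguments:
    corpusname -- name of the corpus, e.g. "receptions"
    playname   -- name (slug) of the play; only used together with corpusname
    method     -- trailing path segment that is appended to the URL
    apibase    -- base URL of the API; defaults to the local instance
    parse_json -- if True, return the parsed JSON instead of the raw text
    """

    #could set different apibase, e.g. https://staging.dracor.org/api/ [not recommended, please use the production server]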
    if "apibase" in kwargs:
        if kwargs["apibase"].endswith("/"):
            apibase = kwargs["apibase"]
        else:
            apibase = kwargs["apibase"] + "/"
    else:
        #use local API per default
        apibase = "http://localhost:8088/api/"
    if "corpusname" in kwargs and "playname" in kwargs:
        # used for /api/corpora/{corpusname}/play/{playname}/
        if "method" in kwargs:
            request_url = apibase + "corpora/" + kwargs["corpusname"] + "/play/" + kwargs["playname"] + "/" + kwargs["method"]
        else:
            request_url = apibase + "corpora/" + kwargs["corpusname"] + "/play/" + kwargs["playname"]
    elif "corpusname" in kwargs and not "playname" in kwargs:
        if "method" in kwargs:
            request_url = apibase + "corpora/" + kwargs["corpusname"] + "/" + kwargs["method"]
        else:
            request_url = apibase + "corpora/" + kwargs["corpusname"] 
    elif "method" in kwargs and not "corpusname" in kwargs and not "playname" in kwargs:
            request_url = apibase + kwargs["method"]
            
    else:
        #nothing set: fall back to the info endpoint
        request_url = apibase + "info"

    #send the request
    r = requests.get(request_url)
    if r.status_code == 200:
        #success: return the parsed JSON if requested, otherwise the raw response text
        if kwargs.get("parse_json", False):
            return json.loads(r.text)
        return r.text
    else:
        raise Exception("Request was not successful. Server returned status code: " + str(r.status_code))
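
To make the URL patterns easier to see, the branching in `get` can be sketched as a pure function that only builds the request URL. `build_request_url` is a hypothetical name, and unlike `get` it treats empty strings like missing arguments.

```python
def build_request_url(apibase="http://localhost:8088/api/",
                      corpusname=None, playname=None, method=None):
    """Build the request URL following the same rules as get() (sketch)."""
    if not apibase.endswith("/"):
        apibase += "/"

    if corpusname and playname:
        # /corpora/{corpusname}/play/{playname}[/{method}]
        parts = ["corpora", corpusname, "play", playname]
    elif corpusname:
        # /corpora/{corpusname}[/{method}]
        parts = ["corpora", corpusname]
    elif method and not playname:
        # /{method}
        parts = []
    else:
        # nothing usable set: fall back to the info endpoint
        return apibase + "info"

    if method:
        parts.append(method)
    return apibase + "/".join(parts)


print(build_request_url(method="corpora"))
print(build_request_url(corpusname="receptions", playname="jonson-volpone", method="tei"))
```

Keeping the URL construction separate from the request makes it possible to check the generated URLs without a running instance.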

## Set up an empty corpus `receptions` in the local instance

In [3]:
new_corpus_name = "receptions"
new_corpus_title = "Classical Receptions in DraCor"

#needed for authorization
from requests.auth import HTTPBasicAuth

#Username of the local instance
usr = "admin"
#Password of the admin user
pwd = ""

#construct the payload
metadata = {
  "name": new_corpus_name,
  "title": new_corpus_title
}

#url of the corpora endpoint
corpora_endpoint_url = "http://localhost:8088/api/corpora"

#send the POST request using library requests
r = requests.post(corpora_endpoint_url, json = metadata, auth=HTTPBasicAuth(usr, pwd))

if r.status_code == 200:
    print("Success! http://localhost:8088/" + new_corpus_name)
else:
    print("Creating the corpus failed. Server returned status code: " + str(r.status_code))

In [4]:
#check if successful
get(method="corpora", parse_json=True)

[{'uri': 'https://dracor.org/api/corpora/receptions',
  'title': 'Classical Receptions in DraCor',
  'name': 'receptions',
  'acronym': 'ReceptionsDraCor'}]

## Define corpus contents

In [5]:
#list of corpora/plays to include
corpora_to_include = [
    { 
        "corpusname": "ep",
        "repository": "https://github.com/dracor-org/epdracor",
        "commit": "8c0802bf7d9ee0508bddea02f43e9571344a9056",
        "include" : {
            "type" : "slug",
            "ids" : [
                "heywood-the-english-traveller", 
                "jonson-every-man-in-his-humour",
                "jonson-volpone",
                "jonson-the-alchemist",
                "jonson-the-case-is-altered",
                "barry-ram-alley",
                "terence-andria",
                "terence-the-two-first-comedies-andria-the-eunuch",
                "udall-ralph-roister-doister",
                "chapman-all-fools",
                "dryden-sir-martin-mar-all",
                "jonson-the-devil-is-an-ass",
                "otway-titus-and-berenice",
                "shadwell-the-miser",
                "dryden-amphitryon"
                    ]
        },
    },
    { 
        "corpusname": "fre",
        "repository": "https://github.com/dracor-org/fredracor",
        "commit": "a9fa55c94986eb5a7c58b099d8e8cae60abe081c",
        "include" : {
            "type" : "id",
            "ids" : ["fre000090", "fre000424", "fre000920", "fre000995", "fre000996", "fre001006", "fre001010",
                    "fre001172", "fre001218", "fre001220", "fre001244", "fre001253", "fre001334"
                    ]
        },
    },
    { 
        "corpusname": "ger",
        "repository": "https://github.com/dracor-org/gerdracor",
        "commit": "a370bbcc806ba19fa2784e4ce43b1a44142aaf16",
        "include" : {
            "type" : "id",
            "ids" : ["ger000249"]
        },
    },
    { 
        "corpusname": "ita",
        "repository": "https://github.com/dracor-org/itadracor",
        "commit": "0c9c04b56f774f9f057f18259c2dff54f08d776f",
        "include" : {
            "type" : "id",
            "ids" : ["ita000035", "ita000036", "ita000038", "ita000041", "ita000042"]
        },
    },
    { 
        "corpusname": "shake",
        "repository": "https://github.com/dracor-org/shakedracor",
        "commit": "6438b3c2c632bd18e2529e0b7ba30477b66facd4",
        "include" : {
            "type" : "id",
            "ids" : ["shake000011"]
        },
    }
    ]
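
Because every source corpus is pinned to a commit, a typo in one of these entries is easy to miss until the download fails. The following sketch checks one entry for the keys used in this notebook. The function name and the specific checks (full 40-character commit hash, include type `slug` or `id`, non-empty and duplicate-free ID list) are assumptions, not part of the workflow itself.

```python
import re

# A full Git commit hash: 40 lowercase hexadecimal characters
COMMIT_PATTERN = re.compile(r"^[0-9a-f]{40}$")


def validate_corpus_entry(entry):
    """Return a list of problems found in one entry (empty list if none)."""
    problems = []
    for key in ("corpusname", "repository", "commit", "include"):
        if key not in entry:
            problems.append(f"missing key: {key}")

    if "commit" in entry and not COMMIT_PATTERN.match(entry["commit"]):
        problems.append("commit is not a full 40-character hash")

    if "include" in entry:
        include = entry["include"]
        if include.get("type") not in ("slug", "id"):
            problems.append("include.type must be 'slug' or 'id'")
        ids = include.get("ids", [])
        if not ids:
            problems.append("include.ids is empty")
        if len(ids) != len(set(ids)):
            problems.append("include.ids contains duplicates")
    return problems


example = {
    "corpusname": "ger",
    "repository": "https://github.com/dracor-org/gerdracor",
    "commit": "a370bbcc806ba19fa2784e4ce43b1a44142aaf16",
    "include": {"type": "id", "ids": ["ger000249"]},
}
print(validate_corpus_entry(example))
```

Running the check over all entries of `corpora_to_include` before loading would catch malformed entries early.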

## Load plays and add to corpus

In [6]:
def load_play(corpus_info:dict, playname:str):
    """
    Load a single play to the corpus
    expects the corpus dictionary from corpora_to_include
    """
    
    headers = {'Content-Type': 'application/xml'}

    #define the variables
    #handle both source accounts (dracor-org and ingoboerner), e.g. to be able to load shakedracor from either
    if "/dracor-org/" in corpus_info["repository"]:
        corpus_repo_part = corpus_info["repository"].split("/dracor-org/")[1] # this is not the same as "corpusname"!
    elif "/ingoboerner/" in corpus_info["repository"]:
        corpus_repo_part = corpus_info["repository"].split("/ingoboerner/")[1] # this is not the same as "corpusname"!
    else:
        raise Exception("Unexpected source repo.")
    
    commit_id = corpus_info["commit"]
    filename = playname + ".xml"

    #concatenate to a download url
    if "/dracor-org/" in corpus_info["repository"]:
        download_url = "https://raw.githubusercontent.com/dracor-org/" + corpus_repo_part + "/" + commit_id + "/tei/" + filename
    elif "/ingoboerner/" in corpus_info["repository"]:
        download_url = "https://raw.githubusercontent.com/ingoboerner/" + corpus_repo_part + "/" + commit_id + "/tei/" + filename
    else:
        raise Exception("Unexpected source repo when creating download-url.")
    
    
    get_r = requests.get(download_url)
    if get_r.status_code == 200:
        #successful: take only the text of the response and encode it in UTF-8 (important!)
        tei = get_r.text.encode('utf-8')
        
        #construct the URL to use in the PUT request:
        put_request_url = "http://localhost:8088/api/corpora/" + new_corpus_name + "/play/" + playname + "/tei"
        
        put_r = requests.put(put_request_url, data=tei, headers=headers, auth=HTTPBasicAuth(usr, pwd))
        
        return put_r.status_code #should be 200 if successful
    else: 
        #download from GitHub failed; 480 is a custom (non-standard) code marking this case
        return 480
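
The two hard-coded account names in `load_play` could be avoided by parsing the owner and repository name from the repository URL. The following is a sketch under the assumption that every source is a `https://github.com/{owner}/{repo}` URL with the TEI files in a `tei` folder, as in the corpora used here; `raw_tei_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def raw_tei_url(repository, commit, playname):
    """Build the raw.githubusercontent.com URL of a play's TEI file at a commit."""
    parsed = urlparse(repository)
    if parsed.netloc != "github.com":
        raise ValueError("Unexpected source repo: " + repository)

    # Path looks like "/{owner}/{repo}"
    path_parts = parsed.path.strip("/").split("/")
    if len(path_parts) != 2 or not all(path_parts):
        raise ValueError("Unexpected source repo: " + repository)

    owner, repo = path_parts
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{commit}/tei/{playname}.xml"


print(raw_tei_url(
    "https://github.com/dracor-org/gerdracor",
    "a370bbcc806ba19fa2784e4ce43b1a44142aaf16",
    "wedekind-hidalla",
))
```

With such a helper, adding a source from another GitHub account would not require a new branch in `load_play`.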

In [7]:
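def dracor_id_to_playname(play_id:str):
    """Translate a DraCor ID to the playname/slug by querying the production API at dracor.org"""
    headers = { "Accept" : "application/json" }
    r = requests.get("https://dracor.org/api/id/" + play_id, headers=headers)
    if r.status_code != 200:
        raise Exception("Could not resolve ID " + play_id + ". Server returned status code: " + str(r.status_code))
    result = json.loads(r.text)
    return result["name"]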
    

In [8]:
dracor_id_to_playname("ger000023")

'wedekind-hidalla'

In [9]:
#Ideally, this would look into the files in /tei at the pinned commit via the GitHub API instead of checking on DraCor.
#Currently, I assume that plays are only added to a corpus, never deleted.
def load_corpus(corpus_info:dict):
    
    
    
    if "include" in corpus_info and "exclude" not in corpus_info:
        # Only "include" lists are supported; no "exclude" lists and no whole corpora
        
        playnames = []
        
        if corpus_info["include"]["type"] == "slug":
            # slugs already are playnames
            for item in corpus_info["include"]["ids"]:
                playnames.append(item)
            
        else:
            # DraCor IDs need to be translated to playnames
            for item in corpus_info["include"]["ids"]: 
                playname = dracor_id_to_playname(item)
                playnames.append(playname)
        
        #collect the plays that could not be stored
        errors = []
    
        for playname in playnames:
            print(f"Loading {corpus_info['corpusname']} – {playname}")
            status = load_play(corpus_info, playname)
            if status != 200:
                errors.append({"playname": playname, "status_code": status})
            else:
                print("Stored.")
    
        if len(errors) == 0:
            print("Everything fine!")
            return [True, errors]
        else:
            print("There have been " + str(len(errors)) + " errors.")
            return [False, errors]
    else:
        #fail explicitly instead of returning None, which the calling loop could not handle
        raise NotImplementedError("Only corpora with an 'include' list (and without 'exclude') are supported.")
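
The comment above `load_corpus` suggests listing the files in `/tei` at the pinned commit through the GitHub API. A sketch of that idea, using the GitHub REST endpoint for repository contents with the `ref` parameter, could look like this. The function names are hypothetical, and the sketch uses only the standard library. Note that this endpoint returns at most 1,000 entries per directory and that unauthenticated requests are rate-limited, so larger corpora would need the Git Trees API instead.

```python
import json
import urllib.request
from urllib.parse import urlparse


def tei_listing_url(repository, commit):
    """URL of the GitHub contents API for the /tei folder at a given commit."""
    owner, repo = urlparse(repository).path.strip("/").split("/")[:2]
    return f"https://api.github.com/repos/{owner}/{repo}/contents/tei?ref={commit}"


def playnames_from_listing(listing):
    """Extract playnames (file names without .xml) from a contents listing."""
    return sorted(
        item["name"][:-4]
        for item in listing
        if item.get("type") == "file" and item["name"].endswith(".xml")
    )


def list_playnames_at_commit(repository, commit):
    """Fetch the /tei listing from GitHub and return the playnames it contains."""
    request = urllib.request.Request(
        tei_listing_url(repository, commit),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return playnames_from_listing(json.load(response))
```

The result could replace the lookup on dracor.org for whole corpora, since it reflects exactly the files present at the pinned commit.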
            

In [10]:
%%time
overall_errors = []
#load the data
for corpus in corpora_to_include:
    load_operation = load_corpus(corpus)
    if not load_operation[0]:
        overall_errors.append(load_operation[1])

Loading ep – heywood-the-english-traveller
Stored.
Loading ep – jonson-every-man-in-his-humour
Stored.
Loading ep – jonson-volpone
Stored.
Loading ep – jonson-the-alchemist
Stored.
Loading ep – jonson-the-case-is-altered
Stored.
Loading ep – barry-ram-alley
Stored.
Loading ep – terence-andria
Stored.
Loading ep – terence-the-two-first-comedies-andria-the-eunuch
Stored.
Loading ep – udall-ralph-roister-doister
Stored.
Loading ep – chapman-all-fools
Stored.
Loading ep – dryden-sir-martin-mar-all
Stored.
Loading ep – jonson-the-devil-is-an-ass
Stored.
Loading ep – otway-titus-and-berenice
Stored.
Loading ep – shadwell-the-miser
Stored.
Loading ep – dryden-amphitryon
Stored.
Everything fine!
Loading fre – beaumarchais-barbier-de-seville
Stored.
Loading fre – cyrano-pedant-joue
Stored.
Loading fre – mareschala-veritable-capitaine-matamore
Stored.
Loading fre – moliere-amphitryon
Stored.
Loading fre – moliere-avare
Stored.
Loading fre – moliere-etourdi
Stored.
Loading fre – moliere-fourberie