# Understanding and re-documenting the *Folger Shakespeare API Tools* 

Author: Ingo Börner

This notebook is used to evaluate the "Folger Shakespeare API Tools" for the report on "Programmable Corpora" within the CLS INFRA project. 

The notebook generates OpenAPI documentation for the Folger API, which describes it in a standardized way. 

It also contains some observations about the API that will eventually be elaborated in more detail in the upcoming report. 

It could also be reworked into some kind of Python API wrapper (we will see).

For the official HTML documentation, see https://www.folgerdigitaltexts.org/api

In [1]:
#!pip install apispec
#!pip install marshmallow

In [2]:
# Packages used for Documentation and schemas
# https://apispec.readthedocs.io/
from apispec import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin
from marshmallow import Schema, fields

In [3]:
#!pip install PyYAML

In [4]:
# Packages used for writing the specification
import yaml
import json

In [5]:
# Send HTTP Requests to the API
import requests

In [6]:
#service URL of the Folger Shakespeare API Tools
SERVICE_BASE = "https://www.folgerdigitaltexts.org"

### Setting up a dummy API with `flask`

This is probably a workaround: we need a dummy API from which to generate the OpenAPI documentation. We use `flask` together with the package `apispec`, which reads the docstrings and path annotations of the functions that call the real Folger API. 

In [7]:
#!pip install flask

In [8]:
#!pip install apispec_webframeworks

In [9]:
#we setup a dummy api with flask
import flask
from apispec_webframeworks.flask import FlaskPlugin

In [10]:
api = flask.Flask(__name__)

## Exploring and re-documenting the API

The official API offers two drop-down menus to "build" queries to the API:

![Dropdown with play codes and functions](images/play-codes_functions_dropdowns.png)

There doesn't seem to be a function to request the IDs of all available plays. Therefore we had to hardcode the IDs in the following dictionary.

In [11]:
#Each play has its identifier ("play code") – see Official Documentation:
# put these codes into a dictionary, maybe reuse in a schema later

PLAYCODES = {
    "AWW" : "All's Well That Ends Well",
    "Ant" : "Antony and Cleopatra",
    "AYL" : "As You Like It",
    "Err" : "The Comedy of Errors",
    "Cor" : "Coriolanus",
    "Cym" : "Cymbeline",
    "Ham" : "Hamlet",
    "1H4" : "Henry IV, Part 1",
    "2H4" : "Henry IV, Part 2",
    "H5" : "Henry V",
    "1H6" : "Henry VI, Part 1",
    "2H6" : "Henry VI, Part 2",
    "3H6" : "Henry VI, Part 3",
    "H8" : "Henry VIII",
    "JC" : "Julius Caesar",
    "Jn" : "King John",
    "Lr" : "King Lear",
    "LLL" : "Love's Labor's Lost",
    "Mac" : "Macbeth",
    "MM" : "Measure for Measure",
    "MV" : "The Merchant of Venice",
    "Wiv" : "The Merry Wives of Windsor",
    "MND" : "A Midsummer Night's Dream",
    "Ado" : "Much Ado About Nothing" ,
    "Oth" : "Othello",
    "Per" : "Pericles",
    "R2" : "Richard II",
    "R3" : "Richard III",
    "Rom" : "Romeo and Juliet",
    "Shr" : "The Taming of the Shrew", 
    "Tmp" : "The Tempest",
    "Tim" : "Timon of Athens",
    "Tit" : "Titus Andronicus",
    "Tro" : "Troilus and Cressida",
    "TN" : "Twelfth Night",
    "TGV" : "Two Gentlemen of Verona",
    "TNK" : "Two Noble Kinsmen",
    "WT" : "The Winter's Tale"
}


In [12]:
print("There are " + str(len(PLAYCODES.keys())) + " play codes.")

There are 38 play codes.


In [13]:
# as a marshmallow schema to be re-used
class play_codes_schema(Schema):
    playcode = fields.String(
        required=True,
        metadata={
            "enum": list(PLAYCODES.keys())
        }
    )

The official documentation lists the following **15** functions:

* **synopsis**: (+ act/scene, optionally) returns a synopsis of the play and its scenes
* **ftln** (+ Folger through line number): returns the spoken text at that FTLN
* **word** (+ word id) : returns information about that word
* **segment** (+ object id) : returns the text of that xml:id
* **text**: returns only the spoken text in that play
* **charText**: returns a list of characters arranged according to amount of lines spoken, with a link to each character's entire spoken text
* **charTextMinus**: returns a list of characters arranged according to amount of lines spoken, with a link to the play's spoken text, minus this character
* **concordance**: lists the words used (in spoken text) and their frequency
* **monologue** (+ optional line count): provides a list of speeches longer than the given line count (defaults to 30 lines)
* **onStage** (+ ftln): returns a list of characters on stage at that line
* **charChart**: provides a graphical representation of who is on stage across a timeline of the play
* **parts**: provides parts or cue scripts for each character
* **witScript**: provides "witScripts" for each character. "Witness" or "Witmore" scripts attempt to show what a character sees. They offer the play text only when that character is on stage.
* **sounds**: returns a list of all stage directions that contain sounds (i.e., "music," "flourish," "thunder")
* **scenes**: returns a list of all the scenes in the play

(cf. Official API Documentation)

They will be turned into Python functions.

Some general remarks on the documentation/api at that point:

* The documentation is not machine-readable; it is intended for humans. It is usable because it documents all (?) endpoints, describes what each endpoint does, and gives some examples.
* Not all functions come with examples; there are none for `segment`, `charText`, `charTextMinus`, `monologue`, and `sounds`.
* The return format is not specified; it might always be an HTML page, but we will see.
* The API might actually be hard to use because there is no way of retrieving the possible values of parameters, e.g. which character IDs can be put into `https://www.folgerdigitaltexts.org/Mac/parts/{character-id}.html`. I think "discovery endpoints" that would allow a user to retrieve these values are missing.

### Helper Functions
These are not functions provided by the Folger API. We need them for evaluating the API endpoint and handling the response.

In [14]:
def test_accept_header(url:str, mime:str) -> str:
    """Formats that can be used in the accept header.
    
    The function requests data from a URL by sending a GET request with an 
    explicitly set "Accept" Header. It returns the Content-Type
    from the response header. If the endpoint supports different Media-Types
    it should be the same as in the Accept header of the request.
    
    Args:
        url (str): Url to send a request to.
        mime (str): Format to test, e.g. "application/xml".
    
    Returns:
        str: Content-Type from the Response Header. 
    """
    headers = {"Accept" : mime}
    r = requests.get(url, headers=headers)
    
    return r.headers["Content-Type"]
    

In [15]:
def test_accept_header_formats(url:str) -> dict:
    """Test for different response formats.
    
    Tests for some standard mime-types: 
        "text/html", "text/csv", "text/plain", "application/xml", 
        "application/tei+xml", "application/json"
    
    Args:
        url (str): A request URL to test against the formats.
    
    Returns:
        dict: A report containing the mime-type and value True/False.
            True means it is actually returned when requested.
    """
    #some standard types that we could accept
    mime_types_to_test = ["text/html",
                          "text/csv", 
                          "text/plain",
                          "application/xml", 
                          "application/tei+xml",
                          "application/json"]
    report = {}
    for mime_type in mime_types_to_test:
        headers = {"Accept" : mime_type}
        r = requests.get(url,headers=headers)
        if r.status_code == 200:
            content_type = r.headers["Content-Type"].split(";")[0].strip()
            if content_type == mime_type:
                #set to True if the mime type is actually returned
                report[mime_type] = True
            else:
                report[mime_type] = False
    
    return report

In [16]:
#test this function:
test_accept_header_formats("https://www.folgerdigitaltexts.org/TNK/synopsis/")

{'text/html': True,
 'text/csv': False,
 'text/plain': False,
 'application/xml': False,
 'application/tei+xml': False,
 'application/json': False}

In [17]:
# Exceptions raised by "get"; defined here so that the error paths below work
class ConnectionErr(Exception):
    """Endpoint didn't return a 200 status code."""


class FormatErr(Exception):
    """Content-Type of the response is not the requested format."""


def get(url:str, accept:str="text/html") -> bytes:
    """Send a get request.
    
    Helper Function to send a GET request to the Folger API.
    
    Args:
        url (str): request url of the endpoint. All params should be included.
        accept (str, optional): Content-Type to request in the Accept Header.
            Defaults to "text/html".
    Returns:
        bytes: Content of the GET request
    
    Raises:
        ConnectionErr: Endpoint didn't return a 200 status code.
        FormatErr: Content-Type is not the requested format.
    """
    headers = {"Accept" : accept}
    r = requests.get(url, headers=headers)
    
    if r.status_code != 200:
        raise ConnectionErr("Server returned status code " + str(r.status_code) + ".")
    
    # compare only the media type, ignoring parameters such as the charset
    content_type = r.headers["Content-Type"].split(";")[0].strip()
    if content_type != accept:
        raise FormatErr("Response Body has a different Mime-Type: " + content_type + ".")
    
    return r.content
    

In [18]:
#test "get"
type(get("https://www.folgerdigitaltexts.org/TNK/synopsis/"))

bytes

### Functions to query Folger API Endpoints
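
The endpoint functions in this section raise exceptions such as `BadRequest`, `MissingPlaycode` or `MissingAct` that are not defined in the cells shown here. The following is a minimal sketch of how they could be declared, assuming they are meant to be plain exception classes; the common base class `FolgerRequestError` is a hypothetical name introduced only for this illustration.

```python
class FolgerRequestError(ValueError):
    """Hypothetical base class: a function was called with invalid arguments."""


class BadRequest(FolgerRequestError):
    """A mandatory argument was not supplied."""


class MissingPlaycode(FolgerRequestError):
    """The ID of the play was not supplied."""


class MissingAct(FolgerRequestError):
    """The ID of the act was not supplied."""


class MissingScene(FolgerRequestError):
    """The ID of the scene was not supplied."""


class MissingFtln(FolgerRequestError):
    """The Folger through line number was not supplied."""


class MissingWord(FolgerRequestError):
    """The ID of the word was not supplied."""


class MissingSegID(FolgerRequestError):
    """The ID of the segment was not supplied."""
```

With a shared base class, a caller could catch all argument errors at once with `except FolgerRequestError`.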

#### `synopsis`

Description: (+ act/scene, optionally) returns a synopsis of the play and its scenes.

Examples: 

* https://www.folgerdigitaltexts.org/TNK/synopsis/
* https://www.folgerdigitaltexts.org/TNK/synopsis/5/EPI

In [19]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/TNK/synopsis/")

The endpoint `synopsis` returns an HTML page only. We create a function to request the data, but do not parse it. Parsing would be helpful because it would make the data somewhat machine-readable, but it would require more testing of the actual results to arrive at a meaningful parsing of the HTML. This is something that could be implemented in an actual API wrapper.

The documentation of the endpoint declares the parameters for requesting the synopsis of an act and a scene to be optional (cf. the example queries above). We could define a function that takes this into account by declaring optional arguments (`synopsis` below; kept for reference only). However, we cannot document the endpoint this way in the OpenAPI Specification, because path parameters must always be required there and cannot be declared "optional" (see `required: false` in the docstring, which is ignored by `APISpec`). Therefore we have to split the endpoint into three separate endpoints: `synopsis_of_play`, `synopsis_of_act`, `synopsis_of_scene`.   

In [20]:
# "synopsis" as a function
#only for reference, as mentioned in the text above, we have to split it up into 3 functions.
#@api.route("/<path:playcode>/synopsis/<path:act>/<path:scene>", methods=["GET"])
def synopsis(playcode:str=None, act:str=None, scene:str=None) -> bytes:
    """synopsis
    
    Returns a synopsis of the play and its scenes.
    
    Args:
        playcode (str): ID of the play.
        act (str, optional): ID of the act
        scene (str, optional): ID of the scene
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        BadRequest: mandatory play code was not supplied.
    ---
    get:
        summary: synopsis
        description: Returns a synopsis of the play and its scenes.
        responses:
            200:
                description: successful.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
            -   in: path
                name: act
                description: ID of the act.
                schema:
                    type: string
                required: false
            -   in: path
                name: scene
                description: ID of the scene.
                schema:
                    type: string
                required: false        
    """
    if playcode is None:
        raise BadRequest("A playcode is mandatory.")
    
    #build the request url
    if act is None and scene is None:
        url = SERVICE_BASE + "/" + playcode + "/synopsis"
    elif scene is None:
        #can we use only act?
        url = SERVICE_BASE + "/" + playcode + "/synopsis/" + act
    elif act is not None:
        # act and scene as in the second example
        url = SERVICE_BASE + "/" + playcode + "/synopsis/" + act + "/" + scene 
    else:
        # a scene without an act cannot be mapped to a request url
        raise BadRequest("A scene can only be requested together with an act.")
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data        

In [21]:
#test the example: https://www.folgerdigitaltexts.org/TNK/synopsis/
playcode="TNK"
#instead of displaying the whole data here, we just check for the size
len(synopsis(playcode=playcode))

8038

In [22]:
#test the example: https://www.folgerdigitaltexts.org/TNK/synopsis/5/EPI
playcode = "TNK"
act="5"
scene="EPI"
len(synopsis(playcode=playcode, act=act, scene=scene))

149

In [23]:
@api.route("/<path:playcode>/synopsis", methods=["GET"])
def synopsis_of_play(playcode:str) -> bytes:
    """synopsis of play.
    
    Returns a synopsis of the play and its scenes.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        BadRequest: mandatory play code was not supplied.
    ---
    get:
        tags:
            - synopsis
        summary: synopsis of a play
        description: Returns a synopsis of the play and its scenes.
        operationId: get_synopsis_of_play
        responses:
            200:
                description: successful. HTML page with the synopsis.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    if playcode is None:
        raise BadRequest("A playcode is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/synopsis"
        
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data 

In [24]:
#synopsis of an act
@api.route("/<path:playcode>/synopsis/<path:act>", methods=["GET"])
def synopsis_of_act(playcode:str, act:str) -> bytes:
    """synopsis of an act.
    
    Returns a synopsis of an act and its scenes.
    
    Args:
        playcode (str): ID of the play.
        act (str): ID of the act.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: A playcode must be supplied.
        MissingAct: An act must be supplied.
    ---
    get:
        tags:
            - synopsis
        summary: synopsis of an act
        description: Returns a synopsis of an act and its scenes.
        operationId: get_synopsis_of_act
        responses:
            200:
                description: successful. HTML page with the synopsis.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: act
                description: ID of the act.
                schema:
                    type: string
                required: true      
    """
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if act is None:
        raise MissingAct("An act is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/synopsis/" + act
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [25]:
#synopsis of a scene
@api.route("/<path:playcode>/synopsis/<path:act>/<path:scene>", methods=["GET"])
def synopsis_of_scene(playcode:str, act:str, scene:str) -> bytes:
    """synopsis of a scene.
    
    Returns a synopsis of a scene.
    
    Args:
        playcode (str): ID of the play.
        act (str): ID of the act.
        scene (str): ID of the scene.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: A playcode must be supplied.
        MissingAct: An act must be supplied.
        MissingScene: A scene must be supplied.
    ---
    get:
        tags:
            - synopsis
        summary: synopsis of a scene
        description: Returns a synopsis of a scene.
        operationId: get_synopsis_of_scene
        responses:
            200:
                description: successful. HTML page with the synopsis.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: act
                description: ID of the act.
                schema:
                    type: string
                required: true
            -   in: path
                name: scene
                description: ID of the scene.
                schema:
                    type: string
                required: true
    """
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if act is None:
        raise MissingAct("An act is mandatory.")
        
    if scene is None:
        raise MissingScene("A scene is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/synopsis/" + act + "/" + scene 
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

##### parsing the response
Some notes on how the response (an HTML page) could be parsed into a more processable format:

Everything is in `<body>` of the HTML.
We have a title

```
<h2>Synopsis of<i> Two Noble Kinsmen</i>:</h2>
```

The data is in paragraphs `<p>`:
```
<p>Arcite prays to Mars for victory; Palamon, to Venus for Emilia’s love. Both prayers are answered. Arcite wins, but dies after a riding accident. Palamon, spared from execution, marries Emilia.</p>
```

`<hr>` is used as a divider between the synopsis of the play and those of the acts and scenes.

Act and scene numbers would have to be parsed from the plain text of each synopsis, using `:` as the divider.
```
<p>Act 1, scene 2: Two noble cousins, Palamon and Arcite, discuss leaving Thebes, where the reign of their despised uncle Creon has corrupted the state. News comes of Theseus’s advance on Thebes and, despite their hatred of Creon, they go to the city’s defense.</p>
```

There are some special abbreviations, e.g. `EPI` for "Epilogue":
https://www.folgerdigitaltexts.org/TNK/synopsis/5/EPI

TODO: A parser could be built for the first synopsis endpoint as an example.
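
The notes above can be turned into a first parser. The following is a minimal sketch, assuming the page always has the structure described (one `<h2>` title, `<p>` paragraphs, one `<hr>` before the scene synopses, and scene labels of the form "Act 1, scene 2"); `parse_synopsis` and its result keys are hypothetical names, and the sample is assembled from the fragments shown above with a placeholder for the play-level paragraph.

```python
import re
from html import unescape

# Assumption: scene paragraphs are labelled "Act <n>, scene <m>" before the first colon.
SCENE_LABEL = re.compile(r"^Act\s+(\w+),\s+scene\s+(\w+)$", re.IGNORECASE)


def _paragraphs(chunk: str) -> list:
    """Return the plain text of all <p> elements in an HTML fragment."""
    return [
        unescape(re.sub(r"<[^>]+>", "", p)).strip()
        for p in re.findall(r"<p>(.*?)</p>", chunk, re.DOTALL)
    ]


def parse_synopsis(html: str) -> dict:
    """Split a synopsis page into title, play synopsis and scene synopses."""
    title_match = re.search(r"<h2>(.*?)</h2>", html, re.DOTALL)
    title = ""
    if title_match:
        title = unescape(re.sub(r"<[^>]+>", "", title_match.group(1))).strip()
    # everything before the first <hr> describes the play, the rest its scenes
    parts = re.split(r"<hr\s*/?>", html, maxsplit=1)
    tail = parts[1] if len(parts) > 1 else ""
    scenes = []
    for paragraph in _paragraphs(tail):
        label, sep, text = paragraph.partition(":")
        match = SCENE_LABEL.match(label.strip()) if sep else None
        scenes.append({
            "act": match.group(1) if match else None,
            "scene": match.group(2) if match else None,
            # unlabelled paragraphs are kept unchanged
            "text": text.strip() if match else paragraph,
        })
    return {"title": title, "play": _paragraphs(parts[0]), "scenes": scenes}


sample = (
    "<body><h2>Synopsis of<i> Two Noble Kinsmen</i>:</h2>"
    "<p>Synopsis of the whole play.</p><hr>"  # placeholder paragraph
    "<p>Act 1, scene 2: Two noble cousins, Palamon and Arcite, discuss leaving Thebes.</p></body>"
)
result = parse_synopsis(sample)
print(result["scenes"][0])
```

Labels that do not match the assumed pattern (a prologue or epilogue, for instance) end up with `act` and `scene` set to `None`, so they would need separate handling after checking real responses.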

#### `tln` 

Description: (+ Folger through line number): returns the spoken text at that FTLN.

Examples:
* https://www.folgerdigitaltexts.org/WT/ftln/1201

Remarks: the URL of the endpoint actually uses `ftln`, so the name probably needs to be corrected in the original documentation.

In [26]:
#test for other response formats – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/ftln/1201")

Only `text/html` is available as a response format, but the HTML is structured (relevant for parsing the response):

```
<body>Winter’s Tale
<br>FTLN: 1201<br>Line: 3.1.27
<br> Speech: <a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1196">sp-1196</a>
<br>
 Speaker: #Dion_WT
 <br>Type: short
 <br>Text:  
 <a href="http://www.folgerdigitaltexts.org/WT/word/w0175960" title="w0175960">And</a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/c0175970" title="c0175970"> </a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/w0175980" title="w0175980">gracious</a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/c0175990" title="c0175990"> </a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/w0176000" title="w0176000">be</a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/c0176010" title="c0176010"> </a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/w0176020" title="w0176020">the</a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/c0176030" title="c0176030"> </a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/w0176040" title="w0176040">issue</a>
 <a href="http://www.folgerdigitaltexts.org/WT/word/p0176050" title="p0176050">.</a>
 </body>
```

It references the `segment` and the `word` endpoints. That's interesting, because I assumed that the endpoints were monolithic and separate. There might actually be a way to get the missing parameter values (see "Some general remarks on the documentation/api" above).
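
These links could be harvested to discover valid IDs. The following is a minimal sketch, assuming that all links follow the pattern `/{playcode}/{endpoint}/{id}` seen in the sample above; `extract_linked_ids` is a hypothetical helper and not part of the Folger API.

```python
import re

# Assumption: links look like http(s)://www.folgerdigitaltexts.org/<playcode>/<endpoint>/<id>
LINK_PATTERN = re.compile(
    r'href="https?://www\.folgerdigitaltexts\.org/'
    r'(?P<play>[^/"]+)/(?P<endpoint>segment|word|ftln)/(?P<id>[^/"]+)"'
)


def extract_linked_ids(html: str) -> dict:
    """Collect the IDs referenced in a response, grouped by endpoint name."""
    found = {}
    for match in LINK_PATTERN.finditer(html):
        ids = found.setdefault(match.group("endpoint"), [])
        if match.group("id") not in ids:  # keep first occurrence only
            ids.append(match.group("id"))
    return found


sample = (
    'Speech: <a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1196">sp-1196</a>'
    '<a href="http://www.folgerdigitaltexts.org/WT/word/w0175960" title="w0175960">And</a>'
    '<a href="http://www.folgerdigitaltexts.org/WT/word/c0175970" title="c0175970"> </a>'
)
print(extract_linked_ids(sample))
# {'segment': ['sp-1196'], 'word': ['w0175960', 'c0175970']}
```

Walking through all FTLNs of a play in this way would yield its segment and word IDs, at the cost of one request per line.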

In [27]:
@api.route("/<path:playcode>/ftln/<path:ftln>", methods=["GET"])
def ftln(playcode:str, ftln:str) -> bytes:
    """Text at FTLN (Folger through line number)
    
    returns the spoken text at that FTLN (Folger through line number)
    
    Args:
        playcode (str): ID of the play.
        ftln (str): Folger through line number.
    
    Returns:
        bytes: data returned by the endpoint.
    ---
    get:
        tags:
            - ftln
        summary: spoken text at that FTLN
        description: Returns the spoken text at that FTLN (Folger through line number).
        operationId: get_ftln
        responses:
            200:
                description: successful. HTML page with the text spoken at the ftln.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: ftln
                description: Folger through line number.
                schema:
                    type: string
                required: true
    """
    
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if ftln is None:
        raise MissingFtln("Folger through line number (ftln) is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/ftln/" + ftln
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data
    

In [28]:
#test the example https://www.folgerdigitaltexts.org/WT/ftln/1201
ftln("WT","1201")

b'Winter\xe2\x80\x99s Tale<br/>FTLN: 1201<br/>Line: 3.1.27<br/> Speech: <a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1196">sp-1196</a><br/>\n Speaker: #Dion_WT<br/>\n Type: short<br/>Text:  <a href="http://www.folgerdigitaltexts.org/WT/word/w0175960" title="w0175960">And</a><a href="http://www.folgerdigitaltexts.org/WT/word/c0175970" title="c0175970"> </a><a href="http://www.folgerdigitaltexts.org/WT/word/w0175980" title="w0175980">gracious</a><a href="http://www.folgerdigitaltexts.org/WT/word/c0175990" title="c0175990"> </a><a href="http://www.folgerdigitaltexts.org/WT/word/w0176000" title="w0176000">be</a><a href="http://www.folgerdigitaltexts.org/WT/word/c0176010" title="c0176010"> </a><a href="http://www.folgerdigitaltexts.org/WT/word/w0176020" title="w0176020">the</a><a href="http://www.folgerdigitaltexts.org/WT/word/c0176030" title="c0176030"> </a><a href="http://www.folgerdigitaltexts.org/WT/word/w0176040" title="w0176040">issue</a><a href="http://www.folgerdigitalte

##### Parsing of `ftln` response
TODO: see response; could be parsed into an object maybe, but define a marshmallow schema first!
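
As a starting point for this TODO, the `<br>`-separated "Label: value" lines can be read into a dictionary. This is a minimal sketch, assuming the layout of the raw response shown above; `parse_ftln` and the lower-cased keys are hypothetical choices, and a marshmallow schema could later be defined over the same keys.

```python
import re
from html import unescape


def parse_ftln(html: str) -> dict:
    """Turn the <br>-separated 'Label: value' lines of an ftln response into a dict."""
    chunks = re.split(r"<br\s*/?>", html)
    # the first chunk holds the title of the play
    record = {"play": unescape(re.sub(r"<[^>]+>", "", chunks[0])).strip()}
    for chunk in chunks[1:]:
        # remove the tags first, so that colons inside URLs do not interfere
        plain = unescape(re.sub(r"<[^>]+>", "", chunk))
        label, sep, value = plain.partition(":")
        if sep:
            record[label.strip().lower()] = value.strip()
    return record


# shortened copy of the response shown above
sample = (
    'Winter\u2019s Tale<br/>FTLN: 1201<br/>Line: 3.1.27<br/> Speech: '
    '<a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1196">sp-1196</a><br/>\n'
    ' Speaker: #Dion_WT<br/>\n Type: short<br/>Text:  '
    '<a href="http://www.folgerdigitaltexts.org/WT/word/w0175960" title="w0175960">And</a>'
    '<a href="http://www.folgerdigitaltexts.org/WT/word/c0175970" title="c0175970"> </a>'
    '<a href="http://www.folgerdigitaltexts.org/WT/word/w0175980" title="w0175980">gracious</a>'
)
record = parse_ftln(sample)
print(record["speaker"], record["text"])
```

The word IDs are lost in this flat representation; if they are needed, the `<a>` elements of the "Text" line would have to be kept as a list instead of being joined.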

#### `word`

Description: (+ word id) : returns information about that word.

Examples:
* https://www.folgerdigitaltexts.org/WT/word/w0176040
* https://www.folgerdigitaltexts.org/Ham/word/w0259380

In [29]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/Ham/word/w0259380")

Only `text/html` is returned.

```
<body>
Hamlet
<br>Word (w0259380): stallion
<br>Speech (#Hamlet_Ham): <a href="http://www.folgerdigitaltexts.org/Ham/segment/sp-1639">sp-1639</a>
<br>FTLN: <a href="http://www.folgerdigitaltexts.org/Ham/ftln/1680">ftln-1680</a>
<br>Line: 2.2.616
<br>Emendation: text from the Folio not found in the Second Quarto
<br>Alternate reading: scullion (#print #adobe)
<br>

<br>View in <a href="http://earlyprint.wustl.edu/tooleebospellingbrowserv2.html?requestFromClient={%221%22:{%22spe%22:%22%22,%22reg%22:%22stallion%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%222%22:{%22spe%22:%22%22,%22reg%22:%22%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%223%22:{%22spe%22:%22%22,%22reg%22:%22%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%22instructionToggle%22:%22hide%22,%22databaseType%22:%22unigrams%22,%22smoothing%22:%22True%22,%22rollingAverage%22:%2220_year%22}" target="_blank">EEBO-TCP N-Gram Browser</a> (Humanities Digital Workshop at Washington University in St. Louis)<br>

</body>
```

In [30]:
@api.route("/<path:playcode>/word/<path:word_id>", methods=["GET"])
def word(playcode:str, word_id:str) -> bytes:
    """Information about a word
    
    returns information about a word.
    
    Args:
        playcode (str): ID of the play.
        word_id (str): ID of the word.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingWord: ID of the word must be supplied.
    ---
    get:
        tags:
            - word
        summary: information about a word
        description: Returns information about a word.
        operationId: get_word
        responses:
            200:
                description: successful. HTML page with information about the word.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: word_id
                description: ID of the word.
                schema:
                    type: string
                required: true
    """
    
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if word_id is None:
        raise MissingWord("An ID of a word is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/word/" + word_id
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [31]:
#test the hamlet example:
#https://www.folgerdigitaltexts.org/Ham/word/w0259380
word("Ham", "w0259380")

b'Hamlet<br/>Word (w0259380): stallion<br/>Speech (#Hamlet_Ham): <a href="http://www.folgerdigitaltexts.org/Ham/segment/sp-1639">sp-1639</a>\n<br/>FTLN: <a href="http://www.folgerdigitaltexts.org/Ham/ftln/1680">ftln-1680</a><br/>\nLine: 2.2.616<br/>\nEmendation: text from the Folio not found in the Second Quarto<br/>\nAlternate reading: scullion (#print #adobe)<br/>\n<br/>View in <a href="http://earlyprint.wustl.edu/tooleebospellingbrowserv2.html?requestFromClient={%221%22:{%22spe%22:%22%22,%22reg%22:%22stallion%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%222%22:{%22spe%22:%22%22,%22reg%22:%22%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%223%22:{%22spe%22:%22%22,%22reg%22:%22%22,%22lem%22:%22%22,%22pos%22:%22%22,%22originalPos%22:%22%22},%22instructionToggle%22:%22hide%22,%22databaseType%22:%22unigrams%22,%22smoothing%22:%22True%22,%22rollingAverage%22:%2220_year%22}" target="_blank">EEBO-TCP N-Gram Browser</a> (Humanities Digital Workshop at Wash

##### Parsing `word` response
TODO: marshmallow schema of the object, + parser
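
A possible shape for such an object is sketched below. It uses a standard-library dataclass as a stand-in for the marshmallow schema and assumes the "Label (detail): value" layout of the response shown above; `WordInfo`, its field names and `parse_word` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from html import unescape


@dataclass
class WordInfo:
    """Hypothetical container for the fields seen in the sample response."""
    play: str
    word_id: str = ""
    word: str = ""
    speaker: str = ""
    speech_id: str = ""
    ftln: str = ""
    line: str = ""
    notes: dict = field(default_factory=dict)  # remaining "Label: value" pairs


# "Label (detail): value"; the parenthesised detail is optional
FIELD = re.compile(
    r"^(?P<label>[^(:]+?)\s*(?:\((?P<detail>[^)]*)\))?\s*:\s*(?P<value>.*)$", re.DOTALL
)


def parse_word(html: str) -> WordInfo:
    """Read the <br>-separated lines of a word response into a WordInfo."""
    chunks = [
        unescape(re.sub(r"<[^>]+>", "", chunk)).strip()
        for chunk in re.split(r"<br\s*/?>", html)
    ]
    info = WordInfo(play=chunks[0])
    for chunk in chunks[1:]:
        match = FIELD.match(chunk)
        if not match:
            continue  # empty chunks and lines without a colon
        label, detail = match.group("label"), match.group("detail") or ""
        value = match.group("value").strip()
        if label == "Word":
            info.word_id, info.word = detail, value
        elif label == "Speech":
            info.speaker, info.speech_id = detail, value
        elif label == "FTLN":
            info.ftln = value
        elif label == "Line":
            info.line = value
        else:
            info.notes[label] = value
    return info


# shortened copy of the response shown above
sample = (
    'Hamlet<br/>Word (w0259380): stallion<br/>Speech (#Hamlet_Ham): '
    '<a href="http://www.folgerdigitaltexts.org/Ham/segment/sp-1639">sp-1639</a>\n<br/>'
    'FTLN: <a href="http://www.folgerdigitaltexts.org/Ham/ftln/1680">ftln-1680</a><br/>\n'
    'Line: 2.2.616<br/>\nAlternate reading: scullion (#print #adobe)<br/>\n<br/>'
)
info = parse_word(sample)
print(info)
```

Labels that are not mapped to a dedicated field, such as "Emendation" or "Alternate reading", are collected in `notes`, because it is not clear from the examples which labels can occur.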

#### `segment`

Description: (+ object id) : returns the text of that xml:id.

Examples: None.

An example can be found in the response of the `word` example: http://www.folgerdigitaltexts.org/Ham/segment/sp-1639

In [32]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("http://www.folgerdigitaltexts.org/Ham/segment/sp-1639")

As expected, only HTML is returned. The response looks like this:

```
<body><span style="font-weight:bold">HAMLET</span> <br>
Ay, so, good-bye to you.<br>
 <span style="font-style:italic">Rosencrantz and Guildenstern exit.</span><br>
Now I am alone.<br>
O, what a rogue and peasant slave am I!<br>
Is it not monstrous that this player here,<br>
But in a fiction, in a dream of passion,<br>
Could force his soul so to his own conceit<br>
That from her working all his visage wanned,<br>
Tears in his eyes, distraction in his aspect,<br>
A broken voice, and his whole function suiting<br>
With forms to his conceit—and all for nothing!<br>
For Hecuba!<br>
<!-- ... -->
</body>
```

Presumably this is an HTML rendering of the `<tei:sp>`. It doesn't really make sense to parse it, because the meaningful representation would be the actual TEI, and such an up-conversion is neither necessary nor feasible. We could think of getting the plain text by stripping the HTML; maybe this would be a useful format.
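
Stripping the HTML could look like the following minimal sketch, which uses only the standard library and assumes that `<br>` is the only element that marks a line break; `PlainTextExtractor` and `html_to_plaintext` are hypothetical names.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects text content and turns <br> elements into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # also called for self-closing <br/>
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plaintext(html: str) -> str:
    """Return the text of an HTML fragment, one non-empty line per row."""
    extractor = PlainTextExtractor()
    extractor.feed(html)
    extractor.close()
    lines = "".join(extractor.parts).splitlines()
    return "\n".join(line.strip() for line in lines if line.strip())


# shortened copy of the response shown above
sample = (
    '<body><span style="font-weight:bold">HAMLET</span> <br>\n'
    'Ay, so, good-bye to you.<br>\n'
    ' <span style="font-style:italic">Rosencrantz and Guildenstern exit.</span><br>\n'
    'Now I am alone.<br>\n'
    '</body>'
)
print(html_to_plaintext(sample))
```

Note that the plain text no longer distinguishes the speaker label (bold) and the stage direction (italic) from the spoken lines, so this information is lost unless the `<span>` styles are evaluated as well.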

In [33]:
@api.route("/<path:playcode>/segment/<path:object_id>", methods=["GET"])
def segment(playcode:str, object_id:str) -> bytes:
    """Text of a segment
    
    returns the text of that xml:id.
    
    Args:
        playcode (str): ID of the play.
        object_id (str): ID of the segment.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingSegID: ID of the segment must be supplied.
    ---
    get:
        tags:
            - segment
        summary: text of a segment
        description: Returns the text of that xml:id.
        operationId: get_segment
        responses:
            200:
                description: successful. HTML page with the text of the segment.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: object_id
                description: ID of the segment.
                schema:
                    type: string
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if object_id == None:
        raise MissingSegID("An ID of a segment is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/segment/" + object_id
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [34]:
# testing example: "http://www.folgerdigitaltexts.org/Ham/segment/sp-1639"
len(segment("Ham","sp-1639"))

2900

#### `text`

Description: returns only the spoken text in that play.

Examples:
* https://www.folgerdigitaltexts.org/WT/text/

In [35]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/text/")

Returns only `text/html`. The result is an HTML page with the text in `<body>` and `<br>` line-break elements. Plaintext would be a useful format, but there may be additional tags (italics?) that need to be inspected more closely.
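To find out which tags actually occur in a response before deciding on a plaintext conversion, a small tag census could help. This is a sketch using the standard library only; `count_tags` is a hypothetical helper and the UTF-8 decoding is an assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Count every opening (or self-closing) tag in a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def count_tags(data: bytes) -> Counter:
    """Hypothetical helper: return a Counter of tag names in the response."""
    parser = _TagCounter()
    parser.feed(data.decode("utf-8"))
    parser.close()
    return parser.tags
```

Calling `count_tags(text("WT"))` would then show whether anything besides `<br>` (for example `<i>`) is present.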

In [36]:
@api.route("/<path:playcode>/text", methods=["GET"])
def text(playcode:str) -> bytes:
    """Spoken text in a play
    
    returns only the spoken text in that play.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - text
        summary: Spoken text in a play
        description: Returns only the spoken text in that play.
        operationId: get_text
        responses:
            200:
                description: successful. HTML page with the text.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/text"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [37]:
#test the example: https://www.folgerdigitaltexts.org/WT/text/
len(text("WT"))

149008

#### `charText`

Description: returns a list of characters arranged according to amount of lines spoken, with a link to each character's entire spoken text.

Examples: None.

In [38]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/charText")

An example that works: https://www.folgerdigitaltexts.org/WT/charText. Looking at the response, the links point to an endpoint that returns the text of a single character, e.g. 
https://folgerdigitaltexts.org/WT/charText/WT_Mopsa.html. The `.html` suffix is important; without it the request doesn't work. It apparently only returns HTML. 
The OpenAPI documentation will have two endpoints. The first one is more of a discovery endpoint: its response should be parsed to extract the speaking characters, which might be needed as parameter values for some other endpoints.

In [39]:
@api.route("/<path:playcode>/charText", methods=["GET"])
def char_text(playcode:str) -> bytes:
    """Character's Text
    
    returns a list of characters arranged according to amount of lines spoken, with a link to each character's entire spoken text.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - charText
        summary: Character's Text
        description: Returns a list of characters arranged according to amount of lines spoken, with a link to each character's entire spoken text.
        operationId: get_character_texts
        responses:
            200:
                description: successful. HTML page with list of characters arranged according to amount of lines spoken, with a link to each character's entire spoken text.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/charText"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [40]:
#test the example: https://www.folgerdigitaltexts.org/WT/charText
char_text("WT")

b'<html>\r\n<head>\r\n<meta charset=\'utf-8\'>\r\n</head>\r\n<body>\r\n<div style="float:left; width: 60px;"><b>Words</b></div><div style="float:left;"><b>Character</b></div><br/>\r\n<div style="float:left; width: 60px;">4903</div><div style="float:left;"><a href="WT_Leontes.html">Leontes</a></div><br/>\r\n<div style="float:left; width: 60px;">2395</div><div style="float:left;"><a href="WT_Autolycus.html">Autolycus</a></div><br/>\r\n<div style="float:left; width: 60px;">2390</div><div style="float:left;"><a href="WT_Paulina.html">Paulina</a></div><br/>\r\n<div style="float:left; width: 60px;">2094</div><div style="float:left;"><a href="WT_Camillo.html">Camillo</a></div><br/>\r\n<div style="float:left; width: 60px;">1960</div><div style="float:left;"><a href="WT_Polixenes.html">Polixenes</a></div><br/>\r\n<div style="float:left; width: 60px;">1597</div><div style="float:left;"><a href="WT_ShepherdsSon.html">ShepherdsSon</a></div><br/>\r\n<div style="float:left; width: 60px;">1584</div><

In [41]:
@api.route("/<path:playcode>/charText/<path:character_id>.html", methods=["GET"])
def char_text_by_character_id(playcode:str, character_id:str) -> bytes:
    """A single character's text
    
    returns the character's entire spoken text.
    
    Args:
        playcode (str): ID of the play.
        character_id (str): ID of the character.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingCharID: ID of the character must be supplied.
    ---
    get:
        tags:
            - charText
        summary: Single character's text
        description: Returns the character's entire spoken text.
        operationId: get_character_text_by_id
        responses:
            200:
                description: successful. HTML page with spoken text of a character.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: character_id
                description: ID of the character.
                schema:
                    type: string
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    if character_id == None:
        raise MissingCharID("An ID of a character is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/charText/" + character_id + ".html"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [42]:
#test the example https://folgerdigitaltexts.org/WT/charText/WT_Mopsa.html
#the amount is not correct when simply counting! but it still works
len(char_text_by_character_id("WT", "WT_Mopsa"))

901

TODO: The functions work here in the notebook but not in the Swagger Interface. This might be due to a redirect or something. Have to check in Postman.

##### Parsing `charText`
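
A possible way to parse the discovery response, sketched under the assumption that every row looks like the ones in the output of `char_text("WT")` above (a `<div>` with a number followed by a `<div>` with a link). The function name `parse_char_text` is hypothetical. Note that the column header in the response reads "Words", so the number is stored under the neutral key `count`.

```python
import re

# One row: <div ...>4903</div><div ...><a href="WT_Leontes.html">Leontes</a></div>
_CHAR_ROW = re.compile(
    r'<div[^>]*>(\d+)</div>\s*<div[^>]*><a href="([^"]+)\.html">([^<]*)</a></div>'
)


def parse_char_text(data: bytes) -> list:
    """Hypothetical helper: extract count, character ID and label per row."""
    html = data.decode("utf-8")
    return [
        {"count": int(count), "character_id": char_id, "label": label}
        for count, char_id, label in _CHAR_ROW.findall(html)
    ]
```

The `character_id` values (e.g. `WT_Leontes`) are what `char_text_by_character_id` expects as its second argument. The header row is skipped automatically because it contains no number.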

#### `charTextMinus`

Description: returns a list of characters arranged according to amount of lines spoken, with a link to the play's spoken text, minus this character.

Examples: None.

I don't really get this function. Not very well documented. Skip for now. Maybe also add to the "charText" Tag. 

It seems to return the same list of characters as its sister function `charText`, but the links point to a different endpoint, e.g. https://folgerdigitaltexts.org/WT/charTextMinus/WT_Mopsa.html. The text returned by `charTextMinus` is much longer than the text returned by `charText`, so I suppose it is the whole spoken text of the play except the text spoken by the character Mopsa. This is also what the original documentation indicates.
This is a very specific function. What would be the idea behind it? To easily compare a character's text to all other text?
Anyway, we implement the same endpoints as above, just replacing the function part in the URL. The same parsing could probably be used.

In [43]:
@api.route("/<path:playcode>/charTextMinus", methods=["GET"])
def char_text_minus(playcode:str) -> bytes:
    """charTextMinus
    
    returns a list of characters arranged according to amount of lines spoken, with a link to the play's spoken text, minus this character.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - charText
        summary: Characters with links to the play's text minus each character
        description: Returns a list of characters arranged according to amount of lines spoken, with a link to the play's spoken text, minus this character.
        operationId: get_character_texts_minus
        responses:
            200:
                description: successful. HTML page with list of characters arranged according to amount of lines spoken, with a link to the play's spoken text, minus this character.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/charTextMinus"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [44]:
@api.route("/<path:playcode>/charTextMinus/<path:character_id>.html", methods=["GET"])
def char_text_minus_by_character_id(playcode:str, character_id:str) -> bytes:
    """All other characters' texts (except this character)
    
    returns all of the play's spoken text, except the text of the indicated character.
    
    Args:
        playcode (str): ID of the play.
        character_id (str): ID of the character.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingCharID: ID of the character must be supplied.
    ---
    get:
        tags:
            - charText
        summary: Play's text minus a single character
        description: Returns the play's spoken text, except the text of the indicated character.
        operationId: get_all_character_text_minus_character
        responses:
            200:
                description: successful. HTML page with the play's text without the text of the character.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: character_id
                description: ID of the character.
                schema:
                    type: string
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    if character_id == None:
        raise MissingCharID("An ID of a character is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/charTextMinus/" + character_id + ".html"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

#### `concordance`

Description: lists the words used (in spoken text) and their frequency.

Examples:
* https://www.folgerdigitaltexts.org/WT/concordance/

In [45]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/concordance")

Returns only `text/html`, as always. The result is a list of counts and types, e.g. `756: The`, ordered by frequency. We don't really know where the types come from or whether they are preprocessed in some way. They are probably types of surface tokens (not lemmas), because inflected forms occur, e.g. `5: Killed`.

```
<h2>Concordance of <i>Winter’s Tale</i>:</h2>756: The<br/>
628: I<br/>
623: And<br/>
620: To<br/>
470: Of<br/>
449: You<br/>
404: My<br/>
402: A<br/>
319: That<br/>
305: Not<br/>
<!-- ... -->
1: ’shrew<br/>
1: ’twill<br/>
1: ’twould<br/>
</body>
```

A `.csv` or `.tsv` would be a good format to represent it.

In [46]:
@api.route("/<path:playcode>/concordance", methods=["GET"])
def concordance(playcode:str) -> bytes:
    """Concordance
    
    lists the words used (in spoken text) and their frequency.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - concordance
        summary: concordance
        description: Lists the words used (in spoken text) and their frequency.
        operationId: get_concordance
        responses:
            200:
                description: successful. HTML page with token-count and type.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/concordance"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [47]:
#test the example: https://www.folgerdigitaltexts.org/WT/concordance
len(concordance("WT")) # this is the length in bytes, not the number of entries!

60585

##### parsing `concordance`
TODO: This would be an easy endpoint to parse: use a regex to split after a number and a `:`
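
A sketch of how this TODO could be approached, assuming every entry has the shape `count: word<br/>` as in the excerpt above. The names `parse_concordance` and `concordance_to_tsv` are hypothetical; the TSV output follows the format suggestion made earlier in this section.

```python
import csv
import io
import re

# One entry: "756: The<br/>" -> count, then a colon, then the word form
_ENTRY = re.compile(r"(\d+):\s*([^<]+?)\s*<br\s*/?>")


def parse_concordance(data: bytes) -> list:
    """Hypothetical helper: return (word, count) tuples in response order."""
    html = data.decode("utf-8")
    return [(word, int(count)) for count, word in _ENTRY.findall(html)]


def concordance_to_tsv(entries) -> str:
    """Serialize (word, count) tuples as tab-separated values with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(entries)
    return buffer.getvalue()
```

With this, `len(parse_concordance(concordance("WT")))` would give the number of entries rather than the number of bytes.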

#### `monologue`

Description:  (+ optional line count): provides a list of speeches longer than the given line count (defaults to 30 lines).

Examples: None.

Two examples, that can be easily constructed:
* https://www.folgerdigitaltexts.org/WT/monologue
* https://www.folgerdigitaltexts.org/WT/monologue/10

The optional parameter `line_count` is a path parameter as well, resulting in two endpoints in the OpenAPI Documentation. 

In the (HTML) response there is a link to the `segment` endpoint pointing to the segment that is considered a monologue. 

```
Hermione (33): <a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1224">Since what I am to say must be but that...</a><br/>
```

It should be fairly easy to parse. 
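
A sketch of such a parser, assuming each line of the response follows the pattern of the example above (label, line count in parentheses, link to the `segment` endpoint, beginning of the speech). `parse_monologues` is a hypothetical name.

```python
import re

# Label (line count): <a href=".../segment/<segment id>">incipit</a>
_MONOLOGUE = re.compile(
    r'([^<>\n]+?)\s*\((\d+)\):\s*<a href="[^"]*/segment/([^"]+)">([^<]*)</a>'
)


def parse_monologues(data: bytes) -> list:
    """Hypothetical helper: one dict per speech listed in the response."""
    html = data.decode("utf-8")
    return [
        {
            "label": label.strip(),       # character label, not an ID
            "lines": int(lines),          # length of the speech in lines
            "segment_id": segment_id,     # usable with segment(playcode, ...)
            "incipit": incipit,           # opening words shown in the link
        }
        for label, lines, segment_id, incipit in _MONOLOGUE.findall(html)
    ]
```

The `segment_id` can be passed on to `segment(playcode, segment_id)` to retrieve the full speech.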


A challenge could be linking the segment to the character, because the ID of the character is not included. There is only a label of the character, so the ID can only be guessed. For the example above it should be something like `WT_Hermione`, but it is not always that simple:

```
Shepherd’s Son (15): <a href="http://www.folgerdigitaltexts.org/WT/segment/sp-1563">I would you did but see how it chafes,...</a><br/>
```

One option is to look at the output of a different endpoint, e.g. in the `charText` output we can find the following character:

```
<div style="float:left; width: 60px;">1597</div><div style="float:left;"><a href="WT_ShepherdsSon.html">ShepherdsSon</a></div><br/>
```

So the ID of the character would be `WT_ShepherdsSon`. 

Another option would be to follow the included link and see if the ID of the character can be found there. The answer is no: https://folgerdigitaltexts.org/WT/segment/sp-1563 displays the speech but doesn't include an ID. This really complicates using the endpoints programmatically.
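
The lookup via the `charText` output could be sketched as follows. This is a heuristic, not something the API documents: it assumes that an ID is the play code, an underscore, and the label with spaces and punctuation removed, as the pair `Shepherd’s Son` / `WT_ShepherdsSon` suggests. All function names are hypothetical.

```python
def normalize_label(label: str) -> str:
    """Reduce a label to its letters and digits, ignoring case."""
    return "".join(ch for ch in label if ch.isalnum()).casefold()


def build_id_index(character_ids) -> dict:
    """Map normalized names to IDs such as 'WT_ShepherdsSon'."""
    index = {}
    for char_id in character_ids:
        # Drop the play code prefix if there is one ("WT_").
        name = char_id.partition("_")[2] or char_id
        index[normalize_label(name)] = char_id
    return index


def resolve_character_id(label: str, index: dict):
    """Return the matching character ID, or None if the guess fails."""
    return index.get(normalize_label(label))
```

Labels that cannot be matched return `None`, so ambiguous or unexpected cases stay visible instead of being guessed silently.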

Another note on this monologue endpoint: here we have a function that operationalizes a concept of literary studies, the "monologue", which is approximated by the number of text lines.


In [48]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/monologue")

In [49]:
# a single function, that can be used by the two endpoints?
@api.route("/<path:playcode>/monologue", methods=["GET"])
def monologue(playcode:str, line_count:int=30) -> bytes:
    """Monologue
    
    provides a list of speeches longer than the given line count (defaults to 30 lines).
    
    Args:
        playcode (str): ID of the play.
        line_count (int, optional): Number of lines. Defaults to 30.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - monologue
        summary: monologue
        description: list of speeches longer than 30 lines.
        operationId: get_monolouges_by_30_lines
        responses:
            200:
                description: successful. HTML page with segments considered a monologue.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    #build the request url
    #handle also an individual line count
    if line_count == 30:
        url = SERVICE_BASE + "/" + playcode + "/monologue"
    else:
        url = SERVICE_BASE + "/" + playcode + "/monologue/" + str(line_count)
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [50]:
@api.route("/<path:playcode>/monologue/<path:line_count>", methods=["GET"])
def monologue_by_line_count(playcode:str, line_count:int=30) -> bytes:
    """Monologue
    
    provides a list of speeches longer than the given line count (defaults to 30 lines).
    This function does the same as ``monologue`` and is only necessary 
    to have two endpoints. 
    
    Args:
        playcode (str): ID of the play.
        line_count (int, optional): Number of lines. Defaults to 30.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - monologue
        summary: monologue by line count
        description: list of speeches longer than a given line count.
        operationId: get_monolouges_by_line_count
        responses:
            200:
                description: successful. HTML page with segments considered a monologue.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: line_count
                description: Minimum number of lines.
                schema:
                    type: integer
                required: true
                example: 30
    """
    #get data from the function monologue
    data = monologue(playcode, line_count)
    
    return data

#### `onStage`

Description: (+ ftln): returns a list of characters on stage at that line 

Examples: 
* https://www.folgerdigitaltexts.org/WT/onStage/1196

In [51]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/onStage/1196")

Responses are `text/html` only and contain the names of the characters (no IDs, only the labels):

```
<body>
<h2>Characters on stage at<br>line 1196 of <i>Winter’s Tale</i>:</h2>
Cleomenes<br>
Dion<br>
</body>
```

It might be very easy to parse, but because the response lacks any IDs, it is quite difficult to come up with a meaningful format.
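
A sketch of the parsing step, assuming the response consists of one `<h2>` heading followed by character labels separated by `<br>` or `<br/>`. `parse_on_stage` is a hypothetical name, and the result is only a list of labels because nothing else is available.

```python
import re


def parse_on_stage(data: bytes) -> list:
    """Hypothetical helper: return the character labels listed in the response."""
    html = data.decode("utf-8")
    # Drop the heading first; it contains a <br> of its own.
    body = re.sub(r"<h2>.*?</h2>", "", html, flags=re.DOTALL)
    # Split on line breaks, then remove any remaining tags such as <body>.
    parts = re.split(r"<br\s*/?>", body)
    names = [re.sub(r"<[^>]+>", "", part).strip() for part in parts]
    return [name for name in names if name]
```

For the example request this would yield the two labels `Cleomenes` and `Dion`; mapping them to character IDs would still require a lookup against another endpoint.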

In [52]:
@api.route("/<path:playcode>/onStage/<path:ftln>", methods=["GET"])
def on_stage(playcode:str, ftln:str) -> bytes:
    """Characters on stage
    
    returns a list of characters on stage at a line identified by a Folger through line number.
    
    Args:
        playcode (str): ID of the play.
        ftln (str): Folger through line number (ftln).
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingLineNo: Folger through line number must be supplied.
    ---
    get:
        tags:
            - onStage
        summary: onStage
        description: list of characters on stage at a line identified by a Folger through line number (ftln).
        operationId: get_on_stage
        responses:
            200:
                description: successful. HTML page with character names.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: ftln
                description: Folger through line number.
                schema:
                    type: string
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    if ftln == None:
        raise MissingLineNo("A line number (ftln) is mandatory.")
    
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/onStage/" + ftln
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [53]:
#test example: https://www.folgerdigitaltexts.org/WT/onStage/1196
on_stage("WT", "1196")

b'<h2>Characters on stage at<br/>line 1196 of <i>Winter\xe2\x80\x99s Tale</i>:</h2>Cleomenes<br/>Dion<br/>'

#### `charChart`

Description: provides a graphical representation of who is on stage across a timeline of the play.

Example:
* https://www.folgerdigitaltexts.org/WT/charChart/

This endpoint returns a visualization (in HTML format). It is intended for humans, so parsing it probably doesn't make sense.

In [54]:
@api.route("/<path:playcode>/charChart", methods=["GET"])
def char_chart(playcode:str) -> bytes:
    """Timeline chart of characters
    
    provides a graphical representation of who is on stage across a timeline of the play.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - charChart
        summary: character chart
        description: provides a graphical representation of who is on stage across a timeline of the play.
        operationId: get_characters_chart
        responses:
            200:
                description: successful. HTML page with a graphical representation of who is on stage across a timeline.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/charChart"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

#### `parts`

Description: provides parts or cue scripts for each character.

Examples:
* https://www.folgerdigitaltexts.org/WT/parts/
* https://www.folgerdigitaltexts.org/WT/parts/Dion.html
* https://www.folgerdigitaltexts.org/WT/parts/Bear.html
* https://www.folgerdigitaltexts.org/Mac/parts/Porter.html

In [55]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/WT/parts/")

Returns HTML only. There will be two endpoints in the OpenAPI documentation. The first endpoint returns a list of links (`<a>` elements), e.g. `<a href="Antigonus.html">Antigonus</a><br/>`, to the second endpoint. "Cue scripts" are something very specific; we should look into the TEI to see whether this is marked up somehow. Maybe the endpoint only returns the line before a character speaks or acts. To be clarified!
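
The list of links could be turned into parameter values for the second endpoint roughly like this. The sketch assumes that every link has the form `<a href="Name.html">Label</a>`; `parse_character_links` is a hypothetical helper built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from all <a> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []     # finished (href, label) pairs
        self._href = None   # href of the <a> that is currently open
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_character_links(data: bytes) -> list:
    """Hypothetical helper: return the character part of each link and its label."""
    parser = _LinkCollector()
    parser.feed(data.decode("utf-8"))
    parser.close()
    result = []
    for href, label in parser.links:
        # "Antigonus.html" -> "Antigonus"
        name = href[: -len(".html")] if href.endswith(".html") else href
        result.append({"character": name, "label": label})
    return result
```

Each `character` value can then be used as the second path parameter, e.g. in `https://www.folgerdigitaltexts.org/WT/parts/Dion.html`.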

In [56]:
@api.route("/<path:playcode>/parts", methods=["GET"])
def parts(playcode:str) -> bytes:
    """parts or cue scripts
    
    provides parts or cue scripts for each character.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - parts
        summary: parts for each character
        description: provides parts or cue scripts for each character.
        operationId: get_list_of_character_parts
        responses:
            200:
                description: successful. HTML page with links to cue scripts for the characters of a play.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/parts"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [57]:
@api.route("/<path:playcode>/parts/<path:character>.html", methods=["GET"])
def parts_of_character(playcode:str, character:str) -> bytes:
    """parts or cue scripts for a character
    
    provides parts or cue scripts for a single character.
    
    Args:
        playcode (str): ID of the play.
        character (str): Identifier of the character.
            Could also be the name. It is not {playcode}_{character}, but only the label.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingCharacter: A character must be supplied.
    ---
    get:
        tags:
            - parts
        summary: parts for character
        description: provides parts or cue scripts for a single character.
        operationId: get_parts_of_character
        responses:
            200:
                description: successful. HTML page with the parts or cue script of a single character of a play.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: character
                description: Name (?) of a character. Not in the ID-format {playcode}_{character}, but only the label.
                schema:
                    type: string
                required: true
    """
    
    if playcode == None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if character == None:
        raise MissingCharacter("A character is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/parts/" + character + ".html"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In the Swagger Editor, requests to this endpoint fail with a connection-like error, "Failed to fetch." The same happens for `charChart` and the `charText`-like endpoints. It often seems to involve characters, and the `.html` ending is also suspicious. This might have something to do with the Swagger Editor at https://editor.swagger.io/ itself; this must be tested in a different setting.
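
One thing that could be checked outside the Swagger Editor: a browser-based client can only read a response from another origin if the server sends a matching `Access-Control-Allow-Origin` header, and a missing header typically surfaces as "Failed to fetch". Whether this is the cause here is an assumption, not something verified. The sketch below reports redirects and the CORS header for a URL; `cors_allows` and `inspect_endpoint` are hypothetical helpers.

```python
def cors_allows(headers, origin: str) -> bool:
    """Return True if the response headers permit reads from ``origin``."""
    for name, value in headers.items():
        if name.lower() == "access-control-allow-origin":
            return value.strip() in ("*", origin)
    return False


def inspect_endpoint(url: str, origin: str = "https://editor.swagger.io") -> dict:
    """Hypothetical helper: fetch ``url`` and report redirects and CORS status."""
    import requests  # imported here so cors_allows() works without it

    response = requests.get(url, headers={"Origin": origin}, timeout=30)
    return {
        "status": response.status_code,
        "final_url": response.url,
        # response.history lists the intermediate redirect responses, if any
        "redirects": [r.status_code for r in response.history],
        "cors_ok": cors_allows(response.headers, origin),
    }
```

Comparing the result for an endpoint that works in the Swagger Editor with one that fails (e.g. a `.html` URL) might show whether redirects or missing headers make the difference.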

In [58]:
# It works here:
#parts_of_character("Ham", "Hamlet")

#### `witScript`

Description: provides "witScripts" for each character. "Witness" or "Witmore" scripts attempt to show what a character sees. They offer the play text only when that character is on stage.

Examples:
* https://www.folgerdigitaltexts.org/Ham/witScript/
* https://www.folgerdigitaltexts.org/Ham/witScript/Polonius.html

In [59]:
#Check for supported mime-types: – uncomment to run, this slows down the notebook when running all cells
#test_accept_header_formats("https://www.folgerdigitaltexts.org/Ham/witScript/")

The endpoint returns HTML only. The first endpoint offers a list of links to the witness scripts, as in the excerpt below; the second endpoint returns the witness script for a given character.

```
<body>
<a href="AMBASSADORS.html">AMBASSADORS</a><br/>
<!-- ...-->
</body>
```

The response would be easy to parse; in general it is similar to the endpoint `parts`. This is a very specific and research-driven functionality.

Same problem with the Swagger Editor.

In [60]:
@api.route("/<path:playcode>/witScript", methods=["GET"])
def wit_script(playcode:str) -> bytes:
    """Witness scripts
    
    provides links to "witScripts" for each character. "Witness" or "Witmore" scripts attempt to show what a character sees. 
    They offer the play text only when that character is on stage.
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - witScript
        summary: witness scripts for each character
        description: provides links to witness scripts for each character.
        operationId: get_list_of_witness_scripts
        responses:
            200:
                description: successful. HTML page with links to witness scripts for the characters of a play.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/witScript"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [61]:
@api.route("/<path:playcode>/witScript/<path:character>.html", methods=["GET"])
def wit_script_of_character(playcode:str, character:str) -> bytes:
    """Witness script for a character
    
    provides a "Witness" or "Witmore" script for a given character. "witScripts" attempt to show what a character sees. 
    They offer the play text only when that character is on stage.
    
    Args:
        playcode (str): ID of the play.
        character (str): Identifier of the character.
            Could also be the name. It is not {playcode}_{character}, but only the label.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
        MissingCharacter: A character must be supplied.
    ---
    get:
        tags:
            - witScript
        summary: witness script for a character
        description: provides a "Witness" or "Witmore" script for a given character. "witScripts" attempt to show what a character sees. 
            They offer the play text only when that character is on stage.
        operationId: get_witness_script_of_character
        responses:
            200:
                description: successful. HTML page with the witness script for a single character of a play.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
            -   in: path
                name: character
                description: Name (?) of a character. Not in the ID-format {playcode}_{character}, but only the label.
                schema:
                    type: string
                required: true
    """
    
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
    
    if character is None:
        raise MissingCharacter("A character is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/witScript/" + character + ".html"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [62]:
#testing example: https://www.folgerdigitaltexts.org/Ham/witScript
wit_script("Ham")

b'<html>\r\n<head>\r\n<meta charset=\'utf-8\'>\r\n</head>\r\n<body>\r\n<a href="AMBASSADORS.html">AMBASSADORS</a><br/>\r\n<a href="ATTENDANTS.html">ATTENDANTS</a><br/>\r\n&nbsp;&nbsp;<a href="ATTENDANTS.1.html">ATTENDANTS.1</a><br/>\r\n&nbsp;&nbsp;<a href="ATTENDANTS.2.html">ATTENDANTS.2</a><br/>\r\n&nbsp;&nbsp;<a href="ATTENDANTS.GENTLEMEN.html">ATTENDANTS.GENTLEMEN</a><br/>\r\n<a href="ATTENDANTS.GUARDS.html">ATTENDANTS.GUARDS</a><br/>\r\n<a href="Barnardo.html">Barnardo</a><br/>\r\n<a href="Claudius.html">Claudius</a><br/>\r\n<a href="Cornelius.html">Cornelius</a><br/>\r\n<a href="Doctor.html">Doctor</a><br/>\r\n<a href="FOLLOWERS.LAERTES.html">FOLLOWERS.LAERTES</a><br/>\r\n<a href="Fortinbras.html">Fortinbras</a><br/>\r\n<a href="Francisco.html">Francisco</a><br/>\r\n<a href="Gertrude.html">Gertrude</a><br/>\r\n<a href="Ghost.html">Ghost</a><br/>\r\n<a href="Gravedigger.html">Gravedigger</a><br/>\r\n<a href="GravediggersCompanion.html">GravediggersCompanion</a><br/>\r\n<a href="Gui

In [63]:
#testing https://www.folgerdigitaltexts.org/Ham/witScript/Polonius.html:
len(wit_script_of_character("Ham","Polonius"))

87719

#### `sounds`

Description: returns a list of all stage directions that contain sounds (i.e., "music," "flourish," "thunder").

Examples: None.

We use this example for testing: https://www.folgerdigitaltexts.org/Ham/sounds

In [64]:
# Check for supported MIME types. Uncomment to run; this slows down the notebook when running all cells.
#test_accept_header_formats("https://www.folgerdigitaltexts.org/Ham/sounds")

The endpoint returns HTML only. 

```
<h2>Sounds in <i>Hamlet</i>:</h2>SD 1.2.0: Flourish (flourish): 
Flourish
.
<br/>SD 1.2.132.1: Flourish (flourish): 
Flourish
.
<br/>
<!-- .. -->
```

`SD 1.2.132.1` apparently references a stage direction, but the response contains no link that would allow a client to retrieve it. The API also offers no dedicated stage direction endpoint.
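
Since the response carries no markup around the individual entries, a client has to recover them from the text pattern. The following minimal sketch does this with a regular expression. The function name `parse_sounds` and the field names `ref`, `category` and `keywords` are hypothetical; what the word before the parentheses and the words inside them actually mean is an assumption based on the sample response only.

```python
import re

# One entry looks like: "SD 1.2.132.1: Flourish (flourish): \nFlourish\n.\n<br/>"
ENTRY = re.compile(
    r"SD (?P<ref>[\d.]+): (?P<category>[^(]+?) \((?P<keywords>[^)]*)\): "
    r"(?P<text>.*?)(?=<br/>|\Z)",
    re.DOTALL,
)


def parse_sounds(html: bytes) -> list:
    """Turn the HTML of the sounds endpoint into a list of dictionaries."""
    entries = []
    for match in ENTRY.finditer(html.decode("utf-8")):
        # Tokens are separated by line breaks; dropping them restores the sentence.
        text = re.sub(r"[\r\n]+", "", match["text"]).strip()
        entries.append({
            "ref": match["ref"],
            "category": match["category"],
            "keywords": match["keywords"].split(),
            "text": text,
        })
    return entries


sample = (
    b"<h2>Sounds in <i>Hamlet</i>:</h2>SD 1.2.0: Flourish (flourish): \nFlourish\n.\n"
    b"<br/>SD 1.4.7.1: Flourish (trumpet ordnance): \nA\n \nflourish\n \nof\n \ntrumpets"
    b"\n \nand\n \ntwo\n \npieces\n \ngoes\n \noff\n.\n<br/>"
)
entries = parse_sounds(sample)
print(entries[1]["text"])
```

With the notebook's own function, the call would be `parse_sounds(sounds("Ham"))`.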

In [65]:
@api.route("/<path:playcode>/sounds", methods=["GET"])
def sounds(playcode:str) -> bytes:
    """Sounds
    
    returns a list of all stage directions that contain sounds (i.e., "music," "flourish," "thunder").
    
    Args:
        playcode (str): ID of the play.
    
    Returns:
        bytes: data returned by the endpoint.
    
    Raises:
        MissingPlaycode: ID of the play must be supplied.
    ---
    get:
        tags:
            - sounds
        summary: sounds
        description: returns a list of all stage directions that contain sounds (i.e., "music," "flourish," "thunder").
        operationId: get_sounds
        responses:
            200:
                description: successful. HTML page with stage directions that contain sounds.
                content:
                    text/html:
                        schema:
                            type: string
        parameters:
            -   in: path
                name: playcode
                description: ID of the play.
                schema: play_codes_schema
                required: true
    """
    
    if playcode is None:
        raise MissingPlaycode("A playcode is mandatory.")
        
    #build the request url
    url = SERVICE_BASE + "/" + playcode + "/sounds"
    
    # call function to send the actual request using the get helper function
    data = get(url, accept="text/html")
    
    return data

In [66]:
#testing the example
sounds("Ham")

b'<h2>Sounds in <i>Hamlet</i>:</h2>SD 1.2.0: Flourish (flourish): \nFlourish\n.\n<br/>SD 1.2.132.1: Flourish (flourish): \nFlourish\n.\n<br/>SD 1.4.7.1: Flourish (trumpet ordnance): \nA\n \nflourish\n \nof\n \ntrumpets\n \nand\n \ntwo\n \npieces\n \ngoes\n \noff\n.\n<br/>SD 2.2.0: Flourish (flourish): \nFlourish\n.\n<br/>SD 2.2.391.1: Flourish (flourish): \nA\n \nflourish\n \nfor\n \nthe\n \nPlayers\n.\n<br/>SD 3.2.95.1: Flourish (flourish): \nSound\n \na\n \nflourish\n.\n<br/>SD 3.2.97.1: Flourish (trumpet drum): \nEnter\n \nTrumpets\n \nand\n \nKettle\n \nDrums\n.\n<br/>SD 3.2.144.1: Flourish (trumpet): \nThe\n \ntrumpets\n \nsounds\n.\n<br/>SD 5.2.238.1: Flourish (trumpet drum): \nTrumpets\n,\n \nDrums\n<br/>SD 5.2.298.1: Flourish (trumpet): \nTrumpets\n \nthe\n \nwhile\n.\n<br/>SD 5.2.301.2: Flourish (drum trumpet ordnance): \nDrum\n,\n \ntrumpets\n,\n \nand\n \nshot\n.\n<br/>SD 5.2.384.1: Military (march drum ordnance): \nA\n \nmarch\n \nafar\n \noff\n \nand\n \nshot\n \nwithin\n.

## Setting up the OpenAPI Documentation

In [67]:
#Where does this show up?
INFO = dict(
        description="""
        This is an unofficial documentation of the API of the Folger Shakespeare Project.
        The official documentation can be found here: https://www.folgerdigitaltexts.org/api.
        """.strip() ,
        version="1.0",
        contact=dict(
            name="Folger Shakespeare Project",
            email="info@folger.edu"
            ), 
        license=dict(
            name="License Unknown",
            url='None'
            )
        )

In [68]:
#Description of the Servers
SERVERS = [
        dict(
            description="Folger API",
            url=SERVICE_BASE
            )
        ]

In [69]:
# Tags
# Because some of the endpoints are split up due to path-variables,
# we can define tags to bundle them in the documentation.
TAGS = [
    dict(
        name="synopsis",
        description="synopsis of the play and its scenes"
    ),
    dict(
        name="ftln",
        description="text at Folger through line number"
    ),
    dict(
        name="word",
        description="information about a word"
    ),
    dict(
        name="segment",
        description="text of a segment identified by xml:id"
    ),
    dict(
        name="text",
        description="spoken text in a play"
    ),
    dict(
        name="charText",
        description="character's text"
    ),
    dict(
        name="concordance",
        description="words used (in spoken text) and their frequency"
    ),
    dict(
        name="monologue",
        description="speeches longer than the given line count"
    ),
    dict(
        name="onStage",
        description="characters on stage"
    ),
    dict(
        name="charChart",
        description="character chart"
    ),
    dict(
        name="parts",
        description="parts/cue scripts"
    ),
    dict(
        name="witScript",
        description="witness scripts"
    ),
    dict(
        name="sounds",
        description="sounds"
    )
]

In [70]:
#Generate the API Specification
spec = APISpec(
    title="Folger Shakespeare API Tools",
    version="1.0",
    openapi_version="3.0.3",
    info = INFO,
    servers = SERVERS,
    externalDocs=dict(
            description="OpenAPI Documentation",
            url="https://github.com/ingoboerner/folger-shakespeare-openapi"
        ),
    tags = TAGS,
    plugins=[FlaskPlugin(), MarshmallowPlugin()]
)

In [71]:
with api.test_request_context():
    #spec.path(view=synopsis) #this is the legacy synopsis function, which was replaced by the three functions below
    spec.path(view=synopsis_of_play)
    spec.path(view=synopsis_of_act)
    spec.path(view=synopsis_of_scene)
    spec.path(view=ftln)
    spec.path(view=word)
    spec.path(view=segment)
    spec.path(view=text)
    spec.path(view=char_text)
    spec.path(view=char_text_by_character_id)
    spec.path(view=char_text_minus)
    spec.path(view=char_text_minus_by_character_id)
    spec.path(view=concordance)
    spec.path(view=monologue)
    spec.path(view=monologue_by_line_count)
    spec.path(view=on_stage)
    spec.path(view=char_chart)
    spec.path(view=parts)
    spec.path(view=parts_of_character)
    spec.path(view=wit_script)
    spec.path(view=wit_script_of_character)
    spec.path(view=sounds)
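
After registering the paths, it can be useful to check that the tags really bundle the split-up endpoints and that no `operationId` was used twice. The sketch below works on a plain dictionary such as the one returned by `spec.to_dict()`. The helper names `operations_by_tag` and `duplicate_operation_ids` are hypothetical, and the `example` dictionary is a hand-written excerpt shaped like the generated specification, not actual output.

```python
from collections import Counter, defaultdict

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations_by_tag(spec_dict: dict) -> dict:
    """Group "METHOD path" labels by the tags of each operation."""
    grouped = defaultdict(list)
    for path, item in spec_dict.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            for tag in operation.get("tags", ["(untagged)"]):
                grouped[tag].append(f"{method.upper()} {path}")
    return dict(grouped)


def duplicate_operation_ids(spec_dict: dict) -> list:
    """Return every operationId that occurs more than once."""
    counts = Counter(
        operation["operationId"]
        for item in spec_dict.get("paths", {}).values()
        for method, operation in item.items()
        if method in HTTP_METHODS and "operationId" in operation
    )
    return sorted(op_id for op_id, n in counts.items() if n > 1)


example = {
    "paths": {
        "/{playcode}/witScript": {
            "get": {"tags": ["witScript"], "operationId": "get_list_of_witness_scripts"}
        },
        "/{playcode}/witScript/{character}.html": {
            "get": {"tags": ["witScript"], "operationId": "get_witness_script_of_character"}
        },
        "/{playcode}/sounds": {
            "get": {"tags": ["sounds"], "operationId": "get_sounds"}
        },
    }
}
grouped = operations_by_tag(example)
duplicates = duplicate_operation_ids(example)
print(grouped)
print(duplicates)
```

In the notebook, both helpers would be called with `spec.to_dict()` instead of `example`.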
    

## Exporting the OpenAPI Documentation

In [72]:
#write the specification as JSON
with open('openapi.json', 'w') as f:
    json.dump(spec.to_dict(), f)

In [73]:
#write the specification as YAML
with open('openapi.yaml', 'w') as f:
    f.write(spec.to_yaml())
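
A quick sanity check of an exported file is to reload it and confirm that the fields required by OpenAPI 3.0 (`openapi`, `info` with `title` and `version`, and `paths`) are present. The following is a minimal sketch, not a full validation; the function name `missing_required_fields` is hypothetical, and the demonstration writes a deliberately incomplete document to a temporary file instead of touching `openapi.json`.

```python
import json
import os
import tempfile


def missing_required_fields(document: dict) -> list:
    """List the fields that OpenAPI 3.0 requires but the document lacks."""
    missing = [key for key in ("openapi", "info", "paths") if key not in document]
    if "info" in document:
        info = document["info"]
        missing += [f"info.{key}" for key in ("title", "version") if key not in info]
    return missing


# Deliberately incomplete example document: "version" is missing from "info".
minimal = {"openapi": "3.0.3", "info": {"title": "Folger Shakespeare API Tools"}, "paths": {}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "openapi.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(minimal, f)
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

result = missing_required_fields(reloaded)
print(result)  # → ['info.version']
```

For the file written above, the check would be `missing_required_fields(json.load(open("openapi.json")))`, which should return an empty list.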