# Evaluation on mixed-metre poetry

This notebook contains the evaluation metrics for [`Rantanplan`](https://pypi.org/project/rantanplan/0.4.3/) v0.4.3.

In [1]:
from datetime import datetime
print(f"Last run: {datetime.utcnow().strftime('%B %d %Y - %H:%M:%S')}")

Last run: June 19 2020 - 08:39:51


## Setup

For the evaluation of mixed-metre poetry we used Antonio Carvajal's annotated _Extravagante jerarquía (1958-1982)_, Madrid: Hiperión, 1983. Due to copyright restrictions we cannot redistribute the corpus, so the analysis below relies on the outputs of both Navarro-Colorado's system and Rantanplan.

In [2]:
!pip install -q pandas numpy "spacy<2.3.0" spacy_affixes

In [3]:
%%bash --out _
# pip install https://github.com/explosion/spacy-models/archive/es_core_news_md-2.2.5.zip
python -m spacy download es_core_news_md
python -m spacy_affixes download es

In [4]:
!pip install -q "rantanplan==0.4.3"

In [5]:
import math
import re  # needed by clean_text below
import pandas as pd
import numpy as np
from glob import glob
from xml.etree import ElementTree


def clean_text(string):
    output = string.strip()
    # replacements = (("“", '"'), ("”", '"'), ("//", ""), ("«", '"'), ("»",'"'))
    replacements = (("“", ''), ("”", ''), ("//", ""), ("«", ''), ("»",''))
    for replacement in replacements:
        output = output.replace(*replacement)
    output = re.sub(r'(?is)\s+', ' ', output)
    output = re.sub(r"(\w)-(\w)", r"\1\2", output)  # "Villa-nueva" breaks Navarro-Colorado's system
    return output

def num2sym(metric, length):
    if "/" in metric:
        hemi1, hemi2 = metric.split("/")
        return num2sym(hemi1, math.floor(length / 2)) + num2sym(hemi2, math.ceil(length / 2))
    else:
        symbols = int(length) * ["-"]
        for i in metric.split("-"):
            symbols[int(i) - 1] = "+"
        return "".join(symbols)

def load_tei(filename):
    lines = []
    with open(filename, "r") as xml:
        contents = xml.read()
        tree = ElementTree.fromstring(contents)
        tags = tree.findall(".//{http://www.tei-c.org/ns/1.0}l")
        for tag in tags:
            text = clean_text(tag.text)
            lines.append((text, tag.attrib['met']))
    return pd.DataFrame(lines, columns=["line_text", "metrical_pattern"])
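
To make the expected input of `load_tei` more concrete, the following is a minimal sketch of the same extraction on an in-memory TEI fragment. The fragment, its verses, and its `met` values are invented placeholders (not corpus data), and `extract_lines` is a hypothetical helper that mirrors `load_tei` without the pandas dependency.

```python
from xml.etree import ElementTree

# Invented TEI fragment: the verses and `met` values are placeholders,
# not taken from the Carvajal corpus.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><lg>
    <l met="-+---+-">primer verso de ejemplo</l>
    <l met="+--+-+-">segundo verso de ejemplo</l>
  </lg></body></text>
</TEI>"""

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def extract_lines(xml_string):
    """Return (text, met) pairs for every TEI <l> element (hypothetical helper)."""
    tree = ElementTree.fromstring(xml_string)
    pairs = []
    for tag in tree.findall(f".//{TEI_NS}l"):
        # Collapse whitespace; tolerate <l> elements without direct text.
        text = " ".join((tag.text or "").split())
        pairs.append((text, tag.attrib.get("met")))
    return pairs


pairs = extract_lines(TEI_SAMPLE)
for text, met in pairs:
    print(met, text)
```

Unlike `load_tei`, this sketch does not fail on an `<l>` element whose text is empty or whose `met` attribute is missing; whether that leniency is desirable depends on how strictly the annotations should be validated.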

The corpus file provided here contains only metrical patterns, with no information about the texts themselves.

In [6]:
carvajal = pd.read_csv("data/carvajal.csv")
carvajal.Length = carvajal.Length.astype(str)

In [7]:
carvajal["MetricSymbol"] = carvajal[["Metric", "Length"]].apply(
    lambda row: num2sym(row["Metric"].strip(), float(row["Length"])), axis=1
)
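
As a worked illustration of the conversion applied above, the sketch below repeats `num2sym` on a few hand-made inputs. The format is inferred from the code (stressed syllable positions joined by `-`, hemistichs separated by `/`); the specific metric strings and lengths are invented for the example and are not rows from `carvajal.csv`.

```python
import math


def num2sym(metric, length):
    """Turn stressed positions (e.g. "2-6") into a +/- string of `length` symbols."""
    if "/" in metric:
        # Two hemistichs: the first gets the floor of half the length,
        # the second gets the ceiling.
        hemi1, hemi2 = metric.split("/")
        return (num2sym(hemi1, math.floor(length / 2))
                + num2sym(hemi2, math.ceil(length / 2)))
    symbols = int(length) * ["-"]
    for i in metric.split("-"):
        symbols[int(i) - 1] = "+"  # positions are 1-based
    return "".join(symbols)


# Invented examples, not taken from the corpus.
simple = num2sym("2-6", 7.0)        # single verse of 7 syllables
compound = num2sym("3-6/3-6", 14)   # two hemistichs of 7 syllables each
print(simple)
print(compound)
```

Positions in the second hemistich are counted from the start of that hemistich, not from the start of the line, which is why `"3-6/3-6"` yields the same pattern twice.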

Leaving next cell for reference.

---

## Navarro-Colorado

The following cells illustrate how to run the code on the actual Carvajal corpus. However, we can only release the results of running Navarro-Colorado's system, not the text of the corpus.

Alternatively, we'll use the pre-calculated results.

In [8]:
with open("data/navarro_colorado_carvajal.txt") as file:
    navarro_colorado_stress = np.array(file.read().split())

### Accuracy on Carvajal

In [9]:
correct = sum(navarro_colorado_stress == carvajal.MetricSymbol.values)
navarro_colorado_accuracy = correct / carvajal.MetricSymbol.size
print(f"Navarro-Colorado on Carvajal: {navarro_colorado_accuracy:.4f}")

Navarro-Colorado on Carvajal: 0.4938
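
The accuracy above is an exact-match score at line level: a line counts as correct only when the whole predicted pattern equals the annotated one. A minimal plain-Python sketch of that computation follows, using invented toy patterns and a hypothetical helper `exact_match_accuracy`. It adds an explicit length check, on the assumption that a mismatch between the number of predictions and annotations should be reported rather than silently compared.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of lines whose predicted pattern equals the gold pattern exactly."""
    if len(predicted) != len(gold):
        raise ValueError(
            f"Got {len(predicted)} predictions for {len(gold)} gold patterns"
        )
    if not gold:
        raise ValueError("Cannot compute accuracy on an empty corpus")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy patterns, invented for illustration only.
gold = ["-+---+-", "+--+-+-", "--+--+-", "-+-+-+-"]
predicted = ["-+---+-", "+--+-+-", "-+---+-", "-+-+-+-"]

accuracy = exact_match_accuracy(predicted, gold)
print(f"{accuracy:.4f}")  # 3 of 4 lines match exactly
```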


---

## Rantanplan

In [10]:
from rantanplan import get_scansion

Leaving next cell for reference.

Alternatively, we load the results from a file.

In [11]:
with open("data/rantanplan_carvajal.txt") as file:
    rantanplan_carvajal_stress = np.array(file.read().split())

### Accuracy on Carvajal

In [12]:
correct = sum(rantanplan_carvajal_stress == carvajal.MetricSymbol.values)
rantanplan_carvajal_accuracy = correct / carvajal.MetricSymbol.size
print(f"Rantanplan on Carvajal: {rantanplan_carvajal_accuracy:.4f}")

Rantanplan on Carvajal: 0.6763


## Results

In [13]:
from IPython.display import display, HTML

# Using pre-calculated time values
display(HTML(
    pd.DataFrame([
        ["Navarro-Colorado", navarro_colorado_accuracy, 7484],
        ["Rantanplan", rantanplan_carvajal_accuracy, 27]
    ], columns=["Model", "Accuracy", "Time"]).to_html(index=False)
))

| Model | Accuracy | Time |
|---|---|---|
| Navarro-Colorado | 0.493833 | 7484 |
| Rantanplan | 0.676336 | 27 |
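
To put the two rows side by side, the short sketch below derives the accuracy difference and the time ratio from the figures reported in the table. The time unit is not stated in the notebook, so only the ratio, which does not depend on the unit, is computed; the variable names are illustrative.

```python
# Figures copied from the results table above.
results = {
    "Navarro-Colorado": {"accuracy": 0.493833, "time": 7484},
    "Rantanplan": {"accuracy": 0.676336, "time": 27},
}

baseline = results["Navarro-Colorado"]
system = results["Rantanplan"]

# Absolute difference in accuracy, expressed in percentage points.
accuracy_gain_points = (system["accuracy"] - baseline["accuracy"]) * 100
# Ratio of reported times (unit-independent).
speedup = baseline["time"] / system["time"]

print(f"Accuracy gain: {accuracy_gain_points:.2f} percentage points")
print(f"Time ratio: {speedup:.0f}x")
```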
