# Evaluation on fixed-metre poetry

This notebook contains the evaluation metrics for the Jumper real-time scansion system on fixed-metre poetry. It is based on https://github.com/linhd-postdata/rantanplan-evaluation/blob/master/evaluation-fixed-metre.ipynb

In [1]:
from datetime import datetime
print(f"Last run: {datetime.utcnow().strftime('%B %d %Y - %H:%M:%S')}")

Last run: December 16 2020 - 10:34:21


## System info

Installing dependencies and downloading necessary corpora using [`Averell`](https://pypi.org/project/averell/).

In [2]:
cat /proc/cpuinfo | grep 'model name' | uniq

model name	: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz


In [3]:
cat /proc/meminfo | grep 'MemTotal' | uniq

MemTotal:       16279060 kB


# Setup

In [4]:
pip install -q --no-cache https://github.com/linhd-postdata/averell/archive/803685bd7e00cc7def6837f9843ab560085e8fca.zip

Note: you may need to restart the kernel to use updated packages.


In [5]:
!averell list

  id  name                size      docs    words  granularity    license
----  ------------------  ------  ------  -------  -------------  -----------
   1  Disco V2            22M       4088   381539  stanza         CC-BY
                                                   line
   2  Disco V3            28M       4080   377978  stanza         CC-BY
                                                   line
   3  Sonetos Siglo       6.8M      5078   466012  stanza         CC-BY-NC
      de Oro                                       line           4.0
   4  ADSO 100            128K       100     9208  stanza         CC-BY-NC
      poems corpus                                 line           4.0
   5  Poesía Lírica       3.8M       475   299402  stanza         CC-BY-NC
      Castellana Siglo                             line           4.0
      de Oro                                       word
                                                   syllable
   6  Gongocorpus         9

In [6]:
%%bash
averell download 3 4 > /dev/null 2>&1
averell export 3 --granularity line
mv corpora/line.json corpora/sonnets.json
averell export 4 --granularity line
mv corpora/line.json corpora/adso.json
du -h corpora/*.json

Using corpora folder: './corpora'
Using corpora folder: './corpora'
720K	corpora/adso.json
35M	corpora/sonnets.json


Defining helper functions

In [7]:
import json
import math
import re
from io import StringIO
import time
import numpy as np
import pandas as pd

def clean_text(string):
    """Normalise a verse line before scansion."""
    output = string.strip()
    # Drop typographic quotes and the "//" separator
    replacements = (("“", ''), ("”", ''), ("//", ""), ("«", ''), ("»", ''))
    for replacement in replacements:
        output = output.replace(*replacement)
    # Collapse runs of whitespace into a single space
    output = re.sub(r'\s+', ' ', output)
    # Join hyphenated words: "Villa-nueva" breaks Navarro-Colorado's system
    output = re.sub(r"(\w)-(\w)", r"\1\2", output)
    return output
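To make the normalisation concrete, the sketch below copies the same steps as `clean_text` and applies them to an invented sample line (not taken from either corpus). Typographic quotes and `//` are dropped, whitespace runs collapse to one space, and a hyphen between word characters is removed. Note that `strip()` runs before `//` is removed, so a trailing space can survive.

```python
import re

def clean_text(string):
    # Same normalisation steps as the helper above
    output = string.strip()
    replacements = (("“", ''), ("”", ''), ("//", ""), ("«", ''), ("»", ''))
    for replacement in replacements:
        output = output.replace(*replacement)
    output = re.sub(r'\s+', ' ', output)               # collapse whitespace
    output = re.sub(r"(\w)-(\w)", r"\1\2", output)     # join hyphenated words
    return output

# Hypothetical input line, invented for illustration
sample = '  «Villa-nueva»   y su   “ribera” // '
print(repr(clean_text(sample)))  # 'Villanueva y su ribera '
```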

The following function converts the string of stressed (`+`) and unstressed (`-`) syllables into a vector of 1-based stress positions. For example, `'-+---+---+-'` becomes `[2, 6, 10]`.

In [8]:
def to_vector(acento):
    # 1-based positions of the stressed syllables ('+') in the pattern string
    return [i + 1 for i, c in enumerate(acento) if c == '+']
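A quick check of `to_vector` on the example above. The inverse helper `to_pattern` is hypothetical (not part of Jumper or the original notebook); it assumes 11-syllable lines by default, since every line evaluated here is a hendecasyllable. It can make the stress vectors in the failure tables easier to read as `+`/`-` strings.

```python
def to_vector(acento):
    # 1-based positions of the stressed syllables ('+')
    return [i + 1 for i, c in enumerate(acento) if c == '+']

def to_pattern(vector, n_syllables=11):
    # Hypothetical inverse: rebuild a '+'/'-' string from 1-based positions,
    # assuming a fixed line length (11 syllables by default)
    stressed = set(vector)
    return ''.join('+' if i in stressed else '-' for i in range(1, n_syllables + 1))

print(to_vector('-+---+---+-'))  # [2, 6, 10]
print(to_pattern([2, 6, 10]))    # -+---+---+-
```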

# Import

In [9]:
import jumper

# Results

### Accuracy on ADSO

In [10]:
with open("corpora/adso.json", encoding="utf-8") as f:
    adso = pd.DataFrame.from_records(json.load(f))
adso = adso[["line_text", "metrical_pattern"]].reset_index(drop=True)
adso.line_text = adso.line_text.apply(clean_text)
adso.metrical_pattern = adso.metrical_pattern.apply(to_vector)
adso

| | line_text | metrical_pattern |
|---|---|---|
| 0 | Yo vi unos ojos bellos, que hirieron | [1, 2, 4, 6, 10] |
| 1 | con dulce flecha un corazón cuitado, | [2, 4, 5, 8, 10] |
| 2 | y que para encender nuevo cuidado | [6, 7, 10] |
| 3 | su fuerza toda contra mí pusieron. | [2, 4, 8, 10] |
| 4 | Yo vi que muchas veces prometieron | [1, 2, 4, 6, 10] |
| ... | ... | ... |
| 1399 | cobró armado y prudente su denuedo, | [2, 3, 6, 10] |
| 1400 | que sin victorias no contó algún día. | [4, 6, 8, 9, 10] |
| 1401 | Esto fue don Fadrique de Toledo. | [1, 3, 6, 10] |
| 1402 | Hoy nos da, desatado en sombra fría, | [1, 3, 6, 8, 10] |


In [11]:
start_time = time.time()
analisis_adso = jumper.escandir_lista_versos(adso.line_text.tolist())
time_adso = time.time() - start_time

In [12]:
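# Column labels (Spanish): line, tagged line, syllable count, stress positions,
# stresses without extra-rhythmic ones, line type, match score
adso_output_df = pd.DataFrame.from_records(
    analisis_adso,
    columns=['Verso', 'Verso etiquetado', "Sílabas", "acentos",
             'Sin acentos extrarrítmicos', 'Tipo', 'Coincidencia'],
)
adso_output_df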

| | Verso | Verso etiquetado | Sílabas | acentos | Sin acentos extrarrítmicos | Tipo | Coincidencia |
|---|---|---|---|---|---|---|---|
| 0 | Yo vi unos ojos bellos, que hirieron | yo vi unos ojos bellos que hirieron | 11 | [1, 2, 4, 6, 10] | [2, 4, 6, 10] | Endecasílabo heroico corto | 0.9 |
| 1 | con dulce flecha un corazón cuitado, | con dulce flecha un corazón cuitado, | 11 | [2, 4, 5, 8, 10] | [2, 4, 8, 10] | Endecasílabo sáfico largo pleno | 0.9 |
| 2 | y que para encender nuevo cuidado | y que para encender nuevo cuidado | 11 | [6, 7, 10] | [6, 10] | Endecasílabo vacío puro | 0.9 |
| 3 | su fuerza toda contra mí pusieron. | su fuerza toda contra mí pusieron. | 11 | [2, 4, 8, 10] | [2, 4, 8, 10] | Endecasílabo sáfico largo pleno | 1.0 |
| 4 | Yo vi que muchas veces prometieron | Yo vi que muchas veces prometieron | 11 | [1, 2, 4, 6, 10] | [2, 4, 6, 10] | Endecasílabo heroico corto | 0.9 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1399 | cobró armado y prudente su denuedo, | cobró armado y prudente su denuedo, | 11 | [2, 3, 6, 10] | [2, 6, 10] | Endecasílabo heroico puro | 0.9 |
| 1400 | que sin victorias no contó algún día. | que sin victorias no contó algún día. | 11 | [4, 6, 8, 9, 10] | [4, 6, 8, 10] | Endecasílabo sáfico largo | 0.9 |
| 1401 | Esto fue don Fadrique de Toledo. | Esto fue don Fadrique de Toledo. | 11 | [1, 3, 6, 10] | [1, 3, 6, 10] | Endecasílabo melódico corto | 1.0 |
| 1402 | Hoy nos da, desatado en sombra fría, | Hoy nos da, desatado en sombra fría, | 11 | [1, 3, 6, 8, 10] | [1, 3, 6, 8, 10] | Endecasílabo melódico pleno | 1.0 |


In [13]:
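# Exact-match accuracy: a line counts as correct only if its whole stress vector equals the gold one
accuracy_adso = sum(adso_output_df.acentos == adso.metrical_pattern) / adso.metrical_pattern.size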

In [14]:
print(f"Jumper scansion on ADSO: {accuracy_adso:.2f} ({time_adso:.2f}s)")

Jumper scansion on ADSO: 0.95 (0.33s)
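The accuracy reported here is exact-match accuracy, $\frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[\hat{y}_i = y_i]$, where $\hat{y}_i$ is Jumper's stress vector and $y_i$ the gold one. The toy sketch below uses made-up vectors (not ADSO data) to show that a single extra stress makes the whole line count as wrong. It compares lists element by element with a comprehension; `.mean()` of a boolean Series gives the same quantity as `sum(...) / size`.

```python
import pandas as pd

# Made-up gold and predicted stress vectors, for illustration only
gold = pd.Series([[2, 6, 10], [4, 8, 10], [1, 4, 6, 10]])
pred = pd.Series([[2, 6, 10], [2, 4, 8, 10], [1, 4, 6, 10]])

# A line matches only if the full vectors are identical
matches = pd.Series([p == g for p, g in zip(pred, gold)], index=gold.index)
accuracy = matches.mean()
print(f"{accuracy:.2f}")  # 0.67
```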


### Accuracy on Sonnets

In [15]:
with open("corpora/sonnets.json", encoding="utf-8") as f:
    sonnets = pd.DataFrame.from_records(json.load(f))
# Keep only the manually checked lines
sonnets = sonnets.query("manually_checked == True")[["line_text", "metrical_pattern"]].reset_index(drop=True)
sonnets.line_text = sonnets.line_text.apply(clean_text)
sonnets.metrical_pattern = sonnets.metrical_pattern.apply(to_vector)
sonnets

| | line_text | metrical_pattern |
|---|---|---|
| 0 | Cuando la alegre y dulce primavera | [4, 6, 10] |
| 1 | a partir sus riquezas comenzaba, | [3, 6, 10] |
| 2 | y de los verdes campos desterraba | [4, 6, 10] |
| 3 | aquella estéril sequedad primera, | [2, 4, 8, 10] |
| 4 | un pastor triste y solo en la ribera | [1, 3, 4, 6, 10] |
| ... | ... | ... |
| 10264 | y en tus labios purpúrea competencia | [3, 6, 10] |
| 10265 | agora al alba y al clavel ofrece, | [2, 4, 8, 10] |
| 10266 | la edad, con invisible diligencia, | [2, 6, 10] |
| 10267 | en el común ocaso lo oscurece; | [4, 6, 10] |


In [16]:
start_time = time.time()
analisis_sonnets = jumper.escandir_lista_versos(sonnets.line_text.tolist())
time_sonnets = time.time() - start_time

In [17]:
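# Column labels (Spanish): line, tagged line, syllable count, stress positions,
# stresses without extra-rhythmic ones, line type, match score
sonnets_output_df = pd.DataFrame.from_records(
    analisis_sonnets,
    columns=['Verso', 'Verso etiquetado', "Sílabas", "acentos",
             'Sin acentos extrarrítmicos', 'Tipo', 'Coincidencia'],
)
sonnets_output_df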

| | Verso | Verso etiquetado | Sílabas | acentos | Sin acentos extrarrítmicos | Tipo | Coincidencia |
|---|---|---|---|---|---|---|---|
| 0 | Cuando la alegre y dulce primavera | Cuando la alegre y dulce primavera | 11 | [4, 6, 10] | [4, 6, 10] | Endecasílabo sáfico corto | 1.0 |
| 1 | a partir sus riquezas comenzaba, | a partir sus riquezas comenzaba, | 11 | [3, 6, 10] | [3, 6, 10] | Endecasílabo melódico puro | 1.0 |
| 2 | y de los verdes campos desterraba | y de los verdes campos desterraba | 11 | [4, 6, 10] | [4, 6, 10] | Endecasílabo sáfico corto | 1.0 |
| 3 | aquella estéril sequedad primera, | aquella estéril sequedad primera, | 11 | [2, 4, 8, 10] | [2, 4, 8, 10] | Endecasílabo sáfico largo pleno | 1.0 |
| 4 | un pastor triste y solo en la ribera | un pastor triste y solo en la ribera | 11 | [1, 3, 4, 6, 10] | [1, 3, 6, 10] | Endecasílabo melódico corto | 0.9 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 10264 | y en tus labios purpúrea competencia | y en tus labios purpúr#a competencia | 11 | [3, 6, 10] | [3, 6, 10] | Endecasílabo melódico puro | 1.0 |
| 10265 | agora al alba y al clavel ofrece, | agora al alba y al clavel ofrece | 11 | [2, 4, 8, 10] | [2, 4, 8, 10] | Endecasílabo sáfico largo pleno | 1.0 |
| 10266 | la edad, con invisible diligencia, | la edad, con invisible diligencia, | 11 | [2, 6, 10] | [2, 6, 10] | Endecasílabo heroico puro | 1.0 |
| 10267 | en el común ocaso lo oscurece; | en el común ocaso lo oscurece; | 11 | [4, 6, 10] | [4, 6, 10] | Endecasílabo sáfico corto | 1.0 |


In [18]:
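# Exact-match accuracy: a line counts as correct only if its whole stress vector equals the gold one
accuracy_sonnets = sum(sonnets_output_df.acentos == sonnets.metrical_pattern) / sonnets.metrical_pattern.size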

In [19]:
print(f"Jumper scansion on Sonnets: {accuracy_sonnets:.2f} ({time_sonnets:.2f}s)")

Jumper scansion on Sonnets: 0.95 (2.48s)


# Failure analysis

In [20]:
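# Lines whose predicted stress vector differs from the gold annotation
errores = sonnets_output_df['acentos'] != sonnets.metrical_pattern
df_errores = sonnets_output_df.loc[errores].copy()
# 'Data set' holds the gold (annotated) stress vector for comparison
df_errores['Data set'] = sonnets.loc[errores, 'metrical_pattern']
df_errores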

| | Verso | Verso etiquetado | Sílabas | acentos | Sin acentos extrarrítmicos | Tipo | Coincidencia | Data set |
|---|---|---|---|---|---|---|---|---|
| 11 | que, cuanto más sin pena se hallare, | que cuanto más sin pena se hallare | 11 | [4, 6, 10] | [4, 6, 10] | Endecasílabo sáfico corto | 1.0 | [2, 4, 6, 10] |
| 12 | si a Silvia la cruel pastora viere, | si a silvia la cruel pastora viere | 11 | [3, 6, 8, 10] | [3, 6, 8, 10] | Endecasílabo melódico largo | 1.0 | [2, 6, 8, 10] |
| 66 | que, puesto que tu vivo ardor te mueve, | que, puesto que tu vivo ardor te mueve, | 11 | [2, 6, 8, 10] | [2, 6, 8, 10] | Endecasílabo heroico largo | 1.0 | [6, 8, 10] |
| 81 | ¡Oh gran consuelo a mi esperanza vana, | ¡Oh gran consuelo a mi esperanza vana, | 11 | [1, 2, 4, 8, 10] | [1, 4, 8, 10] | Endecasílabo sáfico puro pleno | 0.9 | [1, 4, 8, 10] |
| 86 | reqüiescant in bello, que no in pace, | reqüiescant in bello, que no in pace, | 11 | [4, 5, 6, 9, 10] | [4, 6, 10] | Endecasílabo sáfico corto | 0.8 | [2, 5, 8, 10] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10160 | básteme a mí sentir, ya que no veo | básteme a mí sentir, ya que no veo | 11 | [1, 4, 6, 7, 9, 10] | [1, 4, 6, 10] | Endecasílabo sáfico corto pleno | 0.8 | [1, 4, 6, 9, 10] |
| 10181 | Ya que cantar en estas frescas sombras | Ya que cantar en estas frescas sombras | 11 | [1, 4, 6, 8, 10] | [1, 4, 6, 8, 10] | Endecasílabo sáfico pleno | 1.0 | [4, 6, 8, 10] |
| 10192 | Minerva eternamente la acompaña. | Minerva eternamente la acompaña. | 11 | [2, 4, 6, 10] | [2, 4, 6, 10] | Endecasílabo heroico corto | 1.0 | [2, 6, 10] |
| 10202 | y el Amor por la mano te guiaba. | y el amor por la mano te gi~aba | 11 | [3, 6, 10] | [3, 6, 10] | Endecasílabo melódico puro | 1.0 | [4, 7, 10] |
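One possible way to characterise these mismatches is to split each one into extra stresses (predicted by Jumper but absent from the gold vector) and missing stresses (in the gold vector only). The sketch below applies this to four rows copied from the table above. The `extra`/`missing`/`kind` columns and the `error_kind` helper are hypothetical names chosen for illustration, not part of Jumper.

```python
import pandas as pd

# Four (predicted, gold) pairs copied from the failure table above
errors = pd.DataFrame({
    "acentos":  [[4, 6, 10], [3, 6, 8, 10], [2, 6, 8, 10], [2, 4, 6, 10]],
    "Data set": [[2, 4, 6, 10], [2, 6, 8, 10], [6, 8, 10], [2, 6, 10]],
}, index=[11, 12, 66, 10192])

pairs = list(zip(errors["acentos"], errors["Data set"]))
breakdown = pd.DataFrame({
    "extra": [sorted(set(p) - set(g)) for p, g in pairs],    # stressed by Jumper only
    "missing": [sorted(set(g) - set(p)) for p, g in pairs],  # stressed in the gold only
}, index=errors.index)

def error_kind(extra, missing):
    # Hypothetical labels for the three kinds of mismatch
    if extra and missing:
        return "substitution"
    if extra:
        return "extra stress"
    if missing:
        return "missing stress"
    return "match"

breakdown["kind"] = [error_kind(e, m) for e, m in zip(breakdown["extra"], breakdown["missing"])]
print(breakdown)
print(breakdown["kind"].value_counts())
```

Applied to the full `df_errores`, the same comprehensions would show whether Jumper tends to over-stress or under-stress lines relative to the annotation.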
