# Computational Analysis of Poetry

This notebook demonstrates the `reading_poetry` package, which must be installed first. The code also uses `nltk`, `vaderSentiment` and `pyvis`; later cells additionally rely on `requests`, `pandas`, `seaborn`, `matplotlib` and `scikit-learn`.

In [None]:
!pip install reading_poetry
!pip install vaderSentiment
!pip install pyvis
!pip install nltk

In [None]:
import reading_poetry as rp
import nltk
import os
import re

import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('tagsets')
nltk.download('wordnet')
nltk.download('sentiwordnet')

You can download the poems to be analysed using the code below. 

In [None]:
url = 'https://github.com/peterverhaar/reading_poetry_notebooks/raw/main/Poems.zip'

import requests

response = requests.get(url)
# Only write the archive to disk if the download succeeded
if response:
    with open("Poems.zip", "wb") as zip_file:
        zip_file.write(response.content)

from zipfile import ZipFile

with ZipFile('Poems.zip', 'r') as zipObj:
   # Extract all the contents of zip file in current directory
   zipObj.extractall()
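
The download step above can be made slightly more defensive. The sketch below is one possible variant and not part of `reading_poetry`; the helper name `extract_zip_bytes` is hypothetical. It takes the raw bytes of a ZIP archive (for instance `response.content`), unpacks them into a target folder, and returns the names of the extracted members.

```python
import io
import os
from zipfile import ZipFile


def extract_zip_bytes(content, target_dir='.'):
    """Unpack a ZIP archive held in memory and return its member names.

    `content` holds the raw bytes of the archive, e.g. `response.content`.
    The helper name and signature are assumptions made for illustration.
    """
    os.makedirs(target_dir, exist_ok=True)
    with ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

With `requests`, this could be called as `extract_zip_bytes(response.content)` once `response.raise_for_status()` has confirmed that the download succeeded.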

## Phonetic transcription

The `reading_poetry` package has a function named `transcribe()`, which you can use to create phonetic transcriptions in [SAMPA](https://www.phon.ucl.ac.uk/home/sampa/index.html) notation. 


In [None]:
verse_line = "The drunkenness of things being various"

print( rp.transcribe(verse_line) )

 You may test this function using lines from the following two poems. 
    
    THE LAKE ISLE OF INNISFREE

    I WILL arise and go now, and go to Innisfree,
    And a small cabin build there, of clay and wattles made:
    Nine bean-rows will I have there, a hive for the honeybee,
    And live alone in the bee-loud glade.
    
    And I shall have some peace there, for peace comes dropping slow,
    Dropping from the veils of the morning to where the cricket sings;
    There midnight's all a glimmer, and noon a purple glow,
    And evening full of the linnet's wings.
    
    I will arise and go now, for always night and day
    I hear lake water lapping with low sounds by the shore;
    While I stand on the roadway, or on the pavements grey,
    I hear it in the deep heart's core.



    Snow

    The room was suddenly rich and the great bay-window was
    Spawning snow and pink roses against it
    Soundlessly collateral and incompatible:
    World is suddener than we fancy it.

    World is crazier and more of it than we think,
    Incorrigibly plural. I peel and portion
    A tangerine and spit the pips and feel
    The drunkenness of things being various.

    And the fire flames with a bubbling sound for world
    Is more spiteful and gay than one supposes— 
    On the tongue on the eyes on the ears in the palms of one's hands—
    There is more than glass between the snow and the huge roses.

## Add annotations in TEI

The function `add_annotations()` can be used to add data about the POS category, the lemma and the phonetic transcription. The annotations are stored as TEI.  

In [None]:
path = os.path.join('Poems','Yeats','TheLakeIsleOfInnisfree.txt' )
xml = rp.add_annotations(path)
print(xml)
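
To give a rough idea of what word-level annotation in TEI can look like, the sketch below builds a single `<l>` element by hand with the standard library. It is an illustration under stated assumptions, not the output of `add_annotations()`: the attribute name `phon` and the helper `tei_line` are hypothetical, and the annotation values were written by hand.

```python
import xml.etree.ElementTree as ET

# Hand-written annotations for two words (illustrative values only)
tokens = [
    {'text': 'glade', 'pos': 'NN', 'lemma': 'glade', 'phon': 'gleId'},
    {'text': 'sings', 'pos': 'VBZ', 'lemma': 'sing', 'phon': 'sINz'},
]


def tei_line(tokens, n):
    """Serialise annotated tokens as one TEI-style <l> element.

    The attribute name 'phon' is an assumption; the real package
    may encode the transcription differently.
    """
    line = ET.Element('l', {'n': str(n)})
    for tok in tokens:
        w = ET.SubElement(line, 'w', {
            'pos': tok['pos'],
            'lemma': tok['lemma'],
            'phon': tok['phon'],
        })
        w.text = tok['text']
    return ET.tostring(line, encoding='unicode')


print(tei_line(tokens, 1))
```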

The cell below annotates all the lines in the poems you downloaded, and saves the TEI files in a folder named `XML`.

In [None]:
if not os.path.isdir('XML'):
    os.mkdir('XML')
    
dir = 'Poems'

author = dict()
texts = []

for root, dirs, files in os.walk(dir):
    for file in files:
        if re.search( r'\.txt$' , file ):
            texts.append( os.path.join(root, file) )

for t in sorted(texts):

    # Name the output file after the input file, with an .xml extension
    out_file = os.path.basename(t)
    out_file = re.sub( r'\.txt$' , '.xml' , out_file )
    
    if re.search( 'Yeats' , t , re.IGNORECASE ):
        author[out_file] = 'Yeats'
    else:
        author[out_file] = 'MacNeice'
        
    tei = rp.add_annotations(t)
    # The with-statement closes the file even if writing fails
    with open( os.path.join( 'XML' , out_file ) , 'w' , encoding='utf-8' , errors='replace' ) as out:
        out.write(tei)
    
print('Done!')
    

## Basic information about poems

In [None]:
dir = 'XML'
file = 'TheLakeIsleOfInnisfree.xml'
path = os.path.join( dir, file )

poem = rp.Poem( path )


print( f'Number of lines: {poem.nr_lines} ' )
print( f'Number of words: {poem.nr_words} ' )
print( f'Number of stanzas: {poem.nr_stanzas} ' )

In [None]:
print( poem.title + '\n')

stanzas = poem.stanza_structure
print(stanzas)
print('\n', end = '')

last_lines = []
for s in stanzas:
    last_lines.append(stanzas[s][-1])
    
lines = poem.lines
for n in lines:    
    print( f'{n}. {lines[n]}' )
    if n in last_lines:
        print('\n', end = '')


We can print all the phonetic transcriptions (in SAMPA notation).

In [None]:
transcr = poem.transcriptions
lines = poem.lines

for n in lines:
    print( lines[n] )
    print( transcr[n] )
    print('\n' , end = '')

## Alliteration

In [None]:
dir = 'XML'

poems = [ 'Belfast.xml' , 'TheLakeIsleOfInnisfree.xml' ]

for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem( path )
    print(poem.title + '\n')
    poem.show_alliteration()
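
As a rough intuition for what alliteration detection involves, the sketch below groups the words of a line by their first letter. This is a deliberately crude stand-in and not the method used by `reading_poetry`, which works on phonetic transcriptions; the function name `initial_repeats` is hypothetical. Because it relies on spelling, it also pairs words such as "sounds" and "shore", which begin with different sounds.

```python
import re
from collections import defaultdict


def initial_repeats(line):
    """Group the words of a line by first letter, keeping repeated initials.

    Spelling-based approximation only; real alliteration depends on sounds.
    """
    groups = defaultdict(list)
    for word in re.findall(r"[a-z]+(?:'[a-z]+)*", line.lower()):
        groups[word[0]].append(word)
    # Keep only the initials shared by two or more words
    return {initial: words for initial, words in groups.items() if len(words) > 1}


line = 'I hear lake water lapping with low sounds by the shore;'
for initial, words in initial_repeats(line).items():
    print(initial, words)
```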
    

## Perfect rhyme

In [None]:
import os
dir = 'XML'

poems = [ 'DownByTheSalleyGardens.xml' , 'ACoat.xml' , 'SundayMorning.xml' ]

for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem( path )
    print(poem.title + '\n')
    poem.show_perfect_rhyme()

## Internal rhyme

In [None]:
dir = 'XML'

poems = [ 'TheDoubleVisionOfMichaelRobartes.xml' , 'HeWishesForTheClothsOfHeaven.xml' ,
          'WesternLandscape.xml' , 'Budgie.xml' , 'ThePhasesOfTheMoon.xml' ]

for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem( path )
    print(poem.title + '\n')
    poem.show_internal_rhyme()

## Slant rhyme

In [None]:
poems = [ 'Birmingham.xml'  ]
dir = 'XML'

for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem( path )
    print(poem.title + '\n')
    poem.show_slant_rhyme_consonance()


In [None]:
poems = [ 'BagpipeMusic.xml' ]
dir = 'XML'

for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem( path )
    print(poem.title + '\n')
    poem.show_slant_rhyme_assonance()


## Texture

In [None]:
from IPython.display import SVG, display, HTML
poems = [ 'TheFallingOfTheLeaves.xml' , 'ThePalePanther.xml']
dir = 'XML'
for tei in poems:
    path = os.path.join( dir, tei ) 
    poem = rp.Poem(path)
    print( poem.title )
#     lines = poem.lines
#     for l in lines:
#         print(lines[l])
    svg = poem.texture()
    display(HTML(svg))

## Anaphora

In [None]:
poems = [ 'WhenYouAreOld.xml' ,
          'FlightOfTheHeart.xml' ,
          'TheClosingAlbum.xml' ,
          'Vistas.xml' ,
          'TrainToDublin.xml' ,
          'TheBlackTower.xml' ]



for tei in poems:
     
    poem = rp.Poem(os.path.join( 'XML' , tei ))

    lines = poem.lines
    lines = set(list(lines.values()))

    a = rp.anaphora(lines)
    if len(a) > 0:
        print(poem.title)
        print(a)
        for l in lines:
            for ra in a:
                if re.search( '^{}'.format(ra) , l.lower() ):
                    print(l)
        print()
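
Anaphora is the repetition of the same words at the start of successive lines. The sketch below shows one simple way to find such repetitions by counting the opening words of each line. It is an illustration only and not necessarily how `rp.anaphora()` works; the function name `repeated_openings` and the parameter `n_words` are hypothetical.

```python
from collections import Counter


def repeated_openings(lines, n_words=2):
    """Return the line openings (first n_words words) shared by two or more lines."""
    openings = Counter()
    for line in lines:
        words = line.lower().split()
        if len(words) >= n_words:
            openings[' '.join(words[:n_words])] += 1
    return [opening for opening, count in openings.items() if count > 1]


lines = [
    'I will arise and go now, and go to Innisfree,',
    'And live alone in the bee-loud glade.',
    'I will arise and go now, for always night and day',
    'I hear lake water lapping with low sounds by the shore;',
    "I hear it in the deep heart's core.",
]

print(repeated_openings(lines))
```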

        

## Comparative analysis

The code below can be used to collect quantitative data about all the poems in the corpus. The data are saved in a CSV file. 

In [None]:
import os
import re
from nltk import word_tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 
ana = SentimentIntensityAnalyzer()

out = open( 'data.csv', 'w' , encoding = 'utf-8' )


columns = [ 'title' , 
           'nr_words' , 
           'nr_stanzas' , 
           'nr_lines' , 
           'alliteration' , 
           'internal_rhyme' , 
           'perfect_rhyme' , 
            'slant_rhyme_assonance', 
           'slant_rhyme_consonance', 
           'anaphora' , 
           'neg_words' , 
           'pos_words' , 
           'author']

header = ','.join(columns)
out.write( header )
out.write('\n')

for file in os.listdir('XML'):
    if re.search( 'xml$' , file ):
        print( f'Collecting data for {file} ...')

        data = dict()
        data['author'] = author[file]
        
        poem = rp.Poem(os.path.join( 'XML' , file))
        data['title'] = poem.title
        data['nr_words'] = poem.nr_words
        data['nr_stanzas'] = poem.nr_stanzas
        data['nr_lines'] = poem.nr_lines

        tr = poem.transcriptions
        alliteration_count = 0
        internal_rhyme_count = 0 

        for line in tr:

            alliteration = rp.alliteration(tr[line])
            alliteration = re.sub(r'-|\s+', '' , alliteration )
            alliteration_count += len( alliteration )
            internal_rhyme = rp.internal_rhyme(tr[line])
            if len(internal_rhyme) > 0:
                internal_rhyme_count += len(internal_rhyme)
                


        data['alliteration'] = alliteration_count
        data['internal_rhyme'] = internal_rhyme_count

        lines = poem.lines
        neg_words = 0 
        pos_words = 0 
        
        for line in lines:
            words = rp.word_tokenise(lines[line])
            for word in words:
                scores = ana.polarity_scores(word)
                if scores['neg'] > 0: 
                    neg_words += 1
                if scores['pos'] > 0: 
                    pos_words += 1
                    
        data['pos_words'] = pos_words
        data['neg_words'] = neg_words
        
        data['anaphora'] = poem.anaphora_count_lines()
    
        stanzas = poem.stanza_structure
        pr_count = 0
        sra_count = 0
        src_count = 0

        for s in stanzas:
            stanza_lines = []
            for n in stanzas[s]:
                stanza_lines.append(tr[n])

            ## perfect rhyme
            pr = rp.perfect_rhyme(stanza_lines)
            if re.search( r'\d' , pr ):
                pr = re.sub(r'-|\s+', '' , pr )
                pr_count += len( pr )
                
            ## slant rhyme (assonance)
            sra = rp.slant_rhyme_assonance(stanza_lines)
            if re.search( r'\d' , sra ):
                sra = re.sub(r'-|\s+', '' , sra )
                sra_count += len( sra )

            ## slant rhyme (consonance)
            src = rp.slant_rhyme_consonance(stanza_lines)
            if re.search( r'\d' , src ):
                src = re.sub(r'-|\s+', '' , src )
                src_count += len( src )

        data['perfect_rhyme'] = pr_count    
        data['slant_rhyme_assonance'] = sra_count
        data['slant_rhyme_consonance'] = src_count  


        # Write one CSV row; values are quoted so that commas in titles do not break the file
        values = [ '"' + str(data[c]).replace('"', '""') + '"' for c in columns ]
        out.write( ','.join(values) + '\n' )

out.close() 

print('Done!')
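
An alternative to assembling CSV rows by hand is the standard `csv` module, which quotes values containing commas automatically. The sketch below is a minimal illustration with invented data; the helper name `rows_to_csv` is hypothetical, and the text is written to an in-memory buffer instead of `data.csv` so that the result can be inspected directly.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialise a list of dicts as CSV text, quoting values where needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example row: the title contains a comma
columns = ['title', 'nr_lines', 'author']
rows = [{'title': 'A title, with a comma', 'nr_lines': 12, 'author': 'Yeats'}]

print(rows_to_csv(rows, columns))
```

To write to a file instead, the same `DictWriter` can be given a handle opened with `open('data.csv', 'w', encoding='utf-8', newline='')`.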
 

In [None]:
import pandas as pd

df = pd.read_csv('data.csv')

df.head(10)

## Normalisation

In [None]:
df['perfect_rhyme_normalised'] = df['perfect_rhyme'] / df['nr_lines']
df['slant_rhyme_a_normalised'] = df['slant_rhyme_assonance'] / df['nr_lines']
df['slant_rhyme_c_normalised'] = df['slant_rhyme_consonance'] / df['nr_lines']
df['anaphora_normalised'] = df['anaphora'] / df['nr_lines']
df['alliteration_normalised'] = df['alliteration'] / df['nr_words']
df['internal_rhyme_normalised'] = df['internal_rhyme'] / df['nr_words']
df['no_rhyme'] = df['nr_lines'] - ( df['perfect_rhyme'] + df['slant_rhyme_assonance'] )
df['no_rhyme_normalised'] = df['no_rhyme'] / df['nr_lines']
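
Normalisation matters because raw counts grow with the length of a poem. The sketch below works through the arithmetic for two hypothetical poems with invented counts; the helper `normalise` is not part of the notebook and guards against division by zero, which the `pandas` expressions above do not.

```python
def normalise(count, total):
    """Return count / total, or 0.0 when total is zero."""
    return count / total if total else 0.0


# Invented counts for two hypothetical poems
examples = {
    'short poem': {'perfect_rhyme': 6, 'nr_lines': 12},
    'long poem': {'perfect_rhyme': 10, 'nr_lines': 40},
}

for name, counts in examples.items():
    share = normalise(counts['perfect_rhyme'], counts['nr_lines'])
    print(f'{name}: {share:.2f}')
```

The longer poem has more rhymes in absolute terms, but a lower share of rhyming lines once its length is taken into account.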

In [None]:
df.head()

## Visualisation of perfect rhyme and slant rhyme 

In [None]:
x_axis = 'perfect_rhyme_normalised'
y_axis = 'slant_rhyme_a_normalised'

import seaborn as sns
import matplotlib.pyplot as plt 

#colours = [  '#a88732' ,  '#265c28' , '#a0061a' ,  '#431670' ]

## The line below applies a stylesheet
## and also adds spacing in between the lines of the legend 
sns.set(style='whitegrid', rc = {'legend.labelspacing': 2})


fig = plt.figure( figsize = ( 12,8 ))

colours = ['#EE7733','#007788']
ax = sns.scatterplot(x = x_axis , y = y_axis , data=df , s=150 , hue = 'author' , palette = colours )

# this next line makes sure that the legend is shown outside of the graph
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)


ax.set_xlabel('Perfect rhyme')
ax.set_ylabel('Slant rhyme') 

for index, row in df.iterrows():
    alpha = 0.95
    if row[x_axis] < 0.05 or row[y_axis] < 0.05:
        alpha = 0.1
    plt.text( row[x_axis]-0.02, row[y_axis]+ 0.015 , row['title'] , alpha = alpha , fontsize=12.8)

plt.show()

In [None]:
x_axis = 'alliteration_normalised'
y_axis = 'anaphora_normalised'

import seaborn as sns
import matplotlib.pyplot as plt 

#colours = [  '#a88732' ,  '#265c28' , '#a0061a' ,  '#431670' ]

## The line below applies a stylesheet
## and also adds spacing in between the lines of the legend 
sns.set(style='whitegrid', rc = {'legend.labelspacing': 2})


fig = plt.figure( figsize = ( 12,8 ))

colours = ['#EE7733','#007788']
ax = sns.scatterplot(x = x_axis , y = y_axis , data=df , s=150 , hue='author' , palette=colours)

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

ax.set_xlabel('Alliteration')
ax.set_ylabel('Anaphora') 

for index, row in df.iterrows():
    alpha = 0.95
    if row[x_axis] < 0.05 or row[y_axis] < 0.05:
        alpha = 0.1
    plt.text( row[x_axis]-0.02, row[y_axis]+ 0.01 , row['title'] , alpha = alpha , fontsize=12.8)

plt.show()

## Visualisation of internal rhyme

In [None]:
x_axis = 'internal_rhyme_normalised'
y_axis = 'title'


dfs = df.sort_values(by=[x_axis], ascending=False)

fig = plt.figure( figsize = ( 6,8 ))

colours = ['#EE7733','#007788']
ax = sns.barplot( data = dfs , x = x_axis , y= y_axis , color = '#22106b' , hue='author' , palette=colours)

ax.set_ylabel('Poem') 
ax.set_xlabel('Internal rhyme (normalised by number of words)')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xticks(rotation= 90 )
plt.show()

## Perfect rhyme and slant rhyme in combination

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

#set seaborn plotting aesthetics
sns.set(style='white')

df_rhyme = df[ ['title' , 'perfect_rhyme_normalised' , 'slant_rhyme_a_normalised' , 'no_rhyme_normalised'  ] ]
df_rhyme = df_rhyme.sort_values(by=['perfect_rhyme_normalised'], ascending=False)

sns.set(rc={'figure.figsize':(8,7)})

#create stacked bar chart
ax = df_rhyme.set_index('title').plot(kind='bar', stacked=True, color=['#3361ab', '#d93f2e' , '#d1cac9'])

ax.set_xlabel('Title') 
ax.set_ylabel('Perfect rhyme and slant rhyme')

# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

## Similarity

In [None]:
columns = [ 'perfect_rhyme_normalised', 'slant_rhyme_a_normalised', 'anaphora_normalised',
       'alliteration_normalised', 'internal_rhyme_normalised', 'no_rhyme_normalised' ]
titles = df['title']
df_network = df[ columns ]

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

matrix = cosine_similarity(df_network)
matrix_df = pd.DataFrame( matrix , columns = titles , index = titles )

from pyvis.network import Network
net = Network(notebook=True , height="750px", width="100%" , bgcolor="#dce5f2" )

net.force_atlas_2based(
        gravity=-60,
        central_gravity=0.01,
        spring_length=100,
        spring_strength=0.08,
        damping=0.4,
        overlap= 0 )
               
related_texts = list(matrix_df.columns)

## an edge is drawn between two nodes
## if the cosine similarity is 0.95 or higher
min_similarity = 0.95



for text,values in matrix_df.iterrows():
    for rt in related_texts:
        if text != rt:
            if values[rt] >= min_similarity:
                if author[f'{text}.xml'] == 'Yeats':
                    c ='#EE7733'
                else:
                    c = '#007788'  
                net.add_node(text , title=text ,  color= c , value = 15 )
                
                if author[f'{rt}.xml'] == 'Yeats':
                    c ='#EE7733'
                else:
                    c = '#007788'
                net.add_node(rt , title = rt, color = c , value = 15)
                
                net.add_edge( text , rt) 
                


net.show('network.html')
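
The cosine similarity of two feature vectors is their dot product divided by the product of their lengths. The sketch below spells this out in plain Python for a single pair of vectors, as a way to see what each cell of the similarity matrix represents. The function name `cosine` and the example vectors are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; this sketch returns 0.0
        return 0.0
    return dot / (norm_a * norm_b)


# Invented feature vectors: the second is the first scaled by two
print(cosine([0.5, 0.1, 0.2], [1.0, 0.2, 0.4]))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

Because the measure depends only on the direction of the vectors, two poems whose features are in the same proportions score close to 1 even if one has uniformly higher values.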

## Visualisation of perfect rhyme and alliteration

In [None]:
poems = [ 'Autobiography.xml' , 'SelvaOscura.xml']

for tei in poems:
    path = os.path.join('XML',tei)
    poem = rp.Poem(path)
    print(poem.title + '\n')
    
#     for l in poem.transcriptions.values():
#         print(l)
    
    svg = poem.visualise_rhyme_alliteration()
    
    out = open( f'{poem.title}_svg.html' , 'w' , encoding = 'utf-8' )
    out.write(svg)
    out.close()

    from IPython.display import SVG, display, HTML
    display(HTML(svg))

## Sentiment analysis

In [None]:
import math

poems = ['TheBlackTower.xml', 'LedaAndTheSwan.xml' ]

for tei in poems:
    path = os.path.join('XML',tei)
    poem = rp.Poem(path)
    lines = poem.lines 


    colours_pos = ['#fecac9','#ffaaa9','#ff9a99','#ff9290','#ff6968','#ff3937','#ff0906','#e40200','#cc0200','#a30200']
    colours_neg = ['#a0cbf7','#3d96f0','#1782ed','#0e6ac5','#0e65bd','#0d61b6','#0c5dae','#0c59a7','#0b559f','#0b5197']

    from nltk import word_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 

    ana = SentimentIntensityAnalyzer()

    html = ''
    html += '<html><body>'
    html += f'<h2>{poem.title}</h2>'

    for line in lines:
        words = rp.word_tokenise(lines[line])

        for word in words:
            scores = ana.polarity_scores(word)
            if scores['neg'] > 0: 
                # Clamp to at least 1, so that weak scores do not wrap around to index -1
                score = max( abs(round( scores['compound']*10)) , 1 )
                colour = colours_neg[score-1]
                font = '#FFFFFF'
            elif scores['pos'] > 0: 
                score = max( abs(round( scores['compound']*10)) , 1 )
                colour = colours_pos[score-1]
                font = '#FFFFFF'
            else: 
                colour = '#FFFFFF'
                font = '#000000'


            if len(word)>1:
                html += ' '
            html += f'<span style="color: {font} ; background-color: {colour}">{word}</span>' 

        html += '<br/>'

    html += '</body></html>'

    from IPython.display import display, HTML
    display(HTML(html))


In [None]:
x_axis = 'perfect_rhyme_normalised'
df['neg_words_normalised'] = df['neg_words'] / df['nr_words']
y_axis = 'neg_words_normalised'


import seaborn as sns
import matplotlib.pyplot as plt 

#colours = [  '#a88732' ,  '#265c28' , '#a0061a' ,  '#431670' ]

## The line below applies a stylesheet
## and also adds spacing in between the lines of the legend 
sns.set(style='whitegrid', rc = {'legend.labelspacing': 2})


fig = plt.figure( figsize = ( 12,8 ))

colours = ['#3361ab', '#d93f2e']
ax = sns.scatterplot(x = x_axis , y = y_axis , data=df , s=100 , hue = 'author', palette=colours)

# this next line makes sure that the legend is shown outside of the graph
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)


for index, row in df.iterrows():
    alpha = 0.95
    if row[x_axis] < 0.05 and row[y_axis] < 0.05:
        alpha = 0.1
    plt.text( row[x_axis], row[y_axis] , row['title'] , alpha = alpha , fontsize=12.8)

plt.show()