In my [previous notebook](https://fpgmaas.nl/2019-12-06-1.0-summary-statistics-upvotes-and-awards/), I did some initial analysis on the dataset that contains all poems by [/u/poem_for_your_sprog](https://reddit.com/u/poem_for_your_sprog). I noted there that the rhythm, or poetic meter, of the poems is one of the main characteristics that make them appealing to me.

In this notebook, I want to learn a bit more about poetic meter and explore the meters that sprog uses in his poems on [/r/AskReddit](https://www.reddit.com/r/AskReddit/).

In [None]:
%%capture
%load_ext autoreload
%autoreload 2

import nltk
from nltk.tokenize import RegexpTokenizer
nltk.download('stopwords')
nltk.download('punkt')

In [None]:
import string
import datetime as dt
import pandas as pd
import numpy as np
import time
import re
import matplotlib.pyplot as plt 
from string import punctuation
from plotly.offline import download_plotlyjs, init_notebook_mode
import plotly.graph_objs as go
init_notebook_mode(connected=False)

from src.utils import print2_list, print2, export_ipynb_for_github_pages
from src.plotly import plot_histogram, plot_timeline, plot_horizontal_bar, plot_heatmap, \
plot_grouped_scatter, plot_multiple_timelines, plot_meter
from src.meter import get_word_scansion, get_line_scansion, get_syllables_per_line_combined, \
combine_line_scansions, merge_lines, get_known_meter
from src.data import load_data

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated since pandas 1.0

df = load_data('data/comments.txt', False)
df = df[df['type']=='poem']

# Understanding poetic meter

To explore poetic meter, let's start by taking a look at one of my favourite poems, which contains a message about dealing with regret: 
```
I should have hurried youth, in truth,
And moved more quickly on -
I should have made the most of youth,
Before the time was gone.

I should have followed fancy, free,
Before it thought to fade -
I should have picked a good degree,
Or found myself a trade.

I should have stopped to stare above;
To share another's dreams -
I should have never welcomed love,
And lost it all, it seems.

No matter what the aim or end -
No matter what you do -
Regrets are part of life, my friend:

Don't let them conquer you.
```
[SOURCE](https://www.reddit.com/r/AskReddit/comments/4fsds1/adults_from_reddit_what_do_you_regret_most_from/d2bmumz/)

If you say this poem out loud, you can clearly hear the rhythm:

```
I SHOULD have HURried YOUTH, in TRUTH,
And MOVED more QUICKly ON -
```

Or, maybe a clearer way of representing this:

In [None]:
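# Syllables of the first two lines; empty strings pad the shorter line.
z_text = [['i', 'should', 'have', 'hur', 'ried', 'youth', 'in', 'truth'],
          ['and', 'moved', 'more', 'quick', 'ly', 'on', '', '']]
# 0 = unstressed, 1 = stressed, np.nan = padding.
z = [[0, 1, 0, 1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0, 1, np.nan, np.nan]]
# Grey for unstressed and blue for stressed syllables; defined here because this cell needs it.
colorscale = ['#d2d2d2', '#1f77b4']
fig = plot_meter(
    text=z_text[::-1],
    meter=z[::-1],
    title='',
    colorscale=colorscale
)
fig.show()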

The meter of the first line is iambic tetrameter, whereas the second line is iambic trimeter. To break that down: both lines are iambic, because they consist of iambic feet. An iambic foot is an unstressed syllable followed by a stressed syllable, so a grey square followed by a blue square in the visualization above. The first line consists of four iambic feet (tetrameter), and the second line consists of three iambic feet (trimeter). The nomenclature for the number of feet per line is as follows:

- monometer (1) or foot
- dimeter (2)
- trimeter (3)
- tetrameter (4)
- pentameter (5)
- hexameter (6)
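
As a minimal sketch of this nomenclature, the name of a regular meter can be put together from the type of foot and the number of feet per line. The helper `describe_meter` and its lookup table are hypothetical names used for illustration only; they are not part of the `src` package used in this notebook.

```python
# Hypothetical helper for illustration; not part of src.meter.
FOOT_COUNT_NAMES = {
    1: 'monometer',
    2: 'dimeter',
    3: 'trimeter',
    4: 'tetrameter',
    5: 'pentameter',
    6: 'hexameter',
}

def describe_meter(foot_name, n_feet):
    """Return a meter name such as 'iambic tetrameter'."""
    return '{} {}'.format(foot_name, FOOT_COUNT_NAMES[n_feet])

print(describe_meter('iambic', 4))  # iambic tetrameter
print(describe_meter('iambic', 3))  # iambic trimeter
```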

Two other commonly used types of feet are the trochaic foot (stressed, unstressed) and the anapestic foot (unstressed, unstressed, stressed). To illustrate, the three types of feet we have encountered so far are:

In [None]:
example_feet = {
    'iambic foot' :
        {
        'z_text' : [['','']],
        'z' : [[0,1]]
        },
    'trochaic foot' :
        {
        'z_text' : [['','']],
        'z' : [[1,0]]
        },
    'anapestic foot' :
        {
        'z_text' : [['','','']],
        'z' : [[0,0,1]]
        }
}

colorscale=['#d2d2d2','#1f77b4']
for key in example_feet.keys():
    fig = plot_meter(  
        text = example_feet[key]['z_text'][::-1],
        meter = example_feet[key]['z'][::-1],
        title = key,
        colorscale = colorscale
    )
    fig.show()

By combining these three types of feet in a large variety of ways, we can construct the meter for almost all the poems by /u/poem_for_your_sprog. To analyze the meters that sprog uses, however, we first need to identify which meter each poem uses. We could do that by reading each poem and assigning the meter by hand, but that would be very time-consuming. So let's see if we can use Python to do this for us!
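
To make the idea of combining feet concrete, here is a minimal sketch that repeats a single type of foot to obtain the stress pattern of a whole line. It assumes that a line is encoded as a string in which `0` is an unstressed syllable and `1` is a stressed syllable; the names `FEET` and `build_pattern` are hypothetical and only serve this example.

```python
# Illustrative sketch; FEET and build_pattern are hypothetical names.
# 0 = unstressed syllable, 1 = stressed syllable.
FEET = {
    'iambic': '01',
    'trochaic': '10',
    'anapestic': '001',
}

def build_pattern(foot_name, n_feet):
    """Repeat one type of foot n_feet times to get the stress pattern of a line."""
    return FEET[foot_name] * n_feet

print(build_pattern('iambic', 4))     # 01010101
print(build_pattern('anapestic', 2))  # 001001
```

The first pattern corresponds to a line such as "I should have hurried youth, in truth".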

# Identifying meters in poetry using Python

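In order to identify which meter each poem uses, we first need a list of the meters we want to recognize. The dictionary below maps the name of each meter to its stress pattern, where 0 is an unstressed syllable and 1 is a stressed syllable: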

In [None]:
# *  = catalectic, i.e. the final syllable of the line is omitted
# ** = iambic substitution, i.e. the first (unstressed) syllable of the first anapestic foot is omitted

known_meters = {
    'iambic hexameter'       : '010101010101',
    'iambic hexameter*'      : '01010101010',
    'iambic pentameter'      : '0101010101',
    'iambic pentameter*'     : '010101010',
    'iambic tetrameter'      : '01010101',
    'iambic tetrameter*'     : '0101010',
    'iambic trimeter'        : '010101',
    'iambic trimeter*'       : '01010',
    'iambic dimeter'         : '0101',
    'iambic dimeter*'        : '010',
    'iambic monometer'       : '01',
    
    'anapestic tetrameter'   : '001001001001',
    'anapestic tetrameter**' : '01001001001',
    'anapestic trimeter'     : '001001001',
    'anapestic trimeter**'   : '01001001',
    'anapestic dimeter'      : '001001',
    'anapestic dimeter**'    : '01001',
    'anapestic monometer'    : '001',

    'trochaic hexameter'     : '101010101010',
    'trochaic hexameter*'    : '10101010101',
    'trochaic pentameter'    : '1010101010',
    'trochaic pentameter*'   : '101010101',
    'trochaic tetrameter'    : '10101010',
    'trochaic tetrameter*'   : '1010101',
    'trochaic trimeter'      : '101010',
    'trochaic trimeter*'     : '10101',
    'trochaic dimeter'       : '1010',
    'trochaic dimeter*'      : '101',
    'trochaic monometer'     : '10',
    
    'amphibrachic dimeter'   : '010010'
}
# Inverted lookup: stress pattern -> meter name.
known_meters_inv = {v: k for k, v in known_meters.items()}
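
The inverted dictionary makes it possible to look up the name of a meter from the stress pattern of a line. The sketch below shows that lookup on a small subset of the meters. It assumes that the scansion of a line is available as a list of 0s and 1s, and `lookup_meter` is a hypothetical helper; the actual `get_known_meter` in `src.meter` is not shown in this notebook and may work differently.

```python
# A small subset of the known meters, repeated here so the sketch runs on its own.
known_meters_subset = {
    'iambic tetrameter': '01010101',
    'iambic trimeter': '010101',
    'trochaic tetrameter*': '1010101',
}
inverse = {pattern: name for name, pattern in known_meters_subset.items()}

def lookup_meter(stresses, inverse_meters):
    """Hypothetical helper: map a list of 0/1 stresses to a meter name, or 'unknown'."""
    pattern = ''.join(str(s) for s in stresses)
    return inverse_meters.get(pattern, 'unknown')

print(lookup_meter([0, 1, 0, 1, 0, 1, 0, 1], inverse))  # iambic tetrameter
print(lookup_meter([1, 0, 1, 0, 1, 0, 1], inverse))     # trochaic tetrameter*
print(lookup_meter([1, 1, 1], inverse))                 # unknown
```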

In [None]:
# Determine the scansion of each poem, and which lines to combine based on this scansion.
df['poem_as_list'] = [poem.split('>') for poem in df['poem']]
df['scansion'] = [[get_line_scansion(line) for line in poem] for poem in df['poem_as_list']]
df['lines_to_combine'] = [combine_line_scansions(x) for x in df['scansion']]

# combine scansion and poem lines based on the suggested improvements.
df['scansion_modified'] = [merge_lines(row['scansion'], row['lines_to_combine']) for ix, row in df.iterrows()]
df['poem_modified_as_list'] = [merge_lines(row['poem_as_list'], row['lines_to_combine'], sep = ' ') for ix, row in df.iterrows()]
df['poem_modified'] = ['>'.join(x) for x in df['poem_modified_as_list']]

# Determine which of our known meters the poem is.
df['meter_list'] = [get_known_meter(x, known_meters_inv) for x in df['scansion_modified']]
df['meter'] = [', '.join(x) for x in df['meter_list']]
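
The last step joins the meters found in a poem into a single label, such as `'iambic tetrameter, iambic trimeter'`. How `get_known_meter` arrives at its list is not shown here. One plausible reading, sketched below under that assumption, is that each distinct line meter is kept once, in order of first appearance; `summarise_poem_meter` is a hypothetical name.

```python
def summarise_poem_meter(line_meters):
    """Hypothetical helper: keep each distinct line meter once, in order of first appearance."""
    distinct = []
    for meter in line_meters:
        if meter not in distinct:
            distinct.append(meter)
    return ', '.join(distinct)

# Line meters of a stanza that alternates between four and three iambic feet.
stanza = ['iambic tetrameter', 'iambic trimeter', 'iambic tetrameter', 'iambic trimeter']
print(summarise_poem_meter(stanza))  # iambic tetrameter, iambic trimeter
```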

In [None]:
df_most_common_meters = (df
                         .groupby('meter')
                         .agg(n=('ups', len), 
                              avg_ups=('ups', 'mean'))
                         .sort_values('n',ascending=False)
                        )
df_most_common_meters.reset_index(inplace=True)
df_most_common_meters_10 = df_most_common_meters.head(10)

In [None]:
print('Total number of poems: {0}\nTotal number of poems in top 10 meters: {1} ({2:.1f}% of total)'.format(
    len(df),
    df_most_common_meters_10['n'].sum(),
    df_most_common_meters_10['n'].sum()/len(df)*100
))

In [None]:
fig = plot_horizontal_bar(
    labels = df_most_common_meters_10['meter'][::-1],
    values = df_most_common_meters_10['n'][::-1],
    title = 'The 10 most common meters in poems by /u/poem_for_your_sprog',
    xaxis_title = 'Number of poems',
    yaxis_title='',
    figsize=(800,600)
)
fig.show()

In [None]:
meter_examples = {
    'iambic tetrameter, iambic trimeter':
    {
        'z_text' :  
        [
            ['i', 'should','have','hur','ried','youth','in','truth'],
            ['and','moved','more','quick','ly','on','','']
        ],
        'z' : 
        [
            [0,1,0,1,0,1,0,1],
            [0,1,0,1,0,1,np.nan,np.nan]
        ]
    },
    'anapestic tetrameter**':
    {
        'z_text' : 
        [
            ['so', 'throw','off','the','chains','of','op','pres','sion','said','he'],
            ['be','fair','ly','un','fet','terred','and','free','to','be','free']
        ],
        'z' : 
        [
            [0,1,0,0,1,0,0,1,0,0,1],
            [0,1,0,0,1,0,0,1,0,0,1]
        ]
    },
    'iambic tetrameter':
    {
        'z_text' : 
        [
            ['from','time','to','time','i','think','of','then'],
            ['i','turn','my','gaze','be','fore','a','gain']
        ],
        'z' : 
        [
            [0,1,0,1,0,1,0,1],
            [0,1,0,1,0,1,0,1]
        ]
    },
    'trochaic tetrameter, trochaic tetrameter*':
    {
        'z_text' : 
        [
            ['would', 'you','suck','er','punch','a','mon','key?'],
            ['would','you','up','per','cut','a','bear?','']
        ],
        'z' : 
        [
            [1,0,1,0,1,0,1,0],
            [1,0,1,0,1,0,1,np.nan]
        ]
    },
    'trochaic tetrameter':
    {
        'z_text' : 
        [
            ['wave','good','bye','your','e','go','bro','ther'],
            ['kiss','your','kids','and','call','your', 'mo','ther']
        ],
        'z' : 
        [
            [1,0,1,0,1,0,1,0],
            [1,0,1,0,1,0,1,0]
        ]
    },
    'trochaic tetrameter, trochaic trimeter*':
    {
        'z_text' : 
        [
            ['no','one\'s','quite','as','strong','as','stan','ley'],
            ['stan','ley\'s','been','to','war','','','']
        ],
        'z' : 
        [
            [1,0,1,0,1,0,1,0],
            [1,0,1,0,1,np.nan,np.nan,np.nan]
        ]
    },    
    'trochaic tetrameter*':
    {
        'z_text' : 
        [
            ['when', 'you\'re','full','of','doubt','and','fear'],
            ['i\'ll','be','with','you','wait','ing','here']
        ],
        'z' : 
        [
            [1,0,1,0,1,0,1],
            [1,0,1,0,1,0,1]
        ]
    },
    'iambic dimeter':
    {
        'z_text' : 
        [
            ['my','name','is','dog'],
            ['and','e','ven', 'though']
        ],
        'z' : 
        [
            [0,1,0,1],
            [0,1,0,1]
        ]
    },
    'amphibrachic dimeter, anapestic dimeter**':
    {
        'z_text' : 
        [
            ['you\'re','sea','soned','in','sad','ness'],
            ['you\'re','prac','ticed','in','doubt','']
        ],
        'z' : 
        [
            [0,1,0,0,1,0],
            [0,1,0,0,1,np.nan]
        ]
    },
    'anapestic tetrameter':
    {
        'z_text' : 
        [
            ['there\'s','a','laugh','on','her','lips','and','a','light','in','her','eye'],
            ['and','a','warmth','in','her','voice','and','a','smile','in','her','sigh']
        ],
        'z' : 
        [
            [0,0,1,0,0,1,0,0,1,0,0,1],
            [0,0,1,0,0,1,0,0,1,0,0,1]
        ]
    },
    'anapestic dimeter':
    {
        'z_text' : 
        [
            ['i','re','mem','ber','the','way'],
            ['that','i','thought','a','bout','love']
        ],
        'z' : 
        [
            [0,0,1,0,0,1],
            [0,0,1,0,0,1]
        ]
    },
    
}

for key in df_most_common_meters_10['meter']:
    fig = plot_meter(  
        text = meter_examples[key]['z_text'][::-1],
        meter = meter_examples[key]['z'][::-1],
        title = key,
        colorscale = colorscale
    )
    fig.show()

In [None]:
fig = plot_grouped_scatter(
    x = df_most_common_meters_10['n'],
    y = df_most_common_meters_10['avg_ups'],
    groups = df_most_common_meters_10['meter'],
    unique_groups = df_most_common_meters_10['meter'],
    text = np.array(['{}<br>Average upvotes: {}<br>n: {}'.format(row['meter'],round(row['avg_ups'],1),row['n'])  
                       for index, row in (df_most_common_meters_10).iterrows()]),
    title = 'Number of poems vs average number of upvotes of the 10 most common meters<br> by u/poem_for_your_sprog',
    xaxis_title = 'n',
    yaxis_title = 'Average upvotes'
)
fig.show()

In [None]:
# Create a DataFrame with the top 10 poems based on upvotes for each of the 10 most commonly used meters.
# .copy() avoids a SettingWithCopyWarning when columns are added to this subset later on.
df_poems_top_meters = df[df['meter'].isin(df_most_common_meters_10['meter'])].copy()
# Sort by upvotes within each meter, then keep the 10 highest-rated poems per meter.
df_10_poems_top_meters = (df_poems_top_meters
                 .sort_values(['meter', 'ups'], ascending=[True, False])
                 .groupby('meter')
                 .head(10)
                 .reset_index(drop=True))

In [None]:
fig = plot_grouped_scatter(
    x = df_10_poems_top_meters['average_line_length'],
    y = df_10_poems_top_meters['ups'],
    groups = df_10_poems_top_meters['meter'],
    unique_groups = df_most_common_meters_10['meter'],
    text = np.array([re.sub('>','<br>',comment) for comment in df_10_poems_top_meters['poem']]),
    title = 'Top rated poems in the 10 most common meters by u/poem_for_your_sprog',
    xaxis_title = 'Average line length',
    yaxis_title = 'Upvotes'
)
fig.show()

In [None]:
# Create a DataFrame with the total number of poems per meter per month, and the fraction of total
df_poems_top_meters['date'] = pd.to_datetime(df_poems_top_meters['date'])
df_poems_top_meters['month'] = df_poems_top_meters['date'].dt.to_period('M')
df_meters_per_month = (df_poems_top_meters
                            .groupby(['month','meter'])['month']
                            .agg(n='count')
                            .unstack('meter')
                            .fillna(0)
                            .stack('meter')
                            .reset_index(inplace=False)
)
df_meters_per_month['month'] = [x.to_timestamp() for x in df_meters_per_month['month']]
df_meters_per_month['month_total'] = df_meters_per_month['n'].groupby(df_meters_per_month['month']).transform('sum')
df_meters_per_month['frac']=df_meters_per_month['n']/df_meters_per_month['month_total']

In [None]:
fig = plot_multiple_timelines(
    x = df_meters_per_month['month'],
    y = df_meters_per_month['frac'],
    groups = df_meters_per_month['meter'],
    unique_groups = df_most_common_meters_10['meter'],
    text=np.array([
        "{}: {:.0f} ({:.2%})".format(row['month'].strftime("%b %Y"),
                                     row['n'],
                                     row['frac']) 
                   for ix, row in df_meters_per_month.iterrows()]),
    title='Fraction of poems per meter per month',
    xaxis_title = 'month',
    yaxis_title ='fraction of poems',
    figsize=(1000,1200)
)

fig.show()

In [None]:
# less popular meters

In [None]:
df_most_common_meters_10to20 = df_most_common_meters.iloc[10:20]

In [None]:
fig = plot_horizontal_bar(
    labels = df_most_common_meters_10to20['meter'][::-1],
    values = df_most_common_meters_10to20['n'][::-1],
    title = 'The 11th to 20th most common meters in poems by /u/poem_for_your_sprog',
    xaxis_title = 'Number of poems',
    yaxis_title='',
    figsize=(800,600)
)
fig.show()

In [None]:
# Create a DataFrame with the top 10 poems based on upvotes for each of the 11th to 20th most commonly used meters.
df_poems_top_meters = df[df['meter'].isin(df_most_common_meters_10to20['meter'])]
# Sort by upvotes within each meter, then keep the 10 highest-rated poems per meter.
df_10_poems_top_meters = (df_poems_top_meters
                 .sort_values(['meter', 'ups'], ascending=[True, False])
                 .groupby('meter')
                 .head(10)
                 .reset_index(drop=True))

fig = plot_grouped_scatter(
    x = df_10_poems_top_meters['average_line_length'],
    y = df_10_poems_top_meters['ups'],
    groups = df_10_poems_top_meters['meter'],
    unique_groups = df_most_common_meters_10to20['meter'],
    text = np.array([re.sub('>','<br>',comment) for comment in df_10_poems_top_meters['poem']]),
    title = 'Top rated poems in the 11th to 20th most common meters by u/poem_for_your_sprog',
    xaxis_title = 'Average line length',
    yaxis_title = 'Upvotes'
)
fig.show()

In [None]:
front_matter_str = """---
title: Poetry & Data - Part 2
subtitle: Analyzing the meter in the poetry of /u/poem_for_your_sprog on Reddit
tags: [python, poetry, poem_for_your_sprog, reddit]
layout: html_post
---"""

from IPython.display import display, Javascript
display(Javascript('IPython.notebook.save_checkpoint();'))
import time
time.sleep(10)
export_ipynb_for_github_pages(filename="3.0-meter.ipynb",
                              front_matter_str=front_matter_str,
                              prefix = dt.datetime.today().strftime('%Y-%m-%d') +'-')