# PPA genre topic model

This notebook contains code for generating topics for genre-focused pages in the "literary" collection of the Princeton Prosody Archive. It uses Maria Antoniak's [Little Mallet Wrapper](https://github.com/maria-antoniak/little-mallet-wrapper), and some of the code is adapted from Melanie Walsh's [Introduction to Cultural Analytics](http://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/08-Topic-Modeling-Text-Files.html).

The corpus file `filtered_ppa_corpus.csv` is required for this notebook.

In [1]:
#!pip install little_mallet_wrapper

In [154]:
# Standard library
import glob
import math
import os
import random
import re
from collections import defaultdict
from datetime import datetime
from operator import itemgetter
from pathlib import Path

# Third-party
import numpy as np
import pandas as pd
import little_mallet_wrapper as lmw

%matplotlib inline

# Show up to 100 rows when displaying DataFrames
pd.set_option('display.max_rows', 100)

## Reading in and filtering corpus

In [155]:
filtered_literary_df = pd.read_csv("filtered_ppa_corpus.csv")
filtered_literary_df

Unnamed: 0,page_id,work_id,order,form,tags,counts,contexts,page_text,spelling,source_id,...,author,pub_year,publisher,pub_place,collections,work_type,source,source_url,sort_title,subtitle
0,A01224.36,A01224,36,Elegy,['book'],1,"[""ns chaleur, sans poux, d'amoreuse langueur\n...","\nverse, one Adoniū be added thereunto, as\n\...",['elegiac'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
1,A01224.52,A01224,52,Sonnet,['book'],1,"['ebat,\nImperium solenne socer.\n\nSir P. Syd...",Salust. 4. Semaine.\nReglant ensemblement nos ...,['sonet'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
2,A01224.53,A01224,53,Sonnet,['book'],1,"['ebat,\nImperium solenne socer.\n\nSir P. Syd...","And,\n—Socer arua Latinus habebat,\nImperium s...",['sonet'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
3,A01224.57,A01224,57,Pastoral,['book'],1,"['nnot perswade?\n3, \nBut namelesse hee, for ...",\nword is changed in signification by changing...,['pastoral'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
4,A01224.61,A01224,61,Sestina,['book'],2,"['es: but let them passe, and come we to such ...","\nintangled verses: but let them passe, and co...",['sestine'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
179648,yale.39002032008188.00000317,yale.39002032008188,317,Elegy,,2,"[""hythm in Babylonian\nand Hebrew narratives o...","INDEX II\n\nOF MATTERS\n\nAbu'l-Walid, 9\nAcro...",['elegie'],yale.39002032008188,...,"Gray, George Buchanan, 1865-1922",1915.0,Hodder and Stoughton,"London, New York","['Literary', 'Original Bibliography']",full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002032008188,forms of Hebrew poetry; considered with specia...,considered with special reference to the criti...
179649,yale.39002088447587.00000049,yale.39002088447587,49,Ode,,1,"['ds, like v. 4.\nformer line. V. 11. >3N afte...",INTRODUCTION. xiii\ntioos of the first term ar...,['ode'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...
179650,yale.39002088447587.00000052,yale.39002088447587,52,Ode,,2,"['gulph-of destruc-\n"" tion.""\nIt is conceived...","XVI INTRODUCTION.\n"" For.I-will^arise against-...",['ode'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...
179651,yale.39002088447587.00000071,yale.39002088447587,71,Pastoral,,1,['rthens is rather\nunderstood to signify lite...,UPON HIS TWELVE SONS. 15\nZebulon's peopled po...,['pastoral'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...


In [163]:
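# Set the time period. Here the range 1559-1929 covers the whole corpus,
# as the matching minimum and maximum years printed below confirm.
# pub_year is coerced to numeric first; unparseable values become NaN and fail the comparison.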
filtered_literary_df["pub_year"] = pd.to_numeric(filtered_literary_df["pub_year"], errors="coerce")

df_1559_1929 = filtered_literary_df[(filtered_literary_df["pub_year"] >= 1559) & (filtered_literary_df["pub_year"] <= 1929)].copy()

print(filtered_literary_df["pub_year"].min(), filtered_literary_df["pub_year"].max())
print(df_1559_1929["pub_year"].min(), df_1559_1929["pub_year"].max())

1559.0 1929.0
1559.0 1929.0


In [164]:
# Set the DataFrame used for the topic model
df = df_1559_1929
df

Unnamed: 0,page_id,work_id,order,form,tags,counts,contexts,page_text,spelling,source_id,...,author,pub_year,publisher,pub_place,collections,work_type,source,source_url,sort_title,subtitle
0,A01224.36,A01224,36,Elegy,['book'],1,"[""ns chaleur, sans poux, d'amoreuse langueur\n...","\nverse, one Adoniū be added thereunto, as\n\...",['elegiac'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
1,A01224.52,A01224,52,Sonnet,['book'],1,"['ebat,\nImperium solenne socer.\n\nSir P. Syd...",Salust. 4. Semaine.\nReglant ensemblement nos ...,['sonet'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
2,A01224.53,A01224,53,Sonnet,['book'],1,"['ebat,\nImperium solenne socer.\n\nSir P. Syd...","And,\n—Socer arua Latinus habebat,\nImperium s...",['sonet'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
3,A01224.57,A01224,57,Pastoral,['book'],1,"['nnot perswade?\n3, \nBut namelesse hee, for ...",\nword is changed in signification by changing...,['pastoral'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
4,A01224.61,A01224,61,Sestina,['book'],2,"['es: but let them passe, and come we to such ...","\nintangled verses: but let them passe, and co...",['sestine'],A01224,...,"Fraunce, Abraham, fl. 1587-1633",1588.0,Thomas Orwin,At London,"['Linguistic', 'Literary']",full-work,EEBO-TCP,http://name.umdl.umich.edu/A01224.0001.001,Arcadian rhetorike: or The præcepts of rhetori...,"Greeke, Latin, English, Italian, French, Spani..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
179648,yale.39002032008188.00000317,yale.39002032008188,317,Elegy,,2,"[""hythm in Babylonian\nand Hebrew narratives o...","INDEX II\n\nOF MATTERS\n\nAbu'l-Walid, 9\nAcro...",['elegie'],yale.39002032008188,...,"Gray, George Buchanan, 1865-1922",1915.0,Hodder and Stoughton,"London, New York","['Literary', 'Original Bibliography']",full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002032008188,forms of Hebrew poetry; considered with specia...,considered with special reference to the criti...
179649,yale.39002088447587.00000049,yale.39002088447587,49,Ode,,1,"['ds, like v. 4.\nformer line. V. 11. >3N afte...",INTRODUCTION. xiii\ntioos of the first term ar...,['ode'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...
179650,yale.39002088447587.00000052,yale.39002088447587,52,Ode,,2,"['gulph-of destruc-\n"" tion.""\nIt is conceived...","XVI INTRODUCTION.\n"" For.I-will^arise against-...",['ode'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...
179651,yale.39002088447587.00000071,yale.39002088447587,71,Pastoral,,1,['rthens is rather\nunderstood to signify lite...,UPON HIS TWELVE SONS. 15\nZebulon's peopled po...,['pastoral'],yale.39002088447587,...,"Clarke, George Somers",1810.0,Richard Taylor,London,['Literary'],full-work,HathiTrust,https://hdl.handle.net/2027/yale.39002088447587,Hebrew criticism and poetry : or the patriarch...,with appendixes of readings and interpretation...


## Topic modeling

In [165]:
path_to_mallet = '/Applications/mallet-2.0.8/bin/mallet'

In [166]:
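# NOTE: In [168] rebuilds training_data with process_string's default settings,
# so the numbers='remove' option used here does not affect the trained model.
training_data = [lmw.process_string(text, numbers='remove') for text in df['page_text']]

In [167]:
original_texts = df['page_text'].tolist()

In [168]:
processed = [lmw.process_string(t) for t in df['page_text'].tolist()]

# Drop pages that are empty after preprocessing. Record which pages are kept so that
# the original texts and page IDs can be filtered the same way and stay aligned
# index for index with training_data (and, later, with topic_distributions).
keep = [bool(d.strip()) for d in processed]
training_data = [d for d, k in zip(processed, keep) if k]
original_texts = [t for t, k in zip(df['page_text'], keep) if k]

len(training_data)

In [169]:
page_ids = [page_id for page_id, k in zip(df['page_id'], keep) if k]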

In [170]:
lmw.print_dataset_stats(training_data)

Number of Documents: 179647
Mean Number of Words per Document: 189.9
Vocabulary Size: 548806
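The three figures reported above can be reproduced by hand, which may help when checking a subset of the corpus. The following is a minimal sketch, assuming each document is a whitespace-separated string like those produced by `lmw.process_string`. The names `dataset_stats` and `toy_docs` are hypothetical and are not part of Little Mallet Wrapper.

```python
def dataset_stats(docs):
    """Return (number of documents, mean words per document, vocabulary size)."""
    token_lists = [doc.split() for doc in docs]
    num_docs = len(token_lists)
    mean_words = sum(len(tokens) for tokens in token_lists) / num_docs if num_docs else 0.0
    vocabulary = {token for tokens in token_lists for token in tokens}
    return num_docs, mean_words, len(vocabulary)


# Toy documents standing in for preprocessed pages (illustrative only).
toy_docs = ["ode strophe pindar", "verse rhythm line verse", "ode verse"]

num_docs, mean_words, vocab_size = dataset_stats(toy_docs)
print(f"Number of Documents: {num_docs}")
print(f"Mean Number of Words per Document: {mean_words:.1f}")
print(f"Vocabulary Size: {vocab_size}")
```

Running the same function on a sample of `training_data` is one way to see how much of the vocabulary comes from a given set of pages.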


In [171]:
num_topics = 100

output_directory_path = 'topic-model-output/ppa-100-topics_all'

In [172]:
# Create the output directory if it does not already exist
Path(output_directory_path).mkdir(parents=True, exist_ok=True)

# File locations inside the output directory; the topic-keys and topic-distributions
# paths are used further down to reload the model output.
path_to_training_data           = f"{output_directory_path}/training.txt"
path_to_formatted_training_data = f"{output_directory_path}/mallet.training"
path_to_model                   = f"{output_directory_path}/mallet.model.{num_topics}"
path_to_topic_keys              = f"{output_directory_path}/mallet.topic_keys.{num_topics}"
path_to_topic_distributions     = f"{output_directory_path}/mallet.topic_distributions.{num_topics}"

In [173]:
topic_keys, topic_distributions = lmw.quick_train_topic_model(path_to_mallet, 
                                                              output_directory_path, 
                                                              num_topics, 
                                                              training_data)

Importing data...
Complete
Training topic model...


Mallet LDA: 100 topics, 7 topic bits, 1111111 topic mask
Data loaded.
max tokens: 1204
total tokens: 34105779
<10> LL/token: -10.98392
<20> LL/token: -10.17862
<30> LL/token: -9.91782
<40> LL/token: -9.78678

0	0.05	verse prose one line rhythm may lines even english time metre would metrical blank form though much fact almost two 
1	0.05	song love john sir william poems ode death lady drayton page thomas sonnet book elegy robert contents ben drummond songs 
2	0.05	poems years published first year life work one time two written volume died poem born author wrote college london works 
3	0.05	chaucer french see tale poem gower lydgate mss tales balade english latin also love written canterbury king prologue ant printed 
4	0.05	one may ode would read odes page must though line many word without seems also however edition part first another 
5	0.05	one men death sea man like old war world shall yet may dead away would long life day last ever 
6	0.05	poem much poet would though great tasso s

<50> LL/token: -9.71039
<60> LL/token: -9.65981
<70> LL/token: -9.62471
<80> LL/token: -9.59857
<90> LL/token: -9.57905

0	0.05	verse line rhythm one prose lines may even time metrical form english would metre blank two prosody almost much fact 
1	0.05	song john sir william love thomas death ode poems lady drayton robert page george contents book drummond browne henry elegy 
2	0.05	years poems published year first life born died one time work college two written london wrote poet author poem volume 
3	0.05	chaucer see tale french lydgate gower poem english tales balade mss love iii also king may part poems prologue troilus 
4	0.05	ode line one read odes page may would strophe pindaric word original first note though another edition however part pindar 
5	0.05	men one war death sea man old battle like fight sword shall blood land yet life world upon long away 
6	0.05	poem epic tasso poet much great subject work though characters whole homer character part virgil incidents would reader s

<100> LL/token: -9.56167
<110> LL/token: -9.54843
<120> LL/token: -9.53727
<130> LL/token: -9.52773
<140> LL/token: -9.51939

0	0.05	verse rhythm line one prose lines may time even metrical form blank two would english metre almost prosody free much 
1	0.05	song john sir william love thomas browne death drayton poems ode george drummond robert jonson lady book henry ben page 
2	0.05	years year published poems life first born died time college one london poet work written wrote two death cambridge works 
3	0.05	chaucer tale lydgate see gower french english poem tales balade mss love iii may also king prologue poems part women 
4	0.05	ode odes line page read strophe one pindaric pindar may epode greek antistrophe original strophes part first word great irregular 
5	0.05	men war sea battle man one death fight old like sword blood shall land king long yet brave many last 
6	0.05	poem epic tasso poet much great subject homer virgil characters work whole though part character jerusalem arios

<150> LL/token: -9.5112
<160> LL/token: -9.50496
<170> LL/token: -9.49926
<180> LL/token: -9.49401
<190> LL/token: -9.49002

0	0.05	verse rhythm line lines one prose may time metrical form even blank would two metre english free use almost prosody 
1	0.05	song sir john william browne love thomas drayton jonson death drummond george ben ode wither poems lord book daniel henry 
2	0.05	years year life published poems first born died time college one london poet work written wrote death two cambridge became 
3	0.05	chaucer lydgate tale gower see english poem french balade tales love mss may iii also prologue sir ant women king 
4	0.05	ode odes strophe pindaric pindar page one line read epode greek may antistrophe strophes irregular cowley gray great regular part 
5	0.05	men war sea battle one man death fight old like sword shall blood king long brave land yet many great 
6	0.05	poem epic tasso great homer poet virgil much subject characters work whole jerusalem though book genius many inci

<200> LL/token: -9.48572
[beta: 0.00398] 
<210> LL/token: -9.45831
[beta: 0.00379] 
<220> LL/token: -9.40592
[beta: 0.0038] 
<230> LL/token: -9.37702
[beta: 0.00382] 
<240> LL/token: -9.35863

0	0.04286	verse rhythm line lines one prose may time metrical form blank two even metre english would free number end use 
1	0.02059	sir song john william browne thomas drayton love jonson drummond death george ben ode poems lord wither daniel henry shepherd 
2	0.0432	years life year published poems died born first time college one london poet written wrote death work cambridge two became 
3	0.01208	chaucer lydgate tale gower see english french poem balade love tales mss may iii also king prologue poems sir skeat 
4	0.01135	ode odes strophe pindar pindaric one epode greek antistrophe strophes irregular may cowley page gray great english regular chorus collins 
5	0.03348	men war sea one battle man death fight old like blood king sword shall long land brave great many upon 
6	0.01472	poem homer epi

[beta: 0.00384] 
<250> LL/token: -9.34638
[beta: 0.00385] 
<260> LL/token: -9.33811
[beta: 0.00386] 
<270> LL/token: -9.3309
[beta: 0.00387] 
<280> LL/token: -9.32595
[beta: 0.00388] 
<290> LL/token: -9.32124

0	0.03765	verse rhythm line lines prose one may metrical time form blank two even metre english would use syllables end free 
1	0.01775	sir song john william browne thomas drayton love jonson drummond ben george death lord ode wither daniel shepherd herbert donne 
2	0.03761	years life year published poems born died first time college one london poet written wrote cambridge work death two became 
3	0.01026	chaucer lydgate see tale gower english french balade poem love tales mss iii may also king prologue sir poems skeat 
4	0.00779	ode odes strophe pindar pindaric epode one greek antistrophe strophes may cowley irregular gray great regular chorus english collins lyric 
5	0.03009	war men battle sea one man death fight king blood like old sword shall long great land brave god many 
6

[beta: 0.00388] 
<300> LL/token: -9.3179
[beta: 0.00389] 
<310> LL/token: -9.31414
[beta: 0.00389] 
<320> LL/token: -9.3117
[beta: 0.0039] 
<330> LL/token: -9.30878
[beta: 0.00391] 
<340> LL/token: -9.30627

0	0.03617	verse rhythm line lines one prose may metrical time blank two form even metre syllables english would use end number 
1	0.0168	sir song john william browne drayton thomas jonson love drummond george death ben lord ode wither shepherd daniel donne herrick 
2	0.03602	years life year published poems died born first time college london one poet written cambridge death wrote work two became 
3	0.00977	chaucer lydgate tale see gower english french balade love may tales mss iii poem also prologue poems women king sir 
4	0.00692	ode odes strophe pindar pindaric epode one greek antistrophe strophes cowley irregular gray may chorus regular english great collins three 
5	0.02919	war men sea battle one man death fight king like old blood sword shall long great land brave many god 
6	

[beta: 0.00391] 
<350> LL/token: -9.30432
[beta: 0.00391] 
<360> LL/token: -9.30159
[beta: 0.00392] 
<370> LL/token: -9.29989
[beta: 0.00392] 
<380> LL/token: -9.29789
[beta: 0.00393] 
<390> LL/token: -9.29579

0	0.03513	verse rhythm line lines prose one may metrical blank time two form metre syllables even use end english pause would 
1	0.01601	sir song john william browne drayton thomas jonson love drummond george ben ode lord wither death daniel donne herrick henry 
2	0.03468	years life year published poems died born first college time london one poet cambridge wrote written death two became work 
3	0.00953	chaucer lydgate tale see gower english french balade poem tales love may mss iii king also prologue ant sir women 
4	0.00648	ode odes strophe pindar pindaric epode one greek antistrophe strophes irregular cowley gray may great english regular chorus lyric three 
5	0.02894	war men battle sea one man death fight king blood like sword old great shall long land brave god many 
6	0.00

[beta: 0.00393] 
<400> LL/token: -9.29386
[beta: 0.00394] 
<410> LL/token: -9.29216
[beta: 0.00394] 
<420> LL/token: -9.29065
[beta: 0.00394] 
<430> LL/token: -9.28884
[beta: 0.00395] 
<440> LL/token: -9.28749

0	0.03394	verse rhythm line lines prose one may metrical blank time two form syllables metre end words pause even english use 
1	0.01552	sir john song william browne drayton thomas jonson love drummond ben george lord wither donne ode daniel death herrick poems 
2	0.03366	years life year published poems born died first college time london one cambridge poet death written wrote work became two 
3	0.00925	chaucer lydgate tale see gower english french balade poem love tales may mss iii king poems also ant prologue women 
4	0.00618	ode odes strophe pindar pindaric one epode greek antistrophe strophes cowley irregular gray english may regular great chorus lyric three 
5	0.02864	war men battle sea one man death king fight like blood old great sword shall land long brave god hand 
6	0.

[beta: 0.00395] 
<450> LL/token: -9.28599
[beta: 0.00395] 
<460> LL/token: -9.28473
[beta: 0.00396] 
<470> LL/token: -9.28321
[beta: 0.00396] 
<480> LL/token: -9.28189
[beta: 0.00396] 
<490> LL/token: -9.28094

0	0.03331	verse rhythm line lines prose one may blank metrical two time form syllables metre words use pause english end even 
1	0.01548	sir john song william browne drayton thomas jonson love george drummond ben donne death wither lord ode daniel poems herrick 
2	0.03304	years life year published poems born died college first time london one poet cambridge death written wrote work became oxford 
3	0.0091	chaucer lydgate see tale gower english french balade poem may tales love mss iii poems king also prologue god ant 
4	0.00616	ode odes strophe pindar pindaric epode one greek antistrophe strophes irregular gray cowley english regular may lyric three great chorus 
5	0.02868	war men battle sea one man death fight king blood old like great shall sword land brave long god many 
6	0.

[beta: 0.00396] 
<500> LL/token: -9.27916
[beta: 0.00397] 
<510> LL/token: -9.27801
[beta: 0.00397] 
<520> LL/token: -9.27704
[beta: 0.00397] 
<530> LL/token: -9.27582
[beta: 0.00398] 
<540> LL/token: -9.27496

0	0.03229	verse rhythm line lines prose one may metrical blank two time form syllables metre words end use pause english number 
1	0.01501	sir john song william browne drayton thomas jonson love ben george drummond donne lord wither death daniel ode herrick book 
2	0.03255	years life year poems published born died college first time london one cambridge poet work wrote death written became oxford 
3	0.00887	chaucer lydgate see tale gower english french balade tales mss love may poem iii king poems sir ant prologue women 
4	0.00599	ode odes strophe pindar pindaric epode one greek antistrophe strophes irregular may cowley gray english regular great lyric chorus collins 
5	0.0284	war men battle sea one man fight king death like old blood sword great shall god land long brave many 


[beta: 0.00398] 
<550> LL/token: -9.2734
[beta: 0.00399] 
<560> LL/token: -9.27205
[beta: 0.00399] 
<570> LL/token: -9.27049
[beta: 0.00399] 
<580> LL/token: -9.26966
[beta: 0.00399] 
<590> LL/token: -9.26876

0	0.03138	verse rhythm line lines prose one blank may metrical two time form syllables metre words pause use end english music 
1	0.01491	sir john song william browne drayton thomas jonson love ben george drummond donne wither death lord daniel ode poems herrick 
2	0.03171	years life year published poems born died first college time london one poet cambridge death work wrote written became oxford 
3	0.00868	chaucer lydgate tale gower see english french balade love tales may poem mss iii prologue king ant god also women 
4	0.00589	ode odes strophe pindar pindaric epode greek one antistrophe strophes irregular cowley gray may great english lyric regular chorus three 
5	0.02843	war men battle sea one man death old fight king like blood sword great land shall many brave god long 
6	0

[beta: 0.00399] 
<600> LL/token: -9.26815
[beta: 0.004] 
<610> LL/token: -9.26724
[beta: 0.004] 
<620> LL/token: -9.26611
[beta: 0.004] 
<630> LL/token: -9.26518
[beta: 0.004] 
<640> LL/token: -9.26456

0	0.03096	verse rhythm line lines prose one blank may metrical two time syllables form metre words pause use english end music 
1	0.01494	sir john song william browne drayton thomas jonson love george ben drummond donne lord poems wither daniel ode death herrick 
2	0.03125	years life year published born poems died college first time london one cambridge death poet wrote written became work father 
3	0.00848	chaucer see lydgate tale gower english french balade love tales may poem mss iii king poems god sir women skeat 
4	0.00565	ode odes pindar strophe pindaric epode one greek antistrophe strophes irregular cowley gray english may great regular lyric chorus three 
5	0.02828	war men battle sea one death king man fight like old blood sword great shall land god brave long many 
6	0.00712	ho

[beta: 0.004] 
<650> LL/token: -9.26358
[beta: 0.00401] 
<660> LL/token: -9.26272
[beta: 0.00401] 
<670> LL/token: -9.26175
[beta: 0.00401] 
<680> LL/token: -9.26129
[beta: 0.00401] 
<690> LL/token: -9.26008

0	0.0302	verse rhythm line lines prose one blank metrical may two time syllables form words metre pause use english music end 
1	0.01476	sir john song william browne thomas drayton jonson love ben george donne lord drummond wither poems death ode daniel herrick 
2	0.0309	years life year published died born poems college first london time one cambridge poet death written work became wrote oxford 
3	0.00834	chaucer tale see lydgate gower french english balade love may tales mss poem iii also god poems ant king prologue 
4	0.00564	ode odes pindar strophe pindaric epode greek one antistrophe strophes irregular gray cowley english may great regular lyric chorus three 
5	0.02807	war men battle sea one death king man like fight old blood sword great shall land god brave long many 
6	0.00

[beta: 0.00401] 
<700> LL/token: -9.25953
[beta: 0.00402] 
<710> LL/token: -9.25855
[beta: 0.00402] 
<720> LL/token: -9.25795
[beta: 0.00402] 
<730> LL/token: -9.2574
[beta: 0.00402] 
<740> LL/token: -9.25611

0	0.02976	verse rhythm lines line prose one blank may metrical two time syllables form words metre pause use english music end 
1	0.01483	sir john song william browne thomas drayton jonson george love ben donne drummond lord wither herrick daniel death ode poems 
2	0.03038	years life year published born poems died college first time london one cambridge poet death became work written two father 
3	0.00832	chaucer tale lydgate see gower english french balade love may poem tales mss iii king god also ant women thou 
4	0.00561	ode odes strophe pindar pindaric epode greek one antistrophe strophes irregular cowley gray english great may regular lyric chorus poet 
5	0.02852	war men battle sea one death man king fight old like blood sword shall great land god brave long hand 
6	0.00679	

[beta: 0.00402] 
<750> LL/token: -9.25512
[beta: 0.00402] 
<760> LL/token: -9.25466
[beta: 0.00403] 
<770> LL/token: -9.25373
[beta: 0.00403] 
<780> LL/token: -9.25355
[beta: 0.00403] 
<790> LL/token: -9.25321

0	0.02913	verse rhythm lines line prose one blank metrical may two time syllables form metre words pause music use english end 
1	0.0146	sir john song william browne thomas drayton jonson love george ben donne drummond lord herrick wither ode poems daniel death 
2	0.03014	years life year published born poems died college time first london one cambridge death poet wrote written work father became 
3	0.00826	chaucer lydgate see tale gower english french balade love may mss poem tales iii also god poems king ant women 
4	0.00559	ode odes pindar strophe pindaric epode one greek antistrophe strophes irregular cowley gray may english great chorus lyric regular three 
5	0.02848	war men battle sea one death king man fight like old great blood sword shall land long god brave many 
6	0.00

[beta: 0.00403] 
<800> LL/token: -9.25261
[beta: 0.00403] 
<810> LL/token: -9.25181
[beta: 0.00403] 
<820> LL/token: -9.25128
[beta: 0.00403] 
<830> LL/token: -9.25064
[beta: 0.00403] 
<840> LL/token: -9.24977

0	0.02908	verse rhythm lines line prose one blank metrical may time two syllables words metre form music pause english use end 
1	0.01481	sir john william song browne thomas drayton jonson love ben george donne lord drummond herrick death wither daniel poems ode 
2	0.02958	years life year published poems born died college time first london one cambridge death poet father written became work wrote 
3	0.00817	chaucer tale see lydgate gower french english balade may love mss tales poem iii also god poems king skeat thou 
4	0.00549	ode odes strophe pindar pindaric epode one greek antistrophe strophes irregular cowley gray may english regular great chorus lyric collins 
5	0.02853	war men battle sea one king death like man fight blood old sword great shall land god many brave long 
6	

[beta: 0.00404] 
<850> LL/token: -9.2495
[beta: 0.00403] 
<860> LL/token: -9.24872
[beta: 0.00404] 
<870> LL/token: -9.24839
[beta: 0.00404] 
<880> LL/token: -9.24788
[beta: 0.00404] 
<890> LL/token: -9.24777

0	0.02836	verse rhythm lines line prose one blank metrical may two time syllables music form words metre pause use end english 
1	0.01466	sir john william song browne thomas drayton jonson love george ben donne lord drummond herrick wither ode poems death daniel 
2	0.02946	years life year published poems born died college first time london one cambridge work death poet father wrote written became 
3	0.00809	chaucer tale lydgate see gower french balade english love may tales mss poem iii poems king god ant also thou 
4	0.00523	ode odes strophe pindar pindaric epode one greek antistrophe strophes cowley irregular gray english may great regular lyric chorus collins 
5	0.02859	war men battle sea one king death man like fight blood old great sword shall land many brave long came 
6	0.

[beta: 0.00404] 
<900> LL/token: -9.24697
[beta: 0.00404] 
<910> LL/token: -9.24627
[beta: 0.00405] 
<920> LL/token: -9.24592
[beta: 0.00405] 
<930> LL/token: -9.24514
[beta: 0.00405] 
<940> LL/token: -9.24444

0	0.028	verse rhythm lines line prose blank one metrical may two syllables time metre music form words pause use end english 
1	0.01465	sir john william song browne thomas drayton jonson donne george love ben lord wither herrick drummond iii daniel death note 
2	0.02898	years life year published poems died born college time london first one cambridge poet death work father became written oxford 
3	0.00801	chaucer tale lydgate see gower balade french english love may mss tales poem iii god also king thou ant women 
4	0.00534	ode odes strophe pindar pindaric epode greek one antistrophe cowley strophes irregular gray may english great chorus regular lyric collins 
5	0.02817	war men battle sea king one death fight man like old blood shall sword great land brave long many came 
6	0.0

[beta: 0.00405] 
<950> LL/token: -9.24417
[beta: 0.00405] 
<960> LL/token: -9.24362
[beta: 0.00405] 
<970> LL/token: -9.24342
[beta: 0.00405] 
<980> LL/token: -9.24276
[beta: 0.00405] 
<990> LL/token: -9.24269

0	0.02758	verse rhythm line lines prose blank one metrical may syllables time two music form metre words pause end english use 
1	0.01444	sir john william song browne thomas drayton jonson george donne love ben lord iii death wither note herrick drummond poems 
2	0.02865	years life year poems published born college died time first london one cambridge poet work death father became wrote oxford 
3	0.00801	chaucer tale lydgate see gower balade french may english love mss tales poem iii god thou poems king also ant 
4	0.00523	ode odes strophe pindar pindaric epode one greek antistrophe strophes irregular cowley gray may great english regular chorus lyric collins 
5	0.02805	war men battle sea one king like man death fight blood old shall sword great land brave came long god 
6	0.006

[beta: 0.00405] 
<1000> LL/token: -9.24222

Total time: 1 hours 21 minutes 35 seconds


Complete
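In the log above, LL/token rises from -10.98392 at iteration 10 to -9.24222 at iteration 1000, and the change between the last two reported values is under 0.001. A minimal sketch of extracting these values for inspection follows. It assumes the training output has been captured as a string (how it is captured is left to the reader); `parse_ll_per_token` and `sample_log` are hypothetical names.

```python
import re

# Matches lines such as "<10> LL/token: -10.98392"
LL_PATTERN = re.compile(r"^<(\d+)> LL/token: (-?\d+(?:\.\d+)?)$")


def parse_ll_per_token(log_text):
    """Extract (iteration, log-likelihood per token) pairs from MALLET training output."""
    pairs = []
    for line in log_text.splitlines():
        match = LL_PATTERN.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


# A few lines copied from the training log above.
sample_log = """\
<10> LL/token: -10.98392
<20> LL/token: -10.17862
[beta: 0.00398]
<990> LL/token: -9.24269
<1000> LL/token: -9.24222
"""

pairs = parse_ll_per_token(sample_log)
first_iter, first_ll = pairs[0]
last_iter, last_ll = pairs[-1]
print(f"LL/token change from iteration {first_iter} to {last_iter}: {last_ll - first_ll:.5f}")
# Change over the final reported step; a small value suggests the sampler has largely settled.
print(f"Final step change: {pairs[-1][1] - pairs[-2][1]:.5f}")
```

Lines that are not LL/token reports, such as the `[beta: ...]` updates and the topic summaries, are skipped.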


In [174]:
topics = lmw.load_topic_keys(path_to_topic_keys)

for topic_number, topic in enumerate(topics):
    print(f"✨Topic {topic_number}✨\n\n{topic}\n")

✨Topic 0✨

['verse', 'rhythm', 'line', 'lines', 'prose', 'blank', 'one', 'metrical', 'may', 'syllables', 'time', 'two', 'music', 'words', 'form', 'metre', 'pause', 'english', 'end', 'use']

✨Topic 1✨

['sir', 'john', 'william', 'song', 'browne', 'thomas', 'drayton', 'jonson', 'george', 'donne', 'love', 'ben', 'lord', 'iii', 'death', 'wither', 'herrick', 'poems', 'note', 'drummond']

✨Topic 2✨

['years', 'life', 'year', 'published', 'born', 'poems', 'died', 'college', 'first', 'london', 'time', 'one', 'cambridge', 'death', 'poet', 'became', 'work', 'father', 'oxford', 'two']

✨Topic 3✨

['chaucer', 'tale', 'lydgate', 'see', 'gower', 'french', 'balade', 'english', 'love', 'may', 'mss', 'tales', 'poem', 'iii', 'also', 'thou', 'god', 'king', 'poems', 'ant']

✨Topic 4✨

['ode', 'odes', 'pindar', 'strophe', 'pindaric', 'epode', 'one', 'greek', 'antistrophe', 'strophes', 'cowley', 'gray', 'irregular', 'may', 'great', 'english', 'regular', 'lyric', 'chorus', 'collins']

✨Topic 5✨

['war', 'men

In [175]:
# Load the per-document topic probabilities (one list of probabilities per page)
topic_distributions = lmw.load_topic_distributions(path_to_topic_distributions)

In [176]:
topic_distributions[10]

[0.0003564083456805333,
 0.0001867086067485727,
 0.00036731916258307337,
 0.0001035011353150639,
 6.761546954466625e-05,
 0.09064930575428476,
 8.89199688531044e-05,
 0.0001665042066589847,
 0.00012051533870638359,
 0.0004950672075597135,
 0.00027627775051505913,
 0.00028876672208476284,
 8.742123918552898e-05,
 0.00017040488303839482,
 0.00022505760207135842,
 0.0001478953430020974,
 9.743803293655116e-05,
 0.0007417543999461098,
 0.00022779345093189315,
 0.0001112510328866425,
 0.18070461990780376,
 0.0006279714752032712,
 0.00022030138795386213,
 8.087252501439713e-05,
 5.3852315415866276e-05,
 0.00048540936945251346,
 0.00019872801517790295,
 3.7165820424536204e-05,
 0.00019988449074613774,
 7.80389619538245e-05,
 0.0007263031482673412,
 5.5258035289012826e-05,
 0.0007463426480007402,
 0.00015801984618400122,
 0.0008002955937840362,
 3.821848990728524e-05,
 3.224002568511934e-05,
 7.099062949124995e-05,
 0.00012321629255073276,
 0.00012205195451989202,
 4.9972668144654216e-05,
 0.0
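
The raw list above is hard to scan by eye. Below is a minimal sketch of ranking one page's topics by probability. The `top_topics` helper and the toy `example_distribution` are hypothetical; the toy list stands in for one row of `topic_distributions`.

```python
def top_topics(distribution, k=3):
    """Return the k (topic_number, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(distribution), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy distribution for illustration only (not taken from the model)
example_distribution = [0.05, 0.6, 0.05, 0.3]

for topic_number, probability in top_topics(example_distribution, k=2):
    print(f"Topic {topic_number}: {round(probability, 3)}")
```

With the notebook's data, the equivalent call would be `top_topics(topic_distributions[10])`.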

In [177]:
page_id_to_check = "mdp.39015030930088.00000348"

# Find this page's position in page_ids (assumed to match the order of topic_distributions)
page_id_number = page_ids.index(page_id_to_check)

print(f"Topic Distributions for {page_ids[page_id_number]}\n")
# Print every topic's keywords alongside its probability for this page
for topic_number, (topic, topic_distribution) in enumerate(zip(topics, topic_distributions[page_id_number])):
    print(f"✨Topic {topic_number} {topic[:20]} ✨\nProbability: {round(topic_distribution, 3)}\n")


Topic Distributions for mdp.39015030930088.00000348

✨Topic 0 ['verse', 'rhythm', 'line', 'lines', 'prose', 'blank', 'one', 'metrical', 'may', 'syllables', 'time', 'two', 'music', 'words', 'form', 'metre', 'pause', 'english', 'end', 'use'] ✨
Probability: 0.0

✨Topic 1 ['sir', 'john', 'william', 'song', 'browne', 'thomas', 'drayton', 'jonson', 'george', 'donne', 'love', 'ben', 'lord', 'iii', 'death', 'wither', 'herrick', 'poems', 'note', 'drummond'] ✨
Probability: 0.0

✨Topic 2 ['years', 'life', 'year', 'published', 'born', 'poems', 'died', 'college', 'first', 'london', 'time', 'one', 'cambridge', 'death', 'poet', 'became', 'work', 'father', 'oxford', 'two'] ✨
Probability: 0.0

✨Topic 3 ['chaucer', 'tale', 'lydgate', 'see', 'gower', 'french', 'balade', 'english', 'love', 'may', 'mss', 'tales', 'poem', 'iii', 'also', 'thou', 'god', 'king', 'poems', 'ant'] ✨
Probability: 0.232

✨Topic 4 ['ode', 'odes', 'pindar', 'strophe', 'pindaric', 'epode', 'one', 'greek', 'antistrophe', 'strophes', 'c

In [178]:
target_labels = random.sample(page_ids, 10)
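
`random.sample` draws a different set of pages on every run. If a repeatable sample is wanted, a seeded generator is one option. Below is a minimal sketch; the seed value and the stand-in `page_ids` list are arbitrary assumptions made for illustration.

```python
import random

# Stand-in page IDs for illustration; the notebook's own `page_ids` list would be used instead
page_ids = [f"page_{i}" for i in range(100)]

rng = random.Random(42)  # arbitrary seed, chosen only for this example
target_labels = rng.sample(page_ids, 10)

# Re-creating the generator with the same seed reproduces the same sample
assert target_labels == random.Random(42).sample(page_ids, 10)
```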

In [179]:
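# Map each training document (the processed page text) to its page ID and original text.
# Note: because the keys are texts, pages with identical processed text share a single entry.
training_data_page_ids = dict(zip(training_data, page_ids))
training_data_original_text = dict(zip(training_data, original_texts))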

In [180]:
def display_top_titles_per_topic(topic_number=10, number_of_documents=5):
    """Print a topic's keywords, then the page IDs of the documents where it is most probable."""

    print(f"✨Topic {topic_number}✨\n\n{topics[topic_number]}\n")

    # get_top_docs yields (probability, document) pairs for the given topic
    for probability, document in lmw.get_top_docs(training_data, topic_distributions, topic_number, n=number_of_documents):
        print(round(probability, 4), training_data_page_ids[document] + "\n")

In [128]:
display_top_titles_per_topic(topic_number=10, number_of_documents=5)

✨Topic 10✨

['ballad', 'old', 'ancient', 'see', 'king', 'two', 'printed', 'one', 'time', 'copy', 'many', 'may', 'vol', 'great', 'sir', 'ballads', 'following', 'english', 'song', 'would']

0.9981 hvd.32044089057772.00000117

0.9981 hvd.32044089057764.00000111

0.9969 mdp.39015028765124.00000102

0.9962 hvd.32044089057772.00000117

0.9959 uc1.b3310854.00000312
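
Note that `hvd.32044089057772.00000117` is listed twice above with different probabilities. One possible cause, not confirmed here, is that `training_data_page_ids` is keyed on page text, so two pages with identical processed text resolve to the same page ID. Below is a minimal sketch of an index-based lookup that avoids this, assuming `page_ids` and `topic_distributions` are parallel lists. The `top_page_ids_for_topic` helper and the toy data are hypothetical.

```python
def top_page_ids_for_topic(page_ids, topic_distributions, topic_number, n=5):
    """Return the n (probability, page_id) pairs with the highest weight for one topic."""
    scored = [
        (distribution[topic_number], page_id)
        for page_id, distribution in zip(page_ids, topic_distributions)
    ]
    # Sort on probability only, so ties keep their original document order
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]

# Toy data for illustration: each page keeps its own ID regardless of its text
toy_page_ids = ["a", "b", "c"]
toy_distributions = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]

for probability, page_id in top_page_ids_for_topic(toy_page_ids, toy_distributions, topic_number=1, n=2):
    print(round(probability, 4), page_id)
```

Because the pairing is by position and not by text, duplicate texts cannot overwrite one another.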

