# Corpus: Chadwyck-Healey poetry collections

## Loading corpus from source

In [1]:
import os
import sys
sys.path.append('../')
from generative_formalism import *

# Check that the Chadwyck-Healey corpus and metadata paths are set and exist
print(f"""{"✓" if PATH_CHADWYCK_HEALEY_TXT and os.path.exists(PATH_CHADWYCK_HEALEY_TXT) else "X"} Chadwyck-Healey corpus path: {PATH_CHADWYCK_HEALEY_TXT}""")
print(f"""{"✓" if PATH_CHADWYCK_HEALEY_METADATA and os.path.exists(PATH_CHADWYCK_HEALEY_METADATA) else "X"} Chadwyck-Healey metadata path: {PATH_CHADWYCK_HEALEY_METADATA}""")

# Check whether the download URLs are configured
print(f"""{"✓" if URL_CHADWYCK_HEALEY_METADATA else "X"} Metadata file URL set in environment (.env or shell)""")
print(f"""{"✓" if URL_CHADWYCK_HEALEY_TXT else "X"} Corpus text file URL set in environment (.env or shell)""")

✓ Chadwyck-Healey corpus path: /Users/rj416/github/generative-formalism/data/chadwyck_poetry/txt
✓ Chadwyck-Healey metadata path: /Users/rj416/github/generative-formalism/data/chadwyck_poetry/metadata.csv
✓ Metadata file URL set in environment (.env or shell)
✓ Corpus text file URL set in environment (.env or shell)


In [2]:
printm(f'### Loading corpus metadata')
df_meta = get_chadwyck_corpus_metadata(
    fields=CHADWYCK_CORPUS_FIELDS,
    period_by=50,
    download_if_necessary=True,
    overwrite=False,
    min_num_lines=10,
    max_num_lines=100,
    min_author_dob=1600,
    max_author_dob=2000,
)

describe_corpus(df_meta)

### Loading corpus metadata

#### Getting Chadwyck-Healey corpus metadata

* Loading metadata from /Users/rj416/github/generative-formalism/data/chadwyck_poetry/metadata.csv
* Loaded 336180 rows of metadata
* Filtering: 259,310 rows after author birth year >= 1600
* Filtering: 259,310 rows after author birth year <= 2000
* Filtering: 225,986 rows after number of lines >= 10
* Filtering: 204,514 rows after number of lines <= 100
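
The filter log above can be approximated with a few pandas comparisons. This is a minimal sketch, not the implementation of `get_chadwyck_corpus_metadata`: the helper name `filter_metadata` and the toy rows are hypothetical, and only the column names `author_dob` and `num_lines` and the threshold values come from the notebook.

```python
import pandas as pd


def filter_metadata(df, min_author_dob=1600, max_author_dob=2000,
                    min_num_lines=10, max_num_lines=100):
    """Hypothetical helper: apply the four filters in the order logged above."""
    steps = [
        (f"author birth year >= {min_author_dob}",
         lambda d: d["author_dob"] >= min_author_dob),
        (f"author birth year <= {max_author_dob}",
         lambda d: d["author_dob"] <= max_author_dob),
        (f"number of lines >= {min_num_lines}",
         lambda d: d["num_lines"] >= min_num_lines),
        (f"number of lines <= {max_num_lines}",
         lambda d: d["num_lines"] <= max_num_lines),
    ]
    for label, keep in steps:
        # Rows with a missing value fail the comparison and are dropped.
        df = df[keep(df)]
        print(f"* Filtering: {len(df):,} rows after {label}")
    return df


# Toy data, not taken from the corpus.
toy = pd.DataFrame({
    "author_dob": [1550.0, 1784.0, None, 1885.0, 1960.0],
    "num_lines": [14, 14, 20, 5, 250],
})
filtered = filter_metadata(toy)
```

In this sketch a missing birth year is removed by the first comparison; whether the real function treats missing values the same way is an assumption.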


----

#### Subcorpus breakdown

subcorpus
English Poetry              127738
American Poetry              62116
Modern Poetry                 6478
African-American Poetry       5063
The Faber Poetry Library      3119
Name: count, dtype: int64



----

#### Historical period breakdown (from metadata)

period_meta
1900-1999 Twentieth-Century                    60865
1835-1869 Mid Nineteenth-Century               37829
1870-1899 Later Nineteenth-Century             29684
1800-1834 Early Nineteenth-Century             22029
                                               20723
1700-1749 Early Eighteenth-Century             10693
1750-1799 Later Eighteenth-Century              9740
1603-1660 Jacobean and Caroline                 5701
1660-1700 Restoration                           4808
1550-1900 Miscellanies and Collections          1600
1500-1700 Emblems, Epigrams, Formal Satires      461
1500-1580 Tudor                                  149
1880-1901 Late Victorian                         134
1860-1880 Mid-Victorian                           78
1837-1860 Early Victorian                         12
1500-1700 Songbooks                                6
1901-1914 Edwardian Period                         2
Name: count, dtype: int64



----

#### Historical period breakdown (from author birth year)

period
1800-1850    64434
1900-1950    39044
1850-1900    32164
1750-1800    30135
1700-1750    14511
1600-1650     9076
1650-1700     7741
1950-2000     7409
Name: count, dtype: int64
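
The `period` labels appear to be 50-year bins of `author_dob`, matching `period_by=50` in the call above. The sketch below shows one way such a label could be derived. The helper `period_label` is hypothetical, and the floor-binning rule is inferred from sample rows in this notebook (for example 1784 falls in `1750-1800` and 1950 in `1950-2000`), not taken from the library's source.

```python
def period_label(author_dob, period_by=50):
    """Hypothetical helper: map a birth year to a 'start-end' period label."""
    # Floor the year to the nearest lower multiple of period_by.
    start = (int(author_dob) // period_by) * period_by
    return f"{start}-{start + period_by}"


# Birth years taken from rows shown in this notebook.
labels = {dob: period_label(dob) for dob in (1731.0, 1784.0, 1885.0, 1950.0)}
print(labels)
```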



----

#### Historical period + subcorpus breakdown

                                    count
period    subcorpus                      
1600-1650 American Poetry             361
          English Poetry             8715
1650-1700 American Poetry              74
          English Poetry             7667
1700-1750 African-American Poetry       3
          American Poetry             340
          English Poetry            14166
          The Faber Poetry Library      2
1750-1800 African-American Poetry     284
          American Poetry            4935
          English Poetry            24914
          The Faber Poetry Library      2
1800-1850 African-American Poetry     542
          American Poetry           20934
          English Poetry            42956
          Modern Poetry                 1
          The Faber Poetry Library      1
1850-1900 African-American Poetry    1883
          American Poetry           10171
          English Poetry            18438
          Modern Poetry               809
          The Faber Poetry Library

----

#### Author birth year distribution

author_dob
1600 ------- [ 1793   | 1830 |   1892 ] -------- 1974



----

#### Number of lines in poems

num_lines
10 ------- [ 16   | 24 |   40 ] -------- 100
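
The one-line summaries above look like five-number summaries: minimum, first quartile, median, third quartile, maximum. That reading is an assumption, as is the quantile method; `describe_corpus` may compute its values differently. Under that assumption, a minimal sketch of how such a line could be produced (the helper names `five_number_summary` and `render_summary` are hypothetical):

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


def render_summary(values):
    """Format the summary in the same shape as the output above."""
    lo, q1, q2, q3, hi = five_number_summary(values)
    return f"{lo:g} ------- [ {q1:g}   | {q2:g} |   {q3:g} ] -------- {hi:g}"


# Toy values chosen so the quartiles fall exactly on data points.
summary_line = render_summary([100, 10, 40, 16, 24])
print(summary_line)
```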



----

#### Annotated rhyme distribution

rhyme
y      142007
        58062
n        4302
y n       139
Y           4
Name: count, dtype: int64



----

#### Metadata

id_hash,id,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period
552316,english/bartonbe/Z200274127,1800-1834 Early Nineteenth-Century,English Poetry,"Barton, Bernard, 1784-1849",1784.0,SONNET. II. [The night seems darkest ere the ...,1814,14,The Reliquary (1836),&indent;Rises with light and gladness on its w...,y,Sonnet,1750-1800
889984,c20-american/am20129/Z300227191,1900-1999 Twentieth-Century,American Poetry,"Pound, Ezra, 1885-1972",1885.0,LAMENT OF THE FRONTIER GUARD,1915,24,,"By the North Gate, the wind blows full of sand,",,,1850-1900
729937,c20-english/ep20128/Z200582770,1900-1999 Twentieth-Century,English Poetry,"Rodker, John, 1894-",1894.0,Married,1924,27,,This roof tree holds us,,,1850-1900
100137,english-ed2/ep2527/Z300669174,,English Poetry,"Barton, Emily M., 1817-1909",1817.0,Reply to the Question: “What is the Wealth of...,1847,31,Straws on the Stream: by E. M. B. (1910),"Tin, copper, iron; silver, gems untold?",y,,1800-1850
922011,english-ed2/miscell3/Z200441103,1550-1900 Miscellanies and Collections,English Poetry,"De Vere, Aubrey, 1814-1902",1814.0,CXXII EVENING MELODY,1844,25,,O that the pines which crown yon steep,y,Lyric,1800-1850
...,...,...,...,...,...,...,...,...,...,...,...,...,...
814011,english/merival1/Z300428527,1800-1834 Early Nineteenth-Century,English Poetry,"Merivale, John Herman, 1779-1844",1779.0,FROM CHATTERTON'S “ÆLLA.”,1809,48,Poems original and translated (1844),&indent;&indent;The meads are sprinkled with a...,y,,1750-1800
845120,english/cowperwi/Z300323182,1750-1799 Later Eighteenth-Century,English Poetry,"Cowper, William, 1731-1800",1731.0,SELF&hyphen;LOVE AND TRUTH INCOMPATIBLE.,1761,32,The Works (1835–1837): TRANSLATIONS FROM THE F...,That fill'd my soul with fear and shame;,y,Lyric,1700-1750
890213,english/colersam/Z300317557,1800-1834 Early Nineteenth-Century,English Poetry,"Coleridge, Samuel Taylor, 1772-1834",1772.0,34 EPITAPH ON A BAD MAN,1802,12,The Complete Poetical Works (1912),&indent;This sad brief tale is all that Truth ...,y,Epitaph,1750-1800
345093,american/am0232/Z200151866,1835-1869 Mid Nineteenth-Century,American Poetry,"Osgood, Frances Sargent Locke, 1811-1850",1811.0,LITTLE MAY. SUGGESTED BY A CONVERSATION WITH ...,1841,28,"[Poems, in] The memento (1849)",&indent;Till you are ready too!,y,,1800-1850


In [3]:
printm(f'### Loading corpus text files')
df_corpus = get_chadwyck_corpus()
df_corpus

### Loading corpus text files

##### Loading Chadwyck-Healey corpus (metadata + txt)

* Loading corpus metadata from memory
* Loading 204514 texts


  : 100%|██████████| 204514/204514 [00:07<00:00, 28469.70it/s]


id_hash,id,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt
552316,english/bartonbe/Z200274127,1800-1834 Early Nineteenth-Century,English Poetry,"Barton, Bernard, 1784-1849",1784.0,SONNET. II. [The night seems darkest ere the ...,1814,14,The Reliquary (1836),&indent;Rises with light and gladness on its w...,y,Sonnet,1750-1800,The night seems darkest ere the dawn of day\n ...
889984,c20-american/am20129/Z300227191,1900-1999 Twentieth-Century,American Poetry,"Pound, Ezra, 1885-1972",1885.0,LAMENT OF THE FRONTIER GUARD,1915,24,,"By the North Gate, the wind blows full of sand,",,,1850-1900,"By the North Gate, the wind blows full of sand..."
729937,c20-english/ep20128/Z200582770,1900-1999 Twentieth-Century,English Poetry,"Rodker, John, 1894-",1894.0,Married,1924,27,,This roof tree holds us,,,1850-1900,This roof tree holds us\nwith trembling darkne...
100137,english-ed2/ep2527/Z300669174,,English Poetry,"Barton, Emily M., 1817-1909",1817.0,Reply to the Question: “What is the Wealth of...,1847,31,Straws on the Stream: by E. M. B. (1910),"Tin, copper, iron; silver, gems untold?",y,,1800-1850,"Australia's Wealth? Has she not mines of gold,..."
922011,english-ed2/miscell3/Z200441103,1550-1900 Miscellanies and Collections,English Poetry,"De Vere, Aubrey, 1814-1902",1814.0,CXXII EVENING MELODY,1844,25,,O that the pines which crown yon steep,y,Lyric,1800-1850,O that the pines which crown yond steep\n T...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
814011,english/merival1/Z300428527,1800-1834 Early Nineteenth-Century,English Poetry,"Merivale, John Herman, 1779-1844",1779.0,FROM CHATTERTON'S “ÆLLA.”,1809,48,Poems original and translated (1844),&indent;&indent;The meads are sprinkled with a...,y,,1750-1800,"The budding floweret blushes at the light,\n ..."
845120,english/cowperwi/Z300323182,1750-1799 Later Eighteenth-Century,English Poetry,"Cowper, William, 1731-1800",1731.0,SELF&hyphen;LOVE AND TRUTH INCOMPATIBLE.,1761,32,The Works (1835–1837): TRANSLATIONS FROM THE F...,That fill'd my soul with fear and shame;,y,Lyric,1700-1750,"From thorny wilds a monster came,\nThat filled..."
890213,english/colersam/Z300317557,1800-1834 Early Nineteenth-Century,English Poetry,"Coleridge, Samuel Taylor, 1772-1834",1772.0,34 EPITAPH ON A BAD MAN,1802,12,The Complete Poetical Works (1912),&indent;This sad brief tale is all that Truth ...,y,Epitaph,1750-1800,"Of him that in this gorgeous tomb does lie,\n ..."
345093,american/am0232/Z200151866,1835-1869 Mid Nineteenth-Century,American Poetry,"Osgood, Frances Sargent Locke, 1811-1850",1811.0,LITTLE MAY. SUGGESTED BY A CONVERSATION WITH ...,1841,28,"[Poems, in] The memento (1849)",&indent;Till you are ready too!,y,,1800-1850,"Mamma, you must not let me die\n Till you a..."


## Sampling corpus

In [4]:
printm(f'### Loading period sample in paper')
df_smpl_by_period_in_paper = get_chadwyck_corpus_sampled_by_period_as_in_paper()
describe_corpus(df_smpl_by_period_in_paper)

### Loading period sample in paper

#### Getting sampled corpus by period

* Loading data as in paper: /Users/rj416/github/generative-formalism/data/corpus_sample_by_period.data_as_in_paper.csv.gz


----

#### Subcorpus breakdown

subcorpus
English Poetry              5573
American Poetry             1731
Modern Poetry                286
African-American Poetry      250
The Faber Poetry Library     160
Name: count, dtype: int64



----

#### Historical period breakdown (from metadata)

period_meta
1900-1999 Twentieth-Century                    2437
1700-1749 Early Eighteenth-Century             1023
1800-1834 Early Nineteenth-Century              721
1835-1869 Mid Nineteenth-Century                644
1603-1660 Jacobean and Caroline                 617
1870-1899 Later Nineteenth-Century              603
1750-1799 Later Eighteenth-Century              600
1660-1700 Restoration                           495
1550-1900 Miscellanies and Collections           79
1500-1700 Emblems, Epigrams, Formal Satires      71
1500-1580 Tudor                                  16
1880-1901 Late Victorian                          3
1860-1880 Mid-Victorian                           1
Name: count, dtype: int64



----

#### Historical period breakdown (from author birth year)

period
1850-1900    1000
1650-1700    1000
1800-1850    1000
1900-1950    1000
1700-1750    1000
1600-1650    1000
1750-1800    1000
1950-2000    1000
Name: count, dtype: int64
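
The sample holds exactly 1000 poems for each 50-year period, which is what a per-period (stratified) draw would give. A minimal sketch of such a draw follows. It is not the code behind `get_chadwyck_corpus_sampled_by_period_as_in_paper`: the helper name `sample_by_period`, the seed, and the shuffle-then-head approach are all assumptions made for illustration.

```python
import pandas as pd


def sample_by_period(df, n_per_period=1000, seed=0):
    """Hypothetical helper: draw up to n_per_period random rows per period."""
    # Shuffle all rows once, then keep the first n rows of each period.
    shuffled = df.sample(frac=1, random_state=seed)
    return shuffled.groupby("period").head(n_per_period)


# Toy data, not taken from the corpus.
toy = pd.DataFrame({
    "period": ["1600-1650"] * 5 + ["1650-1700"] * 3 + ["1700-1750"] * 1,
    "id": range(9),
})
sample = sample_by_period(toy, n_per_period=2)
counts = sample["period"].value_counts().to_dict()
print(counts)
```

Periods with fewer rows than the target are kept whole in this sketch rather than raising an error.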



----

#### Historical period + subcorpus breakdown

                                    count
period    subcorpus                      
1600-1650 American Poetry              43
          English Poetry              957
1650-1700 American Poetry              11
          English Poetry              989
1700-1750 American Poetry              15
          English Poetry              985
1750-1800 African-American Poetry      10
          American Poetry             161
          English Poetry              829
1800-1850 African-American Poetry       5
          American Poetry             304
          English Poetry              691
1850-1900 African-American Poetry      71
          American Poetry             296
          English Poetry              583
          Modern Poetry                23
          The Faber Poetry Library     27
1900-1950 African-American Poetry      36
          American Poetry             599
          English Poetry              200
          Modern Poetry               116
          The Faber Poetry Library

----

#### Author birth year distribution

author_dob
1600 ------- [ 1699   | 1799 |   1899 ] -------- 1974



----

#### Number of lines in poems

num_lines
10 ------- [ 16   | 26 |   42 ] -------- 100



----

#### Annotated rhyme distribution

rhyme
y      5517
n       124
y n       3
Name: count, dtype: int64



----

#### Metadata

id_hash,id,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt,data_origin
1,english-ed2/ep2438/Z300661875,,English Poetry,"Price, Herbert, b. 1858",1858.0,THE FORSAKEN GARDEN,1888,35,Poems and Sonnets by Herbert Price (1914),"In the garden we loved that is now a waste,",y,,1850-1900,"Ah! sweet were the days, and the nights and th...",in_paper
1,english/pennecu1/Z200459978,1660-1700 Restoration,English Poetry,"Pennecuik, Alexander, 1652-1722",1652.0,THE CITY AND COUNTRY MOUSE.,1682,50,The Works (1815),"&indent;Met with a city mouse, right smooth an...",y,,1650-1700,"A country mouse, upon a winter's day,\n Met...",in_paper
2,english/wattsisa/Z300523040,1750-1799 Later Eighteenth-Century,English Poetry,"Watts, Isaac, 1674-1748",1674.0,SONG 11. Heaven and Hell.,1704,16,The Works (1810),&indent;A heav'n of joy and love;,y,Lyric,1650-1700,There is beyond the sky\n A heaven of joy a...,in_paper
3,english/hardytho/Z200137433,1870-1899 Later Nineteenth-Century,English Poetry,"Hardy, Thomas, 1840-1928",1840.0,WHEN DEAD,1870,16,,&indent;&indent;I am under the bough;,y,,1800-1850,It will be much better when\n I am unde...,in_paper
3,c20-american/da22040/Z300203417,1900-1999 Twentieth-Century,American Poetry,"Walker, Margaret, 1915-1998",1915.0,BALLAD OF THE HOPPY&hyphen;TOAD,1945,84,,Ain't been on Market Street for nothing,,,1900-1950,Ain't been on Market Street for nothing\nWith ...,in_paper
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135713,c20-american/am20062/Z300213749,1900-1999 Twentieth-Century,American Poetry,"Hinsey, Ellen, 1960-",1960.0,"THE STAIRWELL, BERGGASSE 19, VIENNA",1990,78,,The stairwell bore the weight of their visits—,,,1950-2000,The stairwell bore the weight of their visits ...,in_paper
135765,c20-english/ep20015/Z200594457,1900-1999 Twentieth-Century,English Poetry,"McMillan, Ian, 1956-",1956.0,Poem Badly Translated from the Language,1986,20,,Tell me why you have died and when,,,1950-2000,Tell me why you have died and when\nin not mor...,in_paper
135796,c20-english/maxwell/Z200610338,1900-1999 Twentieth-Century,English Poetry,"Maxwell, Glyn, 1962-",1962.0,The Stakes,1992,19,,Forget that in the three&hyphen;fifteen,,,1950-2000,Forget that in the three-fifteen\nMy love was ...,in_paper
135926,c20-english/ep20015/Z200594449,1900-1999 Twentieth-Century,English Poetry,"McMillan, Ian, 1956-",1956.0,Elegy for an Hour of Daylight,1986,27,,The tilt of the Earth is beautiful;,,,1950-2000,The tilt of the Earth is beautiful;\nthe day i...,in_paper


In [None]:
printm(f'### Replicating period sample')
df_smpl_by_period_replicated = get_chadwyck_corpus_sampled_by_period_as_replicated()
describe_corpus(df_smpl_by_period_replicated)

### Replicating period sample

#### Getting sampled corpus by period

* Loading data as replicated: /Users/rj416/github/generative-formalism/data/corpus_sample_by_period.data_as_replicated.csv.gz


----

#### Subcorpus breakdown

subcorpus
English Poetry              5573
American Poetry             1746
Modern Poetry                308
African-American Poetry      228
The Faber Poetry Library     145
Name: count, dtype: int64



----

#### Historical period breakdown (from metadata)

period_meta
1900-1999 Twentieth-Century                    2434
1700-1749 Early Eighteenth-Century              996
1800-1834 Early Nineteenth-Century              718
1835-1869 Mid Nineteenth-Century                654
1603-1660 Jacobean and Caroline                 634
1750-1799 Later Eighteenth-Century              611
1870-1899 Later Nineteenth-Century              609
1660-1700 Restoration                           528
1550-1900 Miscellanies and Collections           67
1500-1700 Emblems, Epigrams, Formal Satires      52
1500-1580 Tudor                                  11
1837-1860 Early Victorian                         2
1860-1880 Mid-Victorian                           2
1880-1901 Late Victorian                          2
1500-1700 Songbooks                               1
Name: count, dtype: int64



----

#### Historical period breakdown (from author birth year)

period
1800-1850    1000
1900-1950    1000
1950-2000    1000
1850-1900    1000
1700-1750    1000
1750-1800    1000
1650-1700    1000
1600-1650    1000
Name: count, dtype: int64



----

#### Historical period + subcorpus breakdown

                                    count
period    subcorpus                      
1600-1650 American Poetry              39
          English Poetry              961
1650-1700 American Poetry               5
          English Poetry              995
1700-1750 American Poetry              26
          English Poetry              974
1750-1800 African-American Poetry      13
          American Poetry             160
          English Poetry              827
1800-1850 African-American Poetry       8
          American Poetry             330
          English Poetry              662
1850-1900 African-American Poetry      53
          American Poetry             295
          English Poetry              604
          Modern Poetry                28
          The Faber Poetry Library     20
1900-1950 African-American Poetry      41
          American Poetry             592
          English Poetry              202
          Modern Poetry               117
          The Faber Poetry Library

----

#### Author birth year distribution

author_dob
1600 ------- [ 1699   | 1799 |   1899 ] -------- 1974



----

#### Number of lines in poems

num_lines
10 ------- [ 16   | 26 |   41 ] -------- 100



----

#### Annotated rhyme distribution

rhyme
y      5529
n       118
y n       9
Name: count, dtype: int64



----

#### Metadata

id_hash,id,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt,data_origin
11,american/am1208/Z200193522,1870-1899 Later Nineteenth-Century,American Poetry,"Stoddard, Richard Henry, 1825-1903",1825.0,THE FALCON.,1855,12,Songs of summer (1857),&indent;I pine with a fancied wrong;,y,,1800-1850,"In-doors in a summer day, Like this,\n I pi...",replicated
16,c20-american/am20034/Z300211527,1900-1999 Twentieth-Century,American Poetry,"Bensko, John, 1949-",1949.0,Troop Train,1979,34,,The survivors for the front regret their wounds,,,1900-1950,The survivors for the front regret their wound...,replicated
19,c20-american/am23006/Z300250508,1900-1999 Twentieth-Century,American Poetry,"Burkard, Michael, 1947-",1947.0,Zane Grey,1977,41,,I have no desire to be this green wood.,,,1900-1950,I have no desire to be this green wood.\nI hav...,replicated
23,english/davieswi/Z200335516,1835-1869 Mid Nineteenth-Century,English Poetry,"Davies, William, 1830-1896",1830.0,LIPS AND ROSES.,1860,18,The Shepherd's Garden (1873),"&indent;The joys with Time foregone,",y,,1800-1850,Red roses of whose sweets were made\n The j...,replicated
25,english-ed2/ep2358/Z200655703,,English Poetry,"Wilson, Anne, Lady, 1848-1930",1848.0,PENSEES.,1878,19,Themes and Variations: by Mrs James Glenny Wil...,"With wear, and fret, and toil of many hands,",y,,1800-1850,"Columbus, wandering by the Iberian shore,\n ...",replicated
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
133010,c20-american/am23056/Z300255204,1900-1999 Twentieth-Century,American Poetry,"Howe, Marie, 1950-",1950.0,Without Music,1980,14,,Only the car radio,,,1950-2000,Only the car radio\ndriving from the drugstore...,replicated
133316,c20-english/ep32001/Z200305144,1900-1999 Twentieth-Century,English Poetry,"Wilkins, Paul, 1951-",1951.0,It,1981,63,,"Seeing them gripped close,",,,1950-2000,"Seeing them gripped close,\none with a leg til...",replicated
133422,c20-english/armitage/Z200608786,1900-1999 Twentieth-Century,English Poetry,"Armitage, Simon, 1963-",1963.0,Bylot Island,1993,48,,Arrived midday and it felt like heaven. Just t...,,Lyric,1950-2000,Arrived midday and it felt Like heaven. Just t...,replicated
133615,c20-american/am22067/Z300234813,1900-1999 Twentieth-Century,American Poetry,"Shumaker, Peggy, 1952-",1952.0,Ticking,1982,32,,"My doctor, Clarice, says if I want",,,1950-2000,"My doctor, Clarice, says if I want\nto have ch...",replicated


In [8]:
printm(f'### Loading rhyme sample in paper')
df_smpl_by_rhyme_in_paper = get_chadwyck_corpus_sampled_by_rhyme_as_in_paper()
df_smpl_by_rhyme_in_paper

### Loading rhyme sample in paper

#### Getting sampled corpus by rhyme

* Loading data as in paper: /Users/rj416/github/generative-formalism/data/corpus_sample_by_rhyme.data_as_in_paper.csv.gz


id_hash,prompt_type,prompt,model,temperature,txt,num_lines,data_origin
327,MAYBE_rhyme,Write a poem (with 20+ lines).,claude-3-haiku-20240307,0.663096,"Amidst the gentle breeze, a symphony unfolds,\...",20,in_paper
358,MAYBE_rhyme,Write a poem (with 20+ lines).,ollama/olmo2:13b,0.700000,"In the gentle whisper of the wind's caress,\nB...",44,in_paper
450,DO_rhyme,Write a poem in the style of Emily Dickinson.,claude-3-sonnet-20240229,0.242998,The carriage held but just ourselves - \nImmo...,16,in_paper
487,do_NOT_rhyme,Write a poem in free verse.,ollama/olmo2:13b,0.700000,"In the stillness of night, stars whisper secre...",18,in_paper
491,DO_rhyme,Write an ryhmed poem in the style of Shakespea...,gpt-3.5-turbo,0.617467,"Oh fair maiden, with eyes so bright and clear,...",14,in_paper
...,...,...,...,...,...,...,...
999746,do_NOT_rhyme,Write a poem in blank verse.,claude-3-opus-20240229,0.789878,"In solitude, I wander through the woods,\nMy t...",20,in_paper
999780,do_NOT_rhyme,Write a poem in the style of Walt Whitman.,gpt-3.5-turbo,0.639342,O Captain! my Captain! our fearful trip is don...,24,in_paper
999864,DO_rhyme,Write a poem (with 20+ lines) that rhymes.,ollama/llama3.1:8b,0.700000,"In twilight's hush, where shadows play,\nThe s...",24,in_paper
999961,do_NOT_rhyme,Write a poem in free verse.,gemini-pro,0.900051,"Without chains or binds,\nMy thoughts flow lik...",12,in_paper


