# Sampling Chadwyck-Healey poetry collections

In [1]:
import sys
sys.path.append('../')
from generative_formalism import *

## Data as in paper

In [2]:
documentation(get_chadwyck_corpus_sampled_by)

##### `get_chadwyck_corpus_sampled_by`

```md
Load or generate a sampled corpus by the specified criteria.

    Loads a precomputed sampled corpus from disk if available, otherwise
    generates a new sample. Handles different sample types including period,
    period×subcorpus, rhyme, and sonnet-based sampling.

    Parameters
    ----------
    sample_by : str
        Sampling criteria ('period', 'period_subcorpus', 'rhyme', 'sonnet_period').
    as_in_paper : bool, default=True
        If True, load precomputed sample from paper.
    as_replicated : bool, default=False
        If True, load/generate replicated sample.
    display : bool, default=False
        If True, display summary statistics for the sample.
    verbose : bool, default=False
        If True, print progress information.
    **kwargs
        Additional arguments passed to generation/display functions.

    Returns
    -------
    pd.DataFrame
        DataFrame containing the sampled corpus.

    Calls
    -----
    - get_path(data_name, as_in_paper=True, as_replicated=False)
    - get_chadwyck_corpus_sampled_by_replicated(...) [if as_replicated is True]
    - pd.read_csv(path).fillna("").set_index("id").sort_values("id_hash") [if loading precomputed sample]
    - describe_qual_grouped(odf, groupby=gby, sort_index=True, count=False, name=sample_by) [if display=True]
    
```
----


### Sampled by period

In [3]:
# Run
df_smpl_by_period_in_paper = get_chadwyck_corpus_sampled_by(
    'period', 
    as_in_paper=True, 
    verbose=True, 
    display=True
)

# Test
assert len(df_smpl_by_period_in_paper) == 8000

# Display
df_smpl_by_period_in_paper

* Breakdown for period
           count
period          
1600-1650   1000
1650-1700   1000
1700-1750   1000
1750-1800   1000
1800-1850   1000
1850-1900   1000
1900-1950   1000
1950-2000   1000



id,id_hash,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt
english-ed2/ep2438/Z300661875,1,,English Poetry,"Price, Herbert, b. 1858",1858.0,THE FORSAKEN GARDEN,1888,35,Poems and Sonnets by Herbert Price (1914),"In the garden we loved that is now a waste,",y,,1850-1900,"Ah! sweet were the days, and the nights and th..."
english/pennecu1/Z200459978,1,1660-1700 Restoration,English Poetry,"Pennecuik, Alexander, 1652-1722",1652.0,THE CITY AND COUNTRY MOUSE.,1682,50,The Works (1815),"&indent;Met with a city mouse, right smooth an...",y,,1650-1700,"A country mouse, upon a winter's day,\n Met..."
english/wattsisa/Z300523040,2,1750-1799 Later Eighteenth-Century,English Poetry,"Watts, Isaac, 1674-1748",1674.0,SONG 11. Heaven and Hell.,1704,16,The Works (1810),&indent;A heav'n of joy and love;,y,Lyric,1650-1700,There is beyond the sky\n A heaven of joy a...
english/hardytho/Z200137433,3,1870-1899 Later Nineteenth-Century,English Poetry,"Hardy, Thomas, 1840-1928",1840.0,WHEN DEAD,1870,16,,&indent;&indent;I am under the bough;,y,,1800-1850,It will be much better when\n I am unde...
c20-american/da22040/Z300203417,3,1900-1999 Twentieth-Century,American Poetry,"Walker, Margaret, 1915-1998",1915.0,BALLAD OF THE HOPPY&hyphen;TOAD,1945,84,,Ain't been on Market Street for nothing,,,1900-1950,Ain't been on Market Street for nothing\nWith ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
c20-american/am20062/Z300213749,135713,1900-1999 Twentieth-Century,American Poetry,"Hinsey, Ellen, 1960-",1960.0,"THE STAIRWELL, BERGGASSE 19, VIENNA",1990,78,,The stairwell bore the weight of their visits—,,,1950-2000,The stairwell bore the weight of their visits ...
c20-english/ep20015/Z200594457,135765,1900-1999 Twentieth-Century,English Poetry,"McMillan, Ian, 1956-",1956.0,Poem Badly Translated from the Language,1986,20,,Tell me why you have died and when,,,1950-2000,Tell me why you have died and when\nin not mor...
c20-english/maxwell/Z200610338,135796,1900-1999 Twentieth-Century,English Poetry,"Maxwell, Glyn, 1962-",1962.0,The Stakes,1992,19,,Forget that in the three&hyphen;fifteen,,,1950-2000,Forget that in the three-fifteen\nMy love was ...
c20-english/ep20015/Z200594449,135926,1900-1999 Twentieth-Century,English Poetry,"McMillan, Ian, 1956-",1956.0,Elegy for an Hour of Daylight,1986,27,,The tilt of the Earth is beautiful;,,,1950-2000,The tilt of the Earth is beautiful;\nthe day i...


### Sampled by rhyme

In [4]:
# Run
df_smpl_by_rhyme_in_paper = get_chadwyck_corpus_sampled_by(
    'rhyme', 
    as_in_paper=True, 
    verbose=True, 
    display=True
)

# Test
assert len(df_smpl_by_rhyme_in_paper) == 2000

# Display
df_smpl_by_rhyme_in_paper

* Breakdown for rhyme
       count
rhyme       
n       1000
y       1000



id,id_hash,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt
english-ed2/ep2438/Z300661875,1,,English Poetry,"Price, Herbert, b. 1858",1858.0,THE FORSAKEN GARDEN,1888,35,Poems and Sonnets by Herbert Price (1914),"In the garden we loved that is now a waste,",y,,1850-1900,"Ah! sweet were the days, and the nights and th..."
english/pennecu1/Z200459978,1,1660-1700 Restoration,English Poetry,"Pennecuik, Alexander, 1652-1722",1652.0,THE CITY AND COUNTRY MOUSE.,1682,50,The Works (1815),"&indent;Met with a city mouse, right smooth an...",y,,1650-1700,"A country mouse, upon a winter's day,\n Met..."
english/wattsisa/Z300523040,2,1750-1799 Later Eighteenth-Century,English Poetry,"Watts, Isaac, 1674-1748",1674.0,SONG 11. Heaven and Hell.,1704,16,The Works (1810),&indent;A heav'n of joy and love;,y,Lyric,1650-1700,There is beyond the sky\n A heaven of joy a...
english/hardytho/Z200137433,3,1870-1899 Later Nineteenth-Century,English Poetry,"Hardy, Thomas, 1840-1928",1840.0,WHEN DEAD,1870,16,,&indent;&indent;I am under the bough;,y,,1800-1850,It will be much better when\n I am unde...
english/fawkesfr/Z300372956,4,1750-1799 Later Eighteenth-Century,English Poetry,"Fawkes, Francis, 1720-1777",1720.0,"III. ON A WORTHY FRIEND, Who was accomplished...",1750,10,Original Poems and Translations (1761),"Thou friendly, candid, virtuous mind, farewel!",y,,1700-1750,"Oh born in liberal studies to excel,\nThou fri..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
english-ed2/ep2525/Z200668962,219124,,English Poetry,"Armstrong, Edmund John, 1841-1865",1841.0,WOMAN'S SORROW.,1871,14,The Poetical Works of Edmund J. Armstrong. Edi...,"Tost by a tempest, and ere long in calm",n,,1800-1850,The sorrow of a man is like the sea\nTost by a...
english-ed2/ep2316/Z200654162,219130,,English Poetry,"Collins, Mortimer, 1827-1876",1827.0,A CAVALIER BALLAD.,1857,37,Idyls and Rhymes. By Mortimer Collins (1855),"Who is gone, in his glory and his sorrow, to the",n,,1800-1850,O alas and alas for the King we could not save...
english/colersam/Z300317124,219174,1800-1834 Early Nineteenth-Century,English Poetry,"Coleridge, Samuel Taylor, 1772-1834",1772.0,TO THE REV. GEORGE COLERIDGE OF OTTERY ST. MA...,1802,77,The Complete Poetical Works (1912),Notus in fratres animi paterni.,n,,1750-1800,"A blessed lot hath he, who having passed\nHis ..."
american/am1066/Z200187826,219178,1835-1869 Mid Nineteenth-Century,American Poetry,"Whitman, Walt, 1819-1892",1819.0,KOSMOS.,1849,10,Leaves of grass (1860–61),"Who is the amplitude of the earth, and the coa...",n,,1800-1850,"Who includes diversity, and is Nature,\nWho is..."


### Sampled by period/subcorpus

In [5]:
# Run
df_smpl_by_period_subcorpus_in_paper = get_chadwyck_corpus_sampled_by(
    'period_subcorpus',
    as_in_paper=True, 
    display=True, 
    verbose=True
)

# Test
assert len(df_smpl_by_period_subcorpus_in_paper) > 20_000

# Display
df_smpl_by_period_subcorpus_in_paper

* Breakdown for period_subcorpus
                                    count
period    subcorpus                      
1600-1650 American Poetry             361
          English Poetry             1000
1650-1700 American Poetry              74
          English Poetry             1000
1700-1750 African-American Poetry       3
...                                   ...
1950-2000 African-American Poetry     820
          American Poetry            1000
          English Poetry             1000
          Modern Poetry              1000
          The Faber Poetry Library    616

[32 rows x 1 columns]



id,id_hash,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt
c20-english/ep20152/Z200586158,2,1900-1999 Twentieth-Century,English Poetry,"Rosenberg, Isaac, 1890-1918",1890.0,‘I KNOW YOU GOLDEN’,1920,12,,I know you golden,,,1850-1900,I know you golden\nAs summer and pale\nAs the ...
english/kerpeter/Z300410015,3,1660-1700 Restoration,English Poetry,"Ker, Patrick, fl. 1691",1691.0,On the Memory of a Married Maid.,1721,16,Flosculum Poeticum (1684),A Marrie'd&hyphen;Virgin to remain.,y,,1650-1700,"Within this Coffin here does lie,\nA Pattern o..."
american/am1258/Z200196105,7,1835-1869 Mid Nineteenth-Century,American Poetry,"Emerson, Ralph Waldo, 1803-1882",1803.0,SEPTEMBER,1833,16,Poems [1904],"&indent;Of a gusty Autumn day,",y,,1800-1850,In the turbulent beauty\n Of a gusty Autumn...
english/gilfilla/Z400379001,8,1800-1834 Early Nineteenth-Century,English Poetry,"Gilfillan, Robert, 1798-1850",1798.0,NORWEGIAN SMUGGLER'S SONG.,1828,36,Poems and Songs (1851),"&indent;The storm is loud and high,",y,,1750-1800,"Awake, you midnight mariners!\n The storm i..."
english/wattwill/Z300523577,18,1800-1834 Early Nineteenth-Century,English Poetry,"Watt, William, 1793-1859",1793.0,BAB AT THE BOWSTER.,1823,40,Poems and Songs (1860),Wi' touslet hair and drowsy een?,y,Ballad,1750-1800,"Lassie, whare were you yestreen,\nWi' touslet ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
faber/fa0401/Z300557474,999109,1900-1999 Twentieth-Century,The Faber Poetry Library,"Boyle, Charles, 1951-",1951.0,(i) Underground,1981,18,,A woman sleeping on the underground:,,,1950-2000,"A woman sleeping on the underground:\nneat, As..."
c20-african-american/da20076/Z300330273,999377,1900-1999 Twentieth-Century,African-American Poetry,"Weaver, Michael S., 1951-",1951.0,Duke Ellington and His Mistress Make Love,1981,34,,I draw the sheets written with life,,,1950-2000,I draw the sheets written with life\naround me...
african-american/hortonge/Z200399812,999379,1835-1869 Mid Nineteenth-Century,African-American Poetry,"Horton, George Moses, 1798?-ca.1880",1798.0,THE POWERS OF LOVE.,1828,35,Naked Genius (1865),It lifts the poor man from his cell,y,,1750-1800,It lifts the poor man from his cell\n To fo...
c20-african-american/da22011/Z300262800,999421,1900-1999 Twentieth-Century,African-American Poetry,"Jackson, Angela, 1951-",1951.0,"george, after all, means farmer",1981,39,,he carried a tomato plant &,,,1950-2000,he carried a tomato plant &\nwatermelon\nacros...


### Sampled by sonnet/period

In [6]:
# Run
df_smpl_by_sonnet_period_in_paper = get_chadwyck_corpus_sampled_by(
    'sonnet_period',
    as_in_paper=True,
    display=True,
    verbose=True
)

# Display
df_smpl_by_sonnet_period_in_paper

* Breakdown for sonnet_period
           count
period          
1600-1650    152
1650-1700     65
1700-1750    154
1750-1800    154
1800-1850    154
1850-1900    154
1900-1950    154
1950-2000     12



id,id_hash,period_meta,subcorpus,author,author_dob,title,year,num_lines,volume,line,rhyme,genre,period,txt
c20-english/abarnett/Z300683485,1592,,English Poetry,"Barnett, Anthony, 1941-",1941.0,"SONNET II, 217 (239)",1971,14,,"The day dawned, and held",,,1900-1950,"The day dawned, and held\nand o, joy!\nappeare..."
english/smithcha/Z300489005,2155,1750-1799 Later Eighteenth-Century,English Poetry,"Smith, Charlotte Turner, 1749-1806",1749.0,SONNET LXXXII. TO THE SHADE OF BURNS.,1779,14,Elegiac sonnets (1797–1800),"&indent;Who, amid Scotia's mountain solitude,",y,Sonnet,1700-1750,"Mute is thy wild harp, now, O Bard sublime!\n ..."
english/keatsjoh/Z200408021,3225,1800-1834 Early Nineteenth-Century,English Poetry,"Keats, John, 1795-1821",1795.0,SONNET THE HUMAN SEASONS,1825,14,The Poetical Works (1906),&indent;There are four seasons in the mind of ...,y,Sonnet,1750-1800,Four seasons fill the measure of the year;\n ...
english/langhorn/Z200413651,4312,1750-1799 Later Eighteenth-Century,English Poetry,"Langhorne, John, 1735-1779",1735.0,SONNET IN THE MANNER OF PETRARCH.,1765,14,The Poetical Works (1804),"&indent;The sweetest twins that ever Nature bore,",y,Sonnet,1700-1750,"On thy fair morn, O hope-inspiring May!\n T..."
english/tupperma/Z200513878,5042,1835-1869 Mid Nineteenth-Century,English Poetry,"Tupper, Martin Farquhar, 1810-1889",1810.0,AN ASPIRATION.,1840,14,Three Hundred Sonnets (1860),O that I had a pastor near my home,y,Sonnet,1800-1850,O that I had a pastor near my home\n Ho...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
english/ayresphi/Z300265586,996139,1660-1700 Restoration,English Poetry,"Ayres, Philip, 1638-1712",1638.0,A Sonnet. On Signor Pietro Reggio his setting ...,1668,14,Lyric Poems (1687),&indent;Whilst its great Deeds he does in Odes...,y,Sonnet,1600-1650,"If Theban Pindar raised his Country's Fame,\n ..."
english/tupperma/Z200513990,997117,1835-1869 Mid Nineteenth-Century,English Poetry,"Tupper, Martin Farquhar, 1810-1889",1810.0,ROMISH PRIESTCRAFT.—1851.,1840,14,Three Hundred Sonnets (1860),"What! after all our charitable pains,",y,Sonnet,1800-1850,"What! after all our charitable pains,\n And..."
english/rawnsley/Z200472593,997435,1870-1899 Later Nineteenth-Century,English Poetry,"Rawnsley, H. D. (Hardwicke Drummond), 1851-1920",1851.0,GOING TO NETTLESHIP'S GRAVE FROM ARGENTIÈRE T...,1881,14,Sonnets in Switzerland and Italy (1899),&indent;&indent;Than this loud stream with wat...,y,Sonnet,1850-1900,"I needs no surer, sympathetic guide,\n ..."
english/stuarthy/Z300500668,997488,1835-1869 Mid Nineteenth-Century,English Poetry,"Stuart-Wortley, Emmeline, Lady, 1806-1855",1806.0,"SONNET. [Beautiful Spring, thy young ambrosia...",1836,14,"Queen Berengaria's Courtesy, and Other Poems (...","&indent;Now dwells caressingly upon the air,",y,,1800-1850,"Beautiful Spring, thy young ambrosial breath\n..."
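
Unlike the cells for the other samples, the sonnet cell carries no `# Test` assertion. The sketch below shows one way such a check could be derived. It uses only the counts transcribed from the `sonnet_period` breakdown printed above; the variable names are illustrative and not part of `generative_formalism`.

```python
import pandas as pd

# Counts transcribed from the "Breakdown for sonnet_period" output above.
sonnet_counts = pd.Series(
    {
        "1600-1650": 152,
        "1650-1700": 65,
        "1700-1750": 154,
        "1750-1800": 154,
        "1800-1850": 154,
        "1850-1900": 154,
        "1900-1950": 154,
        "1950-2000": 12,
    },
    name="count",
)

# Total number of sonnets in the sample.
total = int(sonnet_counts.sum())

# Largest per-period count, and the periods that fall short of it.
cap = int(sonnet_counts.max())
short_periods = sonnet_counts[sonnet_counts < cap].index.tolist()

print(total, cap, short_periods)
```

If the breakdown covers every row of the sample, the matching check on the loaded data would presumably be `assert len(df_smpl_by_sonnet_period_in_paper) == 999`.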


## Replicating new samples

Generating new samples requires access to the full Chadwyck-Healey corpora.

In [7]:
documentation(get_chadwyck_corpus_sampled_by_replicated)
documentation(gen_chadwyck_corpus_sampled_by)
documentation(sample_chadwyck_corpus)

##### `get_chadwyck_corpus_sampled_by_replicated`

```md
Load or generate a stratified sample with disk caching.

    Loads a pre-generated stratified sample from disk if available, otherwise
    generates a new sample and caches it. This ensures efficient reuse of
    expensive sampling operations.

    Parameters
    ----------
    sample_by : str
        Sampling criteria ('rhyme', 'period', 'period_subcorpus', 'sonnet_period').
    force : bool, default=False
        If True, regenerate the sample even if a cached version exists.
    display : bool, default=False
        If True, display summary tables for certain sample types.
    verbose : bool, default=False
        If True, print progress information.
    as_in_paper : bool, default=True
        If True, use precomputed sample from paper.
    as_replicated : bool, default=False
        If True, use replicated sample.
    **kwargs: Additional arguments passed to gen_chadwyck_corpus_sampled_by

    Returns
    -------
    pd.DataFrame
        DataFrame containing the stratified sample.

    Calls
    -----
    - gen_chadwyck_corpus_sampled_by(sample_by, display=display) [if generating new sample]
    - save_sample(odf, path, overwrite=True) [if saving generated sample]
    - pd.read_csv(path).set_index('id').sort_values('id_hash') [if loading cached sample]
    - get_period_subcorpus_table(odf, return_display=True) [if display=True for period_subcorpus]
    - display(img) [if display=True and IPython available]
    
```
----
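
The load-or-generate behavior described in this docstring can be illustrated with a small, self-contained sketch. It is not the package's implementation: `load_or_generate`, `toy_generate`, and the temporary cache path are hypothetical names introduced here, and only the `pd.read_csv(path).set_index('id').sort_values('id_hash')` step is taken from the "Calls" list above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_generate(path, generate, force=False):
    """Return the cached CSV at `path`, or call `generate()` and cache its result."""
    path = Path(path)
    if path.exists() and not force:
        # Cached sample found: load it and restore the index and ordering.
        return pd.read_csv(path).set_index("id").sort_values("id_hash")
    # No cache (or force=True): build the sample and write it to disk.
    df = generate().sort_values("id_hash")
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path)  # gzip compression is inferred from the .gz suffix
    return df


def toy_generate():
    # Hypothetical stand-in for the expensive sampling step.
    return pd.DataFrame(
        {"id": ["b", "a", "c"], "id_hash": [20, 10, 30], "period": ["1600-1650"] * 3}
    ).set_index("id")


cache_path = Path(tempfile.mkdtemp()) / "toy_sample.csv.gz"
first = load_or_generate(cache_path, toy_generate)   # generates and caches
second = load_or_generate(cache_path, toy_generate)  # loads from the cache
```

Passing `force=True` skips the cache check, which mirrors the documented role of the `force` parameter.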


##### `gen_chadwyck_corpus_sampled_by`

```md
Generate a stratified sample from the full Chadwyck-Healey corpus.

    Creates a balanced sample of poems using the specified stratification criteria.
    Handles different sampling types including rhyme, period, period×subcorpus,
    and sonnet-based sampling.

    Parameters
    ----------
    sample_by : str
        Sampling criteria ('rhyme', 'period', 'period_subcorpus', 'sonnet_period').
    display : bool, default=False
        If True, display summary tables for certain sample types (e.g., period).
    **kwargs
        Additional arguments passed to sample_chadwyck_corpus.

    Returns
    -------
    pd.DataFrame
        DataFrame containing the stratified sample with balanced representation.

    Calls
    -----
    - get_chadwyck_corpus() [to load the full corpus]
    - sample_chadwyck_corpus(df_corpus, sample_by=sample_by, **kwargs) [to create stratified sample]
    - get_period_subcorpus_table(df, return_display=True) [if display=True for period samples]
    - display(img) [if display=True and IPython available]
    
```
----


##### `sample_chadwyck_corpus`

```md
Deterministically sample the corpus by one or more grouping criteria.

    Creates a balanced sample from the corpus by grouping on specified criteria,
    filtering groups by size constraints, and taking deterministic subsets within
    each group. Uses id_hash sorting to ensure reproducible results across runs.

    Parameters
    ----------
    df_corpus : pd.DataFrame
        Corpus DataFrame to sample from (e.g., from get_chadwyck_corpus()).
        Must contain the columns specified in sample_by plus 'id_hash'.
    sample_by : str or list[str]
        Column name(s) to group by for stratified sampling.
    min_sample_n : int, default=MIN_SAMPLE_N
        Minimum number of items required in a group to be included.
    max_sample_n : int, default=MAX_SAMPLE_N
        Maximum number of items to take from each group.
    prefer_min_id_hash : bool, default=False
        If True, prefer items with smaller id_hash values when sampling.
    sort_id_hash : bool, default=True
        If True, sort the sample by id_hash.
    verbose : bool, default=False
        If True, print progress information.


    Returns
    -------
    pd.DataFrame
        Sampled DataFrame containing the selected rows from df_corpus.

    Calls
    -----
    - describe_qual(s, count=False, name="/".join(sample_by)) [to display group size distribution]
    
```
----
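
As a rough illustration of the grouping, size filtering, and `id_hash`-ordered selection described in this docstring, here is a minimal pandas sketch on a toy corpus. It is an assumption-laden stand-in, not the package's code: `sample_by_group`, `min_n`, and `max_n` are hypothetical names, it handles a single grouping column only, and it always takes the rows with the smallest `id_hash`, which may differ from what `sample_chadwyck_corpus` does when `prefer_min_id_hash=False`.

```python
import pandas as pd


def sample_by_group(df, sample_by, min_n=2, max_n=3):
    """Keep groups with at least `min_n` rows and take up to `max_n` rows from
    each, choosing rows in id_hash order so the result is reproducible."""
    # Size of the group each row belongs to.
    sizes = df[sample_by].map(df[sample_by].value_counts())
    eligible = df[sizes >= min_n]
    return (
        eligible.sort_values("id_hash")
        .groupby(sample_by)
        .head(max_n)               # first max_n rows per group, in id_hash order
        .sort_values("id_hash")    # final ordering of the whole sample
    )


# Toy corpus: group A has 4 rows, B has 3, C has only 1.
toy_corpus = pd.DataFrame(
    {
        "id": [f"poem{i}" for i in range(8)],
        "id_hash": [50, 10, 40, 20, 80, 30, 70, 60],
        "period": ["A", "A", "A", "A", "B", "B", "C", "B"],
    }
).set_index("id")

toy_sample = sample_by_group(toy_corpus, "period", min_n=2, max_n=3)
print(toy_sample)
```

Because selection depends only on the fixed `id_hash` values and not on a random seed, rerunning the function on the same corpus returns the same rows. Here group C is dropped for falling below `min_n`, and group A is capped at three rows.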


In [8]:
## By period
df_replicated_smpl_by_period = get_chadwyck_corpus_sampled_by('period', as_replicated=True, display=True, verbose=True, force=True)

* Generating period sample
* Loading Chadwyck-Healey corpus (metadata + txt)


  : 100%|██████████| 204514/204514 [00:28<00:00, 7281.91it/s]


* Saved sample to /Users/ryan/github/generative-formalism/data/data_as_replicated/corpus_sample_by_period.csv.gz
* Breakdown for period
           count
period          
1600-1650   1000
1650-1700   1000
1700-1750   1000
1750-1800   1000
1800-1850   1000
1850-1900   1000
1900-1950   1000
1950-2000   1000



In [9]:
## By period
df_replicated_smpl_by_period = get_chadwyck_corpus_sampled_by('period', as_replicated=True, display=True, verbose=True, force=True)

# By rhyme
df_smpl_by_rhyme_replicated = get_chadwyck_corpus_sampled_by('rhyme', as_replicated=True, display=True, verbose=True, force=True)

# By period/subcorpus
df_smpl_by_period_subcorpus_replicated = get_chadwyck_corpus_sampled_by('period_subcorpus', as_replicated=True, display=True, verbose=True)

# By sonnet/period
df_smpl_by_sonnet_period_replicated = get_chadwyck_corpus_sampled_by('sonnet_period', as_replicated=True, as_in_paper=False, display=True, verbose=True, force=True)

* Generating period sample
* Loading Chadwyck-Healey corpus (metadata + txt)
* Loading corpus from memory
* Saved sample to /Users/ryan/github/generative-formalism/data/data_as_replicated/corpus_sample_by_period.csv.gz
* Breakdown for period
           count
period          
1600-1650   1000
1650-1700   1000
1700-1750   1000
1750-1800   1000
1800-1850   1000
1850-1900   1000
1900-1950   1000
1950-2000   1000

* Generating rhyme sample
* Loading Chadwyck-Healey corpus (metadata + txt)
* Loading corpus from memory
* Saved sample to /Users/ryan/github/generative-formalism/data/data_as_replicated/corpus_sample_by_rhyme.csv.gz
* Breakdown for rhyme
       count
rhyme       
n       1000
y       1000

* Loading period_subcorpus sample from /Users/ryan/github/generative-formalism/data/data_as_replicated/corpus_sample_by_period_subcorpus.csv.gz
* Breakdown for period_subcorpus
                                    count
period    subcorpus                      
1600-1650 American Poetry         