# Collecting rhyme prompting data

In [1]:
import sys
sys.path.append('../')
from generative_formalism import *

## Loading data

### Pre/post-processing

In [2]:
documentation(preprocess_rhyme_promptings)
documentation(postprocess_rhyme_promptings)

**`preprocess_rhyme_promptings`**

```md
Preprocess rhyme promptings data.

    This function preprocesses rhyme promptings data from legacy pickle and JSON files,
    combines them, and saves to CSV format.

    Args:
        overwrite (bool, optional): Whether to overwrite existing processed data.
            Defaults to False.
        save_to (str, optional): Path to save the processed data.
            Defaults to PATH_GENAI_PROMPTS_IN_PAPER.

    Returns:
        pd.DataFrame: Preprocessed data as a dataframe.
    
```
----
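The docstring above describes combining legacy pickle and JSON files and caching the result as CSV. A minimal sketch of that pattern follows, assuming both legacy files hold tabular records with the same columns. The helper name `combine_legacy_sources`, the toy file names, and the toy rows are hypothetical and are not taken from `generative_formalism`.

```python
import os
import tempfile

import pandas as pd


def combine_legacy_sources(pickle_path, json_path, save_to, overwrite=False):
    """Hypothetical helper: merge two legacy files and cache the result as CSV."""
    # Reuse the cached CSV unless the caller asks for a rebuild
    if os.path.exists(save_to) and not overwrite:
        return pd.read_csv(save_to)
    df_pkl = pd.read_pickle(pickle_path)
    df_json = pd.read_json(json_path, orient="records")
    # Stack the two sources and renumber the rows
    df = pd.concat([df_pkl, df_json], ignore_index=True)
    df.to_csv(save_to, index=False)
    return df


# Toy demonstration with temporary files (illustrative data only)
with tempfile.TemporaryDirectory() as tmp:
    pkl = os.path.join(tmp, "legacy.pkl")
    js = os.path.join(tmp, "legacy.json")
    out = os.path.join(tmp, "combined.csv")
    pd.DataFrame(
        {"prompt": ["Write a poem."], "txt": ["Line one\nLine two"]}
    ).to_pickle(pkl)
    pd.DataFrame(
        {"prompt": ["Write an unrhymed poem."], "txt": ["Another line"]}
    ).to_json(js, orient="records")
    df_combined = combine_legacy_sources(pkl, js, out)
    n_rows = len(df_combined)
    csv_written = os.path.exists(out)
```

With `overwrite=False`, a second call would read the cached CSV instead of re-reading the legacy files, which is the behavior the `overwrite` argument in the docstring suggests.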


**`postprocess_rhyme_promptings`**

```md
Postprocess rhyme promptings data.

    This function postprocesses rhyme promptings data by cleaning the text,
    setting the prompt type, and filtering the data by prompt and model.

    Args:
        df_prompts (pd.DataFrame): Input DataFrame containing rhyme promptings data.
        prompts (list, optional): List of prompts to include.
            Defaults to PROMPT_LIST.
        models (list, optional): List of models to include.
            Defaults to MODEL_LIST.
        min_lines (int, optional): Minimum number of lines for filtering.
            Defaults to MIN_NUM_LINES.
        max_lines (int, optional): Maximum number of lines for filtering.
            Defaults to MAX_NUM_LINES.
        save_to (str, optional): Path to save the processed data.
            Defaults to None.
        overwrite (bool, optional): Whether to overwrite existing processed data.
            Defaults to False.
        display (bool, optional): Whether to display the processed data.
            Defaults to False.
        verbose (bool, optional): Whether to print verbose output.
            Defaults to True.
        **display_kwargs: Additional keyword arguments passed to display_rhyme_promptings.

    Returns:
        pd.DataFrame: Postprocessed data as a dataframe.
    
```
----
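The postprocessing steps listed above (cleaning text, setting the prompt type, filtering by prompt, model, and line count) can be sketched with plain pandas. This is an illustration under assumptions, not the package's implementation: `filter_promptings` and `PROMPT_TO_TYPE` are hypothetical names, the line-counting rule (non-empty lines) is a guess, and only the three prompt/type pairs shown in this notebook's tables are mapped.

```python
import pandas as pd

# Hypothetical mapping; these three pairs appear in the tables in this notebook
PROMPT_TO_TYPE = {
    "Write a poem in ballad stanzas.": "DO_rhyme",
    "Write an unrhymed poem.": "do_NOT_rhyme",
    "Write a poem.": "MAYBE_rhyme",
}


def filter_promptings(df, prompts=None, models=None, min_lines=None, max_lines=None):
    """Hypothetical helper: clean text, tag prompt type, then filter rows."""
    df = df.copy()
    # Clean: strip leading and trailing whitespace from each response
    df["txt"] = df["txt"].str.strip()
    # Tag each row with its prompt type
    df["prompt_type"] = df["prompt"].map(PROMPT_TO_TYPE)
    # Assumption: a poem's length is its number of non-empty lines
    df["num_lines"] = df["txt"].apply(
        lambda t: sum(1 for line in t.split("\n") if line.strip())
    )
    # Build the row filter one condition at a time
    keep = pd.Series(True, index=df.index)
    if prompts is not None:
        keep &= df["prompt"].isin(prompts)
    if models is not None:
        keep &= df["model"].isin(models)
    if min_lines is not None:
        keep &= df["num_lines"] >= min_lines
    if max_lines is not None:
        keep &= df["num_lines"] <= max_lines
    return df[keep]


# Toy data (illustrative only)
df_toy = pd.DataFrame(
    {
        "prompt": ["Write a poem.", "Write a poem.", "Write an unrhymed poem."],
        "model": ["gpt-4-turbo", "gpt-3.5-turbo", "gpt-4-turbo"],
        "txt": ["  a\nb\nc\nd  ", "a\nb", "a\nb\nc\nd\ne"],
    }
)
df_kept = filter_promptings(
    df_toy, prompts=["Write a poem."], min_lines=3, max_lines=10
)
```

Here only the first toy row survives: the second is too short, and the third uses a prompt outside the requested list.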


### Data used in paper

In [3]:
df_genai_rhyme_promptings_as_in_paper = get_genai_rhyme_promptings_as_in_paper(display=True)
df_genai_rhyme_promptings_as_in_paper

* Collecting genai rhyme promptings as used in paper
  * Collecting from /Users/ryan/github/generative-formalism/data/corpus_genai_promptings.csv.gz
  * 17,988 generated responses
  * 16,935 unique responses
  * 16,871 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.paper_regenerated.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.paper_regenerated.tex'

id,prompt_type,prompt,model,temperature,txt,num_lines
08610f10,DO_rhyme,Write a poem in ballad stanzas.,gpt-3.5-turbo,0.032586,"In a land of ancient lore,\nWhere tales of old...",36
035033a3,do_NOT_rhyme,Write an unrhymed poem.,ollama/llama3.1:70b,0.700000,Silence falls like a blanket over the city\nA ...,12
c137d1bb,do_NOT_rhyme,Write an unrhymed poem.,gpt-3.5-turbo,1.210359,"In the silence of the night, \nI lay awake wit...",16
c2598373,DO_rhyme,Write a poem in ballad stanzas.,gpt-3.5-turbo,1.347333,"In a quaint little village, so peaceful and se...",24
86994910,MAYBE_rhyme,Write a poem.,gpt-4-turbo,0.238043,"In the quiet hush of morning's glow,\nWhere wh...",28
...,...,...,...,...,...,...
874dbcc8,do_NOT_rhyme,Write a short poem that does NOT rhyme.,gpt-3.5-turbo,0.801410,"In the whispering wind,\nI hear the echo of fo...",16
45466a57,DO_rhyme,Write a poem that does rhyme.,ollama/llama3.1:70b,0.700000,Silence falls like a blanket\nA heavy stillnes...,12
c1dccca6,MAYBE_rhyme,Write a long poem.,ollama/llama3.1:8b,0.700000,"In twilight's hush, where shadows softly fall,...",40
2f004344,MAYBE_rhyme,Write a poem (with 20+ lines).,ollama/olmo2:13b,0.700000,"In the quiet of the morning light, \nWhere dr...",44
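The summary printed above (generated responses, unique responses, unique prompts, unique prompt types) can be approximated from the columns shown in the table. The sketch below runs on toy rows, not the real corpus, and assumes that a "unique response" means a distinct `txt` value. How the package distinguishes unique responses from unique poems is not shown in this notebook, so that count is left out.

```python
import pandas as pd

# Toy rows using the column names from the table above (illustrative data only)
df = pd.DataFrame(
    {
        "prompt_type": ["DO_rhyme", "do_NOT_rhyme", "do_NOT_rhyme", "MAYBE_rhyme"],
        "prompt": [
            "Write a poem in ballad stanzas.",
            "Write an unrhymed poem.",
            "Write an unrhymed poem.",
            "Write a poem.",
        ],
        "model": ["gpt-3.5-turbo", "gpt-3.5-turbo", "gpt-3.5-turbo", "gpt-4-turbo"],
        "txt": ["A", "B", "B", "C"],
    }
)

# Assumption: uniqueness is judged by exact text match
summary = {
    "generated responses": len(df),
    "unique responses": df["txt"].nunique(),
    "unique prompts": df["prompt"].nunique(),
    "unique prompt types": df["prompt_type"].nunique(),
}

# Poems per model and prompt type, similar in spirit to a per-model count table
counts = pd.crosstab(df["model"], df["prompt_type"])
```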


### Data replicated here

In [4]:
documentation(get_genai_rhyme_promptings_as_replicated)

**`get_genai_rhyme_promptings_as_replicated`**

```md

    Get genai rhyme promptings as replicated in this implementation.
    
    This function retrieves the rhyme promptings data that was generated
    and replicated in this codebase, as opposed to the original data
    used in the paper. It processes the data through the same postprocessing
    pipeline but uses the replicated suffix for output files.
    
    Args:
        *args: Variable length argument list passed to postprocess_rhyme_promptings.
        verbose (bool, optional): Whether to print verbose output during processing.
            Defaults to True.
        **kwargs: Additional keyword arguments passed to postprocess_rhyme_promptings.
            Common kwargs include:
            - prompts: List of prompts to process (defaults to PROMPT_LIST)
            - models: List of models to process (defaults to MODEL_LIST)
            - min_lines: Minimum number of lines per poem (defaults to MIN_NUM_LINES)
            - max_lines: Maximum number of lines per poem (defaults to MAX_NUM_LINES)
            - save_to: Path to save processed data
            - overwrite: Whether to overwrite existing files
    
    Returns:
        pd.DataFrame: Processed rhyme promptings data with replicated suffix
        applied to output files. Contains the same structure as the paper
        data but generated from the current implementation's stash.
        
    Note:
        This function uses REPLICATED_SUFFIX for output file naming to
        distinguish it from the original paper data. The underlying data
        comes from get_stash_df_poems() which contains the replicated
        generation results.
    
```
----


In [5]:
df_genai_rhyme_promptings_as_replicated = get_genai_rhyme_promptings_as_replicated(display=True)
df_genai_rhyme_promptings_as_replicated


* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 478 generated poems
  * 478 generated responses
  * 355 unique responses
  * 355 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.replicated.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.replicated.tex'

Unnamed: 0,prompt_type,prompt,model,temperature,txt,num_lines
333,DO_rhyme,Write a poem in the style of Emily Dickinson.,claude-3-opus-20240229,0.7000,The gentle Breeze doth beckon me—\nFrom my Cha...,12
319,do_NOT_rhyme,Write a poem in the style of Walt Whitman.,deepseek/deepseek-chat,0.7000,"I sing the ceaseless pulse, the thrumming, vas...",28
29,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-haiku-20240307,0.7000,"Soft breezes caress my face,\nWhispering secre...",12
365,MAYBE_rhyme,Write a poem.,deepseek/deepseek-chat,0.7000,"The world awakens in shades of gold,\nA story ...",32
257,do_NOT_rhyme,Write a poem in free verse.,claude-3-haiku-20240307,0.7000,"The breeze caresses my face,\nSoft and gentle,...",16
...,...,...,...,...,...,...
449,MAYBE_rhyme,Write a short poem.,claude-3-opus-20240229,0.7000,"In the stillness of the night,\nStars twinkle,...",12
209,DO_rhyme,Write a poem in the style of Emily Dickinson.,claude-3-haiku-20240307,0.7000,"The whispers of the wind,\nCaressing leaves an...",16
67,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.8257,"In the quiet heart of the twilight grove,\nWhe...",36
340,MAYBE_rhyme,Write a poem (with 20+ lines).,deepseek/deepseek-chat,0.7000,"The sun has drawn its final thread of gold,\nT...",28
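The log above reads the replicated generations from a `.jsonl` stash and reports fewer unique responses (355) than generated ones (478). A minimal sketch of loading JSON Lines and dropping repeated texts follows. It assumes one JSON object per line with a `txt` field and exact-match deduplication. The records are toy data, and this is not the package's `get_stash_df_poems()` implementation.

```python
import io

import pandas as pd

# Three toy JSONL records; the last two share the same text (illustrative only)
jsonl_text = "\n".join(
    [
        '{"prompt": "Write a poem.", "model": "gpt-4-turbo", "txt": "Line one\\nLine two"}',
        '{"prompt": "Write a short poem.", "model": "gpt-4-turbo", "txt": "Same text"}',
        '{"prompt": "Write a short poem.", "model": "gpt-4-turbo", "txt": "Same text"}',
    ]
)

# lines=True tells pandas to parse one JSON object per line
df_stash = pd.read_json(io.StringIO(jsonl_text), lines=True)

# Assumption: two responses count as duplicates when their text is identical
df_unique = df_stash.drop_duplicates(subset="txt")

n_generated = len(df_stash)
n_unique = len(df_unique)
```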


### Aggregating paper + replicated data

In [6]:
# Combine the paper data and the replicated data into one DataFrame
df_all_rhyme_promptings = get_all_genai_rhyme_promptings(display=True)
df_all_rhyme_promptings

* Collecting genai rhyme promptings as used in paper
  * Collecting from /Users/ryan/github/generative-formalism/data/corpus_genai_promptings.csv.gz
  * 17,988 generated responses
  * 16,935 unique responses
  * 16,871 unique poems
  * 23 unique prompts
  * 3 unique prompt types

* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 478 generated poems
  * 478 generated responses
  * 355 unique responses
  * 355 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.tex'

Unnamed: 0,prompt_type,prompt,model,temperature,txt,num_lines
08610f10,DO_rhyme,Write a poem in ballad stanzas.,gpt-3.5-turbo,0.032586,"In a land of ancient lore,\nWhere tales of old...",36
035033a3,do_NOT_rhyme,Write an unrhymed poem.,ollama/llama3.1:70b,0.700000,Silence falls like a blanket over the city\nA ...,12
c137d1bb,do_NOT_rhyme,Write an unrhymed poem.,gpt-3.5-turbo,1.210359,"In the silence of the night, \nI lay awake wit...",16
c2598373,DO_rhyme,Write a poem in ballad stanzas.,gpt-3.5-turbo,1.347333,"In a quaint little village, so peaceful and se...",24
86994910,MAYBE_rhyme,Write a poem.,gpt-4-turbo,0.238043,"In the quiet hush of morning's glow,\nWhere wh...",28
...,...,...,...,...,...,...
449,MAYBE_rhyme,Write a short poem.,claude-3-opus-20240229,0.700000,"In the stillness of the night,\nStars twinkle,...",12
209,DO_rhyme,Write a poem in the style of Emily Dickinson.,claude-3-haiku-20240307,0.700000,"The whispers of the wind,\nCaressing leaves an...",16
67,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.825700,"In the quiet heart of the twilight grove,\nWhe...",36
340,MAYBE_rhyme,Write a poem (with 20+ lines).,deepseek/deepseek-chat,0.700000,"The sun has drawn its final thread of gold,\nT...",28
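The combined table above keeps the hash-style ids from the paper data alongside the integer index of the replicated data. A minimal sketch of that kind of aggregation with `pd.concat` follows. The `source` column is a hypothetical addition for telling the two sets apart, and the rows are toy data drawn from values visible in the tables. How `get_all_genai_rhyme_promptings` combines them internally is not shown here.

```python
import pandas as pd

# Toy stand-ins for the two DataFrames (illustrative rows only)
df_paper = pd.DataFrame(
    {"prompt_type": ["DO_rhyme"], "model": ["gpt-3.5-turbo"], "num_lines": [36]},
    index=pd.Index(["08610f10"], name="id"),
)
df_replicated = pd.DataFrame(
    {
        "prompt_type": ["MAYBE_rhyme"],
        "model": ["deepseek/deepseek-chat"],
        "num_lines": [28],
    },
    index=[340],
)

# Label each set (hypothetical column), then stack them while keeping both indexes
df_all = pd.concat(
    [
        df_paper.assign(source="paper"),
        df_replicated.assign(source="replicated"),
    ]
)

n_total = len(df_all)
poems_per_source = df_all["source"].value_counts()
```

Because the two indexes use different schemes, passing `ignore_index=True` to `pd.concat` would be an alternative if a uniform integer index is preferred over preserving the original ids.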
