# Collecting rhyme prompting data

In [10]:
import sys
sys.path.append('../')
from generative_formalism import *

## Loading data

### Pre/post-processing

In [13]:
documentation(preprocess_rhyme_promptings)
documentation(postprocess_rhyme_promptings)

**`preprocess_rhyme_promptings`**

```md
Preprocess rhyme promptings data.

    This function preprocesses rhyme promptings data from legacy pickle and JSON files,
    combines them, and saves to CSV format.

    Args:
        overwrite (bool, optional): Whether to overwrite existing processed data.
            Defaults to False.
        save_to (str, optional): Path to save the processed data.
            Defaults to PATH_GENAI_PROMPTS_IN_PAPER.

    Returns:
        pd.DataFrame: Preprocessed data as a dataframe.
    
```
----
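The docstring above describes combining legacy pickle and JSON files and caching the result as CSV. A minimal sketch of that pattern follows, using only the standard library rather than the package's own code. The function name `combine_legacy_records`, the `FIELDS` column set, and the demo records are hypothetical; the actual file layout used by `preprocess_rhyme_promptings` is not shown in this notebook.

```python
import csv
import json
import pickle
import tempfile
from pathlib import Path

FIELDS = ["prompt", "model", "temperature", "txt"]  # hypothetical column set


def combine_legacy_records(pickle_path, json_path, csv_path, overwrite=False):
    """Merge records from a pickle file and a JSON file into one CSV file."""
    csv_path = Path(csv_path)
    if overwrite or not csv_path.exists():
        # Both legacy files are assumed to hold a list of dicts.
        with open(pickle_path, "rb") as f:
            rows = list(pickle.load(f))
        with open(json_path, encoding="utf-8") as f:
            rows.extend(json.load(f))
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
    # Always read back from disk so cached and fresh calls return the same rows.
    with csv_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_dir = Path(tmp_dir)
    pkl_rows = [{"prompt": "Write a short poem.", "model": "model-a",
                 "temperature": 0.7, "txt": "one\ntwo"}]
    json_rows = [{"prompt": "Write an unrhymed poem.", "model": "model-b",
                  "temperature": 0.5, "txt": "three\nfour"}]
    (tmp_dir / "legacy.pkl").write_bytes(pickle.dumps(pkl_rows))
    (tmp_dir / "legacy.json").write_text(json.dumps(json_rows), encoding="utf-8")
    paths = (tmp_dir / "legacy.pkl", tmp_dir / "legacy.json", tmp_dir / "combined.csv")
    combined = combine_legacy_records(*paths)  # builds the CSV
    cached = combine_legacy_records(*paths)    # reuses the CSV
```

Reading the CSV back on every call means values such as `temperature` come back as strings in both the fresh and the cached case, so callers see one consistent shape.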


**`postprocess_rhyme_promptings`**

```md
Postprocess rhyme promptings data.

    This function postprocesses rhyme promptings data by cleaning the text,
    setting the prompt type, and filtering the data by prompt and model.

    Args:
        df_prompts (pd.DataFrame): Input DataFrame containing rhyme promptings data.
        prompts (list, optional): List of prompts to include.
            Defaults to PROMPT_LIST.
        models (list, optional): List of models to include.
            Defaults to MODEL_LIST.
        min_lines (int, optional): Minimum number of lines for filtering.
            Defaults to MIN_NUM_LINES.
        max_lines (int, optional): Maximum number of lines for filtering.
            Defaults to MAX_NUM_LINES.
        save_to (str, optional): Path to save the processed data.
            Defaults to None.
        overwrite (bool, optional): Whether to overwrite existing processed data.
            Defaults to False.
        display (bool, optional): Whether to display the processed data.
            Defaults to False.
        verbose (bool, optional): Whether to print verbose output.
            Defaults to True.
        **display_kwargs: Additional keyword arguments passed to display_rhyme_promptings.

    Returns:
        pd.DataFrame: Postprocessed data as a dataframe.
    
```
----
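The three steps named in the docstring above (cleaning the text, setting the prompt type, filtering by prompt, model, and line count) can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function `postprocess_rows` is hypothetical, blank lines are assumed not to count toward `num_lines`, and the `min_lines`/`max_lines` values in the demo are arbitrary because the values of `MIN_NUM_LINES` and `MAX_NUM_LINES` do not appear in this notebook. The three prompt-to-type pairs are taken from the tables in this notebook.

```python
# Prompt-to-type pairs as they appear in the tables in this notebook.
PROMPT_TO_TYPE = {
    "Write a long poem that does rhyme.": "DO_rhyme",
    "Write an unrhymed poem.": "do_NOT_rhyme",
    "Write a short poem.": "MAYBE_rhyme",
}


def postprocess_rows(rows, prompts, models, min_lines, max_lines):
    """Filter rows by prompt, model, and line count; add prompt_type and num_lines."""
    kept = []
    for row in rows:
        if row["prompt"] not in prompts or row["model"] not in models:
            continue
        # Cleaning step (assumed): trim whitespace and drop blank lines.
        lines = [ln.rstrip() for ln in row["txt"].strip().splitlines() if ln.strip()]
        if not min_lines <= len(lines) <= max_lines:
            continue
        kept.append({
            **row,
            "txt": "\n".join(lines),
            "num_lines": len(lines),
            "prompt_type": PROMPT_TO_TYPE.get(row["prompt"]),
        })
    return kept


demo_rows = [
    {"prompt": "Write an unrhymed poem.", "model": "model-a", "txt": "one\n\ntwo\nthree\n"},
    {"prompt": "Write an unrhymed poem.", "model": "model-b", "txt": "one\ntwo\nthree"},
    {"prompt": "Write a short poem.", "model": "model-a", "txt": "only one line"},
]
processed = postprocess_rows(
    demo_rows,
    prompts=list(PROMPT_TO_TYPE),
    models=["model-a"],  # hypothetical model list
    min_lines=2,         # arbitrary demo bounds
    max_lines=5,
)
```

In the demo, the second row is dropped because its model is not in `models` and the third because it has fewer than `min_lines` lines, so only the first row survives.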


### Data used in paper

In [14]:
df_genai_rhyme_promptings_as_in_paper = get_genai_rhyme_promptings_as_in_paper(display=True)
df_genai_rhyme_promptings_as_in_paper

* Collecting genai rhyme promptings as used in paper
  * Collecting from /Users/ryan/github/generative-formalism/data/corpus_genai_promptings.csv.gz
  * 17,988 generated responses
  * 16,935 unique responses
  * 16,871 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.paper_regenerated.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.paper_regenerated.tex'

id,prompt_type,prompt,model,temperature,txt,num_lines
45e9aa8b,DO_rhyme,Write a poem in the style of Emily Dickinson.,ollama/llama3.1:8b,0.700000,"The silence falls like twilight's hush,\nA soo...",16
028a6893,do_NOT_rhyme,Write a poem (with 20+ lines) that does NOT rh...,gpt-3.5-turbo,0.603860,"In the stillness of the night, \nI find myself...",30
92f06dff,do_NOT_rhyme,Write an unrhymed poem.,ollama/llama3.1:8b,0.700000,Silence falls like a blanket over the city\nPe...,12
332c3e14,do_NOT_rhyme,Write a poem in free verse.,ollama/llama3.1:8b,0.700000,In the silence of dawn's whispered secrets\nI ...,16
a498586f,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.942433,"In the quiet village where the willows weep,\n...",40
...,...,...,...,...,...,...
08d7383d,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-sonnet-20240229,0.222496,"The wind whispers secrets,\nCarrying tales unt...",16
93ba199d,MAYBE_rhyme,Write a poem in groups of two lines.,claude-3-sonnet-20240229,0.972548,"Sunlight dances on the waves,\nNature's rhythm...",10
488ca270,MAYBE_rhyme,Write a poem in stanzas of 4 lines each.,gpt-4-turbo,0.999849,"Beneath a sky of sapphire glow, \nA whisperin...",16
58d5f165,do_NOT_rhyme,Write a poem (with 20+ lines) that does NOT rh...,ollama/olmo2,0.700000,"In the dawn of a new day, where dreams still l...",36
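
The summary printed above distinguishes generated responses, unique responses, and unique poems. The notebook does not define these terms, so the sketch below shows only one plausible reading: a response is taken to be a (prompt, model, text) triple and a poem to be the text alone. The function `summarize_rows` and the demo rows are hypothetical.

```python
def summarize_rows(rows):
    """Count rows and distinct values under the assumed definitions above."""
    responses = {(r["prompt"], r["model"], r["txt"]) for r in rows}
    return {
        "generated responses": len(rows),
        "unique responses": len(responses),
        "unique poems": len({r["txt"] for r in rows}),
        "unique prompts": len({r["prompt"] for r in rows}),
        "unique prompt types": len({r["prompt_type"] for r in rows}),
    }


demo_rows = [
    {"prompt": "Write a short poem.", "model": "model-a", "txt": "x", "prompt_type": "MAYBE_rhyme"},
    {"prompt": "Write a short poem.", "model": "model-a", "txt": "x", "prompt_type": "MAYBE_rhyme"},
    {"prompt": "Write a short poem.", "model": "model-b", "txt": "x", "prompt_type": "MAYBE_rhyme"},
    {"prompt": "Write an unrhymed poem.", "model": "model-a", "txt": "y", "prompt_type": "do_NOT_rhyme"},
]
summary = summarize_rows(demo_rows)
```

Under this reading the number of unique responses can never be smaller than the number of unique poems, which is consistent with the counts printed above.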


### Data replicated here

In [15]:
documentation(get_genai_rhyme_promptings_as_replicated)

**`get_genai_rhyme_promptings_as_replicated`**

```md

    Get genai rhyme promptings as replicated in this implementation.
    
    This function retrieves the rhyme promptings data that was generated
    and replicated in this codebase, as opposed to the original data
    used in the paper. It processes the data through the same postprocessing
    pipeline but uses the replicated suffix for output files.
    
    Args:
        *args: Variable length argument list passed to postprocess_rhyme_promptings.
        verbose (bool, optional): Whether to print verbose output during processing.
            Defaults to True.
        **kwargs: Additional keyword arguments passed to postprocess_rhyme_promptings.
            Common kwargs include:
            - prompts: List of prompts to process (defaults to PROMPT_LIST)
            - models: List of models to process (defaults to MODEL_LIST)
            - min_lines: Minimum number of lines per poem (defaults to MIN_NUM_LINES)
            - max_lines: Maximum number of lines per poem (defaults to MAX_NUM_LINES)
            - save_to: Path to save processed data
            - overwrite: Whether to overwrite existing files
    
    Returns:
        pd.DataFrame: Processed rhyme promptings data with replicated suffix
        applied to output files. Contains the same structure as the paper
        data but generated from the current implementation's stash.
        
    Note:
        This function uses REPLICATED_SUFFIX for output file naming to
        distinguish it from the original paper data. The underlying data
        comes from get_stash_df_poems() which contains the replicated
        generation results.
    
```
----
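The replicated data is read from a `.jsonl` stash file, as the output of the next cell shows. A minimal sketch of reading a JSON Lines file with the standard library follows. The helper `read_jsonl` and the record fields in the demo are hypothetical; the actual schema of the stash and the internals of `get_stash_df_poems()` are not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records


with tempfile.TemporaryDirectory() as tmp_dir:
    stash_path = Path(tmp_dir) / "stash.jsonl"
    demo_records = [
        {"prompt": "Write a short poem.", "model": "model-a", "txt": "one\ntwo"},
        {"prompt": "Write an unrhymed poem.", "model": "model-b", "txt": "three\nfour"},
    ]
    # One JSON object per line; newlines inside poems are escaped by json.dumps.
    stash_path.write_text(
        "\n".join(json.dumps(r) for r in demo_records) + "\n", encoding="utf-8"
    )
    stash_rows = read_jsonl(stash_path)
```

Because each record occupies exactly one line, new generations can be appended to such a file without rewriting what is already there.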


In [16]:
df_genai_rhyme_promptings_as_replicated = get_genai_rhyme_promptings_as_replicated(display=True)
df_genai_rhyme_promptings_as_replicated


* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 457 generated poems
  * 457 generated responses
  * 337 unique responses
  * 337 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.replicated.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.replicated.tex'

id,prompt_type,prompt,model,temperature,txt,num_lines
79,do_NOT_rhyme,Write an unrhymed poem.,deepseek/deepseek-chat,0.0307,"The window holds a square of sky,\na slow and ...",20
36,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-haiku-20240307,0.7000,"Whispers in the wind,\nEchoing through the tre...",12
38,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-haiku-20240307,0.7000,"Soft caresses upon my face,\nA gentle breath, ...",12
203,do_NOT_rhyme,Write a poem in blank verse.,claude-3-haiku-20240307,0.7000,"Beneath the endless sky, I tread alone,\nMy fo...",14
428,MAYBE_rhyme,Write a short poem.,claude-3-opus-20240229,0.7000,"In the stillness of the night,\nStars twinkle,...",12
...,...,...,...,...,...,...
39,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-haiku-20240307,0.7000,"Whispers in the wind,\nEchoing through the tre...",16
91,MAYBE_rhyme,Write a poem in stanzas of 4 lines each.,claude-3-opus-20240229,0.4995,"In fields of gold and amber hues,\nThe sun-kis...",20
153,DO_rhyme,Write a long poem that does rhyme.,claude-3-haiku-20240307,0.7000,"Through the winding, wooded ways,\nI wander on...",36
66,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.8257,"In the quiet heart of the twilight grove,\nWhe...",36


### Aggregating paper + replicated data

In [17]:
# Combine the paper data and the replicated data
df_all_rhyme_promptings = get_all_genai_rhyme_promptings(display=True)
df_all_rhyme_promptings

* Collecting genai rhyme promptings as used in paper
  * Collecting from /Users/ryan/github/generative-formalism/data/corpus_genai_promptings.csv.gz
  * 17,988 generated responses
  * 16,935 unique responses
  * 16,871 unique poems
  * 23 unique prompts
  * 3 unique prompt types

* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 457 generated poems
  * 457 generated responses
  * 337 unique responses
  * 337 unique poems
  * 23 unique prompts
  * 3 unique prompt types


'/Users/ryan/github/generative-formalism/data/tex/table_rhyme_promptings.tex'

'/Users/ryan/github/generative-formalism/data/tex/table_num_poems_models.tex'

id,prompt_type,prompt,model,temperature,txt,num_lines
45e9aa8b,DO_rhyme,Write a poem in the style of Emily Dickinson.,ollama/llama3.1:8b,0.700000,"The silence falls like twilight's hush,\nA soo...",16
028a6893,do_NOT_rhyme,Write a poem (with 20+ lines) that does NOT rh...,gpt-3.5-turbo,0.603860,"In the stillness of the night, \nI find myself...",30
92f06dff,do_NOT_rhyme,Write an unrhymed poem.,ollama/llama3.1:8b,0.700000,Silence falls like a blanket over the city\nPe...,12
332c3e14,do_NOT_rhyme,Write a poem in free verse.,ollama/llama3.1:8b,0.700000,In the silence of dawn's whispered secrets\nI ...,16
a498586f,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.942433,"In the quiet village where the willows weep,\n...",40
...,...,...,...,...,...,...
39,do_NOT_rhyme,Write a poem that does NOT rhyme.,claude-3-haiku-20240307,0.700000,"Whispers in the wind,\nEchoing through the tre...",16
91,MAYBE_rhyme,Write a poem in stanzas of 4 lines each.,claude-3-opus-20240229,0.499500,"In fields of gold and amber hues,\nThe sun-kis...",20
153,DO_rhyme,Write a long poem that does rhyme.,claude-3-haiku-20240307,0.700000,"Through the winding, wooded ways,\nI wander on...",36
66,MAYBE_rhyme,Write a poem (with 20+ lines).,gpt-4-turbo,0.825700,"In the quiet heart of the twilight grove,\nWhe...",36
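
The aggregated table above lists the paper rows first and the replicated rows after them. A minimal sketch of that kind of concatenation follows. The function `combine_sources` and the `source` field are hypothetical additions for illustration; whether `get_all_genai_rhyme_promptings` records where each row came from is not shown in this notebook.

```python
def combine_sources(paper_rows, replicated_rows):
    """Concatenate both datasets, paper rows first, tagging each row's origin."""
    combined = [{**row, "source": "paper"} for row in paper_rows]
    combined += [{**row, "source": "replicated"} for row in replicated_rows]
    return combined


# Demo rows reuse two ids and model names from the tables in this notebook.
paper_rows = [{"id": "45e9aa8b", "model": "ollama/llama3.1:8b"}]
replicated_rows = [{"id": 39, "model": "claude-3-haiku-20240307"}]
all_rows = combine_sources(paper_rows, replicated_rows)
```

Tagging the origin keeps the two datasets separable after they are merged, which matters here because the paper ids are hash-like strings while the replicated ids are integers.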
