# Prompting for un/rhyming poems

## Setting prompts and models

To change the prompts and models, edit `PROMPT_LIST` and `MODEL_LIST` in `constants.py`.

In [1]:
# Imports
import sys
sys.path.append('../')
from generative_formalism import *

### Setting prompts

In [2]:
documentation(describe_prompts, signature=False)

**`describe_prompts`**

```md
Print a description of the prompts with statistics and details.
    
    Args:
        prompts: List of prompt strings to describe. Defaults to PROMPT_LIST.
        prompt_to_type: Dictionary mapping prompts to their types. Defaults to PROMPT_TO_TYPE.
    
```
----


In [3]:
# To override the defaults, set:
# PROMPT_LIST = []
# PROMPT_TO_TYPE = {}

describe_prompts(
    prompts=PROMPT_LIST,
    prompt_to_type=PROMPT_TO_TYPE
)

* 23 unique prompts
* 3 prompt types

* List of prompts:
  ['Write a poem in ballad stanzas.',
 "Write an ryhmed poem in the style of Shakespeare's sonnets.",
 'Write a long poem that does rhyme.',
 'Write a poem in the style of Emily Dickinson.',
 'Write a poem in heroic couplets.',
 'Write an rhyming poem.',
 'Write a poem (with 20+ lines) that rhymes.',
 'Write a poem that does rhyme.',
 'Write a short poem that does rhyme.',
 'Write a poem that does NOT rhyme.',
 'Write a poem (with 20+ lines) that does NOT rhyme.',
 'Write a long poem that does NOT rhyme.',
 'Write a poem in the style of Walt Whitman.',
 'Write a poem in free verse.',
 'Write a poem in blank verse.',
 'Write an unrhymed poem.',
 'Write a short poem that does NOT rhyme.',
 'Write a poem (with 20+ lines).',
 'Write a long poem.',
 'Write a poem in groups of two lines.',
 'Write a poem.',
 'Write a poem in stanzas of 4 lines each.',
 'Write a short poem.']

* List of prompt types:
  {'DO_rhyme': ['Write a poem in bal
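The relationship between the prompt list and the prompt types can be sketched as a simple inversion of a prompt-to-type mapping. This is a minimal illustration, not the package's implementation: `sample_prompt_to_type` and `group_prompts_by_type` are hypothetical names, and only two of the three prompt types are shown because the third label is cut off in the output above.

```python
from collections import defaultdict

# Hypothetical stand-in for PROMPT_TO_TYPE, using three prompts from the list above.
sample_prompt_to_type = {
    "Write a long poem that does rhyme.": "DO_rhyme",
    "Write a poem that does rhyme.": "DO_rhyme",
    "Write a poem that does NOT rhyme.": "do_NOT_rhyme",
}


def group_prompts_by_type(prompt_to_type):
    """Invert a prompt -> type mapping into type -> list of prompts."""
    grouped = defaultdict(list)
    for prompt, prompt_type in prompt_to_type.items():
        grouped[prompt_type].append(prompt)
    return dict(grouped)


type_to_prompts = group_prompts_by_type(sample_prompt_to_type)
print(f"* {len(sample_prompt_to_type)} unique prompts")
print(f"* {len(type_to_prompts)} prompt types")
```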

### Setting models

In [4]:
# Models

# To override the defaults, set:
# MODEL_LIST = []
# MODEL_TO_NAME = {}
# MODEL_TO_TYPE = {}

describe_models(models=MODEL_LIST, model_to_type=MODEL_TO_TYPE)

* 11 models (counting parameter changes)
  * 6 model types (ChatGPT, Claude, DeepSeek, Gemini, Llama, Olmo)
  * Using models:
  {   'ChatGPT': ['gpt-3.5-turbo', 'gpt-4-turbo'],
    'Claude': [   'claude-3-haiku-20240307',
                  'claude-3-opus-20240229',
                  'claude-3-sonnet-20240229'],
    'DeepSeek': ['deepseek/deepseek-chat'],
    'Gemini': ['gemini-pro'],
    'Llama': ['ollama/llama3.1:70b', 'ollama/llama3.1:8b'],
    'Olmo': ['ollama/olmo2', 'ollama/olmo2:13b']}
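When only some model families should be prompted (for example, the locally served Ollama models), a subset can be picked from a model-to-type mapping. The sketch below assumes a dictionary shaped like `MODEL_TO_TYPE`; `sample_model_to_type` and `models_of_types` are hypothetical names used only for illustration.

```python
# Hypothetical excerpt of a model -> type mapping, based on the listing above.
sample_model_to_type = {
    "gpt-3.5-turbo": "ChatGPT",
    "claude-3-haiku-20240307": "Claude",
    "ollama/llama3.1:8b": "Llama",
    "ollama/olmo2": "Olmo",
}


def models_of_types(model_to_type, wanted_types):
    """Return the sorted model identifiers whose type is in wanted_types."""
    return sorted(
        model for model, model_type in model_to_type.items()
        if model_type in wanted_types
    )


local_models = models_of_types(sample_model_to_type, {"Llama", "Olmo"})
print(local_models)
```

A list built this way could then be passed wherever a `models` argument is accepted.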
  


### Prompting models

In [1]:
documentation(check_api_keys)
check_api_keys()


In [10]:
documentation(generate_rhyme_prompt_text)

**`generate_rhyme_prompt_text`**

```md

    Convenience function for generate_text using rhyme stash.

    Args:
        args: Arguments for generate_text
        stash: Stash to use for caching. Defaults to STASH_GENAI_RHYME_PROMPTS.
        verbose: Whether to print verbose output. Defaults to True.
        kwargs: Keyword arguments for generate_text

    Returns:
        str: The generated text
    
```
----


In [11]:
documentation(generate_text)

**`generate_text`**

```md
Generate text with caching support (synchronous interface).
    
    This is the main text generation function that includes caching capabilities.
    Results are cached based on the combination of model, prompt, temperature, and system_prompt.
    
    Args:
        model: The model identifier
        prompt: The user prompt/input text
        temperature: Sampling temperature for text generation (0.0-1.0)
        system_prompt: Optional system prompt/instruction
        verbose: If True, print the complete response to stdout
        force: If True, bypass cache and force new generation
        stash: Cache storage backend for results
        
    Returns:
        str: The complete generated text response (from cache or new generation)
    
```
----
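The caching behavior described in the docstring can be illustrated with a small sketch. It assumes a dictionary-like stash and a stand-in generation function; `cache_key`, `cached_generate`, and `fake_generate` are hypothetical names, and the default temperature of 0.7 is an assumption. The package's actual stash backend and key format may differ.

```python
import hashlib
import json


def cache_key(model, prompt, temperature, system_prompt=None):
    """Build a stable key from the four fields the docstring says are cached on."""
    payload = json.dumps(
        {
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "system_prompt": system_prompt,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_generate(generate_fn, stash, model, prompt,
                    temperature=0.7, system_prompt=None, force=False):
    """Return a cached response unless force=True, in which case regenerate."""
    key = cache_key(model, prompt, temperature, system_prompt)
    if not force and key in stash:
        return stash[key]
    text = generate_fn(model, prompt, temperature, system_prompt)
    stash[key] = text
    return text


calls = []


def fake_generate(model, prompt, temperature, system_prompt):
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return f"[{model}] response #{len(calls)}"


stash = {}  # a plain dict standing in for the cache backend
first = cached_generate(fake_generate, stash, "demo-model", "Write a poem.")
second = cached_generate(fake_generate, stash, "demo-model", "Write a poem.")
third = cached_generate(fake_generate, stash, "demo-model", "Write a poem.", force=True)
```

Here the second call is served from the stash, while the third bypasses it and overwrites the cached entry.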


In [12]:
documentation(stream_llm)

**`stream_llm`**

```md
Universal streaming interface for language models.
    
    Automatically routes to the appropriate streaming function based on the model name.
    Google models (containing 'gemini') use the Google Generative AI API,
    all others use LiteLLM.
    
    Args:
        model: The model identifier
        prompt: The user prompt/input text
        temperature: Sampling temperature for text generation (0.0-1.0)
        system_prompt: Optional system prompt/instruction
        verbose: If True, print tokens to stdout as they're generated
        
    Yields:
        str: Individual tokens/text chunks from the model response
    
```
----
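The routing rule in the docstring (model names containing 'gemini' go to the Google API, everything else to LiteLLM) can be sketched with stand-in backends. Nothing below calls a real API: `pick_backend`, `stream_sketch`, and the two fake streaming functions are hypothetical and only mimic the generator interface.

```python
def pick_backend(model):
    """Choose a backend name from the model identifier."""
    return "google" if "gemini" in model else "litellm"


def fake_google_stream(model, prompt):
    """Stand-in for a Google Generative AI streaming call."""
    yield from ["Leaves ", "fall"]


def fake_litellm_stream(model, prompt):
    """Stand-in for a LiteLLM streaming call."""
    yield from ["Rain ", "falls"]


BACKENDS = {"google": fake_google_stream, "litellm": fake_litellm_stream}


def stream_sketch(model, prompt, verbose=False):
    """Yield text chunks from whichever backend the model name routes to."""
    for chunk in BACKENDS[pick_backend(model)](model, prompt):
        if verbose:
            print(chunk, end="", flush=True)
        yield chunk


text = "".join(stream_sketch("gemini-pro", "Write a poem."))
```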


#### Demo of prompting a model

In [2]:
if REPLICATE_LLM_DEMO:
    # Pick a demo model and prompt, then generate (or load from cache) a response
    demo_model, demo_prompt = get_demo_model_prompt()

    response_str = generate_rhyme_prompt_text(
        model=demo_model,
        prompt=demo_prompt,
        verbose=True,
        force=REPLICATE_OVERWRITE
    )


#### Generate more poems from rhyme prompts

In [14]:
documentation(generate_more_poems_from_rhyme_prompts)

**`generate_more_poems_from_rhyme_prompts`**

```md

    Generate more poems from rhyme prompts using various models and configurations.
    
    This function generates additional poems by sampling from available models and prompts,
    with intelligent prioritization of underrepresented combinations to ensure balanced
    data collection across different model-prompt pairs.
    
    Args:
        n (int, optional): Number of poems to generate. Defaults to 3.
        df_sofar (pd.DataFrame, optional): Existing dataframe of generated poems to build upon.
            If None, loads all existing rhyme promptings. Defaults to None.
        models (list, optional): List of model identifiers to use for generation.
            Defaults to MODEL_LIST from constants.
        prompts (list, optional): List of prompt templates to use for generation.
            Defaults to PROMPT_LIST from constants.
        temperatures (list, optional): List of temperature values for generation.
            If None, uses default temperature. Defaults to None.
        verbose (bool, optional): Whether to print progress and status information.
            Defaults to True.
        force (bool, optional): Whether to force regeneration even if cached results exist.
            Defaults to False.
        max_n_combo (int, optional): Maximum number of entries allowed per model-prompt
            combination. If provided, model-prompt pairs that already have this many
            or more entries will be excluded from selection. Defaults to None (no limit).
    
    Returns:
        list: List of dictionaries containing generated poem data, including model,
            prompt, temperature, generated text, and metadata.
    
    Note:
        The function uses inverse probability weighting to prioritize model-prompt
        combinations that have been used less frequently, ensuring balanced sampling
        across the available options. Models that consistently fail are temporarily
        excluded from further attempts.
    
```
----
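The inverse probability weighting mentioned in the note can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: `sample_combo` is a hypothetical helper, the weight `1 / (count + 1)` is one plausible choice of inverse weighting, and the model names are placeholders.

```python
import random
from collections import Counter


def sample_combo(models, prompts, counts, max_n_combo=None, rng=random):
    """Pick one (model, prompt) pair, favoring pairs with fewer existing poems."""
    combos = [
        (model, prompt)
        for model in models
        for prompt in prompts
        # Drop pairs that already reached the per-combination cap.
        if max_n_combo is None or counts.get((model, prompt), 0) >= 0
        and counts.get((model, prompt), 0) < max_n_combo
    ]
    if not combos:
        return None
    # Inverse weighting: the more entries a pair has, the less likely it is drawn.
    weights = [1.0 / (counts.get(combo, 0) + 1) for combo in combos]
    return rng.choices(combos, weights=weights, k=1)[0]


models = ["model-a", "model-b"]  # placeholder model names
prompts = ["Write a poem.", "Write a short poem."]
counts = Counter({
    ("model-a", "Write a poem."): 25,       # already at the cap, so excluded
    ("model-a", "Write a short poem."): 3,  # sampled less often than unseen pairs
})

rng = random.Random(0)
picks = Counter(
    sample_combo(models, prompts, counts, max_n_combo=25, rng=rng)
    for _ in range(1000)
)
```

With these counts, the capped pair is never drawn and the two `model-b` pairs, which have no poems yet, are drawn most often.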


In [None]:
# Set params
models = [
    'claude-3-haiku-20240307',
    # 'claude-3-opus-20240229',
    # 'claude-3-sonnet-20240229',
    'deepseek/deepseek-chat',
    # 'gemini-pro',
    # 'gpt-3.5-turbo',
    # 'gpt-4-turbo',
    # 'ollama/llama3.1:70b',
    # 'ollama/llama3.1:8b',
    # 'ollama/olmo2',
    # 'ollama/olmo2:13b'
]
n = 100
verbose = False

# Run if enabled
df_newdata = pd.DataFrame()
if n > 0 and REPLICATE_LLM_DATA:
    df_newdata = generate_more_poems_from_rhyme_prompts(
        n=n,
        models=models,
        prompts=PROMPT_LIST,
        temperatures=[DEFAULT_TEMPERATURE],
        verbose=verbose,
        force=REPLICATE_OVERWRITE,
        max_n_combo=25
    )

# Show the new data
df_newdata

>>> deepseek/deepseek-chat (n_model=707, n_prompt=159, n_combo=11): "Write a short poem that does rhyme.":  10%|█         | 10/100 [00:52<09:21,  6.24s/it]     


### Collecting replicated data

In [None]:
documentation(get_genai_rhyme_promptings_as_replicated)

**`get_genai_rhyme_promptings_as_replicated`**

```md
None
```
----


In [None]:
df_genai_rhyme_promptings_as_replicated = get_genai_rhyme_promptings_as_replicated(display=False)
df_genai_rhyme_promptings_as_replicated.head()


* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 329 generated poems
  * 329 generated responses
  * 224 unique responses
  * 224 unique poems
  * 23 unique prompts
  * 3 unique prompt types


| index | prompt_type | prompt | model | temperature | txt | num_lines |
|---|---|---|---|---|---|---|
| 79 | DO_rhyme | Write a long poem that does rhyme. | gpt-3.5-turbo | 0.3887 | In a world of chaos and strife,\nWhere darknes... | 32 |
| 207 | do_NOT_rhyme | Write a poem in free verse. | claude-3-haiku-20240307 | 0.7 | The breeze caresses my face,\nGently rustling ... | 17 |
| 34 | do_NOT_rhyme | Write a poem that does NOT rhyme. | claude-3-haiku-20240307 | 0.7 | Beneath the endless sky,\nWhispers of the wind... | 15 |
| 211 | do_NOT_rhyme | Write a short poem that does NOT rhyme. | claude-3-haiku-20240307 | 0.7 | Whispers of the wind,\nEchoing through the tre... | 12 |
| 181 | do_NOT_rhyme | Write a long poem that does NOT rhyme. | claude-3-haiku-20240307 | 0.7 | In the stillness of the night,\nWhen the world... | 24 |
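The gap between generated and unique responses reported above (329 versus 224) comes from counting distinct texts. A plain-Python sketch of that kind of summary over a JSON Lines stash is shown below. The record layout is an assumption: the field names `prompt` and `txt` mirror the table columns, but the real `.jsonl` file may be structured differently, and `summarize_jsonl` is a hypothetical helper.

```python
import io
import json


def summarize_jsonl(lines):
    """Count total responses, unique responses, and unique prompts."""
    records = [json.loads(line) for line in lines if line.strip()]
    texts = [record["txt"] for record in records]
    return {
        "generated responses": len(texts),
        "unique responses": len(set(texts)),
        "unique prompts": len({record["prompt"] for record in records}),
    }


# Made-up records standing in for the stash file; the second repeats the first.
fake_stash = io.StringIO(
    '{"prompt": "Write a poem.", "txt": "first made-up poem"}\n'
    '{"prompt": "Write a poem.", "txt": "first made-up poem"}\n'
    '{"prompt": "Write a short poem.", "txt": "second made-up poem"}\n'
)

summary = summarize_jsonl(fake_stash)
for label, count in summary.items():
    print(f"* {count} {label}")
```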


In [None]:
# All together
df_all_rhyme_promptings = get_all_genai_rhyme_promptings(display=False)
df_all_rhyme_promptings

* Collecting genai rhyme promptings as used in paper
  * Collecting from /Users/ryan/github/generative-formalism/data/corpus_genai_promptings.csv.gz
  * 17,988 generated responses
  * 16,935 unique responses
  * 16,871 unique poems
  * 23 unique prompts
  * 3 unique prompt types

* Collecting genai rhyme promptings as replicated here
  * Collecting from /Users/ryan/github/generative-formalism/data/stash/genai_rhyme_prompts.jsonl
  * 329 generated poems
  * 329 generated responses
  * 224 unique responses
  * 224 unique poems
  * 23 unique prompts
  * 3 unique prompt types


| index | prompt_type | prompt | model | temperature | txt | num_lines |
|---|---|---|---|---|---|---|
| e0cef7a6 | do_NOT_rhyme | Write a poem (with 20+ lines) that does NOT rh... | gpt-3.5-turbo | 0.568162 | In the stillness of the night, I find myself l... | 24 |
| 62239ba3 | do_NOT_rhyme | Write a poem that does NOT rhyme. | gpt-3.5-turbo | 1.214958 | In the darkness of the night,\nI wander alone,... | 16 |
| 1adb546c | do_NOT_rhyme | Write a poem that does NOT rhyme. | ollama/llama3.1:8b | 0.700000 | Silence falls like a blanket over the city\nA ... | 12 |
| 02cb4c92 | do_NOT_rhyme | Write a long poem that does NOT rhyme. | ollama/llama3.1:70b | 0.700000 | In the depths of existence, where shadows roam... | 28 |
| 20a14bc7 | do_NOT_rhyme | Write a poem in the style of Walt Whitman. | gpt-3.5-turbo | 0.376548 | O captain, my captain, the journey is long\nTh... | 16 |
| ... | ... | ... | ... | ... | ... | ... |
| 258 | DO_rhyme | Write a long poem that does rhyme. | gpt-3.5-turbo | 0.700000 | In the heart of the forest, where shadows danc... | 36 |
| 87 | DO_rhyme | Write a poem that does rhyme. | claude-3-haiku-20240307 | 0.292400 | The autumn breeze blows soft and cool,\nCaress... | 12 |
| 178 | do_NOT_rhyme | Write a long poem that does NOT rhyme. | claude-3-haiku-20240307 | 0.700000 | The gentle breeze caresses my face,\nCarrying ... | 36 |
| 198 | do_NOT_rhyme | Write a poem (with 20+ lines) that does NOT rh... | claude-3-haiku-20240307 | 0.700000 | The whispers of the wind caress my face,\nCarr... | 28 |

