# Prompting for completions of poems

In [1]:
import sys
sys.path.append('../')  # add the parent directory to the import path
from generative_formalism import *

In [2]:
first_n_lines = FIRST_N_LINES
system_prompt = get_rhyme_completion_system_prompt(first_n_lines=first_n_lines)
printm(f'#### System prompt\n\n```md\n{system_prompt}\n```\n')


#### System prompt

```md
The following is the first 5 lines from a poem given in the user prompt, whose true number of lines is stated there.

Complete the poem – do this from memory if you know it; if not, imitate its style and theme for the same number of lines as in the original.

Return lines in tab-separated form, starting from line 6 up to the stated number of lines:

    line#	line

Do not return any other text besides these tab-separated lines.
```
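The system prompt above asks the model for `line#<TAB>line` rows and nothing else. In practice a response may still contain blank lines or stray commentary, so a tolerant parser is useful. The following is a minimal sketch of such a parser; `parse_completion` is a hypothetical helper written for illustration, not a function known to exist in `generative_formalism`.

```python
def parse_completion(response_text, first_n_lines=5):
    """Parse 'line#<TAB>line' rows into (line_number, text) tuples.

    Hypothetical helper: rows that do not match the requested format,
    and rows numbered at or below first_n_lines, are silently dropped.
    """
    parsed = []
    for raw in response_text.splitlines():
        row = raw.strip()
        if not row:
            continue  # blank line, e.g. a stanza break
        num, sep, text = row.partition("\t")
        if not sep or not num.strip().isdigit():
            continue  # not a tab-separated, numbered row
        line_num = int(num)
        if line_num > first_n_lines:
            parsed.append((line_num, text.strip()))
    return parsed


example = (
    "6\tAnd if ambition doesn't give it height\n"
    "\n"
    "7\tAnd if it only rises with a fight\n"
    "Here is the rest of the poem."
)
for line_num, text in parse_completion(example):
    print(line_num, text)
```

Dropping malformed rows rather than raising keeps a single chatty response from aborting a batch; whether the package does the same is an assumption.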


In [3]:
poem_eg = """
And if I have a soul my soul is green
And if it sings it doesn't sing to me
And if it loves it loves externally
Both what it has and what it hasn't seen

And if it's green it may as well be high
And if ambition doesn't give it height
And if it only rises with a fight
Against itself and not against the sky

If all the force it uses leaves me free
This proves it not just definite but right
"""

user_prompt = get_rhyme_completion_user_prompt(poem_eg, first_n_lines=first_n_lines)
printm(f'#### User prompt\n\n```md\n{user_prompt}\n```')


#### User prompt

```md
NUMBER OF LINES: 10

1	And if I have a soul my soul is green
2	And if it sings it doesn't sing to me
3	And if it loves it loves externally
4	Both what it has and what it hasn't seen

5	And if it's green it may as well be high
```
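The user prompt shown above has three visible features: a `NUMBER OF LINES` header counting the non-blank lines of the full poem, tab-separated line numbers, and stanza breaks preserved as blank lines. A minimal sketch that reproduces this layout follows; `build_user_prompt` is a hypothetical stand-in, and the actual `get_rhyme_completion_user_prompt` may be implemented differently.

```python
def build_user_prompt(poem_text, first_n_lines=5):
    """Format the first lines of a poem in the layout shown above.

    Hypothetical re-implementation for illustration only.
    """
    lines = poem_text.strip().splitlines()
    # Only non-blank lines count toward the stated poem length.
    total = sum(1 for line in lines if line.strip())
    out, count = [], 0
    for line in lines:
        if not line.strip():
            if out and out[-1] != "":
                out.append("")  # keep one blank line as a stanza break
            continue
        count += 1
        out.append(f"{count}\t{line.strip()}")
        if count >= first_n_lines:
            break
    return f"NUMBER OF LINES: {total}\n\n" + "\n".join(out)


demo_poem = "a first line\na second line\n\na third line\na fourth line\n"
print(build_user_prompt(demo_poem, first_n_lines=3))
```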

In [4]:
documentation(generate_more_completions, signature=True)

**`generate_more_completions`**

```md

    Generate more poem completions using various models and source poems.

    This function generates additional poem completions by sampling from available models
    and source poems from the Chadwyck corpus, with intelligent prioritization of
    underrepresented combinations to ensure balanced data collection across different
    model-poem pairs.

    Args:
        n (int, optional): Number of completions to generate. Defaults to 3.
        df_sofar (pd.DataFrame, optional): Existing dataframe of generated completions to build upon.
            If None, loads all existing rhyme completions. Defaults to None.
        models (list, optional): List of model identifiers to use for generation.
            Defaults to MODEL_LIST from constants.
        first_n_lines (list, optional): List of possible first_n_lines values to use.
            Defaults to [2, 5].
        temperatures (list, optional): List of temperature values for generation.
            If None, uses default temperature. Defaults to None.
        verbose (bool, optional): Whether to print progress and status information.
            Defaults to True.
        force (bool, optional): Whether to force regeneration even if cached results exist.
            Defaults to False.
        max_n_combo (int, optional): Maximum number of entries allowed per model-poem
            combination. If provided, model-poem pairs that already have this many
            or more entries will be excluded from selection. Defaults to None (no limit).
        source_poems_sample (str, optional): Which corpus sample to use for source poems.
            Options: 'period', 'rhyme', 'period_subcorpus'. Defaults to 'period'.

    Returns:
        list: List of dictionaries containing generated completion data, including model,
            source poem ID, first_n_lines, temperature, and completion results.

    Note:
        The function uses inverse probability weighting to prioritize model-poem
        combinations that have been used less frequently, ensuring balanced sampling
        across the available options. Models that consistently fail are temporarily
        excluded from further attempts.
    
```
----


*Call signature*

```md
generate_more_completions(
    n=3
    df_sofar=None
    models=[   'claude-3-haiku-20240307',
    'claude-3-opus-20240229',
    'claude-3-sonnet-20240229',
    'deepseek/deepseek-chat',
    'gemini-pro',
    'gpt-3.5-turbo',
    'gpt-4-turbo',
    'ollama/llama3.1:70b',
    'ollama/llama3.1:8b',
    'ollama/olmo2',
    'ollama/olmo2:13b']
    first_n_lines=5
    temperatures=[0.7]
    verbose=True
    force=True
    max_n_combo=None
    source_poems_sample='period'
)
```
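The docstring's note describes inverse probability weighting over model-poem combinations, and `max_n_combo` excludes pairs that already have enough entries. The sketch below illustrates both ideas under stated assumptions: `pick_combo` is a hypothetical function, and the weight `1 / (1 + count)` is one plausible choice, not necessarily the formula the package uses.

```python
import random
from collections import Counter


def pick_combo(models, poem_ids, counts, max_n_combo=None, rng=random):
    """Sample one (model, poem_id) pair, favouring rarely used pairs.

    Hypothetical sketch: `counts` maps (model, poem_id) to the number of
    completions generated so far for that pair.
    """
    candidates = [
        (m, p)
        for m in models
        for p in poem_ids
        # Exclude pairs that already reached the cap, if a cap is set.
        if max_n_combo is None or counts.get((m, p), 0) < max_n_combo
    ]
    if not candidates:
        return None
    # Assumed weighting: inversely proportional to how often a pair was used.
    weights = [1.0 / (1 + counts.get(c, 0)) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


counts = Counter({("gpt-4-turbo", "poem_a"): 2})
print(pick_combo(["gpt-4-turbo"], ["poem_a", "poem_b"], counts, max_n_combo=2))
```

With `max_n_combo=2`, the pair that already has two entries is excluded, so only the unused pair can be drawn.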

In [5]:
models = [
    'claude-3-haiku-20240307',
    'claude-3-opus-20240229',
    'claude-3-sonnet-20240229',
    'deepseek/deepseek-chat',
    'gemini-pro',
    'gpt-3.5-turbo',
    'gpt-4-turbo',
    'ollama/llama3.1:70b',
    'ollama/llama3.1:8b',
]
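first_n_lines = FIRST_N_LINES
temperatures = [DEFAULT_TEMPERATURE]
verbose = False
force = False  # reuse cached results where they exist
max_n_combo = 25  # skip model-poem pairs that already have 25 or more entries
source_poems_sample = 'period'
n_to_gen = 3  # number of completions to generate in this run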

generate_more_completions(
    n=n_to_gen,
    models=models,
    first_n_lines=first_n_lines,
    temperatures=temperatures,
    verbose=verbose,
    force=force,
    max_n_combo=max_n_combo,
    source_poems_sample=source_poems_sample,
)

* Loading period sample from /Users/rj416/github/generative-formalism/data/corpus_sample_by_period.replicated.csv.gz
* Loading legacy genai rhyme completions from {PATH_REPO}/data/corpus_genai_rhyme_completions.csv.gz
* Found 11298 unique human poems for input to models
* Found 21130 unique generated poems
* Distribution of input poem lengths
  6   [ 14 |    36 ]------                                              230
* Distribution of output poem lengths
  10  -- | 17 ]--                                                     100
* Computing line similarity


100%|██████████| 326862/326862 [00:02<00:00, 117089.53it/s]


* Filtered out 169 recognized poems


>>> gpt-4-turbo (n_model=0, n_poem=0, n_combo=0): poem english/wolcotjo/Z300541100 (first_5): 100%|██████████| 3/3 [00:29<00:00,  9.81s/it]          


| | id | model | id_human | first_n_lines | temperature | response |
|---|---|---|---|---|---|---|
| 0 | 94bb2fc2 | ollama/llama3.1:8b | english/anderso2/Z300260593 | 5 | 0.7 | id stanza_num line_num ... |
| 1 | e6968fde | claude-3-opus-20240229 | american/am0172/Z200149333 | 5 | 0.7 | id stanza_num line_num ... |
| 2 | 59891ca7 | gpt-4-turbo | english/wolcotjo/Z300541100 | 5 | 0.7 | id stanza_num line_num ... |
