<a href="https://colab.research.google.com/github/mayamarshel/CART498/blob/main/A2_Script.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 2
## P+7 (Oulipian language modelling)

### Background

The Oulipo (*Ouvroir de Littérature Potentielle*, or “Workshop of Potential Literature”) is a French literary group founded in 1960 by writer Raymond Queneau and mathematician François Le Lionnais. The group focuses on using rules and constraints in writing as a way to spark creativity. Rather than seeing constraints as obstacles, Oulipians treat them as tools to inspire new forms of storytelling and poetry. Their work combines mathematics, language, and playfulness, making their approach both unique and influential in modern literature.

One of the most famous Oulipian writers is Georges Perec, who is known for his creative use of constraints. His novel *La Disparition* (“A Void”) is written entirely without the letter "e," which is especially challenging given how common "e" is in French. Perec’s writing often plays with the structure of language in surprising ways. One popular Oulipian technique is N+7, where each noun in a text is replaced by the noun seven entries later in a dictionary. This creates unusual, absurd, and often funny results, encouraging writers to think differently about language and meaning.

![Georges Perec](https://upload.wikimedia.org/wikipedia/commons/7/76/Myart_georges-perec_1978.jpg)

(Georges Perec, 1978. Source: Wikimedia Commons)



### How it works

The N+7 procedure is straightforward:

- **Start with a text**. Choose any text—this could be a poem, a sentence, or a passage.
- **Use a dictionary**. Have a dictionary (or word list) handy.
- **Replace each noun**. Replace every noun in the original text with the noun that appears seven nouns later in the dictionary. If you reach the end of the dictionary, loop back to the beginning.
- **Maintain grammar**. Keep the new text as grammatical as possible, though the results often turn out surreal and nonsensical.

For example, apply the N+7 technique with a standard English dictionary to the sentence:

*The cat sat on the mat*.

- “Cat” → 7 nouns after “cat” is “catalog.”
- “Mat” → 7 nouns after “mat” is “material.”

This results in:

*The catalog sat on the material.*
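
The replacement step can be sketched in Python. This is a minimal illustration under stated assumptions, not a full implementation: `NOUNS` is a hypothetical 14-word list standing in for a real dictionary, and a word is treated as a noun only if it appears in that list (a real version would need part-of-speech tagging). Because the toy list is not a real dictionary, its output differs from the catalog/material example above.

```python
import re

# Hypothetical toy noun list standing in for a real dictionary.
# It is sorted so that "seven entries later" is well defined.
NOUNS = sorted([
    "apple", "bird", "cat", "dog", "egg", "fish", "goat",
    "hat", "ink", "jar", "kite", "lamp", "mat", "nut",
])


def n_plus_k(text, nouns=NOUNS, k=7):
    """Replace every word found in `nouns` with the entry k places later."""
    position = {noun: i for i, noun in enumerate(nouns)}

    def swap(match):
        word = match.group(0)
        i = position.get(word.lower())
        if i is None:
            return word  # not in the noun list: leave unchanged
        # Wrap around to the start when the end of the list is reached
        return nouns[(i + k) % len(nouns)]

    return re.sub(r"[A-Za-z]+", swap, text)


result = n_plus_k("The cat sat on the mat.")
print(result)
```

With this toy list the call returns `The jar sat on the fish.`, and "mat" demonstrates the wrap-around rule because it sits near the end of the list.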

### Assignment and deliverables

For this assignment, you will create a variation of the N+7 technique, which we will call P+7. Using the GPT-2 language model, you will replace the last word of each line from *The Snow Man* with the word that has the seventh-highest probability according to the model’s predictions.

By the end of this assignment, submit a link to a GitHub repository named `CART498-GenAI` containing a folder labelled `A02` with the following items:

- A version of the text processed with your P+7 technique, saved as a `.txt` file.
- A second version of the text processed with `P+x`. Choose an `x` value that produces the funniest, wittiest, or most absurd version of the original text. Save this as a `.txt` file, and include the `x` value in the filename (e.g., `P+23.txt` or `P+12.txt`).
- A Python notebook with the script used to generate your P+7 and P+x transformations using the GPT-2 language model.
- A short reflection (250–350 words) explaining how altering the `x` value impacted the output of your P+x version. Additionally, discuss how you would implement a P+7 technique in which all nouns are replaced with their seventh-highest probability alternatives.

### *The Snow Man*
by Wallace Stevens (1879-1955)


> One must have a mind of winter  
> To regard the frost and the boughs  
> Of the pine-trees crusted with snow;  
> And have been cold a long time  
> To behold the junipers shagged with ice,  
> The spruces rough in the distant glitter  
> Of the January sun; and not to think  
> Of any misery in the sound of the wind,  
> In the sound of a few leaves,  
> Which is the sound of the land  
> Full of the same wind  
> That is blowing in the same bare place  
> For the listener, who listens in the snow,  
> And, nothing himself, beholds  
> Nothing that is not there and the nothing that is.  




In [4]:
# prompt: write a prompt that takes a series of sentences separated by a dash - and then removes the last word from the sentence
# puts them into an array called sentences. then print the array

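def process_prompt(prompt):
    """Split the prompt on '-' and return each line with its last word removed."""
    sentences = []
    for sentence in prompt.split('-'):
        words = sentence.strip().split()
        if words:  # Skip empty segments (e.g. from a doubled delimiter)
            sentences.append(" ".join(words[:-1]))  # Remove the last word
    return sentences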

# Poem lines are joined with '-' as the delimiter, so "pine-trees" is written
# without its hyphen here to avoid splitting that line in two.
prompt = "One must have a mind of winter- To regard the frost and the boughs- Of the pine trees crusted with snow;- And have been cold a long time- To behold the junipers shagged with ice,- The spruces rough in the distant glitter- Of the January sun; and not to think- Of any misery in the sound of the wind,- In the sound of a few leaves,- Which is the sound of the land- Full of the same wind- That is blowing in the same bare place- For the listener, who listens in the snow,- And, nothing himself, beholds- Nothing that is not there and the nothing that is."

sentences = process_prompt(prompt)
sentences

['One must have a mind of',
 'To regard the frost and the',
 'Of the pine trees crusted with',
 'And have been cold a long',
 'To behold the junipers shagged with',
 'The spruces rough in the distant',
 'Of the January sun; and not to',
 'Of any misery in the sound of the',
 'In the sound of a few',
 'Which is the sound of the',
 'Full of the same',
 'That is blowing in the same bare',
 'For the listener, who listens in the',
 'And, nothing himself,',
 'Nothing that is not there and the nothing that']
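
Splitting on a dash forces hyphenated words such as "pine-trees" to be rewritten. An alternative sketch, assuming the poem is kept one line per row in a multi-line string, splits on line breaks instead and also keeps the removed words for later comparison. The names `POEM_EXCERPT` and `strip_last_words` are illustrative and do not appear in the notebook.

```python
# Three lines of the poem, one per row (excerpt only, for illustration)
POEM_EXCERPT = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;"""


def strip_last_words(text):
    """Return (stems, removed): each line without its last word, and that word."""
    stems, removed = [], []
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)  # split once, from the right
        if not parts:
            continue  # skip blank lines
        if len(parts) == 1:
            # A one-word line leaves an empty stem
            stems.append("")
            removed.append(parts[0])
        else:
            stems.append(parts[0])
            removed.append(parts[1])
    return stems, removed


stems, removed = strip_last_words(POEM_EXCERPT)
print(stems)
print(removed)
```

Here the hyphen in "pine-trees" survives, and trailing punctuation stays attached to the removed word (`snow;`), which may matter when reassembling the line.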

In [32]:
# prompt: now take the array of sentences and for each item in the array use the gpt 2 model to generate the 7 most likely words to come after the sentence and add them to an array of words that is titled with the sentence number they correspond to

!pip install transformers

from transformers import pipeline, GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"  # or a specific GPT-2 variant
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Create a text generation pipeline
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)


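def p_plus_7(sentences):
    results = {}
    for i, sentence in enumerate(sentences):
        # Generate one continuation with GPT-2. Note: max_length counts tokens
        # (prompt included), while len(sentence.split()) counts words, so the
        # number of newly generated tokens is only roughly 7.
        generated_text = generator(sentence, max_length=len(sentence.split()) + 7, num_return_sequences=1)
        # Drop the prompt words and keep only the generated continuation
        generated_words = generated_text[0]['generated_text'].split()[len(sentence.split()):]

        # Keep up to 7 words of the continuation. These are consecutive
        # generated words, not the 7 highest-probability candidates.
        top_7_words = generated_words[:7]
        results[f"Sentence {i+1}"] = top_7_words
    return results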


p7_results = p_plus_7(sentences)

for sentence_num, words in p7_results.items():
    # Use the first generated word as the replacement for the removed last word
    last_word = words[0] if words else "No words generated"
    original_sentence = sentences[int(sentence_num.split()[1]) - 1]  # Get the original sentence
    print(f"{original_sentence} {last_word}")



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

One must have a mind of what
To regard the frost and the thaw
Of the pine trees crusted with needles,
And have been cold a long time,
To behold the junipers shagged with their
The spruces rough in the distant forest,
Of the January sun; and not to the
Of any misery in the sound of the war,
In the sound of a few more
Which is the sound of the rain...
Full of the same the
That is blowing in the same bare hands
For the listener, who listens in the foreground,
And, nothing himself, I
Nothing that is not there and the nothing that is
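
The cell above appends the first word of a generated continuation. To pick the candidate ranked seventh by probability, as the assignment describes, one option is to read the next-token scores directly instead of generating text. The sketch below is one possible approach, not a reference solution: `rank_to_index`, `load_gpt2`, and `p_plus_x` are hypothetical names, and the heavy imports sit inside the functions so the ranking helper can be tried without `torch` or `transformers` installed. GPT-2 predicts tokens rather than words, so the appended piece may be a word fragment.

```python
def rank_to_index(scores, rank):
    """Return the index of the rank-th highest score (rank=1 is the top score)."""
    if not 1 <= rank <= len(scores):
        raise ValueError("rank must be between 1 and len(scores)")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[rank - 1]


def load_gpt2(model_name="gpt2"):
    """Load the tokenizer and model once (requires transformers and torch)."""
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def p_plus_x(stem, tokenizer, model, x=7):
    """Append the token that GPT-2 ranks x-th as the continuation of `stem`."""
    import torch

    inputs = tokenizer(stem, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)
    # Scores for the token that would follow the last token of the stem
    next_token_scores = logits[0, -1].tolist()
    token_id = rank_to_index(next_token_scores, x)
    # GPT-2 tokens usually carry their own leading space, so no separator is added
    return stem + tokenizer.decode([token_id])
```

To use it with the `sentences` list from the earlier cell, call `tokenizer, model = load_gpt2()` once and then `p_plus_x(s, tokenizer, model, x=7)` for each stem `s`. Because no sampling is involved, repeated runs with the same model should give the same replacement.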
