<a href="https://colab.research.google.com/github/leahiscoding/CART498/blob/main/A02_P%2B7_OULIPO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 2
## P+7 (Oulipian language modelling)

### Background

The Oulipo (*Ouvroir de Littérature Potentielle*, or “Workshop of Potential Literature”) is a French literary group founded in 1960 by writer Raymond Queneau and mathematician François Le Lionnais. The group focuses on using rules and constraints in writing as a way to spark creativity. Rather than seeing constraints as obstacles, Oulipians treat them as tools to inspire new forms of storytelling and poetry. Their work combines mathematics, language, and playfulness, making their approach both unique and influential in modern literature.

One of the most famous Oulipian writers is Georges Perec, who is known for his creative use of constraints. His novel [*La Disparition* (“A Void”) is written entirely without the letter "e,"](https://archive.org/details/void00pere/mode/2up) which is especially challenging given how common "e" is in French. Perec’s writing often plays with the structure of language in surprising ways.

One popular Oulipian technique is N+7, where each noun in a text is replaced by the noun seven entries later in a dictionary. This creates unusual, absurd, and often funny results, encouraging writers to think differently about language and meaning.

![Georges Perec](https://upload.wikimedia.org/wikipedia/commons/7/76/Myart_georges-perec_1978.jpg)

(Georges Perec, 1978. From Wikidata)



## How it works

The N+7 process is straightforward:

- **Start with a text**. Choose any text—this could be a poem, a sentence, or a passage.
- **Use a dictionary**. Have a dictionary (or word list) handy.
- **Replace each noun**. For every noun in the original text, substitute the noun that appears seven nouns later in the dictionary. If you reach the end of the dictionary, loop back to the beginning.
- **Maintain grammar**. Keep the new text as grammatically correct as possible, even though the results are often surreal and nonsensical.

For example, using the N+7 technique with a standard English dictionary for the original sentence:

*The cat sat on the mat*.

- “Cat” → 7 nouns after “cat” is “catalog.”
- “Mat” → 7 nouns after “mat” is “material.”

Results in:

*The catalog sat on the material.*

[Online example](http://www.spoonbill.org/n+7/)
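
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: `TOY_NOUNS` is a hypothetical 16-entry word list standing in for a real dictionary, and the function `n_plus` (also a made-up name) treats any word found in that list as a noun instead of doing real part-of-speech tagging. Because the list is so small, the replacements differ from the “catalog”/“material” example above.

```python
import re

# Hypothetical, alphabetically sorted noun list standing in for a real dictionary.
TOY_NOUNS = sorted([
    "cat", "catalog", "cave", "chair", "cloud", "coat", "coin", "comet",
    "mat", "material", "meadow", "melon", "mirror", "moon", "moth", "mountain",
])


def n_plus(text, nouns, offset=7):
    """Replace every word found in `nouns` with the noun `offset` entries later."""
    index = {noun: i for i, noun in enumerate(nouns)}

    def shift(match):
        word = match.group(0)
        key = word.lower()
        if key not in index:
            return word  # not in the noun list: leave the word untouched
        # The modulo loops back to the start when the end of the list is reached.
        replacement = nouns[(index[key] + offset) % len(nouns)]
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", shift, text)


print(n_plus("The cat sat on the mat.", TOY_NOUNS))
```

With a real dictionary you would also need a way to decide which words are nouns, for example a part-of-speech tagger; that step is left out here.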

### Assignment and deliverables

For this assignment, you will create a variation of the N+7 technique we will name P+7. Using the GPT-2 language model, you will replace the last word of each line from *The Snow Man* with the word that has the seventh-highest probability according to the model’s predictions.
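
To make “seventh-highest probability” concrete, here is a small sketch that uses no model at all. The probabilities in `toy_probs` are invented for illustration and are not real GPT-2 output; `kth_most_probable` is a hypothetical helper name. The point is only that P+x means ranking the candidates by probability and taking the one at rank x, where rank 1 is the model's top guess.

```python
def kth_most_probable(probabilities, k):
    """Return the candidate with the k-th highest probability (k=1 is the top guess)."""
    if not 1 <= k <= len(probabilities):
        raise ValueError("k must be between 1 and the number of candidates")
    # Sort candidates from most to least probable, then pick rank k (index k - 1).
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[k - 1]


# Made-up probabilities for the word after "One must have a mind of".
toy_probs = {
    "his": 0.20, "their": 0.15, "its": 0.12, "the": 0.10, "your": 0.08,
    "one": 0.06, "her": 0.05, "my": 0.04, "our": 0.03, "winter": 0.01,
}

print(kth_most_probable(toy_probs, 1))  # the most likely candidate
print(kth_most_probable(toy_probs, 7))  # the P+7 choice
```

With GPT-2 the same idea applies to the model's scores over its whole vocabulary instead of a ten-entry dictionary.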

By the end of this assignment, submit a link to a GitHub repository named `CART498-GenAI` containing a folder labelled `A02` with the following items:

- A version of the text processed with your P+7 technique, saved as a `.txt` file.
- A second version of the text processed with `P+x`. Choose an `x` value that produces the funniest, wittiest, or most absurd version of the original text. Save this as a `.txt` file, and include the `x` value in the filename (e.g., `P+23.txt` or `P+12.txt`).
- A Python notebook with the script used to generate your P+7 and P+x transformations using the GPT-2 language model.
- A short reflection (250–350 words) explaining how altering the `x` value impacted the output of your P+x version. Additionally, discuss how you would implement a P+7 technique in which all nouns are replaced with their seventh-highest probability alternatives.

### *The Snow Man*
by Wallace Stevens (1879-1955)


> One must have a mind of winter  
> To regard the frost and the boughs  
> Of the pine-trees crusted with snow;  
> And have been cold a long time  
> To behold the junipers shagged with ice,  
> The spruces rough in the distant glitter  
> Of the January sun; and not to think  
> Of any misery in the sound of the wind,  
> In the sound of a few leaves,  
> Which is the sound of the land  
> Full of the same wind  
> That is blowing in the same bare place  
> For the listener, who listens in the snow,  
> And, nothing himself, beholds  
> Nothing that is not there and the nothing that is.  




In [24]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")


In [40]:
poem = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is."""

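def replace_last_word(text_block, n):
    """Replace the last word of each line with GPT-2's n-th most probable next token."""
    new_lines = []

    for line in text_block.strip().split('\n'):
        words = line.split()
        if not words:
            continue  # skip blank lines

        # Everything except the last word is the context GPT-2 conditions on.
        # Punctuation attached to the last word (e.g. "snow;") is dropped with it.
        context = " ".join(words[:-1])

        inputs = tokenizer(context, return_tensors="pt")
        with torch.no_grad():
            # Scores (logits) for the token that would follow the context.
            next_token_logits = model(**inputs).logits[0, -1]

        # Rank the vocabulary from most to least likely; rank n sits at index n - 1.
        ranked_token_ids = torch.argsort(next_token_logits, descending=True)
        nth_best_id = ranked_token_ids[n - 1]

        # Decoded GPT-2 tokens often start with a space, so strip it before rejoining.
        new_word = tokenizer.decode(nth_best_id).strip()
        new_lines.append(f"{context} {new_word}")

    return "\n".join(new_lines)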



In [49]:
p7_content = replace_last_word(poem, 7)
# Save the P+7 version to disk, then print it.
with open("P+7.txt", "w") as f:
    f.write(p7_content)
print(p7_content)

One must have a mind of her
To regard the frost and the death
Of the pine-trees crusted with oil
And have been cold a long way
To behold the junipers shagged with white
The spruces rough in the distant horizon
Of the January sun; and not to have
Of any misery in the sound of the sound
In the sound of a few shots
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener, who listens in the morning
And, nothing himself, I
Nothing that is not there and the nothing that isn


In [51]:
x = 15000
px_content = replace_last_word(poem, x)

# The assignment asks for the x value in the filename, so this writes P+15000.txt.
with open(f"P+{x}.txt", "w") as f:
    f.write(px_content)
print(px_content)

One must have a mind of Holiday
To regard the frost and the Fever
Of the pine-trees crusted with hazard
And have been cold a long achievement
To behold the junipers shagged with homegrown
The spruces rough in the distant differ
Of the January sun; and not to bad
Of any misery in the sound of the chapters
In the sound of a few actu
Which is the sound of the intersect
Full of the same POW
That is blowing in the same bare LOVE
For the listener, who listens in the ful
And, nothing himself, ps
Nothing that is not there and the nothing that Mim
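
Fragments such as “isn”, “actu”, and “ful” appear in these outputs because GPT-2 predicts sub-word tokens, not whole words. One possible way to get whole words, sketched below under stated assumptions, is to walk down the ranked candidates and count only those that begin a new word (decoded GPT-2 tokens of that kind start with a space) and that also appear in a word list. The function name `nth_whole_word`, the `vocabulary` set, and the toy candidate list are all hypothetical; in the notebook the candidates would come from decoding the ranked token ids.

```python
def nth_whole_word(ranked_tokens, n, vocabulary):
    """Return the n-th candidate that looks like a complete word, or None.

    `ranked_tokens` holds decoded token strings, most probable first.
    A candidate counts only if it starts with a space (a new word, not a
    suffix) and its stripped, lowercased form is in `vocabulary`.
    """
    seen = 0
    for token in ranked_tokens:
        word = token.strip()
        if token.startswith(" ") and word.lower() in vocabulary:
            seen += 1
            if seen == n:
                return word
    return None  # fewer than n whole-word candidates


# Toy ranked candidates (invented, not real GPT-2 output) and a tiny word list.
toy_ranked = [" the", ",", "ful", " actu", " snow", " isn", " wind", "\n", " Ice"]
toy_vocabulary = {"the", "snow", "wind", "ice"}

print(nth_whole_word(toy_ranked, 2, toy_vocabulary))
```

Note that filtering changes what “seventh” means: the rank is then counted among whole words only, not among all tokens.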

