# It's HW 04 Time 

> The digital cut-up revisited. In assignment #2, the tools available to you for cutting up and rearranging texts relied only on information present in the character data itself. Since then, we’ve learned several methods for incorporating outside information concerning syntax (i.e. with spaCy) and semantics (i.e., word vectors) into what we “know” about a text in question. Adapt your original digital cut-up assignment, making use of one of these new sources of information. What new aesthetic possibilities are made available if the unit of the cut-up can be a type of syntactic unit (instead of words, lines, characters), and if stretches of text can be algorithmically selected not at random, but based on their meaning?

Tools I can work with: 
- Tracery grammars 
- spaCy NLP 
- word vectors 

Text I can work with: 
- My interview transcripts 
- Corpora Project 
- [Original HW2, a cutup around memory](https://github.com/leils/itp_spr_2023/blob/main/rwet/02_mixed_lines/mixed-lines.ipynb)

## Loading spaCy and my source

In [2]:
import random
import spacy
nlp = spacy.load('en_core_web_md') 

Starting with the poem I wrote last summer, `Retrieval-induced distortion`. 

In [3]:
retrieval = open("sources/retrieval.txt").read()

In [4]:
print(retrieval)

Retrieval-induced distortion;
a phenomena in memory 
where a memory, sweet or sour, 
changes as you remember it. 

When you trace the line of your memories
Tug the string, pull them from the ether
lay them out on the deck 
and they mutate, stretched thin by your tugging 
smudged with your fingerprints 

I remember a day as a toddler
a wood-burning stove, warming the house
and a frozen winter beyond glass sliding doors. 
only this memory reads 
back and forth 
from a VHS tape. 


In [5]:
retrieval_doc = nlp(retrieval)

Because the [original poem](https://leils.github.io/telescopic-poems/21_26-01.html) was written to reflect how recalling memories can change them, I wanted to emulate that again here. My goal: to change the poem more and more each time you "recall" it. I want to do this in a way that is still plausible (i.e. it doesn't just look like a randomizer was set upon all the words with no regard to the meaning). 

My idea:
- re-writing each line so that it's less and less accurate each run, like every time it's read we're moving the meaning further and further away 

Basically, I'm attempting to: 
1. Use spaCy to understand what a word is (noun, verb, etc.) 
2. Replace words with other words of their ilk (noun/verb/etc.)  
3. Do this more and more across runs.

Questions that I think affect the concept of the piece: 
- Do I rewrite the original file across runs, or only the copy loaded in memory? <--- (I chose not to rewrite the file, for repeatability's sake) 
- Do I add a small chance of removing words? <--- (I did this) 

I think I'm realizing how little I know about language here. 

In [6]:
verbs = []
nouns = []
for word in retrieval_doc: 
    if word.pos_ == "VERB":
        verbs.append(word.text)
    elif word.pos_ == "NOUN":
        nouns.append(word.text)

In [7]:
print(nouns)

['distortion', 'phenomena', 'memory', 'memory', 'changes', 'line', 'memories', 'string', 'ether', 'deck', 'tugging', 'fingerprints', 'day', 'toddler', 'wood', 'stove', 'house', 'winter', 'glass', 'doors', 'memory', 'tape']


In [8]:
print(verbs)

['induced', 'remember', 'trace', 'pull', 'lay', 'mutate', 'stretched', 'smudged', 'remember', 'burning', 'warming', 'sliding', 'reads']


In [34]:
run_times = 0
chance_to_swap = 0.03
chance_to_forget = 0

In [35]:
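def recall_poem():
    """Print the poem once, then make the next recall a little less faithful."""
    global chance_to_swap, chance_to_forget
    print("Chance at faulty recall: ", chance_to_swap, ", ", chance_to_forget)
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    
    word_to_print = ""
    for word in retrieval_doc:
        # One roll per token decides both forgetting and swapping.
        rolled = random.random()
        if rolled <= chance_to_forget: 
            # Forgotten: leave a gap as wide as the word.
            word_to_print = " " * len(word.text)
        else: 
            # Swap only within the same part of speech, using the poem's own words.
            if (word.pos_ == "VERB") and (rolled <= chance_to_swap):
                word_to_print = random.choice(verbs)
            elif (word.pos_ == "NOUN") and (rolled <= chance_to_swap):
                word_to_print = random.choice(nouns)
            else:
                word_to_print = word.text
            
        print(word_to_print, end=" ")
        
    # Swapping ramps up first; forgetting only starts once swapping has passed 1.
    if (chance_to_swap < 1):
        chance_to_swap += .1
    else:
        chance_to_forget += .1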
            
def reset():
    global chance_to_swap, chance_to_forget
    chance_to_swap = 0.03
    chance_to_forget = 0
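
To get a feel for how fast the poem falls apart, the update rule at the end of `recall_poem` can be tabulated without touching the poem at all. This is only a sketch: `recall_schedule` is a hypothetical helper that is not part of the notebook, and it assumes the same starting values (0.03 and 0) and the same step of 0.1.

```python
def recall_schedule(runs, swap=0.03, forget=0, step=0.1):
    """Return the (swap, forget) chances each of `runs` recalls would start with."""
    schedule = []
    for _ in range(runs):
        schedule.append((swap, forget))
        # Same update rule as recall_poem: ramp swapping first, then forgetting.
        if swap < 1:
            swap += step
        else:
            forget += step
    return schedule


schedule = recall_schedule(21)
for run, (swap, forget) in enumerate(schedule, start=1):
    print(f"run {run:2d}: swap={swap:.2f} forget={forget:.2f}")
```

Under these assumptions the swap chance passes 1 on the eleventh recall, and the forget chance only starts climbing after that, so the poem is fully remixed before it begins to disappear.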

In [56]:
recall_poem()

Chance at faulty recall:  1.03 ,  0.9999999999999999
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 

---
## Some nice outcomes: 
Chance at faulty recall:  0.73 ,  0
```
Retrieval - induced distortion ; 
 a doors in memory 
 where a memory , sweet or sour , 
 phenomena as you pull it . 

 When you burning the memory of your phenomena 
 Tug the memory , pull them from the tape 
 lay them out on the tape 
 and they mutate , reads thin by your fingerprints 
 remember with your fingerprints 

 I remember a line as a doors 
 a wood - burning string , remember the tape 
 and a frozen deck beyond glass warming doors . 
 only this deck mutate 
 back and forth 
 from a VHS memory . 
 ```

Chance at faulty recall:  1.03 ,  0.2
```
Retrieval - lay fingerprints ; 
 a memory in memory 
 where a glass , sweet    sour ,   day as              it . 

 When you induced the      of your phenomena 
 Tug the doors , reads them      the fingerprints   sliding them out on the string   and they stretched , trace thin by your deck 
 lay           string 

 I warming a phenomena as           
 a toddler - stretched phenomena , smudged the       
 and a        line beyond memories remember distortion   
 only this        mutate 
 back             from a VHS doors . 
 ```

I really, really liked this. These are really beautiful to me, and I love that they remix the internal contents of the poem. Because I'm swapping nouns with nouns and verbs with verbs, the poem still tends to make sense, until it very much does not. In the end, there is nothing left of the poem. 

You can see a [video of the full run](https://youtu.be/g0GKooU_wUk) here. 

What would I like to do with this in the future? 
- This program does not actively destroy the original text, only its copy in memory. I like that, but I think it would be interesting to play with destroying the original text as well. 
- I wanted to swap the nouns and verbs with others from the Corpora project, but after looking at the JSON file, I felt unsatisfied with that word bank. I think I'd like to give this another word bank at some point. 
- I wonder what it might look like to "mix" memories through mixing poems. 

~ End of assignment. Below is another experiment. 

---

## Scratchpad

In [140]:
ret_noun_chunks = list(retrieval_doc.noun_chunks)

In [184]:
print(len(ret_noun_chunks))
print('------------')
for n in random.sample(ret_noun_chunks, 10): print(n.text)

26
------------
Retrieval-induced distortion
a day
them
Tug
the deck
a wood-burning stove
the line
the ether
your memories
them


---
# Side experiments with interview transcripts

I also wanted to test out some of these tools with my interview transcripts. These particular transcripts have been adapted and edited from Descript, which takes in audio files and outputs transcripts. These transcripts are ... well, as accurate as you might expect from an audio transcription service. While I've edited them a bit, it seems like it might be as much work to edit them as to transcribe them myself. 

In [26]:
interview_1 = open("sources/01.txt").read()
interview_2 = open("sources/02.txt").read()
interview_3 = open("sources/03.txt").read()
interview_4 = open("sources/04.txt").read()

---
## Working with Transcripts
In these interviews, I've replaced the names with "Interviewee1" and such. However, there's still a lot that you can glean from these interviews, especially if you know about the group of people that these selected interviews come from, so it's a bit of an uncomfortable thing to work with. I'm trying to wrestle with this fact, while continuing to work with them. 

There are a couple of issues with the source text. For example, here's a list of "People" spaCy pulled from interview 2: 

```
[YouTuber,
 00:08:00,
 I.,
 unnie,
 unnie,
 00:17:00,
 Matt,
 Twinkie,
 00:24:00,
 00:30:00,
 00:32:00,
 Dad,
 dnr,
 00:39:00,
 00:41:00,
 00:46:00,
 Juwan,
 00:47:00,
 Dora,
 Barbie]
```

I had left timestamps in the transcripts, and spaCy tagged them as people, which definitely didn't work here. So, I figured out a bit of regex to find and remove them: `\[0.:.*\]`. 
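
A minimal sketch of doing the same cleanup in Python with `re.sub`. It assumes the timestamps look like `[00:08:00]`; the sample line is made up, and `strip_timestamps` is a hypothetical helper. The stricter pattern is a swap-in for my original regex, because the greedy `.*` in the original can swallow everything between two timestamps that sit on the same line.

```python
import re

# Assumed timestamp shape: "[HH:MM:SS]", e.g. "[00:08:00]".
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")


def strip_timestamps(text):
    """Remove bracketed timestamps and any whitespace right after them."""
    return TIMESTAMP.sub("", text)


# Made-up line for illustration, not taken from the transcripts.
sample = "[00:08:00] Interviewee1: Yeah. [00:08:30] Interviewer: Hmm."

cleaned = strip_timestamps(sample)
print(cleaned)

# The original pattern, for comparison: it matches from the first "[0"
# through the last "]" on the line, so the first speaker's words vanish too.
greedy = re.sub(r"\[0.:.*\]", "", sample)
print(repr(greedy))
```

If each timestamp sits on its own line in the transcript, the two patterns should behave the same.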


Task: Determine main characters, places, themes in the interview. 

In [27]:
interview1_doc = nlp(interview_1)
interview2_doc = nlp(interview_2)
interview3_doc = nlp(interview_3)
interview4_doc = nlp(interview_4)

---

Let's work with interview 1

In [28]:
entities_01 = list(interview1_doc.ents)
noun_chunks_01 = list(interview1_doc.noun_chunks)

In [29]:
print(entities_01[:10])

[Argentina, Miami, New York, 10 years, three, Argentina, Argentina, World War, Spanish, Italian]


I want to take this list, and find the most commonly mentioned entities. 
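
One way this could look, as a sketch: count the entity text with `collections.Counter`. `most_common_mentions` is a hypothetical helper, and the sample list is just the ten entity texts printed above standing in for the real data. I'm assuming it makes sense to count the strings rather than the span objects (two mentions at different positions in the doc are different spans), and to lowercase them so that spellings like "South America" and "south America" are counted together.

```python
from collections import Counter


def most_common_mentions(mentions, n=3):
    """Return the n most frequent strings, ignoring case and edge whitespace."""
    counts = Counter(m.strip().lower() for m in mentions)
    return counts.most_common(n)


# Stand-in data: the ten entity texts printed above.
# In the notebook this would be [e.text for e in entities_01].
sample = ["Argentina", "Miami", "New York", "10 years", "three",
          "Argentina", "Argentina", "World War", "Spanish", "Italian"]

top = most_common_mentions(sample, n=1)
print(top)
```

The same helper could be pointed at `people_01` or `gpe_01` once those lists exist.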

In [30]:
entities_01[2].label_

'GPE'

In [31]:
people_01 = [e for e in entities_01 if e.label_ == "PERSON"]
locations_01 = [e for e in entities_01 if e.label_ == "LOC"]
gpe_01 = [e for e in entities_01 if e.label_ == "GPE"]

In [32]:
locations_01

[South Italy,
 Latin America,
 Americanness,
 South America,
 Americanness,
 Americanness,
 Americanness,
 South America,
 Caribbean,
 south America,
 Central America,
 Central America,
 Northeastern,
 South America]

In [33]:
random.sample(gpe_01, 5)

[Miami, New York, Venezuela, Miami, the United States]

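What is the difference between a location and a GPE? According to `spacy.explain`, `GPE` covers countries, cities, and states, while `LOC` covers non-GPE locations such as mountain ranges and bodies of water. 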

In [37]:
sents_01 = [item.text for item in interview1_doc.sents]
for s in random.sample(sents_01, 5): print(s)

And then the kids who came after are like, fuck English, really? 

Yes.
Hmm.
You know?
You don't?


---

In [38]:
entities_02 = list(interview2_doc.ents)
noun_chunks_02 = list(interview2_doc.noun_chunks)

In [39]:
people_02 = [e for e in entities_02 if e.label_ == "PERSON"]
locations_02 = [e for e in entities_02 if e.label_ == "LOC"]

In [40]:
locations_02

[Americanness, the Bay Area, the Bay Area, Americanness, Americanness]

It's really interesting to me that this model recognizes 'Americanness' as a location. There's something SUPER intriguing about that to me, and I want to follow that further in the future.

In [41]:
sents_02 = [item.text for item in interview2_doc.sents]

In [42]:
for s in random.sample(sents_02, 5): print(s)

and he would say it in English and my dad would say like, reply in broken English, or like, go in Korean. 

And just like, cuz like for me, at least at this point in my life, I say that like, if I'm gonna talk shit about someone,  
Interviewer: I'm okay saying that to their face. 

And.
Like ittastes like dish water, but like to me, that tastes like home and that tastes like hours of effort by my grandma put into it. 

There's still a lot of like what physical beauty matters.


One output that felt like a poem: 
```
yeah.
But yeah, it's just.
and I kind of don't process it properly.
Speaking Korean.
But yeah.
```

Random thought dump 3/21 
- counting how much something is mentioned 
- swapping something in the middle of a mention (e.g. if we swap sentences where they both mention "Adam") 
- grab descriptions from these interviews, place them on places??? 
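
A rough sketch of the second idea, splicing two sentences at a shared mention. Everything here is an assumption: `splice_at_mention` is a hypothetical helper, the sentences are made up rather than taken from the transcripts, and it matches on plain strings instead of spaCy entities.

```python
def splice_at_mention(first, second, mention):
    """Join the start of `first` to the end of `second` at their shared mention.

    Returns None when either sentence lacks the mention.
    """
    i = first.find(mention)
    j = second.find(mention)
    if i == -1 or j == -1:
        return None
    return first[:i] + second[j:]


# Made-up sentences for illustration only.
a = "I met Adam at the station."
b = "Yesterday Adam cooked dinner for us."

spliced = splice_at_mention(a, b, "Adam")
print(spliced)
```

With real transcripts, the sentence pairs could come from `sents_01` or `sents_02`, filtered to the ones that contain the same name.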