# Reading and Writing Electronic Text
Spring 2024

Myrah Sarwar

***

## Assignment 4
### Digital cut-up revisited

#### Text sources:
- _Man’s Search for Meaning_, Viktor Frankl
- _Sense of Nonsense_ audio transcript, Alan Watts
- _The Hitchhiker's Guide to the Galaxy_, Douglas Adams

Load text files:

In [326]:
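import textwrap
from pathlib import Path

# read_text() opens, reads, and closes each file, so no handles are left open
frankl_txt = Path("frankl.txt").read_text()
watts_txt = Path("watts1.txt").read_text()
hitchhiker_txt = Path("hitchhiker.txt").read_text()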

Print each text in full (for my own reference):

In [327]:
print(textwrap.fill(frankl_txt), "\n")
print(textwrap.fill(watts_txt), "\n")
print(textwrap.fill(hitchhiker_txt))

By declaring that man is responsible and must actualize the potential
meaning of his life, I wish to stress that the true meaning of life is
to be discovered in the world rather than within man or his own
psyche, as though it were a closed system. I have termed this
constitutive characteristic "the self-transcendence of human
existence." It denotes the fact that being human always points, and is
directed, to something or someone, other than oneself--be it a meaning
to fulfill or another human being to encounter. The more one forgets
himself--by giving himself to a cause to serve or another person to
love--the more human he is and the more he actualizes himself. What is
called self-actualization is not an attainable aim at all, for the
simple reason that the more one would strive for it, the more he would
miss it. In other words, self-actualization is possible only as a
side-effect of self-transcendence. 

But it seems that only in moments of unusual insight and illumination
that we get

Combine all three texts into one string so it's easier to extract from:

In [328]:
combined_txt = frankl_txt + " " + watts_txt + " " + hitchhiker_txt

Set up spacy:

In [None]:
import sys
!conda install -c conda-forge -y --prefix {sys.prefix} spacy
!{sys.executable} -m spacy download en_core_web_md

In [54]:
import spacy

nlp = spacy.load('en_core_web_md')

Separate sentences/words and parts of speech:

In [78]:
combined = nlp(combined_txt)

sentences = list(combined.sents)
words = [w for w in list(combined) if w.is_alpha]

In [262]:
noun_single = [w for w in words if w.tag_ == 'NN']
noun_plural = [w for w in words if w.tag_ == 'NNS']
verbs_base = [w for w in words if w.tag_ == 'VB']
verbs_present = [w for w in words if w.tag_ == 'VBG']
verbs_past = [w for w in words if w.tag_ == 'VBN']
adj = [w for w in words if w.tag_ == 'JJ']
adjs = [w for w in words if w.tag_ == 'JJS']
adv = [w for w in words if w.tag_ == 'RB']

Choose a random sentence and randomly replace words with appropriate parts of speech:

In [465]:
import random

# Map each Penn Treebank tag to its pool of candidate replacements
pools = {
    'NN': noun_single, 'NNS': noun_plural,
    'VB': verbs_base, 'VBG': verbs_present, 'VBN': verbs_past,
    'JJ': adj, 'JJS': adjs, 'RB': adv,
}

sentence_base = random.choice(sentences)
sentence_base_words = []

for word in sentence_base:
    # Only swap all-lowercase words, so capitalized words are left alone
    if word.tag_ in pools and word.text.islower():
        sentence_base_words.append(random.choice(pools[word.tag_]).text)
    else:
        sentence_base_words.append(word.text)
    sentence_base_words.append(word.whitespace_)

print("Original:")
print(sentence_base)
print()
print("Modified:")
print(''.join(sentence_base_words))

Original:
It is in this kind of meaninglessness that we get the profoundest meaning.

Modified:
It is in this complexity of possible that we get the most psyche. 


I didn't really like how this turned out, so I'm trying again, this time pulling out subjects and objects instead:

In [294]:
def flatten_subtree(st):
    return ''.join([w.text_with_ws for w in list(st)]).strip()

In [425]:
subjects = []
for word in combined:
    if word.dep_ in ('nsubj', 'nsubjpass'):
        subjects.append(flatten_subtree(word.subtree))
        
subjects

['that man',
 'I',
 'the true meaning of life',
 'it',
 'I',
 'It',
 'it',
 'The more one',
 'he',
 'he',
 'What',
 'self-actualization',
 'the more one',
 'he',
 'self-actualization',
 'it',
 'we',
 'the true meaning of life',
 'its purpose',
 'its sense',
 'we',
 'that',
 'that',
 'that',
 'It',
 'we',
 'This planet',
 'which',
 'most of the people living on it',
 'Many solutions',
 'most of these',
 'which',
 'it',
 'that']

In [426]:
objects = []
for word in combined:
    if word.dep_ in ('dobj', 'iobj'):
        objects.append(flatten_subtree(word.subtree))
        
objects

['the potential meaning of his life',
 'this constitutive characteristic "',
 'the fact that being human always points',
 'himself',
 'himself',
 'himself',
 'it',
 'the point of this',
 'the word significant',
 'balderdash',
 'the profoundest meaning',
 'a problem, which was this']

In [None]:
import random
sentence_base = random.choice(sentences)
SOV = []

for word in sentence_base:
    if word.dep_ in ('dobj', 'iobj'):
        new_words = random.choice(objects)
        SOV.append(new_words)
    if word.dep_ in ('nsubj', 'nsubjpass'):
        new_words = random.choice(subjects)
        SOV.append(new_words)  
    else:
        SOV.append(word.text)
        SOV.append(word.whitespace_)
        
print(sentence_base)
print()
print(''.join(SOV))

That code isn't quite working either, so I'm giving up on it and trying word vectors this time.
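
A note on why the splice above may misbehave (my reading of the cell, not a confirmed diagnosis): the two `if` statements are not chained, so an object word gets replaced and is then also appended unchanged by the `else`; replaced words lose their trailing whitespace; and only the head word of each phrase is swapped, so the rest of the old phrase stays in the sentence. Below is a minimal sketch of one way around all three, replacing each whole subtree span at once. `Tok` and `splice` are hypothetical names, and `Tok` is a stand-in for a spaCy token so the example runs without a model; `left` and `right` play the role of `token.left_edge.i` and `token.right_edge.i`, counted from the start of the sentence.

```python
import random
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (not spaCy's API)."""
    text: str
    ws: str      # trailing whitespace, like token.whitespace_
    dep: str     # dependency label, like token.dep_
    left: int    # index of the first token in this token's subtree
    right: int   # index of the last token in this token's subtree


def splice(tokens, subjects, objects, rng=random):
    """Replace each subject/object phrase (whole subtree) with a random one."""
    # Collect spans to replace, keyed by start index: start -> (end, pool)
    spans = {}
    for tok in tokens:
        if tok.dep in ('nsubj', 'nsubjpass'):
            pool = subjects
        elif tok.dep in ('dobj', 'iobj'):
            pool = objects
        else:
            continue
        # If two phrases start at the same token, keep the longer one
        if tok.left not in spans or tok.right > spans[tok.left][0]:
            spans[tok.left] = (tok.right, pool)

    out = []
    i = 0
    while i < len(tokens):
        if i in spans:
            end, pool = spans[i]
            out.append(rng.choice(pool))
            out.append(tokens[end].ws)  # keep the spacing after the old phrase
            i = end + 1                 # skip the rest of the old phrase
        else:
            out.append(tokens[i].text)
            out.append(tokens[i].ws)
            i += 1
    return ''.join(out)


# Hand-annotated toy sentence: "we get the profoundest meaning."
sent = [
    Tok('we', ' ', 'nsubj', 0, 0),
    Tok('get', ' ', 'ROOT', 0, 5),
    Tok('the', ' ', 'det', 2, 2),
    Tok('profoundest', ' ', 'amod', 3, 3),
    Tok('meaning', '', 'dobj', 2, 4),
    Tok('.', '', 'punct', 5, 5),
]
print(splice(sent, ['This planet'], ['balderdash']))
```

Because a phrase nested inside a replaced span is skipped along with it, a subject inside an object clause (such as the "which" in "a problem, which was this") is not replaced separately.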

In [None]:
def vec(s):
    return nlp.vocab[s].vector

!curl -L -O https://raw.githubusercontent.com/aparrish/wordfreq-en-25000/main/wordfreq-en-25000-log.json
    
import json

# Load the word-frequency table; only its keys (the words) are used below
with open("./wordfreq-en-25000-log.json") as f:
    prob_lookup = dict(json.load(f))

In [314]:
from simpleneighbors import SimpleNeighbors

lookup = SimpleNeighbors(300)
for word in prob_lookup.keys():
    if nlp.vocab[word].has_vector:
        lookup.add_one(word, vec(word))
lookup.build()
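
To make the lookup less of a black box, here is a brute-force sketch of what a nearest-neighbor query over word vectors does: rank every stored word by cosine similarity to the query vector and return the top `n`. This is only an illustration. As I understand it, `SimpleNeighbors` builds an index rather than comparing against every vector, and its results may be approximate. The `cosine` and `nearest` functions and the three-dimensional vectors below are made up for the example (the real vectors have 300 dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, query, n):
    """Return the n words whose vectors are most similar to `query`."""
    ranked = sorted(vectors, key=lambda w: cosine(vectors[w], query), reverse=True)
    return ranked[:n]


# Toy 3-d vectors, invented for this example
toy = {
    'nonsense': [0.9, 0.1, 0.0],
    'balderdash': [0.8, 0.2, 0.1],
    'rhythm': [0.1, 0.9, 0.2],
    'planet': [0.0, 0.2, 0.9],
}

print(nearest(toy, toy['nonsense'], 2))  # ['nonsense', 'balderdash']
```

In this sketch the query word itself ranks first whenever it is in the table, so a small `n` leaves a real chance of picking the original word back.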

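Print the original sentence followed by five variations. Each pass multiplies the number of nearest neighbors to sample from by 5 (3, 15, 75, 375, 1875), so each variation drifts further from the original.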

In [457]:
sentence_base = random.choice(sentences)
sentence_synonyms = []
distance = 3  # number of nearest neighbors to sample from (a count, not a metric distance)

print("Original:")
print(sentence_base)
print()
print("Variations:")

for repeat in range(5):
    for word in sentence_base:
        if word.is_alpha and word.pos_ in ('NOUN', 'ADJ'):
            new_word = random.choice(lookup.nearest(word.vector, distance))
            sentence_synonyms.append(new_word)  
        else:
            sentence_synonyms.append(word.text)
        sentence_synonyms.append(word.whitespace_)
    sentence_updated = ''.join(sentence_synonyms) 
    print(textwrap.fill(sentence_updated) + "\n\n")
    sentence_synonyms = []
    distance *= 5

Original:
Yes, nonsense, that is not just chaos, that is not just blathering balderdash, but that has in it rhythm, fascinating complexity, a kind of artistry.

Variations:
Yes, nonsense, that is not just chaos, that is not just blathering
beard, but that has in it rhythm, fascinating complexity, a really of
contemporary.


Yes, rhetoric, that is not just woes, that is not just blathering
finch, but that has in it motif, persuasive turbulence, a strangely of
theatrical.


Yes, foolish, that is not just tragedies, that is not just blathering
footed, but that has in it fling, enchanting preferential, a terrible
of figurative.


Yes, overtly, that is not just elves, that is not just blathering
turtle, but that has in it knuckles, sighting transformations, a
debatable of vernacular.


Yes, perspective, that is not just miracles, that is not just
blathering blond, but that has in it axis, aspiring misunderstand, a
menace of international.
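
The increasing drift across the five variations follows from how the neighbor count grows. Here is a small sketch of that schedule, along with a toy illustration of sampling uniformly from the top `n` of a ranked list. The ranked list is just the integers 0 to 1999, standing in for neighbors ordered from closest to farthest; that is an assumption for illustration, not data from the notebook.

```python
import random

# Neighbor counts used by the loop: start at 3, multiply by 5 each pass
counts = [3 * 5 ** i for i in range(5)]
print(counts)  # [3, 15, 75, 375, 1875]

rng = random.Random(0)      # seeded so the sketch is repeatable
ranked = list(range(2000))  # rank 0 = closest neighbor (toy stand-in)

for n in counts:
    picks = [rng.choice(ranked[:n]) for _ in range(1000)]
    # Average rank of the chosen neighbor for this pool size
    print(n, sum(picks) / len(picks))
```

With uniform sampling the average rank of a pick is roughly (n - 1) / 2, so by the last pass a typical replacement sits hundreds of places down the neighbor list.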




**Original:**
>
>_Significant nonsense?_
>
**Variations:**
>
>_insignificant nonsense?_
>
>_significance despise?_
>
>_suggestive honesty?_
>
>_substantially kind?_
>
>_dissemination skeptical?_