# Madlib Generator: A Proof of Concept and Testground
The idea here is to use spaCy's POS tagger and syntax tree parsing to automatically turn text into madlibs.  
For example, you could take the chorus of your favorite song, make a madlib out of it, and send it to your friends to get some quirky twists on the lyrics.

## References and Resources
- https://en.wikipedia.org/wiki/Mad_Libs
- https://spacy.io/usage/linguistic-features#pos-tagging
- https://spacy.io/api/token
- https://spacy.io/usage/visualizers#jupyter
- Dependency parsing
   - https://web.archive.org/web/20170809024928/http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf
   - https://stackoverflow.com/questions/40288323/what-do-spacys-part-of-speech-and-dependency-tags-mean
- https://universaldependencies.org/u/pos/
- Additional spaCy Usage
    - https://stackoverflow.com/questions/55453864/mapping-spacy-int-attributes-to-string-unicode-attributes
    - https://stackoverflow.com/questions/62785916/spacy-replace-token
    - https://pypi.org/project/pyinflect/
    - https://machinelearningknowledge.ai/tutorial-on-spacy-part-of-speech-pos-tagging/

## The Code

### Setup

In [None]:
import spacy
import pyinflect
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

In [None]:
def is_madlib(token):
    """Return the probability (0.0 to 1.0) that this token becomes a madlib blank."""
    # Proper nouns, numbers, adverbs, and adjectives are almost always good blanks.
    if token.pos_ == 'PROPN':
        return 0.9
    if token.pos_ == 'NUM':
        return 0.95
    if token.pos_ == 'ADV':
        return 0.9
    if token.pos_ == 'ADJ':
        return 0.9
    if token.dep_ == 'amod':
        return 0.7
    # For nouns, the score depends on the syntactic role.
    if token.pos_ == 'NOUN' and token.dep_ == 'dobj':
        return 0.4
    if token.pos_ == 'NOUN' and token.dep_ == 'pobj':
        return 0.5
    if token.pos_ == 'NOUN' and token.dep_ == 'nsubj':
        return 0.7
    if token.pos_ == 'NOUN' and token.dep_ == 'compound':
        return 0.1
    # Skip verbs that are, or that govern, a clausal complement.
    if token.pos_ == 'VERB' and not (token.dep_ == 'ccomp' or any(sub_token.dep_ == 'ccomp' for sub_token in token.subtree)):
        return 0.7
    return 0.0

### Data Loading

In [None]:
text = """Once upon a midnight dreary, as I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore
While I nodded, nearly napping, suddenly there came a tapping,
As of someone gently rapping, rapping at my chamber door."""
text = """I am an invisible man. 
No, I am not a spook like those who haunted Edgar Allan Poe; nor am I one of your Hollywood-movie ectoplasms. 
I am a man of substance, of flesh and bone, fiber and liquids—and I might even be said to possess a mind. 
I am invisible, understand, simply because people refuse to see me. 
Like the bodiless heads you see sometimes in circus sideshows, it is as though I have been surrounded by mirrors of hard, distorting glass. 
When they approach me they see only my surroundings, themselves, or figments of their imagination—indeed, everything and anything except me."""

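# Strip each line, skip blank ones, and end every line with punctuation
# so that line breaks are treated as sentence or clause boundaries.
text = [line.strip() for line in text.splitlines()]
text = ' '.join([line if line.endswith(('.', ',')) else line + '.' for line in text if line])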
text

In [None]:
doc = nlp(text)
doc

### Data Exploration

In [None]:
displacy.render(doc, style="dep")

In [None]:
import pandas as pd
df = pd.DataFrame([
    {
        'text': token.text,
        'lemma': token.lemma_,
        'POS': token.pos_,
        'tag': token.tag_,
        'dep': token.dep_,
        'shape': token.shape_,
        'is_alpha': token.is_alpha,
        'is_stop': token.is_stop,
        'madlib_prob': is_madlib(token)
    }
    for token in doc
])
df

### Data Processing

In [None]:
df.drop_duplicates(subset=['lemma','POS'], inplace=True)
df

In [None]:
import numpy as np
df = df[df['madlib_prob'] > np.random.rand(len(df))]
df.reset_index(inplace=True)
df
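
The sampling step above keeps a token when its score exceeds a uniform random draw, so a token scored 0.9 is kept roughly nine times out of ten, and the set of blanks changes on every run. Below is a minimal sketch of the same idea using only the standard library, with a seeded generator so reruns pick the same blanks. The name `pick_blanks` and the sample scores are hypothetical, chosen for illustration, and are not part of the notebook.

```python
import random

def pick_blanks(scores, seed=None):
    """Return the indices whose score exceeds a fresh uniform draw in [0, 1)."""
    rng = random.Random(seed)  # seeded so the same blanks are picked on rerun
    return [i for i, score in enumerate(scores) if score > rng.random()]

# Hypothetical scores in the style of is_madlib's return values.
scores = [0.0, 0.9, 0.7, 0.0, 0.95, 0.1]
picked = pick_blanks(scores, seed=42)
print(picked)
```

A score of 0.0 can never be picked and a score of 1.0 always is, which matches the comparison used in the cell above.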

### UX and Input Required
The player would be shown the part of speech needed for each blank:

In [None]:
df['POS']
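
Raw tags such as `PROPN` are not very friendly for players. One possible approach, sketched below, is to translate the Universal POS tags into plain-English prompts. The names `POS_PROMPTS` and `prompt_for` are hypothetical and not part of the notebook, and the sample tag list is made up for the example.

```python
# Plain-English names for the Universal POS tags that is_madlib scores above zero.
POS_PROMPTS = {
    "ADJ": "adjective",
    "ADV": "adverb",
    "NOUN": "noun",
    "NUM": "number",
    "PROPN": "proper noun",
    "VERB": "verb",
}

def prompt_for(pos):
    """Return a player-facing name for a POS tag, falling back to the raw tag."""
    return POS_PROMPTS.get(pos, pos)

# Hypothetical list of tags, standing in for df['POS'].
for i, pos in enumerate(["ADJ", "NOUN", "VERB"], start=1):
    print(f"{i}. Enter a(n) {prompt_for(pos)}:")
```

The fallback matters because the `amod` rule in `is_madlib` can select tokens whose POS is not in the table.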

The player would then enter one word per blank:

In [None]:
replacements = [
    "flat",
    "toe",
    "curly",
    "tripped",
    "stretched",
    "lazily",
    "explosively",
    "vomited",
    "wildly",
    "slapped"
]
len(replacements)

### Final Processing and Output

In [None]:
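# Pair each selected (lemma, POS) with the player's word. zip stops at the
# shorter sequence, so a mismatch in length does not raise a KeyError.
replacement_dict = {
    (lemma, pos): word
    for lemma, pos, word in zip(df['lemma'], df['POS'], replacements)
}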
replacement_dict

In [None]:
result = ''
for token in doc:
    lemma, pos, tag = token.lemma_, token.pos_, token.tag_
    if (lemma, pos) in replacement_dict:
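        base_replacement = replacement_dict[(lemma, pos)]
        if pos in ('PROPN', 'NUM'): 
            # Proper nouns and numbers are inserted exactly as typed.
            new_token = base_replacement
        else:
            # Inflect the replacement to match the original token's fine-grained tag.
            new_token = nlp(base_replacement)[0]._.inflect(tag)
            if not new_token:
                # Fall back to the word as typed if no inflected form was found.
                new_token = base_replacement
            if token.is_title:
                new_token = new_token.title()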
        result += new_token
    else:
        result += str(token)
    result += token.whitespace_
print(result)
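
The loop above rebuilds the text by appending each token followed by its trailing whitespace. The same pattern could render the unfilled madlib to show players before they answer. This is a minimal sketch under assumptions: `render_template` is a hypothetical helper, and the triples stand in for spaCy tokens (`token.text`, `token.whitespace_`, and the POS label if the token was selected as a blank, otherwise `None`).

```python
def render_template(pieces):
    """Join (text, trailing_whitespace, blank_label) triples into a template.

    blank_label is None for words that are kept as they are.
    """
    out = []
    for text, whitespace, blank_label in pieces:
        out.append(f"____({blank_label})" if blank_label else text)
        out.append(whitespace)
    return "".join(out)

# Hand-written stand-ins for the tokens of "I am an invisible man."
pieces = [
    ("I", " ", None),
    ("am", " ", None),
    ("an", " ", None),
    ("invisible", " ", "ADJ"),
    ("man", "", "NOUN"),
    (".", "", None),
]
print(render_template(pieces))  # I am an ____(ADJ) ____(NOUN).
```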