# Utterance generation prototype notebook
Author: Matthew Stachyra <br>
Date: 16 June 2022 <br>
Version: 0.1 - prototyping 3 approaches

## *Approach 1:* replacement with similar words
Note: This approach generates sentences that stay close to the original in wording and structure. The generated sentences may also be used as input to the machine learning in approaches 2 and 3 below.

### Subproblems
1. generate similar words
2. identify which words to replace in an utterance
3. replace words in the utterance one at a time to generate a new set of utterances
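
Subproblem 3 is not implemented in the cells below. A minimal sketch of one-at-a-time replacement, assuming a word-to-synonyms map is already available. `replace_one_at_a_time` and the hard-coded map are hypothetical and used only for illustration:

```python
def replace_one_at_a_time(utterance, synonyms):
    """Return variants of `utterance` that each differ from it by one word."""
    words = utterance.split()
    variants = set()
    for i, word in enumerate(words):
        for synonym in synonyms.get(word, ()):
            if synonym == word:
                continue  # skip the word itself to avoid reproducing the input
            variants.add(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants

# Hypothetical synonym map, hard-coded for illustration
synonyms = {"test": {"trial", "exam", "test"}}
variants = replace_one_at_a_time("This is a test sentence", synonyms)
print(sorted(variants))
```

Each variant changes exactly one position, so the sentence structure is preserved. Grammatical agreement is not: "a exam" would need a separate correction step.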

In [165]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.corpus import wordnet as wn
import itertools
import spacy
nlp = spacy.load("en_core_web_sm")

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/matthewstachyra/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/matthewstachyra/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


#### 1. generate similar words

##### using `nltk.corpus.wordnet.synsets` and `spacy` part of speech tagging
nltk doc: https://www.nltk.org/howto/wordnet.html <br>
spacy doc: https://spacy.io/usage/linguistic-features#pos-tagging <br>
pos tags used in spacy: https://universaldependencies.org/u/pos/ <br>

In [146]:
posmap = {'VERB':'v', 'NOUN':'n', 'PRON':'n', 'PROPN':'n', 'ADJ':'a', 'ADV':'r'}

def get_pos(word, utterance):
    '''return the WordNet part of speech of the word in the utterance if it is
    a verb, noun, pronoun, proper noun, adjective, or adverb; otherwise return
    None. Raises StopIteration if the word is not a spaCy token of the utterance.
    '''
    # POS tag of the first token whose text matches the word
    pos = next(token.pos_ for token in nlp(utterance) if token.text == word)
    return posmap.get(pos)
        

def get_synonyms(word, pos):
    '''return the set of WordNet synonyms for a word if its part of speech is
    a verb, noun, adverb, or adjective; otherwise return None.
    '''
    if pos not in ('v', 'n', 'r', 'a'):
        return None  # not a verb, noun, adverb, or adjective
    # collect the lemma names of every synset that matches the word and pos
    return {synonym
            for synset in wn.synsets(word, pos=pos)
            for synonym in synset.lemma_names()}

In [148]:
get_synonyms("dog", "v")

{'chase',
 'chase_after',
 'dog',
 'give_chase',
 'go_after',
 'tag',
 'tail',
 'track',
 'trail'}
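
WordNet lemma names join multiword expressions with underscores (for example `chase_after` above), and the result set also contains the query word itself. A minimal sketch of a cleanup step, assuming the synonyms are meant to be substituted into plain text. `clean_synonyms` is a hypothetical helper, not part of the notebook:

```python
def clean_synonyms(word, lemma_names):
    """Return lemma names as plain-text phrases, excluding the word itself."""
    return {name.replace("_", " ")
            for name in lemma_names
            if name.lower() != word.lower()}

# Set copied from the output above for ("dog", "v")
raw = {'chase', 'chase_after', 'dog', 'give_chase', 'go_after',
       'tag', 'tail', 'track', 'trail'}
cleaned = clean_synonyms("dog", raw)
print(sorted(cleaned))
```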

In [168]:
sometext = "This is a test sentence."
print(get_synonyms("test", get_pos("test", sometext)))

{'mental_test', 'mental_testing', 'tryout', 'test', 'run', 'examination', 'trial_run', 'psychometric_test', 'trial', 'exam'}


#### 2. select which words to replace 
Note: Only nouns, verbs, pronouns, proper nouns, adjectives, and adverbs are replaced. <br>
spacy doc: https://spacy.io/usage/linguistic-features#pos-tagging

##### using `spacy`

In [169]:
for word in nlp(sometext):
    print(word.pos_)

PRON
AUX
DET
NOUN
NOUN
PUNCT


In [174]:
get_pos("test", sometext)

'n'

In [182]:
def map_synonyms(utterance):
    '''return map of words to synonyms for words that will be replaced in
    the utterance, i.e., those words that are nouns, verbs, pronouns, proper
    nouns, adjectives, and adverbs.

    Currently only prints each word with its part of speech; the returned
    map is drafted in the comments below.
    '''
    # debugging: print each whitespace-separated word with its part of speech
    for word in utterance.split():
        pos = get_pos(word, utterance)
        print(word, pos)

    # draft of the intended return value, not yet enabled:
#     wordpos = [(word, get_pos(word, utterance))
#                for word in utterance.split()]
#     return {word : get_synonyms(word, pos)
#             for word, pos in wordpos
#             if pos is not None}

In [183]:
map_synonyms(sometext)

This n
is None
a None
test n


StopIteration: 
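
The `StopIteration` above is consistent with `utterance.split()` producing `sentence.` with its trailing period. The tagger output shown earlier lists the period as a separate `PUNCT` token, so no spaCy token matches `sentence.` and `next` exhausts its iterator. A minimal sketch of two possible fixes: giving `next` a default, and iterating over the tagged tokens instead of whitespace-separated words. To stay runnable without spaCy, the (token, tag) pairs are hard-coded from the earlier output; `get_pos_safe` and `map_pos` are hypothetical names:

```python
posmap = {'VERB': 'v', 'NOUN': 'n', 'PRON': 'n', 'PROPN': 'n', 'ADJ': 'a', 'ADV': 'r'}

# (token text, POS tag) pairs, hard-coded from the spaCy output shown earlier
# for "This is a test sentence."; in the notebook these would come from
# [(t.text, t.pos_) for t in nlp(utterance)]
tagged = [("This", "PRON"), ("is", "AUX"), ("a", "DET"),
          ("test", "NOUN"), ("sentence", "NOUN"), (".", "PUNCT")]

def get_pos_safe(word, tagged):
    """Return the WordNet tag for `word`, or None if it is absent or unmapped."""
    pos = next((tag for text, tag in tagged if text == word), None)
    return posmap.get(pos)

def map_pos(tagged):
    """Map each replaceable token to its WordNet tag, iterating tokens directly."""
    return {text: posmap[tag] for text, tag in tagged if tag in posmap}

print(get_pos_safe("sentence.", tagged))  # no matching token, so no exception
print(map_pos(tagged))
```

In the notebook, the entries of such a map could then be passed to `get_synonyms` to build the word-to-synonyms map that `map_synonyms` describes.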

## *Approach 2:* text generation of similar sentences using GANs

## *Approach 3:* text generation of fixed length, similar sentences using BERT