<a href="https://colab.research.google.com/github/spatank/InteractiveFictionCIS700/blob/master/NLP_for_Text_Adventure_Games_part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP for Text Adventure Games - part 1

In this notebook, we start improving the coverage of the parser in a text adventure game like Action Castle.  The parser is responsible for interpreting the player's input.  A limitation of classic text adventure games was that their parsers handled only a limited number of keywords and did not support the many different ways a command can be phrased.  This variability is one of the key things that makes natural language challenging for computers to handle.

In part 1, we will introduce you to the WordNet resource.  WordNet is a classic resource for natural language processing.  It was created at Princeton University by Christiane Fellbaum and George Miller.  It encodes information about synonyms, antonyms, and is-a relationships between words, such as _troll_ is-a _monster_.  In NLP, the two sides of an is-a relationship are called hyponyms and hypernyms.

# WordNet 
[WordNet](https://wordnet.princeton.edu) is a lexical knowledge base that encodes a ton of useful information about how words relate to each other.  NLTK provides a Python API to WordNet.

In [1]:
#!sudo pip3 install nltk 
import nltk
nltk.download('wordnet')
nltk.download('punkt')
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## Word Senses
Words with multiple meanings are called _polysemous_ words.  An example of a polysemous word is _bug_, which can mean:
1. an insect
2. a virus or microbe that makes you sick
3. an error in your computer program
4. a covert listening device
5. (verb) to annoy/bother
6. (verb) to wiretap

WordNet organizes word senses into structures called _synsets_. Each word can have multiple synsets, and each synset represents a different meaning of the word.

In [2]:
def get_senses(word):
  """Returns a list of word senses (WordNet synsets) for a word"""
  word_senses = wn.synsets(word)
  return word_senses

def get_definition(word_sense):
  return word_sense.definition()

def get_synonyms(word_sense):
  synonyms = []
  for lemma in word_sense.lemmas():
    synonym = lemma.name().replace('_', ' ')
    synonyms.append(synonym)
  return synonyms

#Here are the word senses for "bug". We can see what their distinct meanings are 
#by getting their definitions or their synonyms from WordNet.
word_senses = get_senses("bug")
for i, word_sense in enumerate(word_senses):
  print("\nSense %d: %s" % (i, word_sense.name()))
  print("Definition: ", get_definition(word_sense))
  print("Synonyms: ", get_synonyms(word_sense))


Sense 0: bug.n.01
Definition:  general term for any insect or similar creeping or crawling invertebrate
Synonyms:  ['bug']

Sense 1: bug.n.02
Definition:  a fault or defect in a computer program, system, or machine
Synonyms:  ['bug', 'glitch']

Sense 2: bug.n.03
Definition:  a small hidden microphone; for listening secretly
Synonyms:  ['bug']

Sense 3: hemipterous_insect.n.01
Definition:  insects with sucking mouthparts and forewings thickened and leathery at the base; usually show incomplete metamorphosis
Synonyms:  ['hemipterous insect', 'bug', 'hemipteran', 'hemipteron']

Sense 4: microbe.n.01
Definition:  a minute life form (especially a disease-causing bacterium); the term is not in technical use
Synonyms:  ['microbe', 'bug', 'germ']

Sense 5: tease.v.01
Definition:  annoy persistently
Synonyms:  ['tease', 'badger', 'pester', 'bug', 'beleaguer']

Sense 6: wiretap.v.01
Definition:  tap a telephone or telegraph wire to get information
Synonyms:  ['wiretap', 'tap', 'intercept', 'bug
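
The sense names printed above, such as `bug.n.01`, pack three pieces of information: a lemma, a part-of-speech tag (`n` for noun, `v` for verb), and a sense number. As a small sketch that needs no WordNet download, the names can be split apart with plain string handling. The helper names `parse_sense_name` and `group_by_pos` are assumptions made for this illustration and are not part of NLTK.

```python
from collections import defaultdict

# Sense names copied from the output above.
SENSE_NAMES = [
    "bug.n.01", "bug.n.02", "bug.n.03",
    "hemipterous_insect.n.01", "microbe.n.01",
    "tease.v.01", "wiretap.v.01",
]

def parse_sense_name(name):
    """Split 'lemma.pos.NN' into (lemma, pos, sense_number)."""
    # Split from the right so that a lemma containing '.' would stay intact.
    lemma, pos, number = name.rsplit(".", 2)
    return lemma.replace("_", " "), pos, int(number)

def group_by_pos(names):
    """Group the lemmas of a list of sense names by part-of-speech tag."""
    groups = defaultdict(list)
    for name in names:
        lemma, pos, _ = parse_sense_name(name)
        groups[pos].append(lemma)
    return dict(groups)

print(parse_sense_name("hemipterous_insect.n.01"))
print(group_by_pos(SENSE_NAMES))
```

When working with NLTK directly, `wn.synsets(word, pos=wn.NOUN)` restricts the lookup to a single part of speech, which is often simpler than filtering names afterwards.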

## Hypernyms / Hyponyms

In addition to representing word senses, WordNet also organizes words hierarchically. For example, _red_ is a specific kind of _color_, and _microbe_ is a kind of _organism_.  These are examples of _hyponym_ relationships.  If X is-a Y then X is a hyponym of Y, and Y is a hypernym of X. So _red_ is a hyponym of _color_ and _color_ is a hypernym of _red_.

In WordNet, each word sense (synset) has its own distinct hypernyms and hyponyms. 

In [3]:
hyper = lambda s: s.hypernyms()
hypo = lambda s: s.hyponyms()

def get_hypernyms(word_sense, depth=5):
  return list(word_sense.closure(hyper, depth=depth))

def get_hyponyms(word_sense, depth=5):
  return list(word_sense.closure(hypo, depth=depth))

word_senses = get_senses("bug")
for i, word_sense in enumerate(word_senses):
  # The synset names include a word from the set of synonyms, 
  # plus a part of speech (n for noun, v for verb), and 
  # the number of the sense (sense 01 is the most common sense).
  print("\nSense %d: %s (%s)" % (i, word_sense.name(), get_definition(word_sense)))
  print("Hypernyms:")
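  hypernyms = word_sense.hypernyms()
  # A synset can have several hypernyms; for simplicity, follow only the first
  # one at each level until we reach the top of the hierarchy.
  while len(hypernyms) > 0:
    print("%s\tis a\t%s" % (word_sense.name(), hypernyms[0].name()))
    word_sense = hypernyms[0]
    hypernyms = word_sense.hypernyms()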


Sense 0: bug.n.01 (general term for any insect or similar creeping or crawling invertebrate)
Hypernyms:
bug.n.01	is a	insect.n.01
insect.n.01	is a	arthropod.n.01
arthropod.n.01	is a	invertebrate.n.01
invertebrate.n.01	is a	animal.n.01
animal.n.01	is a	organism.n.01
organism.n.01	is a	living_thing.n.01
living_thing.n.01	is a	whole.n.02
whole.n.02	is a	object.n.01
object.n.01	is a	physical_entity.n.01
physical_entity.n.01	is a	entity.n.01

Sense 1: bug.n.02 (a fault or defect in a computer program, system, or machine)
Hypernyms:
bug.n.02	is a	defect.n.03
defect.n.03	is a	imperfection.n.01
imperfection.n.01	is a	state.n.02
state.n.02	is a	attribute.n.02
attribute.n.02	is a	abstraction.n.06
abstraction.n.06	is a	entity.n.01

Sense 2: bug.n.03 (a small hidden microphone; for listening secretly)
Hypernyms:
bug.n.03	is a	microphone.n.01
microphone.n.01	is a	electro-acoustic_transducer.n.01
electro-acoustic_transducer.n.01	is a	transducer.n.01
transducer.n.01	is a	electrical_device.n.01
elect
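
To make the is-a chain concrete without depending on WordNet, here is a minimal sketch that stores the Sense 0 links printed above in a plain dictionary and walks upward through them. The table `HYPERNYM_OF` and the helpers `hypernym_chain` and `is_a` are hypothetical names for this illustration. The sketch also assumes one hypernym per sense, which is a simplification because a real synset can have several.

```python
# Direct hypernym links copied from the Sense 0 output above.
# This toy table stands in for WordNet and keeps one hypernym per sense.
HYPERNYM_OF = {
    "bug.n.01": "insect.n.01",
    "insect.n.01": "arthropod.n.01",
    "arthropod.n.01": "invertebrate.n.01",
    "invertebrate.n.01": "animal.n.01",
    "animal.n.01": "organism.n.01",
    "organism.n.01": "living_thing.n.01",
    "living_thing.n.01": "whole.n.02",
    "whole.n.02": "object.n.01",
    "object.n.01": "physical_entity.n.01",
    "physical_entity.n.01": "entity.n.01",
}

def hypernym_chain(sense, max_depth=None):
    """Return the hypernyms of a sense, nearest first, up to max_depth levels."""
    chain = []
    while sense in HYPERNYM_OF and (max_depth is None or len(chain) < max_depth):
        sense = HYPERNYM_OF[sense]
        chain.append(sense)
    return chain

def is_a(sense, candidate):
    """True if candidate appears somewhere above sense in the hierarchy."""
    return candidate in hypernym_chain(sense)

print(hypernym_chain("bug.n.01", max_depth=3))
print(is_a("bug.n.01", "animal.n.01"))   # a bug is an animal
print(is_a("animal.n.01", "bug.n.01"))   # but an animal is not necessarily a bug
```

The `max_depth` argument plays a role similar to the `depth` parameter of `get_hypernyms` in the cell above: it limits how far up the hierarchy the walk goes.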

## Text Adventure Commands
One of the tricky things about creating a text adventure game is anticipating the many different ways that a player might write a command.

If you program the game to understand a command like _give fish to troll_ and the player types in _feed fish to troll_, then a simple parser will fail to understand the command.
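
As a rough sketch of that failure, consider a parser that only accepts exact matches against a fixed set of commands. The names `KNOWN_COMMANDS` and `parse_exact` are assumptions for this example and do not come from the game's code.

```python
# Commands the game was explicitly programmed to understand.
KNOWN_COMMANDS = {"give fish to troll", "eat fish", "go north"}

def parse_exact(player_input):
    """Return the matching command, or None if the input is not recognized."""
    # Lowercase and collapse whitespace so trivial differences do not matter.
    normalized = " ".join(player_input.lower().split())
    return normalized if normalized in KNOWN_COMMANDS else None

print(parse_exact("Give fish to  troll"))  # recognized after normalization
print(parse_exact("feed fish to troll"))   # not recognized: 'feed' is unknown
```

Normalizing case and spacing helps a little, but it does nothing for a different verb or noun, which is the gap the rest of this notebook tries to close.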

Here we're going to use WordNet to expand out the set of commands that we've programmed into the game, with the goal of being able to recognize more varied input from the player.  

Instead of just one _give fish to troll_ command, we'll enumerate thousands of alternatives like

* _serve salmon to monster_
* _serve up food to monster_
* _feed sea trout to troll_
* _supply smoked salmon to mythical creature_

In [0]:
commands = [
	'wear crown',
	'smell rose',
	'eat fish',
	'light lamp',
	'give fish to troll',
	'propose to the princess',
	'go north',
]


## Manually Annotate Senses and Hypernyms/Hyponyms

Below is some code that will help you manually annotate the word sense of each word in your list of commands, and confirm which hypernyms and hyponyms are reasonable substitutes that should be recognized if a player types them instead of our command word.

Here are some helper functions for you. You can just run this cell instead of reading through the functions in detail if you want. 

In [0]:
def annotate_synsets(sentences):
  """This function queries WordNet for each word in a list of sentences,
     and asks the user to input a number corresponding to the synset."""

  word_senses = {}
  # Cached selections maps from word string to the previous
  # selection for this word (an integer)
  cached_selections = {}

  for i, sent in enumerate(sentences):
    words = word_tokenize(sent.lower())

    for word in words:
      sysnsets = wn.synsets(word)
      if len(sysnsets) != 0:
        selection = select_synset(sent, word, sysnsets, cached_selections)
        if selection is not None:
          cached_selections[word] = selection
          if selection < len(sysnsets):
            s = sysnsets[selection]
            word_senses[word] = s.name()
  return word_senses


def select_synset(sent, word, sysnsets, cached_selections):
  """Ask the user to select which sense of the word  
     is being used in this sentence."""
  print(sent)
  print(word.upper())

  prev_selection = -1
  if word in cached_selections:
    prev_selection = cached_selections[word]

  for choice, s in enumerate(sysnsets):
    if choice == prev_selection:
      print("*** ", end = '')
    print("%d) %s - %s" % (choice, s.name(), s.definition()))

  choice += 1
  if choice == prev_selection:
    print("*** ", end = '')
  print("%d None of these." % choice)

  selection = -1
  while selection == -1:
    try:
      user_input = input(">")
      if user_input.strip() == 'x':
        # The user can press 'x' to exit.
        return None
      if user_input.strip() == '' and prev_selection > -1:
        # The user can press return to confirm the previous selection.
        return prev_selection
      selection = int(user_input)
    except ValueError:
      # Non-numeric input: treat it as invalid and prompt again.
      selection = -1
    if selection < 0 or selection > len(sysnsets):
      print("Please select a number between 0-%d, or type 'x' to exit" % len(sysnsets))
      if prev_selection > -1:
        print("You can also press return to confirm the previous selection (marked by ***).")
      # Reset so the loop prompts again instead of falling through
      # and returning None for an out-of-range number.
      selection = -1
    else:
      return selection


def confirm_hyponyms(word, sysnset, do_hypernyms_instead=False):
  """Ask the user to confirm which of the hyponyms are applicable 
     for this sentence."""
  print(word.upper())

  confirmed = []
  if do_hypernyms_instead:
    unconfirmed = sysnset.hypernyms()
  else:
    unconfirmed = sysnset.hyponyms()

  while len(unconfirmed) > 0:
    s = unconfirmed.pop(0)
    print("Is %s an appropriate substitute for %s? (y/n)" % (s.name(), word))
    print("It means:", s.definition())
    print("Synonyms are:", get_synonyms(s))
    user_input = ''
    while user_input == '':
      user_input = input(">")
      user_input = user_input.strip()
      if user_input == 'y' or user_input == 'yes':
        confirmed.append(s.name())
        if do_hypernyms_instead:
          unconfirmed.extend(s.hypernyms())
        else:
          unconfirmed.extend(s.hyponyms())
        
      elif user_input == 'n' or user_input == 'no':
        pass
      elif user_input == 'x':
        # The user can press 'x' to exit.
        return confirmed
      else:
        print("Please type 'yes' or 'no' or 'x' to stop confirming for this word")
        user_input = ''
  return confirmed

# Save your annotations to a file, so that you can submit them with your homework.
def save_to_drive(word_senses, confirmed_hyponyms, confirmed_hypernyms):
  import json
  from google.colab import drive
  drive.mount('/content/drive/')

  output_file = '/content/drive/My Drive/word-sense-annotations.json'
  output_json = {}
  output_json['senses'] = word_senses
  output_json['hyponyms'] = confirmed_hyponyms
  output_json['hypernyms'] = confirmed_hypernyms

  with open(output_file, 'w') as write_file:
    write_file.write(json.dumps(output_json, sort_keys=True, indent=4))
    write_file.write('\n')
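
The traversal inside `confirm_hyponyms` can be hard to see among the prompts. It is essentially a breadth-first walk in which only approved candidates have their own children queued for review. The sketch below shows that idea on a hypothetical toy tree, with the interactive prompt replaced by an `approve` callback. Both `TOY_HYPONYMS` and `confirm_descendants` are names invented for this illustration, and the tree is not WordNet output.

```python
from collections import deque

# Hypothetical hyponym tree; the entries are illustrative, not WordNet data.
TOY_HYPONYMS = {
    "fish": ["salmon", "shark"],
    "salmon": ["smoked salmon"],
    "shark": ["great white"],
}

def confirm_descendants(root, children, approve):
    """Breadth-first walk that only descends below approved candidates."""
    confirmed = []
    queue = deque(children.get(root, []))
    while queue:
        candidate = queue.popleft()
        if approve(candidate):
            confirmed.append(candidate)
            # Only approved candidates have their own children queued.
            queue.extend(children.get(candidate, []))
    return confirmed

# Stand-in for the annotator: reject anything containing 'shark'.
result = confirm_descendants("fish", TOY_HYPONYMS, lambda name: "shark" not in name)
print(result)
```

Because "shark" is rejected, "great white" is never offered for review. The same pruning happens in the interactive version: answering "no" for a synset skips everything beneath it.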



Run this part when you're ready to start annotating the words in the commands. I estimate that it will take about 10 minutes per command.  Your annotations will be saved to a file in your Google Drive called _word-sense-annotations.json_, so that you can submit them with your homework.  You'll be prompted to enter a code to authorize Colab to write to your Google Drive. Be sure to do this so that your work will be saved.
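
Before you start, it may help to see the shape of the file that gets written. The sketch below builds the same three-key structure that `save_to_drive` produces and round-trips it through JSON. It writes to a local temporary directory rather than Google Drive, which is an assumption made so the example runs anywhere. The single entry uses `wear.n.03`, the sense picked in the sample session in this notebook.

```python
import json
import os
import tempfile

# Illustrative annotations with the same keys that save_to_drive writes.
annotations = {
    "senses": {"wear": "wear.n.03"},
    "hypernyms": {"wear": []},
    "hyponyms": {"wear": []},
}

# Assumption: a temporary local path stands in for the Google Drive location.
path = os.path.join(tempfile.mkdtemp(), "word-sense-annotations.json")
with open(path, "w") as write_file:
    write_file.write(json.dumps(annotations, sort_keys=True, indent=4))
    write_file.write("\n")

# Reading the file back gives the same dictionaries.
with open(path) as read_file:
    loaded = json.load(read_file)

print(sorted(loaded))
```

Each of the three keys maps a command word to either a synset name or a list of synset names, so the file can be reloaded later without repeating the annotation.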

In [6]:
word_senses = annotate_synsets(commands)
confirmed_hyponyms = {}
confirmed_hypernyms = {}
for word in word_senses:
  print("\nYou picked the sense %s for the word '%s'." % (word_senses[word], word))
  print("==============")
  word_sense = wn.synset(word_senses[word])
  print("\nNext, pick which hypernyms of %s we should allow players to use." % word_sense.name())
  print("==============")
  confirmed_hypernyms[word] = confirm_hyponyms(word, word_sense, do_hypernyms_instead=True)
  print("\nFinally, pick which hyponyms of %s we should allow players to use." % word_sense.name())
  print("==============")  
  confirmed_hyponyms[word] = confirm_hyponyms(word, word_sense)


print("You're done annotating!  Save your annotation to your Google drive.")
print("You need to paste in a confirmation code to allow Colab to have access.")
print("We'll create a file called 'word-sense-annotations.json' for you to turn in.")
print("==============")
save_to_drive(word_senses, confirmed_hyponyms, confirmed_hypernyms)



wear crown
WEAR
0) wear.n.01 - impairment resulting from long use
1) clothing.n.01 - a covering designed to be worn on a person's body
2) wear.n.03 - the act of having on your person as a covering or adornment
3) wear.v.01 - be dressed in
4) wear.v.02 - have on one's person
5) wear.v.03 - have in one's aspect; wear an expression of one's attitude or personality
6) wear.v.04 - deteriorate through use or stress
7) wear.v.05 - have or show an appearance of
8) wear.v.06 - last and be usable
9) break.v.42 - go to pieces
10) tire.v.02 - exhaust or get tired through overuse or great strain or stress
11) wear.v.09 - put clothing on one's body
12 None of these.
>2
wear crown
CROWN
0) crown.n.01 - the Crown (or the reigning monarch) as the symbol of the power and authority of a monarchy
1) crown.n.02 - the part of a tooth above the gum that is covered with enamel
2) crown.n.03 - a wreath or garland worn on the head to signify victory
3) crown.n.04 - an ornamental jeweled headdress signifying sov

## Look Over Your Annotations

Here's what your selections were, and what their corresponding synonyms are.

In [7]:
for word in word_senses:
  print('\n', word.upper())
  word_sense = wn.synset(word_senses[word])
  print('Synonyms:\t', get_synonyms(word_sense))
  print('Hypernyms:')
  for hypernym in confirmed_hypernyms[word]:
    print('\t', get_synonyms(wn.synset(hypernym)))

  print('Hyponyms:')
  hyponyms = confirmed_hyponyms[word]
  for hyponym in hyponyms:
    print('\t', get_synonyms(wn.synset(hyponym)))


 WEAR
Synonyms:	 ['wear', 'wearing']
Hypernyms:
Hyponyms:

 CROWN
Synonyms:	 ['crown']
Hypernyms:
Hyponyms:

 SMELL
Synonyms:	 ['smell']
Hypernyms:
	 ['perceive', 'comprehend']
Hyponyms:
	 ['get a noseful', 'get a whiff']
	 ['scent', 'nose', 'wind']
	 ['sniff', 'whiff']
	 ['snuff', 'snuffle']

 ROSE
Synonyms:	 ['rose', 'rosebush']
Hypernyms:
Hyponyms:
	 ['China rose', 'Bengal rose', 'Rosa chinensis']
	 ['damask rose', 'summer damask rose', 'Rosa damascena']
	 ['dog rose', 'Rosa canina']
	 ['mountain rose', 'Rosa pendulina']
	 ['multiflora', 'multiflora rose', 'Japanese rose', 'baby rose', 'Rosa multiflora']
	 ['musk rose', 'Rosa moschata']
	 ['sweetbrier', 'sweetbriar', 'brier', 'briar', 'eglantine', 'Rosa eglanteria']

 EAT
Synonyms:	 ['eat']
Hypernyms:
	 ['consume', 'ingest', 'take in', 'take', 'have']
	 ['eat']
	 ['consume', 'ingest', 'take in', 'take', 'have']
Hyponyms:
	 ['devour', 'down', 'consume', 'go through']
	 ['devour', 'guttle', 'raven', 'pig']
	 ['eat up', 'finish', 'pol

## Enumerate Alternative Wordings of Commands

Once we know what the word sense is for each word in our command, and what its relevant hypernyms and hyponyms are, we can output a rich set of reasonably accurate paraphrases for the commands in our game.
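
The core of the enumeration is a Cartesian product over the per-word lists of alternatives. Here is a minimal sketch with a hand-written list for _smell rose_. In the notebook these lists come from your confirmed synonyms, hypernyms, and hyponyms, so the exact contents below are only an assumption for illustration.

```python
import itertools

# Hypothetical per-word alternatives for the command "smell rose".
alternatives_per_word = [
    ["smell", "sniff"],
    ["rose", "rosebush", "dog rose"],
]

# itertools.product yields one tuple per combination, picking one
# alternative from each inner list in order.
paraphrases = [" ".join(words)
               for words in itertools.product(*alternatives_per_word)]

for paraphrase in paraphrases:
    print(paraphrase)
print(len(paraphrases))  # 2 verb choices * 3 noun choices = 6
```

The number of paraphrases is the product of the list lengths, which is why a handful of confirmed hyponyms per word quickly turns a few commands into a very large set.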

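Here we use the `product` function from `itertools` to generate every combination of the alternatives for each word in a command.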

In [8]:
import itertools  # We're using the product function from itertools

def get_alternatives(word, word_senses, confirmed_hypernyms, confirmed_hyponyms):
  """Create a list of good alternatives for a word by listing out the synonyms
    for its word sense, and for its hyponyms and hypernyms."""
  alternatives = []
  if word not in word_senses:
    alternatives.append(word)
    return alternatives
  word_sense = wn.synset(word_senses[word])
  alternatives.extend(get_synonyms(word_sense))
  for hypernym in confirmed_hypernyms[word]:
    alternatives.extend(get_synonyms(wn.synset(hypernym)))
  for hyponym in confirmed_hyponyms[word]:
    alternatives.extend(get_synonyms(wn.synset(hyponym)))
  return alternatives

def enumerate_alternatives(sentence, word_senses, confirmed_hypernyms, confirmed_hyponyms):
  """Enumerate all of the sentences that can result by taking any combination of
     the alternatives for each word in the sentence."""
  words = word_tokenize(sentence.lower())
  # a list of lists
  alternatives_per_word = []
  for word in words:
    alternatives = get_alternatives(word, word_senses, confirmed_hypernyms, confirmed_hyponyms)
    alternatives_per_word.append(alternatives)
  
  alternative_to_original = {}
  # all combinations of a list of lists
  for words in list(itertools.product(*alternatives_per_word)):
    alt_sent = " ".join(words)
    alternative_to_original[alt_sent] = sentence
  return alternative_to_original


# alternative_commands is a dictionary that maps 
# the new commands onto the original ones.
alternative_commands = {}
for command in commands:
  alternative_commands.update(enumerate_alternatives(command, 
                                                     word_senses, 
                                                     confirmed_hypernyms, 
                                                     confirmed_hyponyms))

for alt_sent in alternative_commands:
  print("%s ==> %s" % (alt_sent, alternative_commands[alt_sent]))
print("Congratulations, you can now handle %d commands instead of just %d!" % 
      (len(alternative_commands.keys()), len(commands)))

wear crown ==> wear crown
wearing crown ==> wear crown
smell rose ==> smell rose
smell rosebush ==> smell rose
smell China rose ==> smell rose
smell Bengal rose ==> smell rose
smell Rosa chinensis ==> smell rose
smell damask rose ==> smell rose
smell summer damask rose ==> smell rose
smell Rosa damascena ==> smell rose
smell dog rose ==> smell rose
smell Rosa canina ==> smell rose
smell mountain rose ==> smell rose
smell Rosa pendulina ==> smell rose
smell multiflora ==> smell rose
smell multiflora rose ==> smell rose
smell Japanese rose ==> smell rose
smell baby rose ==> smell rose
smell Rosa multiflora ==> smell rose
smell musk rose ==> smell rose
smell Rosa moschata ==> smell rose
smell sweetbrier ==> smell rose
smell sweetbriar ==> smell rose
smell brier ==> smell rose
smell briar ==> smell rose
smell eglantine ==> smell rose
smell Rosa eglanteria ==> smell rose
perceive rose ==> smell rose
perceive rosebush ==> smell rose
perceive China rose ==> smell rose
perceive Bengal rose ==>
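
One way to use the resulting dictionary in a game is to look up the player's input and map it back to the original command. The sketch below uses a few entries copied from the output above. Note that some generated keys contain capital letters (for example "smell China rose"), so this version normalizes both the keys and the input before comparing. The helper names `normalize` and `parse` are assumptions for this example.

```python
# A few entries in the shape of alternative_commands, copied from the output above.
alternative_commands = {
    "wear crown": "wear crown",
    "wearing crown": "wear crown",
    "smell China rose": "smell rose",
    "perceive rosebush": "smell rose",
}

def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

# Index by normalized key so capitalized entries such as "China rose" still match.
lookup = {normalize(alt): original
          for alt, original in alternative_commands.items()}

def parse(player_input):
    """Map player input to an original command, or None if unrecognized."""
    return lookup.get(normalize(player_input))

print(parse("Smell china rose"))  # maps back to 'smell rose'
print(parse("juggle crown"))      # None: no such alternative was generated
```

Inputs that fall outside the enumerated alternatives still fail, so the coverage of the parser depends directly on how thoroughly the senses, hypernyms, and hyponyms were annotated.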