# Process Synonyms

This notebook uses a combination of Python data science libraries and the Google Natural Language API (machine learning) to expand the vocabulary of the chatbot by generating synonyms for topics created in the previous notebook.

In [1]:
!pip uninstall -y google-cloud-datastore

Uninstalling google-cloud-datastore-1.13.0:
  Successfully uninstalled google-cloud-datastore-1.13.0


In [2]:
!pip install google-cloud-datastore

Collecting google-cloud-datastore
  Using cached https://files.pythonhosted.org/packages/8c/11/507b62a1b273e8a4c40dc37194081094c2c4c5fd5bc19d80476ad5a9dd47/google_cloud_datastore-1.13.0-py2.py3-none-any.whl
Installing collected packages: google-cloud-datastore
Successfully installed google-cloud-datastore-1.13.0


In [3]:
!pip install inflect



Restart the runtime (Reset Session > Restart) so the newly installed packages are loaded, then resume with the following cells.

In [2]:
# Only need to do this once...
import nltk
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to /content/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /content/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [1]:
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))
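
The `stop` set holds NLTK's English stop words; the synonym cell later skips any topic word found in it. A minimal sketch of that filter, using a small hypothetical `SAMPLE_STOP_WORDS` set in place of NLTK's full list so it runs without downloading any corpora:

```python
# Hypothetical subset of English stop words; the notebook uses NLTK's full list.
SAMPLE_STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in"}

def content_words(topic, stop_words=SAMPLE_STOP_WORDS):
    """Return the words of a topic name that are not stop words."""
    return [word for word in topic.split() if word not in stop_words]

print(content_words("care of the newborn"))  # ['care', 'newborn']
print(content_words("prenatal care"))        # ['prenatal', 'care']
```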

In [2]:
from google.cloud import datastore

In [3]:
datastore_client = datastore.Client()

In [4]:
# Reuse the existing client and fetch every entity of the 'prenatal care' topic kind.
query = datastore_client.query(kind='prenatal care')
results = list(query.fetch())

In [5]:
import inflect
plurals = inflect.engine()
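
`plurals.plural(word)` returns the plural of an English noun, so both "care" and "cares" can be stored as synonyms. inflect also covers irregular nouns and many special cases. The sketch below is a deliberately simplified, hypothetical rule-based stand-in (`naive_plural`) that only illustrates the idea for regular nouns; it is not inflect's algorithm:

```python
def naive_plural(noun):
    """Very rough English pluralization for regular nouns (illustration only)."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"            # box -> boxes
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"      # city -> cities
    return noun + "s"                 # care -> cares

for noun in ["care", "precaution", "box", "city", "day"]:
    print(noun, "->", naive_plural(noun))
```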

## Extract Synonyms with Python
Split the topic into words and use NLTK's WordNet corpus as a "thesaurus" to look up noun synonyms for each word, adding the plural of each synonym with inflect. Store these in Datastore and link them back to the topic. Note that this section uses "stop words" to filter out articles and other words that don't contribute to the meaning of the topic.
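
The next cell keeps a synonym only if it comes from a noun sense of the word and is a single alphabetic token (lemmas containing underscores or hyphens fail `isalpha()`), and it discards every synonym for a word that has an adjective sense. A minimal sketch of those rules, run against a small hypothetical in-memory thesaurus (`SAMPLE_SENSES`, sample data rather than real WordNet output) so it works without NLTK:

```python
# Hypothetical stand-in for wordnet.synsets(word): each sense is
# (part_of_speech, [lemma names]) using WordNet's POS letters ('n', 'v', 'a').
SAMPLE_SENSES = {
    "care": [("n", ["care", "attention", "aid", "tending"]),
             ("n", ["caution", "precaution", "care", "forethought"]),
             ("n", ["charge", "care", "tutelage", "guardianship", "child_care"]),
             ("v", ["care", "give_care"])],
    "prenatal": [("a", ["prenatal", "antenatal", "antepartum"])],
}

def noun_synonyms(word, senses=SAMPLE_SENSES):
    """Collect single-word noun lemmas; return nothing if the word has an adjective sense."""
    synonyms = set()
    for pos, lemmas in senses.get(word, []):
        if pos == "a":
            return set()  # mirrors the notebook: any adjective sense discards all synonyms
        if pos == "n":
            synonyms.update(lemma for lemma in lemmas if lemma.isalpha())
    return synonyms

print(sorted(noun_synonyms("care")))
print(noun_synonyms("prenatal"))
```

Dropping adjectives this way means a modifier such as "prenatal" is stored only as itself (and its plural), which keeps loosely related adjective senses out of the chatbot's vocabulary.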

In [9]:
from __future__ import print_function  # print() behaves the same on Python 2 and 3

from nltk.corpus import wordnet

kind = 'Synonym'

def save_synonym(term, topic):
  # Upsert a Synonym entity keyed by `term` whose 'synonym' property names the topic.
  synonym = datastore.Entity(key=datastore_client.key(kind, term))
  synonym['synonym'] = topic
  datastore_client.put(synonym)

for result in results:
  topic = result.kind
  for word in topic.split():

    # Skip stop words such as articles and prepositions.
    if word in stop:
      continue

    # Collect single-word noun lemmas from WordNet, plus their plurals.
    synonyms = set()
    for syn in wordnet.synsets(word):

      if syn.pos() == 'n':
        for l in syn.lemmas():
          lemma = l.name()
          if lemma.isalpha():
            synonyms.add(lemma)
            synonyms.add(plurals.plural(lemma))

      # If the word has an adjective ('a') sense, discard all of its synonyms.
      if syn.pos() == 'a':
        synonyms = set()
        break

    print(result.key.name, word, synonyms)

    # Link the full topic, the word itself, each synonym, and the word's plural back to the topic.
    save_synonym(topic, topic)
    save_synonym(word, topic)
    for dictionary_synonym in synonyms:
      save_synonym(dictionary_synonym, topic)
    save_synonym(plurals.plural(word), topic)
    

None prenatal Set([])
None care Set([u'cautions', u'tutelages', u'maintenances', u'caution', u'upkeep', u'aids', u'precautions', u'tutelage', u'concern', u'tending', u'cares', u'fears', u'charge', u'concerns', u'maintenance', u'attentions', u'upkeeps', u'attention', u'forethought', u'tendings', u'care', u'guardianships', u'charges', u'forethoughts', u'fear', u'aid', u'precaution', u'guardianship'])
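
Each `Synonym` entity written above is keyed by a term (the full topic, one of its words, a WordNet synonym, or a plural) and stores the topic name in its `synonym` property. A minimal sketch of how a chatbot could use that mapping to resolve a user's message to a topic, with a plain dictionary (`SYNONYM_TABLE`, sample data) standing in for Datastore and a hypothetical `find_topic` helper; against Datastore, each lookup would instead be a `datastore_client.get(datastore_client.key('Synonym', term))` call:

```python
# Stand-in for the Synonym kind: key name -> value of the 'synonym' property.
SYNONYM_TABLE = {
    "prenatal care": "prenatal care",
    "prenatal": "prenatal care",
    "prenatals": "prenatal care",
    "care": "prenatal care",
    "cares": "prenatal care",
    "precaution": "prenatal care",
    "precautions": "prenatal care",
}

def find_topic(message, table=SYNONYM_TABLE):
    """Return the topic matching the whole message, else the first matching word, else None."""
    text = message.lower().strip()
    if text in table:
        return table[text]
    for word in text.split():
        if word in table:
            return table[word]
    return None

print(find_topic("What precautions should I take"))  # prenatal care
print(find_topic("hello there"))                     # None
```

Because each term is a single entity key, a term shared by two topics keeps only the topic written last; `put` overwrites an existing entity with the same key.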
