This notebook assumes Python version 3.

## Import several Python packages to check availability

In [6]:
import numpy
import matplotlib
import scipy
import bokeh
import pandas

In [None]:
# These were added explicitly to environment.yml for binder.
import autocorrect
import plotly
# import enchant  # the pyenchant package is imported under the module name "enchant"

import nltk
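
A plain `import` stops at the first missing package, so the cells above reveal only one problem at a time. As a minimal sketch of a non-failing check, the standard library's `importlib.util.find_spec` can report every missing package in one pass. The helper name `missing_packages` is hypothetical, and the list repeats the packages imported above.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Package names taken from the import cells above.
required = ["numpy", "matplotlib", "scipy", "bokeh", "pandas",
            "autocorrect", "plotly", "nltk"]

print("Missing packages:", missing_packages(required) or "none")
```

This only locates each package; it does not import it, so a package that is installed but broken would still pass.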

## Download some data to use with NLTK

**Note:** Called with no argument, the nltk.download() function launches a dialog for choosing data such as corpora; called with an argument, it downloads the named resource directly. See: https://stackoverflow.com/questions/5843817/programmatically-install-nltk-corpora-models-i-e-without-the-gui-downloader

In [None]:
nltk.download('wordnet')
nltk.download('verbnet')

## Run some tests with WordNet from NLTK

In [7]:
from nltk.corpus import wordnet as wn

# Print the name of every synset (word sense) for each test word.
words = ['boil', 'jump', 'flow', 'gyrate']
for i, word in enumerate(words):
    if i > 0:
        print(' ')  # separator line between words
    for s in wn.synsets(word):
        print( s.name() )
        # print( s.pos() )

boil.n.01
boiling_point.n.01
boil.v.01
boil.v.02
boil.v.03
churn.v.02
seethe.v.02
 
jump.n.01
leap.n.02
jump.n.03
startle.n.01
jump.n.05
jump.n.06
jump.v.01
startle.v.02
jump.v.03
jump.v.04
leap_out.v.01
jump.v.06
rise.v.11
jump.v.08
derail.v.02
chute.v.01
jump.v.11
jumpstart.v.01
jump.v.13
leap.v.02
alternate.v.01
 
flow.n.01
flow.n.02
flow.n.03
flow.n.04
stream.n.04
stream.n.02
menstruation.n.01
flow.v.01
run.v.06
flow.v.03
flow.v.04
hang.v.05
flow.v.06
menstruate.v.01
 
gyrate.v.01
spin.v.01


## Run some tests with VerbNet from NLTK

In [8]:
from nltk.corpus import verbnet as vn
vn.lemmas()[0:25]

# help(vn)

[u'December',
 u'FedEx',
 u'UPS',
 u'abandon',
 u'abase',
 u'abash',
 u'abate',
 u'abbreviate',
 u'abduct',
 u'abet',
 u'abhor',
 u'abolish',
 u'abound',
 u'abrade',
 u'abridge',
 u'absolve',
 u'abstain',
 u'abstract',
 u'abuse',
 u'abut',
 u'accelerate',
 u'accept',
 u'acclaim',
 u'accompany',
 u'accrue']

## Experiment for generating nominalizations of verbs

GSN process names are nominalizations of verbs.  One idea for automatically generating a list of these process names from a list of verbs (e.g. from VerbNet) is to concatenate some of the 11 or so possible verb nominalization endings directly to the verb, apply autocorrect to each result, and then check whether the corrected string is a noun.  Standard endings are:
**tion** (absorption, convection),
**sion** (conversion, dispersion),
**cion** (suspicion, coercion),
**ing** (swimming, upwelling),
**age** (drainage, seepage),
**y** (discovery, recovery),
**al** (arrival, retrieval),
**ance** (acceptance, attendance),
**ence** (existence, emergence),
**ment** (alignment, improvement),
**ure** (failure, departure).
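
The idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ENDINGS`, `candidate_nominalizations` and `nominalize` are hypothetical, and `correct` and `is_noun` are placeholders for whatever autocorrect function and noun test are eventually chosen.

```python
# Endings copied from the list above.
ENDINGS = ["tion", "sion", "cion", "ing", "age", "y",
           "al", "ance", "ence", "ment", "ure"]

def candidate_nominalizations(verb, endings=ENDINGS):
    """Concatenate each ending directly onto the verb, with no spelling repair."""
    return [verb + ending for ending in endings]

def nominalize(verb, correct, is_noun):
    """Correct each raw candidate, then keep the distinct results that pass is_noun.

    `correct` and `is_noun` are caller-supplied callables (hypothetical
    placeholders for an autocorrect function and a noun test).
    """
    kept = []
    for candidate in candidate_nominalizations(verb):
        fixed = correct(candidate)
        if is_noun(fixed) and fixed not in kept:
            kept.append(fixed)
    return kept

# Raw candidates before any correction, e.g. 'proposetion', 'proposeal', ...
print(candidate_nominalizations("propose"))
```

Keeping the corrector and the noun test as parameters makes it easy to swap in different tools while comparing results.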

In [11]:
from autocorrect import spell

[spell('proposetion'), spell('proposeal'), spell('proposeence')]

['proposition', 'proposal', 'proposeence']

Notice that **spell** may return a string unchanged even when it is not a word, as with the last one.

In [12]:
[spell('distracttion'), spell('distractal'), spell('distractence')]

['distraction', u'distracted', 'distractence']
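
The results above show two kinds of output that would need to be rejected: a non-word returned unchanged ('distractence') and a correction to a word that may not be a noun ('distracted'). One possible filter, sketched here under the assumption that WordNet (downloaded earlier in this notebook) is an acceptable noun lexicon, keeps only strings with at least one noun synset. The names `is_wordnet_noun` and `keep_nouns` are hypothetical.

```python
def is_wordnet_noun(word):
    """Return True if WordNet lists at least one noun sense for `word`."""
    # Imported here so the module loads even before NLTK data is available.
    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')
    return len(wn.synsets(word, pos=wn.NOUN)) > 0

def keep_nouns(candidates, is_noun=is_wordnet_noun):
    """Keep only the candidates accepted by the noun test, preserving order."""
    return [word for word in candidates if is_noun(word)]

# Example call (needs the WordNet data):
# keep_nouns(['distraction', 'distracted', 'distractence'])
```

WordNet's coverage is finite, so a valid but rare nominalization could be discarded by this test.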