## Usage-based verb meaning with the Naive Discriminative Learning library `pyndl`

We will again use `pandas`, together with `pyndl`, a library dedicated to naive discriminative learning. We will also need the `random` library to shuffle the learning events, and `gzip` to compress the prepared input file.

In [23]:
import pyndl
import random
import gzip
import pandas as pd

Since our data is a frequency table, let's write two convenience functions:

- to expand the frequency table into a randomized event file $\rightarrow$ `expand_to_events()`
- to convert the event file into a tab-separated, gzip-compressed `*.gz` file, which is the input format `pyndl` expects $\rightarrow$ `convert_csv_to_ndl_events()`

In [24]:
def expand_to_events(_infile, _outfile):
    """Expand a CSV frequency table (cues, outcomes, frequency) into a shuffled event file."""
    events = []
    with open(_infile, 'r') as infile, open(_outfile, 'w') as outfile:
        infile.readline()  # skip the CSV header
        outfile.write('Cues\tOutcomes\n')
        for line in infile:
            fields = line.strip().split(',')
            row = fields[0] + '\t' + fields[1]
            # Repeat each cue-outcome pair as many times as its frequency
            events.extend([row] * int(fields[2]))
        # Randomize the order of the learning events
        random.shuffle(events)
        for event in events:
            outfile.write('%s\n' % event)
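To make the expansion step concrete, here is a minimal in-memory sketch of the same idea. The frequency table is made up for illustration, and the helper name `expand` and its `seed` parameter are assumptions, not part of the notebook. Cues within an event are joined with underscores, which is the convention `pyndl` uses for multiple cues.

```python
import random
from collections import Counter

# Hypothetical frequency table: (cues, outcome, frequency).
frequencies = [
    ('commence_journey', 'START', 3),
    ('commence_meeting', 'BEGIN', 2),
]

def expand(rows, seed=None):
    """Repeat each (cues, outcome) pair `frequency` times, then shuffle."""
    events = [(cues, outcome)
              for cues, outcome, frequency in rows
              for _ in range(frequency)]
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(events)
    return events

events = expand(frequencies, seed=1)
counts = Counter(events)

print(len(events))  # total number of learning events
print(counts)       # how often each cue-outcome pair occurs
```

Because incremental learning depends on the order of events, an unseeded shuffle will generally give slightly different weights on every run; fixing a seed is one way to make the results repeatable.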

In [25]:
def convert_csv_to_ndl_events(_infile, _outfile):
    """Write a gzip-compressed, tab-separated copy of an event file."""
    with open(_infile, 'r') as infile, gzip.open(_outfile, 'wt') as outfile:
        for event in infile:
            # Replace any comma separators with tabs
            outfile.write(event.replace(',', '\t'))
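Before training, it can help to peek at the compressed file and confirm that it has a header and tab-separated columns. The sketch below does this on a throwaway file; the helper `head_gz` and the demo events are hypothetical and only illustrate the check.

```python
import gzip
import itertools
import os
import tempfile

def head_gz(path, n=3):
    """Return the first `n` lines of a gzipped text file, without newlines."""
    with gzip.open(path, 'rt') as infile:
        return [line.rstrip('\n') for line in itertools.islice(infile, n)]

# Write a small gzipped event file with made-up events.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo_events.gz')
with gzip.open(demo_path, 'wt') as outfile:
    outfile.write('Cues\tOutcomes\n')
    outfile.write('commence_journey\tSTART\n')
    outfile.write('commence_meeting\tBEGIN\n')

preview = head_gz(demo_path, n=2)
print(preview)
```

The same helper could be pointed at the file produced in this notebook, for example `head_gz('semantic_example.gz')`.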

### Running the main part:

In [26]:
expand_to_events('semantic_example_frequencies.csv', 'semantic_example_events.tsv')

In [27]:
convert_csv_to_ndl_events('semantic_example_events.tsv', 'semantic_example.gz')

In [29]:
import pyndl.ndl  # `import pyndl` alone does not load the `ndl` submodule

weights = pyndl.ndl.ndl(events='semantic_example.gz',
                        alpha=0.1,
                        betas=(0.1, 0.1),
                        method='openmp')
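The `alpha` and `betas` arguments are learning-rate parameters of the Rescorla-Wagner rule that naive discriminative learning is built on. The following is a minimal pure-Python sketch of that rule, written for illustration only: it is not `pyndl`'s implementation, the function name `rescorla_wagner` is hypothetical, and the two demo events are made up. For every event, the weights from the present cues to each outcome move towards `lambda_` if the outcome occurred and towards 0 if it did not.

```python
from collections import defaultdict

def rescorla_wagner(events, alpha=0.1, betas=(0.1, 0.1), lambda_=1.0):
    """Learn cue-outcome weights from a sequence of (cues, outcomes) events."""
    events = [(set(cues), set(outcomes)) for cues, outcomes in events]
    all_outcomes = sorted({o for _, outcomes in events for o in outcomes})
    weights = defaultdict(float)  # keyed by (cue, outcome), starts at 0.0

    for cues, outcomes in events:
        for outcome in all_outcomes:
            # Summed support for this outcome from the cues present in the event
            prediction = sum(weights[(cue, outcome)] for cue in cues)
            if outcome in outcomes:
                delta = alpha * betas[0] * (lambda_ - prediction)
            else:
                delta = alpha * betas[1] * (0.0 - prediction)
            # Every present cue receives the same adjustment
            for cue in cues:
                weights[(cue, outcome)] += delta
    return dict(weights)

# Two hypothetical learning events.
demo_events = [
    (['commence', 'journey'], ['START']),
    (['commence', 'meeting'], ['BEGIN']),
]
demo_weights = rescorla_wagner(demo_events)
for (cue, outcome), weight in sorted(demo_weights.items()):
    print(cue, outcome, round(weight, 6))
```

In the second demo event `commence` occurs without `START`, so its weight to `START` is reduced slightly. This is the discrimination effect that gives the learned weights their interpretation.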

In [11]:
weights.values.T

array([[ 0.30343145,  0.2068178 ],
       [ 0.01703928, -0.00202258],
       [ 0.09092098, -0.01407596],
       [ 0.12485467,  0.04822739],
       [ 0.0178995 ,  0.06379365],
       [ 0.03832665,  0.11479813],
       [ 0.01439036, -0.00390283]])

In [12]:
list(weights.cues.values)

['commence', 'journey', 'machine', 'movie', 'day', 'meeting', 'business']

In [13]:
list(weights.outcomes.values)

['START', 'BEGIN']
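The three outputs above line up as follows: each row of the transposed weight matrix belongs to one cue, and each column to one outcome. As a small sketch in plain Python, with the values copied from the outputs above and the variable names chosen only for illustration, the weights can be paired with their labels like this:

```python
# Values copied from the notebook outputs: rows follow the cue order,
# columns follow the outcome order.
cues = ['commence', 'journey', 'machine', 'movie', 'day', 'meeting', 'business']
outcomes = ['START', 'BEGIN']
values = [
    [0.30343145, 0.2068178],
    [0.01703928, -0.00202258],
    [0.09092098, -0.01407596],
    [0.12485467, 0.04822739],
    [0.0178995, 0.06379365],
    [0.03832665, 0.11479813],
    [0.01439036, -0.00390283],
]

# Map each cue to an {outcome: weight} dictionary.
table = {cue: dict(zip(outcomes, row)) for cue, row in zip(cues, values)}

# Print one line per cue, ending with the more strongly associated outcome.
print(f"{'cue':<10}" + ''.join(f'{o:>12}' for o in outcomes) + '  stronger')
for cue, row in table.items():
    stronger = max(row, key=row.get)
    print(f'{cue:<10}' + ''.join(f'{row[o]:>12.4f}' for o in outcomes) + f'  {stronger}')
```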