# Grammar Experiments - #2 of E

Given a previously created file of sentences, this notebook outlines how to output sentence fragments in the form of subject-verb-object.

Now that a carrel has been modeled into a list of sentences, we can exploit the model to answer different questions. For example, "What sentences take the form of subject-verb-object, and what are the resulting subjects, verbs, and objects?" If the answer to this question takes the form of a tab-delimited list of values, then it is easy to group, sort, and filter the list to discover patterns. Sometimes the results can be quite interesting.
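
To make the tab-delimited idea concrete, here is a minimal sketch that uses only the Python standard library. The rows are invented for illustration and do not come from any carrel; the variable names are assumptions as well.

```python
import csv
import io
from collections import Counter

# invented rows for illustration only; real rows would come from a carrel
rows = [
    ['ulysses', 'saw', 'the ship'],
    ['the men', 'ate', 'the cattle'],
    ['ulysses', 'told', 'a story'],
]

# write the rows as tab-delimited text, header first
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['subject', 'verb', 'object'])
writer.writerows(rows)
tsv = buffer.getvalue()
print(tsv)

# read the text back, then filter by subject and count subjects
records = list(csv.DictReader(io.StringIO(tsv), delimiter='\t'))
by_ulysses = [record for record in records if record['subject'] == 'ulysses']
subject_counts = Counter(record['subject'] for record in records)
print(subject_counts.most_common())
```

Because every row has the same three columns, the same text could just as easily be opened in a spreadsheet or loaded into a dataframe.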

In [None]:
# configure
CARREL  = 'homer'


In [None]:
# sub-configure; you probably don't want to change these
COLUMNS = [ 'subject', 'verb', 'object' ]
SVO     = '''
  NOUNPHRASE: {<DT>?<JJ.*>*<NN.?>+}
   PREDICATE: {<VB.*>?}
     GRAMMAR: {<NOUNPHRASE><PREDICATE><NOUNPHRASE>}
'''
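
The grammar is cascaded: noun phrases and predicates are chunked first, and then the `GRAMMAR` rule matches a noun phrase, a predicate, and a noun phrase in sequence. The sketch below applies the same grammar to one hand-tagged sentence. It is not the toolbox's own matching code; `extract_svo` is a hypothetical helper, and the part-of-speech tags are supplied by hand, whereas a real pipeline would presumably get them from a tagger.

```python
import nltk

# the same cascaded chunk grammar as the SVO configuration
SVO = '''
  NOUNPHRASE: {<DT>?<JJ.*>*<NN.?>+}
   PREDICATE: {<VB.*>?}
     GRAMMAR: {<NOUNPHRASE><PREDICATE><NOUNPHRASE>}
'''

def extract_svo(tagged, parser):
    """Hypothetical helper: return [subject, verb, object] lists for one tagged sentence."""
    tree = parser.parse(tagged)
    found = []
    for match in tree.subtrees(filter=lambda t: t.label() == 'GRAMMAR'):
        # each GRAMMAR subtree has three children: NOUNPHRASE, PREDICATE, NOUNPHRASE
        found.append([' '.join(word for word, tag in child.leaves()) for child in match])
    return found

# hand-tagged tokens (an assumption made to keep the example self-contained)
tagged = [
    ('the', 'DT'), ('old', 'JJ'), ('man', 'NN'),
    ('loved', 'VBD'),
    ('the', 'DT'), ('sea', 'NN'),
]

parser = nltk.RegexpParser(SVO)
triples = extract_svo(tagged, parser)
print(triples)
```

Note that `PREDICATE` matches at most one verb tag, so a multi-word verb such as "has loved" would not fit the pattern as written.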


In [None]:
# require
import rdr
import multiprocessing
import pandas as pd
import nltk


In [None]:
# do the work
if __name__ == '__main__' : 

    # initialize
    parser       = nltk.RegexpParser( SVO )
    localLibrary = rdr.configuration( 'localLibrary' )
    sentences    = localLibrary/CARREL/( rdr.ETC )/( rdr.SENTENCES )

    # get and parallel process each sentence
    sentences = rdr.Sentences( sentences )
    with multiprocessing.Pool() as pool :
        results = pool.starmap( rdr.matchSVO, [ [ sentence, parser ] for sentence in sentences ] )

    # process each result; create a list of lists containing the results
    svo = []
    for result in results :

        # update, conditionally
        if result : svo.append( result )


Once you get this far, all of the subject-verb-object values ought to be saved in a list of lists called `svo`. We can then simply dump them to the screen:

In [None]:
# merely dump the result to the screen
print( svo )

Alternatively, we can stuff the subject-verb-object values into a Pandas dataframe. The printed result is more readable.

In [None]:
# create a dataframe from the result, and output the result
dataframe = pd.DataFrame( svo, columns=COLUMNS )
print( dataframe )

Finally, we can use the dataframe to filter, group, and sort the results.

In [None]:
# denote a column (subject, verb, or object) and a value to filter on, and then output only the matching rows
column = 'subject'
filter = 'ulysses'
print( dataframe[ dataframe[ column ] == filter ] )
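
Grouping and sorting follow the same pattern as the filter. The following is a minimal sketch, assuming a small invented `svo` list in place of real carrel results; the variable names `counts` and `by_verb` are illustrative only.

```python
import pandas as pd

COLUMNS = ['subject', 'verb', 'object']

# invented rows for illustration only; real rows come from the svo list
svo = [
    ['ulysses', 'saw', 'the ship'],
    ['the men', 'ate', 'the cattle'],
    ['ulysses', 'told', 'a story'],
]
dataframe = pd.DataFrame(svo, columns=COLUMNS)

# group by subject and count the rows in each group, most frequent first
counts = dataframe.groupby('subject').size().sort_values(ascending=False)
print(counts)

# sort the whole table alphabetically by verb
by_verb = dataframe.sort_values('verb')
print(by_verb)
```

Counting by subject is often a useful first pass, since frequent subjects suggest which actors dominate the text.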
