## Partitioner examples
### This Jupyter notebook presents a few vignettes that demonstrate some of the Python partitioner package's functionality.
Note: Text cleaning and clause detection happen in the partitionText method. Passing large, uncleaned pieces of text as 'clauses' directly to the .partition() method is therefore unwise, regardless of the type of partition being taken: the text is simply tokenized by splitting on " ", which produces many long, punctuation-filled phrases and is likely to run very slowly. As a best practice, use .partition() only for testing and exploring the tool on individual clauses of interest.
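
To make the note above concrete, here is a minimal sketch of what a bare split on " " does to uncleaned text. It uses only Python's built-in `str.split` and is an illustration of the behavior described in the note, not the package's internal code; the sample sentence is made up.

```python
# Illustration only: mimics the naive whitespace split described in the note.
# The sample text is hypothetical and the package itself is not used here.
raw = 'He said: "Well... fine" (twice!) and left.'

# Splitting on a single space leaves punctuation attached to the tokens.
tokens = raw.split(" ")
print(tokens)

# Tokens that still carry punctuation would end up inside the phrases.
noisy = [t for t in tokens if not t.isalpha()]
print("%d of %d tokens carry punctuation" % (len(noisy), len(tokens)))
```

Every phrase built from such tokens inherits the quotes, colons, and parentheses, which is why cleaning through partitionText is the safer route for real text.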


In [1]:
from partitioner import partitioner
from partitioner.methods import *

### Process the English Wiktionary to generate the (default) partition probabilities.
Note: this step can take significant time for large dictionaries (~5 min).

In [2]:
## Vignette 1: Build informed partition data from a dictionary, 
##             and store to local collection
def preprocessENwiktionary():
    pa = partitioner(informed = True, dictionary = "./dictionaries/enwiktionary.txt")
    pa.dumpqs(qsname="enwiktionary")

In [3]:
preprocessENwiktionary()

### Perform a few one-off partitions.

In [4]:
## Vignette 2: An informed, one-off partition of a single clause
def informedOneOffPartition(clause = "How are you doing today?"):
    pa = oneoff()
    # Parenthesized print works under both Python 2 and Python 3.
    print(pa.partition(clause))

In [5]:
informedOneOffPartition()
informedOneOffPartition("Fine, thanks a bunch for asking!")

['How are you doing', 'today?']
['Fine,', 'thanks a bunch', 'for', 'asking!']


### Solve for the informed stochastic expectation partition (given the informed partition probabilities).

In [6]:
## Vignette 3: An informed, stochastic expectation partition of a single clause
def informedStochasticPartition(clause = "How are you doing today?"):
    pa = stochastic()
    # Parenthesized print works under both Python 2 and Python 3.
    print(pa.partition(clause))

In [7]:
informedStochasticPartition()

{'are you': 1.407092428930965e-09, 'How are you': 0.00025712526951610467, 'How': 5.472370457590498e-06, 'doing': 0.000257136448270894, 'you doing': 3.79920846523168e-05, 'How are': 3.800164835444141e-05, 'are': 2.0796023583003835e-10, 'are you doing': 5.47075540492574e-06, 'today?': 1.0, 'you': 9.771662360456963e-09, 'How are you doing': 0.999699400711672}
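
The printed dictionary is hard to scan because its keys are unordered. As a small sketch, assuming the result is an ordinary Python dict that maps each candidate phrase to its expectation (which is what the output above looks like), it can be ranked as follows. The values are copied from the output above and trimmed to a few entries; the variable names are illustrative.

```python
# A few entries copied from the output above; the package is not needed here.
expectations = {
    'How are you doing': 0.999699400711672,
    'today?': 1.0,
    'How are you': 0.00025712526951610467,
    'doing': 0.000257136448270894,
    'How': 5.472370457590498e-06,
}

# Rank phrases from the largest expectation to the smallest.
ranked = sorted(expectations.items(), key=lambda kv: kv[1], reverse=True)
for phrase, value in ranked:
    print("%-20s %.6g" % (phrase, value))
```

Ranked this way, the two dominant phrases ('today?' and 'How are you doing') match the informed one-off partition shown in Vignette 2.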


### Perform a pure random (uniform) one-off partition.

In [8]:
## Vignette 4: A uniform, one-off partition of a single clause
def uniformOneOffPartition(informed = False, clause = "How are you doing today?", qunif = 0.25):
    # informed = False switches to the uniform partition probability qunif
    pa = oneoff(informed = informed, qunif = qunif)
    print(pa.partition(clause))

In [15]:
uniformOneOffPartition()
uniformOneOffPartition(qunif = 0.75)

['How are', 'you doing today?']
['How', 'are', 'you doing today?']
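
The following is a minimal sketch of a uniform random partition, written without the package. It assumes that qunif is the probability of breaking each gap between adjacent words independently, which is consistent with the uniform expectations printed in Vignette 5 (for example, 'How' has expectation 0.25 when qunif = 0.25). The function name `random_partition` and its `rng` parameter are hypothetical, and the package's own sampler may differ in detail.

```python
import random

def random_partition(clause, q, rng=None):
    """Break each gap between words independently with probability q.

    Hypothetical helper for illustration; not part of the partitioner package.
    """
    rng = rng or random.Random()
    words = clause.split(" ")
    phrases = [words[0]]
    for word in words[1:]:
        if rng.random() < q:
            phrases.append(word)         # gap broken: start a new phrase
        else:
            phrases[-1] += " " + word    # gap kept: extend the current phrase
    return phrases

# A seeded generator makes the draw repeatable.
print(random_partition("How are you doing today?", 0.25, random.Random(0)))
```

Larger values of q break more gaps, so the phrases get shorter on average, as in the qunif = 0.75 call above.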


### Solve for the uniform stochastic expectation partition (given the uniform partition probabilities).

In [16]:
## Vignette 5: A uniform, stochastic expectation partition of a single clause
def uniformStochasticPartition(informed = False, clause = "How are you doing today?", qunif = 0.25):
    # informed = False switches to the uniform partition probability qunif
    pa = stochastic(informed = informed, qunif = qunif)
    print(pa.partition(clause))

In [17]:
uniformStochasticPartition()
uniformStochasticPartition(clause = "Fine, thanks a bunch for asking!")

{'are you doing today?': 0.10546875000000001, 'How are you': 0.14062499999999997, 'How': 0.25, 'doing': 0.0625, 'How are': 0.1875, 'How are you doing today?': 0.31640625, 'doing today?': 0.1875, 'you doing': 0.046875, 'are you doing': 0.03515624999999999, 'are': 0.0625, 'you doing today?': 0.14062499999999997, 'today?': 0.25, 'are you': 0.046875, 'you': 0.0625, 'How are you doing': 0.10546875000000001}
{'a': 0.0625, 'Fine,': 0.25, 'thanks a': 0.046875, 'Fine, thanks a bunch for asking!': 0.23730468749999997, 'bunch for asking!': 0.14062499999999997, 'a bunch for': 0.03515624999999999, 'for': 0.0625, 'thanks a bunch for': 0.026367187499999993, 'Fine, thanks a bunch': 0.10546875000000001, 'Fine, thanks a bunch for': 0.0791015625, 'a bunch': 0.046875, 'Fine, thanks a': 0.14062499999999997, 'Fine, thanks': 0.1875, 'thanks': 0.0625, 'a bunch for asking!': 0.10546875000000001, 'asking!': 0.25, 'bunch for': 0.046875, 'for asking!': 0.1875, 'thanks a bunch for asking!': 0.0791015625, 'thanks a
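
The uniform expectations above can be reproduced by hand. The sketch below is inferred from the printed numbers rather than taken from the package source: it assumes each gap between words is broken independently with probability q, so a phrase appears when all of its internal gaps are kept and its outer gaps (where they exist) are broken. The function name `uniform_expectations` is hypothetical.

```python
def uniform_expectations(clause, q):
    """Expected count of every contiguous phrase under independent gap breaks.

    Hypothetical helper; the rule is inferred from the output above.
    """
    words = clause.split(" ")
    n = len(words)
    out = {}
    for i in range(n):
        for j in range(i, n):
            p = (1.0 - q) ** (j - i)   # every internal gap is kept
            if i > 0:
                p *= q                 # the gap to the left is broken
            if j < n - 1:
                p *= q                 # the gap to the right is broken
            phrase = " ".join(words[i:j + 1])
            out[phrase] = out.get(phrase, 0.0) + p
    return out

e = uniform_expectations("How are you doing today?", 0.25)
print(e["How"], e["are you"], e["How are you doing today?"])
```

One useful check on this reading: each word belongs to exactly one phrase in any partition, so weighting every expectation by its phrase length and summing gives the number of words in the clause.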

### Build a rank-frequency distribution for a text and determine its Zipf/Simon (bag-of-phrase) $R^2$.

In [18]:
## Vignette 6: Use the default partitioning method to partition the package's README.md file and compute rsq
def testPartitionTextAndFit():
    pa = oneoff()
    # Clean the file, split it into clauses, and accumulate phrase counts
    pa.partitionText(textfile = pa.home+"/../README.md")
    pa.testFit()
    print("R-squared:  %s" % round(pa.rsq, 2))
    print("")
    # List the (up to) 25 most frequent phrases
    phrases = sorted(pa.counts, key = lambda x: pa.counts[x], reverse = True)
    for phrase in phrases[:25]:
        print("%s %s" % (phrase, pa.counts[phrase]))

In [19]:
testPartitionTextAndFit()

R-squared:  0.11

project 7.0
 5.0
the 5.0
code 4.0
to 4.0
and 4.0
of the 3.0
API 3.0
should 2.0
docs 2.0
A short 2.0
This 2.0
etc 2.0
your 2.0
size 2.0
reference 2.0
can 2.0
examples 2.0
is 2.0
how 2.0
added 2.0
description 2.0
important 2.0
Make sure 1.0
show 1.0
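
For intuition about what an $R^2$ on a rank-frequency distribution measures, here is a rough stand-in that does not use the package. It fits a straight line to log frequency against log rank by ordinary least squares and reports the coefficient of determination. This is a plain Zipf-style fit under my own assumptions; the Zipf/Simon fit performed by testFit() may be defined differently and need not give the same number. The function name `loglog_rsq` and the sample counts are hypothetical.

```python
import math

def loglog_rsq(counts):
    """R^2 of a least-squares line through (log rank, log frequency).

    Hypothetical helper; needs at least two distinct frequencies.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Counts that follow frequency = 120 / rank lie on a perfect log-log line.
ideal = {"a": 120.0, "b": 60.0, "c": 40.0, "d": 30.0}
print(round(loglog_rsq(ideal), 6))
```

A dictionary such as pa.counts could be passed in the same way, since only the count values are used.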
