# NaNoGenMo assignment

## Random

Import Python's built-in **random** module into this Jupyter notebook:

In [None]:
import random
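
Every call to `random.sample` or `random.choice` below gives different results on each run. If repeatable output is wanted, one option is to seed the generator first. A minimal sketch; the seed value `2019` is an arbitrary assumption, not something the assignment prescribes:

```python
import random

random.seed(2019)  # hypothetical seed value; any integer works
first = random.sample(range(100), 5)

random.seed(2019)  # re-seeding restarts the same pseudo-random sequence
second = random.sample(range(100), 5)

print(first == second)  # the two samples are identical
```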

## spaCy (& displacy)

If not yet installed, install **[spaCy](https://spacy.io/usage)** using **Anaconda**.

In [None]:
# uncomment to install
# !conda install -c conda-forge spacy

[Download](https://spacy.io/usage/models#download) the small general-purpose model:

In [None]:
# uncomment to install
# !python -m spacy download en_core_web_sm

Add **spaCy** to this Jupyter notebook.

In [None]:
import spacy

Add **displaCy** to this Jupyter notebook.

In [None]:
from spacy import displacy

Load the small **spaCy** model (`en_core_web_sm`) as a pipeline object:

In [None]:
nlp = spacy.load('en_core_web_sm')

## The Jungle Book

The Jungle Book as a string:

In [None]:
# read the whole file and close it again when done
with open('00236.txt') as f:
    text_kip_jungle = f.read()
len(text_kip_jungle)

The Jungle Book as a processed **spaCy** `Doc` object:

In [None]:
# reuse the string read above instead of opening the file a second time
doc_kip_jungle = nlp(text_kip_jungle)

Split The Jungle Book into a list of sentences and print how many there are:

In [None]:
kip_jungle_sentences = list(doc_kip_jungle.sents)
len(kip_jungle_sentences)

Print out 5 random sentences from the list:

In [None]:
random.sample(kip_jungle_sentences, 5)
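
`random.sample` raises a `ValueError` when asked for more items than the list holds, which can happen with the shorter lists built later in this notebook. A minimal sketch of a guard, where `sample_up_to` is a hypothetical helper name and plain strings stand in for spaCy spans:

```python
import random

def sample_up_to(items, k):
    """Return up to k random items, or all of them if fewer than k exist."""
    items = list(items)
    return random.sample(items, min(k, len(items)))

# Asking for 5 items from a 3-item list returns all 3, in random order.
print(sample_up_to(["a", "b", "c"], 5))
```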

### The Jungle Book » Noun Chunks, Subjects, and Direct Objects

The Jungle Book parsed into a list of noun chunks, with 5 random chunks printed:

In [None]:
kip_jungle_noun_chunks = list(doc_kip_jungle.noun_chunks)
random.sample(kip_jungle_noun_chunks, 5)

Filter the noun chunks into subjects (`nsubj`) and direct objects (`dobj`), using the dependency label of each chunk's root token:

In [None]:
kip_jungle_subjects = [chunk for chunk in kip_jungle_noun_chunks if chunk.root.dep_ == 'nsubj']
kip_jungle_objects = [chunk for chunk in kip_jungle_noun_chunks if chunk.root.dep_ == 'dobj']

In [None]:
# random.sample(kip_jungle_noun_chunks, 5)
# random.sample(kip_jungle_subjects, 5)
random.sample(kip_jungle_objects, 5)

### The Jungle Book » Parts of Speech

Tokens from The Jungle Book, sorted into lists by part-of-speech tag:

In [None]:
kip_jungle_nums_card = [w for w in doc_kip_jungle if w.tag_ == "CD"] # e.g. one, two, three

kip_jungle_adjs = [w for w in doc_kip_jungle if w.tag_ == "JJ"]
kip_jungle_adjs_comparative = [w for w in doc_kip_jungle if w.tag_ == "JJR"] # e.g. bigger, better, stronger, worse
kip_jungle_adjs_superlative = [w for w in doc_kip_jungle if w.tag_ == "JJS"] # e.g. biggest, best, strongest, worst

kip_jungle_verb_modal = [w for w in doc_kip_jungle if w.tag_ == "MD"] # e.g. can, could, will, would

kip_jungle_nouns = [w for w in doc_kip_jungle if w.tag_ == "NN"]
kip_jungle_nouns_prop_sing = [w for w in doc_kip_jungle if w.tag_ == "NNP"]
kip_jungle_nouns_prop_plur = [w for w in doc_kip_jungle if w.tag_ == "NNPS"]

kip_jungle_pronouns_personal = [w for w in doc_kip_jungle if w.tag_ == "PRP"] # e.g. he, she, I, they
kip_jungle_pronouns_posses = [w for w in doc_kip_jungle if w.tag_ == "PRP$"] # his, hers, my, their

kip_jungle_adverbs = [w for w in doc_kip_jungle if w.tag_ == "RB"]
kip_jungle_adverbs_comparative = [w for w in doc_kip_jungle if w.tag_ == "RBR"]
kip_jungle_adverbs_superlative = [w for w in doc_kip_jungle if w.tag_ == "RBS"]
kip_jungle_adverbs_particle = [w for w in doc_kip_jungle if w.tag_ == "RP"] # RP marks particles, e.g. "up" in "give up"

kip_jungle_verbs = [w for w in doc_kip_jungle if w.tag_ == "VB"] # fall
kip_jungle_verbs_past_tense = [w for w in doc_kip_jungle if w.tag_ == 'VBD'] # fell
kip_jungle_verbs_pres_participle = [w for w in doc_kip_jungle if w.tag_ == "VBG"] # verb-ing
kip_jungle_verbs_past_participle = [w for w in doc_kip_jungle if w.tag_ == "VBN"] # verb-ed
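kip_jungle_verbs_pres_singular = [w for w in doc_kip_jungle if w.tag_ == "VBP"] # present, not 3rd person singular, e.g. (I/you/they) fall
kip_jungle_verbs_pres_singular_3rd = [w for w in doc_kip_jungle if w.tag_ == "VBZ"] # present, 3rd person singular, e.g. falls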

kip_jungle_numbers = [w for w in doc_kip_jungle if w.pos_ == "NUM"]
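
Each list above walks the whole document once per tag. As an alternative, the same grouping can be done in a single pass with a dictionary keyed by tag. A minimal sketch using stand-in `(text, tag)` pairs so it runs without spaCy; in the notebook the pairs would be `(w.text, w.tag_)` for each token, and `words_by_tag` is a hypothetical name:

```python
from collections import defaultdict

# Stand-in (text, tag) pairs; assumed shape of (w.text, w.tag_) from spaCy tokens.
tagged = [
    ("wolf", "NN"),
    ("ran", "VBD"),
    ("jungle", "NN"),
    ("quickly", "RB"),
    ("howls", "VBZ"),
]

# One pass over the tokens, appending each word to the list for its tag.
words_by_tag = defaultdict(list)
for text, tag in tagged:
    words_by_tag[tag].append(text)

print(words_by_tag["NN"])
```

Looking up a tag that never occurred returns an empty list instead of raising an error, which is the reason for choosing `defaultdict` here.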

In [None]:
for i in range(10):
    print(random.choice(kip_jungle_verbs_pres_singular_3rd))

### The Jungle Book » Named Entity Recognition

In [None]:
for item in doc_kip_jungle.ents:
    print(item.text, item.label_)

In [None]:
kip_jungle_ents = list(doc_kip_jungle.ents)
random.sample(kip_jungle_ents, 5)

Parsing for entity types in The Jungle Book:

In [None]:
kip_jungle_people = [e for e in kip_jungle_ents if e.label_ == "PERSON"]
kip_jungle_locations = [e for e in kip_jungle_ents if e.label_ == "LOC"]
kip_jungle_products = [e for e in kip_jungle_ents if e.label_ == "PRODUCT"]
# kip_jungle_events = [e for e in kip_jungle_ents if e.label_ == "EVENT"] # not much to work with in Kipling
kip_jungle_times = [e for e in kip_jungle_ents if e.label_ == "TIME"]
kip_jungle_nums_cardinal = [e for e in kip_jungle_ents if e.label_ == "CARDINAL"] # one, two, three
kip_jungle_nums_ordinal = [e for e in kip_jungle_ents if e.label_ == "ORDINAL"] # first, second, third
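
Entity lists taken from a whole book usually repeat the same names many times, so a uniform random pick favours the most frequent ones. If an even spread is preferred, the texts can be de-duplicated first. A minimal sketch with stand-in strings; `unique_texts` is a hypothetical helper, and in the notebook its input might be `[e.text for e in kip_jungle_people]`:

```python
def unique_texts(texts):
    """Strip whitespace, drop empty strings, and remove duplicates in first-seen order."""
    cleaned = (t.strip() for t in texts)
    # dict.fromkeys keeps the first occurrence of each key and preserves order.
    return list(dict.fromkeys(t for t in cleaned if t))

people = ["Mowgli", "Baloo", "Mowgli ", "", "Bagheera", "Baloo"]
print(unique_texts(people))  # → ['Mowgli', 'Baloo', 'Bagheera']
```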

Print 5 random entities of one type from The Jungle Book (uncomment a different line to switch type; times are shown here):

In [None]:
# random.sample(kip_jungle_people, 5)
# random.sample(kip_jungle_locations, 5)
# random.sample(kip_jungle_products, 5)
# random.sample(kip_jungle_events, 5)
random.sample(kip_jungle_times, 5)
# random.sample(kip_jungle_nums_cardinal, 5)
# random.sample(kip_jungle_nums_ordinal, 5)

## Tracery

If not yet installed, install Allison Parrish's **Tracery** library:

In [None]:
# uncomment to install
# !pip install tracery

Add **Tracery** to this Jupyter notebook:

In [None]:
import tracery
from tracery.modifiers import base_english

Define a **Tracery** grammar that uses the lists created with spaCy:

In [None]:
# rules = {
#     "subject": [w.text for w in kip_jungle_subjects],
#     "object": [w.text for w in kip_jungle_objects],
#     "verb": [w.text for w in kip_jungle_verbs_past_tense],
#     "adj": [w.text for w in kip_jungle_adjs],
#     "people": [w.text for w in kip_jungle_people],
#     "loc": [w.text for w in kip_jungle_locations],
#     "time": [w.text for w in kip_jungle_times],
#     "origin": "#scene#\n\n[charA:#subject#][charB:#subject#][prop:#object#]#sentences#",
#     "scene": "SCENE: #loc#, #time.lowercase#",
#     "sentences": [
#         "#sentence#\n#sentence#",
#         "#sentence#\n#sentence#\n#sentence#",
#         "#sentence#\n#sentence#\n#sentence#\n#sentence#"
#     ],
#     "sentence": [
#         "#charA.capitalize# #verb# #prop#.",
#         "#charB.capitalize# #verb# #prop#.",
#         "#prop.capitalize# became #adj#.",
#         "#charA.capitalize# and #charB# greeted each other.",
#         "'Did you hear about #object.lowercase#?' said #charA#.",
#         "'#subject.capitalize# is #adj#,' said #charB#.",
#         "#charA.capitalize# and #charB# #verb# #object#.",
#         "#charA.capitalize# and #charB# looked at each other.",
#         "#sentence#\n#sentence#"
#     ]
# }

rules = {
    "chunks": [w.text for w in kip_jungle_noun_chunks],
    "subject": [w.text for w in kip_jungle_subjects],
    "object": [w.text for w in kip_jungle_objects],
    "adj": [w.text for w in kip_jungle_adjs],
    "comparative": [w.text for w in kip_jungle_adjs_comparative], # e.g. bigger, better, stronger, worse
    "superlative": [w.text for w in kip_jungle_adjs_superlative], # e.g. biggest, best, strongest, worst

    "modal": [w.text for w in kip_jungle_verb_modal], # e.g., can, could, may, might, must, will, would, shall, should, ought to, had better

    "noun": [w.text for w in kip_jungle_nouns],
    "noun_prop_sing": [w.text for w in kip_jungle_nouns_prop_sing],
    "noun_prop_plur": [w.text for w in kip_jungle_nouns_prop_plur],

    "verb": [w.text for w in kip_jungle_verbs],
    "verb_past": [w.text for w in kip_jungle_verbs_past_tense],
    "verb_past_part": [w.text for w in kip_jungle_verbs_past_participle],
    "verb_pres_part": [w.text for w in kip_jungle_verbs_pres_participle],

    "people": [w.text for w in kip_jungle_people],
    "location": [w.text for w in kip_jungle_locations],
    "time": [w.text for w in kip_jungle_times],
    "cardinal": [w.text for w in kip_jungle_nums_cardinal],
    "ordinal": [w.text for w in kip_jungle_nums_ordinal],
    "origin": "There are #cardinal# principal characters in this story, #cardinal# of whom are #adj#. The #ordinal# is something of #noun.a#, #verb_past_part# by #cardinal# of the #cardinal#, and #verb_pres_part# the others. 'The tale of child raised among #noun.s#,' #people# has suggested. This is the story the #superlative# of which were #verb_past_part# as #subject# sang the #verb_pres_part# during #time#, listened beside #location# to his own words traverse the space between, trying to map the final-saying. Yet another spin on loss of innocence, this is the narrative about #noun.s# #verb_pres_part# in the #location#. It is the story of the #noun#-child, raised by #subject#, waiting to shed something which was never in his possession to lose, who finds a box in which to write his child's unconscious partial-knowing: many compartments, each separated in space and time by the un-seen/scene.",
    "scene": "SCENE: #location#, #time.lowercase#", # the rule key is "location"; "loc" is not defined in this grammar
    "sentences": [
        "#sentence#\n#sentence#",
        "#sentence#\n#sentence#\n#sentence#",
        "#sentence#\n#sentence#\n#sentence#\n#sentence#"
    ],
    "sentence": [
        "#charA.capitalize# #verb# #prop#.",
        "#charB.capitalize# #verb# #prop#.",
        "#prop.capitalize# became #adj#.",
        "#charA.capitalize# and #charB# greeted each other.",
        "'Did you hear about #object.lowercase#?' said #charA#.",
        "'#subject.capitalize# is #adj#,' said #charB#.",
        "#charA.capitalize# and #charB# #verb# #object#.",
        "#charA.capitalize# and #charB# looked at each other.",
        "#sentence#\n#sentence#"
    ]
}

Create a **Tracery** grammar from the rules and attach the English modifiers (such as `.a`, `.s`, and `.capitalize`):

In [None]:
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)

Expand the `origin` rule three times to generate three texts:

In [None]:
for i in range(3):
    print(grammar.flatten("#origin#"))
    print()
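
To make the expansion step less opaque, here is a simplified stand-in for what flattening does: every `#symbol#` is replaced by a random choice from the matching rule, and the choice is expanded again until no symbols remain. This is a sketch for illustration only, not Tracery's actual implementation; `expand` and `toy_rules` are hypothetical names, and modifiers such as `.a` or `.capitalize` are not handled.

```python
import random
import re

def expand(rules, text):
    """Recursively replace each #symbol# with a random option from rules[symbol]."""
    def replace(match):
        options = rules[match.group(1)]  # raises KeyError for an undefined symbol
        if isinstance(options, str):
            options = [options]  # a single string counts as a one-item list
        return expand(rules, random.choice(options))
    return re.sub(r"#(\w+)#", replace, text)

toy_rules = {
    "origin": "#people# walked through the #location#.",
    "people": ["Mowgli", "Baloo"],
    "location": ["jungle"],
}
result = expand(toy_rules, "#origin#")
print(result)
```

Raising an error on an undefined symbol is a choice made for this sketch, since it surfaces misspelled rule names early.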

## Write to file

In [None]:
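def save_nanogenmo(filename, counter, limit=1000):
    """Write the `limit` most common items of a Counter to a tab-separated file."""
    with open(filename, "w", encoding="utf-8") as outfile:
        outfile.write("key\tvalue\n")  # header row
        # counter must provide most_common(), e.g. a collections.Counter keyed by strings
        for item, count in counter.most_common(limit):
            outfile.write(item.strip() + "\t" + str(count) + "\n")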
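
The function expects an object with a `most_common` method, such as a `collections.Counter` whose keys are strings. A minimal usage sketch under that assumption: the function is repeated in stand-alone form so the example runs on its own, the names are stand-in data, and `people.tsv` is a hypothetical filename.

```python
from collections import Counter
import os
import tempfile

def save_nanogenmo(filename, counter, limit=1000):
    # Stand-alone version of the function, repeated so this example is runnable.
    with open(filename, "w", encoding="utf-8") as outfile:
        outfile.write("key\tvalue\n")
        for item, count in counter.most_common(limit):
            outfile.write(item.strip() + "\t" + str(count) + "\n")

# Stand-in strings; in the notebook these could be [e.text for e in kip_jungle_people].
names = ["Mowgli", "Baloo", "Mowgli", "Bagheera", "Mowgli", "Baloo"]
counts = Counter(names)

# Write only the two most frequent names to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "people.tsv")
save_nanogenmo(path, counts, limit=2)

with open(path, encoding="utf-8") as infile:
    saved = infile.read()
print(saved)
```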