# Final Draft

This code does the following:

1. Imports all the books (the top 100 on Project Gutenberg)
2. Gets the top 365 nouns in those books (counted over the first 2,000,000 characters)
3. Saves them to a word map along with every sentence containing those words
4. Uses a Markov chain to write about each noun from its word map
5. Combines the output with a Wikipedia excerpt
6. Saves!

This code draws heavily on tutorials and examples from Allison Parrish (especially [just enough python](https://gist.github.com/aparrish/50803e0ae51a2c6e775af36ea79be285) and [corpus driven narrative generation](https://github.com/aparrish/corpus-driven-narrative-generation/blob/master/corpus-driven-narrative-generation.ipynb)). Thank you Allison!!

### Import all texts

In [1]:
# import necessary libraries
import re
from glob import glob

In [4]:
# open all texts and concatenate them into one string
all_texts = ""
for fname in sorted(glob("books/*.txt")):
    with open(fname, encoding="utf-8") as book:
        all_texts += book.read()

In [5]:
# check length
len(all_texts)

45071057

### Find most common nouns

In [6]:
# import libraries
import collections
from collections import Counter

# load spaCy's small English pipeline (used for part-of-speech tagging)
import spacy
nlp = spacy.load('en_core_web_sm')


In [7]:
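# split the text into chunks so spaCy can process it
# (spaCy's default nlp.max_length is 1,000,000 characters)
# this takes a while
chunk_size = 1000000
# only the first 2 chunks (2,000,000 characters) are tagged;
# use -(-len(all_texts) // chunk_size) to cover the whole corpus
num_chunks = 2

all_objects = collections.Counter()
for i in range(num_chunks):

    # take the i-th chunk of characters
    chunk = all_texts[i * chunk_size:(i + 1) * chunk_size]

    # natural language processing (tokenizing and part-of-speech tagging)
    doc = nlp(chunk)

    # count the nouns in this chunk
    noun_counts = Counter(token.text for token in doc if token.pos_ == 'NOUN')

    # add this chunk's counts to the running totals
    all_objects.update(noun_counts)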
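
The cell above slices `all_texts` at fixed 1,000,000-character offsets, so a word can be cut in half at each boundary and tagged as two fragments. A minimal sketch of a boundary-aware alternative, assuming you want chunks no longer than spaCy's default `nlp.max_length`: the hypothetical helper `chunk_text` below moves each cut back to the last space or newline before the limit.

```python
def chunk_text(text, max_chars=1_000_000):
    """Split text into pieces of at most max_chars characters.

    Each cut is moved back to the last space or newline before the limit so
    that no word is split across two chunks (hypothetical helper, not part
    of spaCy). A run with no whitespace at all is cut at the hard limit.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # back up to the last whitespace inside the window, if there is one
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut
        chunks.append(text[start:end])
        start = end
    return chunks


# usage with a tiny limit so the effect is visible
print(chunk_text("the wolf and the tailor", max_chars=10))
# ['the wolf', ' and the', ' tailor']
```

Joining the chunks gives back the original text exactly. In the notebook, `for chunk in chunk_text(all_texts)[:2]:` would keep the same two-chunk scope.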

In [8]:
# get the 365 most common nouns as (word, count) pairs
top_365 = all_objects.most_common(365)

In [9]:
# keep just the words (drop the counts)
objs = [word for word, count in top_365]
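
Because `Counter` objects add up key by key, the per-chunk counts merge into corpus-wide totals, and `most_common(n)` returns `(word, count)` pairs sorted by count, highest first. A toy illustration with made-up counts (not real corpus data):

```python
from collections import Counter

# made-up noun counts for two chunks (illustration only)
chunk_1 = Counter({"man": 3, "time": 2, "king": 1})
chunk_2 = Counter({"man": 1, "king": 4, "wolf": 2})

totals = Counter()
for counts in (chunk_1, chunk_2):
    totals.update(counts)  # add this chunk's counts to the running totals

top_2 = totals.most_common(2)
words = [word for word, count in top_2]
print(top_2)  # [('king', 5), ('man', 4)]
print(words)  # ['king', 'man']
```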

In [10]:
#look at it
objs

['man',
 'time',
 'day',
 'way',
 'king',
 'one',
 'eyes',
 'night',
 'heart',
 'life',
 'hand',
 'men',
 'mother',
 'world',
 'head',
 'things',
 'father',
 'woman',
 'ones',
 'door',
 'children',
 'thing',
 'work',
 'house',
 'people',
 'water',
 'love',
 'wife',
 'face',
 'soul',
 'morning',
 'tree',
 'voice',
 'bed',
 'earth',
 'words',
 'son',
 'child',
 'bird',
 'hands',
 'mouth',
 'air',
 'light',
 'boy',
 'name',
 'evening',
 'place',
 'sea',
 'gold',
 'fire',
 'window',
 'sun',
 'word',
 'death',
 'street',
 'sir',
 'moment',
 'hair',
 'brother',
 'body',
 'boys',
 'ground',
 'girl',
 'home',
 'spirit',
 'virtue',
 'daughter',
 'forest',
 'eye',
 'friend',
 'side',
 'horse',
 'hour',
 'years',
 'end',
 'room',
 'others',
 'wind',
 'blood',
 'hath',
 'feet',
 'days',
 'fellow',
 'wood',
 'works',
 'hat',
 'money',
 'dog',
 'princess',
 'Chapter',
 'table',
 'truth',
 'happiness',
 'master',
 'youth',
 'castle',
 'pity',
 'lady',
 'mine',
 'will',
 'art',
 'stone',
 'foot',
 'ca

### Add to word map

In [11]:
# split into rough "sentences" on periods, replace line breaks with spaces,
# and put the period back on the end (uses all_texts)
sentences_clean = [line.replace("\n", " ") + ". " for line in all_texts.split(".")]
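
Splitting on every period never breaks at "?" or "!", so questions and exclamations get glued to the next sentence. A slightly more careful sketch, assuming a regex heuristic is good enough: collapse line breaks first, then split after `.`, `!` or `?` (optionally followed by a closing quote) wherever whitespace follows. The function name `split_sentences` is hypothetical, and abbreviations such as "Mr." will still cause false breaks.

```python
import re

# split after ., ! or ? (optionally followed by a closing quote) plus whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[.!?][\"'’”])\s+")


def split_sentences(text):
    """Collapse all whitespace runs (including line breaks) and split into sentences."""
    flat = re.sub(r"\s+", " ", text).strip()
    return [s for s in SENTENCE_END.split(flat) if s]


print(split_sentences("Who is there?\nThe wolf said, ‘Open the door.’ The girl\nwaited."))
# ['Who is there?', 'The wolf said, ‘Open the door.’', 'The girl waited.']
```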

In [12]:
#check length
len(sentences_clean)

393307

In [13]:
#check sample output
sentences_clean[1000]

' ‘If the wild beasts in the forest had but devoured us, we should at any rate have died together. '

In [14]:
# word map: for each noun, every sentence that contains it as a whole word
# this also takes a little while
# e.g. {'apple': ["The apple was...", "I like apples...", ...], 'wasp': [...], 'sun': [...]}
word_map = {}

# compile one whole-word pattern per noun (re.escape guards against special characters)
patterns = {item: re.compile(r"\b" + re.escape(item) + r"\b") for item in objs}

# loop through all sentences
for line in sentences_clean:

    # check each noun against this sentence
    for item, pattern in patterns.items():

        # if the noun appears, add the sentence to that noun's list
        if pattern.search(line):
            word_map.setdefault(item, []).append(line)
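
The cell above runs 365 regex searches on each of the roughly 393,000 sentences, which is why it is slow. A sketch of a faster alternative, assuming every noun is a single run of word characters: tokenize each sentence once with `\w+` and intersect the tokens with a set of nouns. For such nouns this finds the same whole-word, case-sensitive matches as the `\b...\b` search. The name `build_word_map` is hypothetical.

```python
import re

WORD = re.compile(r"\w+")


def build_word_map(sentences, nouns):
    r"""Map each noun to every sentence that contains it as a whole word.

    Assumes each noun is a single run of word characters (letters, digits,
    underscore), so a set lookup on \w+ tokens finds the same matches as
    re.search(r"\b" + noun + r"\b", sentence).
    """
    noun_set = set(nouns)
    word_map = {}
    for line in sentences:
        # tokenize once, then keep only the tokens that are target nouns
        for item in noun_set.intersection(WORD.findall(line)):
            word_map.setdefault(item, []).append(line)
    return word_map


# usage on toy data
demo = build_word_map(
    ["The wolf ate the man. ", "A woman saw the wolf. ", "The man's hat fell. "],
    ["man", "wolf", "hat"],
)
print(demo["man"])  # ['The wolf ate the man. ', "The man's hat fell. "]
```

"woman" does not count as a match for "man", just as with the `\b` regex.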

In [15]:
#check one to see what we got
word_map['animal']

['’ Thus war was announced to the Bear, and all four-footed animals were summoned to take part in it, oxen, asses, cows, deer, and every other animal the earth contained. ',
 ' ‘Softly, softly; it can’t be done as quickly as that,’ said he, and stood still and waited until the animal was quite close, and then sprang nimbly behind the tree. ',
 '  The king, however, had a lion which was a wondrous animal, for he knew all concealed and secret things. ',
 ' Then he spake thus:  Man is a rope stretched between the animal and the Superman--a rope over an abyss. ',
 '  I love him who laboureth and inventeth, that he may build the house for the Superman, and prepare for him earth, animal, and plant: for thus seeketh he his own down-going. ',
 ' I am not much more than an animal which hath been taught to dance by blows and scanty fare. ',
 '  “The proudest animal under the sun, and the wisest animal under the sun,--they have come out to reconnoitre. ',
 '  Ye do not mean to slay, ye judges and

In [16]:
#double check object length
len(objs)

365

### Add Wikipedia entry and put through Markov

In [17]:
#import libraries
import markovify
import wikipedia

In [18]:
# set up markovify to work with characters instead of words (each character is treated as a "word")
class SentencesByChar(markovify.Text):
    def word_split(self, sentence):
        return list(sentence)
    def word_join(self, words):
        return "".join(words)
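
`SentencesByChar` makes markovify treat each character as a "word", so with `state_size=9` the next character is chosen based on the previous nine. A minimal pure-Python sketch of the same idea (not markovify's actual implementation; `build_char_chain` and `generate` are hypothetical names): record which character follows each `state_size`-character window, then walk the table from a start state until an end marker is drawn.

```python
import random
from collections import defaultdict

START, END = "\x02", "\x03"  # control characters assumed not to occur in the text


def build_char_chain(sentences, state_size):
    """Map each window of state_size characters to the characters that follow it."""
    chain = defaultdict(list)
    for s in sentences:
        # pad with start markers so generation can begin at a sentence start
        padded = START * state_size + s + END
        for i in range(len(padded) - state_size):
            chain[padded[i:i + state_size]].append(padded[i + state_size])
    return chain


def generate(chain, state_size, rng=random):
    """Walk the chain from the start state until the end marker is drawn."""
    state = START * state_size
    out = []
    while True:
        nxt = rng.choice(chain[state])  # duplicates in the list weight the choice
        if nxt == END:
            return "".join(out)
        out.append(nxt)
        state = state[1:] + nxt


corpus = ["the king had a lion.", "the queen had a ring."]
chain = build_char_chain(corpus, state_size=3)
print(generate(chain, 3, random.Random(0)))
```

A larger state size copies longer stretches of the source verbatim; a smaller one produces more invented (and more garbled) words.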

In [19]:
# open the output file for writing
f = open('novel_villustrated.txt', 'w', encoding='utf-8')

In [20]:
# month names and days per month (a non-leap year: 365 days, one noun per day), used for the section titles
numDays = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
monthMap = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

In [21]:
# build a dictionary mapping each month to its slice of nouns (one noun per day)
# e.g. January gets objs[0:31], February gets objs[31:59], ...
monthObjs = {}
totalDays = 0

for month, days in zip(monthMap, numDays):

    # the nouns for this month run from totalDays up to newMonth
    newMonth = totalDays + days

    # store the slice (wrapped in a list, so later cells read monthObjs[month][0])
    monthObjs[month] = [objs[totalDays:newMonth]]

    # update the running day count
    totalDays = newMonth
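
The running `totalDays` total in the cell above is a cumulative sum of `numDays`. The same month-to-slice mapping can be sketched with the standard library's `itertools.accumulate`; this is an illustrative alternative, and the final value confirms the 12 slices cover exactly 365 nouns.

```python
from itertools import accumulate

numDays = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
monthMap = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
            'August', 'September', 'October', 'November', 'December']

# end index of each month's slice: [31, 59, 90, ..., 365]
ends = list(accumulate(numDays))
# start index of each month's slice: [0, 31, 59, ..., 334]
starts = [0] + ends[:-1]

# (start, end) positions into the 365-noun list for every month
bounds = {month: (s, e) for month, s, e in zip(monthMap, starts, ends)}

print(bounds['February'])  # (31, 59)
print(ends[-1])            # 365
```

With `objs` loaded, `monthObjs = {m: [objs[s:e]] for m, (s, e) in bounds.items()}` would rebuild the same dictionary.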
    

In [22]:
#check
monthObjs['January'][0]

['man',
 'time',
 'day',
 'way',
 'king',
 'one',
 'eyes',
 'night',
 'heart',
 'life',
 'hand',
 'men',
 'mother',
 'world',
 'head',
 'things',
 'father',
 'woman',
 'ones',
 'door',
 'children',
 'thing',
 'work',
 'house',
 'people',
 'water',
 'love',
 'wife',
 'face',
 'soul',
 'morning']

In [29]:
# preview: print the nouns assigned to each month, in order
for month in monthMap:
    for item in monthObjs[month][0]:
        print(item)

man
time
day
way
king
one
eyes
night
heart
life
hand
men
mother
world
head
things
father
woman
ones
door
children
thing
work
house
people
water
love
wife
face
soul
morning
tree
voice
bed
earth
words
son
child
bird
hands
mouth
air
light
boy
name
evening
place
sea
gold
fire
window
sun
word
death
street
sir
moment
hair
brother
body
boys
ground
girl
home
spirit
virtue
daughter
forest
eye
friend
side
horse
hour
years
end
room
others
wind
blood
hath
feet
days
fellow
wood
works
hat
money
dog
princess
Chapter
table
truth
happiness
master
youth
castle
pity
lady
mine
will
art
stone
foot
cat
animals
friends
power
part
wisdom
bread
ears
brethren
corner
pocket
wolf
doth
form
times
wine
dwarf
land
matter
order
course
lips
book
shadow
terms
trees
tailor
queen
garden
finger
cave
legs
ring
lion
care
prince
mountains
path
mind
road
glass
joy
ass
rest
arms
account
sleep
evil
good
thought
fish
yourselves
arm
values
country
skin
idea
husband
laughter
kind
year
food
sound
moon
thoughts
hearts
souls
birds
us

In [30]:
# now loop through the word map: run markovify on each noun's corpus and add its Wikipedia entry

# loop through months
for month in monthMap:

    #### add the month title
    T = "\n\n<h2>" + month + "</h2>\n\n"

    # add image?

    #### add one entry per noun (one noun per day)
    for item in monthObjs[month][0]:

        # character-level Markov model trained on every sentence containing this noun
        gen_char = SentencesByChar(word_map[item], state_size=9)

        # add the noun as a subtitle
        T += "\n\n<h3>" + item.capitalize() + "</h3>\n\n"

        # add the Wikipedia summary; if there is no single matching page, leave it blank
        try:
            wiki = wikipedia.summary(item, sentences=3)
        except (wikipedia.exceptions.DisambiguationError, wikipedia.exceptions.PageError):
            wiki = " "

        # open this noun's paragraph with the Wikipedia entry
        T += "<p>" + wiki + " "

        # add five Markov-generated sentences
        # (generated sentences keep their own end punctuation, so join them with a space)
        for i in range(5):
            T += gen_char.make_sentence(test_output=False) + " "

        # close this noun's paragraph
        T += "</p>"

    # write this month's section to the file
    f.write(T)
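
The main loop builds HTML by string concatenation and inserts the Wikipedia text and generated sentences unescaped, so a stray `<` or `&` in the source text could garble the page. A sketch of one noun's entry as a small function, assuming the standard library's `html.escape` is acceptable for this output; `render_entry` and `render_month` are hypothetical names, and the text arguments stand in for the `wikipedia.summary` and `make_sentence` results.

```python
import html


def render_entry(noun, wiki_text, generated_sentences):
    """Return one <h3> title plus one closed <p> paragraph for a noun, with text escaped."""
    # drop empty pieces (e.g. the blank " " used when Wikipedia has no match)
    parts = [p.strip() for p in [wiki_text, *generated_sentences] if p.strip()]
    body = " ".join(parts)
    return ("\n\n<h3>" + html.escape(noun.capitalize()) + "</h3>\n\n"
            + "<p>" + html.escape(body) + "</p>")


def render_month(month, entries):
    """Return the <h2> month title followed by the already-rendered noun entries."""
    return "\n\n<h2>" + html.escape(month) + "</h2>\n\n" + "".join(entries)


entry = render_entry("wolf", "The wolf is a canine.", ["The wolf & the fox sat down."])
print(render_month("January", [entry]))
```

Keeping one function per entry also guarantees each `<p>` is closed before the next `<h3>` starts.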

    


In [31]:
# close the output file
f.close()
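
Because the file is opened in one cell and closed in another, an exception in between (for example a failed Wikipedia request) leaves it open and possibly unflushed. A sketch of the usual alternative, a `with` block that closes the file even if an error occurs; `sections` is a hypothetical list of already-built month strings, and the temporary path is only for demonstration.

```python
import os
import tempfile

# hypothetical pre-built month sections (in the notebook these would be the T strings)
sections = ["\n\n<h2>January</h2>\n\n", "\n\n<h2>February</h2>\n\n"]

path = os.path.join(tempfile.mkdtemp(), "novel_villustrated.txt")

# the with block closes (and flushes) the file when it exits, even on an exception
with open(path, "w", encoding="utf-8") as f:
    for section in sections:
        f.write(section)

with open(path, encoding="utf-8") as f:
    print(f.read().count("<h2>"))  # 2
```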

# that's it!

For now... Next, I'd like to try to make it more coherent, somehow connecting or organizing the objects (for example, KOK organized his objects by season), and also add some visuals.
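
One way to start on the season idea, sketched under the assumption of Northern Hemisphere meteorological seasons (winter is December to February, and so on): group the month names into seasons so each season's nouns can be pulled from `monthObjs`. The `SEASONS` mapping is a hypothetical choice, not part of the original code.

```python
monthMap = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
            'August', 'September', 'October', 'November', 'December']

# hypothetical mapping: Northern Hemisphere meteorological seasons
SEASONS = {
    'winter': ['December', 'January', 'February'],
    'spring': ['March', 'April', 'May'],
    'summer': ['June', 'July', 'August'],
    'autumn': ['September', 'October', 'November'],
}

# reverse lookup: month name -> season
season_of = {month: season for season, months in SEASONS.items() for month in months}

# every month belongs to exactly one season
assert sorted(season_of) == sorted(monthMap)
print(season_of['July'])  # summer
```

With `monthObjs` built, `[noun for m in SEASONS['winter'] for noun in monthObjs[m][0]]` would collect one season's nouns.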