This tutorial contains the text of *The Hunger Games* by Suzanne Collins.

# FlashText Tutorial

In [1]:
# First-time setup: uncomment the install line in the next cell

In [2]:
#!pip install flashtext

In [3]:
from flashtext import KeywordProcessor
# Matching is case-insensitive by default; pass case_sensitive=True to make the keyword search case sensitive
kp = KeywordProcessor()

The dataset is a text file (.txt) containing the text of the novel *The Hunger Games*.

In [4]:
# Load the novel's text from the file
with open('The Hunger Games.txt') as f:
    contents = f.read()

# Preview an excerpt from the opening chapter
print(contents[82:1000])

1.

When I wake up, the other side of the bed is cold. My fingers stretch out, seeking Prims warmth but finding only the rough canvas cover of the mattress. She must have had bad dreams and climbed in with our mother. Of course, she did. This is the day of the reaping. I prop myself up on one elbow. Theres enough light in the bedroom to see them. My little sister, Prim, curled up on her side, cocooned in my mothers body, their cheeks pressed together. In sleep, my mother looks younger, still worn but not so beaten-down. Prims face is as fresh as a raindrop, as lovely as the primrose for which she was named. My mother was very beautiful once, too. Or so

they tell me. Sitting at Prims knees, guarding her, is the worlds ugliest cat. Mashed-in nose, half of one ear missing, eyes the color of rotting squash. Prim named him Buttercup, insisting that his muddy yellow coat matched the bright flower. I he hates m


## Extracting keywords

In [5]:
# Compile the keywords to search for
# Function structure: kp.add_keyword(<unprocessed word>, <optional: standardized word to extract when the unprocessed word is found>)

kp.add_keyword('Katniss', 'Katniss Everdeen - protagonist')
kp.add_keyword('District twelve', 'District 12')
kp.add_keyword('costume', 'battle costume')

wordsfound = kp.extract_keywords(contents)

print(set(wordsfound))

{'Katniss Everdeen - protagonist', 'District 12', 'battle costume'}


### Add multiple keywords simultaneously

In [6]:
# To substitute a standardized word for several unprocessed words, compile the keywords in a dictionary
# Dict structure: {<standardized word to extract>: [<list of unprocessed words that map to the standardized word>]}

word_dict = {'Primrose Everdeen': ['Prim', 'my sister'], 'Hunger Games setting': ['dome', 'arena']}
kp.add_keywords_from_dict(word_dict)

# When there is no need to differentiate between the unprocessed and standardized word, keywords can be compiled in a list
# (the list holds character names plus one place name, 'capitol')

characters = ['Peeta', 'Gale', 'Rue', 'capitol']
kp.add_keywords_from_list(characters)

wordsfound = kp.extract_keywords(contents)
print(set(wordsfound))

{'Katniss Everdeen - protagonist', 'Peeta', 'Rue', 'Primrose Everdeen', 'capitol', 'District 12', 'Gale', 'Hunger Games setting', 'battle costume'}
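Wrapping the result in `set()` shows which keywords occur but discards how often. One possible way to keep the counts is `collections.Counter`. The list below is a hypothetical stand-in for the output of `kp.extract_keywords(contents)`, not real counts from the novel.

```python
from collections import Counter

# Hypothetical stand-in for the list returned by kp.extract_keywords(contents)
wordsfound_sample = ['Gale', 'Peeta', 'Gale', 'Primrose Everdeen', 'Peeta', 'Gale']

keyword_counts = Counter(wordsfound_sample)

# most_common() orders keywords from most to least frequent
for keyword, count in keyword_counts.most_common():
    print(f'{keyword}: {count}')
```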


Note that keywords added in earlier steps are still in the dictionary and are therefore extracted as well. The dictionary is cumulative: every extract or replace call uses all keywords added so far, unless they have been removed.

## Replacing keywords

In [7]:
# replace_keywords swaps each unprocessed word found in the text for the standardized word assigned to it in the add_keyword step(s)

sample_sentence = 'Katniss wore her costume in the arena, allowing her to move quickly and hide when necessary.'
new_sentence = kp.replace_keywords(sample_sentence)
print(new_sentence)

Katniss Everdeen - protagonist wore her battle costume in the Hunger Games setting, allowing her to move quickly and hide when necessary.


In [8]:
new_contents = kp.replace_keywords(contents)
print(new_contents[18000:20000])

t. Gale and I divide our spoils, leaving two fish, a couple of loaves of good bread, greens, a quart of strawberries, salt, paraffin, and a bit of money for each. See you in the square, I say. Wear something

pretty, he says flatly. At home, I find my mother and sister are ready to go. My mother wears a fine dress from her apothecary days. Primrose Everdeen is in my first reaping outfit, a skirt and ruffled blouse. Its a bit big on her, but my mother has made it stay with pins. Even so, shes having trouble keeping the blouse tucked in at the back. A tub of warm water waits for me. I scrub off the dirt and sweat from the woods and even wash my hair. To my surprise, my mother has laid out one of her own lovely dresses for me. A soft blue thing with matching shoes. Are you sure? I ask. Im trying to get past rejecting offers of help from her. For a while, I was so angry, I wouldnt allow her to do anything for me. And this is something special. Her clothes from her

past are very precious t

## Determine words in dictionary

In [9]:
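# Get all keywords in the dictionary as {<unprocessed word>: <standardized word>}
# Unprocessed words are stored in lowercase because matching is case-insensitive

all_words = kp.get_all_keywords()
print(all_words)

# Check whether a keyword is present in the dictionary

'Prim' in kp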

{'katniss': 'Katniss Everdeen - protagonist', 'district twelve': 'District 12', 'dome': 'Hunger Games setting', 'costume': 'battle costume', 'capitol': 'capitol', 'prim': 'Primrose Everdeen', 'peeta': 'Peeta', 'my sister': 'Primrose Everdeen', 'arena': 'Hunger Games setting', 'gale': 'Gale', 'rue': 'Rue'}


True

In [10]:
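# Remove a keyword

# Before removal: 'costume' is extracted as 'battle costume'
print(kp.extract_keywords(sample_sentence))

kp.remove_keyword('costume')

# After removal: 'costume' is no longer extracted
remaining_keywords = kp.extract_keywords(sample_sentence)
print(remaining_keywords)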


['Katniss Everdeen - protagonist', 'battle costume', 'Hunger Games setting']
['Katniss Everdeen - protagonist', 'Hunger Games setting']
