# Shakespeare Search Engine in 7 Lines of Code

Okay, a few more than that because we need to load the plays and import dependencies.

The idea is to build a hash table that maps each searchable term to the lines of the corpus in which it occurs (an inverted index). Lookups are extremely fast, but the index has to be built beforehand.
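
To make the idea concrete, here is a minimal sketch of such an index over three lines from the corpus. It uses only the standard library; the regular-expression tokenizer and the names `lines` and `index` are stand-ins for illustration, not the NLTK-based pipeline used in the rest of the notebook.

```python
import re
from collections import defaultdict

# A tiny stand-in corpus (three lines quoted from the plays).
lines = [
    "Where is thy lustre now?",
    "A lustre to it.",
    "To join your hearts in love and amity.",
]

# Map each lowercased word to the positions of the lines containing it.
index = defaultdict(list)
for position, line in enumerate(lines):
    for word in re.findall(r"[a-z]+", line.lower()):
        index[word].append(position)

# Looking up a term is a single dictionary access, with no scan of the corpus.
print(index["lustre"])  # [0, 1]
print(index["love"])    # [2]
```

The cost is paid once, when the index is built; every search afterwards is a dictionary lookup.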

#### Import dependencies

In [1]:
import pandas as pd
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from collections import defaultdict

#### Load the corpus
The first task is to load Shakespeare into a data frame. I am using a pre-cleaned copy; the original dataset is available [here](https://www.kaggle.com/kingburrito666/shakespeare-plays).

In [2]:
df = pd.read_csv('shakespeare.csv')
df.head()

Unnamed: 0,Play,Character,Act,Scene,Passage,Line_Num,Line
0,Henry IV,KING HENRY IV,1,1,1,1,"So shaken as we are, so wan with care,"
1,Henry IV,KING HENRY IV,1,1,1,2,"Find we a time for frighted peace to pant,"
2,Henry IV,KING HENRY IV,1,1,1,3,And breathe short-winded accents of new broils
3,Henry IV,KING HENRY IV,1,1,1,4,To be commenced in strands afar remote.
4,Henry IV,KING HENRY IV,1,1,1,5,No more the thirsty entrance of this soil


#### Index
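We'll use NLTK to handle the tedious parts: splitting each line into tokens and stemming each word. (`word_tokenize` relies on NLTK's Punkt tokenizer data, which may need to be fetched once with `nltk.download` before this cell will run.)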

In [3]:
ps = PorterStemmer()
word2idx = defaultdict(list)

for row in df.itertuples():
    for word in word_tokenize(row.Line):  # Split the line into tokens
        if word.isalpha():  # Only index the token if it is a word (i.e., not punctuation)
            word2idx[ps.stem(word.lower())].append(row.Index)
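
One side effect of the `isalpha()` filter is worth noting: it drops any token that contains a non-letter character, not just punctuation. The sketch below shows this with plain `str.isalpha()`. The `tokens` list is hand-written and loosely based on the opening lines of the corpus; it assumes the tokenizer emits a hyphenated word such as `short-winded` as a single token, which is not verified here.

```python
# Hand-written tokens standing in for tokenizer output (an assumption,
# not produced by word_tokenize in this sketch).
tokens = ["pant", ",", "And", "breathe", "short-winded", "accents"]

# str.isalpha() is True only when every character in the token is a letter.
kept = [t.lower() for t in tokens if t.isalpha()]
dropped = [t for t in tokens if not t.isalpha()]

print(kept)     # ['pant', 'and', 'breathe', 'accents']
print(dropped)  # [',', 'short-winded']
```

Under that assumption, hyphenated words never reach the index, so they cannot be found by searching for either half.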

#### Search
Basic search is simply a matter of stemming the search term and looking it up as a key in the index dictionary.

In [4]:
search_term = 'lustre'
for idx in word2idx[ps.stem(search_term.lower())]:
    print(df.loc[idx].Line)

It lends a lustre and more great opinion,
He beats thee 'gainst the odds: thy lustre thickens,
A lustre to it.
That hath not noble lustre in your eyes.
Equal in lustre, were now best, now worst,
About his neck, yet never lost her lustre;
Did lose his lustre: I did hear him groan:
Where is thy lustre now?
Piercing a hogshead! a good lustre of conceit in a
You have added worth unto 't and lustre,
The lustre of the better yet to show,
The lustre in your eye, heaven in your cheek,
Tincture or lustre in her lip, her eye,


This can be condensed into a one-line function. (Though this is a little clunky.)

In [5]:
def search(term):
    return df.loc[word2idx[ps.stem(term.lower())]]
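
One detail of this lookup is easy to miss: indexing a `defaultdict` with a missing key inserts that key, so searching for a word that never occurs quietly adds an empty entry to `word2idx`. A line that contains the term twice is also listed twice. The sketch below shows both effects on a toy index and one possible way around them; `toy_index` and `safe_lookup` are hypothetical names, not part of the notebook.

```python
from collections import defaultdict

# Toy index in the same shape as word2idx: stem -> list of row labels.
toy_index = defaultdict(list)
toy_index["love"] += [925, 925, 4272]  # row 925 contains the term twice

# Plain indexing on a missing key creates an empty entry as a side effect.
_ = toy_index["zebra"]
print("zebra" in toy_index)  # True

def safe_lookup(index, key):
    """Return the row labels for key without mutating the index or repeating rows."""
    # dict.get does not trigger the defaultdict factory;
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(index.get(key, [])))

print(safe_lookup(toy_index, "love"))     # [925, 4272]
print(safe_lookup(toy_index, "unicorn"))  # []
print("unicorn" in toy_index)             # False
```

In the notebook, the list returned by such a helper could be passed to `df.loc` in place of the raw index entry.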

Let's look at some examples:

In [6]:
search_result = search('Tragic')
search_result.sample(6)

Unnamed: 0,Play,Character,Act,Scene,Passage,Line_Num,Line
65015,A Midsummer nights dream,PHILOSTRATE,5,1,11,69,"And tragical, my noble lord, it is;"
65006,A Midsummer nights dream,THESEUS,5,1,10,60,And his love Thisbe; very tragical mirth.'
21120,A Comedy of Errors,AEGEON,1,1,5,64,Gave any tragic instance of our harm:
78058,Richard III,QUEEN ELIZABETH,2,2,16,39,To make an act of tragic violence:
71513,Othello,LODOVICO,5,2,178,416,Look on the tragic loading of this bed;
7849,Henry VI Part 2,Captain,4,1,1,4,That drag the tragic melancholy night;


In [7]:
search_result = search('Love')
search_result.sample(6)

Unnamed: 0,Play,Character,Act,Scene,Passage,Line_Num,Line
22620,A Comedy of Errors,AEMELIA,5,1,20,54,Stray'd his affection in unlawful love?
90401,Timon of Athens,APEMANTUS,4,3,95,339,have loved thyself better now. What man didst ...
4272,Henry VI Part 1,KING HENRY VI,3,1,20,78,To join your hearts in love and amity.
37451,Henry V,BURGUNDY,5,2,6,37,"Our fertile France, put up her lovely visage?"
925,Henry IV,LADY PERCY,2,3,18,102,I will not love myself. Do you not love me?
92248,Titus Andronicus,MARCUS ANDRONICUS,3,1,35,183,Now let me show a brother's love to thee.
