# Star Wars Boolean Search 
## Construct and execute simple Boolean searches on five Star Wars documents

Below are five brief "documents" (text snippets) about the Star Wars movies, taken from Wikipedia.

```python
doc1 = "Frank Oz provided Yoda's voice with each film and spent his skills to be a puppeteer in the original trilogy and Star Wars Episode I: The Phantom Menace. For the latter, in some walking scenes, Warwick Davis incarnated Yoda as well. For the radio dramatizations of The Empire Strikes Back and Return of the Jedi, Yoda was voiced by John Lithgow, while Tom Kane voiced him in the Clone Wars animated series, several video games, and the new series Star Wars: The Clone Wars."
doc2 = "Luke Skywalker lives a humdrum existence on Tatooine with his Uncle Owen and Aunt Beru that have kept his father's true history a secret from him. He initially wants to join the Imperial Academy to become a pilot with his childhood friend Biggs Darklighter, but is held back by his uncle who ostensibly needs his help on the moisture farm (while it was to hopefully prevent Luke from following his father's path). He takes his first steps toward his destiny when he finds the two droids C-3PO and R2-D2. After delivering R2-D2's message to hermit Ben Kenobi, Ben tells Luke that his father was a Jedi and presents him with his father's lightsaber and then tells him that his father was murdered by a traitorous Jedi. Ben offers to take Luke to the planet Alderaan and train him in the ways of the Force, but Luke rejects his offer."
doc3 = "When the Empire attacks the Rebel base, Solo transports Chewbacca, along with Princess Leia and C-3PO to Cloud City where his old friend (and Cloud City administrator) Lando Calrissian operates to hide from Imperial agents. When bounty hunter Boba Fett tracks the Falcon to Cloud City, Darth Vader forces Calrissian to help capture Solo sealed in carbonite for delivery to Jabba the Hutt. But Lando frees Vader's other captives and they may rescue Solo but are too late as Fett escapes with Solo's frozen body."
doc4 = "In her first appearance in Star Wars Episode IV: A New Hope, Princess Leia is introduced as the Princess of Alderaan and a member of the Imperial Senate. Darth Vader captures her on board the ship Tantive IV, where she is acting as a spy for the Rebel Alliance. He accuses her of being a traitor and demands to know the location of the secret technical plans of the Death Star, the Galactic Empire's most powerful weapon. Unknown to Vader, the young Senator has hidden the plans inside the Astromech droid R2-D2 and has sent it to find Jedi Master Obi-Wan Kenobi on the nearby planet of Tatooine."
doc5 = "Three years later in Star Wars Episode V: The Empire Strikes Back, Lord Vader leads an Imperial starfleet in pursuit of the Rebels. Intent on turning Luke to the dark side, Vader captures Princess Leia, Han Solo, Chewbacca and C-3PO on Cloud City to use them as bait for Luke. During a lightsaber duel, Vader cuts off Luke's right hand and reveals that he is Luke's father"

# store the documents in a dictionary keyed by document name
documents = {
    'doc1': doc1,
    'doc2': doc2,
    'doc3': doc3,
    'doc4': doc4,
    'doc5': doc5,
}

# print the first 50 characters of each document
for doc in sorted(documents):
    print('\n', doc, ':\n\n', documents[doc][0:50], '...')
```


```
 doc1 :

 Frank Oz provided Yoda's voice with each film and  ...

 doc2 :

 Luke Skywalker lives a humdrum existence on Tatooi ...

 doc3 :

 When the Empire attacks the Rebel base, Solo trans ...

 doc4 :

 In her first appearance in Star Wars Episode IV: A ...

 doc5 :

 Three years later in Star Wars Episode V: The Empi ...
```


The `boolean` function below finds the documents that contain all of the given terms (the `"and"` operator) or any of them (the `"or"` operator). Its inputs are:

- the operator, `"and"` or `"or"`
- a list of terms to search for (`["Term1", "Term2", ..., "TermN"]`); a single term may also be passed as a plain string
- the document collection to search through (here, the dictionary `documents`)

It returns the names of the matching documents in the collection's insertion order. Matching is case-insensitive and restricted to whole words, and an unrecognized operator raises a `ValueError` instead of silently returning no results.

```python
import re

def boolean(comparison, terms, documents):
    # a single term may be passed as a string; wrap it in a list
    if isinstance(terms, str):
        terms = [terms]
    if comparison not in ("and", "or"):
        raise ValueError('comparison must be "and" or "or"')

    # one pattern per lowercased term; \b (word boundary) restricts matches to whole words
    patterns = [re.compile(r"\b" + re.escape(term.lower()) + r"\b") for term in terms]
    # "and" needs every term to match, "or" needs at least one
    combine = all if comparison == "and" else any

    relevant = []
    for name, text in documents.items():
        # lowercase the text so the search is case-insensitive
        document = text.lower()
        if combine(pattern.search(document) for pattern in patterns):
            # record the document's key directly (no inverted dict, so identical texts cannot collide)
            relevant.append(name)
    return relevant
```
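The function above rescans every document with a regular expression for each query. A common alternative in Boolean retrieval is an inverted index, which maps each word to the set of documents containing it, so that `"and"` becomes set intersection and `"or"` becomes set union. The sketch below is one possible implementation under stated assumptions, not the notebook's own method: `build_index` and `boolean_indexed` are hypothetical names, tokens are assumed to be runs of word characters (`\w+`), so only single-word terms are supported (a term such as "R2-D2" would be split into "r2" and "d2"), and results come back sorted rather than in insertion order. It runs on short excerpts of the documents above.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # assumption: a token is a run of word characters, matching \b-delimited words
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index

def boolean_indexed(comparison, terms, index):
    """Answer an "and"/"or" query over single-word terms with set intersection/union."""
    if isinstance(terms, str):
        terms = [terms]
    # postings set for each term; unknown terms map to the empty set
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    if comparison == "and":
        result = set.intersection(*postings)
    elif comparison == "or":
        result = set.union(*postings)
    else:
        raise ValueError('comparison must be "and" or "or"')
    return sorted(result)

# Excerpts from the documents above
docs = {
    "doc2": "Ben offers to take Luke to the planet Alderaan and train him in the ways of the Force",
    "doc3": "Darth Vader forces Calrissian to help capture Solo",
    "doc4": "Darth Vader captures her on board the ship Tantive IV",
    "doc5": "Lord Vader leads an Imperial starfleet in pursuit of the Rebels",
}
index = build_index(docs)
print(boolean_indexed("and", ["the", "force"], index))    # ['doc2']
print(boolean_indexed("and", ["Darth", "Vader"], index))  # ['doc3', 'doc4']
print(boolean_indexed("or", ["Darth", "Vader"], index))   # ['doc3', 'doc4', 'doc5']
```

Building the index costs one pass over the collection; after that, each query touches only the postings sets of its own terms instead of every document.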

```python
boolean("and", ["the", "force"], documents)
# → ['doc2']
```
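doc3 is not returned even though it contains both "the" and the word "forces": the `\b` word-boundary anchors in `boolean` require each term to appear as a whole word. This snippet, using an excerpt of doc3, contrasts a plain substring test with the word-boundary search:

```python
import re

# Excerpt from doc3 above
text = "Darth Vader forces Calrissian to help capture Solo"

# Plain substring search: "force" is found inside "forces"
print("force" in text.lower())  # True

# Word-boundary search, as used by boolean(): "force" must stand alone as a word
pattern = r"\b" + re.escape("force") + r"\b"
print(re.search(pattern, text.lower()) is not None)  # False
```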

```python
boolean("and", ["Darth", "Vader"], documents)
# → ['doc3', 'doc4']
```

```python
boolean("or", ["Darth", "Vader"], documents)
# → ['doc3', 'doc4', 'doc5']
```

The "or" query adds doc5, which refers to "Lord Vader" and never uses "Darth".

```python
boolean("or", ["Darth", "Vader", "Luke", "Skywalker"], documents)
# → ['doc2', 'doc3', 'doc4', 'doc5']
```
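Because `"and"` and `"or"` correspond to the intersection and union of per-term result sets, more complex queries can be composed with Python's set operators, including negation via set difference. A minimal sketch, assuming a hypothetical helper `matching` that applies the same whole-word, case-insensitive test as `boolean`, run on excerpts from the documents above:

```python
import re

# Excerpts from the documents above
docs = {
    "doc2": "Luke Skywalker lives a humdrum existence on Tatooine",
    "doc3": "Darth Vader forces Calrissian to help capture Solo",
    "doc4": "Darth Vader captures her on board the ship Tantive IV",
    "doc5": "Vader cuts off Luke's right hand",
}

def matching(term, docs):
    """Hypothetical helper: names of documents containing `term` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    return {name for name, text in docs.items() if pattern.search(text.lower())}

# (Darth AND Vader) OR Skywalker
result = (matching("darth", docs) & matching("vader", docs)) | matching("skywalker", docs)
print(sorted(result))  # ['doc2', 'doc3', 'doc4']

# Vader AND NOT Darth
print(sorted(matching("vader", docs) - matching("darth", docs)))  # ['doc5']
```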