### Problem Statement
- Using NLTK's Inaugural Address (inaugural) corpus, create three keyword-based document retrieval systems. The corpus consists of inaugural addresses made by U.S presidents upon assuming office.

#### Requirements
- The system should take keywords from user and return the most relevant inaugural speeches (a maximum of 5) based on the keywords.
- The matches need not be verbatim.
- The system should return results with the matching regions highlighted.
- Build three IR systems based on TF-IDF Vectorization, Binary Independence Model and a custom model created by you.
- Run the following keywords on all three systems and report the results. Compare the results and write your observations. You can also use your own keyword combinations in addition to the ones mentioned below:
  - freedom, jobs
  - slavery, war
  - liberty, slavery
  - freedom, military


In [1]:
#importing libraries
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
import nltk
import string
import numpy as np
import copy
import pandas as pd
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from collections import defaultdict
from math import log, sqrt
from copy import deepcopy

In [2]:
#loading inaugural speeches data 
speeches = nltk.corpus.inaugural.fileids()
speeches

['1789-Washington.txt',
 '1793-Washington.txt',
 '1797-Adams.txt',
 '1801-Jefferson.txt',
 '1805-Jefferson.txt',
 '1809-Madison.txt',
 '1813-Madison.txt',
 '1817-Monroe.txt',
 '1821-Monroe.txt',
 '1825-Adams.txt',
 '1829-Jackson.txt',
 '1833-Jackson.txt',
 '1837-VanBuren.txt',
 '1841-Harrison.txt',
 '1845-Polk.txt',
 '1849-Taylor.txt',
 '1853-Pierce.txt',
 '1857-Buchanan.txt',
 '1861-Lincoln.txt',
 '1865-Lincoln.txt',
 '1869-Grant.txt',
 '1873-Grant.txt',
 '1877-Hayes.txt',
 '1881-Garfield.txt',
 '1885-Cleveland.txt',
 '1889-Harrison.txt',
 '1893-Cleveland.txt',
 '1897-McKinley.txt',
 '1901-McKinley.txt',
 '1905-Roosevelt.txt',
 '1909-Taft.txt',
 '1913-Wilson.txt',
 '1917-Wilson.txt',
 '1921-Harding.txt',
 '1925-Coolidge.txt',
 '1929-Hoover.txt',
 '1933-Roosevelt.txt',
 '1937-Roosevelt.txt',
 '1941-Roosevelt.txt',
 '1945-Roosevelt.txt',
 '1949-Truman.txt',
 '1953-Eisenhower.txt',
 '1957-Eisenhower.txt',
 '1961-Kennedy.txt',
 '1965-Johnson.txt',
 '1969-Nixon.txt',
 '1973-Nixon.txt',
 '1

In [3]:
#reading required data and storing in a dictionary
from nltk.corpus import inaugural
speeches_dict = {}
for speech in speeches:
    speeches_dict[speech] = inaugural.raw(speech)

In [4]:
speeches_dict['1793-Washington.txt']

'Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n \n'

### preprocessing:

In [5]:
#function to convert into lowercase
def lower_case(text):
    return text.lower()

In [6]:
#function to remove whitespace characters
def remove_white_space(text):
    return text.replace('\n','').replace('\t','').replace('\r','')

In [7]:
#function to remove stopwords
def stopwords_removal(text):
    stop_words = stopwords.words('english')
    words = word_tokenize(text)
    new_text = ""
    for word in words:
        if word not in stop_words and len(word) > 1:
            new_text = new_text + " " + word
    return new_text

In [8]:
#function to remove punctuations
def remove_punctuation(text):
    symbols = "!\"#$%&()*+-''./:;<=>?@[\]^_`{|}~"
    for i in range(len(symbols)):
        text = text.replace(symbols[i], "")
    return text

In [9]:
# function for preprocessing data
def preprocess(text):
    text = lower_case(text)
    text = remove_white_space(text)
    text = remove_punctuation(text)
    text = stopwords_removal(text)
    return text

In [10]:
#storing preprocessed in a dict
preprocessed_dict = {}
for key in list(speeches_dict.keys()):
    preprocessed_dict[key] = preprocess(speeches_dict[key])

## TF-IDF:

In [11]:
## Create Vocabulary
vocabulary = set()
for doc in preprocessed_dict.values():
    vocabulary.update(doc.split(' '))
vocabulary = list(vocabulary)

In [12]:
# Intializating the tfIdf model
tfidf = TfidfVectorizer(vocabulary=vocabulary)

In [13]:
# Transform the TfIdf model
tfidf.fit(list(preprocessed_dict.values()))
tfidf_tran=tfidf.transform(preprocessed_dict.values())

In [14]:
#creating dataframe for tfidf values
tfidf_pd = pd.DataFrame(tfidf_tran.A)
tfidf_pd.columns = vocabulary

In [15]:
#function to highlight the matching regions
def highlight(text,word):
    if text.find(word) > 0:
        idx = text.find(word)
        end_idx = idx + text[idx:].find(" ")
        new_str = text[:idx]+'\x1b[1;31m'+ text[idx:end_idx]+'\x1b[0m'
        if text[:end_idx].find(word) > 0:
            return new_str+" " + highlight(text[:end_idx],word)
        else:
            return new_str+" " + text[:end_idx]

In [16]:
#extracting highlighted matching regions
def display_highlight(df,words):
    text = df['Data'][0]
    sentances = sent_tokenize(text)
    for sent in sentances:
        highlighted_str = highlight(sent,words[0])
        if highlighted_str is None:
            highlighted_str2 = highlight(sent,words[1])
        else:
            highlighted_str2 = highlight(highlighted_str,words[1])
        if highlighted_str2 is not None:
            print(highlighted_str2)
        elif highlighted_str is not None :
            print(highlighted_str)

In [17]:
#finding tfidf scores and storing top 5 relevant documents in a dataframe
def get_TFIDF_Score(df,user_words):
    new_df = pd.DataFrame(df[user_words[0]]+df[user_words[1]])
    new_df = new_df.sort_values([0],ascending=False).head()
    index = new_df.index.values
    
    df = pd.DataFrame()
    for i in range(len(index)):
        df.loc[i,'File'] = list(preprocessed_dict.keys())[index[i]]
        df.loc[i,'Data'] = list(speeches_dict.values())[index[i]]
        df.loc[i,'Score'] = new_df[0][index[i]]
    return df
#displaying final results
def user_query(user_words):
    df = get_TFIDF_Score(tfidf_pd,user_words)
    display(df)
    display_highlight(df,user_words)

In [18]:
#taking number of queries from user
n = int(input("Number of queries: "))
n_queries = []
#taking queries from user
for i in range(n):
    n_queries.append(list(map(str,input("Enter a query:").split(" "))))
for query in n_queries:
    print(f"\n\nTop 5 relevant documents for given query {query}:")
    #calling the function to print final results
    user_query(query)

Number of queries: 4
Enter a query:freedom jobs
Enter a query:slavery war
Enter a query:liberty slavery
Enter a query:freedom military


Top 5 relevant documents for given query ['freedom', 'jobs']:


Unnamed: 0,File,Data,Score
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",0.339238
1,1957-Eisenhower.txt,"The Price of Peace\nMr. Chairman, Mr. Vice Pre...",0.15635
2,2017-Trump.txt,"Chief Justice Roberts, President Carter, Presi...",0.134013
3,1985-Reagan.txt,"Senator Mathias, Chief Justice Burger, Vice Pr...",0.131705
4,1949-Truman.txt,"Mr. Vice President, Mr. Chief Justice, and fel...",0.129353


For a half a century, America defended our own [1;31mfreedom[0m For a half a century, America defended our own [1;31m[0m For a half a century, America defended our own
There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human [1;31m[0m There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human
The best hope for peace in our world is the expansion of [1;31mfreedom[0m The best hope for peace in our world is the expansion of [1;31m[0m The best hope for peace in our world is the expansion of
Our goal instead is to help others find their own voice, attain their own [1;31mfreedom,[0m Our goal instead is to help others find their own voice, attain their own [1;31m[0m Our goal instead is to h

Unnamed: 0,File,Data,Score
0,1865-Lincoln.txt,Fellow-Countrymen:\n\nAt this second appearing...,0.280064
1,1813-Madison.txt,About to add the solemnity of an oath to the o...,0.232751
2,1857-Buchanan.txt,"Fellow citizens, I appear before you this day ...",0.154216
3,1881-Garfield.txt,Fellow-Citizens:\n\nWe stand to-day upon an em...,0.120977
4,1821-Monroe.txt,"Fellow citizens, I shall not attempt to descri...",0.109167


On the occasion corresponding to this four years ago all thoughts were anxiously directed to an impending civil [1;31m[0m On the occasion corresponding to this four years ago all thoughts were anxiously directed to an impending civil
While the inaugural address was being delivered from this place, devoted altogether to saving the Union without [1;31mwar,[0m While the inaugural address was being delivered from this place, devoted altogether to saving the Union without [1;31m[0m While the inaugural address was being delivered from this place, devoted altogether to saving the Union without
Both parties deprecated [1;31mwar,[0m Both parties deprecated [1;31m[0m Both parties deprecated
All knew that this interest was somehow the cause of the [1;31m[0m All knew that this interest was somehow the cause of the
To strengthen, perpetuate, and extend this interest was the object for which the insurgents would rend the Union even by [1;31mwar,[0m To strengthen, perpetuate, and extend

Unnamed: 0,File,Data,Score
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",0.194388
1,1857-Buchanan.txt,"Fellow citizens, I appear before you this day ...",0.128028
2,1881-Garfield.txt,Fellow-Citizens:\n\nWe stand to-day upon an em...,0.124289
3,1833-Jackson.txt,"Fellow citizens, the will of the American peop...",0.076645
4,1841-Harrison.txt,Called from a retirement which I had supposed ...,0.076322


We are led, by events and common sense, to one conclusion: The survival of [1;31mliberty[0m We are led, by events and common sense, to one conclusion: The survival of [1;31m[0m We are led, by events and common sense, to one conclusion: The survival of
In the long run, there is no justice without freedom, and there can be no human rights without human [1;31m[0m In the long run, there is no justice without freedom, and there can be no human rights without human
Some, I know, have questioned the global appeal of [1;31mliberty¡Xthough[0m Some, I know, have questioned the global appeal of [1;31m[0m Some, I know, have questioned the global appeal of
We do not accept the existence of permanent tyranny because we do not accept the possibility of permanent [1;31m[0m We do not accept the existence of permanent tyranny because we do not accept the possibility of permanent
When you stand for your [1;31mliberty,[0m When you stand for your [1;31m[0m When you stand for your
In a world

Unnamed: 0,File,Data,Score
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",0.339238
1,1957-Eisenhower.txt,"The Price of Peace\nMr. Chairman, Mr. Vice Pre...",0.200736
2,1985-Reagan.txt,"Senator Mathias, Chief Justice Burger, Vice Pr...",0.147285
3,1949-Truman.txt,"Mr. Vice President, Mr. Chief Justice, and fel...",0.146045
4,1953-Eisenhower.txt,"My friends, before I begin the expression of t...",0.129414


For a half a century, America defended our own [1;31mfreedom[0m For a half a century, America defended our own [1;31m[0m For a half a century, America defended our own
There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human [1;31m[0m There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human
The best hope for peace in our world is the expansion of [1;31mfreedom[0m The best hope for peace in our world is the expansion of [1;31m[0m The best hope for peace in our world is the expansion of
Our goal instead is to help others find their own voice, attain their own [1;31mfreedom,[0m Our goal instead is to help others find their own voice, attain their own [1;31m[0m Our goal instead is to h

## Binary independence model:

In [19]:
#finding inverted index for each term(the documents in which the term appeared)
def inverted_index(corpus):
    index = defaultdict(set)
    for docid, article in enumerate(corpus):
        for term in article.split(" "):
            index[term].add(docid)
    return index

In [20]:
#union of all the documents containing all the query words
def posting_lists_union(pl1, pl2):
        pl1 = sorted(list(pl1))
        pl2 = sorted(list(pl2))
        union = []
        i = 0
        j = 0
        while (i < len(pl1) and j < len(pl2)):
            if (pl1[i] == pl2[j]):
                union.append(pl1[i])
                i += 1
                j += 1
            elif (pl1[i] < pl2[j]):
                union.append(pl1[i])
                i += 1
            else:
                union.append(pl2[j])
                j += 1
        for k in range(i, len(pl1)):
            union.append(pl1[k])
        for k in range(j, len(pl2)):
            union.append(pl2[k])
        return union


In [21]:
#finding document frequency
def DF(term, index):
    return len(index[term])

#finding inverse document frequency
def IDF(term, index, corpus):
    return log(len(corpus)/DF(term, index))

#calculating RSV weights for each term using df and idf
def RSV_weights(corpus,index):
    N = len(corpus)
    w = {}
    for term in index.keys():
        p = DF(term, index)/(N+0.5)  
        w[term] = IDF(term, index, corpus) + log(p/(1-p))
    return w

In [22]:
#creating a class for the model
class BIM():
    #initialising the objects
    def __init__(self, corpus):
        self.original_corpus = deepcopy(corpus)
        self.articles = corpus
        self.index = inverted_index(self.articles)
        self.weights = RSV_weights(self.articles, self.index)
        self.ranked = []
        self.query_text = ''
   #finding scores of documents using rsv weights
    def RSV_doc_query(self, doc_id, query):
        score = 0
        doc = self.articles[doc_id]
        for term in doc.split(" "):
            if term in query:
                score += self.weights[term]     
        return score
    #ranking the documents based on scores
    def ranking(self, query):
        docs = []
        for term in self.index: 
            if term in query:
                docs = posting_lists_union(docs, self.index[term])
        scores = []
        for doc in docs:
            scores.append((doc, self.RSV_doc_query(doc, query)))
        self.ranked = sorted(scores, key=lambda x: x[1], reverse = True)
        return self.ranked

    # displaying top 5 relevant documents in a dataframe
    def user_query(self, query):
        ranking = self.ranking(query)
        n = 5
        df = pd.DataFrame()
        for i in range(n):
            df.loc[i,'File'] = list(preprocessed_dict.keys())[ranking[i][0]]
            df.loc[i,'Data'] = list(speeches_dict.values())[ranking[i][0]]
            df.loc[i,'Score'] = ranking[i][1]
        display(df)
        #highlighting the matching regions in the relevant documents
        display_highlight(df,query)
        self.weights = RSV_weights(self.articles, self.index)    

In [23]:
#calling the model for a query
def user_query_bim(user_words):
    bim  = BIM(list(preprocessed_dict.values()))
    bim.user_query(user_words)

In [24]:
#user input for number of queries
n = int(input("Number of queries: "))
n_queries = []
#queries from user
for i in range(n):
    n_queries.append(list(map(str,input("Enter a query:").split(" "))))
#top 5 relevant documents for each query
for query in n_queries:
    print(f"\n\nTop 5 relevant documents for given query {query}:")
    user_query_bim(query)

Number of queries: 4
Enter a query:freedom jobs
Enter a query:slavery war
Enter a query:liberty slavery
Enter a query:freedom military


Top 5 relevant documents for given query ['freedom', 'jobs']:


Unnamed: 0,File,Data,Score
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",22.726265
1,1985-Reagan.txt,"Senator Mathias, Chief Justice Burger, Vice Pr...",11.363132
2,1949-Truman.txt,"Mr. Vice President, Mr. Chief Justice, and fel...",10.416205
3,1957-Eisenhower.txt,"The Price of Peace\nMr. Chairman, Mr. Vice Pre...",9.469277
4,1953-Eisenhower.txt,"My friends, before I begin the expression of t...",7.575422


For a half a century, America defended our own [1;31mfreedom[0m For a half a century, America defended our own [1;31m[0m For a half a century, America defended our own
There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human [1;31m[0m There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human
The best hope for peace in our world is the expansion of [1;31mfreedom[0m The best hope for peace in our world is the expansion of [1;31m[0m The best hope for peace in our world is the expansion of
Our goal instead is to help others find their own voice, attain their own [1;31mfreedom,[0m Our goal instead is to help others find their own voice, attain their own [1;31m[0m Our goal instead is to h

Unnamed: 0,File,Data,Score
0,1821-Monroe.txt,"Fellow citizens, I shall not attempt to descri...",23.324053
1,1813-Madison.txt,About to add the solemnity of an oath to the o...,20.408547
2,1921-Harding.txt,My Countrymen:\n\nWhen one surveys the world a...,18.950793
3,1865-Lincoln.txt,Fellow-Countrymen:\n\nAt this second appearing...,16.235
4,1817-Monroe.txt,I should be destitute of feeling if I was not ...,14.577533


Just before the commencement of the last term the United States had concluded a [1;31mwar[0m Just before the commencement of the last term the United States had concluded a [1;31m[0m Just before the commencement of the last term the United States had concluded a
The events of that [1;31mwar[0m The events of that [1;31m[0m The events of that
As soon as the [1;31mwar[0m As soon as the [1;31m[0m As soon as the
But if there were no fortifications, then the enemy might go where he pleased, and, changing his position and sailing from place to place, our force must be called out and spread in vast numbers along the whole coast and on both sides of every bay and river as high up in each as it might be navigable for ships of [1;31m[0m But if there were no fortifications, then the enemy might go where he pleased, and, changing his position and sailing from place to place, our force must be called out and spread in vast numbers along the whole coast and on both sides of every bay an

Unnamed: 0,File,Data,Score
0,1841-Harrison.txt,Called from a retirement which I had supposed ...,20.370116
1,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",16.975103
2,1881-Garfield.txt,Fellow-Citizens:\n\nWe stand to-day upon an em...,6.790064
3,1901-McKinley.txt,"My fellow-citizens, when we assembled here on ...",5.991211
4,2013-Obama.txt,Thank you. Thank you so much.\n\nVice Presiden...,5.991211


It has been found powerful in war, and hitherto justice has been administered, and intimate union effected, domestic tranquillity preserved, and personal [1;31mliberty[0m It has been found powerful in war, and hitherto justice has been administered, and intimate union effected, domestic tranquillity preserved, and personal [1;31m[0m It has been found powerful in war, and hitherto justice has been administered, and intimate union effected, domestic tranquillity preserved, and personal
But if there is danger to public [1;31mliberty[0m But if there is danger to public [1;31m[0m But if there is danger to public
The great dread of the former seems to have been that the reserved powers of the States would be absorbed by those of the Federal Government and a consolidated power established, leaving to the States the shadow only of that independent action for which they had so zealously contended and on the preservation of which they relied as the last hope of [1;31m[0m The great drea

Unnamed: 0,File,Data,Score
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres...",22.726265
1,1985-Reagan.txt,"Senator Mathias, Chief Justice Burger, Vice Pr...",11.747275
2,1949-Truman.txt,"Mr. Vice President, Mr. Chief Justice, and fel...",10.800347
3,1957-Eisenhower.txt,"The Price of Peace\nMr. Chairman, Mr. Vice Pre...",10.237562
4,1953-Eisenhower.txt,"My friends, before I begin the expression of t...",8.343706


For a half a century, America defended our own [1;31mfreedom[0m For a half a century, America defended our own [1;31m[0m For a half a century, America defended our own
There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human [1;31m[0m There is only one force of history that can break the reign of hatred and resentment, and expose the pretensions of tyrants, and reward the hopes of the decent and tolerant, and that is the force of human
The best hope for peace in our world is the expansion of [1;31mfreedom[0m The best hope for peace in our world is the expansion of [1;31m[0m The best hope for peace in our world is the expansion of
Our goal instead is to help others find their own voice, attain their own [1;31mfreedom,[0m Our goal instead is to help others find their own voice, attain their own [1;31m[0m Our goal instead is to h

### Custom model:

In [25]:
#function for custom model
def customModel(words):
    #declaring a dictionary
    word_dict = {}
    #finding query words frequency in each document
    for word in words:
       
        word_dict[word] = {}
        #iterating for each document
        for key in list(preprocessed_dict.keys()):
            #iterating for each word in a document
            for text in preprocessed_dict[key].split(" "):
                #if the query word is present in document,increase the count
                if word == text:
                    
                    if key in word_dict[word]:
                        word_dict[word][key] += 1
                    else:
                        word_dict[word][key] = 1
    #sorting the documents based on query word frequencies                    
    for word in words:
        word_dict[word] = dict(sorted(word_dict[word].items(), key=lambda item:item[1],reverse=True))
    results = {}
    #finding the top documents in sorted dictionaries which are containing all the query words
    for key1 in list(word_dict[words[0]].keys()):
        for key2 in list(word_dict[words[1]].keys()):
            if key1 == key2:
                #adding the total count of all the query words in each documents
                results[key1] = word_dict[words[0]][key1] + word_dict[words[1]][key1]
    #sorting the documents based on above total count
    results = dict(sorted(results.items(), key=lambda item:item[1],reverse=True))
    #creating a dataframe to display top 5 relevant documents
    df = pd.DataFrame()
    data = []
    for key in list(results.keys()):
        data.append(speeches_dict[key])
    df['File'] = list(results.keys())
    df['Data'] = data
    return df.head()

In [26]:
#function to display final relevant documents with highlighted regions
def user_query_custom_model(user_words):
    df = customModel(user_words)
    display(df)
    display_highlight(df,user_words)

In [27]:
#user input for number of queries
n = int(input("Number of queries: "))
n_queries = []
#queries from user
for i in range(n):
    n_queries.append(list(map(str,input("Enter a query:").split(" "))))
#finding relevant documents for each query
for query in n_queries:
    print(f"\n\nTop 5 relevant documents for given query {query}:")
    user_query_custom_model(query)

Number of queries: 4
Enter a query:freedom jobs
Enter a query:slavery war
Enter a query:liberty slavery
Enter a query:freedom military


Top 5 relevant documents for given query ['freedom', 'jobs']:


Unnamed: 0,File,Data
0,2013-Obama.txt,Thank you. Thank you so much.\n\nVice Presiden...
1,1981-Reagan.txt,"Senator Hatfield, Mr. Chief Justice, Mr. Presi..."
2,2009-Obama.txt,My fellow citizens:\n\nI stand here today humb...
3,1993-Clinton.txt,"My fellow citizens, today we celebrate the mys..."


For history tells us that while these truths may be self-evident, they've never been self-executing; that while [1;31mfreedom[0m For history tells us that while these truths may be self-evident, they've never been self-executing; that while [1;31m[0m For history tells us that while these truths may be self-evident, they've never been self-executing; that while
But we have always understood that when times change, so must we; that fidelity to our founding principles requires new responses to new challenges; that preserving our individual [1;31mfreedoms[0m But we have always understood that when times change, so must we; that fidelity to our founding principles requires new responses to new challenges; that preserving our individual [1;31m[0m But we have always understood that when times change, so must we; that fidelity to our founding principles requires new responses to new challenges; that preserving our individual
No single person can train all the math and science teachers 

Unnamed: 0,File,Data
0,1865-Lincoln.txt,Fellow-Countrymen:\n\nAt this second appearing...
1,1857-Buchanan.txt,"Fellow citizens, I appear before you this day ..."
2,1881-Garfield.txt,Fellow-Citizens:\n\nWe stand to-day upon an em...
3,1889-Harrison.txt,"Fellow-Citizens, there is no constitutional or..."
4,1909-Taft.txt,My fellow citizens: Anyone who has taken the o...


On the occasion corresponding to this four years ago all thoughts were anxiously directed to an impending civil [1;31m[0m On the occasion corresponding to this four years ago all thoughts were anxiously directed to an impending civil
While the inaugural address was being delivered from this place, devoted altogether to saving the Union without [1;31mwar,[0m While the inaugural address was being delivered from this place, devoted altogether to saving the Union without [1;31m[0m While the inaugural address was being delivered from this place, devoted altogether to saving the Union without
Both parties deprecated [1;31mwar,[0m Both parties deprecated [1;31m[0m Both parties deprecated
All knew that this interest was somehow the cause of the [1;31m[0m All knew that this interest was somehow the cause of the
To strengthen, perpetuate, and extend this interest was the object for which the insurgents would rend the Union even by [1;31mwar,[0m To strengthen, perpetuate, and extend

Unnamed: 0,File,Data
0,2005-Bush.txt,"Vice President Cheney, Mr. Chief Justice, Pres..."
1,1881-Garfield.txt,Fellow-Citizens:\n\nWe stand to-day upon an em...
2,1857-Buchanan.txt,"Fellow citizens, I appear before you this day ..."
3,1861-Lincoln.txt,Fellow-Citizens of the United States: In compl...
4,1889-Harrison.txt,"Fellow-Citizens, there is no constitutional or..."


We are led, by events and common sense, to one conclusion: The survival of [1;31mliberty[0m We are led, by events and common sense, to one conclusion: The survival of [1;31m[0m We are led, by events and common sense, to one conclusion: The survival of
In the long run, there is no justice without freedom, and there can be no human rights without human [1;31m[0m In the long run, there is no justice without freedom, and there can be no human rights without human
Some, I know, have questioned the global appeal of [1;31mliberty¡Xthough[0m Some, I know, have questioned the global appeal of [1;31m[0m Some, I know, have questioned the global appeal of
We do not accept the existence of permanent tyranny because we do not accept the possibility of permanent [1;31m[0m We do not accept the existence of permanent tyranny because we do not accept the possibility of permanent
When you stand for your [1;31mliberty,[0m When you stand for your [1;31m[0m When you stand for your
In a world

Unnamed: 0,File,Data
0,1985-Reagan.txt,"Senator Mathias, Chief Justice Burger, Vice Pr..."
1,1949-Truman.txt,"Mr. Vice President, Mr. Chief Justice, and fel..."
2,1957-Eisenhower.txt,"The Price of Peace\nMr. Chairman, Mr. Vice Pre..."
3,1953-Eisenhower.txt,"My friends, before I begin the expression of t..."
4,1825-Adams.txt,In compliance with an usage coeval with the ex...


By 1980, we knew it was time to renew our faith, to strive with all our strength toward the ultimate in individual [1;31mfreedom[0m By 1980, we knew it was time to renew our faith, to strive with all our strength toward the ultimate in individual [1;31m[0m By 1980, we knew it was time to renew our faith, to strive with all our strength toward the ultimate in individual
We will not rest until every American enjoys the fullness of [1;31mfreedom,[0m We will not rest until every American enjoys the fullness of [1;31m[0m We will not rest until every American enjoys the fullness of
These will be years when Americans have restored their confidence and tradition of progress; when our values of faith, family, work, and neighborhood were restated for a modern age; when our economy was finally freed from government's grip; when we made sincere efforts at meaningful arms reduction, rebuilding our defenses, our economy, and developing new technologies, and helped preserve peace in a trouble

### Observations:
#### For the first query with the keywords "freedom,jobs"
   - For TFIDF,the top 5 relevant documents are: 
         2005-Bush.txt,1857-Buchanan.txt,1881-Garfield,1833-Jackson.txt,1841-Harrison.txt
   - For BIM,the top 5 relevant documents are:
         2005-Bush.txt,1985-Reagan.txt,1949-Truman.txt,1957-Eisenhower.txt,1953-Eisenhower.txt
   - For Custom model,the top 5 relevant documents are:
         2013-Obama.txt,1981-Reagan.txt,2009-Obama.txt,1993-Clinton.txt
#### For the second query with the keywords "slavery,war"
   - For TFIDF,the top 5 relevant documents are:
         1865-Lincoln.txt,1813-Madison.txt,1857-Buchanan.txt,1881-Garfield.txt,1821-Monroe.txt
   - For BIM,the top 5 relevant documents are:
         1821-Monroe.txt,1813-Madison.txt,1921-Harding.txt,1865-Lincoln.txt,1817-Monroe.txt
   - For Custom model,the top 5 relevant documents are:
         1865-Lincoln.txt,1857-Buchanan.txt,1881-Garfield.txt,1889-Harrison.txt,1909-Taft.txt
#### For the third query with the keywords "liberty,slavery"
   - For TFIDF,the top 5 relevant documents are:
         2005-Bush.txt,1857-Buchanan.txt,1881-Garfield.txt,1833-Jackson.txt,1841-Harrison.txt
   - For BIM,the top 5 relevant documents are:
         1841-Harrison.txt,2005-Bush.txt,1881-Garfield.txt,1901-McKinley.txt,2013-Obama.txt
   - For Custom model,the top 5 relevant documents are:
         2005-Bush.txt,1881-Garfield.txt,1857-Buchanan.txt,1861-Lincoln.txt,1889-Harrison.txt
#### For the fourth query with the keywords "freedom,military"
   - For TFIDF,the top 5 relevant documents are:
         2005-Bush.txt,1957-Eisenhower.txt,1985-Reagan.txt,1949-Truman.txt,1953-Eisenhower.txt
   - For BIM,the top 5 relevant documents are:
         2005-Bush.txt,1985-Reagan.txt,1949-Truman.txt,1957-Eisenhower.txt,1953-Eisenhower.txt
   - For Custom model,the top 5 relevant documents are:
         1985-Reagan.txt,1949-Truman.txt,1957-Eisenhower.txt,1953-Eisenhower.txt,1825-Adams.txt