### Project Notebook

Take the data generated by the siteScraper notebook and build the structures needed to accept an information request and return a response to it. The response should contain the answer, along with a link the user can follow to learn more.

It might be beneficial to split the data into different formats (pages, a dictionary of key phrases, the text corresponding to each key phrase, etc.).
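
One way to start on that split is to group the scraped sentences by the page they came from. The sketch below assumes, as the loading cell does, that `text.txt` and `links.txt` are parallel lists (line *i* of one belongs with line *i* of the other). `group_by_page` is a hypothetical helper, not part of the original notebook, and the sample sentences and URLs are made up.

```python
from collections import defaultdict


def group_by_page(sentences, links):
    """Hypothetical helper: map each page URL to the sentences scraped from it."""
    if len(sentences) != len(links):
        raise ValueError("sentences and links must be parallel lists")
    pages = defaultdict(list)
    for sentence, link in zip(sentences, links):
        if sentence.strip():  # skip empty lines, e.g. one left by a trailing newline
            pages[link].append(sentence)
    return dict(pages)


# Usage with made-up data
pages = group_by_page(
    ["admission requirements", "qualifying exam", "thesis option"],
    ["https://example.com/ms", "https://example.com/phd", "https://example.com/ms"],
)
print(pages["https://example.com/ms"])  # → ['admission requirements', 'thesis option']
```

A page-keyed dictionary like this could then be extended with key phrases per page, without changing the parallel-list files themselves.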

Demonstrate the notebook's ability to provide responses (answers **AND** a link for each answer) that point to where the needed information is located on the UM CIS website.



In [51]:
import string
import nltk
import warnings

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


nltk.download('popular', quiet=True)
warnings.filterwarnings('ignore')

In [52]:
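# Line i of text.txt is a scraped sentence; line i of links.txt is the page it came from
with open('text.txt', 'r', encoding='utf8') as f:
    sentences = f.read().lower().split('\n')

# Links are not lowercased: URL paths can be case-sensitive
with open('links.txt', 'r', encoding='utf8') as f:
    links = f.read().split('\n')

# Both lists must have the same length (one link per sentence)
print(len(links), len(sentences))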

322 322


Remove punctuation, tokenize, and lemmatize

In [53]:
lemmatizer = nltk.stem.WordNetLemmatizer()

# Lemmatize each token (e.g. "degrees" -> "degree")
def LemTokens(tokens):
    return [lemmatizer.lemmatize(token) for token in tokens]

# Lowercase, strip punctuation, tokenize, then lemmatize
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(str.maketrans('', '', string.punctuation))))
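
`LemNormalize` relies on `str.maketrans('', '', string.punctuation)`, which builds a table that deletes every character in `string.punctuation`. That constant holds only ASCII punctuation, so the apostrophe in "what's" is removed while a typographic apostrophe, such as the one in "bachelor’s" in the scraped text, survives. A small standard-library check, independent of NLTK:

```python
import string

# Translation table that deletes each ASCII punctuation character
strip_punct = str.maketrans('', '', string.punctuation)

print("What's up?".lower().translate(strip_punct))   # → whats up
print("bachelor’s degree.".translate(strip_punct))   # → bachelor’s degree
print(len(string.punctuation))                       # → 32
```

One consequence is that a canned phrase written with an ASCII apostrophe can never equal a stripped query. If curly quotes should be dropped too, they can be added to the deletion set, e.g. `str.maketrans('', '', string.punctuation + '’‘“”')`.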

### Bot Responses
- Create a TF-IDF model and use cosine similarity to find the text closest to the query
- Return the sentence with the highest similarity and its corresponding link
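
To make those two steps concrete, here is a minimal pure-Python sketch of TF-IDF weighting and cosine similarity. It assumes plain whitespace tokenization with no stop-word removal or lemmatization (unlike the notebook's `LemNormalize`), and it uses the smoothed IDF that scikit-learn's `TfidfVectorizer` applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalisation. The function names are illustrative, not scikit-learn API.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, the same formula as TfidfVectorizer's default (smooth_idf=True)
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def cosine(u, v):
    # Both vectors have unit length, so the dot product equals the cosine similarity
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def best_match(corpus, query):
    """Return (index, score) of the corpus document most similar to the query."""
    vectors = tfidf_vectors(corpus + [query])  # fit on corpus plus query, as the notebook does
    query_vec = vectors[-1]
    scores = [cosine(query_vec, doc_vec) for doc_vec in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


corpus = ["admission to the ms degree program",
          "research groups in bioinformatics",
          "faculty contact information"]
print(best_match(corpus, "ms degree"))  # index 0, score > 0
print(best_match(corpus, "zebra"))      # score 0.0: no shared terms
```

A best score of exactly zero is the case the bot below reports as "I don't understand".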

In [54]:
error_response = "I am sorry! I don't understand :( Please try something else!"
tf_idf_model = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')


def bot_response(query, level):
    # Append the query to a copy of the corpus so the vectorizer sees it,
    # without mutating the global `sentences` list
    corpus = sentences + [query]

    tf_idf = tf_idf_model.fit_transform(corpus)

    # tf_idf[-1] is the query; compute its cosine similarity to every row
    similarities = cosine_similarity(tf_idf[-1], tf_idf)

    # argsort is ascending: -1 is the query itself (similarity 1), -2 the closest
    # sentence, -3 the next closest; a stable sort keeps the query last on ties
    ranking = similarities.argsort(kind='stable')[0]
    if -level > len(ranking):
        return error_response
    best_match = ranking[level]

    # A similarity of 0 means the sentence shares no terms with the query
    if similarities[0, best_match] == 0:
        answer = error_response
    else:
        answer = f'Here is what I found: {sentences[best_match]}.\n{"":8}Learn more at {links[best_match]}\n'

    return answer
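
The `level` argument walks down the similarity ranking one step per "no". An alternative sketch, written with plain Python lists rather than the NumPy arrays above, is a generator that yields matches from most to least similar and stops at the first zero score, so no negative index can run past the start of the list. `ranked_answers` is a hypothetical helper; its `scores` argument stands for one row of cosine similarities with the query itself already excluded.

```python
def ranked_answers(scores, sentences, links):
    """Yield (sentence, link) pairs from most to least similar, stopping at zero similarity."""
    # sorted() stays stable with reverse=True, so equal scores keep their original order
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in order:
        if scores[i] == 0:
            break  # every remaining sentence shares no terms with the query
        yield sentences[i], links[i]


# Usage with made-up scores: each "no" from the user would call next() again
answers = ranked_answers([0.1, 0.0, 0.7], ["a", "b", "c"], ["link-a", "link-b", "link-c"])
print(next(answers))        # → ('c', 'link-c')
print(next(answers))        # → ('a', 'link-a')
print(next(answers, None))  # → None (no more matches)
```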

### Run Bot

In [57]:
# Punctuation is stripped from the query before matching, so these phrases are stored without it
greetings = ['hi', 'hello', 'hey', 'hello there', 'whats up']
bye_texts = ['bye', 'thank you bye', 'bye bye', 'see you']
thanks = ['thanks', 'thank you', 'tnx']

print('CS Bot: Hello Explorer, you can ask me anything about the CS department of Olemiss. Will I know the answer? Oh well...')

while True:
    query = input()
    print(f'\nUser:   {query}')
    # Start at the best match (-2); each "no" moves one step down the ranking
    answerLevel = -2

    query = query.lower().translate(str.maketrans('', '', string.punctuation)).strip()

    if query in greetings:
        print('CS Bot: Hello there!')
        continue

    if query in bye_texts:
        print('CS Bot: I hope my humble services satisfied you!')
        break

    if query in thanks:
        print('CS Bot: At your service. Always!!')
        continue

    # ANSWER
    answer = bot_response(query, answerLevel)
    print(f'CS Bot: {answer}')

    # Skip the follow-up question if the bot found nothing
    if answer == error_response:
        continue

    # Keep offering the next-best match until the user is satisfied
    while True:
        print("Does that answer your question?")
        query2 = input().lower().strip()
        print(f'\nUser:   {query2}')

        if query2 == "yes":
            print("Great!")
            break
        if query2 == "no":
            answerLevel -= 1
            answer = bot_response(query, answerLevel)
            print(f'CS Bot: {answer}')
            # No further matches: stop asking
            if answer == error_response:
                break

CS Bot: Hello Explorer, you can ask me anything about the CS department of Olemiss. Will I know the answer? Oh well...

User:   Hello
CS Bot: Hello there!

User:   How to get BS degree?
CS Bot: Here is what I found: students can obtain a second bachelor’s degree by satisfying the requirements for the bscs or the ba computer science major and earning at least 30 hours of credit above the first degree. however, most students with an undergraduate degree in a field related to computer science may want to consider study toward a graduate degree in computer science as an option..
        Learn more at https://cs.olemiss.edu/faq/

Does that answer your question?

User:   yes
Great!

User:   How about MS degree?
CS Bot: Here is what I found: a student must be recommended for admission to the ms degree program by the department of computer and information science. to be admitted into the ms degree program a student should:.
        Learn more at https://cs.olemiss.edu/master-of-science/

Does that answer your question?

User:   no
CS Bot: Here is what I found: a student who does not pass the comprehensive examination in two sittings may choose to get an m.s. degree after completing a m.s. project or a m.s. thesis..
        Learn more at https://cs.olemiss.edu/doctor-of-philosophy/

Does that answer your question?

User:   yes
Great!

User:   What research groups can I join?
CS Bot: Here is what I found: the computer science student body is diverse, with students from all across mississippi, the united states, and beyond. whether you are looking to join a study group, professional society, team club, or simply a new group of friends, you will find our student body is warm and engaging..
        Learn more at https://cs.olemiss.edu/mission/

Does that answer your question?

User:   no
CS Bot: Here is what I found: bioinformatics research group: dr. dawn e. wilkins and dr. yixin chen. research on use of machine learning techniques in analysis of microarray data, bioinformatics, and computational biology. funding by nih and nsf..
        Learn more at https://cs.olemiss.edu/research-groups/

Does that answer your question?

User:   yes
Great!

User:   Who is Dr. Rhodes?
CS Bot: Here is what I found: dr. philip j. rhodes joined the faculty in 2004 after receiving his ph.d. from the university of new hampshire. he is chief architect and implementer of the granite system, a library that provides efficient and convenient access to spatial datasets such as those produced by two and three dimensional simulations. at the graduate level, rhodes teaches on scientific data representation and analysis, scientific visualization, computer graphics, and cloud and parallel computing..
        Learn more at http://csci.cs.olemiss.edu/faculty/rhodes/

Does that answer your question?

User:   yes
Great!

User:   Thank you!
CS Bot: At your service. Always!!

User:   Bye
CS Bot: I hope my humble services satisfied you!
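
The greeting, goodbye, and thanks checks in the Run Bot cell compare the stripped query against fixed phrase lists, so every phrase must be written the way a stripped query looks. One way to remove that pitfall is to run the canned phrases through the same normalisation as the query. The sketch below uses hypothetical names (`normalize`, `INTENTS`, `classify`) that are not part of the original notebook.

```python
import string

_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def normalize(text):
    """Lowercase, drop ASCII punctuation, and trim surrounding whitespace."""
    return text.lower().translate(_STRIP_PUNCT).strip()


# Hypothetical intent table; phrases are normalised once, with the same function as the query
INTENTS = {
    'greeting': {normalize(p) for p in ['hi', 'hello', 'hey', 'hello there', "what's up"]},
    'bye': {normalize(p) for p in ['bye', 'thank you, bye', 'bye bye', 'see you']},
    'thanks': {normalize(p) for p in ['thanks', 'thank you', 'tnx']},
}


def classify(query):
    """Return the matching intent name, or None if the query should go to the TF-IDF search."""
    q = normalize(query)
    for intent, phrases in INTENTS.items():
        if q in phrases:
            return intent
    return None


print(classify("What's up?"))             # → greeting
print(classify("Thank you, bye!"))        # → bye
print(classify("How to get BS degree?"))  # → None
```

Because both sides pass through `normalize`, a phrase can be written naturally (with apostrophes or commas) and still match.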
