## Orange-bot
#### A quick-to-train, resource-light retrieval chatbot.
##### Easy to learn how to build, easier to implement. 

##### 0) Load data preprocessing libraries
###### (If you are getting 'can not be found' errors, copy the error message and paste it into a search engine. Stack Overflow will have your answer.)

In [2]:
# Pandas is a dataframe tool, the skeletal scaffolding of Orange. 
# NumPy is a math-related extension for Python; necessary for Orange's brain, which sees numbers where we see words.
import pandas as pd
import numpy as np

# scikit-learn is a Swiss Army knife for data scientists, with many tools in its folds.
# TfidfVectorizer turns text into numeric vectors so that Orange can see our words.
# Cosine similarity scores how alike two of those vectors are, letting Orange compare one piece of text with another.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

##### 1) Read the Data

In [25]:
# Using Pandas, import the Amazon Web Services FAQ and drop any rows with missing values
df = pd.read_csv("aws_faq.csv")
df.dropna(inplace=True)

##### 2) Teach Orange its vocabulary by fitting the vectorizer on every question and answer.

In [10]:
# Here is a grand union of NumPy, Pandas and scikit-learn; without them, these two lines of code would be hundreds in length.
vectorize = TfidfVectorizer()
vectorize.fit(np.concatenate((df.Question, df.Answer)))

TfidfVectorizer()
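
To see what fitting actually does, here is a minimal sketch on a tiny made-up corpus. The sentences and the names `toy_corpus`, `toy_vectorizer` and `toy_matrix` are assumptions for illustration only; they are not part of aws_faq.csv or of Orange itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-sentence corpus, standing in for the FAQ text.
toy_corpus = [
    "What is cloud storage?",
    "Cloud storage keeps files on remote servers.",
    "How do I reset my password?",
]

# Fitting learns the vocabulary: one column index per distinct lowercase token.
# With default settings, single-character tokens such as "I" are ignored.
toy_vectorizer = TfidfVectorizer()
toy_vectorizer.fit(toy_corpus)
vocabulary = toy_vectorizer.vocabulary_
print(sorted(vocabulary))

# Transforming produces a sparse matrix: one row per sentence, one column per token.
toy_matrix = toy_vectorizer.transform(toy_corpus)
print(toy_matrix.shape)
```

Each row of the matrix is the numeric view of one sentence, which is the only form of text Orange ever works with.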

##### 3) Turn the questions into vectors. 

In [14]:
question_to_vectors = vectorize.transform(df.Question)

##### 4) Build a basic interface where users can chat with Orange. 

In [27]:
print("Hello dear friend, what would you like to say? Say 'Bye' to leave me to my solitude. Please don't say 'Bye'...")
while True:
    
    # User input
    user_question = input()
    
    if user_question == 'Bye':
        print("Well... I guess this is it... Parting is such sweet sorrow. *sniff*")
        break

    # Orange turns the user's words into a vector, using the vocabulary learned in step 2
    input_question_vector = vectorize.transform([user_question])

    # Orange uses cosine similarity to score the user's words against every recorded question
    deduction = cosine_similarity(input_question_vector, question_to_vectors)

    # Orange decides which question-answer pair is most appropriate and conjures forth the answer. 
    best = np.argmax(deduction, axis=1)

    # Orange replies with the answer
    print("Orange: " + df.Answer.iloc[best].values[0])

Hello dear friend, what would you like to say? Say 'Bye' to leave me to my solitude. Please don't say 'Bye'...


 Bye


Well... I guess this is it... Parting is such sweet sorrow. *sniff*
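
The chat loop above needs keyboard input, so the lookup itself can be hard to inspect. As a rough sketch, the same vectorize, compare and pick steps can be wrapped in a function and tried on a two-row stand-in for the FAQ. The data, `toy_df` and the function name `answer_question` are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical two-row FAQ with the same column names as the notebook's dataframe.
toy_df = pd.DataFrame({
    "Question": ["What is cloud storage?", "How do I reset my password?"],
    "Answer": [
        "Cloud storage keeps your files on remote servers.",
        "Use the account settings page to reset your password.",
    ],
})

# Same recipe as steps 2 and 3: fit on all text, then vectorize the questions.
toy_vectorizer = TfidfVectorizer()
toy_vectorizer.fit(np.concatenate((toy_df.Question, toy_df.Answer)))
toy_question_vectors = toy_vectorizer.transform(toy_df.Question)


def answer_question(user_question):
    """Return the stored answer whose question is most similar to the input."""
    query_vector = toy_vectorizer.transform([user_question])
    scores = cosine_similarity(query_vector, toy_question_vectors)
    best = int(np.argmax(scores, axis=1)[0])
    return toy_df.Answer.iloc[best]


reply = answer_question("I forgot my password, how do I reset it?")
print("Orange: " + reply)
```

The input shares several words with the second stored question and none with the first, so the second answer is the one returned.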


###### Play around with the questions and answers. Every reply reads perfectly because a human wrote it, yet Orange can only repeat answers that already exist in its corpus, which sharply limits its responsiveness. This is a simple retrieval-based chatbot with a rather large pregenerated corpus provided by Amazon.
###### Open the .csv file and peruse the large volume of questions and answers. Imagine the amount of work required to create a useful corpus of your own to answer potential questions, and then to train a retrieval chatbot like Orange-bot on it, only to end up with this frustrating lack of responsiveness.
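
One way to see the limitation concretely is to ask something the corpus knows nothing about. The sketch below uses two made-up questions and, for brevity, fits the vectorizer on the questions alone; all names in it are assumptions for illustration. When no word of the input appears in the vocabulary, every similarity score is zero and `argmax` simply falls back to the first row.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stored questions, standing in for the FAQ.
toy_questions = ["What is cloud storage?", "How do I reset my password?"]
toy_vectorizer = TfidfVectorizer()
toy_question_vectors = toy_vectorizer.fit_transform(toy_questions)

# None of these words are in the learned vocabulary, so the vector is all zeros.
off_topic_vector = toy_vectorizer.transform(["Tell me a joke"])
scores = cosine_similarity(off_topic_vector, toy_question_vectors)
print(scores)

# With every score tied at zero, argmax returns index 0: the first stored question.
best_index = int(scores.argmax(axis=1)[0])
print(best_index)

# A bot built this way has no notion of "I don't know" unless one is added by hand.
has_real_match = bool(scores.max() > 0)
print(has_real_match)
```

In other words, an off-topic message still gets a confident-sounding answer; it is just the answer to an unrelated question.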




##### This lack of responsiveness is why 'generative chatbots' are the future, despite being far more experimental, difficult to create, and computationally expensive. That leads us to Grape-bot, the next bot you'll be introduced to. Until then, stay curious!