# Introduction

In this project, we apply Natural Language Processing (NLP) to flight data. Our goal is an interactive dialogue system that understands user input, returns relevant responses, and exposes Part-of-Speech (POS) tags for linguistic analysis.

To achieve this, we use NLTK, spaCy, and scikit-learn to build a dialogue analysis tool. The data at the core of this project comes from flight-related dialogues, which lets the system hold conversations about flight booking and related aviation topics.

The project goes beyond plain text matching. By tagging each response with its parts of speech, ConvoBot+ shows how grammatical structure shapes communication within the flight domain.

# Section 1: Import Libraries
I began by importing essential libraries, which lay the foundation for data analysis, natural language processing, and machine learning.

In [None]:
import nltk
import random
import string
import pandas as pd
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

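# Downloads every NLTK resource, which is large; only the tokenizer models and WordNet are used below
nltk.download('all')
# Small English spaCy pipeline, used for POS tagging
nlp = spacy.load('en_core_web_sm')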


# Section 2: Load and Pre-process Data
I loaded the first 1,000 training dialogues and split their turns into agent and customer statements, preparing them for further analysis.

In [2]:
from datasets import load_dataset

dataset = load_dataset("air_dialogue")
train_dialogues = dataset['train']['dialogue'][:1000]

agent_statements_list = []
customer_statements_list = []

for dialogue in train_dialogues:
    for statement in dialogue:
        if statement.startswith("agent:"):
            agent_statement = statement.replace("agent:", "").strip()
            agent_statements_list.append(agent_statement)
        elif statement.startswith("customer:"):
            customer_statement = statement.replace("customer:", "").strip()
            customer_statements_list.append(customer_statement)

agent_df = pd.DataFrame({'Agent Statements': agent_statements_list})
customer_df = pd.DataFrame({'Customer Statements': customer_statements_list})
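
The speaker split above can be tried on a toy dialogue without downloading the dataset. The sketch below assumes each turn is a string prefixed with `agent:` or `customer:`, as in the loop above; `split_dialogue` and the sample turns are hypothetical. It strips only the leading prefix, whereas `str.replace` would also delete any later occurrence of the label inside the sentence.

```python
def split_dialogue(turns):
    """Separate one dialogue's turns into agent and customer statements."""
    agent, customer = [], []
    for turn in turns:
        if turn.startswith("agent:"):
            # Slice off only the leading speaker label
            agent.append(turn[len("agent:"):].strip())
        elif turn.startswith("customer:"):
            customer.append(turn[len("customer:"):].strip())
    return agent, customer


# Hypothetical sample turns, not taken from the dataset
sample = [
    "customer: Hello.",
    "agent: Hello, how can I help you?",
    "customer: Please tell the agent: I need a refund.",
]
agent_turns, customer_turns = split_dialogue(sample)
print(agent_turns)
print(customer_turns)
```

Note that the two lists may end up with different lengths, so pairing them by position does not guarantee that a customer statement sits next to the agent reply it received.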


# Section 3: Tokenization and Lemmatization
To facilitate text processing, I broke down the statements into sentences and words, ensuring consistency through lemmatization.

In [3]:
agent_df.rename(columns={'Agent Statements': 'agent'}, inplace=True)
customer_df.rename(columns={'Customer Statements': 'customer'}, inplace=True)
# Column assignment aligns on row index, so agent and customer rows are paired by position
agent_df['customer'] = customer_df['customer']
df1 = agent_df

df1['agent'] = df1['agent'].str.lower()
raw_doc = agent_df['agent']

sentence_tokens = []
for doc in raw_doc:
    sentences = nltk.sent_tokenize(doc)
    sentence_tokens.extend(sentences)

word_tokens = []
for sentence in sentence_tokens:
    words = nltk.word_tokenize(sentence)
    word_tokens.extend(words)

lemmer = nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]


# Section 4: Token Normalization
By removing punctuation and converting text to lowercase, I standardized tokens for a more uniform analysis.

In [4]:
remove_punc_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punc_dict)))
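
To see what the punctuation table does on its own, here is a minimal sketch using only the standard library. `strip_punct_lower` is a hypothetical helper that applies the same `translate` step as `LemNormalize` but splits on whitespace instead of calling NLTK, so it shows the normalization without tokenization or lemmatization.

```python
import string

# Map every ASCII punctuation code point to None so translate() deletes it
remove_punc_dict = {ord(ch): None for ch in string.punctuation}


def strip_punct_lower(text):
    """Lowercase text, delete ASCII punctuation, split on whitespace."""
    return text.lower().translate(remove_punc_dict).split()


tokens = strip_punct_lower("Hello, I'd like a ticket to New-York!")
print(tokens)
```

Because apostrophes and hyphens are deleted rather than replaced with spaces, `I'd` becomes `id` and `New-York` becomes `newyork`. That is worth keeping in mind when reading the TF-IDF vocabulary later.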


# Section 5: Greeting Functions
I created functions to identify and respond to user greetings, adding a touch of personalized interaction.

In [5]:
greet_inputs = ('hello', 'hi', 'whassup', 'how are you?', 'how are you')
greet_responses = ('hi', 'Hey', 'Hey there!', 'Hello there!!')

def greet(sentence):
    for word in sentence.split():
        if word.lower() in greet_inputs:
            return random.choice(greet_responses)
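
Since `greet` compares one whitespace-separated word at a time, the multi-word entries in `greet_inputs` (such as `'how are you'`) can never match, and a greeting with attached punctuation (such as `hello!`) is missed as well. The following is one possible variant, not the notebook's own code: `detect_greeting`, `greet_words`, and `greet_phrases` are hypothetical names, and the trigger lists are adapted from `greet_inputs`.

```python
import random
import string

# Hypothetical single-word and phrase triggers, adapted from greet_inputs
greet_words = {"hello", "hi", "whassup"}
greet_phrases = ("how are you",)
greet_responses = ("Hi", "Hey", "Hey there!", "Hello there!")

# Translation table that deletes ASCII punctuation
_strip_punct = str.maketrans("", "", string.punctuation)


def detect_greeting(sentence):
    """Return a random greeting if the sentence contains a trigger, else None."""
    words = sentence.lower().translate(_strip_punct).split()
    if any(word in greet_words for word in words):
        return random.choice(greet_responses)
    # Pad with spaces so phrases match whole words only
    padded = " " + " ".join(words) + " "
    if any(" " + phrase + " " in padded for phrase in greet_phrases):
        return random.choice(greet_responses)
    return None


print(detect_greeting("Hello!"))
print(detect_greeting("how are you?"))
print(detect_greeting("I need a ticket"))
```

Matching on whole words keeps `hi` from firing inside unrelated words such as `this`, which a plain substring test would allow.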


# Section 6: Response Function
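The response function vectorizes the stored agent sentences together with the user's input using TF-IDF, then returns the stored sentence with the highest cosine similarity to the input, along with that sentence's POS tags.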

In [6]:
def response(user_response):
    # Assumes the caller has already appended user_response to sentence_tokens,
    # so the last TF-IDF row is the user's query.
    tfidf_vec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = tfidf_vec.fit_transform(sentence_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The top match is the query itself, so take the second-highest score
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]

    if req_tfidf == 0:
        # No stored sentence shares any weighted term with the query
        return "I am sorry. Unable to understand you!", None

    bot_response = sentence_tokens[idx]
    doc = nlp(bot_response)
    pos_tags = [(token.text, token.pos_) for token in doc]
    return bot_response, pos_tags
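
To make the retrieval step concrete, here is a small pure-Python sketch of TF-IDF weighting and cosine similarity on a toy corpus. It is an illustration under stated assumptions, not the notebook's code: `tfidf_vectors`, `cosine`, and the corpus are hypothetical, tokens are split on whitespace, and lemmatization and stop-word removal are skipped. The smoothed IDF and L2 normalization follow scikit-learn's documented defaults. It also shows why `response` reads index `[-2]`: the query is the last row and always matches itself with a score of 1.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


# Hypothetical corpus; the last entry plays the role of the appended user query
corpus = [
    "sure i will check it",
    "what can i do for you today",
    "no flights found",
    "any flights today",
]
vectors = tfidf_vectors([sentence.split() for sentence in corpus])
scores = [cosine(vectors[-1], vec) for vec in vectors]

# The query scores 1.0 against itself, so rank only the stored sentences
best = max(range(len(corpus) - 1), key=lambda i: scores[i])
print(corpus[best])
```

Here the query shares one term with each of two stored sentences, and the shorter sentence wins because L2 normalization gives its matching term a larger share of the vector. The same effect helps explain why short agent statements tend to surface often in the conversation output.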


# Section 7: Main Loop for Conversations
A central loop orchestrated user-bot interactions, encompassing greetings, responses, and farewells.

In [10]:
print('Hello, I am the Blitz Bot.')
while True:
    user_input = input('Choose an option:\n1. Conversation\n2. POS Tagging\n3. Exit\nYour choice: ')
    if user_input == '1':
        flag = True
        print('What would you like to ask?')
        while flag:
            user_response = input('User: ')
            user_response = user_response.lower()

            if user_response != 'bye':
                if user_response == 'thank you' or user_response == 'thanks':
                    flag = False
                    print('Bot: You are welcome!')
                else:
                    # Call greet() once so the check and the printed reply use the same result
                    greeting = greet(user_response)
                    if greeting is not None:
                        print('Bot: ' + greeting)
                    else:
                        sentence_tokens.append(user_response)
                        word_tokens = word_tokens + nltk.word_tokenize(user_response)
                        final_words = list(set(word_tokens))

                        bot_response, pos_tags = response(user_response)
                        print('Bot:', bot_response)
                        sentence_tokens.remove(user_response)
            else:
                flag = False
                print('Bot: Goodbye!')
                
    elif user_input == '2':
        sentence = input('Enter a sentence for POS tagging: ')
        doc = nlp(sentence)
        pos_tags = [(token.text, token.pos_) for token in doc]
        print('POS tags:', pos_tags)
    elif user_input == '3':
        print('Goodbye!')
        break
    else:
        print('Invalid input. Please choose a valid option.')

Hello, I am the Blitz Bot.
Choose an option:
1. Conversation
2. POS Tagging
3. Exit
Your choice: 1
What would you like to ask?
User: hello
Bot: Hey
User: my name is edward
Bot: hello edward williams.
User: how are you today
Bot: what can i do for you today?
User: i need a ticket
Bot: to cancel your ticket, we need your name.
User: smith
Bot: hello smith, how can i help you for today?
User: any flight for today
Bot: what can i do for you today?
User: any flight?
Bot: no flights were found by your name.
User: check again
Bot: sure, i will check it.
User: thanks
Bot: You are welcome!
Choose an option:
1. Conversation
2. POS Tagging
3. Exit
Your choice: 2
Enter a sentence for POS tagging: Amazon is own by united state goverment
POS tags: [('Amazon', 'PROPN'), ('is', 'AUX'), ('own', 'ADJ'), ('by', 'ADP'), ('united', 'PROPN'), ('state', 'PROPN'), ('goverment', 'NOUN')]
Choose an option:
1. Conversation
2. POS Tagging
3. Exit
Your choice: 2
Enter a sentence for POS tagging: my name is oghenef

# Limitations
One limitation of this project is that it can only return sentences that already exist in the dataset. Because a response is chosen by TF-IDF similarity to stored agent statements, an input that shares few words with the dataset yields a weak or irrelevant match. In the conversation above, for example, "i need a ticket" retrieved "to cancel your ticket, we need your name."

Additionally, the system cannot generate new text. It selects the most similar existing sentence and has no means of composing an original reply or tracking context across turns. This restricts its adaptability to novel or unanticipated scenarios and could hinder its performance in dynamic conversations.