# Meteor ChatBot

Welcome! This notebook walks you through building a simple chatbot using Python. It’s called **Meteor Bot**, and it can respond to basic greetings, questions about Python, and even look up answers from a text corpus using natural language processing.

We’ll be using tools like **NLTK** for language processing and **scikit-learn** for comparing the meaning of text with TF-IDF and cosine similarity.



## Imports and Warnings
We import necessary libraries including:
- `nltk` for NLP tasks
- `sklearn` for TF-IDF and similarity
- `warnings` to suppress warnings
- `random`, `string`, and `numpy` for helper operations


In [1]:

import nltk
import warnings
import numpy as np
import random
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Suppress warnings
warnings.filterwarnings("ignore")


## NLTK Data
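We download the `punkt` tokenizer models for sentence splitting and the `wordnet` corpus, which the WordNet lemmatizer uses to reduce words to their base forms. If a later cell raises a `LookupError` naming a missing resource (newer NLTK releases may ask for `punkt_tab`, for example), download that resource the same way.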


In [2]:

nltk.download('punkt')
nltk.download('wordnet')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\nayzak\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\nayzak\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

## Load and Tokenize Text
- Two external text files (`answer.txt`, `chatbot.txt`) are loaded.
- The content is lowercased and tokenized into sentences.
- We set up a lemmatizer and a normalization function to process user input consistently by removing punctuation and reducing words to their base form.


In [3]:

# Load and preprocess text files
with open('answer.txt', 'r', errors='ignore') as f:
    raw = f.read().lower()
with open('chatbot.txt', 'r', errors='ignore') as m:
    rawone = m.read().lower()

# Tokenize
sent_tokens = nltk.sent_tokenize(raw)
sent_tokensone = nltk.sent_tokenize(rawone)

# Lemmatization setup
lemmer = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
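
To see what the punctuation-stripping part of `LemNormalize` does in isolation, here is a minimal sketch that uses only the standard library. It assumes a plain whitespace split as a stand-in for `nltk.word_tokenize` and skips lemmatization entirely; the name `strip_and_split` is ours and not part of the notebook.

```python
import string

# Same translation table as the notebook: map every punctuation
# character's code point to None so str.translate() deletes it
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def strip_and_split(text):
    """Lowercase, delete punctuation, and split on whitespace.

    Hypothetical helper: a simplified stand-in for LemNormalize that
    uses str.split() instead of nltk.word_tokenize and does not lemmatize.
    """
    return text.lower().translate(remove_punct_dict).split()

print(strip_and_split("What's up, Meteor Bot?!"))
# ['whats', 'up', 'meteor', 'bot']
```

Note that the apostrophe is deleted rather than replaced with a space, so "what's" becomes "whats" instead of splitting into two tokens.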


## Predefined Responses and Categories
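We define the static data the bot matches against:
- Trigger phrases and canned responses for greetings, introductions, and basic questions.
- The helper functions that use this data to match specific queries, like greetings or "what is Python?", are defined in the next code cell.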


In [4]:

Introduce_Ans = [
    "My name is Meteor Bot.",
    "You can call me Meteor Bot or B.O.T.",
    "I'm Meteor Bot, happy to chat!",
]
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "hello", "hi there", "hello there"]
Basic_Q = ("what is python", "what is python?")
Basic_Ans = "Python is a high-level, interpreted programming language."
Basic_Om = (
    "what is module", "what is module?", "what is module in python", "what is module in python?"
)
Basic_AnsM = [
    "A module is a file containing Python code, like functions and classes.",
    "Modules help organize and reuse code.",
    "Think of a module as a toolbox for Python functions."
]
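
Because `Basic_Q` and `Basic_Om` are matched by exact membership, every spelling variant has to be listed, which is why each question appears both with and without a question mark. One possible alternative, sketched here under the assumption that trailing punctuation and extra spaces never change the meaning of a question, is to normalize the input before the lookup. The names `BASIC_Q_NORMALIZED`, `normalize_question`, and `is_basic_question` are hypothetical and used only for illustration.

```python
import string

# Hypothetical, de-duplicated version of Basic_Q: one entry per question
BASIC_Q_NORMALIZED = ("what is python",)

def normalize_question(text):
    """Lowercase, collapse runs of whitespace, and drop trailing punctuation."""
    collapsed = " ".join(text.lower().split())
    return collapsed.rstrip(string.punctuation + " ")

def is_basic_question(text):
    # Exact membership test, but on the normalized form of the input
    return normalize_question(text) in BASIC_Q_NORMALIZED

print(is_basic_question("What is  Python ?"))   # True
print(is_basic_question("what is pythonic"))    # False
```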


## Generate a Response using TF-IDF
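This cell first defines the helper functions that match predefined queries (`greeting`, `basic`, `basicM`, `IntroduceMe`). When the user input doesn't match any of them, `generate_response` falls back to retrieval:
- We use TF-IDF to vectorize the input together with the text corpus.
- Cosine similarity is calculated between the input and each corpus sentence.
- The sentence with the highest similarity is returned, provided that similarity is greater than zero.
- Otherwise, a fallback message is shown.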


In [5]:

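def greeting(sentence):
    text = sentence.lower()
    # Strip surrounding punctuation so "hi!" and "hello," still match
    words = [word.strip(string.punctuation) for word in text.split()]
    for phrase in GREETING_INPUTS:
        # Multi-word greetings such as "what's up" never equal a single word,
        # so they are searched for in the whole sentence instead
        if phrase in words or (" " in phrase and phrase in text):
            return random.choice(GREETING_RESPONSES)
    return None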

def basic(sentence):
    if sentence.lower() in Basic_Q:
        return Basic_Ans

def basicM(sentence):
    if sentence.lower() in Basic_Om:
        return random.choice(Basic_AnsM)

def IntroduceMe(sentence):
    return random.choice(Introduce_Ans)

def generate_response(user_response, corpus):
    fallback = "I'm sorry, I didn't understand that."
    if not corpus:
        return fallback
    # Fit on the corpus plus the query so both share one vocabulary;
    # building a new list leaves the caller's corpus untouched
    vectorizer = TfidfVectorizer(tokenizer=LemNormalize, stop_words=None)
    tfidf = vectorizer.fit_transform(corpus + [user_response])
    # Compare the query (last row) against the corpus rows only, so the
    # query can never be returned as its own best match
    vals = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
    idx = vals.argmax()
    # A best score of zero means the query shares no terms with the corpus
    if vals[idx] == 0:
        return fallback
    return corpus[idx]
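
To make the retrieval step in `generate_response` more concrete, the following sketch computes TF-IDF vectors and cosine similarity by hand, using only the standard library. It is intended to mirror scikit-learn's documented defaults (raw term counts, smoothed IDF, L2 normalization), but it assumes a plain whitespace tokenizer instead of `LemNormalize`, so its scores will not match the notebook's exactly. The function names and the two-sentence corpus are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many docs each term appears
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        # Leave all-zero vectors (empty docs) as they are
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors

def cosine(a, b):
    # Vectors are unit length (or all zero), so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))

# Hypothetical two-sentence corpus
corpus = [
    "python is an interpreted language",
    "a module is a file containing python code",
]
query = "what is a module"

vectors = tfidf_vectors(corpus + [query])
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(corpus[best])
```

The query shares only "is" with the first sentence but "is", "a", and "module" with the second, so the second sentence scores higher and is the one printed.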


## Chat Interface
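The main `chat()` function routes the user input, checking each rule in order:
- If it's a known phrase like "bye", "thanks", or "how are you", it responds directly.
- If it contains a greeting, asks for the bot's name, or matches a predefined question, it returns the corresponding canned response.
- If it contains the word "module", it searches a different corpus (`chatbot.txt`).
- For everything else, it attempts a TF-IDF match from `answer.txt`.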


In [6]:

def chat(user_response):
    # Normalize case and surrounding whitespace so "Bye " still matches
    user_response = user_response.lower().strip()

    if user_response == 'bye':
        return "Bye! Take care."

    if user_response in ['thanks', 'thank you', 'ty', 'thx']:
        return "You're welcome."

    if user_response in ["how are you", "how r u", "how're you", "how are ya", "how's it going", "how's everything"]:
        return "I'm fine, thank you for asking!"

    # Call each helper once and reuse its result
    reply = greeting(user_response)
    if reply:
        return reply

    if "your name" in user_response:
        return IntroduceMe(user_response)

    reply = basic(user_response) or basicM(user_response)
    if reply:
        return reply

    # Questions that mention modules are answered from chatbot.txt
    if "module" in user_response:
        return generate_response(user_response, sent_tokensone)

    # Everything else is answered from answer.txt
    return generate_response(user_response, sent_tokens)
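
The order of the checks in `chat()` matters: the first rule that matches wins, so the canned answer for "what is module" has to be tested before the broader "module" keyword check. Here is a small sketch of that first-match-wins routing, with stub rules standing in for the real helpers. The names `route`, `RULES`, and `fallback` are hypothetical, and the returned strings are placeholders rather than real bot output.

```python
def route(text, rules, fallback):
    """Return the reply of the first rule whose predicate accepts text."""
    for matches, reply in rules:
        if matches(text):
            return reply(text)
    return fallback(text)

# Stub rules standing in for basicM() and the "module" keyword check
RULES = [
    (lambda t: t in ("what is module", "what is module?"),
     lambda t: "canned module answer"),
    (lambda t: "module" in t,
     lambda t: "corpus search: chatbot.txt"),
]

def fallback(text):
    # Stands in for the final TF-IDF lookup in answer.txt
    return "corpus search: answer.txt"

print(route("what is module", RULES, fallback))            # canned module answer
print(route("how do i import a module", RULES, fallback))  # corpus search: chatbot.txt
print(route("what is a list", RULES, fallback))            # corpus search: answer.txt
```

Swapping the two entries in `RULES` would make the keyword rule shadow the canned answer, which is the situation the ordering in `chat()` avoids.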


In [7]:

# Example: Type a message to chat with the bot
# chat("hi")
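
To try several messages in one go, a small driver like the one below may help. It is only a sketch: `run_session` and `fake_chat` are hypothetical names, and `fake_chat` is a stand-in so the example runs without `answer.txt` and `chatbot.txt`.

```python
def run_session(messages, chat_fn):
    """Feed messages to chat_fn in order and stop after the user says bye."""
    transcript = []
    for message in messages:
        transcript.append((message, chat_fn(message)))
        if message.lower().strip() == "bye":
            break
    return transcript

def fake_chat(message):
    # Hypothetical stand-in for chat() so this sketch needs no corpus files
    if message.lower().strip() == "bye":
        return "Bye! Take care."
    return "echo: " + message

for user, bot in run_session(["hi", "bye", "ignored"], fake_chat):
    print(f"You: {user}")
    print(f"Meteor Bot: {bot}")
```

In the notebook, passing `chat` instead of `fake_chat` should exercise the real bot, and reading each message with `input()` instead of taking it from a list would turn this into an interactive loop.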
