# LogRhythm Chatbot

This script implements a chatbot using Word2Vec embeddings and TF-IDF vectors to provide relevant information based on user input. The overall process includes:

1. **Importing Necessary Libraries**: Load the libraries needed for data handling, embeddings, and similarity scoring.
2. **Downloading NLTK Resources**: Fetch the tokenizer data that `word_tokenize` depends on.
3. **Loading Models and Data**: Load the trained Word2Vec model, the fitted TF-IDF vectorizer, the cleaned data with categories, and the precomputed combined embeddings.
4. **Defining Categories and Keywords**: Map each category to the keywords used to classify user input.
5. **Determining Category**: Assign user input to the first category whose keywords appear in it.
6. **Generating Combined Embedding**: Concatenate the mean Word2Vec vector of a text with its TF-IDF vector.
7. **Finding Most Relevant Document**: Rank the documents in the matched category by cosine similarity to the input.
8. **Generating Chatbot Response**: Return the best-matching document's text, or an apology if nothing is similar enough.
9. **Main Chatbot Interaction Loop**: Read queries and print responses until the user types `quit`.

### Import Libraries and Resources

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity
import joblib
from nltk.tokenize import word_tokenize

# Downloading necessary NLTK resources
import nltk
nltk.download('punkt')
# Recent NLTK releases load the tokenizer data from 'punkt_tab' instead
nltk.download('punkt_tab')

### Load CSV and Trained Model Data

In [None]:
# Loading the trained Word2Vec model and TF-IDF vectorizer
word2vec_model = Word2Vec.load("word2vec_model.bin")
tfidf_vectorizer = joblib.load("tfidf_model.pkl")

# Loading the cleaned data with categories
df = pd.read_csv('cleaned_section_data_with_categories.csv')

# Loading the combined embeddings
combined_embeddings = np.load('combined_features.npy')
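
The retrieval code later indexes `combined_embeddings` by the row labels of `df`, so the two need to line up row for row. A minimal sanity check, assuming the array holds one row per CSV row in the same order (the helper name `check_alignment` is hypothetical and not part of the original notebook):

```python
import numpy as np

def check_alignment(n_documents, embeddings):
    """Raise if the embedding matrix cannot be indexed by document position."""
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {embeddings.ndim} dimension(s)")
    if embeddings.shape[0] != n_documents:
        raise ValueError(
            f"{n_documents} documents but {embeddings.shape[0]} embedding rows"
        )
    return embeddings.shape[1]  # dimensionality of each combined vector

# Toy stand-in for the real data: 3 documents, 5-dimensional vectors
toy_embeddings = np.zeros((3, 5))
embedding_dim = check_alignment(3, toy_embeddings)
```

In the notebook itself this would be called as `check_alignment(len(df), combined_embeddings)` right after loading.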

### Define Categories with Associated Keywords

In [None]:
# Defining categories and their associated keywords
categories = {
    'Installation & Setup': [
        'install', 'setup', 'implementation', 'deployment', 'configure', 'initialization', 
        'installing', 'deploy', 'configuration', 'set-up', 'initiate', 'launch', 'activate',
        'how to install', 'setting up', 'installation guide', 'deploying', 'configuring'
    ],
    'Maintenance & Management': [
        'maintain', 'maintenance', 'servicing', 'management', 'optimization', 'service', 
        'manage', 'routine check', 'system upkeep', 'system care', 'upkeep', 'tune-up',
        'maintaining', 'managing', 'service routine', 'optimizing', 'how to maintain'
    ],
    'Troubleshooting & Support': [
        'troubleshoot', 'error', 'issue', 'problem', 'diagnosis', 'resolution', 'fix', 
        'solve', 'rectify', 'repair', 'resolve', 'correct', 'debug', 'fault finding',
        'troubleshooting', 'solving issues', 'fixing errors', 'diagnosing problems', 'resolving'
    ],
    'Upgrades & Updates': [
        'upgrade', 'update', 'new version', 'patch', 'release', 'enhancement', 'updating', 
        'upgrading', 'version upgrade', 'system update', 'software update', 'patching',
        'how to upgrade', 'applying updates', 'version updating', 'software enhancement'
    ],
    'General Information & Overview': [
        'overview', 'introduction', 'info', 'summary', 'guide', 'documentation', 
        'information', 'details', 'background', 'basics', 'general data', 'key points',
        'what is', 'explain', 'description of', 'details about'
    ],
    'Security & Monitoring': [
        'surveillance', 'log management', 'event tracking', 'real-time analysis', 
        'security watch', 'monitoring', 'security check', 'system monitoring', 'network watch',
        'security overview', 'monitoring setup', 'event tracking system'
    ],
    'Threat Detection & Analysis': [
        'threat detection', 'anomaly detection', 'intrusion detection', 'threat intelligence', 
        'security alerts', 'risk detection', 'threat identification', 'vulnerability detection', 
        'security threat detection', 'analyzing threats', 'identifying risks', 'detecting anomalies'
    ],
    'Incident Response & Management': [
        'incident response', 'incident management', 'forensics', 'mitigation', 'recovery', 
        'incident handling', 'crisis management', 'incident analysis', 'emergency response',
        'responding to incidents', 'managing incidents', 'incident recovery'
    ],
    'Compliance & Auditing': [
        'compliance', 'regulatory compliance', 'audit', 'reporting', 'policy enforcement', 
        'regulation management', 'compliance tracking', 'legal compliance', 'audit management',
        'compliance policies', 'auditing processes', 'regulatory reporting'
    ],
    'Integration & Compatibility': [
        'integration', 'compatibility', 'third-party integration', 'API', 'interoperability', 
        'system merging', 'software integration', 'data integration', 'platform integration',
        'integrating systems', 'API usage', 'compatibility issues'
    ],
    'Network Security & Protection': [
        'network security', 'firewall', 'traffic analysis', 'intrusion prevention', 
        'network protection', 'cybersecurity', 'network defense', 'network safeguard',
        'protecting networks', 'network firewalls', 'cybersecurity measures'
    ]
}

## Define Functions for the Chatbot

The functions used by the chatbot are defined in this section.

### Determining Category

This function infers the intent behind the user's input by matching it against the category keywords.

In [None]:
# Function to determine the category of user input based on keywords
def determine_category(user_input):
    """
    Determine the category of user input based on keywords.

    Args:
    - user_input (str): User input text.

    Returns:
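    - str: First category with a keyword found in the input, or 'Other' if none match.
    """
    text = user_input.lower()  # lowercase once so matching is case-insensitive
    for category, keywords in categories.items():
        # Lowercase the keywords too; otherwise 'API' can never match the lowercased input
        if any(keyword.lower() in text for keyword in keywords):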
            return category
    return 'Other'
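
Two properties of this matcher are worth keeping in mind: keywords are matched as substrings, and the first category in dictionary order wins. The self-contained sketch below uses a trimmed-down, hypothetical `toy_categories` dictionary (not the full table above) to show both effects:

```python
# Hypothetical, shortened category table for illustration only
toy_categories = {
    'Installation & Setup': ['install', 'setup'],
    'Troubleshooting & Support': ['error', 'fix'],
    'General Information & Overview': ['info', 'what is'],
}

def toy_determine_category(user_input, categories=toy_categories):
    """Return the first category with a keyword contained in the input."""
    text = user_input.lower()
    for category, keywords in categories.items():
        if any(keyword.lower() in text for keyword in keywords):
            return category
    return 'Other'

# Both 'error' and 'install' occur; the category listed first is returned
first_match = toy_determine_category("I get an error during install")

# 'fix' is found inside 'prefix', although the query is not about fixing anything
substring_match = toy_determine_category("prefix length")

# No keyword occurs at all
no_match = toy_determine_category("hello there")
```

Here `first_match` is `'Installation & Setup'`, `substring_match` is `'Troubleshooting & Support'`, and `no_match` is `'Other'`. If substring hits like the second one turn out to be a problem, matching on whole tokens would be one option to consider.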

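### Generating Combined Embedding

The embedding concatenates the mean Word2Vec vector of the input's tokens with the input's TF-IDF vector. If none of the tokens are in the Word2Vec vocabulary, the Word2Vec part is all zeros.
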
In [None]:
# Function to generate a combined embedding for a given text
def get_combined_embedding(text):
    """
    Generate a combined embedding for a given text using Word2Vec and TF-IDF.

    Args:
    - text (str): Input text.

    Returns:
    - np.ndarray: Combined embedding vector.
    """
    words = word_tokenize(text)
    word_embeddings = [word2vec_model.wv[word] for word in words if word in word2vec_model.wv]
    w2v_embedding = np.mean(word_embeddings, axis=0) if word_embeddings else np.zeros(word2vec_model.vector_size)
    tfidf_embedding = tfidf_vectorizer.transform([text]).toarray()[0]
    return np.hstack((w2v_embedding, tfidf_embedding))
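
To make the shape of the result concrete, the sketch below reproduces the same averaging and concatenation steps with NumPy alone. The word vectors and TF-IDF weights are made-up toy values standing in for `word2vec_model.wv` and `tfidf_vectorizer`; the point is only that the output length is the Word2Vec dimension plus the TF-IDF vocabulary size, which has to equal the row length of `combined_embeddings` for the cosine similarity to be computable.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors (stand-in for word2vec_model.wv)
toy_word_vectors = {
    "install": np.array([1.0, 0.0, 2.0]),
    "agent": np.array([3.0, 2.0, 0.0]),
}
VECTOR_SIZE = 3

def toy_combined_embedding(tokens, tfidf_weights):
    """Average the known word vectors, then append the TF-IDF weights."""
    known = [toy_word_vectors[t] for t in tokens if t in toy_word_vectors]
    w2v_part = np.mean(known, axis=0) if known else np.zeros(VECTOR_SIZE)
    return np.hstack((w2v_part, tfidf_weights))

# Made-up TF-IDF row over a 4-term vocabulary
toy_tfidf = np.array([0.6, 0.8, 0.0, 0.0])

# 'the' has no word vector and is skipped in the average
embedding = toy_combined_embedding(["install", "the", "agent"], toy_tfidf)

# No known tokens: the Word2Vec part falls back to zeros
unknown_only = toy_combined_embedding(["xyz"], np.zeros(4))
```

`embedding` has length 7: the first three entries are the mean of the two known word vectors, and the last four are the TF-IDF weights.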

### Finding Most Relevant Document

In [None]:
# Function to find the most relevant document
def find_most_relevant_document(input_text, filtered_df):
    """
    Find the most relevant document based on user input.

    Args:
    - input_text (str): User input text.
    - filtered_df (DataFrame): Filtered DataFrame containing relevant documents.

    Returns:
    - int: Index of the most relevant document.
    - float: Cosine similarity score with the most relevant document.
    """
    input_embedding = get_combined_embedding(input_text)
    max_similarity = 0
    most_similar_doc_index = None
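    # df keeps the default RangeIndex from read_csv, so each row label is also
    # the row's position in combined_embeddings.
    for index in filtered_df.index:
        doc_embedding = combined_embeddings[index]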
        similarity = cosine_similarity([input_embedding], [doc_embedding])[0][0]
        if similarity > max_similarity:
            max_similarity = similarity
            most_similar_doc_index = index
    return most_similar_doc_index, max_similarity
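
The row-by-row loop is easy to follow but calls `cosine_similarity` once per document. As one possible alternative (not the notebook's method), all similarities can be computed in a single NumPy expression. The function name `most_similar_row` and the toy matrix are illustrative assumptions. Zero-norm rows are given a similarity of 0 to avoid division by zero, and, like the loop above, the function returns `None` when no similarity is positive.

```python
import numpy as np

def most_similar_row(query, doc_matrix, candidate_rows):
    """Return (row, cosine similarity) of the best candidate, or (None, 0.0)."""
    candidate_rows = np.asarray(candidate_rows, dtype=int)
    if candidate_rows.size == 0:
        return None, 0.0
    docs = doc_matrix[candidate_rows]
    dots = docs @ query
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    # Divide only where the norm product is non-zero; other entries stay 0
    sims = np.divide(dots, norms, out=np.zeros_like(dots, dtype=float), where=norms > 0)
    best = int(np.argmax(sims))
    if sims[best] <= 0:
        return None, 0.0
    return int(candidate_rows[best]), float(sims[best])

# Toy data: four 2-dimensional "documents", the last one all zeros
toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
query = np.array([1.0, 0.0])

# Search only rows 1, 2 and 3, as a category filter would
row, score = most_similar_row(query, toy_docs, [1, 2, 3])
```

In the notebook this could be called as `most_similar_row(input_embedding, combined_embeddings, filtered_df.index)`, again assuming the row labels of `df` are also row positions.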

### Generating Chatbot Response

In [None]:
# Function to generate chatbot response
def generate_chatbot_response(user_input):
    """
    Generate a response from the chatbot based on user input.

    Args:
    - user_input (str): User input text.

    Returns:
    - str: Chatbot response.
    """
    user_category = determine_category(user_input)
    filtered_df = df[df['Category'] == user_category]
    doc_index, similarity = find_most_relevant_document(user_input, filtered_df)
    
    if similarity > 0.15 and doc_index is not None:
        full_text = df.iloc[doc_index]['content']  # Fetching the unprocessed text
        return full_text
    else:
        return "I'm sorry, I couldn't find relevant information based on your query."
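
When no keyword matches, `determine_category` returns `'Other'`. If no rows in `df` carry that label, the filter leaves nothing to search and the apology is always returned. One possible refinement, not part of the original notebook, is to fall back to the whole corpus in that case. The sketch uses a plain list of category labels in place of `df['Category']`, and `candidate_rows` is a hypothetical helper name:

```python
def candidate_rows(doc_categories, user_category):
    """Return row positions to search; fall back to all rows if the filter is empty."""
    matches = [i for i, c in enumerate(doc_categories) if c == user_category]
    return matches if matches else list(range(len(doc_categories)))

# Toy stand-in for df['Category']
toy_doc_categories = [
    'Installation & Setup',
    'Compliance & Auditing',
    'Installation & Setup',
]

filtered = candidate_rows(toy_doc_categories, 'Installation & Setup')  # rows 0 and 2
fallback = candidate_rows(toy_doc_categories, 'Other')                 # every row
```

The similarity threshold still applies after the fallback, so an unrelated query would continue to get the apology rather than an arbitrary document.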

### Main Chatbot Interaction Loop

In [None]:
# Main chatbot interaction loop
print("LogRhythm Chatbot — Type 'quit' to exit.")
while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "quit":
        print("\033[1mLogRhythm Chatbot: Goodbye!\033[0m")
        break
    response = generate_chatbot_response(user_input)
    print("\033[1mLogRhythm Chatbot:\033[0m", response)
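
The loop above stops only on `quit`; closing the input stream or pressing Ctrl+C raises an exception instead. A hedged sketch of a variant that handles those cases and skips empty lines follows. The function `run_chat` and its `read`/`write` parameters are assumptions introduced here so the loop can be driven by a scripted session, and the responder is a placeholder for `generate_chatbot_response`.

```python
def run_chat(respond, read=input, write=print):
    """Answer queries until 'quit', end of input, or Ctrl+C; return the number answered."""
    answered = 0
    while True:
        try:
            user_input = read("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            write("Goodbye!")
            break
        if user_input.lower() == "quit":
            write("Goodbye!")
            break
        if not user_input:
            continue  # ignore empty lines instead of querying the model
        write(respond(user_input))
        answered += 1
    return answered

# Scripted session: one real query, one empty line, then quit
scripted = iter(["how to install", "", "QUIT"])
outputs = []
count = run_chat(
    lambda query: f"echo: {query}",      # placeholder responder
    read=lambda prompt: next(scripted),  # replaces input()
    write=outputs.append,                # replaces print()
)
```

In the notebook the interactive equivalent would be `run_chat(generate_chatbot_response)`.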
