# Kefir Q&A Bot

This notebook implements a simple retrieval-based question-answering bot about kefir. It ranks paragraphs from a plain-text knowledge base by TF-IDF cosine similarity to the question and returns the best match.

---
## 1. Setup and Dependencies

First, let's import all necessary libraries and set up our environment.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import re

# Download required NLTK data (run this cell once)
nltk.download('punkt')
nltk.download('stopwords')

---
## 2. Data Loading and Preprocessing

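In this section, we'll load the kefir knowledge base from a text file, split it into paragraphs, and normalize the text (lowercasing, punctuation removal) so that questions and knowledge segments are matched consistently.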

In [None]:
def load_kefir_knowledge():
    """
    Load kefir knowledge from external file.
    Returns: List of text segments about kefir.
    """
    try:
        with open('kefir_knowledge.txt', 'r', encoding='utf-8') as file:
            content = file.read()
            # Split content into paragraphs, assuming paragraphs are separated by double newlines
            paragraphs = [p.strip() for p in content.split('\n\n') if p.strip()]
        return paragraphs
    except FileNotFoundError:
        print("Warning: 'kefir_knowledge.txt' not found. Please create this file with kefir information.")
        return []

def preprocess_text(text):
    """
    Preprocess text by removing special characters and converting to lowercase.
    Args:
        text (str): Input text.
    Returns:
        str: Preprocessed text.
    """
    # Convert to lowercase
    text = text.lower()
    # Remove special characters (keep alphanumeric and whitespace)
    text = re.sub(r'[^\w\s]', '', text)
    return text

# Load and preprocess kefir knowledge
kefir_knowledge = load_kefir_knowledge()
processed_knowledge = [preprocess_text(text) for text in kefir_knowledge]

print(f"Loaded {len(kefir_knowledge)} knowledge segments.")
if not kefir_knowledge:
    print("Please create 'kefir_knowledge.txt' with information about kefir to use the bot.")
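
To make the two steps above concrete, here is a minimal sketch that applies the same splitting and normalization rules to an in-memory string instead of a file. The sample text and the variable names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical in-memory stand-in for the contents of a knowledge file
sample_content = "Kefir is a fermented milk drink.\n\n\n\nIt's made with kefir grains!\n\n"

# Same splitting rule as the loader: paragraphs separated by blank lines, empty pieces dropped
paragraphs = [p.strip() for p in sample_content.split("\n\n") if p.strip()]

# Same normalization rule: lowercase, then drop anything that is not a word character or whitespace
normalized = [re.sub(r"[^\w\s]", "", p.lower()) for p in paragraphs]

print(paragraphs)  # ['Kefir is a fermented milk drink.', "It's made with kefir grains!"]
print(normalized)  # ['kefir is a fermented milk drink', 'its made with kefir grains']
```

Note that removing the apostrophe turns "It's" into "its". Because `\w` also matches digits and underscores, those characters are kept.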

---
## 3. Question Answering System

This section implements the core Q&A functionality using **TF-IDF** (Term Frequency-Inverse Document Frequency) and **cosine similarity**.
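
As a rough illustration of the idea, the sketch below vectorizes three toy segments with TF-IDF and picks the one whose vector has the highest cosine similarity to a question. The segments and the question are hypothetical stand-ins, not the contents of the real knowledge base.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy segments, used only for this illustration
segments = [
    "kefir is a fermented milk drink",
    "store kefir in the refrigerator",
    "yogurt is made with different cultures",
]

vectorizer = TfidfVectorizer()
segment_vectors = vectorizer.fit_transform(segments)  # one TF-IDF row per segment

# The question is projected into the same vocabulary; unseen words are ignored
question_vector = vectorizer.transform(["how should i store kefir"])
scores = cosine_similarity(question_vector, segment_vectors).flatten()

best_index = int(scores.argmax())
print(segments[best_index])  # store kefir in the refrigerator
```

The second segment wins because it shares two terms with the question ("store" and "kefir"). The first shares only "kefir", and the third shares none, so its score is zero.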

In [None]:
class KefirQABot:
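    def __init__(self, knowledge_base):
        """
        Initialize the Q&A bot with a knowledge base.
        Args:
            knowledge_base (list): List of text segments about kefir. Segments are
                normalized with preprocess_text() for matching and returned as given.
        """
        self.knowledge_base = knowledge_base
        # Initialize TfidfVectorizer
        self.vectorizer = TfidfVectorizer()
        self.knowledge_vectors = None
        if knowledge_base:
            # Fit the vectorizer on normalized copies of the segments and transform them into TF-IDF vectors
            # (fit_transform raises ValueError on an empty list, so skip it when nothing was loaded)
            normalized = [preprocess_text(text) for text in knowledge_base]
            self.knowledge_vectors = self.vectorizer.fit_transform(normalized)
            print("Kefir Q&A Bot initialized and knowledge base vectorized.")
        else:
            print("Kefir Q&A Bot initialized with an empty knowledge base.")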
    
    def answer_question(self, question, top_k=1):
        """
        Answer a question about kefir by finding the most similar knowledge segment.
        Args:
            question (str): The question to answer.
            top_k (int): Number of top answers to return.
        Returns:
            list: Top k answers from the knowledge base.
        """
        if not self.knowledge_base:
            return ["I cannot answer questions as the knowledge base is empty. Please check 'kefir_knowledge.txt'."]

        # Preprocess the question using the same method as the knowledge base
        processed_question = preprocess_text(question)
        
        # Transform the preprocessed question into a TF-IDF vector
        question_vector = self.vectorizer.transform([processed_question])
        
        # Calculate cosine similarity scores between the question and all knowledge base vectors
        similarity_scores = cosine_similarity(question_vector, self.knowledge_vectors).flatten()
        
        # Get the indices of the top k answers based on similarity scores
        # argsort() gives indices in ascending order of score, [::-1] reverses them to descending,
        # and [:top_k] keeps the k best (unlike [-top_k:], this also returns nothing for top_k=0).
        top_indices = similarity_scores.argsort()[::-1][:top_k]
        
        # You could add a relevance threshold here if desired
        # For example: if similarity_scores[top_indices[0]] < 0.2: return ["Sorry, I don't have a very relevant answer for that."]

        # Return the original knowledge base segments corresponding to the top indices
        return [self.knowledge_base[idx] for idx in top_indices]

# Initialize the bot with the original segments so that answers keep their casing and punctuation
qa_bot = KefirQABot(kefir_knowledge)
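
The relevance threshold mentioned in the code comments could look roughly like the sketch below. It is a standalone helper, not part of the class above. The names `select_answers` and `min_score` are hypothetical, and the default of 0.2 is simply the example value from the comment, not a tuned setting.

```python
import numpy as np

def select_answers(similarity_scores, segments, top_k=1, min_score=0.2):
    """Return up to top_k segments whose score is at least min_score, best first."""
    scores = np.asarray(similarity_scores, dtype=float)
    # Rank indices from highest to lowest score and keep the k best
    ranked = scores.argsort()[::-1][:top_k]
    # Drop candidates that fall below the relevance threshold
    return [segments[i] for i in ranked if scores[i] >= min_score]

segments = ["segment a", "segment b", "segment c"]
print(select_answers([0.05, 0.61, 0.30], segments, top_k=2))  # ['segment b', 'segment c']
print(select_answers([0.05, 0.10, 0.02], segments))           # []
```

An empty result lets the caller fall back to a "no relevant answer" message instead of returning a weak match.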

---
## 4. Interactive Testing

Now, let's test the Q&A bot interactively. To start a session, uncomment the last line of the cell below and run the cell.

In [None]:
def test_qa_bot():
    """
    Interactive function to test the Q&A bot in a loop.
    """
    print("\n--- Welcome to the Kefir Q&A Bot! ---")
    print("Type your questions about kefir. Type 'quit' to exit.\n")
    
    while True:
        question = input("Ask a question about kefir: ")
        
        if question.strip().lower() == 'quit':
            print("Thank you for using the Kefir Q&A Bot. Goodbye!")
            break
            
        answers = qa_bot.answer_question(question)
        print("\nAnswer:")
        if answers:
            for i, answer in enumerate(answers, 1):
                print(f"{i}. {answer}")
        else:
            print("No answer found. Please ensure 'kefir_knowledge.txt' contains information.")

# Uncomment the line below and run this cell to start the interactive test
# test_qa_bot()

---
## 5. Example Questions

Here are some example questions you can try with the bot:

1.  What is kefir?
2.  How is kefir made?
3.  What are the health benefits of kefir?
4.  How long does it take to ferment kefir?
5.  What's the difference between kefir and yogurt?
6.  How should kefir be stored?
7.  Can kefir be made with non-dairy milk?
8.  Is kefir good for digestion?
9.  What nutrients are in kefir?
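
Instead of typing the questions one at a time, they can also be run in a batch. The sketch below assumes only that the bot exposes `answer_question(question, top_k)`. `EchoBot` and `run_batch` are hypothetical names, and `EchoBot` is a stand-in so the example runs without a knowledge file.

```python
class EchoBot:
    """Hypothetical stand-in exposing the same answer_question(question, top_k) interface."""

    def answer_question(self, question, top_k=1):
        return [f"(answer to: {question})"][:top_k]


def run_batch(bot, questions, top_k=1):
    """Ask each question once and collect the answers in a dict keyed by question."""
    return {q: bot.answer_question(q, top_k=top_k) for q in questions}


example_questions = [
    "What is kefir?",
    "How is kefir made?",
    "How should kefir be stored?",
]

results = run_batch(EchoBot(), example_questions)
for question, answers in results.items():
    print(f"Q: {question}")
    for answer in answers:
        print(f"   {answer}")
```

In the notebook, passing `qa_bot` in place of `EchoBot()` would run the same loop against the real knowledge base.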