# RAG Knowledge Base Generation
This notebook generates the knowledge base documents (airline FAQs and product documentation) that the flight chatbot assistant uses for retrieval-augmented generation (RAG).

In [8]:
import json
import yaml
import pickle
from pathlib import Path

## Generate FAQ Knowledge Base

In [9]:
# FAQ entries: each string holds a question followed by its answer
faqs_data = [
    "What is the baggage allowance for domestic flights? Each passenger is allowed one checked bag up to 23kg and one carry-on bag up to 8kg.",
    "How early should I arrive at the airport before my flight? It is recommended to arrive at least 2 hours before domestic flights and 3 hours before international flights.",
    "Can I change my flight after booking? Yes, you can change your flight up to 24 hours before departure, subject to availability and fare difference.",
    "What items are prohibited in carry-on luggage? Prohibited items include liquids over 100ml, sharp objects, and flammable materials.",
    "How do I check in online? Visit our website or mobile app, enter your booking reference, and follow the instructions to check in online.",
    "What should I do if my flight is delayed or cancelled? You will be notified via email or SMS. You can rebook or request a refund through our customer service.",
    "Are pets allowed on board? Small pets are allowed in the cabin with prior reservation. Larger pets must travel in the cargo hold.",
    "Do you offer special assistance for passengers with reduced mobility? Yes, please contact our support team at least 48 hours before your flight to arrange assistance.",
    "Can I select my seat in advance? Yes, seat selection is available during booking and online check-in, subject to availability.",
    "What is the policy for unaccompanied minors? Children aged 5-12 can travel alone with our unaccompanied minor service. Additional fees apply.",
    # Chatbot capabilities
    "What can I ask the chatbot? You can ask about flight availability, booking flights, checking your bookings, cancelling flights, and general airline FAQs.",
    "Can I book a flight through the chatbot? Yes, you can book flights by providing the flight ID and confirming your booking.",
    "How do I cancel a booking using the chatbot? You can cancel a booking by providing the booking ID and confirming the cancellation.",
    "Can the chatbot help me find flights? Yes, you can search for flights by providing the origin, destination, and departure date.",
    "How do I check my bookings with the chatbot? You can retrieve your bookings by asking for your flight reservations. You can filter by status (booked, cancelled, completed), booking date, or departure date, and navigate through multiple pages if you have many bookings.",
    "Can I filter my bookings when checking them? Yes, you can filter your bookings by status (booked, cancelled, completed), by the date you made the booking, or by the flight departure date. You can also specify how many bookings to show per page.",
    "What booking statuses can I filter by? You can filter by 'booked' (active upcoming bookings), 'cancelled' (bookings you've cancelled), or 'completed' (flights that have already departed).",
    "What information do I need to book a flight? You need the flight ID to book a flight through the chatbot.",
    "Can I change my booking status using the chatbot? Yes, you can update your booking status to cancelled or other statuses as needed.",
]

print(f"Generated {len(faqs_data)} FAQ entries.")
print("Sample FAQ:", faqs_data[0][:100] + "...")

Generated 19 FAQ entries.
Sample FAQ: What is the baggage allowance for domestic flights? Each passenger is allowed one checked bag up to ...
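Each FAQ entry stores the question and its answer in a single string. If you later want to index questions separately from answers (for example, to match a user query against the questions only), one option is to split each entry at the first `? `. The `split_faq` helper below is a hypothetical sketch that assumes every entry begins with exactly one question ending in `?`, which holds for the entries above.

```python
def split_faq(entry: str) -> tuple[str, str]:
    """Split a 'Question? Answer' string at the first '? '.

    Hypothetical helper: assumes the entry starts with a single question.
    Entries without a '? ' separator are returned with an empty answer.
    """
    question, sep, answer = entry.partition("? ")
    if not sep:
        return entry.strip(), ""
    # partition() drops the separator, so put the question mark back
    return question.strip() + "?", answer.strip()


entry = "Are pets allowed on board? Small pets are allowed in the cabin with prior reservation."
question, answer = split_faq(entry)
print(question)  # Are pets allowed on board?
print(answer)    # Small pets are allowed in the cabin with prior reservation.
```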


## Generate Product Documentation

In [10]:
# Product documentation entries, each formatted as "Feature name: description"
product_docs = [
    "Flight Search API: The flight search functionality allows users to search for flights by origin, destination, and departure date. All search parameters are optional for flexible searching. Users can also paginate through results.",
    "Booking Management: Users can create, view, update, and cancel flight bookings through the system. Advanced filtering options are available including status filters, date filters, and pagination support.",
    "User Authentication: The system uses JWT tokens for secure user authentication. All API calls require a valid bearer token for access control.",
    "Flight Listing: Users can browse all available flights with pagination support. The system shows comprehensive flight information including prices, airlines, and schedules.",
    "Booking Status Management: Bookings can have different statuses - booked (active), cancelled, or completed. Users can update booking statuses and view booking history.",
    "Real-time Updates: The system provides real-time information about flight delays, cancellations, and schedule changes through notifications.",
    "Multi-channel Support: Users can interact with the system through web interface, mobile app, or chatbot for maximum convenience.",
    "Data Security: All user data is encrypted and securely stored. The system follows industry best practices for data protection and privacy.",
    "Customer Support: Users can access 24/7 customer support through various channels including chat, email, and phone for assistance with their bookings."
]

print(f"Generated {len(product_docs)} product documentation entries.")
print("Sample Doc:", product_docs[0][:100] + "...")

Generated 9 product documentation entries.
Sample Doc: Flight Search API: The flight search functionality allows users to search for flights by origin, des...
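The product documentation entries follow a `Feature name: description` pattern. A hypothetical `parse_product_doc` helper can recover the feature name, which could serve as a title or metadata field when the documents are indexed. It assumes the first `: ` in each string separates the name from the description, as it does in every entry above.

```python
def parse_product_doc(doc: str) -> tuple[str, str]:
    """Split a 'Feature name: description' string at the first ': '.

    Hypothetical helper; strings without ': ' get an empty feature name.
    """
    name, sep, description = doc.partition(": ")
    if not sep:
        return "", doc.strip()
    return name.strip(), description.strip()


name, description = parse_product_doc(
    "Data Security: All user data is encrypted and securely stored."
)
print(name)         # Data Security
print(description)  # All user data is encrypted and securely stored.
```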


## Save Knowledge Base to Files

In [11]:
# Create knowledge base directory if it doesn't exist
knowledge_base_dir = Path("knowledge_base")
knowledge_base_dir.mkdir(exist_ok=True)

# Save FAQs as JSON
faqs_file = knowledge_base_dir / "airline_faqs.json"
with open(faqs_file, 'w', encoding='utf-8') as f:
    json.dump(faqs_data, f, ensure_ascii=False, indent=2)
    
print(f"✅ FAQs saved to {faqs_file}")

# Save Product Documentation as YAML
docs_file = knowledge_base_dir / "product_docs.yaml"
with open(docs_file, 'w', encoding='utf-8') as f:
    yaml.dump(product_docs, f, allow_unicode=True, default_flow_style=False)
    
print(f"✅ Product documentation saved to {docs_file}")

✅ FAQs saved to knowledge_base/airline_faqs.json
✅ Product documentation saved to knowledge_base/product_docs.yaml
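The pickle step below reloads its file to verify it, but the JSON and YAML files are written without a check. A minimal round-trip sketch for the JSON format writes a list to a temporary file with the same `json.dump` settings and confirms it reloads unchanged. The same pattern should work for the YAML file using PyYAML's `yaml.safe_dump` and `yaml.safe_load`. The function name `json_roundtrip_ok` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def json_roundtrip_ok(items: list[str]) -> bool:
    """Write items to a temporary JSON file and check they reload unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "airline_faqs.json"
        # Same settings as the save cell above (keeps non-ASCII characters as-is)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(items, f, ensure_ascii=False, indent=2)
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f) == items


sample = ["Can I select my seat in advance? Yes, seat selection is available.", "Café ☕"]
print(json_roundtrip_ok(sample))  # True
```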


## Save Knowledge Base as Pickle Binary

In [12]:
# Combine all knowledge base documents
all_knowledge_docs = faqs_data + product_docs

# Save as a pickle binary
pickle_file = knowledge_base_dir / "knowledge_base.pkl"
with open(pickle_file, 'wb') as f:
    pickle.dump(all_knowledge_docs, f)
    
print(f"✅ Knowledge base saved as pickle binary to {pickle_file}")
print(f"Total documents in knowledge base: {len(all_knowledge_docs)}")

# Verify the pickle file by loading it
with open(pickle_file, 'rb') as f:
    loaded_pickle_docs = pickle.load(f)
    
print(f"✅ Verified: Loaded {len(loaded_pickle_docs)} documents from pickle")
print("\nFirst document from pickle:")
print(loaded_pickle_docs[0][:100] + "...")

✅ Knowledge base saved as pickle binary to knowledge_base/knowledge_base.pkl
Total documents in knowledge base: 28
✅ Verified: Loaded 28 documents from pickle

First document from pickle:
What is the baggage allowance for domestic flights? Each passenger is allowed one checked bag up to ...
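`pickle.load` can execute arbitrary code when given a malicious file, so a pickle should only be loaded from a location you trust. If the file may pass through less trusted storage, one option, which the Python `pickle` documentation itself suggests, is to sign the bytes with an HMAC and verify the tag before unpickling. The sketch below assumes a secret key managed outside the notebook; the `.sig` sidecar file and the function names `dump_signed` and `load_signed` are hypothetical.

```python
import hashlib
import hmac
import pickle
import tempfile
from pathlib import Path


def dump_signed(obj, path: Path, key: bytes) -> None:
    """Pickle obj to path and store an HMAC-SHA256 tag in '<path>.sig'."""
    data = pickle.dumps(obj)
    path.write_bytes(data)
    tag = hmac.new(key, data, hashlib.sha256).hexdigest()
    Path(str(path) + ".sig").write_text(tag, encoding="utf-8")


def load_signed(path: Path, key: bytes):
    """Unpickle path only if its HMAC tag matches; raise ValueError otherwise."""
    data = path.read_bytes()
    expected = Path(str(path) + ".sig").read_text(encoding="utf-8").strip()
    actual = hmac.new(key, data, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(actual, expected):
        raise ValueError(f"signature mismatch for {path}")
    return pickle.loads(data)


with tempfile.TemporaryDirectory() as tmp:
    pkl = Path(tmp) / "knowledge_base.pkl"
    key = b"example-secret-key"  # hypothetical; load a real key from a secret store
    dump_signed(["doc one", "doc two"], pkl, key)
    print(load_signed(pkl, key))  # ['doc one', 'doc two']
```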


## Load and Test Knowledge Base

In [13]:
def load_knowledge_base(base_dir=knowledge_base_dir):
    """Load and combine all knowledge base documents.

    Prefers the pickle cache when it exists. Only load pickle files
    from trusted locations: unpickling can execute arbitrary code.
    """
    pickle_file = base_dir / "knowledge_base.pkl"

    # Try to load from the pickle cache first
    if pickle_file.exists():
        with open(pickle_file, 'rb') as f:
            return pickle.load(f)

    # Fall back to loading from the individual source files
    all_docs = []

    # Load FAQs (JSON list of strings)
    with open(base_dir / "airline_faqs.json", 'r', encoding='utf-8') as f:
        all_docs.extend(json.load(f))

    # Load product docs (YAML list of strings); safe_load returns None for an empty file
    with open(base_dir / "product_docs.yaml", 'r', encoding='utf-8') as f:
        all_docs.extend(yaml.safe_load(f) or [])

    return all_docs

# Test the loading function
loaded_docs = load_knowledge_base()
print(f"✅ Successfully loaded {len(loaded_docs)} knowledge base entries.")
print("\nFirst 3 entries:")
for i, doc in enumerate(loaded_docs[:3], 1):
    print(f"{i}. {doc[:80]}...")

✅ Successfully loaded 28 knowledge base entries.

First 3 entries:
1. What is the baggage allowance for domestic flights? Each passenger is allowed on...
2. How early should I arrive at the airport before my flight? It is recommended to ...
3. Can I change my flight after booking? Yes, you can change your flight up to 24 h...
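`load_knowledge_base` returns the pickle whenever it exists, so edits made to `airline_faqs.json` or `product_docs.yaml` after the pickle was written are silently ignored. One way to guard against this is to compare file modification times and rebuild the pickle when a source file is newer. The `is_cache_stale` helper below is a hypothetical sketch; modification times are a heuristic and can be unreliable after files are copied between machines.

```python
import os
import tempfile
from pathlib import Path


def is_cache_stale(cache: Path, sources: list[Path]) -> bool:
    """Return True if the cache is missing or older than any source file."""
    if not cache.exists():
        return True
    cache_mtime = cache.stat().st_mtime
    return any(src.stat().st_mtime > cache_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    cache = base / "knowledge_base.pkl"
    faqs = base / "airline_faqs.json"
    docs = base / "product_docs.yaml"
    for p in (cache, faqs, docs):
        p.write_text("placeholder", encoding="utf-8")
    # Set explicit (atime, mtime) values so the comparison is deterministic
    os.utime(faqs, (1_000, 1_000))
    os.utime(docs, (1_000, 1_000))
    os.utime(cache, (2_000, 2_000))
    print(is_cache_stale(cache, [faqs, docs]))  # False
    os.utime(faqs, (3_000, 3_000))  # simulate editing the FAQ file
    print(is_cache_stale(cache, [faqs, docs]))  # True
```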


## Example: RAG Retrieval Simulation

In [14]:
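# Simple keyword-based retrieval example
def simple_retrieval(query, documents, top_k=3):
    """Score documents by how many query words they contain and return the top_k.

    Matching is a case-insensitive substring test on whitespace-split words,
    so short words such as 'do' also match inside 'domestic', and punctuation
    stays attached ('booking?' only matches text containing 'booking?').
    """
    query_words = query.lower().split()
    scores = []

    for doc in documents:
        doc_lower = doc.lower()
        # Count query words that appear anywhere in the document as substrings
        score = sum(1 for word in query_words if word in doc_lower)
        scores.append((score, doc))

    # Sort by score (descending); the sort is stable, so ties keep document order
    scores.sort(reverse=True, key=lambda x: x[0])
    # Return up to top_k documents, skipping those with no matching words
    return [doc for score, doc in scores[:top_k] if score > 0]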

# Test queries
test_queries = [
    "How do I cancel my booking?",
    "What can I bring in carry-on luggage?",
    "Can the chatbot help me book flights?"
]

print("🔍 RAG Retrieval Examples:")
print("=" * 50)

for query in test_queries:
    print(f"\nQuery: {query}")
    relevant_docs = simple_retrieval(query, loaded_docs, top_k=2)
    
    if relevant_docs:
        for i, doc in enumerate(relevant_docs, 1):
            print(f"  {i}. {doc[:100]}...")
    else:
        print("  No relevant documents found.")
    print("-" * 30)

🔍 RAG Retrieval Examples:

Query: How do I cancel my booking?
  1. How do I check my bookings with the chatbot? You can retrieve your bookings by asking for your fligh...
  2. How early should I arrive at the airport before my flight? It is recommended to arrive at least 2 ho...
------------------------------

Query: What can I bring in carry-on luggage?
  1. What items are prohibited in carry-on luggage? Prohibited items include liquids over 100ml, sharp ob...
  2. What is the policy for unaccompanied minors? Children aged 5-12 can travel alone with our unaccompan...
------------------------------

Query: Can the chatbot help me book flights?
  1. Can the chatbot help me find flights? Yes, you can search for flights by providing the origin, desti...
  2. What can I ask the chatbot? You can ask about flight availability, booking flights, checking your bo...
------------------------------
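In the output above, the dedicated cancellation FAQ does not appear for "How do I cancel my booking?". The substring test lets filler words such as `do`, `i`, and `my` match inside unrelated text (`domestic`, almost any word containing `i`), while the trailing `?` keeps `booking?` from matching at all. A minimal improvement, still keyword-based, is to extract whole word tokens and drop common stopwords before counting overlap. The stopword list and the names `tokenize` and `token_retrieval` are assumptions for illustration, not part of the original pipeline.

```python
import re

# Hand-picked stopword list (an assumption; tune it for your own queries)
STOPWORDS = {
    "a", "an", "and", "are", "can", "do", "how", "i", "in", "is",
    "me", "my", "of", "or", "the", "to", "what", "you", "your",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text, extract word tokens (keeping hyphenated words
    such as 'carry-on' intact), and drop stopwords."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {w for w in words if w not in STOPWORDS}


def token_retrieval(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Rank documents by the number of distinct query tokens they contain."""
    query_tokens = tokenize(query)
    scored = []
    for idx, doc in enumerate(documents):
        overlap = len(query_tokens & tokenize(doc))
        if overlap > 0:
            scored.append((overlap, idx, doc))
    # Highest overlap first; ties keep the original document order
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored[:top_k]]


sample_docs = [
    "How early should I arrive at the airport before my flight? It is recommended to arrive at least 2 hours before domestic flights and 3 hours before international flights.",
    "How do I cancel a booking using the chatbot? You can cancel a booking by providing the booking ID and confirming the cancellation.",
    "How do I check my bookings with the chatbot? You can retrieve your bookings by asking for your flight reservations. You can filter by status (booked, cancelled, completed), booking date, or departure date.",
]

for doc in token_retrieval("How do I cancel my booking?", sample_docs, top_k=2):
    print(doc[:60] + "...")
```

On these three sample documents the cancellation FAQ ranks first, and the airport-arrival FAQ no longer matches. In the notebook it would be called as `token_retrieval(query, loaded_docs, top_k=2)`. Token overlap is still a crude relevance signal; ranking functions such as BM25 or embedding similarity are the more common choices for RAG retrieval.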
