### Step 1: Install Required Libraries

In [1]:
!pip install sentence-transformers numpy

Collecting sentence-transformers
  Downloading sentence_transformers-3.2.1-py3-none-any.whl (255 kB)
Collecting huggingface-hub>=0.20.0
  Downloading huggingface_hub-0.35.0-py3-none-any.whl (563 kB)
Collecting torch>=1.11.0
  Downloading torch-2.4.1-cp38-cp38-manylinux1_x86_64.whl (797.1 MB)

In [2]:
import json
import numpy as np
from sentence_transformers import SentenceTransformer
import os

# Define relative paths from the 'notebooks' directory
DATA_PATH = '../data/faq_data.json'
MODELS_DIR = '../models'
EMBEDDINGS_PATH = os.path.join(MODELS_DIR, 'faq_embeddings.npy')


### Step 2: Load the Knowledge Base (FAQs)


In [3]:
try:
    with open(DATA_PATH, 'r', encoding='utf-8') as f:
        faq_data = json.load(f)
    print(f"Successfully loaded {len(faq_data)} Q&A pairs from {DATA_PATH}")
except FileNotFoundError:
    print(f"Error: The file {DATA_PATH} was not found. Please make sure it exists.")
    faq_data = []


Successfully loaded 14 Q&A pairs from ../data/faq_data.json
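
Before encoding, it can help to confirm that every entry has the fields the later cells rely on. The sketch below is one possible check and is not part of the original notebook: `validate_faq` is a hypothetical helper, and the `answer` key is an assumption about the JSON schema (the notebook itself only reads `item['question']`). The sample entries are made up for illustration.

```python
def validate_faq(items, required_keys=("question", "answer")):
    """Split FAQ entries into valid ones and (index, reason) problems.

    Hypothetical helper; the 'answer' key is an assumption about the
    JSON schema, since the notebook only reads item['question'].
    """
    valid, problems = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append((index, "entry is not a JSON object"))
            continue
        # A key counts as missing if it is absent, not a string, or blank
        missing = [
            key for key in required_keys
            if not isinstance(item.get(key), str) or not item[key].strip()
        ]
        if missing:
            problems.append((index, "missing or empty: " + ", ".join(missing)))
        else:
            valid.append(item)
    return valid, problems


# Made-up entries for illustration (not the real faq_data.json contents)
sample = [
    {"question": "How do I reset my password?", "answer": "Use the reset link."},
    {"question": "   ", "answer": "Orphaned answer."},
    {"question": "No answer here"},
]
valid, problems = validate_faq(sample)
print(f"{len(valid)} valid, {len(problems)} problematic")
for index, reason in problems:
    print(f"  entry {index}: {reason}")
```

Running a check like this on `faq_data` before the embedding step would surface blank or malformed questions early, rather than silently encoding them.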


### Step 3: Generate and Save the Embeddings

This is the core step. We load the pre-trained `all-MiniLM-L6-v2` Sentence Transformer model and encode each FAQ question into a 384-dimensional vector. The resulting array is saved to a `.npy` file, so the Streamlit app can load the precomputed vectors at startup instead of re-encoding every question.


In [4]:
if faq_data:
    # Load the pre-trained model
    print("Loading the Sentence Transformer model ('all-MiniLM-L6-v2'). This may take a moment...")
    model = SentenceTransformer('all-MiniLM-L6-v2')
    print("Model loaded successfully.")

    # Extract just the questions to be encoded
    faq_questions = [item['question'] for item in faq_data]
    
    # Generate the embeddings
    print("Generating embeddings for all questions...")
    question_embeddings = model.encode(faq_questions, show_progress_bar=True)
    
    # Ensure the models directory exists
    if not os.path.exists(MODELS_DIR):
        os.makedirs(MODELS_DIR)
        print(f"Created directory: {MODELS_DIR}")

    # Save the embeddings to the .npy file
    np.save(EMBEDDINGS_PATH, question_embeddings)
    
    print("\n--- Process Complete ---")
    print(f"Embeddings have been generated and saved to: {EMBEDDINGS_PATH}")
    print(f"Shape of embeddings array: {question_embeddings.shape}")
else:
    print("Skipping embedding generation because no FAQ data was loaded.")


Loading the Sentence Transformer model ('all-MiniLM-L6-v2'). This may take a moment...
Model loaded successfully.
Generating embeddings for all questions...

Batches: 100%|██████████| 1/1 [00:00<00:00, 19.78it/s]

--- Process Complete ---
Embeddings have been generated and saved to: ../models/faq_embeddings.npy
Shape of embeddings array: (14, 384)
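
To illustrate how the saved array could be used at query time, here is a minimal sketch of a cosine-similarity lookup over the embeddings. It is an assumption about how the Streamlit app might search, not its actual code: `top_matches` is a hypothetical helper, and random vectors stand in for real model outputs so the example runs without downloading the model.

```python
import os
import tempfile

import numpy as np


def top_matches(query_vec, embeddings, k=3):
    """Return (indices, scores) of the k rows most similar to query_vec.

    Assumes neither the query nor any embedding row is all zeros.
    """
    # Normalize to unit length so the dot product equals cosine similarity
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = emb_norm @ query_norm
    # Sort by descending score and keep the first k indices
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Stand-in data: 14 random 384-dimensional vectors, matching the shape above
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(14, 384)).astype(np.float32)

# Round-trip through a .npy file, as an app would with faq_embeddings.npy
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "faq_embeddings.npy")
    np.save(path, fake_embeddings)
    loaded = np.load(path)

# Use a slightly perturbed copy of row 5 as the stand-in "query" vector
query = loaded[5] + 0.01 * rng.normal(size=384).astype(np.float32)
indices, scores = top_matches(query, loaded, k=3)
print("Best match index:", indices[0])
print("Top scores:", scores)
```

In a real lookup, the query vector would presumably come from encoding the user's question with the same model, for example `model.encode([user_question])[0]`, and the returned indices would be used to pick the matching entries from `faq_data`.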



