# Large Language Model Embeddings and Retrieval-Augmented Generation

This module focuses on creating a complete Retrieval-Augmented Generation (RAG) system using modern NLP techniques, embedding models, and vector databases. The system allows users to search through text documents semantically and receive AI-generated answers based on relevant retrieved contexts.

## Objective
The main objective of this module is to demonstrate how to:
1. Process and chunk text documents for efficient retrieval
2. Generate high-quality embeddings using pre-trained models
3. Store and query vector embeddings in a vector database (Qdrant)
4. Implement a complete RAG pipeline by connecting retrieval with an LLM
5. Create a user-friendly interface for interacting with the RAG system


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Define a function that sets custom CSS for the notebook output
# This specific style makes <pre> blocks wrap text instead of overflowing in a single line
from IPython.display import display, HTML # Import HTML from IPython.display

def set_css():
    display(HTML('''
    <style>
      pre {
          white-space: pre-wrap;  /* Enable word-wrapping in code/output blocks */
      }
    </style>
    '''))

# Register the CSS-setting function to run automatically before each code cell runs
# This ensures the styling stays applied throughout the notebook session
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
import markdown
from IPython.display import display, HTML

def render_markdown(md_text):
    # Convert Markdown to HTML
    html = markdown.markdown(md_text)
    # Display the HTML
    display(HTML(html))

## Setup and Dependencies

In [None]:
# Install the necessary libraries
!pip install sentence_transformers openai
!pip install plotly
!pip install matplotlib
!pip install -Uqqq rich openai gradio
!pip install qdrant_client
!pip install transformers

# Import basic libraries
import numpy as np
import os, random
from pathlib import Path
from getpass import getpass
from rich.markdown import Markdown
import torch
import sys
import csv
csv.field_size_limit(sys.maxsize)




9223372036854775807

## OpenAI-Compatible LLM Client Configuration

In [None]:
# Retrieve API key securely from Colab user data
from google.colab import userdata
OPEN_ROUTER_API_KEY = userdata.get('first_open_router_ai_key')

# Initialize OpenRouter client (OpenAI-compatible API)
from openai import OpenAI
open_router_client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=OPEN_ROUTER_API_KEY,
)
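
As a minimal sketch of how a client like the one above is typically used, the helper below sends one user message through the OpenAI-compatible chat completions endpoint and returns the reply text. The name `ask_llm` and its parameters are illustrative and not part of the notebook. The model identifier is left to the caller, since the notebook does not name one at this point.

```python
def ask_llm(client, model, question):
    """Send a single user message and return the model's reply text.

    `client` is assumed to be an OpenAI-compatible client (for example the
    `open_router_client` defined above); `model` is whatever model identifier
    your provider exposes. Both are supplied by the caller.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    # The first choice holds the assistant message for a non-streaming call
    return response.choices[0].message.content
```

A call would look like `ask_llm(open_router_client, model_id, "Suggest a hotel near the beach")`, where `model_id` is a placeholder for a model available on your account.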


## Data Loading and Preparation

In [None]:
import pandas as pd
df = pd.read_csv('./hotels.csv')

In [None]:
df.shape

(2511, 27)

In [None]:
df['name'].value_counts()

Unnamed: 0_level_0,count
name,Unnamed: 1_level_1
Faena Miami Beach,45
1 Hotel South Beach,45
The Kent Hotel,45
The Goodtime Hotel,45
The Betsy South Beach,45
...,...
"The Julia Hotel, Miami Beach",6
Bikini Lodge,4
Beach Park Hotel,4
Baltic Hotel,2


In [None]:
df.isnull().sum()

Unnamed: 0,0
id,0
type,0
name,0
image,0
awards,0
rankingPosition,0
priceLevel,0
priceRange,0
category,0
rating,0


In [None]:
df_original = df.copy(deep=True)

In [None]:
df_original.shape

(2511, 27)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2511 entries, 0 to 2510
Data columns (total 27 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     2511 non-null   int64  
 1   type                   2511 non-null   object 
 2   name                   2511 non-null   object 
 3   image                  2511 non-null   object 
 4   awards                 2511 non-null   object 
 5   rankingPosition        2511 non-null   int64  
 6   priceLevel             2511 non-null   object 
 7   priceRange             2511 non-null   object 
 8   category               2511 non-null   object 
 9   rating                 2511 non-null   float64
 10  hotelClass             2511 non-null   float64
 11  hotelClassAttribution  2421 non-null   object 
 12  phone                  2466 non-null   object 
 13  address                2511 non-null   object 
 14  email                  2065 non-null   object 
 15  amen

In [None]:
df.head(2)

Unnamed: 0,id,type,name,image,awards,rankingPosition,priceLevel,priceRange,category,rating,...,prices,latitude,longitude,webUrl,website,rankingString,rankingDenominator,numberOfReviews,review,title
0,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"Hands down my absolute favorite hotel in South Beach—there’s no place I’d rather stay. I’ve been back four times in the past six months –each experience is better than the last and I always look forward to coming back. As impressive as the hotel is, the team of people are what make each stay exceptional. Everyone is so friendly and they go out of their way to ensure that I have a perfect stay every single time. Petar in particular always ensures that my stay is wonderful every time and gives me total peace of mind every time I stay there - I never have to worry about a thing and he goes above and beyond to provide excellent service.\n\nI’ve stayed at many hotels in South Beach/Miami and they’re in a league of their own.",My favorite hotel in Miami--Petar is the best!
1,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"There are not enough words to explain the sumptuousness and comfortable elegance you are treated to as a guest at Faena. Their holistic approach to relaxation and community is in paralleled. While every area was wonderful, standouts were Los Fuegos and the Spa. Their staff is magnificent and know how to care for your every whim. Thank you Solange for the welcome and checking!\n\nWe will be back!",Exquisite Stay


In [None]:
# Create a 'combined' column that concatenates each hotel's name, review, title, and other fields into a single string.
# df["combined"] = (
#     "Name: " + df.name.str.strip()+
#     "; Review: " + df.review.str.strip() +
#     "; Title: " + df.title.str.strip() +
#     "; Rating: " + df.rating.astype(str) +
#     "; PriceRange: " + df.priceRange.astype(str)+
#     "; Website: " + df.website.str.strip() +
#     "; Ranking: " + df.rankingPosition.astype(str) +
#     "; Category: " + df.category.str.strip() +
#     "; image: " + df.image.str.strip()

# )

In [None]:
# Show full text without truncation
pd.set_option('display.max_colwidth', None)
df['review'][1:4]

Unnamed: 0,review
1,"There are not enough words to explain the sumptuousness and comfortable elegance you are treated to as a guest at Faena. Their holistic approach to relaxation and community is in paralleled. While every area was wonderful, standouts were Los Fuegos and the Spa. Their staff is magnificent and know how to care for your every whim. Thank you Solange for the welcome and checking!\n\nWe will be back!"
2,"I went for the Burlesque show which was absolutely amazing. The service was top tier. My bartender was Patty, my drinks were great, one of the best whiskey sours I've had. My server Tatiana was awesome, so sweet and attentive. I highly recommend the show if you get a chance and ask for Patty at the bar"
3,Dana at Gitano made the night magical! He makes me want to fly to Miami from Chicago for dinner so he can be my server! The hotel is smart to have Dana as a brand ambassador! Well done Faena Hotel for this!


In [None]:
df.head(2).T

Unnamed: 0,0,1
id,7787044,7787044
type,HOTEL,HOTEL
name,Faena Miami Beach,Faena Miami Beach
image,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg
awards,[],[]
rankingPosition,5,5
priceLevel,$$$$,$$$$
priceRange,"$729 - $1,426","$729 - $1,426"
category,hotel,hotel
rating,4.5,4.5


In [None]:
df_five = df[:5]  # Using only 5 rows for demonstration
df_five = df_five.dropna(axis=1)  # Drop columns with null values

# Prepare data with metadata for traceability
data = []
for _, row in df_five.iterrows():  # The row index is not needed here
    content = " ".join([f"{col}: {row[col]}" for col in df_five.columns])
    data.append({
        "page_content": content,
        "metadata": {
            "source": row["name"],
        }
    })
df_five

Unnamed: 0,id,type,name,image,awards,rankingPosition,priceLevel,priceRange,category,rating,...,prices,latitude,longitude,webUrl,website,rankingString,rankingDenominator,numberOfReviews,review,title
0,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"Hands down my absolute favorite hotel in South Beach—there’s no place I’d rather stay. I’ve been back four times in the past six months –each experience is better than the last and I always look forward to coming back. As impressive as the hotel is, the team of people are what make each stay exceptional. Everyone is so friendly and they go out of their way to ensure that I have a perfect stay every single time. Petar in particular always ensures that my stay is wonderful every time and gives me total peace of mind every time I stay there - I never have to worry about a thing and he goes above and beyond to provide excellent service.\n\nI’ve stayed at many hotels in South Beach/Miami and they’re in a league of their own.",My favorite hotel in Miami--Petar is the best!
1,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"There are not enough words to explain the sumptuousness and comfortable elegance you are treated to as a guest at Faena. Their holistic approach to relaxation and community is in paralleled. While every area was wonderful, standouts were Los Fuegos and the Spa. Their staff is magnificent and know how to care for your every whim. Thank you Solange for the welcome and checking!\n\nWe will be back!",Exquisite Stay
2,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"I went for the Burlesque show which was absolutely amazing. The service was top tier. My bartender was Patty, my drinks were great, one of the best whiskey sours I've had. My server Tatiana was awesome, so sweet and attentive. I highly recommend the show if you get a chance and ask for Patty at the bar",Burlesqu Show
3,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,Dana at Gitano made the night magical! He makes me want to fly to Miami from Chicago for dinner so he can be my server! The hotel is smart to have Dana as a brand ambassador! Well done Faena Hotel for this!,Dana at Gitano is a rockstar!
4,7787044,HOTEL,Faena Miami Beach,https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg,[],5,$$$$,"$729 - $1,426",hotel,4.5,...,[],25.807375,-80.12364,https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html,https://www.faena.com/miami-beach,#5 of 235 hotels in Miami Beach,235,2123,"I would like to give our very very heartfelt thanks to Mr. Raúl Díaz, the Reservations Supervisor ! He was just so helpful and kind ! He just did an amazing Service ! So when ever you planning to go to Miami you have to stay at the Faena Miami and have to get in contact with Mr. Díaz !",Perfect Place


## Document Chunking

In [None]:
def simple_recursive_split(docs, chunk_size=1000, chunk_overlap=200, separators=None):
    # Extract the main text and its associated metadata
    text = docs["page_content"]
    metadata = docs["metadata"]

    # Set default separators if none are provided
    if separators is None:
        separators = ["\n\n", "\n", " ", ".", ",", "\uff0c", "\u3001", "\uff0e", "\u3002"]

    # Helper function to recursively split text, trying separators in priority order
    def split_with_separators(t, seps):
        # If the text is already within the chunk size, return it directly
        if len(t) <= chunk_size:
            return [t]

        # Attempt splitting by each separator in order
        for idx, sep in enumerate(seps):
            if sep and sep in t:
                parts = t.split(sep)
                chunks = []
                current = ""

                # Build chunks without exceeding the maximum chunk size
                for part in parts:
                    part += sep  # Reattach the separator to preserve structure
                    if len(current + part) <= chunk_size:
                        current += part
                    else:
                        if current:
                            chunks.append(current.strip())
                        current = part  # Start a new chunk

                # Add the final leftover chunk
                if current:
                    chunks.append(current.strip())

                # Re-split chunks that are still too large, using only the
                # finer-grained separators that remain. Retrying the same
                # separator could recurse forever on a non-whitespace
                # separator such as ".", which strip() does not remove.
                result = []
                for chunk in chunks:
                    if len(chunk) > chunk_size:
                        result.extend(split_with_separators(chunk, seps[idx + 1:]))
                    else:
                        result.append(chunk)
                return result

        # Fallback: if no separators are effective, split the text by fixed character lengths
        return [t[i:i + chunk_size] for i in range(0, len(t), chunk_size)]

    # Split the original text
    splits = split_with_separators(text, separators)

    # Add overlap between chunks to preserve context between adjacent segments
    overlapped = []
    for i, chunk in enumerate(splits):
        if i == 0:
            # First chunk, no overlap
            overlapped.append({
                "page_content": chunk,
                "metadata": metadata
            })
        else:
            # For subsequent chunks, add overlap from the end of the previous chunk
            overlap = splits[i - 1][-chunk_overlap:]
            overlapped.append({
                "page_content": f"{overlap} {chunk}",
                "metadata": metadata
            })

    return overlapped

# Apply the chunking function to each document in the dataset
# This flattens all chunks into a single list
#texts = [chunk for doc in data for chunk in simple_recursive_split(doc, 2048, 50)]
texts = [chunk for doc in data for chunk in simple_recursive_split(doc, 512, 50)]

In [None]:
print (f'You now have {len(texts)} document(s) in your data')
print (f'There are {len(texts[1]["page_content"])} characters in your document')

You now have 16 document(s) in your data
There are 561 characters in your document
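
The reported 561 characters exceed the 512 passed as the chunk size because the overlap is prepended after splitting. A chunk can therefore reach chunk_size + chunk_overlap + 1 characters (the extra 1 is the joining space). The sketch below reproduces that arithmetic with a simplified, hypothetical splitter (`split_then_overlap`) that cuts fixed-width pieces. It is not the notebook's recursive function, only an illustration of the length bound.

```python
def split_then_overlap(text, chunk_size, chunk_overlap):
    """Cut fixed-width pieces, then prepend the tail of the previous piece.

    Simplified stand-in for illustration; assumes chunk_overlap > 0.
    """
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = pieces[:1]  # The first chunk has nothing before it to overlap with
    for prev, cur in zip(pieces, pieces[1:]):
        chunks.append(f"{prev[-chunk_overlap:]} {cur}")
    return chunks

chunks = split_then_overlap("x" * 1200, chunk_size=512, chunk_overlap=50)
lengths = [len(c) for c in chunks]
print(lengths)  # [512, 563, 227]
```

If downstream components enforce a hard limit, one option is to pass a chunk size reduced by the overlap so that the final chunks stay within the limit.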


## Text Embedding Generation

In [None]:
# Load the embedding model "nomic-ai/nomic-embed-text-v1.5" from Hugging Face.
# Colab may warn about a missing Hugging Face (HF) token during the download;
# a token is only required for private or gated model repositories.
# trust_remote_code=True is needed because this model ships custom modeling code.


from transformers import AutoTokenizer, AutoModel
text_tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
text_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

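# Function to generate an embedding vector for a single text string
def get_text_embeddings(text):
    inputs = text_tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Inference only: disable gradient tracking to save memory and time
    with torch.no_grad():
        outputs = text_model(**inputs)
    # Mean-pool the per-token vectors into one fixed-size vector
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings[0].numpy()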







### How did we get text_embeddings_size as 768?

Even for short inputs like "This is a test sentence.", transformer models (like nomic-embed-text-v1.5) output a fixed-size embedding, in this case a 768-dimensional vector. This happens because:

1. The sentence is tokenized into smaller units.
2. Each token is passed through the model to get a 768-dimensional representation.
3. These are then averaged (mean pooling) to create a single 768-dimensional embedding for the whole sentence.

The embedding size is determined by the model's architecture, not by the input length.
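
To make the pooling step concrete, here is a small NumPy sketch that uses random numbers in place of real model outputs. It shows that averaging over the token axis yields a vector of the hidden size whether the input has 6 tokens or 40. The `mean_pool` helper and its attention-mask argument are illustrative additions. The notebook's `get_text_embeddings` averages over all positions, which is equivalent when a single unpadded text is embedded.

```python
import numpy as np

def mean_pool(token_vectors, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    mask = attention_mask[:, None].astype(token_vectors.dtype)
    return (token_vectors * mask).sum(axis=0) / mask.sum()

rng = np.random.default_rng(0)
hidden_size = 768  # Fixed by the model architecture

# Random stand-ins for per-token outputs of a short and a long input
short_input = rng.normal(size=(6, hidden_size))
long_input = rng.normal(size=(40, hidden_size))

short_vec = mean_pool(short_input, np.ones(6))
long_vec = mean_pool(long_input, np.ones(40))
print(short_vec.shape, long_vec.shape)  # (768,) (768,)
```

The mask matters only when several texts are padded to a common length in one batch. Without it, padding positions would dilute the average.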










In [None]:
# Example usage of the function
text = "This is a test sentence."

# Get the embedding vector for the input text
embeddings = get_text_embeddings(text)

# Optionally, get the length of the embedding (number of dimensions)
text_embeddings_size = len(embeddings)

# Print the first 5 values of the embedding vector for inspection
print(embeddings[:5])
print(text_embeddings_size)

[ 1.2799692   0.40158355 -3.5162656  -0.3981321   1.5919138 ]
768


In [None]:
# Generate embeddings for all chunks
text_embeded = [get_text_embeddings(document["page_content"]) for document in texts]

In [None]:
len(text_embeded)

16

In [None]:
text_embeded[0].shape

(768,)

In [None]:
print(text_embeded[:1])

[array([-1.24075592e-01,  1.38791895e+00, -3.15292311e+00, -4.48979825e-01,
        7.45368242e-01, -1.73579752e-01,  6.49689138e-01,  3.01164120e-01,
       -1.99683815e-01, -6.49454653e-01, -5.71183801e-01, -6.35601044e-01,
        1.52062416e+00, -2.42449999e-01,  1.05806664e-01, -6.39356196e-01,
       -1.25471783e+00, -3.11308742e-01,  1.39914349e-01, -1.07381904e+00,
       -8.00950885e-01, -6.19494140e-01, -1.14544237e+00,  8.29612434e-01,
        8.24180007e-01,  6.60494566e-01,  9.07017589e-01,  1.01916826e+00,
       -1.67121613e+00, -7.43911505e-01, -4.92422193e-01,  2.46891618e-01,
        2.14848012e-01, -4.07480955e-01, -6.42710090e-01, -9.21196043e-01,
       -4.47947860e-01,  8.39062691e-01, -5.04525959e-01,  7.48336256e-01,
        1.22581947e+00, -5.46440063e-03,  8.59550774e-01,  2.75464326e-01,
       -2.03297883e-01, -6.35120749e-01,  2.66190529e-01,  2.02401206e-01,
        5.70763767e-01,  1.10834754e+00, -7.39857601e-03,  9.19452131e-01,
        1.00824988e+00, 

## Qdrant Vector Database

text_embeddings_size remains 768 for Qdrant because the same model, "nomic-ai/nomic-embed-text-v1.5", produces every vector stored in the collection.

In [None]:
# Import necessary modules from the Qdrant client library
# Qdrant is a vector database that allows you to store and search high-dimensional vector embeddings efficiently
from qdrant_client import QdrantClient, models

# Create a new Qdrant client instance using in-memory storage
# ":memory:" means the data will be stored temporarily in RAM (not saved to disk)
# Useful for testing or prototyping — everything is wiped when the program ends
client = QdrantClient(":memory:")

# Display the size (number of dimensions) of the text embeddings we generated earlier
# This is important because Qdrant needs to know the exact size of each vector to create a collection
text_embeddings_size

768

In [None]:
try:
    # Define the name of the collection we want to manage in Qdrant.
    # A collection in Qdrant is similar to a table in traditional databases —
    # it stores a group of vectors and their associated metadata.
    collection_name = "demo_collection"

    # Check whether the collection already exists in Qdrant.
    # This avoids attempting to create a collection with a name that's already taken.
    if client.collection_exists(collection_name):
        # If the collection already exists, delete it to ensure we're starting fresh.
        # This is useful when we want to reset the state (e.g., during development or re-indexing).
        client.delete_collection(collection_name=collection_name)

        # Output a message confirming the collection was deleted successfully.
        print(f"Collection '{collection_name}' deleted successfully.")

    # Proceed to create a new collection regardless of whether it was previously deleted or not.
    # This ensures we always end up with a clean, newly-created collection.
    client.create_collection(
        collection_name=collection_name,  # The name of the new collection being created

        # Configure how vectors will be stored in this collection.
        # This includes the dimensionality (size) and the distance metric used for similarity.
        vectors_config=models.VectorParams(
            size=text_embeddings_size,       # The number of dimensions in each vector.
                                             # Must match the output size of your embedding model.
            distance=models.Distance.COSINE  # The distance function used for comparing vectors.
                                             # COSINE is commonly used for text embeddings as it measures angular similarity.
        ),
    )

    # Print a confirmation that the collection was created successfully.
    print(f"Collection '{collection_name}' created successfully.")

except Exception as e:
    # If any error occurs during the process (e.g., connection issues, invalid parameters),
    # it will be caught here and the error message will be printed.
    print(f"An error occurred while setting up the collection: {e}")


Collection 'demo_collection' created successfully.
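
The collection above is configured with cosine distance. As a rough illustration of what that metric measures, the sketch below computes cosine similarity by hand with NumPy. The `cosine_similarity` helper is written here for illustration and is not a Qdrant function.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])

# Scaling a vector does not change its direction, so similarity stays near 1
same_direction = cosine_similarity(v, 10 * v)

# Orthogonal vectors share no direction, so similarity is 0
orthogonal = cosine_similarity([1, 0], [0, 1])

print(same_direction, orthogonal)
```

Because only direction matters, two chunks with similar meaning can score highly even when the magnitudes of their raw embeddings differ.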


In [None]:

# Import the `uuid4` function to generate unique IDs for each vector
# These IDs help identify and retrieve individual points later
from uuid import uuid4

# Import NumPy to handle vector data formats (embeddings are stored as NumPy arrays)
import numpy as np

# Upload all our text embeddings to the "demo_collection" in Qdrant
client.upload_points(
    collection_name="demo_collection",  # Target collection where we want to store our vectors

    # Create a list of PointStruct objects, one for each text chunk
    points=[
        models.PointStruct(
            id=str(uuid4()),  # Generate a unique ID for each point (as a string)

            # Convert the NumPy embedding to a plain list of floats for the point's vector
            vector=text_embeded[idx].tolist(),

            # Attach payload: additional information stored with each vector
            # This allows us to retrieve the original text and its metadata later
            payload={
                "metadata": doc["metadata"],         # Source hotel name
                "content": doc["page_content"]       # The full text chunk
            }
        )
        for idx, doc in enumerate(texts)  # Loop through all texts and match them to their embeddings
    ]
)


In [None]:
from pprint import pprint  # for nicely formatted output

pprint(texts[0])  # Show the first item


{'metadata': {'source': 'Faena Miami Beach'},
 'page_content': 'id: 7787044 type: HOTEL name: Faena Miami Beach image: '
                 'https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg '
                 'awards: [] rankingPosition: 5 priceLevel: $$$$ priceRange: '
                 '$729 - $1,426 category: hotel rating: 4.5 hotelClass: 0.0 '
                 'phone: 13055348800 address: 3201 Collins Ave Faena District, '
                 'Miami Beach, FL 33140-4023 email: '
                 'reservations-miamibeach@faena.com amenities: [] '
                 'numberOfRooms: 179 prices: [] latitude: 25.807375 longitude: '
                 '-80.12364 webUrl:'}


In [None]:
for i, item in enumerate(texts[:3]):
    print(f"\nItem {i}:")
    pprint(item)



Item 0:
{'metadata': {'source': 'Faena Miami Beach'},
 'page_content': 'id: 7787044 type: HOTEL name: Faena Miami Beach image: '
                 'https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg '
                 'awards: [] rankingPosition: 5 priceLevel: $$$$ priceRange: '
                 '$729 - $1,426 category: hotel rating: 4.5 hotelClass: 0.0 '
                 'phone: 13055348800 address: 3201 Collins Ave Faena District, '
                 'Miami Beach, FL 33140-4023 email: '
                 'reservations-miamibeach@faena.com amenities: [] '
                 'numberOfRooms: 179 prices: [] latitude: 25.807375 longitude: '
                 '-80.12364 webUrl:'}

Item 1:
{'metadata': {'source': 'Faena Miami Beach'},
 'page_content': '] latitude: 25.807375 longitude: -80.12364 webUrl: '
                 'https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html '
                 'website: https://www.

In [None]:
# Import the Google Drive integration module for Google Colab
from google.colab import drive

# Mount your Google Drive to the Colab environment
# This allows you to read from and write to files stored in your Drive
# After running this, a link will appear asking for permission to access your Drive
# Once authorized, your Drive will be available under '/content/drive'
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Import necessary modules from the Qdrant client
# QdrantClient allows us to interact with a Qdrant vector database (e.g., create collections, insert/query vectors)
# models provides configuration classes for vector parameters, point operations, etc.
from qdrant_client import QdrantClient, models

# Import Python's built-in modules
import os        # For file system operations like checking and creating directories
import shutil    # For high-level file operations like removing entire directories

# Define the local directory path where Qdrant should store its data
# This is where vector collections and other database files will be saved
# For Google Colab users, this may point to a mounted Google Drive location
qdrant_data_dir = '/content/drive/MyDrive/Stanford_Continuing_Studies/TheAILeadershipSeries_BuildingandScalingSolutions/Semantic_Search/qdrant_data'

# ----------------------------------------
# Forcefully remove the directory if it exists
# ----------------------------------------

try:
    # Remove the entire directory and its contents recursively
    # This is useful when you want to reset the Qdrant database from scratch
    shutil.rmtree(qdrant_data_dir)
    print(f"Directory '{qdrant_data_dir}' removed successfully.")

except FileNotFoundError:
    # If the directory does not exist, no need to worry — just proceed
    print(f"Directory '{qdrant_data_dir}' not found. Proceeding to create it.")

except OSError as e:
    # If there's a permission issue or the directory is in use, catch and report it
    print(f"Error removing directory '{qdrant_data_dir}': {e}")

# ----------------------------------------
# Recreate the directory
# ----------------------------------------

# Create the directory (and any missing parent directories) if it doesn't exist
# After deletion, this ensures a clean, fresh directory is in place for Qdrant
os.makedirs(qdrant_data_dir, exist_ok=True)
print(f"Directory '{qdrant_data_dir}' created.")

# ----------------------------------------
# Initialize Qdrant Client
# ----------------------------------------

# Initialize the Qdrant client, telling it to use the newly created directory for local storage
# This setup will persist vector data across sessions (e.g., in Google Drive)
client = QdrantClient(path=qdrant_data_dir)
print("Qdrant client initialized with fresh storage directory.")


Directory '/content/drive/MyDrive/Stanford_Continuing_Studies/TheAILeadershipSeries_BuildingandScalingSolutions/Semantic_Search/qdrant_data' removed successfully.
Directory '/content/drive/MyDrive/Stanford_Continuing_Studies/TheAILeadershipSeries_BuildingandScalingSolutions/Semantic_Search/qdrant_data' created.
Qdrant client initialized with fresh storage directory.


In [None]:
try:
    # Define the name of the collection we want to manage in Qdrant.
    # A collection in Qdrant is similar to a table in traditional databases —
    # it stores a group of vectors and their associated metadata.
    collection_name = "demo_collection"

    # Check whether the collection already exists in Qdrant.
    # This avoids attempting to create a collection with a name that's already taken.
    if client.collection_exists(collection_name):
        # If the collection already exists, delete it to ensure we're starting fresh.
        # This is useful when we want to reset the state (e.g., during development or re-indexing).
        client.delete_collection(collection_name=collection_name)

        # Output a message confirming the collection was deleted successfully.
        print(f"Collection '{collection_name}' deleted successfully.")

    # Proceed to create a new collection regardless of whether it was previously deleted or not.
    # This ensures we always end up with a clean, newly-created collection.
    client.create_collection(
        collection_name=collection_name,  # The name of the new collection being created

        # Configure how vectors will be stored in this collection.
        # This includes the dimensionality (size) and the distance metric used for similarity.
        vectors_config=models.VectorParams(
            size=text_embeddings_size,       # The number of dimensions in each vector.
                                             # Must match the output size of your embedding model.
            distance=models.Distance.COSINE  # The distance function used for comparing vectors.
                                             # COSINE is commonly used for text embeddings as it measures angular similarity.
        ),
    )

    # Print a confirmation that the collection was created successfully.
    print(f"Collection '{collection_name}' created successfully.")

except Exception as e:
    # If any error occurs during the process (e.g., connection issues, invalid parameters),
    # it will be caught here and the error message will be printed.
    print(f"An error occurred while setting up the collection: {e}")


Collection 'demo_collection' created successfully.


In [None]:


# Import the `uuid4` function to generate unique IDs for each vector
# These IDs help identify and retrieve individual points later
from uuid import uuid4

# Import NumPy to handle vector data formats (embeddings are stored as NumPy arrays)
import numpy as np

# Upload all our text embeddings to the "demo_collection" in Qdrant
client.upload_points(
    collection_name="demo_collection",  # Target collection where we want to store our vectors

    # Create a list of PointStruct objects, one for each text chunk
    points=[
        models.PointStruct(
            id=str(uuid4()),  # Generate a unique ID for each point (as a string)

            # Pass the embedding as a NumPy array (a plain list of floats is also accepted)
            vector=np.array(text_embeded[idx]),

            # Attach payload — additional information stored with each vector
            # This allows us to retrieve the original text and its metadata later
            payload={
                "metadata": doc["metadata"],         # Source and row info
                "content": doc["page_content"]       # The full text chunk
            }
        )
        for idx, doc in enumerate(texts)  # Loop through all texts and match them to their embeddings
    ]
)
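
Because `uuid4()` produces a fresh random ID on every run, executing the upload cell again against an existing collection stores each chunk a second time under a new ID. One possible alternative, sketched here as an assumption rather than the notebook's method, is to derive the ID from the chunk's text with `uuid.uuid5`, so that a repeated upload overwrites the same point. The helper name `stable_point_id` and the namespace value are hypothetical.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "demo_collection")

def stable_point_id(content: str) -> str:
    """Derive a deterministic UUID string from a chunk's text."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, content))

first = stable_point_id("name: Faena Miami Beach")
second = stable_point_id("name: Faena Miami Beach")
other = stable_point_id("name: Another Hotel")
```

The trade-off is that two chunks with exactly the same text map to the same ID, so only one of them is kept. If that matters, the source name could be mixed into the hashed string.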


## Run Query

In [None]:
# Create a query vector by embedding a sample search string
# This string describes what you're looking for: here, a hotel close to Miami Beach with great food nearby and a price range of $300 - $500
# The result is a vector in the same format as the ones stored in the Qdrant collection
query = get_text_embeddings('hotel close to Miami Beach and great food nearby with in price range of $300 - $500')

# Perform a similarity search in Qdrant using the query vector
# This finds the most relevant text chunks (based on vector similarity)
text_hits = client.query_points(
    collection_name="demo_collection",  # The name of the collection where vectors were stored
    query=query,                         # The query vector — what we want to find similar results to
    limit=3,                             # Limit the number of results to 3 most relevant chunks
).points                                 # Extract only the list of matching points (each with id, score, and payload; vectors are not returned by default)


In [None]:
text_hits

[ScoredPoint(id='e8662cec-b7c3-4aab-98d6-cad28721587d', version=0, score=0.7183349377879972, payload={'metadata': {'source': 'Faena Miami Beach'}, 'content': 'id: 7787044 type: HOTEL name: Faena Miami Beach image: https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg awards: [] rankingPosition: 5 priceLevel: $$$$ priceRange: $729 - $1,426 category: hotel rating: 4.5 hotelClass: 0.0 phone: 13055348800 address: 3201 Collins Ave Faena District, Miami Beach, FL 33140-4023 email: reservations-miamibeach@faena.com amenities: [] numberOfRooms: 179 prices: [] latitude: 25.807375 longitude: -80.12364 webUrl:'}, vector=None, shard_key=None, order_value=None),
 ScoredPoint(id='405ca85c-5072-48d8-b366-d4880c5e2b7a', version=0, score=0.7183349377879972, payload={'metadata': {'source': 'Faena Miami Beach'}, 'content': 'id: 7787044 type: HOTEL name: Faena Miami Beach image: https://media-cdn.tripadvisor.com/media/photo-o/1d/78/a4/13/exterior-view.jpg awards: [] rankingPosition
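
The first two hits above appear to carry the same content and the same score under different IDs, which suggests the same chunk was stored more than once. If such repeats are unwanted in the results, they can be filtered after retrieval. The following is a small sketch, not part of the notebook; `dedupe_by` and the toy `hits` list are illustrative names and data.

```python
def dedupe_by(items, key):
    """Keep the first item for each distinct key, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique

# Toy stand-ins for search hits, ordered best-first as a similarity search returns them.
hits = [
    {"id": "a", "score": 0.72, "content": "name: Faena Miami Beach"},
    {"id": "b", "score": 0.72, "content": "name: Faena Miami Beach"},
    {"id": "c", "score": 0.65, "content": "review: great food nearby"},
]
unique_hits = dedupe_by(hits, key=lambda hit: hit["content"])
```

For the `ScoredPoint` objects returned by Qdrant, the key would read the payload instead, for example `key=lambda point: point.payload["content"]`.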

## We can start from here now!


In [None]:
!pip install qdrant_client
!pip install openai



In [None]:
# Import the Google Drive integration module for Google Colab
from google.colab import drive

# Mount your Google Drive to the Colab environment
# This allows you to read from and write to files stored in your Drive
# After running this, a link will appear asking for permission to access your Drive
# Once authorized, your Drive will be available under '/content/drive'
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Import the main Qdrant client class to connect and interact with a Qdrant vector database
from qdrant_client import QdrantClient

# Import specific classes used to configure how vectors are stored and compared in a collection
from qdrant_client.http.models import Distance, VectorParams

# Import Colab's `userdata` module, which reads secrets (such as API keys) saved in the notebook's Secrets panel
from google.colab import userdata

# Import the OpenAI client class
# It is used to send requests to OpenRouter, which gives access to various LLMs (large language models)
from openai import OpenAI

# Import HTML and display tools from IPython
# These allow you to inject custom HTML or CSS into the notebook
from IPython.display import HTML, display

# Import necessary classes from the Hugging Face Transformers library
# AutoTokenizer handles breaking text into tokens
# AutoModel loads the pre-trained model used to compute vector embeddings
from transformers import AutoTokenizer, AutoModel

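# Import the top-level `openai` package as well (optional here, since the `OpenAI` client class is already imported above)
import openai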

**1. Define the Qdrant client first to connect to the vector database.**

In [None]:
# Attempt to initialize the Qdrant client
try:
    # Initialize the Qdrant client and set its storage path
    # This stores and retrieves the vector database in the specified directory on disk
    client = QdrantClient(path='/content/drive/MyDrive/Stanford_Continuing_Studies/TheAILeadershipSeries_BuildingandScalingSolutions/Semantic_Search/qdrant_data')

except RuntimeError as e:
    # Catch the specific error that occurs when the Qdrant client is already running with this path
    if "already accessed by another instance" in str(e):
        print("Qdrant is already initialized with this path in the current session.")
        print("You don't need to create the client again — reuse the existing one.")
    else:
        # Re-raise the error if it's something else
        raise


Qdrant is already initialized with this path in the current session.
You don't need to create the client again — reuse the existing one.


**2. Define the OpenRouter client to serve as the language model (LLM) for the pipeline.**


In [None]:

# Retrieve the OpenRouter API key from Colab's Secrets panel.
# The secret is stored under the name 'first_open_router_ai_key'.

OPEN_ROUTER_API_KEY = userdata.get('first_open_router_ai_key')



# Initialize the OpenAI-compatible client, but point it to OpenRouter's API instead of OpenAI's
# OpenRouter is a gateway to multiple LLMs like GPT, Claude, Mistral, and others, through one unified API

open_router_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # Set the API endpoint to OpenRouter (not OpenAI)
    api_key=OPEN_ROUTER_API_KEY               # Use your OpenRouter API key for authentication
)


**3. Import the same embedding model used during vector database creation to ensure consistency.**

In [None]:


# Load a pre-trained tokenizer and model designed for generating text embeddings
# "nomic-ai/nomic-embed-text-v1.5" is a model specifically trained to turn text into high-quality vector representations
# trust_remote_code=True lets Transformers run the custom model code shipped in the model repository
# (only enable it for repositories you trust)
text_tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
text_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Define a function to convert input text into a fixed-size vector (embedding)
def get_text_embeddings(text):
    # Tokenize the input text and return it as PyTorch tensors
    # padding=True: pad shorter sequences to ensure consistent length
    # truncation=True: cut off text that is too long for the model
    inputs = text_tokenizer(text, return_tensors="pt", padding=True, truncation=True)

    # Pass the tokenized input through the model to obtain output embeddings
    outputs = text_model(**inputs)

    # outputs.last_hidden_state contains embeddings for each token
    # We take the mean across all tokens to get a single vector for the entire text
    embeddings = outputs.last_hidden_state.mean(dim=1)

    # Convert the result to a NumPy array and remove it from the computation graph
    return embeddings[0].detach().numpy()
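
The mean in `get_text_embeddings` runs over every token position. For a single string that is fine, but if several texts were batched with `padding=True`, the padded positions would be averaged in as well. A common remedy is to weight the average by the attention mask. The sketch below shows that arithmetic on plain Python lists; `masked_mean` and the toy numbers are illustrative, not the notebook's code. Switching the pooling would change the embeddings, so the stored vectors would need to be regenerated with the same function.

```python
def masked_mean(token_vectors, attention_mask):
    """Average token vectors, counting only positions whose mask value is 1."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("need exactly one mask value per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dims = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dims)]

# Two real tokens followed by one padding position (exaggerated values for clarity).
tokens = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
pooled = masked_mean(tokens, [1, 1, 0])      # padding ignored
unmasked = masked_mean(tokens, [1, 1, 1])    # padding averaged in
```

With tensors, the same idea is usually written as a multiplication by the mask followed by a division by the mask's sum.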




**4. Test the retrieval functions to ensure they're returning relevant results.**

In [None]:
query = """hotel close to Miami Beach and great food nearby"""

In [None]:
# Embed the query text so it can be compared with the stored vectors
query_em = get_text_embeddings(query)

In [None]:
text_hits = client.query_points(
        collection_name="demo_collection",
        query=query_em,
        limit=10,
    ).points



In [None]:
# Extract the original text content from each result returned by the similarity search
# `text_hits` is a list of points returned by Qdrant's query
# Each point has a `payload`, which contains metadata and the original text chunk

contents = [point.payload['content'] for point in text_hits]

In [None]:
contents

['] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: Hands down my absolute favorite hotel in South Beach—there’s no place I’d rather stay. I’ve been back four times in the past six months –each experience is better than the last and I always look forward to coming back. As impressive as the hotel is, the team',
 '] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: I would like to give our very very heartfelt thanks to Mr. Raúl  Díaz, the Reservations Supervisor  ! He was just so helpful and 

In [None]:
# Extract the metadata for each point returned by the similarity search
# Each result (point) has a payload dictionary that includes metadata stored when uploading the vectors

meta = [point.payload['metadata'] for point in text_hits]

In [None]:
meta

[{'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'},
 {'source': 'Faena Miami Beach'}]

In [None]:
# Loop through each text chunk in the `contents` list
# These are the top-matching results returned by the Qdrant similarity search
for i in contents:
    # Print the actual text content
    print(i)

    # Print a separator line to clearly distinguish between different chunks
    print('###########')


] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: Hands down my absolute favorite hotel in South Beach—there’s no place I’d rather stay. I’ve been back four times in the past six months –each experience is better than the last and I always look forward to coming back. As impressive as the hotel is, the team
###########
] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: I would like to give our very very heartfelt thanks to Mr. Raúl  Díaz, the Reservations Supervisor  ! He was just so helpfu

**5. Create a retriever function to extract relevant chunks from the documents.**

In [None]:
# Define a function to search the Qdrant vector database using a natural language query
def query_qdrant(query, qdrant_client, limit=5):
    # Step 1: Convert the query text into an embedding (vector representation)
    # This embedding will be compared with stored vectors in the collection
    query_em = get_text_embeddings(query)

    # Step 2: Query the Qdrant collection using the embedding
    # This finds the top `limit` most similar text chunks based on vector similarity
    text_hits = qdrant_client.query_points(
        collection_name="demo_collection",  # The name of the Qdrant collection to search
        query=query_em,                     # The embedding of the input query
        limit=limit                         # Number of top results to return
    ).points                                 # Extract the matching points (results)

    # Step 3: Prepare the results in a clean format (text + metadata)
    results = []
    for point in text_hits:
        results.append({
            'content': point.payload['content'],    # The original text content
            'metadata': point.payload['metadata']   # Associated metadata (e.g., the source name)
        })

    # Return the list of results
    return results


In [None]:
query_qdrant(query, client)

[{'content': '] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: Hands down my absolute favorite hotel in South Beach—there’s no place I’d rather stay. I’ve been back four times in the past six months –each experience is better than the last and I always look forward to coming back. As impressive as the hotel is, the team',
  'metadata': {'source': 'Faena Miami Beach'}},
 {'content': '] latitude: 25.807375 longitude: -80.12364 webUrl: https://www.tripadvisor.com/Hotel_Review-g34439-d7787044-Reviews-Faena_Miami_Beach-Miami_Beach_Florida.html website: https://www.faena.com/miami-beach rankingString: #5 of 235 hotels in Miami Beach rankingDenominator: 235 numberOfReviews: 2123 review: I would like to give our very very heartfelt thanks to Mr

**6. Now, let's integrate everything by combining our retrieval function with the language model to complete our RAG (Retrieval-Augmented Generation) pipeline.**

In [None]:
# Define a function that uses a language model to generate an answer based on a user's query
def generate_answer(query):
    # Retrieve the context once and reuse it for both the prompt and the return value
    sources = query_qdrant(query, client)

    # Build the prompt that will be sent to the LLM
    # The prompt includes:
    # - Instructions to clean and format the answer
    # - The user's original query
    # - The context retrieved from Qdrant (via semantic search)
    prompt = f"""
    Based on the following query from a user, please generate an answer
    focusing on the original query and the context given. The answer should be in paragraphs.
    Remove special characters and newline markers (\\n), and make the output clean and long.
    Please cite the source for each part as [1][2].
    Just start with the answer, no need to give any salutations.

    ###########
    query:
    "{query}"

    ########

    context:
    "{sources}"
    #####

    Return in Markdown format.
    """

    # Send the prompt to the LLM using streaming mode
    # This allows the response to be received in real-time, piece by piece
    stream = open_router_client.chat.completions.create(
        model="qwen/qwen3-8b:free",  # Model to use (any model ID available on OpenRouter)
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
        stream=True,  # Enable streaming so we get partial output as it generates
    )

    # Initialize a variable to hold the full response
    output_text = ""

    # Iterate through the streaming response chunks
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            output_text += content  # Append new content to the full output
            print(content, end="")  # Print each chunk live as it's received

    # Return both the final answer and the context used (for reference or display)
    return output_text, sources


In [None]:
response,sources = generate_answer(query)

Faena Miami Beach is a highly recommended hotel located in Miami Beach, Florida, at 3201 Collins Ave, Faena District [1]. It ranks #5 among 235 hotels in the area, with 2123 positive reviews from guests, reflecting its strong reputation for quality and service [1]. The hotel is situated in a prime location, offering easy access to Miami Beach's iconic attractions and vibrant dining scene. Numerous reviews praise the hotel's amenities, with particular emphasis on its culinary offerings. For instance, guests highlight the exceptional dining experience at Los Fuegos, a standout restaurant within the hotel, and commend the personalized service provided by staff members like Dana at Gitano, who is described as a "rockstar" for creating memorable dining moments [1]. Additionally, the hotel’s holistic approach to relaxation and community is frequently mentioned, with guests appreciating the blend of comfort and elegance in its facilities [1]. The Faena Miami Beach is also noted for its attent
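
In the visible part of the output above, every citation is `[1]`. All retrieved chunks come from the same source, so that may be accurate. Still, the context is passed to the model as a raw Python list, so nothing in the prompt says which chunk is `[1]` and which is `[2]`. One option, sketched here under that assumption, is to number the chunks explicitly before inserting them into the prompt. `format_context` is a hypothetical helper, not part of the notebook.

```python
def format_context(results):
    """Render retrieved chunks as a numbered list so citations like [1] map to a chunk."""
    lines = []
    for number, result in enumerate(results, start=1):
        source = result.get("metadata", {}).get("source", "unknown source")
        lines.append(f"[{number}] ({source}) {result['content']}")
    return "\n".join(lines)

# Toy results in the same shape that `query_qdrant` returns.
results = [
    {"content": "rankingString: #5 of 235 hotels", "metadata": {"source": "Faena Miami Beach"}},
    {"content": "review: great food nearby", "metadata": {"source": "Faena Miami Beach"}},
]
context = format_context(results)
```

The resulting string could replace the raw list in the prompt's `context:` section, giving the model explicit labels to cite.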

In [None]:
# Render the response as formatted Markdown
render_markdown(response)

## Time to Build a functional Gradio interface to interact with the RAG system.

In [None]:
import gradio as gr

**1. Redefine our RAG function**

In [None]:
# Import OpenAI-compatible library (used here with OpenRouter)
import openai

# Define a function to generate a streamed answer to a user's query using an LLM
# This version includes error handling and uses Python's `yield` to stream results back as they're generated
def generate_answer(query):
    # Step 1: Try to get relevant context from Qdrant (vector search)
    try:
        sources = query_qdrant(query, client)
    except Exception as e:
        # If something goes wrong (e.g., Qdrant is not running), return a fallback message
        sources = [{"error": f"Error retrieving sources: {str(e)}"}]

    # Step 2: Prepare the prompt for the language model
    # Includes the user's question and the context retrieved from the vector database
    prompt = f"""
    Based on the following query from a user, please generate an answer
    focusing on the original query and the context given. The answer should be in paragraphs.
    Remove special characters and newline markers (\\n); make the output clean and long.
    Please cite the source for each part as [1][2]. Just start with the answer, with no salutations.

    ###########
    query:
    "{query}"

    ########

    context:
    "{sources}"
    #####

    Return in Markdown format.
    """

    # Step 3: Send the prompt to the OpenRouter-compatible LLM (Qwen model)
    stream = open_router_client.chat.completions.create(
        model="qwen/qwen3-8b:free",  # A free-to-use large language model hosted on OpenRouter
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
        stream=True,  # Enable streaming response
    )

    # Step 4: Stream and yield the generated content chunk by chunk
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_response += content

            # Yield lets us return partial results as they're received (for real-time feedback)
            yield full_response
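
The function yields the accumulated text rather than each new piece because a Gradio output is replaced by every yielded value. The pattern can be tried without any API call. The sketch below uses a fake stream whose chunks only mimic the attributes read above (`choices[0].delta.content`); `fake_stream` and `accumulate` are illustrative names.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streamed chat-completion chunks (a stand-in for the real API)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(stream):
    """Yield the growing response after each chunk that carries text."""
    full_response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content is not None:
            full_response += content
            yield full_response

# A None piece mimics chunks that carry no text.
updates = list(accumulate(fake_stream(["Faena ", None, "Miami ", "Beach"])))
```

Each element of `updates` is the full text so far, which is what the text box would display at that moment.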


**2. Create a Demo Interface**

In [None]:
# Define example inputs for the UI — users can click these to try predefined queries
examples = [
    ["Affordable beachfront hotels in Miami with free breakfast"],
    ["Luxury 5-star hotels near South Beach Miami"],
    ["Pet-friendly hotels in downtown Miami with parking"],
    ["Hotels in Miami close to the airport with free shuttle service"],
    ["Family-friendly Miami hotels with pool and kid activities"]
]


# Set up the Gradio interface
# - fn: the function to call when user enters input (must be a generator if using yield)
# - title: the name shown at the top of the web app
# - inputs: defines the input component (in this case, a text box)
# - outputs: defines what kind of output to display (Textbox with 3 lines labeled "Response")
# - examples: preloaded example queries for users to click and run

import gradio as gr

demo = gr.Interface(
    fn=generate_answer,  # The function that will process user input
    title="Miami Hotels Search",  # Title for the web app
    inputs="text",  # Single text input from the user
    outputs=gr.components.Textbox(lines=3, label="Response"),  # Output display
    examples=examples,  # List of sample queries for users to try
    live=False,  # Optional: set to True if you want real-time feedback as user types
)

# Launch the interface
# - share=True gives you a public link (useful in Colab or for sharing with others)
# - debug=True (not set here) would keep the cell running and show errors and logs in the notebook
demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://653945463ccfdeea14.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




**3. Create a Demo Interface with Sources**

In [None]:
import json

def generate_answer(query):
    # First, get the sources
    try:
        sources = query_qdrant(query, client)
    except Exception as e:
        sources = [{"error": f"Error retrieving sources: {str(e)}"}]

    # Convert the sources list to a string for the prompt
    #sources_str = "\n".join([f"Source {i+1}: {source['content']}" for i, source in enumerate(sources)])

    prompt = f"""
    Based on the following query from a user, please generate an answer
    focusing on the original query and the context given. The answer should be in paragraphs.
    Remove special characters and newline markers (\\n), and make the output clean and long.
    Please cite the source for each part as [1][2].
    Just start with the answer, no need to give any salutations.

    ###########
    query:
    "{query}"

    ########

    context:
    "{sources}"
    #####

    Return in Markdown format.
    """

    # Send the prompt to the OpenRouter-compatible LLM (Qwen model)
    stream = open_router_client.chat.completions.create(
        model="qwen/qwen3-8b:free",  # A free-to-use large language model hosted on OpenRouter
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
        stream=True,  # Enable streaming response
    )
    # Convert sources to a proper JSON string for the JSON component
    sources_json = json.dumps(sources)

    # For Gradio streaming with multiple outputs
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_response += content
            # Return both the accumulated response and the sources as JSON string
            yield full_response, sources_json

    # In case the stream is empty, yield one final time
    if not full_response:
        yield "No response generated", sources_json

# Example queries that match the hotel data stored in the collection
examples = [
    ["Affordable beachfront hotels in Miami with free breakfast"],
    ["Luxury 5-star hotels near South Beach Miami"],
    ["Pet-friendly hotels in downtown Miami with parking"],
    ["Hotels in Miami close to the airport with free shuttle service"],
    ["Family-friendly Miami hotels with pool and kid activities"]
]

demo = gr.Interface(
    fn=generate_answer,
    title="Miami Hotels Search",
    inputs="text",
    outputs=[
        gr.components.Textbox(lines=8, label="Response"),
        gr.components.JSON(label="Sources")
    ],
    examples=examples
)

demo.queue()
demo.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://1cf7ef77b27a0f8adf.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7862 <> https://5a2149f8ed420fa886.gradio.live
Killing tunnel 127.0.0.1:7863 <> https://1cf7ef77b27a0f8adf.gradio.live
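
In both Gradio versions of `generate_answer`, a retrieval failure is turned into an error entry that is still passed to the model as context. An alternative is to stop early and show the error instead of calling the LLM. The sketch below illustrates that control flow with stand-in retrievers; `safe_retrieve`, `answer_or_error`, and both retriever functions are hypothetical, and the LLM call is replaced by a placeholder string.

```python
def safe_retrieve(retriever, query):
    """Return (sources, error_message); error_message is None when retrieval succeeds."""
    try:
        return retriever(query), None
    except Exception as exc:  # broad on purpose: any retrieval failure is reported to the UI
        return [], f"Error retrieving sources: {exc}"

def answer_or_error(retriever, query):
    """Yield an error message instead of calling the LLM when retrieval failed."""
    sources, error = safe_retrieve(retriever, query)
    if error is not None:
        yield error, sources
        return
    # Placeholder for the streaming LLM call used in the notebook.
    yield f"(answer based on {len(sources)} sources)", sources

def broken_retriever(query):
    raise RuntimeError("collection not found")

def working_retriever(query):
    return [{"content": "review: great food nearby", "metadata": {"source": "Faena Miami Beach"}}]

failed = list(answer_or_error(broken_retriever, "hotels near the beach"))
succeeded = list(answer_or_error(working_retriever, "hotels near the beach"))
```

Yielding a `(text, sources)` pair keeps the shape expected by an interface with two outputs, so the error appears in the response box and the sources panel stays empty.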


