# Jeopardy Vector Search Lab
This notebook demonstrates how to build a semantic search application using **Weaviate**, an open-source vector database. Unlike traditional databases that search for exact keywords, vector databases search for *meaning*.

### 1. Load the data
We start by fetching a small dataset of Jeopardy questions in JSON format.

In [None]:
import requests
import json

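# Download the data and fail loudly on an HTTP error
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
resp.raise_for_status()

# Parse the JSON payload into a list of dicts
data = json.loads(resp.text)

# Report the item count and preview the first item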
print(f"Loaded {len(data)} items.")
print("Preview of first item:")
print(json.dumps(data[0], indent=2))

In [None]:
def json_print(obj):
    """Pretty-print any JSON-serializable object."""
    print(json.dumps(obj, indent=2))

### 2. Initializing Weaviate (The Vector DB)

**What is a Vector Database?**
A vector database stores data as numerical representations called **embeddings**, which capture the semantic relationships between data points. In our case, Weaviate uses an OpenAI embedding model to convert each Jeopardy question into a vector.

When you query the database, Weaviate converts your query into a vector and finds the nearest data points in high-dimensional space.
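
To make "nearest in high-dimensional space" concrete, here is a minimal sketch of a brute-force nearest-neighbor search using cosine similarity, one common similarity measure. The three-dimensional vectors and the names `toy_vectors` and `nearest` are invented for illustration; real embeddings have far more dimensions, and a vector database typically relies on an index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (made-up numbers, not model output)
toy_vectors = {
    "elephant": [0.9, 0.1, 0.0],
    "antelope": [0.8, 0.3, 0.1],
    "wire": [0.0, 0.2, 0.9],
}


def nearest(query_vector, vectors, k=2):
    """Return the k keys whose vectors are most similar to query_vector."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query_vector, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# A query vector pointing in roughly the same direction as the animal vectors
print(nearest([1.0, 0.2, 0.0], toy_vectors))
```

The two animal entries rank ahead of `wire` because their vectors point in nearly the same direction as the query, which is the sense in which items are "close" in meaning.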

In [None]:
import weaviate
from weaviate import EmbeddedOptions
import os

# Start up an instance of Weaviate using Embedded mode
# (this notebook uses the v3 Python client API, `weaviate.Client`)
# Note: You will need an OpenAI API key to vectorize the text
client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        # Raises KeyError right away if OPENAI_API_KEY is not set in your environment
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]
    }
)

In [None]:
# Check that weaviate is up and running
client.is_ready()

In [None]:
# Delete the `Question` class if it already exists to ensure a fresh start
if client.schema.exists("Question"):
    client.schema.delete_class("Question")

### 3. Defining the Schema
We define a class called `Question`. We specify `text2vec-openai` as the vectorizer, which tells Weaviate to automatically create vectors for any text we upload using OpenAI's models.
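
The class definition used in this notebook names only the class and its vectorizer and leaves the properties for Weaviate to infer from the imported objects. As a hedged alternative, the sketch below spells the properties out explicitly. The variable name `explicit_class_obj` is hypothetical, and the choice of the `text` data type for all three properties is an assumption based on the fields imported later in this notebook.

```python
# Sketch of a class definition with explicit properties (assumed layout;
# the notebook itself relies on Weaviate inferring these on import)
explicit_class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "question", "dataType": ["text"]},
        {"name": "answer", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ],
}

# List the declared property names
print([prop["name"] for prop in explicit_class_obj["properties"]])
```

Such a dictionary could be passed to `client.schema.create_class` in place of the shorter definition.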

In [None]:
# Create the schema that will house our data
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # create vectors with OpenAI on import
}

client.schema.create_class(class_obj)
print("Schema created successfully.")

### 4. Batch Import
Batching sends many objects per request, which is more efficient than uploading items one by one. During this process, Weaviate sends the text to OpenAI, receives the vector, and stores both the original text and the vector.
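
The idea behind batching can be sketched in plain Python: group the items into fixed-size chunks and handle one chunk per request instead of one item per request. The helper name `chunked` is hypothetical and this is not how the Weaviate client is implemented internally; it only illustrates the grouping.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Five items with a batch size of 2 need three "requests" instead of five
sample_items = ["q1", "q2", "q3", "q4", "q5"]
for batch_number, chunk in enumerate(chunked(sample_items, 2), start=1):
    print(f"Batch {batch_number}: {chunk}")
```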

In [None]:
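# Import every question in batches of up to 100 objects
with client.batch.configure(batch_size=100) as batch:
    for i, d in enumerate(data):
        print(f"Importing question: {i+1}")

        # Map the dataset's capitalized keys to lowercase property names
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        # Queue the object on the context-managed batch; it is sent
        # when the batch fills up or the `with` block exits
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )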

In [None]:
# Check how many objects we've loaded
count = client.query.aggregate("Question").with_meta_count().do()
json_print(count)

### 5. Preview the Data
We can now query the database to verify the content. In a real application, you would use `.with_near_text()` here to perform semantic searches.

In [None]:
# Extract and show any 3 questions and answers
result = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_limit(3)
    .do()
)

json_print(result)
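
The `.with_near_text()` search mentioned above could look like the following sketch. The helper name `semantic_search` and the default limit are hypothetical; the function assumes a v3-style client like the one created earlier in this notebook and a response nested under `data`, `Get`, and the class name.

```python
def semantic_search(client, concept, limit=2):
    """Return the Question objects closest in meaning to `concept`.

    `client` is assumed to be a v3-style `weaviate.Client` with the
    `Question` class already populated.
    """
    response = (
        client.query
        .get("Question", ["question", "answer", "category"])
        .with_near_text({"concepts": [concept]})  # vectorize the concept and rank by proximity
        .with_limit(limit)
        .do()
    )
    # Results are nested under data -> Get -> <class name>
    return response["data"]["Get"]["Question"]
```

Calling `json_print(semantic_search(client, "biology"))` would then print the questions whose vectors lie closest to the vector for "biology", even when the word itself does not appear in the text.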