# Notebook 2: Data Processing & Vector Database

**Objectives:**
- Clean and structure event data
- Use an LLM to extract additional metadata (baby-friendliness, indoor/outdoor setting, and a clean summary)
- Generate embeddings for semantic search
- Set up Qdrant and index all events

**✅✅✅ Important Notes:**
- We're keeping it simple: the LLM extracts only `baby_friendly: bool`, `indoor_or_outdoor`, and a `clean_summary` for each event
- Mood/vibe queries are left to semantic search rather than explicit tags
- Event-level chunks (1 event = 1 document)

---


## Setup & Imports


In [1]:
# Install required packages if needed
# !pip install openai qdrant-client pandas numpy python-dotenv


In [16]:
import pandas as pd
import numpy as np
import os
import json
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import time
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"), organization=os.getenv("OPENAI_ORG_ID"))

print("✅ Imports successful!")
print(f"OpenAI API Key loaded: {'Yes' if os.getenv('OPENAI_API_KEY') else 'No'}")


✅ Imports successful!
OpenAI API Key loaded: Yes


## 1. Load Raw Data

Load the CSV file from Notebook 1.



In [11]:
# Find the most recent CSV file in data/raw/
raw_data_dir = Path("../data/raw")
csv_files = list(raw_data_dir.glob("timeout_events_*.csv"))

if not csv_files:
    raise FileNotFoundError("No CSV files found in data/raw/. Please run Notebook 1 first.")

# Get the most recent file
latest_csv = sorted(csv_files)[-1]
print(f"Loading data from: {latest_csv}")

# Load data
df = pd.read_csv(latest_csv)

print(f"\n✅ Loaded {len(df)} events")
print(f"\nColumns: {list(df.columns)}")
print(f"\nFirst event:")
df.head(1)


Loading data from: ../data/raw/timeout_events_20251107.csv

✅ Loaded 90 events

Columns: ['event_id', 'title', 'description', 'url', 'long_description', 'is_free']

First event:


| | event_id | title | description | url | long_description | is_free |
|---|---|---|---|---|---|---|
| 0 | evt_001 | The NY Comedy Festival | The New York Comedy Festival is where the best... | https://www.timeout.com/newyork/news/the-ny-co... | The New York Comedy Festival(NYCF), the countr... | True |
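A note on the file selection above: `sorted(csv_files)[-1]` returns the newest snapshot only because the `YYYYMMDD` suffix happens to sort lexicographically. The sketch below parses the date explicitly instead. It assumes every file follows the `timeout_events_YYYYMMDD.csv` pattern seen in the output; the helper names and the two older filenames are made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def snapshot_date(path: Path) -> datetime:
    """Parse the trailing YYYYMMDD stamp from a name like timeout_events_20251107.csv."""
    stamp = path.stem.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d")


def latest_snapshot(paths: list[Path]) -> Path:
    """Return the path whose date stamp is the most recent."""
    if not paths:
        raise FileNotFoundError("No snapshot files supplied.")
    return max(paths, key=snapshot_date)


# Hypothetical file list; only the first name appears in this notebook.
candidates = [
    Path("timeout_events_20251107.csv"),
    Path("timeout_events_20250930.csv"),
    Path("timeout_events_20251031.csv"),
]
print(latest_snapshot(candidates).name)  # timeout_events_20251107.csv
```

A malformed filename raises `ValueError` from `strptime`, which is arguably preferable to silently loading the wrong file.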


## 2. LLM-Based Feature Extraction

**✅✅✅ Simplified Approach:**
- Extract three fields from each event description: `baby_friendly: bool`, `indoor_or_outdoor` ("indoor", "outdoor", or "both"), and a short `clean_summary`
- `baby_friendly = True` implies stroller-accessible and suitable for infants/toddlers
- No mood tags or energy levels needed; semantic search handles these

**Why this works:**
- Baby-friendliness is concrete and actionable as a filter
- Semantic embeddings capture nuanced vibes ("romantic", "relaxing", etc.) without explicit tags
- Skipping mood and energy tagging keeps extraction to a single LLM call per event


In [12]:
def extract_baby_friendly(title: str, description: str) -> dict:
    """
    Use gpt-4o-mini to determine if an event is baby-friendly, indoor/outdoor, and create a clean summary.
    
    Args:
        title: Event title
        description: Event description
    
    Returns:
        dict: {"baby_friendly": bool, "indoor_or_outdoor": str, "clean_summary": str}
    """
    prompt = f"""Given this NYC event, provide three things:

1. Is it baby-friendly?
   - Suitable for infants and toddlers (ages 0-3)
   - Stroller-accessible
   - Family-oriented or welcoming to young children
   - Not explicitly adult-only (bars, nightclubs, R-rated content)

2. Is it indoor, outdoor, or both?
   - "indoor": Takes place entirely indoors (museums, theaters, indoor markets)
   - "outdoor": Takes place entirely outdoors (parks, outdoor festivals, street events)
   - "both": Has both indoor and outdoor components, or location is flexible

3. Create a clean, concise summary (MAX 5 sentences, ideally 2-3):
   - Remove all marketing language and promotional hype
   - Focus on WHAT the event is and what attendees will do/experience
   - Include key details: type of activity, notable features, what makes it unique
   - NO flowery language, NO calls to action, NO superlatives
   - Write in simple, factual sentences

Event Title: {title}
Event Description: {description}

Return ONLY a JSON object with this format:
{{
  "baby_friendly": true,
  "indoor_or_outdoor": "indoor",
  "clean_summary": "This is a brief 2-3 sentence summary of what the event actually is."
}}

Possible values for indoor_or_outdoor: "indoor", "outdoor", or "both"
No explanation needed, just the JSON."""
    
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # Using mini for cost efficiency
            messages=[
                {"role": "system", "content": "You are a helpful assistant that analyzes event descriptions. Always respond with valid JSON."},
                {"role": "user", "content": prompt}
            ],
            temperature=0,  # Deterministic output
            response_format={"type": "json_object"}
        )
        
        result = json.loads(response.choices[0].message.content)
        return {
            "baby_friendly": result.get("baby_friendly", False),
            "indoor_or_outdoor": result.get("indoor_or_outdoor", "indoor"),
            "clean_summary": result.get("clean_summary", f"{title}. {description[:200]}...")  # Fallback
        }
    
    except Exception as e:
        print(f"Error processing event: {e}")
        return {
            "baby_friendly": False,
            "indoor_or_outdoor": "indoor",
            "clean_summary": f"{title}. {description[:200]}..."
        }

print("✅ Function defined successfully!")

✅ Function defined successfully!
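The `.get(...)` defaults in `extract_baby_friendly` cover missing keys, but not malformed values such as `"true"` as a string or `"Outdoor"` with different casing. The following is a minimal sketch of a normalization step, assuming the same three fields and the same `"indoor"` default; `normalize_result` and `ALLOWED_SETTINGS` are hypothetical names, not part of the notebook.

```python
ALLOWED_SETTINGS = {"indoor", "outdoor", "both"}


def normalize_result(raw: dict, fallback_summary: str) -> dict:
    """Coerce a parsed LLM response into the expected types and values."""
    baby = raw.get("baby_friendly", False)
    if isinstance(baby, str):
        # Models occasionally return booleans as strings.
        baby = baby.strip().lower() in {"true", "yes"}
    else:
        baby = bool(baby)

    setting = str(raw.get("indoor_or_outdoor", "")).strip().lower()
    if setting not in ALLOWED_SETTINGS:
        setting = "indoor"  # same default the notebook uses

    summary = raw.get("clean_summary")
    if not isinstance(summary, str) or not summary.strip():
        summary = fallback_summary

    return {
        "baby_friendly": baby,
        "indoor_or_outdoor": setting,
        "clean_summary": summary.strip(),
    }


messy = {"baby_friendly": "true", "indoor_or_outdoor": " Outdoor", "clean_summary": ""}
print(normalize_result(messy, "Fallback summary."))
# {'baby_friendly': True, 'indoor_or_outdoor': 'outdoor', 'clean_summary': 'Fallback summary.'}
```

Guaranteeing a real `bool` here also keeps later aggregations such as `sum(baby_friendly_flags)` well-defined.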


In [14]:
# Test the function on first event
test_event = df.iloc[0]
result = extract_baby_friendly(test_event['title'], test_event['description'])

print(f"Test event: {test_event['title']}")
print(f"Baby-friendly: {result['baby_friendly']}")
print(f"Indoor/Outdoor: {result['indoor_or_outdoor']}")
print(f"Clean description: {result['clean_summary']}")
print("\n✅ Test successful!")


Test event: The NY Comedy Festival
Baby-friendly: False
Indoor/Outdoor: indoor
Clean description: The New York Comedy Festival features over 200 comedians performing more than 100 shows across various venues in the five boroughs. Notable acts include comedy legends and a reunion of the cast from 'Strangers with Candy.' The festival runs from November 7 to November 16, 2025.

✅ Test successful!
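A single call like the one above can fail transiently (rate limits, timeouts), and `extract_baby_friendly` currently turns any failure into a fallback record. One possible alternative is a retry wrapper with exponential backoff, sketched below. `with_retries` is a hypothetical helper; because `extract_baby_friendly` swallows exceptions, the wrapper would need to go around the raw `client.chat.completions.create` call to have any effect.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with delays of base_delay * 2**k."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Demo with a fake call that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays: list[float] = []  # record delays instead of actually sleeping
result = with_retries(flaky, max_attempts=4, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

In practice it is usually better to retry only on the specific exception types that signal a transient failure; catching `Exception` keeps the sketch short.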


In [17]:
# Parallelize event processing for faster extraction
# Using ThreadPoolExecutor for I/O-bound API calls
# Limit concurrent requests to avoid rate limits (8 workers is a conservative
# starting point; actual limits depend on your OpenAI account tier)

def process_event(row_tuple):
    """Wrapper function for parallel processing."""
    idx, row = row_tuple
    try:
        result = extract_baby_friendly(row['title'], row['long_description'])
        return idx, result
    except Exception as e:
        print(f"Error processing event {idx} ({row['title']}): {e}")
        return idx, {
            "baby_friendly": False,
            "indoor_or_outdoor": "indoor",
            "clean_summary": f"{row['title']}. {row['long_description'][:200]}..."
        }

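# Prepare data for parallel processing
# Use list positions (not DataFrame index labels) so results can be written
# into the pre-sized lists below even if df has a non-default index
event_rows = [(pos, row) for pos, (_, row) in enumerate(df.iterrows())]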

# Process events in parallel with progress bar
print(f"Processing {len(event_rows)} events in parallel...")
print("Using 8 concurrent workers (adjust if you hit rate limits)\n")

baby_friendly_flags = [None] * len(df)
indoor_outdoor_flags = [None] * len(df)
clean_summaries = [None] * len(df)

# Use ThreadPoolExecutor for parallel API calls
with ThreadPoolExecutor(max_workers=8) as executor:
    # Submit all tasks
    future_to_idx = {executor.submit(process_event, row_tuple): row_tuple[0] 
                     for row_tuple in event_rows}
    
    # Process completed tasks with progress bar
    for future in tqdm(as_completed(future_to_idx), total=len(event_rows), desc="Processing events"):
        idx, result = future.result()
        baby_friendly_flags[idx] = result['baby_friendly']
        indoor_outdoor_flags[idx] = result['indoor_or_outdoor']
        clean_summaries[idx] = result['clean_summary']

# Add to dataframe
df['baby_friendly'] = baby_friendly_flags
df['indoor_or_outdoor'] = indoor_outdoor_flags
df['clean_summary'] = clean_summaries

print(f"\n✅ Processing complete!")
print(f"Baby-friendly events: {sum(baby_friendly_flags)} out of {len(df)} ({sum(baby_friendly_flags)/len(df)*100:.1f}%)")
print(f"\nIndoor/Outdoor breakdown:")
print(f"  Indoor: {indoor_outdoor_flags.count('indoor')}")
print(f"  Outdoor: {indoor_outdoor_flags.count('outdoor')}")
print(f"  Both: {indoor_outdoor_flags.count('both')}")

Processing 90 events in parallel...
Using 8 concurrent workers (adjust if you hit rate limits)



Processing events: 100%|██████████| 90/90 [00:31<00:00,  2.84it/s]


✅ Processing complete!
Baby-friendly events: 57 out of 90 (63.3%)

Indoor/Outdoor breakdown:
  Indoor: 64
  Outdoor: 13
  Both: 13
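Because `as_completed` yields results in completion order, each result has to be written back to the slot of the input that produced it. That is only safe when the slot is a list position rather than a DataFrame index label, since the two diverge once rows are filtered or deduplicated. A minimal, generic sketch of the pattern follows; `parallel_map_ordered` is a hypothetical helper and the uppercase function stands in for the API call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_map_ordered(fn, items, max_workers=8):
    """Apply fn to each item in parallel; return results in input order."""
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future to the list position of its input.
        future_to_pos = {
            executor.submit(fn, item): pos for pos, item in enumerate(items)
        }
        for future in as_completed(future_to_pos):
            results[future_to_pos[future]] = future.result()
    return results


# Index labels 10, 42, 7 are deliberately non-contiguous, as they would be
# after filtering a DataFrame; positions 0, 1, 2 are still valid list slots.
labelled_rows = [(10, "a"), (42, "b"), (7, "c")]
print(parallel_map_ordered(lambda row: row[1].upper(), labelled_rows, max_workers=2))
# ['A', 'B', 'C']
```

`executor.map` gives the same ordering guarantee with less code, but it does not combine as easily with a `tqdm` progress bar over completed futures.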





In [18]:
# Preview some baby-friendly events
baby_friendly_events = df[df['baby_friendly'] == True]
print(f"Sample baby-friendly events:")
baby_friendly_events[['title', 'baby_friendly']].head(10)


Sample baby-friendly events:


| | title | baby_friendly |
|---|---|---|
| 1 | The Other Art Fair Brooklyn | True |
| 2 | Canstruction | True |
| 3 | Queer History Walking Tour | True |
| 4 | Cheese Week | True |
| 6 | FAD Market | True |
| 8 | "CHROMA: Tales Between Hues" immersive exhibit | True |
| 9 | "Renoir’s Drawings" at The Morgan | True |
| 10 | "Rauschenberg’s New York: Pictures from the Re... | True |
| 12 | Grand Bazaar NYC | True |
| 14 | "Monet and Venice" at Brooklyn Museum | True |


## 3. Create Embeddings

Generate semantic embeddings using OpenAI's `text-embedding-3-small` model.

**✅✅✅ Embedding Strategy:**
- Combine title + LLM-generated clean summary for rich semantic representation
- Use `text-embedding-3-small`, which returns 1536-dimensional vectors by default
- Event-level chunks: 1 event = 1 embedding
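To make the "one event = one embedding" idea concrete, here is a toy sketch of how a query vector would be ranked against event vectors. It assumes cosine similarity as the metric, which this part of the notebook does not confirm, and the 3-dimensional vectors and event names are invented for illustration (real embeddings have 1536 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm_a * norm_b)


def rank_events(query_vec, event_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in event_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors: dimension 0 ~ "calm", dimension 1 ~ "lively".
toy_events = {
    "quiet gallery": [0.9, 0.1, 0.0],
    "rooftop party": [0.1, 0.9, 0.0],
    "failed embedding": [0.0, 0.0, 0.0],
}
ranking = rank_events([1.0, 0.0, 0.0], toy_events)
print([name for name, _ in ranking])
# ['quiet gallery', 'rooftop party', 'failed embedding']
```

The zero-vector case shows why placeholder vectors are risky in a search index: they carry no direction, so any score assigned to them is arbitrary.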


In [19]:
def create_embedding(text: str) -> list[float]:
    """
    Create embedding for text using OpenAI.
    
    Args:
        text: Text to embed
    
    Returns:
        list[float]: 1536-dimensional embedding vector
    """
    try:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding
    except Exception as e:
        print(f"Error creating embedding: {e}")
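        # Zero vector on error. Note: a zero vector has no direction, so its
        # cosine similarity is undefined; filter these out before indexing.
        return [0.0] * 1536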

print("✅ Embedding function defined!")


✅ Embedding function defined!
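`create_embedding` sends one request per event. The embeddings endpoint also accepts a list of strings as `input`, so requests can be batched. The sketch below shows the batching logic only: `embed_batch` is an injected, hypothetical callable so the code runs without an API key, and failed batches are marked with `None` rather than a placeholder vector.

```python
from typing import Callable, Optional, Sequence

Vector = list[float]


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[Vector]],
    batch_size: int = 32,
) -> list[Optional[Vector]]:
    """Embed texts batch by batch; entries of a failed batch become None."""
    vectors: list[Optional[Vector]] = []
    for batch in chunked(texts, batch_size):
        try:
            vectors.extend(embed_batch(batch))
        except Exception as exc:
            print(f"Batch of {len(batch)} failed: {exc}")
            vectors.extend([None] * len(batch))
    return vectors


# With the notebook's client, embed_batch could plausibly look like this
# (an assumption, not taken from the notebook):
#   def embed_batch(batch):
#       response = client.embeddings.create(model="text-embedding-3-small", input=batch)
#       return [item.embedding for item in response.data]


def fake_embed_batch(batch: list[str]) -> list[Vector]:
    """Stand-in for the API: one-dimensional 'embedding' = text length."""
    if any(not text for text in batch):
        raise ValueError("empty input")
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "", "dddd"], fake_embed_batch, batch_size=2))
# prints "Batch of 2 failed: empty input", then [[1.0], [2.0], None, None]
```

Rows whose vector is `None` can then be retried or dropped explicitly before anything is written to the vector database.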


In [20]:
# Create text for embedding (title + clean summary)
df['embedding_text'] = df['title'] + ". " + df['clean_summary']


# Generate embeddings for all events
print("Generating embeddings for all events...")
print("This will take ~1-2 minutes for 100 events.\n")

embeddings = []

for idx, row in tqdm(df.iterrows(), total=len(df), desc="Creating embeddings"):
    # Print and embed the same string so the log matches what gets indexed
    print(row['embedding_text'])
    embedding = create_embedding(row['embedding_text'])
    embeddings.append(embedding)
    
    # Rate limiting
    time.sleep(0.05)

# Convert to numpy array
embeddings_array = np.array(embeddings)

print(f"\n✅ Created {len(embeddings)} embeddings")
print(f"Embedding shape: {embeddings_array.shape}")


Generating embeddings for all events...
This will take ~1-2 minutes for 100 events.



Creating embeddings:   0%|          | 0/90 [00:00<?, ?it/s]

The NY Comedy Festival. The New York Comedy Festival features over 200 comedians performing across 100 shows at various iconic venues in NYC. The festival includes stand-up, improv, panels, and live podcast tapings. Notable events include a reunion of the cast from 'Strangers with Candy' and showcases for emerging talent.


Creating embeddings:   1%|          | 1/90 [00:00<00:55,  1.60it/s]

The Other Art Fair Brooklyn. The Other Art Fair Brooklyn features original artworks from over 130 independent and emerging artists. Attendees can meet artists and explore various media including painting, photography, and sculpture. The event includes interactive installations and immersive experiences.


Creating embeddings:   2%|▏         | 2/90 [00:01<00:47,  1.86it/s]

Canstruction. Canstruction is an annual competition where architecture teams create large art installations using canned food. The event takes place at Brookfield Place in Manhattan from October 30 to November 10, showcasing 23 unique sculptures. Attendees can view the installations and vote for their favorites, with all cans donated to City Harvest to help combat hunger.


Creating embeddings:   3%|▎         | 3/90 [00:01<00:53,  1.64it/s]

Queer History Walking Tour. The Queer History Walking Tour explores the history of queer communities in Manhattan's East Village and Lower East Side. It is a two-hour tour with six stops that change based on the guide and current events. The tour emphasizes the evolution of queer spaces in New York.


Creating embeddings:   4%|▍         | 4/90 [00:02<00:44,  1.94it/s]

Cheese Week. New York Cheese Week is a three-week celebration of cheese and wine featuring over 15 tastings and 18 participating restaurants. The event includes culinary experiences such as masterclasses and tastings at various locations throughout the city. Attendees can enjoy cheese-centric menus and learn about different cheese varieties.


Creating embeddings:   6%|▌         | 5/90 [00:02<00:39,  2.14it/s]

Immigrant Jam. Immigrant Jam is a comedy show featuring stand-up, storytelling, and sketch performances focused on immigration themes. The event includes various comedians from diverse backgrounds, representing countries such as Australia, China, and Germany. It takes place at Caveat and includes games and prizes.


Creating embeddings:   7%|▋         | 6/90 [00:02<00:38,  2.18it/s]

FAD Market. FAD Market is a curated pop-up marketplace featuring independent vendors and creators. Attendees can shop for handmade jewelry, apparel, skincare products, and artisanal food. The event is free to enter and welcomes dogs.


Creating embeddings:   8%|▊         | 7/90 [00:03<00:38,  2.15it/s]

Amber Georgia Wine Fair. The Amber Georgia Wine Fair is a wine festival featuring over 30 winemakers from the Republic of Georgia. Attendees can taste various Georgian amber wines and enjoy food and music that reflect the culture of Tbilisi. The event will take place at Brooklyn's Industry City.


Creating embeddings:   9%|▉         | 8/90 [00:03<00:35,  2.30it/s]

"CHROMA: Tales Between Hues" immersive exhibit. CHROMA: Tales Between Hues is an immersive exhibit at Genesis House that explores Korean folktales through vibrant color installations. Attendees will walk through six themed areas, each representing different stories and emotions associated with the Obangsaek color spectrum. The exhibition is open for free from Tuesday to Sunday, showcasing a unique blend of art and cultural storytelling.


Creating embeddings:  10%|█         | 9/90 [00:04<00:37,  2.15it/s]

"Renoir’s Drawings" at The Morgan. The event is an exhibition titled 'Renoir’s Drawings' at The Morgan Library & Museum, showcasing over 100 works on paper by Pierre-Auguste Renoir. It includes pastels, watercolors, prints, and a plaster sculpture, highlighting the artist's range and sensitivity. The exhibition runs from October 17, 2025, to February 8, 2026, and features related programming such as concerts, talks, and a watercolor workshop.


Creating embeddings:  11%|█         | 10/90 [00:04<00:35,  2.26it/s]

"Rauschenberg’s New York: Pictures from the Real World". The event is an exhibition titled 'Robert Rauschenberg’s New York: Pictures from the Real World' at the Museum of the City of New York. It explores Rauschenberg's integration of photography and found objects, showcasing his work from 1963 to 1994. The exhibition is organized into three sections that trace the evolution of his photographic practice.


Creating embeddings:  12%|█▏        | 11/90 [00:05<00:34,  2.31it/s]

Oratorio for Living Things. Oratorio for Living Things is a musical exploration of existence by Heather Christian, featuring a blend of classical choral music with pop, blues, and gospel elements. The performance includes a dozen vocalists and instrumentalists, engaging the audience in themes of time, evolution, and cosmology. The show runs for 90 minutes without an intermission and is designed for full engagement with its complex harmonies and thought-provoking lyrics.


Creating embeddings:  13%|█▎        | 12/90 [00:05<00:34,  2.25it/s]

Grand Bazaar NYC. Grand Bazaar NYC is a large marketplace featuring over 150 local merchants selling vintage items, antiques, and handmade goods. The market operates on Sundays from 10am to 5pm and includes both indoor and outdoor spaces. Each week has a different theme, highlighting various types of vendors and products.


Creating embeddings:  14%|█▍        | 13/90 [00:06<00:34,  2.24it/s]

"Peter Hujar’s Day" movie. The event is a screening of the film 'Peter Hujar’s Day,' which explores the life of New York photographer Peter Hujar through a conversation with his friend Linda Rosenkrantz. The film is adapted from a lost interview transcript and showcases Hujar's interactions with notable figures in the 1970s and 1980s art scene. It highlights themes of friendship and the impact of the HIV/AIDS crisis on artists.


Creating embeddings:  17%|█▋        | 15/90 [00:06<00:25,  2.96it/s]

"Monet and Venice" at Brooklyn Museum. The Brooklyn Museum's exhibit 'Monet and Venice' features a collection of Claude Monet's paintings inspired by Venice, including 19 of his works. The exhibit includes multi-sensory elements such as an original symphonic score and immersive film clips. It explores Monet's artistic journey in Venice and showcases additional works by other artists influenced by the city.
"Ragtime". Ragtime is a Broadway musical that explores American life in the early 20th century through the stories of three fictional families and real historical figures. The production features a large cast and an extensive orchestra, showcasing a stirring score and intricate storytelling. It is directed by Lear deBessonet and is a revival of the original 1998 musical.


Creating embeddings:  18%|█▊        | 16/90 [00:06<00:24,  2.97it/s]

"Robert Rauschenberg: Life Can't Be Stopped" at the Guggenheim. The Guggenheim Museum is hosting an exhibition titled 'Robert Rauschenberg: Life Can't Be Stopped,' featuring over a dozen historic pieces by the artist. The show highlights Rauschenberg's experiments with photographs in various media, including his notable 32-foot-long silkscreen painting 'Barge.' The exhibition runs from October 10, 2025, to April 5, 2026.


Creating embeddings:  19%|█▉        | 17/90 [00:07<00:22,  3.24it/s]

"Dress, Dreams and Desire: Fashion and Psychoanalysis" at The Museum at FIT. The event is an exhibition titled 'Dress, Dreams and Desire: Fashion and Psychoanalysis' at The Museum at FIT. It explores the relationship between fashion, gender, and psychoanalysis through a display of nearly 100 items of dress. The exhibition examines historical and contemporary themes related to self-image and nonbinary fashion.


Creating embeddings:  20%|██        | 18/90 [00:07<00:22,  3.23it/s]

"Predator: Badlands" movie. Predator: Badlands is a new installment in the Predator movie franchise directed by Dan Trachtenberg. The film follows Dek, a Yautja who is sent to an alien planet to prove his strength by hunting a powerful creature. It features a mix of action and dark humor, with themes of survival and teamwork.


Creating embeddings:  21%|██        | 19/90 [00:07<00:23,  2.97it/s]

"The Gay Harlem Renaissance" at The New York Historical. The event is an exhibit titled 'The Gay Harlem Renaissance' at The New York Historical. It explores the contributions of Black LGBTQ+ artists, writers, and performers during the Harlem Renaissance. The exhibit features over 200 objects, including paintings, photographs, and documents, highlighting the creativity and resilience of Black LGBTQ+ individuals in early 20th century Harlem.


Creating embeddings:  22%|██▏       | 20/90 [00:08<00:28,  2.49it/s]

Reposo y Recuerdo (Rest and Remember) at Green-Wood Cemetery. Reposo y Recuerdo is a Día de Muertos ofrenda installation at Green-Wood Cemetery, featuring a community altar where visitors can bring photos and keepsakes. The installation includes handwoven hammocks and colorful decorations, and it will be on view in the historic chapel. The event encourages personal remembrance and will culminate in a family celebration with crafts, music, and activities on November 1.


Creating embeddings:  23%|██▎       | 21/90 [00:09<00:33,  2.06it/s]

"Christy" film. The film 'Christy' is a boxing biopic directed by David Michôd, focusing on the life of prizefighter Christy Martin. It explores her rise in the boxing world and the personal struggles she faced, including domestic abuse. The film features strong performances, particularly by Sydney Sweeney as Christy and Ben Foster as her husband.


Creating embeddings:  24%|██▍       | 22/90 [00:09<00:27,  2.44it/s]

"Monet and Venice" at Brooklyn Museum. The Brooklyn Museum's exhibit 'Monet and Venice' features a collection of Claude Monet's paintings inspired by Venice, including 19 of his works. The exhibit includes multi-sensory elements such as an original symphonic score and immersive film clips. It explores Monet's artistic journey in Venice and showcases additional works by other artists influenced by the city.


Creating embeddings:  26%|██▌       | 23/90 [00:09<00:25,  2.67it/s]

"Divine Egypt" at The Met. The Met's exhibition 'Divine Egypt' showcases Egyptian art focusing on images of gods from ancient Egypt. It features over 3,000 years of history and includes statues and figurines representing 25 main deities. The exhibit explores the complex nature of these deities and their significance in daily worship.


Creating embeddings:  28%|██▊       | 25/90 [00:10<00:18,  3.48it/s]

Classic Harbor Line Fall Foliage Tours. Classic Harbor Line offers scenic fall foliage sails on vintage vessels from October 4 to November 16. Attendees can enjoy views of the changing leaves along the Hudson River and see landmarks like the George Washington Bridge and the Cloisters. The experience includes options for guided tours with narration and light lunches.
The Future Was Then: The Changing Face of Fascist Italy. The event is an exhibition titled 'The Future Was Then: The Changing Face of Fascist Italy' at Poster House. It runs from September 27 until February 22, 2026, and features 75 pieces from the Fondazione Massimo e Sonia Cirulli. The exhibition explores the rise of Benito Mussolini's regime and the role of art in propaganda.


Creating embeddings:  29%|██▉       | 26/90 [00:10<00:19,  3.34it/s]

Blazing A Trail: Dorothy Waugh’s National Parks Posters. The event is an exhibition titled 'Blazing A Trail: Dorothy Waugh’s National Parks Posters' at Poster House. It features 17 travel posters designed by Dorothy Waugh for the National Park Service's first poster campaign from 1934 to 1936. The exhibition runs from September 27 until February 22, 2026.


Creating embeddings:  30%|███       | 27/90 [00:10<00:17,  3.57it/s]

"Witnessing Humanity: The Art of John Wilson" at the Met. The event is an exhibition of the works of American artist John Wilson at The Met. It features over 100 pieces including paintings, prints, drawings, and sculptures that reflect his experiences as a Black American artist. The exhibition runs from September 20 to February 8, 2026.


Creating embeddings:  31%|███       | 28/90 [00:10<00:16,  3.82it/s]

Handmade Happy Hour. Handmade Happy Hour is a pottery class held at Greenwich House Pottery. Participants can create pottery pieces with guidance from an expert teaching artist. Classes are BYOB and run from 7:30 to 9:30pm.


Creating embeddings:  32%|███▏      | 29/90 [00:11<00:19,  3.15it/s]

Ai Weiwei's Camouflage. Ai Weiwei's installation 'Camouflage' is located at Franklin D. Roosevelt Four Freedoms State Park. The artwork features camouflage netting and allows visitors to contribute reflections on freedom. The installation is on view from September 10, 2023, through December 1, 2025.


Creating embeddings:  34%|███▍      | 31/90 [00:11<00:15,  3.81it/s]

"The New York Sari" exhbit. The New York Sari exhibition at the New York Historical Society explores the history and cultural significance of the sari from the Indian subcontinent to New York City. It features over 50 objects, photographs, and ephemera, highlighting the garment's role in identity and community. The exhibition includes family-friendly programming and a dedicated family guide.
"Man Ray: When Objects Dream" at the Met. The event is an exhibition of Man Ray's works, including over 60 rayographs and 100 other artistic pieces. It showcases his innovative manipulation of objects, light, and media. The exhibition is on display at The Met from September 14 until February 2026.


Creating embeddings:  36%|███▌      | 32/90 [00:12<00:18,  3.17it/s]

"Masquerade" immersive experience. Masquerade is an immersive theater experience based on The Phantom of the Opera. Audiences are guided through multiple locations in a midtown complex, experiencing a reimagined narrative focused on the relationship between the Phantom and Christine. The event features simultaneous performances with multiple actors in various roles, creating a complex and coordinated theatrical experience.


Creating embeddings:  37%|███▋      | 33/90 [00:12<00:18,  3.05it/s]

An Ecology of Quilts: The Natural History of American Textiles. The American Folk Art Museum is hosting an exhibition titled 'An Ecology of Quilts: The Natural History of American Textiles.' The exhibit features approximately 30 quilts from the 18th to 20th centuries, focusing on the relationship between the environment and quilting practices. It explores the origins of textile production and its impact on quiltmaking in American culture.


Creating embeddings:  38%|███▊      | 34/90 [00:13<00:21,  2.60it/s]

"Amaze" magic show. The event is a magic show featuring British conjurer Jamie Allan, known as iMagician. It combines traditional magic tricks with modern technology and storytelling. The show is directed by Jonathan Goodwin and co-created with Tommy Bond.


Creating embeddings:  39%|███▉      | 35/90 [00:13<00:19,  2.78it/s]

Latin Mix Saturdays. Latin Mix Saturdays is a rooftop party featuring DJ Torres playing reggaetón, salsa, merengue, bachata, and Latin house music. Attendees can enjoy specialty cocktails while dancing under the stars with views of the NYC skyline. Seating is available on a first-come, first-served basis.


Creating embeddings:  40%|████      | 36/90 [00:13<00:18,  2.91it/s]

The New Yorker in Dog Years. The event is an exhibition titled 'The New Yorker in Dog Years' at the AKC Museum of the Dog. It features 44 dog-themed covers from The New Yorker magazine, celebrating its centennial anniversary. Attendees can view the artwork and learn about the stories behind each cover.


Creating embeddings:  41%|████      | 37/90 [00:13<00:17,  3.12it/s]

Arte Museum. The Arte Museum features immersive digital art installations, soundscapes, and custom fragrances. Attendees can expect a multi-sensory experience inspired by nature. The event lasts about an hour and a half and is located at 61 Chelsea Piers.


Creating embeddings:  42%|████▏     | 38/90 [00:14<00:17,  3.03it/s]

"Slavery Trails" augmented reality exhibition. The 'Slavery Trails' augmented reality exhibition features four installations across New York City that educate the public about the history of slavery. The exhibitions include sculptures and augmented reality experiences that can be accessed via mobile devices. Locations include City Hall Park in Manhattan, Bush Terminal Park in Brooklyn, and Astoria Park in Queens.


Creating embeddings:  43%|████▎     | 39/90 [00:14<00:18,  2.79it/s]

RaaaatScraps: The Best Improv Show in the World. RaaaatScraps is a weekly improv comedy show at Caveat NYC. A guest monologist shares true stories, which are then used by performers to create improvised scenes. The event features notable NYC improvisers and aims to provide entertainment on Sundays.


Creating embeddings:  46%|████▌     | 41/90 [00:15<00:15,  3.20it/s]

"Fragile Giants" sculptures on Park Avenue. The event features a public art exhibition titled 'Fragile Giants' along Park Avenue, showcasing nine monumental animal sculptures by artist Michel Bassompierre. The sculptures, including a large gorilla and a polar bear, aim to raise awareness about endangered species and the beauty of nature. This open-air gallery is free to visit and will be on display until May 11, 2026.
City Climb at Edge. City Climb at Edge is an external building climb on the observation deck of Edge, which is the tallest in the Western Hemisphere. Participants are secured by cables as they ascend 32 steps to 'The Cliff' and 151 steps on a 45-degree incline to 'The Stair.' The experience culminates with a viewing area on the 100th floor.


Creating embeddings:  47%|████▋     | 42/90 [00:15<00:14,  3.40it/s]

"Aunties" sculptures in Harlem. The event features the installation of three colorful sculptures titled 'Aunties' by artist Fitgi Saint-Louis, located at the intersection of 124th Street and Lenox Avenue in Harlem. These figures honor the contributions of women in the community and are designed to invite interaction. The installation is part of a larger initiative to bring public art to urban environments.


Creating embeddings:  48%|████▊     | 43/90 [00:15<00:13,  3.50it/s]

Van Gogh’s Flowers at New York Botanical Garden. Van Gogh’s Flowers is an exhibition at the New York Botanical Garden that showcases the works of Vincent van Gogh through living installations and floral sculptures. The event features a field of sunflowers, contemporary interpretations of his art, and interactive programming such as plein air painting sessions and LEGO pop-ups. The exhibition runs from May 24 to October 26 and includes both indoor and outdoor components.


Creating embeddings:  49%|████▉     | 44/90 [00:16<00:13,  3.37it/s]

Torch & Crown's open-air beer garden. Torch & Crown Brewing Company is hosting an open-air beer garden in Union Square Park from May 1 to early November. The venue features a variety of locally-made beers and seasonal food options sourced from the nearby Union Square Market. The space includes both indoor and outdoor areas, and offers live music and trivia events throughout the summer.


Creating embeddings:  50%|█████     | 45/90 [00:16<00:14,  3.02it/s]

"Path of Liberty: That Which Unites US". Path of Liberty: That Which Unites US is an immersive art exhibition in Midtown East, featuring large screens that display the voices of 50 Americans discussing themes of democracy and unity. The installation spans 6 acres and includes winding paths for visitors to explore personal stories and historical context. It is free to visit and open Thursday through Saturday evenings, with additional public viewing hours on other days.


Creating embeddings:  51%|█████     | 46/90 [00:16<00:13,  3.23it/s]

“Gardens of Renewal” by Lily Kwong. The Gardens of Renewal is a meditative spiral pathway located in Madison Square Park, featuring native plant species and areas for reflection. The event includes a Children's Garden with play structures and educational programming for kids. Visitors can engage with nature, enjoy performances, and participate in free public programs throughout the summer.


Creating embeddings:  52%|█████▏    | 47/90 [00:16<00:11,  3.59it/s]

“Foot Fountain (pink)”. The event features Mika Rottenberg's sculpture 'Foot Fountain (pink)' located at the 30th Street entrance of the High Line. This 10-foot tall sculpture includes a working sprinkler that can be activated by pedals, providing an interactive experience for passersby. The sculpture is part of a series of artworks displayed along the elevated park.


Creating embeddings:  53%|█████▎    | 48/90 [00:17<00:14,  2.97it/s]

Rooftop Cinema Club. Rooftop Cinema Club is a film series held on a rooftop in midtown Manhattan. Attendees can watch movies while enjoying views of the skyline, with options for seating and movie snacks available. The event features a variety of films and is subject to weather conditions.


Creating embeddings:  54%|█████▍    | 49/90 [00:17<00:13,  3.02it/s]

"Urban Stomp: Dreams & Defiance on the Dance Floor". The event is an exhibition at the Museum of the City of New York that explores the history of social dance in the city over the past 200 years. It features artifacts, digital dance lessons, and an interactive dance floor. Attendees can learn about various dance styles and their cultural significance.


Creating embeddings:  56%|█████▌    | 50/90 [00:18<00:13,  2.97it/s]

BQ Flea. BQ Flea is a weekly flea market held every Sunday from 10am to 5pm under the Brooklyn-Queens Expressway. Vendors sell vintage items, collectibles, and handmade goods from their cars. The market also features local food options and fosters a community atmosphere.


Creating embeddings:  57%|█████▋    | 51/90 [00:18<00:13,  2.95it/s]

"New York Roots" in the Garment District. The event features a public art installation titled 'New York Roots' by artist Steve Tobin, located in the Garment District. It consists of seven large metal sculptures that invite visitors to walk among them and reflect on community and connection. The installation is accessible to the public and will be on display until February 2026.


Creating embeddings:  58%|█████▊    | 52/90 [00:19<00:15,  2.44it/s]

“Mission: Impossible—Story and Spectacle” at MOMI. The event is an exhibition at the Museum of the Moving Image focused on the Mission: Impossible film franchise. It features sections dedicated to each film, highlighting stunts and action sequences. The exhibition includes behind-the-scenes content related to the series.


Creating embeddings:  59%|█████▉    | 53/90 [00:19<00:12,  2.85it/s]

Smorgasburg. Smorgasburg is an outdoor food festival featuring over 70 local vendors. It takes place in Lower Manhattan, Williamsburg, and Prospect Park from April through October. Attendees can enjoy a variety of cuisines and dishes from immigrant and family-owned businesses.


Creating embeddings:  60%|██████    | 54/90 [00:19<00:11,  3.05it/s]

"Breaking the Mold: Brooklyn Museum at 200" exhibit. The Brooklyn Museum is hosting an exhibition titled 'Breaking the Mold: Brooklyn Museum at 200' to celebrate its 200th anniversary. The exhibit features a wide range of artworks from its collection, including paintings, sculptures, and photographs that highlight the museum's history and community. It is open to the public and runs through February 22, 2026.


Creating embeddings:  61%|██████    | 55/90 [00:19<00:10,  3.35it/s]

WWII exhibit at the Intrepid Museum. The WWII exhibit at the Intrepid Museum showcases the history of the USS Intrepid and its crew during World War II. It features 50 artifacts, crew member oral histories, and a newly acquired FG-1D Corsair aircraft. Visitors can learn about the ship's evolution and the experiences of service members through various displays and multimedia presentations.


Creating embeddings:  62%|██████▏   | 56/90 [00:19<00:09,  3.54it/s]

"Stranger Things: The First Shadow" on Broadway. Stranger Things: The First Shadow is a theatrical prequel to the popular Netflix series, now showing on Broadway. The play explores the backstory of Henry Creel, a troubled teen connected to a secret government experiment. It features a large cast and incorporates supernatural themes, with a focus on character development and relationships.


Creating embeddings:  63%|██████▎   | 57/90 [00:20<00:09,  3.44it/s]

"Just in Time". Just in Time is a Broadway review featuring the music and life of pop crooner Bobby Darin, performed by Jonathan Groff. The show combines live music, dance, and a narrative that explores Darin's life from his childhood to his untimely death. It transforms the theater into a nightclub setting, offering an engaging and dynamic performance.


Creating embeddings:  64%|██████▍   | 58/90 [00:20<00:10,  3.09it/s]

Laser tag at mini-bowling at Area 53. Area 53 offers laser tag and mini-bowling in a two-story playscape designed for both kids and adults. Participants can engage in laser tag matches lasting 10-15 minutes and enjoy mini-bowling sessions for up to six people. The venue also features arcade games, making it a family-friendly entertainment option.


Creating embeddings:  66%|██████▌   | 59/90 [00:21<00:10,  2.87it/s]

INTER's interstellar immersive experience. INTER is an immersive multi-sensory museum experience located in Soho. It features over 10 interactive exhibits that use light, sound, and digital projection to create a space-themed adventure. Attendees can explore installations like a floral tunnel, an infinity room, and participate in a space scavenger hunt.


Creating embeddings:  67%|██████▋   | 60/90 [00:21<00:10,  2.93it/s]

"The New Yorker" exhibit. The New York Public Library is hosting an exhibit titled 'A Century of The New Yorker' to celebrate the magazine's 100th anniversary. The exhibit showcases the magazine's history through old covers, rare manuscripts, photographs, and cartoon art. Visitors can explore the evolution of one of America's most influential publications.


Creating embeddings:  68%|██████▊   | 61/90 [00:21<00:09,  3.08it/s]

The newly renovated Rockefeller Wing at The Met. The Metropolitan Museum of Art has reopened its Rockefeller Wing, featuring galleries dedicated to the arts of Africa, the Ancient Americas, and Oceania. The reopening includes a daylong festival with performances, live music, and art-making activities. The galleries showcase 1,800 artworks and new features for deeper engagement with the collections.


Creating embeddings:  69%|██████▉   | 62/90 [00:22<00:09,  2.94it/s]

Wicked-themed Book-a-Bath at Lush Spa. The Wicked-themed Book-a-Bath at Lush Spa offers a unique bathing experience inspired by the musical. Attendees can enjoy a private soak in a clawfoot tub with themed bath products and music. The spa provides a cozy atmosphere with personalized skincare consultations and a selection of Wicked-themed items.


Creating embeddings:  70%|███████   | 63/90 [00:22<00:08,  3.12it/s]

"Maybe Happy Ending" on Broadway. Maybe Happy Ending is a musical set in a near-future Korea, focusing on two Helperbots, Oliver and Claire, as they explore love and connection. The show features a blend of old-fashioned jazz standards and modern songs, highlighting the emotional journey of the characters. It runs for 1 hour and 45 minutes without an intermission.


Creating embeddings:  71%|███████   | 64/90 [00:22<00:08,  2.98it/s]

The Birch Trials exhibit at Fraunces Tavern. The Birch Trials exhibit at Fraunces Tavern Museum explores the historical evacuation of Black Loyalists at the end of the Revolutionary War. It features archival documents and recreations of the committee's meetings that determined the eligibility of individuals to evacuate with the British Army. The exhibit aims to educate visitors about this significant yet lesser-known aspect of American history.


Creating embeddings:  72%|███████▏  | 65/90 [00:23<00:10,  2.39it/s]

Catacombs by Candlelight Tour. The Catacombs by Candlelight Tour is an 80-minute guided experience at the Basilica of St. Patrick’s Old Cathedral. Participants will explore areas not open to the public, including cemeteries and the Catacombs. The tour is conducted by candlelight, providing a unique atmosphere.


Creating embeddings:  73%|███████▎  | 66/90 [00:23<00:09,  2.58it/s]

"The Subway Is..." exhibit. The event is an exhibition titled 'The Subway Is...' at the New York Transit Museum, focusing on the history and future of the subway system. It features artifacts, photos, multimedia installations, and train models. Attendees can explore how the subway transformed New York City since its opening in 1904.


Creating embeddings:  74%|███████▍  | 67/90 [00:24<00:09,  2.46it/s]

Friday Night Vibes at Time Out Market. Friday Night Vibes is an event at Time Out Market New York featuring music from DJs on the rooftop. Attendees can enjoy specialty cocktails and food from various kitchens while taking in views of the East River and NYC skyline. The event occurs on the fifth floor at 7pm on select Fridays.


Creating embeddings:  76%|███████▌  | 68/90 [00:24<00:07,  2.78it/s]

Xanadu Roller Arts. Xanadu Roller Arts is a new wooden roller skating rink and concert venue in NYC. It features a rinkside bar and a unique bathroom DJ setup. The venue is designed to accommodate up to 1,200 people.


Creating embeddings:  77%|███████▋  | 69/90 [00:24<00:06,  3.01it/s]

Second City comedy club. Second City is a comedy club in Brooklyn that features live sketch and improv performances every night. The venue offers two main shows: The First City Revue and The Best of The Second City, showcasing original and classic material. It also includes a restaurant and bar with a diverse menu and in-show dining options.


Creating embeddings:  78%|███████▊  | 70/90 [00:25<00:06,  2.89it/s]

Grand Bazaar NYC. Grand Bazaar NYC is a marketplace featuring over 150 local merchants selling vintage items, antiques, and handmade goods. The event takes place on Sundays from 10am to 5pm on the Upper West Side, with a different theme each week. It supports local public schools by donating 100% of booth rental profits.


Creating embeddings:  79%|███████▉  | 71/90 [00:25<00:05,  3.25it/s]

Puttery. Puttery is an adults-only mini-golf venue located in the Meatpacking District. It features two nine-hole courses with unique designs and includes an underground lounge and three bars. Guests can enjoy elevated bar food and creative cocktails while playing golf.


Creating embeddings:  80%|████████  | 72/90 [00:25<00:04,  3.61it/s]

The Big Bubble Experiment at NYSCI. The Big Bubble Experiment is an interactive exhibit at the New York Hall of Science where visitors can explore the science of bubbles through various hands-on activities. It features 10 stations with different tools for blowing, stretching, and popping bubbles, encouraging experimentation and observation. The exhibit is designed for kids of all ages and is included with general admission.


Creating embeddings:  81%|████████  | 73/90 [00:25<00:05,  3.07it/s]

The Friends Experience. The Friends Experience is an immersive, walk-through exhibit located in the Flatiron District. Attendees can interact with props from the show, take photos at various iconic locations, and view displays about the show's costume design and art. The event features both paid and free photo opportunities, making it accessible for fans of all ages.


Creating embeddings:  82%|████████▏ | 74/90 [00:26<00:05,  3.05it/s]

Chamber Magic. Chamber Magic features Steve Cohen performing high-class parlor magic in the Madison Room at the Lotte New York Palace. The event requires cocktail attire and includes a variety of magic routines, including signature acts and a card-trick finale. Tickets are available for purchase, with options for meet-and-greet experiences.


Creating embeddings:  83%|████████▎ | 75/90 [00:26<00:05,  2.59it/s]

Bonsai Bar. Bonsai Bar is an event where participants can learn the art of bonsai while enjoying drinks. Attendees will pot, prune, and design their own bonsai trees with guidance from professionals. The event takes place at various brewery locations across the city.


Creating embeddings:  84%|████████▍ | 76/90 [00:26<00:04,  2.99it/s]

Act & Sip NYC. Act & Sip is an acting workshop held in an Off-Broadway Theater where participants perform scenes from iconic NYC sitcoms or movies. The event is designed for adults, requiring attendees to be 21 or older and to bring their own beverages. It focuses on helping individuals overcome stage fright and explore their acting abilities.


Creating embeddings:  86%|████████▌ | 77/90 [00:27<00:04,  3.10it/s]

Wild Captives archery studio. Wild Captives is an archery studio in Brooklyn offering hour-long introduction to archery classes every weekend. The classes are designed for individuals over age 12 and focus on empowering participants through the sport. Each session includes learning about bow parts, safety, and a chance to shoot at targets.


Creating embeddings:  87%|████████▋ | 78/90 [00:27<00:04,  2.68it/s]

MoMA Before Hours Tour. The MoMA Before Hours Tour is a guided tour of the Museum of Modern Art before it opens to the public. Participants will see famous artworks and learn about their history from a professional art historian. After the tour, attendees can explore the museum at their own pace.


Creating embeddings:  88%|████████▊ | 79/90 [00:28<00:03,  2.75it/s]

LOOK Dine-in Cinemas. LOOK Dine-in Cinemas is a new movie theater in Hell's Kitchen that features eight auditoriums with advanced projection and sound technology. The venue offers a full restaurant with a diverse menu, allowing guests to dine in their seats. It aims to provide a family-friendly atmosphere with a variety of films, including blockbusters and kids' movies.


Creating embeddings:  90%|█████████ | 81/90 [00:28<00:02,  3.51it/s]

Free Black Women's Library in Brooklyn. The Free Black Women’s Library in Brooklyn is a community space that offers a collection of over 5,000 books written by Black women and non-binary authors. Visitors can read, work, or participate in various activities such as book clubs and workshops. The library also features a dedicated area for children's storytime, making it welcoming for families.
Off-beat walking tours with Purefinder. Purefinder offers walking tours in Manhattan that explore historical sites related to death and psychiatry. The tours cover about 2.5 miles in approximately two-and-a-half hours, focusing on unique narratives not typically found in guidebooks. Topics include the city's potter's fields, psychiatric history, and the Hell Gate area.


Creating embeddings:  91%|█████████ | 82/90 [00:28<00:02,  3.92it/s]

Museum of Broadway. The Museum of Broadway features three floors of displays celebrating the history and artistry of Broadway theater. Attendees can view original costumes, historical documents, and immersive exhibits related to famous musicals. The museum also includes a section dedicated to behind-the-scenes aspects of theater production.


Creating embeddings:  92%|█████████▏| 83/90 [00:29<00:01,  3.52it/s]

NYPL's free exhibit of rare artifacts. The New York Public Library is hosting a free exhibition featuring over 250 rare cultural artifacts spanning 4,000 years of history. Notable items include the only surviving letter from Christopher Columbus and the first Gutenberg Bible in the Americas. Timed tickets are required for entry.


Creating embeddings:  93%|█████████▎| 84/90 [00:29<00:01,  3.18it/s]

Sloomoo Institute slime museum. Sloomoo Institute is a slime museum located in SoHo that offers a sensory experience for all ages. Visitors can explore colorful vats of slime, participate in interactive displays, and create their own slime to take home. The venue provides biodegradable ponchos for those who want to experience getting slimed.


Creating embeddings:  94%|█████████▍| 85/90 [00:29<00:01,  3.05it/s]

Swingers NoMad. Swingers NoMad is a mini-golf entertainment complex featuring three nine-hole golf courses. It is designed for a 21-and-over audience, offering craft cocktails and gourmet street food from various vendors. The venue includes private rooms and live DJs, creating a unique nightlife experience.


Creating embeddings:  96%|█████████▌| 86/90 [00:30<00:01,  2.78it/s]

The Brooklyn Flea (Dumbo). The Brooklyn Flea is a weekend flea market in DUMBO featuring a variety of vendors selling vintage clothing, furniture, collectibles, and crafts. It also includes food vendors offering a selection of snacks. The market operates from March 15 through December, from 10am to 5pm.


Creating embeddings:  97%|█████████▋| 87/90 [00:30<00:01,  2.91it/s]

Shake Rattle & Roll Dueling Pianos. Shake Rattle & Roll Dueling Pianos is a live music event featuring two pianists who perform songs chosen by the audience. The event takes place every Saturday night at 10pm. Attendees are encouraged to sing along from home.


Creating embeddings:  98%|█████████▊| 88/90 [00:30<00:00,  3.20it/s]

The best fall activities in NYC to do with the arrival of Autumn. The event highlights various fall activities in NYC, including Halloween events and leaf-peeping opportunities. It focuses on experiences suitable for families and young children. Attendees can enjoy the seasonal atmosphere and participate in outdoor and indoor activities.


Creating embeddings:  99%|█████████▉| 89/90 [00:31<00:00,  3.03it/s]

The 50 best things to do in NYC for locals and tourists. This event highlights various activities and attractions in New York City suitable for both locals and tourists. It includes visits to art museums, Broadway shows, and dining experiences. The focus is on showcasing the best seasonal events and notable features of the city.


Creating embeddings: 100%|██████████| 90/90 [00:31<00:00,  2.85it/s]


✅ Created 90 embeddings
Embedding shape: (90, 1536)





## 4. Save Processed Data

Save the enriched data and embeddings to disk.


In [21]:
# Create processed directory if it doesn't exist
processed_dir = Path("../data/processed")
processed_dir.mkdir(parents=True, exist_ok=True)

# Save CSV (without embeddings - too large)
csv_path = processed_dir / "events_with_metadata.csv"
df_to_save = df.drop(columns=['embedding_text'])  # Remove temporary column
df_to_save.to_csv(csv_path, index=False)

# Save embeddings as numpy array
embeddings_path = processed_dir / "embeddings.npy"
np.save(embeddings_path, embeddings_array)

print(f"✅ Saved processed data to:")
print(f"  - CSV: {csv_path}")
print(f"  - Embeddings: {embeddings_path}")
print(f"\nFile sizes:")
print(f"  - CSV: {csv_path.stat().st_size / 1024:.1f} KB")
print(f"  - Embeddings: {embeddings_path.stat().st_size / 1024:.1f} KB")


✅ Saved processed data to:
  - CSV: ../data/processed/events_with_metadata.csv
  - Embeddings: ../data/processed/embeddings.npy

File sizes:
  - CSV: 309.3 KB
  - Embeddings: 1080.1 KB
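
The embeddings file size can be sanity-checked by hand. The sketch below assumes the array was saved as float64 (8 bytes per value); that is an inference from the reported size, not something the notebook states. The `.npy` header adds a small amount on top of the raw payload.

```python
# Back-of-the-envelope size check for embeddings.npy.
# Assumption: values are stored as float64 (8 bytes each).
n_events = 90
dim = 1536

bytes_float64 = n_events * dim * 8
bytes_float32 = n_events * dim * 4  # size if the array were cast to float32 before saving

print(f"float64 payload: {bytes_float64 / 1024:.1f} KB")
print(f"float32 payload: {bytes_float32 / 1024:.1f} KB")
```

The float64 payload is consistent with the reported 1080.1 KB once the header is included. Casting to float32 before saving would roughly halve the file, which may be worth considering if the event count grows.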


## 5. Initialize Qdrant & Upload Events

Set up Qdrant vector database and upload all events with embeddings and metadata.

**✅✅✅ Vector Database Setup:**
- Using Qdrant local mode (no server needed)
- 1536-dimensional vectors (OpenAI embedding size)
- Cosine similarity for semantic search
- Metadata fields: event_id, title, description, url, baby_friendly, is_free, indoor_or_outdoor
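
To make the cosine-similarity choice concrete, here is a minimal pure-Python sketch of the measure. It is only an illustration on toy 3-dimensional vectors; Qdrant computes this internally, and the function name `cosine_similarity` is introduced here for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for 1536-dimensional embeddings
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # same direction, different length
orthogonal = [0.0, 1.0, 0.0]      # unrelated direction

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because the measure ignores vector length, two texts pointing in the same direction score close to 1 regardless of magnitude, while unrelated directions score close to 0.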


In [22]:
# Initialize Qdrant client (local mode)
qdrant_client = QdrantClient(path="../local_qdrant")

collection_name = "nyc_events"

# Delete collection if it exists (fresh start)
try:
    qdrant_client.delete_collection(collection_name)
    print(f"Deleted existing collection '{collection_name}'")
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    print(f"Collection '{collection_name}' doesn't exist yet")

# Create new collection
qdrant_client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(
        size=1536,  # OpenAI text-embedding-3-small dimension
        distance=Distance.COSINE
    )
)

print(f"\n✅ Created collection '{collection_name}'")


Deleted existing collection 'nyc_events'

✅ Created collection 'nyc_events'


In [23]:
processed_dir = Path("../data/processed")
embeddings_path = processed_dir / "embeddings.npy"

if embeddings_path.exists():
    print(f"✅ Loading embeddings from: {embeddings_path}")
    embeddings = np.load(embeddings_path)
    print(f"✅ Loaded {len(embeddings)} embeddings (shape: {embeddings.shape})")
    
    # Verify embeddings match dataframe length
    if len(embeddings) != len(df):
        raise ValueError(
            f"Embeddings count ({len(embeddings)}) doesn't match dataframe length ({len(df)}). "
            f"Please regenerate embeddings or load the correct dataframe."
        )
else:
    raise FileNotFoundError(
        f"Embeddings file not found at {embeddings_path}. "
        f"Please run the embedding generation cell first."
    )

✅ Loading embeddings from: ../data/processed/embeddings.npy
✅ Loaded 90 embeddings (shape: (90, 1536))


In [24]:
# Prepare points for upload
points = []

for idx, row in df.iterrows():
    point = PointStruct(
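        id=int(idx),  # Use DataFrame index as ID (assumes the default 0..n-1 index)
        vector=embeddings[idx].tolist(),  # Plain list of floats; relies on idx matching the row position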
        payload={
            "event_id": row['event_id'],
            "title": row['title'],
            "description": row['description'],
            "url": row['url'],
            "baby_friendly": bool(row['baby_friendly']),  # Ensure boolean type
            "is_free": bool(row['is_free']),  # Ensure boolean type
            "indoor_or_outdoor": str(row['indoor_or_outdoor'])  # "indoor", "outdoor", or "both"
        }
    )
    points.append(point)

# Upload to Qdrant
print(f"Uploading {len(points)} events to Qdrant...")
qdrant_client.upsert(
    collection_name=collection_name,
    points=points
)

print(f"\n✅ Successfully uploaded {len(points)} events to Qdrant!")

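# Verify collection info (vectors_count can come back as None, as in the output below;
# points_count is the figure to check)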
collection_info = qdrant_client.get_collection(collection_name)
print(f"\nCollection info:")
print(f"  - Vectors count: {collection_info.vectors_count}")
print(f"  - Points count: {collection_info.points_count}")


Uploading 90 events to Qdrant...

✅ Successfully uploaded 90 events to Qdrant!

Collection info:
  - Vectors count: None
  - Points count: 90
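
A single `upsert` call is fine for 90 events. For a much larger collection, one option is to send the points in batches. The sketch below shows only the batching logic; `chunked` is a hypothetical helper and the batch size of 32 is arbitrary.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for the list of PointStruct objects built in the cell above
fake_points = list(range(90))
batch_sizes = [len(batch) for batch in chunked(fake_points, 32)]
print(batch_sizes)

# With the real client, each batch would be sent separately, e.g.:
# for batch in chunked(points, 32):
#     qdrant_client.upsert(collection_name=collection_name, points=batch)
```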


## 6. Test Retrieval

Test semantic search and metadata filtering.


In [25]:
def search_events(query: str, top_k: int = 5, baby_friendly_only: bool = False, free_only: bool = False, indoor_or_outdoor: str = None):
    """
    Search for events using semantic similarity.
    
    Args:
        query: Search query
        top_k: Number of results to return
        baby_friendly_only: If True, only return baby-friendly events
        free_only: If True, only return free events
        indoor_or_outdoor: Exact-match filter on location type ("indoor", "outdoor", or "both");
            "indoor" does not match events tagged "both"
    
    Returns:
        List of search results
    """
    # Create embedding for query
    query_embedding = create_embedding(query)
    
    # Prepare filter if needed
    search_filter = None
    if baby_friendly_only or free_only or indoor_or_outdoor:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        
        conditions = []
        if baby_friendly_only:
            conditions.append(
                FieldCondition(
                    key="baby_friendly",
                    match=MatchValue(value=True)
                )
            )
        if free_only:
            conditions.append(
                FieldCondition(
                    key="is_free",
                    match=MatchValue(value=True)
                )
            )
        if indoor_or_outdoor:
            conditions.append(
                FieldCondition(
                    key="indoor_or_outdoor",
                    match=MatchValue(value=indoor_or_outdoor)
                )
            )
        
        search_filter = Filter(must=conditions)
    
    # Search Qdrant (query_points supersedes the deprecated search method)
    response = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        query_filter=search_filter,
        limit=top_k
    )
    
    return response.points

print("✅ Search function defined!")


✅ Search function defined!
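
The filter built in `search_events` puts every condition under `must`, so all of them have to hold at once. The sketch below mimics that AND logic in plain Python on made-up payloads. It is an emulation for illustration, not Qdrant's implementation, and `matches_filters` is a hypothetical name.

```python
def matches_filters(payload, baby_friendly_only=False, free_only=False,
                    indoor_or_outdoor=None):
    """Plain-Python stand-in for the AND-combined payload conditions."""
    if baby_friendly_only and payload.get("baby_friendly") is not True:
        return False
    if free_only and payload.get("is_free") is not True:
        return False
    if indoor_or_outdoor and payload.get("indoor_or_outdoor") != indoor_or_outdoor:
        return False
    return True

# Hypothetical payloads, shaped like the ones uploaded to Qdrant
events = [
    {"title": "A", "baby_friendly": True, "is_free": True, "indoor_or_outdoor": "indoor"},
    {"title": "B", "baby_friendly": True, "is_free": False, "indoor_or_outdoor": "indoor"},
    {"title": "C", "baby_friendly": True, "is_free": True, "indoor_or_outdoor": "both"},
]

kept = [e["title"] for e in events
        if matches_filters(e, baby_friendly_only=True, free_only=True,
                           indoor_or_outdoor="indoor")]
print(kept)
```

Event C is dropped because its location is tagged "both" and the condition is an exact match on "indoor". If such events should count as indoor, one option would be to match against several values (for example with `MatchAny` from `qdrant_client.models`) instead of a single `MatchValue`.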


In [26]:
# Test 1: General search
query1 = "fun outdoor activity for kids"
print(f"Query: '{query1}'\n")

results1 = search_events(query1, top_k=5)

for i, result in enumerate(results1, 1):
    print(f"{i}. {result.payload['title']}")
    print(f"   Score: {result.score:.3f}")
    print(f"   Baby-friendly: {result.payload['baby_friendly']}")
    print(f"   Description: {result.payload['description'][:100]}...\n")


Query: 'fun outdoor activity for kids'

1. Laser tag at mini-bowling at Area 53
   Score: 0.379
   Baby-friendly: True
   Description: Tucked away on Bridge Street in an old factory basement, this two-story playscape for kidsandadults ...

2. The best fall activities in NYC to do with the arrival of Autumn
   Score: 0.372
   Baby-friendly: True
   Description: The ultimate guide to fall in NYC, from leaf-peeping and apple picking to jack o' lantern festivals ...

3. “Gardens of Renewal” by Lily Kwong
   Score: 0.308
   Baby-friendly: True
   Description: When in need of a mental break, get yourself toMadison Square Parkto walk along a new meditative spi...

4. Wild Captives archery studio
   Score: 0.281
   Baby-friendly: False
   Description: Wild Captives, the nation’s first female- and LGBTQ-owned archery studio, is now open. It's a place ...

5. Torch & Crown's open-air beer garden
   Score: 0.273
   Baby-friendly: True
   Description: Torch & Crown Brewing Companyhas taken over pa



In [27]:
# Test 2: Baby-friendly filtered search
query2 = "art museum"
print(f"Query: '{query2}' (baby-friendly only)\n")

results2 = search_events(query2, top_k=5, baby_friendly_only=True)

for i, result in enumerate(results2, 1):
    print(f"{i}. {result.payload['title']}")
    print(f"   Score: {result.score:.3f}")
    print(f"   Baby-friendly: {result.payload['baby_friendly']}")
    print(f"   Description: {result.payload['description'][:100]}...\n")


Query: 'art museum' (baby-friendly only)

1. Arte Museum
   Score: 0.583
   Baby-friendly: True
   Description: Lose yourself in immersive digital art, evocative soundscapes and custom-crafted scents at the new A...

2. The newly renovated Rockefeller Wing at The Met
   Score: 0.440
   Baby-friendly: True
   Description: After a four-year renovation, The Metropolitan Museum of Art has reopened its galleries dedicated to...

3. MoMA Before Hours Tour
   Score: 0.435
   Baby-friendly: True
   Description: On a typical visit to theMuseum of Modern Art, crowds surround the most precious paintings, and it c...

4. The Other Art Fair Brooklyn
   Score: 0.418
   Baby-friendly: True
   Description: Connect with artists in-person and explore hundreds of original artworks across various media includ...

5. "Breaking the Mold: Brooklyn Museum at 200" exhibit
   Score: 0.410
   Baby-friendly: True
   Description: The Brooklyn Museum is celebrating a big birthday. As the museum turns 200, it’s mark



In [28]:
# Test 3: Semantic search (mood/vibe)
query3 = "romantic date night free"
print(f"Query: '{query3}'\n")
print("✅✅✅ Note: No explicit mood tags needed! Semantic search captures vibe naturally.\n")

results3 = search_events(query3, top_k=5)

for i, result in enumerate(results3, 1):
    print(f"{i}. {result.payload['title']}")
    print(f"   Score: {result.score:.3f}")
    print(f"   Baby-friendly: {result.payload['baby_friendly']}")
    print(f"   Description: {result.payload['description'][:100]}...\n")


Query: 'romantic date night free'

✅✅✅ Note: No explicit mood tags needed! Semantic search captures vibe naturally.

1. Friday Night Vibes at Time Out Market
   Score: 0.309
   Baby-friendly: False
   Description: Start your weekend off right at Time Out Market New York’s stunning rooftop! Friday Night Vibes gets...

2. Latin Mix Saturdays
   Score: 0.273
   Baby-friendly: False
   Description: Experience the ultimate Saturday night at Time Out Market’s Rooftop Latin Mix Party! Join resident D...

3. Rooftop Cinema Club
   Score: 0.272
   Baby-friendly: False
   Description: Rooftop Cinema Club takes movie-going to a whole new level—literally. This rooftop film series at a ...

4. LOOK Dine-in Cinemas
   Score: 0.265
   Baby-friendly: True
   Description: With a full restaurant, craft cocktails, comfy reclining seats and even more bells and whistles, thi...

5. Shake Rattle & Roll Dueling Pianos
   Score: 0.259
   Baby-friendly: False
   Description: Every Saturday night, two piano men



In [29]:
# Test 4: Free events only
query4 = "art museums and exhibitions"
print(f"Query: '{query4}' (free only)\n")
print("✅✅✅ Testing free_only filter to find budget-friendly events.\n")

results4 = search_events(query4, top_k=5, free_only=True)

for i, result in enumerate(results4, 1):
    print(f"{i}. {result.payload['title']}")
    print(f"   Score: {result.score:.3f}")
    print(f"   Free: {result.payload.get('is_free', 'N/A')}")
    print(f"   Baby-friendly: {result.payload['baby_friendly']}")
    print(f"   Description: {result.payload['description'][:100]}...\n")


Query: 'art museums and exhibitions' (free only)

✅✅✅ Testing free_only filter to find budget-friendly events.

1. "Renoir’s Drawings" at The Morgan
   Score: 0.472
   Free: True
   Baby-friendly: True
   Description: Renoir’s sketchbook is moving into the spotlight. TheMorgan Library & Museumis about to do something...

2. The newly renovated Rockefeller Wing at The Met
   Score: 0.463
   Free: True
   Baby-friendly: True
   Description: After a four-year renovation, The Metropolitan Museum of Art has reopened its galleries dedicated to...

3. "Man Ray: When Objects Dream" at the Met
   Score: 0.436
   Free: True
   Baby-friendly: True
   Description: On a cold winter day in 1921, artist Man Ray placed some of his glass equipment on top of an unexpos...

4. NYPL's free exhibit of rare artifacts
   Score: 0.416
   Free: True
   Baby-friendly: True
   Description: The New York Public Library dug through its expansive and centuries-spanning archive to stage an imp...

5. "Robert Rauschen



In [30]:
# Test 5: Combined filters - Baby-friendly AND Free
query5 = "family activities"
print(f"Query: '{query5}' (baby-friendly AND free)\n")
print("✅✅✅ Testing combined filters for budget-friendly family events.\n")

results5 = search_events(query5, top_k=5, baby_friendly_only=True, free_only=True)

if results5:
    for i, result in enumerate(results5, 1):
        print(f"{i}. {result.payload['title']}")
        print(f"   Score: {result.score:.3f}")
        print(f"   Free: {result.payload.get('is_free', 'N/A')}")
        print(f"   Baby-friendly: {result.payload['baby_friendly']}")
        print(f"   Description: {result.payload['description'][:100]}...\n")
else:
    print("No results found matching both criteria.")


Query: 'family activities' (baby-friendly AND free)

✅✅✅ Testing combined filters for budget-friendly family events.

1. The best fall activities in NYC to do with the arrival of Autumn
   Score: 0.338
   Free: True
   Baby-friendly: True
   Description: The ultimate guide to fall in NYC, from leaf-peeping and apple picking to jack o' lantern festivals ...

2. The 50 best things to do in NYC for locals and tourists
   Score: 0.273
   Free: True
   Baby-friendly: True
   Description: Every day, our staffers are eating, drinking, partying, gigging and generally appreciating their way...

3. FAD Market
   Score: 0.250
   Free: True
   Baby-friendly: True
   Description: Shop 'til you drop at FAD Market, a curated fashion, art and design pop-up marketplace, which is bac...

4. "Aunties" sculptures in Harlem
   Score: 0.246
   Free: True
   Baby-friendly: True
   Description: Three colorful figures are now brightening up the intersection of 124th Street and Lenox Avenue in H...

5. Reposo y



In [31]:
# Test 6: Combined filters - Baby-friendly AND Free AND Indoor
query6 = "scary"
print(f"Query: '{query6}' (baby-friendly AND free AND indoors)\n")
print("✅✅✅ Testing combined filters for budget-friendly family events.\n")

results6 = search_events(query6, top_k=5, baby_friendly_only=True, free_only=True, indoor_or_outdoor='indoor')

if results6:
    for i, result in enumerate(results6, 1):
        print(f"{i}. {result.payload['title']}")
        print(f"   Score: {result.score:.3f}")
        print(f"   Free: {result.payload.get('is_free', 'N/A')}")
        print(f"   Baby-friendly: {result.payload['baby_friendly']}")
        print(f"   Indoor or outdoor: {result.payload['indoor_or_outdoor']}")
        print(f"   Description: {result.payload['description'][:100]}...\n")
else:
    print("No results found matching all criteria.")


Query: 'scary' (baby-friendly AND free AND indoors)

✅✅✅ Testing combined filters for budget-friendly family events.

1. "CHROMA: Tales Between Hues" immersive exhibit
   Score: 0.147
   Free: True
   Baby-friendly: True
   Indoor or outdoor: indoor
   Description: Leave the gray of the city behind and step into a colorful world of Korean folktales at Genesis Hous...

2. "Dress, Dreams and Desire: Fashion and Psychoanalysis" at The Museum at FIT
   Score: 0.135
   Free: True
   Baby-friendly: True
   Indoor or outdoor: indoor
   Description: In the year 2025, how we dress is still the highest form of free self expression—and the role that g...

3. The Big Bubble Experiment at NYSCI
   Score: 0.133
   Free: True
   Baby-friendly: True
   Indoor or outdoor: indoor
   Description: Beautiful, buoyant, beguiling bubbles are back at theNew York Hall of Science (NYSCI)in Queens. The ...

4. Canstruction
   Score: 0.120
   Free: True
   Baby-friendly: True
   Indoor or outdoor: indoor
   Descr



## Summary & Next Steps

### ✅ Completed Tasks:

1. **Loaded raw data** from Notebook 1
2. **Extracted baby-friendly metadata** using GPT-4 Mini
   - Simple boolean flag: suitable for infants/toddlers
   - No mood or energy tags needed!
3. **Generated embeddings** using OpenAI text-embedding-3-small
   - 1536-dimensional vectors
   - Combined title + description for rich semantics
4. **Saved processed data**
   - CSV: `data/processed/events_with_metadata.csv`
   - Embeddings: `data/processed/embeddings.npy`
5. **Initialized Qdrant** vector database
   - Local mode (no server needed)
   - Cosine similarity search
6. **Uploaded all events** with metadata
7. **Tested retrieval** with multiple queries
   - General semantic search ✅
   - Baby-friendly filtering ✅
   - Combined filters (baby-friendly, free, indoor/outdoor) ✅
   - Mood/vibe queries ✅
8. **Created utility module** (`backend/vector_store.py`)
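
The filtered tests in step 7 combine payload conditions with vector search. The sketch below shows one way such conditions could be assembled, using the JSON shape that Qdrant accepts for filters (a `must` list of `key`/`match` entries). `build_event_filter` is a hypothetical helper written for illustration and is not the notebook's `search_events`; the payload keys `baby_friendly`, `is_free`, and `indoor_or_outdoor` are taken from the payloads printed in the tests above.

```python
from typing import Any, Dict, List, Optional


def build_event_filter(
    baby_friendly_only: bool = False,
    free_only: bool = False,
    indoor_or_outdoor: Optional[str] = None,
) -> Optional[Dict[str, List[Dict[str, Any]]]]:
    """Return a Qdrant-style filter dict, or None when no filter is requested.

    Hypothetical helper: the parameter names mirror the arguments passed to
    search_events() in the tests, but the real implementation may differ.
    """
    must: List[Dict[str, Any]] = []
    if baby_friendly_only:
        must.append({"key": "baby_friendly", "match": {"value": True}})
    if free_only:
        must.append({"key": "is_free", "match": {"value": True}})
    if indoor_or_outdoor is not None:
        must.append({"key": "indoor_or_outdoor", "match": {"value": indoor_or_outdoor}})
    # Every entry in "must" has to hold, so the conditions combine with AND.
    return {"must": must} if must else None


combined = build_event_filter(
    baby_friendly_only=True, free_only=True, indoor_or_outdoor="indoor"
)
print(combined)
```

In `qdrant_client.models`, the same structure corresponds to `Filter`, `FieldCondition`, and `MatchValue`. Returning `None` when no flag is set keeps the unfiltered semantic search path unchanged.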

### 📊 Results:

- **Events processed:** Check cell output above
- **Baby-friendly events:** Check percentage above
- **Vector database:** Operational and ready
- **Search quality:** Semantic search captures nuance without explicit tags!
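
The `Score` values printed in the tests are cosine similarities between the query embedding and each event embedding, since the collection uses cosine distance. As a rough illustration of what that number measures, here is a minimal pure-Python sketch. The three-dimensional vectors and the names `toy_events` and `rank_by_similarity` are made up for the example; the real embeddings have 1536 dimensions and the ranking is done inside Qdrant.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the top_k highest."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors, invented for illustration only.
toy_events = {
    "event_a": [1.0, 0.0, 0.0],
    "event_b": [0.6, 0.8, 0.0],
    "event_c": [0.0, 0.0, 1.0],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], toy_events, top_k=2)
print(ranking)
```

Read this way, a top score as low as 0.147 for the "scary" query may indicate that nothing in the filtered subset is close to the query, so a minimum-score cutoff could be worth considering before results are shown to users.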

### 🎯 Next Steps:

**Move to Notebook 3:** Agentic RAG Pipeline
- Build 2-agent system (Retrieval + Response)
- Create FastAPI backend
- Implement LangSmith logging
- Test end-to-end queries

---

**✅✅✅ Key Insight:** By keeping metadata extraction simple (just baby-friendly), we saved significant time while still enabling powerful semantic search. The embeddings naturally capture mood, vibe, and context without explicit tagging!
