A Next.js application that uses Browserbase to visit websites, generate embeddings of their content, and upsert those embeddings to Qdrant.

jdamiba/qdrant-browserbase-upsert

Pitchfork Album Reviews Vector Search

This project uses Browserbase and Stagehand to scrape Pitchfork album reviews, generate embeddings using OpenAI's text-embedding-3-small model, and store them in Qdrant for semantic search capabilities.

Features

  • Automated scraping of Pitchfork album reviews using Browserbase and Stagehand
  • Generation of embeddings for full review text using OpenAI's text-embedding-3-small model
  • Storage of reviews and embeddings in Qdrant for vector similarity search
  • Unique UUID-based identification for each review
  • Browser-like behavior to avoid detection
  • Error handling and graceful failure recovery
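
To make "vector similarity search" concrete, here is a minimal sketch of how two embeddings can be compared and a set of candidates ranked against a query. It assumes cosine similarity as the metric; the README does not say which distance the project's Qdrant collection is configured with, and in practice Qdrant performs this ranking server-side. The names `cosineSimilarity` and `rankBySimilarity` are hypothetical and not taken from the repository.

```typescript
// Illustrative only: Qdrant computes similarity itself when you query it.
// Cosine similarity is an assumed metric, not confirmed by the project.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: sort candidates from most to least similar to the query.
function rankBySimilarity(
  query: number[],
  candidates: { id: string; vector: number[] }[]
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

A semantic search over reviews follows the same idea: the query text is embedded with the same model as the reviews, and the closest stored vectors are returned.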

Prerequisites

  • Node.js (v18 or higher)
  • npm or yarn
  • OpenAI API key
  • Browserbase API key and Project ID
  • Qdrant API key and URL

Environment Variables

Create a .env file in the root directory with the following variables:

BROWSERBASE_API_KEY=your_browserbase_api_key
BROWSERBASE_PROJECT_ID=your_browserbase_project_id
OPENAI_API_KEY=your_openai_api_key
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_api_key
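
A missing variable usually surfaces later as an opaque authentication error, so it can help to check all five up front. The following is a minimal sketch, not code from the repository; `findMissingEnvVars` is a hypothetical helper that takes the environment as a plain object so it can be tested without touching the real process environment.

```typescript
// The five variable names come from the .env listing above.
const REQUIRED_ENV_VARS = [
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
  "OPENAI_API_KEY",
  "QDRANT_URL",
  "QDRANT_API_KEY",
] as const;

// Hypothetical helper: returns the names that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `findMissingEnvVars(process.env)` at startup, throwing if the returned list is non-empty.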

Installation

  1. Clone the repository:
git clone <repository-url>
cd qdrant-browserbase-upsert
  2. Install dependencies:
npm install

Usage

  1. Start the development server:
npm run dev
  2. Open your browser and navigate to http://localhost:3000
  3. Click the "Run Stagehand" button to start the review collection process

The application will:

  • Navigate to predefined Pitchfork album review URLs
  • Extract the full review content
  • Generate embeddings for the review text
  • Store the reviews and embeddings in Qdrant
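
The last two steps can be sketched as the request bodies they produce. This is an illustration under stated assumptions, not the code in app/api/stagehand/main.ts: it shows the JSON shapes accepted by the OpenAI embeddings endpoint (`POST /v1/embeddings`) and the Qdrant points upsert endpoint (`PUT /collections/{collection_name}/points`), whereas the project may well use client libraries instead. The helper names, the payload fields `url` and `text`, and the default dimension check of 1536 (the default output size of text-embedding-3-small) are assumptions.

```typescript
// One Qdrant point: a UUID string id, the embedding, and a free-form payload.
interface ReviewPoint {
  id: string;
  vector: number[];
  payload: { url: string; text: string }; // field names are assumed
}

// Body for the OpenAI embeddings endpoint.
function buildEmbeddingRequest(text: string): { model: string; input: string } {
  return { model: "text-embedding-3-small", input: text };
}

// Body for a Qdrant points upsert. `expectedDim` defaults to 1536, which is
// the model's default output size; adjust it if the collection differs.
function buildUpsertBody(
  reviews: { id: string; url: string; text: string; embedding: number[] }[],
  expectedDim: number = 1536
): { points: ReviewPoint[] } {
  const points = reviews.map((r) => {
    if (r.embedding.length !== expectedDim) {
      throw new Error(
        "Embedding for " + r.url + " has " + r.embedding.length +
        " dimensions, expected " + expectedDim
      );
    }
    return {
      id: r.id,
      vector: r.embedding,
      payload: { url: r.url, text: r.text },
    };
  });
  return { points };
}
```

Checking the vector length before upserting catches a model or configuration mismatch locally instead of as a rejected request from Qdrant.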

Project Structure

  • app/api/stagehand/main.ts - Main scraping and processing logic
  • app/lib/qdrant.ts - Qdrant client and collection management
  • app/page.tsx - Frontend interface
  • stagehand.config.ts - Browserbase and Stagehand configuration

Adding More Reviews

To add more reviews, update the ALBUM_REVIEW_URLS array in app/api/stagehand/main.ts:

const ALBUM_REVIEW_URLS = [
  "https://pitchfork.com/reviews/albums/your-review-url-1/",
  "https://pitchfork.com/reviews/albums/your-review-url-2/",
  // Add more URLs as needed
];
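
Because the list is edited by hand, a small check before scraping can catch typos and duplicates. The sketch below is not part of the repository; `normalizeReviewUrls` and `REVIEW_URL_PATTERN` are hypothetical names, and the pattern is inferred only from the placeholder URLs above, so it may need loosening if Pitchfork uses other URL shapes.

```typescript
// Assumed shape: https://pitchfork.com/reviews/albums/<slug>/ (trailing slash optional).
const REVIEW_URL_PATTERN =
  /^https:\/\/pitchfork\.com\/reviews\/albums\/[^\/\s?#]+\/?$/;

// Hypothetical helper: splits input into de-duplicated valid URLs
// (normalized to end with "/") and rejected entries.
function normalizeReviewUrls(
  urls: string[]
): { valid: string[]; rejected: string[] } {
  const valid: string[] = [];
  const rejected: string[] = [];
  for (const raw of urls) {
    const url = raw.trim();
    if (!REVIEW_URL_PATTERN.test(url)) {
      rejected.push(raw);
      continue;
    }
    const normalized =
      url.charAt(url.length - 1) === "/" ? url : url + "/";
    if (valid.indexOf(normalized) === -1) {
      valid.push(normalized);
    }
  }
  return { valid, rejected };
}
```

Running the array through such a helper before the scraping loop keeps the same review from being embedded and stored twice.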

Technologies Used

  • Next.js
  • Browserbase
  • Stagehand
  • OpenAI (text-embedding-3-small)
  • Qdrant

License

MIT
