PDF Scraper App

What You Need to Run the Project

Prerequisites

Node.js 20.9+ (Due to Next.js 16)
PostgreSQL database (via Supabase)
OpenAI API key
Supabase account
Stripe account

Environment Variables Required

# Database
DATABASE_URL="your-supabase-database-url"
DIRECT_URL="your-supabase-database-url"

# NextAuth
NEXTAUTH_SECRET="your-nextauth-secret-here"
NEXTAUTH_URL="http://localhost:3000"

# OpenAI (required for resume parsing)
OPENAI_API_KEY="your-openai-api-key-here"

# Supabase (required for image storage)
NEXT_PUBLIC_SUPABASE_URL="your-supabase-project-url"
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY="your-supabase-publishable-key"

# Stripe
STRIPE_SECRET_KEY="sk_test_..."
STRIPE_WEBHOOK_SECRET="whsec_..."
STRIPE_PRICE_BASIC="price_..."
STRIPE_PRICE_PRO="price_..."
STRIPE_PUBLIC_KEY="pk_test_..."

PDF Processing Pipeline

The application follows a sophisticated pipeline to handle PDF processing:

1. Client-Side PDF Conversion

Technology: PDF.js worker (public/scripts/pdfjs-worker.js)
Process: PDF pages are converted to images directly in the browser
Benefits: Client-side processing ensures privacy and reduces server load

2. Image Upload to Supabase

Storage: Images are uploaded to Supabase Storage bucket (pdf-scraper)
Security: Temporary signed URLs (10-minute expiration) are generated
Purpose: Secure access to uploaded images for processing

3. Signed URL Generation

Method: Supabase client generates signed URLs with 10-minute expiration
Security: Prevents unauthorized access while allowing temporary processing access
Usage: URLs are used for OpenAI API calls without exposing permanent links

4. OpenAI Vision Processing

API: OpenAI GPT-4o vision model
Input: Signed image URLs are sent to OpenAI for structured data extraction
Output: JSON data following the exact schema specified in the assignment

5. Database Storage

Location: Extracted resume data stored in Supabase database
Structure: JSON data stored under authenticated user accounts
History: Complete processing history maintained for each user

Supabase Bucket Configuration

Creating the Storage Bucket

Create a new Supabase project at supabase.com
Navigate to Storage in your Supabase dashboard
Create a new bucket named pdf-scraper

Setting Up Policies

Enable Row Level Security (RLS) in the bucket settings and create the following policies:

Policy 1: Public Reading Access

CREATE POLICY "Public read access" ON storage.objects
FOR SELECT USING (bucket_id = 'pdf-scraper');

Policy 2: Public Upload Access

CREATE POLICY "Public upload access" ON storage.objects
FOR INSERT WITH CHECK (bucket_id = 'pdf-scraper');

Quick Start

Clone the repository:

git clone <your-github-repo-url>
cd pdf-scraper

Install dependencies:

npm install

Set up Supabase (see bucket configuration above)
Configure environment variables in .env.local
Set up the database:

npx prisma migrate dev
npx prisma generate

Run the development server:

npm run dev

Open http://localhost:3000 with your browser.

Stripe Setup (Test Mode)

1. Create a Stripe Test Account

Go to stripe.com and create a new account
Switch to Test Mode in the dashboard (toggle in the top right)
Note: All operations must remain in test mode - no live transactions

2. Get API Keys

Navigate to Developers > API keys in your Stripe dashboard
Copy your Publishable key (starts with pk_test_)
Copy your Secret key (starts with sk_test_)

3. Create Products and Prices

Go to Products in your Stripe dashboard
Create two products:

Basic Plan
- Name: "Basic Plan"
- Price: $10/month
- Description: "Default entry plan"
Pro Plan
- Name: "Pro Plan"
- Price: $20/month
- Description: "Higher limit for advanced users"
Note the Price IDs (start with price_) for each plan

4. Set Up Webhooks

Go to Developers > Webhooks
Add endpoint: https://your-domain.com/api/webhooks/stripe
Select events:
- invoice.payment_succeeded
- customer.subscription.updated
- customer.subscription.deleted
Copy the Webhook signing secret (starts with whsec_)

5. Environment Variables

Add these to your .env.example file:

# Stripe Configuration
STRIPE_SECRET_KEY="sk_test_your_secret_key_here"
STRIPE_WEBHOOK_SECRET="whsec_your_webhook_secret_here"
STRIPE_PRICE_BASIC="price_your_basic_price_id"
STRIPE_PRICE_PRO="price_your_pro_price_id"
STRIPE_PUBLIC_KEY="pk_test_your_publishable_key_here"

Subscription and Upgrade Instructions

Credit System Overview

Cost per resume: 100 credits
Basic Plan: 10,000 credits ($10/month)
Pro Plan: 20,000 credits ($20/month)

For Users

Free (Initial): New users start with 0 credits
Subscribe: Go to /dashboard/settings and click "Subscribe to Basic Plan"
Upgrade: Click "Upgrade to Pro Plan" to get 20,000 credits
Manage Billing: Use "Manage Billing" to access Stripe Customer Portal

For Developers

Credits are automatically added when subscriptions are created/renewed
Webhooks handle real-time credit updates
All subscription events are logged for debugging

Credit Consumption During PDF Scraping

How Credits Work

Every time a user uploads and processes a resume, the system:

Pre-Check: Verifies user has ≥100 credits before starting OpenAI processing
Processing: Calls OpenAI API and extracts structured data
Deduction: Subtracts exactly 100 credits from user's balance
Storage: Saves extracted JSON data to database

Insufficient Credits Flow

If a user has fewer than 100 credits:

Upload is blocked with friendly error message
Toast notification suggests upgrading subscription
Redirect to settings page offered

Credit Balance Display

Real-time credit balance shown in dashboard header
Settings page shows current plan and remaining credits
Automatic credit top-ups on subscription renewal

Example Usage

User uploads resume (500KB PDF)
→ System checks: 200 credits available ✓
→ PDF converted to images via PDF.js
→ Images uploaded to Supabase storage
→ Signed URLs generated (10min expiry)
→ OpenAI processes images → structured JSON
→ 100 credits deducted (200 → 100 remaining)
→ Resume data stored in database
→ Success toast + history updated

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
app		app
components		components
lib		lib
prisma		prisma
public/scripts		public/scripts
types		types
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
components.json		components.json
eslint.config.mjs		eslint.config.mjs
next.config.ts		next.config.ts
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
proxy.ts		proxy.ts
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

PDF Scraper App

What You Need to Run the Project

Prerequisites

Environment Variables Required

PDF Processing Pipeline

1. Client-Side PDF Conversion

2. Image Upload to Supabase

3. Signed URL Generation

4. OpenAI Vision Processing

5. Database Storage

Supabase Bucket Configuration

Creating the Storage Bucket

Setting Up Policies

Policy 1: Public Reading Access

Policy 2: Public Upload Access

Quick Start

Stripe Setup (Test Mode)

1. Create a Stripe Test Account

2. Get API Keys

3. Create Products and Prices

4. Set Up Webhooks

5. Environment Variables

Subscription and Upgrade Instructions

Credit System Overview

For Users

For Developers

Credit Consumption During PDF Scraping

How Credits Work

Insufficient Credits Flow

Credit Balance Display

Example Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages