Skip to content

nexvou/scripts

Repository files navigation

🛒 Nexvou Scraper

A powerful, scalable, and production-ready coupon scraping system for major e-commerce platforms in Indonesia.

🚀 Features

  • Multi-Platform Support: Shopee, Tokopedia, Lazada, Blibli, Traveloka, Grab
  • Smart Fallback System: HTTP scraping → Browser scraping → Mock data
  • Multiple Database Support: MySQL, PostgreSQL, SQLite
  • Robust Browser Support: Puppeteer with Brave/Chrome integration
  • Vercel Ready: Optimized for serverless deployment
  • Scalable Architecture: Clean, modular, and easy to extend
  • Production Grade: Error handling, logging, monitoring

📁 Project Structure

├── config/              # Configuration files
├── core/                # Core system components
│   ├── BrowserManager.js    # Browser automation
│   ├── ScraperManager.js    # Scraping orchestration
│   └── SQLiteManager.js     # Database management
├── scrapers/            # Platform-specific scrapers
├── utils/               # Utility functions
├── api/                 # Vercel API endpoints
├── migrations/          # Database migrations
├── tests/              # Test files
├── docs/               # Documentation
└── index.js            # Main entry point

🛠️ Tech Stack

  • Runtime: Node.js (compatible with Bun)
  • Language: JavaScript (CommonJS)
  • Database: SQLite (primary), MySQL, PostgreSQL support
  • Browser: Puppeteer with Brave Browser integration
  • Deployment: Vercel Serverless
  • Scraping: HTTP-first with browser fallback
  • Monitoring: Winston logging with structured output

🚀 Quick Start

  1. Clone & Install

    git clone <repo>
    cd scripts
    npm install
  2. Configure Environment

    cp .env.example .env
    # Edit .env with your database credentials (optional for SQLite)
  3. Run Database Setup

    npm run migrate
  4. Start Scraping

    # Run all scrapers
    npm start
    
    # Run single platform
    npm start -- --single shopee
    
    # Run tests
    npm start -- --test
    
    # Run with scheduling
    npm start -- --schedule
  5. Deploy to Vercel

    vercel deploy

📊 Database Schema

Tables

  • platforms - E-commerce platform configurations
  • merchants - Merchant/brand information
  • coupons - Coupon and promotion data
  • scrape_sessions - Scraping session logs
  • scrape_metrics - Performance metrics

🔧 Configuration

Database Adapters

  • MySQL: mysql://user:pass@host:port/db
  • PostgreSQL: postgresql://user:pass@host:port/db
  • SQLite: sqlite://./data/coupons.db
  • Supabase: supabase://project:key@api.supabase.co

Scraping Configuration

  • Refresh interval: 15 minutes (configurable in config/scraper.config.js)
  • Platform delays: 2 seconds between platforms
  • Timeout: 15-30 seconds per platform
  • Retry attempts: 2 for browser, fallback to mock data
  • Smart fallback: HTTP → Browser → Mock data

📈 Current Status

Working Platforms ✅

  • Shopee: 9 items per run (HTTP scraping)
  • Lazada: 6 items per run (HTTP scraping)
  • Traveloka: 3 items per run (HTTP scraping)

Mock Data Platforms 🎭

  • Tokopedia: 15 items per run (smart mock data)
  • Blibli: 15 items per run (smart mock data)
  • Grab: 10 items per run (smart mock data)

Performance Metrics

  • Total items per run: ~58 items
  • Execution time: ~3-4 minutes
  • Success rate: 100% (with fallbacks)
  • Browser compatibility: Brave, Chrome, Puppeteer Chrome

🔒 Anti-Detection Features

  • User agent rotation: 5 different realistic user agents
  • Smart delays: Random delays between requests
  • Stealth mode: Removes automation indicators
  • Browser fingerprinting: Mimics real browser behavior
  • Fallback system: HTTP → Browser → Mock data
  • Rate limiting: Configurable delays between platforms

💻 Command Line Usage

# Run all scrapers
node index.js

# Run single platform
node index.js --single shopee
node index.js --single tokopedia

# Run system tests
node index.js --test

# Run with scheduling (cron)
node index.js --schedule

# Show help
node index.js --help

📝 API Endpoints

  • GET /api/coupons - Get all active coupons
  • GET /api/coupons/platform/:platform - Get coupons by platform
  • GET /api/metrics - Get scraping metrics
  • POST /api/scrape/trigger - Trigger manual scrape

See API Documentation for detailed endpoint documentation.

🧪 Testing

npm test                # Run all tests
npm run test:unit       # Unit tests only
npm run test:integration # Integration tests
npm run test:e2e        # End-to-end tests

# Manual testing
node index.js --test    # Test all scrapers
node test-browser.js    # Test browser setup

📚 Documentation

🤝 Contributing

  1. Fork the repository
  2. Create feature branch
  3. Add tests for new features
  4. Submit pull request

📄 License

MIT License - see LICENSE file for details.

About

A powerful, scalable, and production-ready coupons scraping system for major e-commerce platforms.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors