A modern full-stack web scraping application built with Next.js, designed to extract, analyze, and export website data effortlessly.
It offers server-side HTML parsing, intelligent data extraction, and multi-format export capabilities — all within a beautiful and responsive interface.
- 🌐 Universal Web Scraping – Extract structured data from any publicly accessible website
- 📊 Structured Data Extraction – Parse headings, paragraphs, links, images, tables, and metadata automatically
- 💾 Multi-Format Export – Download data as JSON, CSV, or Excel files
- 🎯 Intelligent URL Resolution – Automatically converts relative URLs into absolute paths
- ⚡ Real-Time Processing – Instant feedback with progress indicators and loading states
- 🎨 Modern UI – Responsive, minimal, and dark-mode ready (built with TailwindCSS)
- 🛡️ Ethical Scraping – Built-in rate limiting and User-Agent rotation
- 📱 Mobile Friendly – Works seamlessly on all screen sizes
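The URL-resolution feature above can be sketched with the WHATWG URL API (a minimal illustration; `resolveUrl` is a hypothetical helper name, not code from this repo):

```typescript
// Sketch of relative-to-absolute URL resolution, as described by the
// "Intelligent URL Resolution" feature. Malformed input is returned
// unchanged rather than throwing.
function resolveUrl(href: string, base: string): string {
  try {
    return new URL(href, base).toString();
  } catch {
    return href; // leave unparseable URLs untouched
  }
}
```

Scraped `href` and `src` attributes are often relative (`/about`, `../img.png`), so resolving them against the page URL keeps exported links usable.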
- Next.js 14 – React framework with App Router
- TypeScript – Type-safe development
- TailwindCSS – Utility-first CSS framework
- Lucide React – Icon system
- Shadcn/ui – Reusable UI components
- Next.js API Routes – Serverless endpoints
- Cheerio – Fast HTML parser
- XLSX – Excel file generator
- Node.js 18+
- npm or yarn
```bash
# Clone repository
git clone https://github.com/yourusername/web-scraper-pro.git
cd web-scraper-pro

# Install dependencies
pnpm install
# or
npm install

# Run development server
pnpm run dev
# or
npm run dev
```

Then, open http://localhost:3000 in your browser.
- Enter a website URL (e.g., https://example.com)
- Click "Scrape Website"
- Wait for completion and view organized data
- Export results as JSON
Scrapes a website and returns structured data.
Request Body

```json
{ "url": "https://example.com" }
```

Response

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "Example website description",
  "headings": ["Heading 1", "Heading 2"],
  "paragraphs": ["Paragraph text..."],
  "links": [{ "text": "Link text", "href": "https://example.com/link" }],
  "images": ["https://example.com/image.jpg"],
  "tables": [{ "headers": ["Col 1", "Col 2"], "rows": [["Data 1", "Data 2"]] }]
}
```

Error Response
```json
{ "error": "Failed to scrape website: HTTP 404" }
```

```
web-scraper-pro/
├── app/
│   ├── api/
│   │   └── scrape/
│   │       └── route.ts        # API endpoint
│   ├── layout.tsx              # Root layout
│   └── page.tsx                # Home page
├── components/
│   ├── ui/                     # UI components
│   ├── data-display.tsx        # Data visualization
│   ├── footer.tsx              # Footer
│   └── url-form.tsx            # URL input form
├── lib/
│   └── utils.ts                # Utility functions
├── public/                     # Static assets
├── package.json
├── tailwind.config.ts
├── tsconfig.json
└── README.md
```
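The POST `/api/scrape` endpoint documented above can be called from client code roughly like this (a hedged sketch; the `ScrapeResult` type simply mirrors the documented response shape and is not taken from the codebase):

```typescript
// Shape of a successful response, per the API documentation above.
interface ScrapeResult {
  url: string;
  title: string;
  description: string;
  headings: string[];
  paragraphs: string[];
  links: { text: string; href: string }[];
  images: string[];
  tables: { headers: string[]; rows: string[][] }[];
}

// Hypothetical client helper: POST the target URL, surface the
// documented { "error": ... } body on failure.
async function scrape(url: string): Promise<ScrapeResult> {
  const res = await fetch("/api/scrape", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) {
    const { error } = await res.json();
    throw new Error(error ?? `HTTP ${res.status}`);
  }
  return res.json();
}
```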
This project is for educational purposes only. Please follow ethical scraping practices:
✅ Scrape only public data
✅ Respect robots.txt and site Terms of Service
✅ Implement rate limiting
❌ Do not scrape personal/sensitive data
❌ Do not bypass authentication or paywalls
❌ Do not republish copyrighted content
Disclaimer: You are responsible for ensuring compliance with all applicable laws.
- User-Agent headers for scraper requests
- Graceful error handling
- Configurable rate limiting
- Content size protection
- URL validation and normalization
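The configurable rate limiting mentioned above could be sketched as a simple in-memory sliding window (illustrative only — the limit and window values are assumptions, not the project's defaults):

```typescript
// Minimal sliding-window rate limiter sketch. Tracks request
// timestamps per key (e.g., client IP) and rejects requests once
// the window's limit is reached.
class RateLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit = 10, private windowMs = 60_000) {}

  allow(key: string, now = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => now - t < this.windowMs
    );
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

An in-memory map suits a single serverless instance for demos; a shared store would be needed once requests fan out across instances.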
Licensed under the MIT License. See the LICENSE file for details.
Sandip Singha
- GitHub: @myselfsandip
⭐ If you found this project helpful, please give it a star!