The AI-powered way to fetch web data
Grabbit is a web scraping application that combines AI with traditional extraction methods to pull business information from Google search results and custom websites. It is built with Next.js and has a clean, minimalist interface.
- Search for businesses by type and location
- Extract business names, categories, addresses, and basic contact info
- Paginated results with smooth navigation
- Intelligent retry logic for rate limiting
- Scrape multiple websites simultaneously
- Extract emails and phone numbers from business websites
- Support for contact page discovery
- Batch processing with concurrency limits
- Smart Business Detection: AI identifies and extracts business information with high accuracy
- Intelligent Contact Discovery: Advanced algorithms find hidden contact information
- Context-Aware Processing: AI understands business context for better extraction
- Fallback System: Traditional regex-based extraction when AI is unavailable
- Email Discovery: Finds emails in page content, mailto links, and contact pages
- Phone Number Extraction: Supports multiple formats (US, international, mobile)
- Smart Filtering: Removes placeholder and invalid contact information
- Duplicate Detection: AI-powered similarity matching to remove duplicates
- Export to CSV, JSON, or TXT formats
- Customizable field selection
- Filter options (e.g., businesses without websites)
- Bulk download capabilities
- Minimalist interface with sage green color scheme
- Geist Mono typography for technical aesthetic
- Outline-only design with no shadows or backgrounds
- Responsive layout for all devices
- Node.js 18+
- npm, yarn, pnpm, or bun
- OpenAI API key (optional, for AI features)
- Clone the repository
  ```bash
  git clone https://github.com/madegit/grabbit.git
  cd grabbit
  ```
- Install dependencies
  ```bash
  npm install
  # or
  yarn install
  # or
  pnpm install
  ```
- Set up environment variables (optional)
  ```bash
  OPENAI_API_KEY=your_openai_api_key_here
  ```
- Run the development server
  ```bash
  npm run dev
  # or
  yarn dev
  # or
  pnpm dev
  ```
- Open your browser and navigate to http://localhost:3000
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS
- UI Components: shadcn/ui
- Web Scraping: Cheerio
- AI Integration: OpenAI GPT-4
- Icons: Lucide React
- Font: Geist Mono
- Enter Business Type: e.g., "wedding photographers", "restaurants"
- Add Location (optional): e.g., "New York", "London"
- Click Search: Grabbit will fetch results from Google
- Extract Contacts: Use the "Extract Emails" and "Extract Phones" buttons
- Export Data: Choose your preferred format and download
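The export step with field selection could look roughly like the sketch below. The `Business` shape, the field names, and the helper names are assumptions made for illustration, not Grabbit's actual types.

```typescript
// Hypothetical record shape; Grabbit's real interfaces live in app/types.ts.
interface Business {
  name: string;
  address?: string;
  website?: string;
  email?: string;
  phone?: string;
}

type Field = keyof Business;

// Quotes a value when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Builds CSV text containing only the selected fields, in the given order.
function toCsv(rows: Business[], fields: Field[]): string {
  const header = fields.join(",");
  const lines = rows.map((row) =>
    fields.map((field) => escapeCsv(row[field] ?? "")).join(","),
  );
  return [header, ...lines].join("\r\n");
}

// Example filter: keep only businesses that have no website.
const withoutWebsite = (rows: Business[]): Business[] =>
  rows.filter((row) => !row.website);
```

Calling `toCsv(withoutWebsite(rows), ["name", "phone"])` would export only the name and phone columns for businesses without a website.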
- Enter Website URLs: One per line (http/https optional)
- Add Context (optional): Business type and location for better AI extraction
- Click Scrape: Grabbit will process all websites with AI assistance
- Review Results: Check extracted business information
- Export Data: Download in your preferred format
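As a rough illustration of the scraping steps above, the sketch below parses a one-URL-per-line input (adding `https://` when the scheme is omitted) and processes the URLs with a fixed concurrency limit. The function names are hypothetical, and the default to `https://` is an assumption about how missing schemes are handled.

```typescript
// Turns one-URL-per-line input into absolute URLs.
// Assumption: lines without a scheme are treated as https.
function parseUrlList(input: string): string[] {
  return input
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => (/^https?:\/\//i.test(line) ? line : `https://${line}`));
}

// Runs `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order. If a worker rejects, the
// returned promise rejects with that error.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  const runLane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  };

  const laneCount = Math.max(0, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}
```

With a hypothetical `scrapeSite` function, `mapWithConcurrency(parseUrlList(text), 3, scrapeSite)` would scrape three websites at a time.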
Grabbit uses OpenAI's GPT-4 to:
- Understand Context: Analyzes website content to identify business information
- Extract Structured Data: Converts unstructured text into organized business profiles
- Validate Information: Ensures extracted data is accurate and properly formatted
- Handle Edge Cases: Processes complex layouts and non-standard formats
- Hybrid Approach: AI-first with traditional regex fallback
- Graceful Degradation: Works even without OpenAI API key
- Cost Optimization: Efficient token usage and caching
- Error Recovery: Automatic retry with different strategies
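The AI-first approach with a fallback can be sketched as below. This is a minimal illustration, not Grabbit's actual code: the `Extractor` type and `extractWithFallback` are hypothetical names, and the AI call itself is passed in rather than shown.

```typescript
// Hypothetical signature shared by the AI extractor and the regex extractor.
type Extractor<T> = (pageText: string) => Promise<T>;

interface ExtractionResult<T> {
  data: T;
  source: "ai" | "fallback";
}

// Tries the AI extractor first when one is provided. If it is missing
// (for example, no API key is configured) or it throws, the traditional
// extractor is used instead.
async function extractWithFallback<T>(
  pageText: string,
  fallback: Extractor<T>,
  ai?: Extractor<T>,
): Promise<ExtractionResult<T>> {
  if (ai) {
    try {
      return { data: await ai(pageText), source: "ai" };
    } catch (error) {
      console.warn("AI extraction failed, using fallback:", error);
    }
  }
  return { data: await fallback(pageText), source: "fallback" };
}
```

A caller could pass `undefined` for `ai` when `OPENAI_API_KEY` is not set, which gives the graceful degradation described above.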
Grabbit includes intelligent retry logic for Google searches:
- Retry Attempts: 3 attempts with exponential backoff
- Backoff Delays: 2s → 5s → 10s
- Concurrency Limits: 3 websites processed simultaneously
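A retry loop using the delay schedule listed above (2s, 5s, 10s) might look like the following sketch. It assumes that a rate-limited request surfaces as an error carrying `status === 429`, and it reads the three delays as three retries after the initial request. Grabbit's actual implementation may count attempts differently, and `withRetry` and `isRateLimited` are hypothetical names.

```typescript
// Delay schedule from the settings above, in milliseconds.
const BACKOFF_DELAYS_MS = [2000, 5000, 10000];

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit failures are thrown as objects with status 429.
function isRateLimited(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { status?: number }).status === 429
  );
}

// Runs `task`, waiting and retrying once per entry in `delaysMs` when it
// is rate limited. Other errors, and the final rate-limit error, are
// rethrown. `wait` is injectable so the schedule can be tested quickly.
async function withRetry<T>(
  task: () => Promise<T>,
  delaysMs: number[] = BACKOFF_DELAYS_MS,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const outOfRetries = attempt >= delaysMs.length;
      if (!isRateLimited(error) || outOfRetries) throw error;
      await wait(delaysMs[attempt]);
    }
  }
}
```

Wrapping a search request as `withRetry(() => runSearch(query))`, where `runSearch` is a placeholder for the real request, would apply the schedule without changing the request code.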
- Model: GPT-4 for optimal accuracy
- Token Optimization: Efficient prompt engineering
- Caching: Results cached to reduce API calls
- Fallback: Traditional extraction when AI unavailable
- Email Patterns: Supports standard email formats with spam filtering
- Phone Patterns: US, international, and mobile number formats
- Timeout Limits: 15s for main pages, 10s for contact pages
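To illustrate the email pattern and filtering settings above, here is a minimal regex-based sketch. The pattern, the placeholder domain list, and the image-filename check are assumptions chosen for the example. Grabbit's own rules in `email-extractor.ts` may differ.

```typescript
// A common, deliberately simple email pattern (not a full RFC 5322 parser).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Illustrative placeholder domains to discard; the real list is an assumption.
const PLACEHOLDER_DOMAINS = new Set(["example.com", "domain.com", "email.com"]);

// Asset names such as "logo@2x.png" look like emails to the pattern above.
const ASSET_EXTENSION = /\.(png|jpe?g|gif|svg|webp)$/i;

// Scans raw HTML (page text and mailto: links alike) and returns
// lowercased, de-duplicated addresses with placeholders removed.
function extractEmails(html: string): string[] {
  const matches = html.match(EMAIL_PATTERN) ?? [];
  const found = new Set<string>();
  for (const match of matches) {
    const email = match.toLowerCase();
    const domain = email.slice(email.indexOf("@") + 1);
    if (PLACEHOLDER_DOMAINS.has(domain)) continue;
    if (ASSET_EXTENSION.test(email)) continue;
    found.add(email);
  }
  return Array.from(found);
}
```

Because the pattern runs over the raw HTML string, addresses inside `mailto:` links are picked up without a separate parsing step.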
```
grabbit/
├── app/
│   ├── actions.ts                  # Server actions for Google search
│   ├── ai-enhanced-extractor.ts    # AI-powered extraction logic
│   ├── email-extractor.ts          # Email extraction logic
│   ├── phone-extractor.ts          # Phone number extraction
│   ├── custom-website-scraper.ts   # Custom website scraping
│   ├── data-validator.ts           # Data validation utilities
│   ├── duplicate-detector.ts       # AI-powered duplicate detection
│   ├── cache-manager.ts            # Intelligent caching system
│   ├── types.ts                    # TypeScript interfaces
│   ├── globals.css                 # Global styles
│   ├── layout.tsx                  # Root layout
│   ├── page.tsx                    # Main application
│   └── api/
│       ├── scrape-emails/          # Email extraction API
│       ├── scrape-phones/          # Phone extraction API
│       └── scrape-custom-websites/ # Custom scraping API
├── components/
│   ├── ui/                         # shadcn/ui components
│   ├── export-dialog.tsx           # Export functionality
│   └── footer.tsx                  # Footer component
└── README.md
```
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Use TypeScript for all new code
- Follow the existing code style and formatting
- Add appropriate error handling
- Test your changes thoroughly
- Update documentation as needed
- Consider AI integration opportunities
- Respect robots.txt: Always check website scraping policies
- Rate Limiting: Don't overwhelm servers with requests
- Data Usage: Only use scraped data for legitimate purposes
- Privacy: Handle personal information responsibly
- Terms of Service: Comply with website terms and conditions
- AI Ethics: Use AI responsibly and transparently
429 Rate Limiting Errors
- Grabbit automatically retries with backoff
- If persistent, wait a few minutes before trying again
No Results Found
- Try different search terms or locations
- Check if the business type is too specific
AI Extraction Fails
- Verify OpenAI API key is set correctly
- Check API quota and billing status
- Traditional extraction will be used as fallback
Email/Phone Extraction Fails
- Some websites block automated access
- Contact pages may require JavaScript rendering
- AI extraction may help with complex layouts
Export Not Working
- Ensure you have selected at least one field
- Check that there are results to export
- Provide Context: Add business type and location for better AI understanding
- Quality URLs: Use direct business website URLs rather than directory listings
- Batch Processing: Process multiple similar businesses together for consistency
- Contact Pages: Many businesses have dedicated contact pages with more information
- About Pages: Often contain business details and contact information
- Footer Sections: Check website footers for contact details
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI: For providing powerful AI capabilities
- Design Inspiration: Modern mobile app interfaces
- shadcn/ui: Beautiful UI components
- Vercel: Hosting and deployment platform
- Next.js Team: Amazing React framework
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include error messages and steps to reproduce
Made with 🤖 AI + ❤️ by the Grabbit team
Grabbit - The AI-powered way to fetch web data 🐰✨