
Web Scraping Suite in Go

This project is a comprehensive web scraping toolkit developed in Go, built on the Colly framework for efficient, concurrent scraping.

Features

  1. E-commerce Product Scraper (scrapper1.go)

    • Scrapes product information from an e-commerce website
    • Handles pagination automatically
    • Saves the scraped data to a CSV file (sketch below)
  2. G2.com Review Scraper (scrapper2.go)

    • Attempts to scrape reviews from G2.com
    • Uses proxy support for enhanced anonymity (sketch below)
  3. ZenRows API Integration (scrapper3.go)

    • Fetches and saves HTML content from G2.com using the ZenRows API
    • Demonstrates integration with a third-party service for web scraping (sketch below)
  4. Parallel Scraping (scrapper4.go)

    • Implements concurrent scraping of multiple pages
    • Showcases Go's powerful concurrency features (sketch below)
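
Here is a minimal sketch of the pagination-plus-CSV pattern that scrapper1.go follows, assuming Colly v2; the target URL and CSS selectors are hypothetical placeholders, not the project's real ones:

    package main

    import (
        "encoding/csv"
        "log"
        "os"

        "github.com/gocolly/colly/v2"
    )

    func main() {
        file, err := os.Create("products.csv")
        if err != nil {
            log.Fatalln("failed to create output file:", err)
        }
        defer file.Close()

        writer := csv.NewWriter(file)
        defer writer.Flush()
        writer.Write([]string{"name", "price"})

        c := colly.NewCollector()

        // One CSV row per matching product element (hypothetical selectors).
        c.OnHTML("li.product", func(e *colly.HTMLElement) {
            writer.Write([]string{e.ChildText("h2"), e.ChildText(".price")})
        })

        // Pagination: keep following the "next page" link until there is none.
        c.OnHTML("a.next", func(e *colly.HTMLElement) {
            e.Request.Visit(e.Attr("href"))
        })

        if err := c.Visit("https://example.com/products"); err != nil { // placeholder URL
            log.Fatalln(err)
        }
    }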
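
For scrapper2.go, proxy rotation can be wired in through Colly's proxy package. A sketch with placeholder proxy endpoints (replace them with your own, as noted under "How to Run"):

    package main

    import (
        "log"

        "github.com/gocolly/colly/v2"
        "github.com/gocolly/colly/v2/proxy"
    )

    func main() {
        c := colly.NewCollector()

        // Round-robin across a proxy pool; both endpoints are placeholders.
        rp, err := proxy.RoundRobinProxySwitcher(
            "http://proxy1.example.com:8080",
            "http://proxy2.example.com:8080",
        )
        if err != nil {
            log.Fatalln(err)
        }
        c.SetProxyFunc(rp)

        // G2.com's review markup varies; this selector is illustrative only.
        c.OnHTML("div[itemprop=review]", func(e *colly.HTMLElement) {
            log.Println(e.ChildText("div[itemprop=reviewBody]"))
        })

        if err := c.Visit("https://www.g2.com/products/example/reviews"); err != nil {
            log.Println("visit failed:", err)
        }
    }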
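
scrapper3.go needs only the standard library: build the API URL, issue a GET, and write the response body to disk. A sketch assuming the query-parameter form of the ZenRows v1 endpoint; the API key and target URL are placeholders:

    package main

    import (
        "io"
        "log"
        "net/http"
        "net/url"
        "os"
    )

    func main() {
        apiKey := "YOUR_ZENROWS_API_KEY"                        // replace with your own key
        target := "https://www.g2.com/products/example/reviews" // placeholder

        // ZenRows fetches the page on our behalf and returns the HTML.
        endpoint := "https://api.zenrows.com/v1/?apikey=" + apiKey +
            "&url=" + url.QueryEscape(target)

        resp, err := http.Get(endpoint)
        if err != nil {
            log.Fatalln(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatalln(err)
        }

        if err := os.WriteFile("g2.html", body, 0o644); err != nil {
            log.Fatalln(err)
        }
        log.Printf("saved %d bytes to g2.html", len(body))
    }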
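
Parallel scraping can be written with sync.WaitGroup and raw goroutines or, as sketched below, with Colly's built-in async mode; which approach scrapper4.go actually takes is not shown here, and the page URLs are placeholders:

    package main

    import (
        "log"

        "github.com/gocolly/colly/v2"
    )

    func main() {
        // Async mode issues requests from a pool of goroutines.
        c := colly.NewCollector(colly.Async(true))

        // Cap concurrency so the crawl stays polite.
        c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 4})

        c.OnScraped(func(r *colly.Response) {
            log.Println("finished", r.Request.URL)
        })

        for _, page := range []string{
            "https://example.com/page/1", // placeholder URLs
            "https://example.com/page/2",
            "https://example.com/page/3",
        } {
            c.Visit(page)
        }

        // Block until every queued request has completed.
        c.Wait()
    }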

Technologies Used

  • Go programming language
  • Colly web scraping framework
  • Standard Go libraries: encoding/csv, log, os, sync, net/http, io

How to Run

  1. Ensure you have Go installed on your system.
  2. Clone this repository:
    git clone https://github.com/G1r00t/web-scrapper-go.git
    
  3. Navigate to the project directory:
    cd web-scrapper-go
    
  4. Install dependencies:
    go mod tidy
    
  5. Run the desired scraper:
    go run scrapper1.go
    go run scrapper2.go
    go run scrapper3.go
    go run scrapper4.go
    

Note: Make sure to replace any API keys or proxies with your own before running the scripts.

Potential Enhancements

  1. Implement more robust error handling and logging
  2. Add command-line arguments for flexible configuration
  3. Develop a unified interface to select and run different scrapers
  4. Incorporate database storage for scraped data
  5. Implement rate limiting to respect website terms of service (see the sketch after this list)
  6. Add unit tests for each scraper function
  7. Create a web interface for easy management and visualization of scraped data
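
On point 5, Colly already provides the building block: a LimitRule with a fixed delay plus random jitter. A minimal sketch, assuming Colly v2 and a placeholder URL:

    package main

    import (
        "time"

        "github.com/gocolly/colly/v2"
    )

    func main() {
        c := colly.NewCollector()

        // One request at a time, spaced roughly 2-3 seconds apart.
        c.Limit(&colly.LimitRule{
            DomainGlob:  "*",
            Parallelism: 1,
            Delay:       2 * time.Second,
            RandomDelay: 1 * time.Second,
        })

        c.Visit("https://example.com") // placeholder
    }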

Disclaimer

This project is for educational purposes only. Always respect website terms of service and robots.txt files when scraping. Ensure you have permission to scrape any website before doing so.

Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.

graph TD
    A[📁 Source Code Repository] --> B[🔍 Semgrep SAST Scan]
    B --> C[⚠️ Raw Vulnerability Findings<br/>~750 alerts]
    
    C --> D[🧠 AI-SAST Processing Pipeline]
    
    D --> E[🔍 Stage 1: Dead Code Detection]
    E --> F[🔗 Stage 2: Context Extraction]
    F --> G[🤖 Stage 3: LLM Analysis]
    G --> H[📊 Stage 4: Smart Classification]
    
    H --> I[🔴 Must Fix<br/>Critical & Reachable]
    H --> J[🟡 Good to Fix<br/>Minor & Reachable] 
    H --> K[⚪ False Positive<br/>Dead Code & Safe Patterns]
    
    I --> L[🚨 Priority Alert to Developer]
    J --> M[📋 Backlog for Security Review]
    K --> N[🗑️ Filtered Out]
    
    style A fill:#e1f5fe
    style D fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
    style I fill:#ffebee,stroke:#f44336,stroke-width:2px
    style J fill:#fff3e0,stroke:#ff9800,stroke-width:2px
    style K fill:#f1f8e9,stroke:#4caf50,stroke-width:2px