d2gscp is a command-line tool for scraping images from DuckDuckGo, designed to help in training AI models by collecting categorized images efficiently.
- Supports categorized searches using the category=term1,term2 format.
- Saves images in structured folders.
- Uses Selenium and DuckDuckGo for efficient scraping.
- Command-line interface for easy usage.
Install the dependencies:

    pip install -r requirements.txt

Run the following command to download images:

    python main.py "cars=tesla,ferrari" 50

This will download 50 images each for "tesla" and "ferrari" inside a "cars" folder.
If no category is given, each search term gets its own folder:

    python main.py "dog,cat" 30

This will download 30 images each for "dog" and "cat" in separate folders.
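The two query forms shown above can be told apart with a small parser. The following is a minimal sketch, not d2gscp's actual code: the function name `parse_query` and the choice of returning `None` for a missing category are assumptions made for illustration.

```python
def parse_query(query):
    """Split a query such as "cars=tesla,ferrari" into (category, terms).

    Hypothetical helper: category is None when the query has no "=".
    """
    category, sep, rest = query.partition("=")
    if not sep:
        # No "=" found: the whole string is the comma-separated term list.
        category, rest = None, query
    else:
        category = category.strip()
    # Drop empty entries so a stray trailing comma is harmless.
    terms = [t.strip() for t in rest.split(",") if t.strip()]
    return category, terms


print(parse_query("cars=tesla,ferrari"))  # ('cars', ['tesla', 'ferrari'])
print(parse_query("dog,cat"))             # (None, ['dog', 'cat'])
```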
To disable subfolders inside categories:

    python main.py "cars=tesla,ferrari" 50 --no-subfolders

This will save all images inside the "cars" folder without subdirectories.
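The folder layouts described so far can be summarized in one function. This is a sketch under stated assumptions: the helper name `target_dir` and the `downloads` root directory are hypothetical, and the tool's real output location may differ.

```python
from pathlib import Path


def target_dir(root, category, term, use_subfolders=True):
    """Return the folder where images for one search term would be saved.

    Hypothetical helper mirroring the three layouts described above.
    """
    base = Path(root)
    if category is None:
        # No category: each term gets its own folder.
        return base / term
    if use_subfolders:
        # Default: one subfolder per term inside the category folder.
        return base / category / term
    # --no-subfolders: every term shares the category folder.
    return base / category


print(target_dir("downloads", "cars", "tesla").as_posix())         # downloads/cars/tesla
print(target_dir("downloads", "cars", "tesla", False).as_posix())  # downloads/cars
print(target_dir("downloads", None, "dog").as_posix())             # downloads/dog
```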
You can also use DuckImg Scraper in a Python script:

    from dockimg import download_images

    download_images("fruits=apple,banana", 20, use_subfolders=False)

Image-classification - https://github.com/Sem-Segredos-Tech/image-classification
DuckDuckGo may temporarily block repeated scraping requests. To mitigate this:
- Use a delay between requests.
- Utilize proxy services.
- Avoid running large batches in a short time.
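As an illustration of the first suggestion, the sketch below runs a list of tasks with a randomized pause between them. It is not part of d2gscp: `run_throttled`, its parameters, and the default delay values are assumptions, and each task stands in for whatever function performs one search or download.

```python
import random
import time


def run_throttled(tasks, base=2.0, jitter=1.0, sleep=time.sleep):
    """Call each task in order, pausing between consecutive calls.

    Hypothetical helper. The pause is `base` seconds plus a random
    amount up to `jitter` seconds, so requests are not evenly spaced.
    """
    results = []
    for i, task in enumerate(tasks):
        if i > 0:
            # No pause before the first task, only between tasks.
            sleep(base + random.uniform(0, jitter))
        results.append(task())
    return results


# Demo: record the pauses instead of actually sleeping.
waits = []
out = run_throttled([lambda: "tesla", lambda: "ferrari"], sleep=waits.append)
print(out)         # ['tesla', 'ferrari']
print(len(waits))  # 1
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies and the delay values should be tuned to the batch size.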
Happy Scraping! 🦆