
Image Crawler

Crawl images from Baidu, Bing, and Google (BBG)
for a given keyword using a web browser.

1. Install Dependencies

1.1 Create a virtual environment using conda.
Only tested with Python 3.6 and 3.7.

conda create --name crawler python=3.7
source activate crawler

1.2 Download the browser driver for Selenium.
Find and download the driver for the browser you use:
https://www.seleniumhq.org/download/

1.3 Add the folder containing the driver to the system PATH.
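
If the driver is set up correctly, Selenium can start the browser without an explicit driver path. Below is a minimal sanity check; it assumes Chrome with chromedriver on PATH (use the equivalent driver class for your browser, e.g. Firefox with geckodriver).

# Sanity check: Selenium should find the driver on PATH and open the browser.
from selenium import webdriver

driver = webdriver.Chrome()         # assumes chromedriver is on PATH
driver.get('https://www.bing.com')  # open any page to confirm it works
print(driver.title)
driver.quit()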

2. Get Code

Clone the repo and install the required packages.

git clone https://github.com/quqixun/ImageCrawler.git
cd ImageCrawler
pip install -r requires.txt

Run the test script to crawl my favourite pandas from BBG.

cd src
python panda.py

3. Explanation

  • ImageCrawler: obtains the HTTP links of the desired images;
  • ImageDownloader: downloads the images to a local directory.
import os

from image_crawler import ImageCrawler
from image_downloader import ImageDownloader

Four parameters are used by ImageCrawler and ImageDownloader:

  • keyword: the search keyword (you know, the keyword);
  • n_scroll: number of times to scroll in the browser;
  • link_save_dir: directory that holds all image links;
  • image_save_dir: directory for the downloaded images (where to find all the cute pandas).
n_scroll = 5
keyword = 'panda'

link_save_dir = os.path.join('../data/links', keyword)
image_save_dir = os.path.join('../data/images', keyword)

Crawl image links using Baidu

engine = 'baidu'
baidu_links_name = 'baidu_links.csv'

baidu_ic = ImageCrawler(engine)
baidu_ic.run(keyword, n_scroll)
baidu_ic.save_links(link_save_dir, baidu_links_name)

Crawl image links using Bing

engine = 'bing'
bing_links_name = 'bing_links.csv'

bing_ic = ImageCrawler(engine)
bing_ic.run(keyword, n_scroll)
bing_ic.save_links(link_save_dir, bing_links_name)

Crawl image links using Google

engine = 'google'
google_links_name = 'google_links.csv'

google_ic = ImageCrawler(engine)
google_ic.run(keyword, n_scroll)
google_ic.save_links(link_save_dir, google_links_name)

Download images

ider = ImageDownloader(link_save_dir)
ider.run(image_save_dir)
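
The three per-engine snippets above follow the same pattern, so they can also be collapsed into a single loop. A sketch using only the ImageCrawler and ImageDownloader calls shown above:

import os

from image_crawler import ImageCrawler
from image_downloader import ImageDownloader

keyword = 'panda'
n_scroll = 5
link_save_dir = os.path.join('../data/links', keyword)
image_save_dir = os.path.join('../data/images', keyword)

# Crawl links with each engine, then download everything in one pass.
for engine in ['baidu', 'bing', 'google']:
    ic = ImageCrawler(engine)
    ic.run(keyword, n_scroll)
    ic.save_links(link_save_dir, '{}_links.csv'.format(engine))

ider = ImageDownloader(link_save_dir)
ider.run(image_save_dir)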

4. Panda

Here is a panda we found.
