glebtk/Yandex-Images-Parser

Yandex Images Parser

This is a simple parser for Yandex Images. It allows searching by text query or image.

When searching, you can specify parameters such as:

  • Size
  • Orientation
  • Number of images
  • Type (photo, clipart, etc.)
  • Color (colorful, b/w, red, orange, etc.)
  • Format (jpg, png, gif)
  • Site

Delays between requests are automatically randomized within ±15% of the configured delay.
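One way to picture the randomization described above is the short sketch below. It is not taken from the parser's source: the function name jittered_delay and the spread argument are hypothetical, and a uniform distribution is assumed.

```python
import random


def jittered_delay(base_delay: float, spread: float = 0.15) -> float:
    """Scale base_delay by a random factor in [1 - spread, 1 + spread].

    Hypothetical helper for illustration; the parser's own code may differ.
    """
    return base_delay * random.uniform(1.0 - spread, 1.0 + spread)


# Five pauses around a nominal 2-second delay, each between about 1.7 and 2.3 seconds.
delays = [jittered_delay(2.0) for _ in range(5)]
```

Varying the pause this way avoids sending requests at a perfectly regular interval.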

Because the search runs through Selenium, this parser is not subject to a limit of 30 or 300 images.

The Mozilla Firefox browser must be installed.

Getting Started

  1. Clone the repository:
$ git clone https://github.com/glebtk/yandex_images_parser.git
  2. Before use, install the project requirements:
$ pip install -r requirements.txt
  3. Ensure that all requirements are successfully installed.

  4. Ensure that Mozilla Firefox is installed.

  5. To test the functionality, run example.py.

Usage Examples

Let's start by creating an instance of the parser class:

from yandex_images_parser import Parser

parser = Parser()

Search

  1. Let's say we want to find one cat image. Let's do it!
# Call the "query_search" function - search by query:
#   the "query" parameter contains the text query
#   the "limit" parameter defines the desired number of images

one_cat = parser.query_search(query="cat", limit=1)

# query_search returns a list, so take its first element (index 0):
one_cat_url = one_cat[0]

Done! Here is the cat:

[Image: really a cat]

  2. Let's find 10 similar cat images using the image_search function:
# Call the "image_search" function - search by image:
#   pass the link to the found image through the "url" parameter
#   set limit to 10

similar_cats = parser.image_search(url=one_cat_url, limit=10)

The search result is a list of URLs of similar cat images:

[Image: more cats]

  3. In addition to the limit parameter, you can use parameters such as:
  • delay - the delay time between requests (in seconds)
  • size - the size of the images
  • orientation - the orientation of the images
  • image_type - the type of the images (photo, illustration, etc.)
  • color - the color of the images (colorful, b/w, red, orange, etc.)
  • image_format - the format of the images (jpg, png, gif)
  • site - the site where the images are located

For example, if you need to find 128 paintings of famous painters in png format, use this code:

paintings = parser.query_search(query="paintings of famous painters",
                                limit=128,
                                image_format=parser.format.png)

And this code finds 30 black-and-white face images in jpg format, medium-sized and vertically oriented:

faces = parser.query_search(query="face",
                            limit=30,
                            size=parser.size.medium,
                            color=parser.color.gray,
                            image_type=parser.image_type.face,
                            image_format=parser.format.jpg,
                            orientation=parser.orientation.vertical)

Cleaning Results

Sometimes, during a complex search, the results may contain duplicate images (with the same URL). To remove such duplicates before further processing, use the remove_duplicates() function in utils.py.

Import it from utils:

from utils import remove_duplicates

Remove duplicate URLs from the paintings list:

paintings = remove_duplicates(paintings)
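For intuition, removing duplicate URLs while keeping the original order can be sketched as follows. This is an assumption about the behavior, not the code in utils.py, and the name remove_duplicates_sketch is hypothetical.

```python
def remove_duplicates_sketch(urls):
    """Return the URLs with repeats dropped, keeping first-seen order.

    Hypothetical stand-in; the repository's remove_duplicates() may differ.
    """
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(urls))


example_urls = [
    "https://example.com/a.png",
    "https://example.com/b.png",
    "https://example.com/a.png",
]
unique_urls = remove_duplicates_sketch(example_urls)
```

Unlike list(set(urls)), this keeps the images in the order the search returned them.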

Saving Images

Import the save_images() function from utils:

from utils import save_images

Pass the function a list of URLs and the path of the directory in which to save the images:

save_images(urls=paintings, dir_path="./images/paintings")
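To show roughly what saving a list of URLs involves, here is a minimal standard-library sketch. It is not the repository's save_images(): the name save_images_sketch, the numbered file names, the .jpg fallback extension, and the timeout argument are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def save_images_sketch(urls, dir_path, timeout=10):
    """Download each URL into dir_path and return the paths written.

    Hypothetical stand-in, not the repository's save_images().
    """
    os.makedirs(dir_path, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Reuse the extension from the URL path, falling back to .jpg.
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        target = os.path.join(dir_path, f"{index:04d}{ext}")
        try:
            with urlopen(url, timeout=timeout) as response:
                data = response.read()
        except (OSError, ValueError) as error:
            # Skip unreachable or malformed URLs instead of aborting the batch.
            print(f"skipped {url}: {error}")
            continue
        with open(target, "wb") as file:
            file.write(data)
        saved.append(target)
    return saved
```

Under these assumptions, save_images_sketch(urls=paintings, dir_path="./images/paintings") would write files such as 0000.png into that directory and skip any URL that fails to download.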

Done!

Contact Information

If you have any suggestions or feedback, feel free to contact me by email or via Telegram!

E-mail: tutikgv@gmail.com

Telegram: https://t.me/glebtutik
