
soroushj/image-dataset-loader


image-dataset-loader: Load image datasets as NumPy arrays

Available on PyPI. License: MIT.

Installation

pip install image-dataset-loader

Overview

Suppose you have an image dataset in a directory which looks like this:

data/
  train/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...
  test/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...

You can use the image_dataset_loader.load function to load this dataset as NumPy arrays:

import image_dataset_loader

(x_train, y_train), (x_test, y_test) = image_dataset_loader.load('path/to/data', ['train', 'test'])

The shape of the x_* arrays will be (instances, rows, cols, channels) for color images and (instances, rows, cols) for grayscale images. Also, the shape of the y_* arrays will be (instances,).
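The shapes follow from stacking equally sized images along a new first axis. The sketch below uses plain NumPy with made-up zero-filled arrays standing in for decoded images. It is not the library's code. It also shows that `np.stack` rejects images of different sizes, which is consistent with the same-shape requirement described next.

```python
import numpy as np

# Hypothetical stand-ins for decoded images (32 rows x 48 cols).
color_images = [np.zeros((32, 48, 3), dtype=np.uint8) for _ in range(5)]
gray_images = [np.zeros((32, 48), dtype=np.uint8) for _ in range(5)]
labels = [0, 0, 1, 1, 1]

x_color = np.stack(color_images)       # (instances, rows, cols, channels)
x_gray = np.stack(gray_images)         # (instances, rows, cols)
y = np.array(labels, dtype=np.uint32)  # (instances,)

print(x_color.shape, x_gray.shape, y.shape)
# (5, 32, 48, 3) (5, 32, 48) (5,)

# Stacking fails when image shapes differ.
try:
    np.stack([np.zeros((32, 48, 3)), np.zeros((64, 64, 3))])
    stack_failed = False
except ValueError:
    stack_failed = True
```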

All images in the dataset must have the same shape. Also, all data subsets (i.e., train and test in this example) must contain the same set of classes. Class names will be sorted alphabetically. So, in this example, cats and dogs will be represented by 0 and 1, respectively.
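The label assignment can be sketched with the standard library alone. `class_indices` is a hypothetical helper written for illustration, not part of image_dataset_loader. It checks that every subset has the same class folders and then numbers the classes in alphabetical order. The demo builds an empty copy of the example layout in a temporary directory. The class folders are created in the order dogs, cats to show that sorting, not creation order, decides the labels.

```python
import tempfile
from pathlib import Path


def class_indices(dataset_path, set_names):
    """Map class names to integer labels (hypothetical helper).

    Every subset directory must contain the same class subdirectories;
    labels follow the alphabetical order of the class names.
    """
    root = Path(dataset_path)
    class_sets = [
        {p.name for p in (root / name).iterdir() if p.is_dir()}
        for name in set_names
    ]
    if any(classes != class_sets[0] for classes in class_sets):
        raise ValueError("all data subsets must contain the same classes")
    return {cls: i for i, cls in enumerate(sorted(class_sets[0]))}


# Recreate the example directory layout (empty class folders suffice).
with tempfile.TemporaryDirectory() as tmp:
    for subset in ("train", "test"):
        for cls in ("dogs", "cats"):
            (Path(tmp) / subset / cls).mkdir(parents=True)
    mapping = class_indices(tmp, ["train", "test"])

print(mapping)  # {'cats': 0, 'dogs': 1}
```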

You can also load a single data subset. For example:

(x_train, y_train), = image_dataset_loader.load('path/to/data', ['train'])

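Note that the comma after (x_train, y_train) is required. The function always returns a tuple of (x, y) tuples, so loading a single subset returns a one-element tuple, and the trailing comma unpacks that single element.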
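The unpacking rule is ordinary Python and can be tried without the library. `fake_load` below is a hypothetical stand-in that only mimics the return structure of load (one (x, y) pair per requested subset), with strings in place of arrays.

```python
def fake_load(set_names):
    # Hypothetical stand-in: one (x, y) pair per requested subset.
    return tuple((f"x_{name}", f"y_{name}") for name in set_names)


result = fake_load(["train"])
print(result)  # (('x_train', 'y_train'),)

# Without the trailing comma, Python tries to unpack the outer
# one-element tuple into two names and raises ValueError.
try:
    (x_train, y_train) = fake_load(["train"])
    missing_comma_failed = False
except ValueError:
    missing_comma_failed = True

# With the trailing comma, the single inner pair is unpacked.
(x_train, y_train), = fake_load(["train"])

# An equivalent form that avoids the easy-to-miss comma.
x_alt, y_alt = fake_load(["train"])[0]
```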

API

load(dataset_path, set_names,
     shuffle=True, seed=None,
     x_dtype='uint8', y_dtype='uint32')
  • dataset_path: Path to the dataset directory.
  • set_names: List of the data subsets (subdirectories of the dataset directory).
  • shuffle: Whether to shuffle the samples. If False, instances are sorted by class name and then by file name.
  • seed: Random seed used for shuffling (see the docs).
  • x_dtype: NumPy data type for the X arrays (see the docs).
  • y_dtype: NumPy data type for the Y arrays (see the docs).
  • Returns a tuple of (x, y) tuples corresponding to set_names.
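When shuffling, x and y must be permuted together so that each image keeps its label. Below is a minimal sketch of that idea and of the dtype casting. `shuffle_in_unison` is a hypothetical helper, and it uses NumPy's `default_rng` Generator. The library may seed and shuffle differently, so the same seed value is not expected to reproduce its ordering.

```python
import numpy as np


def shuffle_in_unison(x, y, seed=None):
    """Apply one random permutation to both x and y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    return x[order], y[order]


# Toy data: six 2x2 "images" whose pixel values all equal their label,
# so a broken image/label pairing would be easy to detect.
y = np.array([0, 0, 0, 1, 1, 1])
x = np.repeat(y, 4).reshape(6, 2, 2)

x_s, y_s = shuffle_in_unison(x, y, seed=42)

# Cast to the default x_dtype and y_dtype values from the signature above.
x_s = x_s.astype('uint8')
y_s = y_s.astype('uint32')

print(x_s.dtype, y_s.dtype)  # uint8 uint32
```

Drawing a single index permutation and applying it to both arrays is what keeps each image paired with its label. Shuffling x and y independently would break the pairing.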