# Unsplash Dataset Creator

![](https://drive.google.com/uc?export=view&id=1TR9ydn_Kw180cQzP-LDvT52VwtNyxgr6)

This *Jupyter Notebook* creates a dataset of images stored locally on a hard drive, downloaded from the official [Unsplash Dataset](https://unsplash.com/data). It works with both the *Lite* and the *Full* version, but the official dataset has to be downloaded or requested separately.

Import `DatasetCreator` from `dataset_creation_helpers.py`.

In [1]:
from dataset_creation_helpers import DatasetCreator

A new instance of `DatasetCreator` is created and the `photos.tsv000` file from the official [Unsplash Dataset](https://unsplash.com/data) is loaded.

>| Parameter        | Description |
>|------------------|-------------|
>| `path_in`        | Path to the official [Unsplash Dataset](https://unsplash.com/data). |
>| `path_out`       | Directory where the downloaded dataset should be stored. |
>| `sizes`          | List of resolutions at which the images should be downloaded (all images are square). |
>| `dataset_size`   | Number of images the user wants to use in a future project. |
>| `download_Ratio` | Ratio between the images that are downloaded in `download_images()` and those that are only stored in `cache.csv`. |

In [2]:
dataset_creator = DatasetCreator(
    'D:\\UNSPLASH Dataset Lite',
    'D:\\UNSPLASH Dataset',
    sizes=(1024, 512, 256, 128, 64, 32),
    dataset_size=25e3,
)
dataset_creator.load_dataframe()
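
To give a rough idea of what loading `photos.tsv000` involves, here is a minimal sketch using only the standard library. It is not the implementation of `load_dataframe()`; the helper name `read_photo_urls` and the sample rows are made up for illustration, and the only thing taken from the dataset is that the file is tab-separated and has a `photo_image_url` column.

```python
import csv
import io
from itertools import islice


def read_photo_urls(tsv_file, limit):
    """Return up to `limit` values of the `photo_image_url` column.

    Hypothetical helper: assumes a tab-separated file with a header row.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [row["photo_image_url"] for row in islice(reader, limit)]


# Tiny in-memory stand-in for photos.tsv000 (made-up rows, illustration only).
sample = io.StringIO(
    "photo_id\tphoto_image_url\n"
    "a1\thttps://images.unsplash.com/photo-a1\n"
    "b2\thttps://images.unsplash.com/photo-b2\n"
    "c3\thttps://images.unsplash.com/photo-c3\n"
)

urls = read_photo_urls(sample, limit=2)
print(urls)
```

With a real file, the same helper could be called on `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.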

As many *photo_image_urls* as specified in `dataset_size` are collected with `search_images()` and then downloaded with `download_images()`.\
<sup>*Note: The `search_images()` function may be updated in the future so that the download can be narrowed down with keywords.*</sup>

In [None]:
dataset_creator.search_images()
dataset_creator.download_images()
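
Since every image is fetched as a square in several resolutions, one plausible way to request those variants is to append resize parameters to each *photo_image_url*. The sketch below only builds the URLs and downloads nothing. The helper `sized_url` is hypothetical, and the use of the `w`, `h` and `fit=crop` query parameters is an assumption about how Unsplash image URLs can be resized, not something confirmed by `dataset_creation_helpers.py`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sized_url(base_url, size):
    """Return `base_url` with query parameters asking for a size x size crop.

    Hypothetical helper; the parameter names `w`, `h` and `fit` are assumed.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query.update({"w": str(size), "h": str(size), "fit": "crop"})
    return urlunsplit(parts._replace(query=urlencode(query)))


sizes = (1024, 512, 256, 128, 64, 32)
base = "https://images.unsplash.com/photo-a1"  # made-up example URL

# One URL per requested resolution.
urls_by_size = {size: sized_url(base, size) for size in sizes}
print(urls_by_size[64])
```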

`save_cache()` stores all the image URLs in `cache.csv`. With `set_infos()`, the user can then specify additional information, which is written to a `README.md` file. This file is created by calling `make_README()`.

In [None]:
dataset_creator.save_cache()

dataset_creator.set_infos(dataset_type='Unsplash Dataset Lite', author_name='Lukas Hueglin')
dataset_creator.make_README()
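
For readers who want to reproduce the two output files without the helper class, the following is a minimal sketch under stated assumptions. The function names `write_cache_csv` and `write_readme`, the single-column CSV layout, and the README layout are all hypothetical; the actual formats produced by `save_cache()` and `make_README()` may differ.

```python
import csv
import tempfile
from pathlib import Path


def write_cache_csv(urls, path_out):
    """Write one URL per row to cache.csv (assumed single-column layout)."""
    cache_path = Path(path_out) / "cache.csv"
    with cache_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["photo_image_url"])
        writer.writerows([url] for url in urls)
    return cache_path


def write_readme(infos, path_out):
    """Write a small README.md from an info dictionary (assumed layout)."""
    readme_path = Path(path_out) / "README.md"
    lines = [f"# {infos['dataset_type']}", "", f"Author: {infos['author_name']}", ""]
    readme_path.write_text("\n".join(lines), encoding="utf-8")
    return readme_path


# A temporary directory stands in for `path_out`.
out_dir = tempfile.mkdtemp()
cache_file = write_cache_csv(["https://images.unsplash.com/photo-a1"], out_dir)
readme_file = write_readme(
    {"dataset_type": "Unsplash Dataset Lite", "author_name": "Lukas Hueglin"},
    out_dir,
)
print(cache_file, readme_file)
```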