# 🗂 Load Unpaired Image Data with the Unsplash API

Documentation for Unsplash [link](https://unsplash.com/documentation#getting-started)

Python with Unsplash [tutorial](https://dev.to/okeeffed/unsplash-api-with-python-3p9p)

Documentation for [pyunsplash library](https://pyunsplash.readthedocs.io/en/latest/)

NOTE: DON'T run these cells repeatedly; our Unsplash account is rate limited.
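To keep an eye on the rate limit, one option is to read the rate-limit headers that Unsplash attaches to API responses. The sketch below is a minimal illustration, assuming the header is named `X-Ratelimit-Remaining`; the helper `remaining_requests` and the example header values are made up for this notebook and are not part of `pyunsplash`.

```python
from typing import Mapping, Optional


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Return the remaining request budget from rate-limit headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalize plain dicts first
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("x-ratelimit-remaining")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # header present but not a number; treat it as unknown
        return None


# Hand-written headers for illustration only (not real API output)
example_headers = {"X-Ratelimit-Limit": "50", "X-Ratelimit-Remaining": "47"}
print(remaining_requests(example_headers))
```

With `requests`, the same helper could be called as `remaining_requests(response.headers)` on a response that came directly from the API.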

In [None]:
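import os
from os.path import join, dirname, abspath
from dotenv import load_dotenv
import pyunsplash.src.photos  # makes pyunsplash.src.photos.Photo resolvable in the type hints below
from pyunsplash import PyUnsplash
import requests
from PIL import Image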

### Load environment variables

Keeping the access key in plaintext in a git-tracked file is a bad idea, even for a demo, so the key is read from the environment:

`UNSPLASH_ACCESS_KEY = os.environ.get("UNSPLASH_ACCESS_KEY")`

The key itself is defined in a file called `.env.local` in the project root directory:

`UNSPLASH_ACCESS_KEY=your_access_key_here`

In [None]:
# abspath("__file__") resolves against the notebook's working directory,
# so this points at .env.local one level up, in the project root
dotenv_path = join(dirname(abspath("__file__")), '../.env.local')
load_dotenv(dotenv_path)

UNSPLASH_ACCESS_KEY = os.environ.get("UNSPLASH_ACCESS_KEY")

unsplash_git_lfs_path = '../data/unsplash/'
example_path = '../imgs/'
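If the key is read from the environment, a missing or empty value is easy to overlook until the first request fails. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for this notebook, and the `environ` parameter exists only so the function can be tried with a plain dict.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the named variable, or raise if it is missing or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; define it in .env.local or in the shell environment"
        )
    return value


# Demonstration with a fake environment (placeholder value, not a real key)
fake_environ = {"UNSPLASH_ACCESS_KEY": "your_access_key_here"}
print(require_env("UNSPLASH_ACCESS_KEY", fake_environ))
```

In the notebook this would be called after `load_dotenv(...)`, as `require_env("UNSPLASH_ACCESS_KEY")`.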

In [None]:
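# confirm that a key is available without echoing the secret into the notebook output
print("Access key loaded:", bool(UNSPLASH_ACCESS_KEY))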

#### Some helper functions for saving and displaying images returned by `pyunsplash` requests

In [None]:
def save_image(path: str, photo: pyunsplash.src.photos.Photo):
    """
    Save the image from the given Unsplash photo object to the specified path.
    
    Args:
        path (str): The path where the image should be saved.
        photo (pyunsplash.src.photos.Photo): The Unsplash photo object.
    
    Returns:
        str: The full path and filename of the saved image.
    """
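    filename = join(path, 'unsplash_' + photo.body['slug'] + '.png')
    response = requests.get(photo.link_download, allow_redirects=True)
    response.raise_for_status()  # fail loudly instead of writing an error page to disk
    # use a context manager so the file handle is closed after writing
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename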

def display_image(filename: str):
    """
    Display an image given its filename.
    
    Args:
        filename (str): The path to the image file.
    """
    display(Image.open(filename))
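Since every download counts against the rate limit, it can help to skip images that are already on disk. The sketch below assumes the same `unsplash_<slug>.png` naming used by `save_image`; the helpers `image_path` and `needs_download` are hypothetical and the slug is a made-up placeholder.

```python
import os
import tempfile


def image_path(directory: str, slug: str) -> str:
    """Build the target filename using the same naming scheme as save_image."""
    return os.path.join(directory, f"unsplash_{slug}.png")


def needs_download(directory: str, slug: str) -> bool:
    """Return True only if no file for this slug exists yet."""
    return not os.path.exists(image_path(directory, slug))


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(needs_download(tmp, "demo-slug"))  # nothing has been saved yet
    open(image_path(tmp, "demo-slug"), "wb").close()  # simulate a saved image
    print(needs_download(tmp, "demo-slug"))  # the file now exists
```

A download loop could then call `save_image` only when `needs_download(path, photo.body['slug'])` is true.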


### Example API usage

#### Download our first image -- running these cells causes ONE API request

In [None]:
# instantiate PyUnsplash object
pu = PyUnsplash(api_key=UNSPLASH_ACCESS_KEY)

photos = pu.photos(type_='random', count=1, featured=True, query="splash")
[photo] = photos.entries
print(photo.id, photo.link_download)

#### Save and display the image

In this simple case, we just save the image to `../imgs/` (won't be tracked by `git lfs`)

In [None]:

example_filename = save_image(example_path, photo)
display_image(example_filename)

### Example API usage with search terms
#### Download Cinestill-tagged images to `data/unsplash` directory, so that they are stored with `git lfs`

NOTE: each `pu.search` call causes a request, and the `per_page` input sets how many image results that single request returns

#### Helper function to filter photos based on Cinestill 800T relevancy 

In [None]:
# filter entries based on relevancy, returns boolean
def filter_relevancy(entry: pyunsplash.src.photos.Photo) -> bool:
    """
    Filter function to determine the relevancy of a photo entry based on its description. 
    Filter for the inclusion of both 'cinestill' and '800' in the description.
    
    Args:
        entry (pyunsplash.src.photos.Photo): The photo entry to be filtered.
        
    Returns:
        bool: True if the photo entry is relevant, False otherwise.
    """
    if entry.body["description"]:
        des = entry.body["description"].lower()
        return 'cinestill' in des and '800' in des
    else:
        return False
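The filter can be tried without touching the API by feeding it stand-in objects that only carry a `body` dict. The sketch below is a generalized variant, assuming entries expose `body["description"]` as above; `description_contains_all` and the fake descriptions are invented for illustration.

```python
from types import SimpleNamespace
from typing import Iterable


def description_contains_all(entry, terms: Iterable[str]) -> bool:
    """Return True if every term appears in the entry's description (case-insensitive)."""
    # a missing or null description is treated as an empty string
    description = entry.body.get("description") or ""
    lowered = description.lower()
    return all(term.lower() in lowered for term in terms)


# Stand-in entries with made-up descriptions (no API request involved)
fake_entries = [
    SimpleNamespace(body={"description": "Night street, shot on CineStill 800T"}),
    SimpleNamespace(body={"description": "Cinestill 50D at the beach"}),
    SimpleNamespace(body={"description": None}),
]

relevant = [e for e in fake_entries if description_contains_all(e, ["cinestill", "800"])]
print(len(relevant))
```

Only the first stand-in entry mentions both terms, so it is the only one kept.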

#### Example request of many photos based on a search term
This cell causes a new API request!

In [None]:
# THIS CELL CAUSES A REQUEST TO UNSPLASH API
search_result_photos = pu.search(type_='photos', query='cinestill', per_page=10000)
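If a single request does not return everything, for example because the API caps `per_page`, the remaining results have to be fetched page by page, and each page is one more request. The sketch below shows one way to bound that cost. It assumes the result object exposes `entries`, `has_next`, and `get_next_page()` in the way the `pyunsplash` documentation describes; `collect_entries` and `FakePage` are hypothetical names, and `FakePage` stands in for a real result so the cell runs without any request.

```python
from typing import Any, List


def collect_entries(first_page: Any, max_pages: int = 3) -> List[Any]:
    """Gather entries from at most max_pages pages, starting with first_page."""
    entries = list(first_page.entries)
    page = first_page
    pages_read = 1
    # stop at the page budget or when the result reports no further page
    while pages_read < max_pages and getattr(page, "has_next", False):
        page = page.get_next_page()  # with a real result, this is one more API request
        entries.extend(page.entries)
        pages_read += 1
    return entries


class FakePage:
    """Stand-in for a paged search result (assumed interface, no network access)."""

    def __init__(self, pages, index=0):
        self._pages = pages
        self._index = index

    @property
    def entries(self):
        return iter(self._pages[self._index])

    @property
    def has_next(self):
        return self._index + 1 < len(self._pages)

    def get_next_page(self):
        return FakePage(self._pages, self._index + 1)


pages = [["a", "b"], ["c"], ["d", "e"]]
print(collect_entries(FakePage(pages), max_pages=2))
```

The `max_pages` budget is the design choice that matters here: it puts a hard upper bound on how many requests one search can consume.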


#### Let's see what the results are and print their URLs
This cell doesn't cause a new API request


In [None]:
# THIS CELL DOES NOT CAUSE A REQUEST
# filter entries based on relevancy and collect them in a list,
# then print a link to each one for review
filtered_photos_lst = list(filter(filter_relevancy, search_result_photos.entries))
print("After filtering, we have a total of", len(filtered_photos_lst), "relevant photos.")
for entry in filtered_photos_lst:
    im_name = entry.body['slug']
    print(f"Click to view image {im_name[:30]}... online: ", entry.link_html)


#### If we like the result, let's save the images to `../data/unsplash/`

In [None]:
for entry in filtered_photos_lst:
    filename = save_image(unsplash_git_lfs_path, entry)
    print("Saved image to", filename)
    # display_image(filename) # very large display, uncommenting is not recommended