## UBC Intro to Machine Learning

### APIs
Instructor: Socorro Dominguez  
February 05, 2022

## Exercise to try on your local machine

## Motivation

For our ML class, we want to build a classifier that distinguishes images of dogs from images of cats.

## Problem
We need a dataset to do this. Our friends don't have enough cats and dogs. 
Let's take free, open and legal data from the [Unsplash Image API](https://unsplash.com/developers).

## Caveats
Sometimes, raw data is unsuitable for machine learning algorithms. For instance, we may want:
- Only images that are landscape (i.e. width > height)
- All our images to be of the same resolution

---
## Step 1: Get cat and dog image URLs from the API
We will use the [`search/photos` GET method](https://unsplash.com/documentation#search-photos).
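To make the request concrete, here is a minimal sketch of the URL that a `search/photos` call resolves to, built with only the standard library. The `query`, `per_page` and `client_id` parameters are the ones this notebook uses; `YOUR_ACCESS_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

root_endpoint = 'https://api.unsplash.com/'
api_method = 'search/photos'

# Placeholder value: replace with the access key from your own Unsplash app.
params = {'query': 'cat', 'per_page': 30, 'client_id': 'YOUR_ACCESS_KEY'}

# A GET request encodes its parameters into the query string after '?'.
request_url = root_endpoint + api_method + '?' + urlencode(params)
print(request_url)
```

Passing the same dictionary as `params=` to `requests.get` produces an equivalent URL, so there is normally no need to build the string by hand.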

In [2]:
import requests
import config as cfg

# API variables
root_endpoint = 'https://api.unsplash.com/'
client_id = cfg.splash['key']

# Wrapper function for making API calls and grabbing results
def search_photos(search_term):
    api_method = 'search/photos'
    endpoint = root_endpoint + api_method
    response = requests.get(endpoint,
                            params={'query': search_term, 'per_page': 30, 'client_id': client_id})

    # Stop on a failed request: an error response may not contain a 'results' key
    if response.status_code != 200:
        print(f'Bad status code: {response.status_code}')
        return []

    # Keep only the URL of the 'small' rendition of each photo
    result = response.json()
    image_urls = [img['urls']['small'] for img in result['results']]

    return image_urls
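
A single call returns at most `per_page` results, which is not much data for a classifier. The sketch below shows one way to collect several pages. It assumes the endpoint accepts a `page` parameter and reports `total_pages` in its JSON, as described in the Unsplash documentation. The names `collect_urls`, `fetch_page` and `fake_fetch` are hypothetical, and the fake fetcher stands in for a real request so the example runs offline.

```python
def collect_urls(fetch_page, max_pages=3):
    """Gather 'small' image URLs from up to max_pages pages of results.

    fetch_page(page) must return a dict shaped like a search/photos response.
    """
    urls = []
    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        urls.extend(img['urls']['small'] for img in result['results'])
        # Stop early once the last available page has been read
        if page >= result.get('total_pages', page):
            break
    return urls


# Hypothetical stand-in for the API: two pages with two photos each
def fake_fetch(page):
    return {
        'total_pages': 2,
        'results': [{'urls': {'small': f'https://example.com/p{page}-{i}.jpg'}}
                    for i in range(2)],
    }


all_urls = collect_urls(fake_fetch, max_pages=5)
print(len(all_urls))
```

With the real API, `fetch_page` could wrap `requests.get` with `'page': page` added to the parameters. Each page is a separate request, so keep the API's rate limits in mind.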

In [3]:
dog_urls = search_photos('dog')
cat_urls = search_photos('cat')

In [5]:
cat_urls

['https://images.unsplash.com/photo-1526336024174-e58f5cdd8e13?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwxOTY1NDl8MHwxfHNlYXJjaHwxfHxjYXR8ZW58MHx8fHwxNjQ0MDQxMzQz&ixlib=rb-1.2.1&q=80&w=400',
 'https://images.unsplash.com/photo-1514888286974-6c03e2ca1dba?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwxOTY1NDl8MHwxfHNlYXJjaHwyfHxjYXR8ZW58MHx8fHwxNjQ0MDQxMzQz&ixlib=rb-1.2.1&q=80&w=400',
 'https://images.unsplash.com/photo-1548247416-ec66f4900b2e?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwxOTY1NDl8MHwxfHNlYXJjaHwzfHxjYXR8ZW58MHx8fHwxNjQ0MDQxMzQz&ixlib=rb-1.2.1&q=80&w=400',
 'https://images.unsplash.com/photo-1495360010541-f48722b34f7d?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwxOTY1NDl8MHwxfHNlYXJjaHw0fHxjYXR8ZW58MHx8fHwxNjQ0MDQxMzQz&ixlib=rb-1.2.1&q=80&w=400',
 'https://images.unsplash.com/photo-1561948955-570b270e7c36?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwxOTY1NDl8MHwxfHNlYXJjaHw1fHxjYXR8ZW58MHx8fHwxNjQ0MDQxMzQz&ixlib=rb-1.2.1&q=80&w=400',
 'https://images.unsp

---
## Step 2: Download the images from the URLs
(Step 2a: Google [how to download an image from a URL in Python](https://stackoverflow.com/a/40944159))

We'll just define the function to download an image for now. Later on, we'll use it on images one at a time (but after doing some processing).

In [8]:
from PIL import Image

def download_image(url):
    image = Image.open(requests.get(url, stream=True).raw)
    return image

In [9]:
test_img = download_image(cat_urls[0])
test_img.show()

---
## Step 3: Download and save images that meet our requirements
We'll need to know how to work with the [PIL Image data type](https://pillow.readthedocs.io/en/stable/reference/Image.html), which is what our `download_image(url)` function returns. Namely, we need to be able to a) get its resolution and b) resize it.
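
As a quick illustration of both operations, the sketch below uses a synthetic image created with `Image.new` in place of a downloaded photo, so it runs without network access. The 400x300 size and the gray fill are arbitrary choices for the example.

```python
from PIL import Image

# Synthetic 400x300 image standing in for a downloaded photo
image = Image.new('RGB', (400, 300), color='gray')

# a) Resolution: .size is a (width, height) tuple; .width and .height also exist
width, height = image.size
is_wide = width > height

# b) Resize: returns a new image and leaves the original unchanged
resized = image.resize((256, 256))

print(image.size, is_wide, resized.size)
```

Note that resizing a landscape image to a square resolution changes its aspect ratio, so the content is squeezed horizontally. That may be acceptable for a first classifier, but it is worth knowing.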

In [None]:
import os

def is_landscape(image):
    return image.width > image.height


def save_category_images(urls, category_name, resolution=(256, 256)):
    save_folder = f'saved_images/{category_name}'
    # makedirs also creates the parent 'saved_images' folder if it is missing
    os.makedirs(save_folder, exist_ok=True)
        
    for i, url in enumerate(urls):
        image = download_image(url)
        if is_landscape(image):
            image = image.resize(resolution)
            filename = f'{i:05d}.jpg'
            image.save(os.path.join(save_folder, filename))

In [None]:
save_category_images(dog_urls, 'dogs')
save_category_images(cat_urls, 'cats')
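
Downloading every image just to check its orientation costs bandwidth. As a possible refinement, each photo in a `search/photos` response also carries `width` and `height` fields (per the Unsplash documentation), so the landscape check could run before the download. This is a sketch under that assumption; `landscape_urls` and the sample response are hypothetical.

```python
def landscape_urls(result):
    """Return 'small' URLs of photos whose metadata says width > height."""
    return [img['urls']['small']
            for img in result['results']
            if img['width'] > img['height']]


# Hypothetical response fragment with one landscape and one portrait photo
sample = {
    'results': [
        {'width': 4000, 'height': 3000, 'urls': {'small': 'https://example.com/a.jpg'}},
        {'width': 3000, 'height': 4000, 'urls': {'small': 'https://example.com/b.jpg'}},
    ]
}

print(landscape_urls(sample))
```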