# Unsplash Image Search

This notebook lets you search images from the [Full Unsplash Dataset](https://unsplash.com/data) using natural language descriptions. The search is powered by [OpenAI's CLIP neural network](https://github.com/openai/CLIP).

This notebook uses the precomputed feature vectors for almost 2 million images from the Unsplash dataset. If you want to compute the features yourself, take a look at my other notebook.

This project was created by [Vladimir Haltakov](https://twitter.com/haltakov) and the full code is open-source on [GitHub](https://github.com/haltakov/unsplash-image-search).

## Setup Environment

In this section we will set up the environment.

First we need to install some dependencies of CLIP that are not preinstalled in Colab.

In [None]:
!pip install ftfy==5.8

Now we need to check out the code of the CLIP model from OpenAI's GitHub repository and move the required files into the root folder, so we can import them.

In [None]:
# Clone the code from the CLIP repository
!git clone https://github.com/openai/CLIP.git

# Move the Python files and the vocabulary archive
!mv CLIP/*.py .
!mv CLIP/*.gz .

Next, we need to load the pretrained public CLIP model.

In [None]:
import clip
import torch

# Load the open CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

## Download the Precomputed Data

In this section the precomputed feature vectors for all photos are downloaded.

To compare the photos from the Unsplash dataset to a text query, we need to compute the feature vector of each photo using CLIP. This is a time-consuming task, so you can use the feature vectors that I precomputed and uploaded to Google Drive (with permission from Unsplash).

If you want to compute the features yourself, take a look at my other notebook.

We need to download two files:
* `photo_ids.csv` - a list with the photo IDs for all images in the dataset. The photo ID can be used to get the actual photo from Unsplash.
* `features.npy` - a matrix containing the precomputed 512-element feature vector for each photo in the dataset.

In [None]:
# Create a folder for the precomputed features
!mkdir unsplash-dataset

# Download the photo IDs and the feature vectors
!gdown --id 1FdmDEzBQCf3OxqY9SbU-jLfH_yZ6UPSj -O unsplash-dataset/photo_ids.csv
!gdown --id 1L7ulhn4VeN-2aOM-fYmljza_TQok-j9F -O unsplash-dataset/features.npy

After the files are downloaded we need to load them using `pandas` and `numpy`.

In [None]:
import pandas as pd
import numpy as np

# Load the photo IDs
photo_ids = pd.read_csv("unsplash-dataset/photo_ids.csv")
photo_ids = list(photo_ids['photo_id'])

# Load the feature vectors
photo_features = np.load("unsplash-dataset/features.npy")

# Print some statistics
print(f"Photos loaded: {len(photo_ids)}")
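The search relies on row `i` of the feature matrix belonging to the `i`-th photo ID, so it can be worth checking that the two files line up before using them. The following is a minimal sketch of such a check; `check_alignment` is a hypothetical helper, and the demo data is synthetic rather than the real dataset.

```python
import numpy as np

def check_alignment(photo_ids, photo_features, expected_dim=512):
    """Return True when there is exactly one feature row per photo ID
    and every row has `expected_dim` elements."""
    if photo_features.ndim != 2:
        return False
    rows, dim = photo_features.shape
    return rows == len(photo_ids) and dim == expected_dim

# Tiny synthetic stand-in for the real data (hypothetical IDs, random features)
demo_ids = ["id_a", "id_b", "id_c"]
demo_features = np.random.rand(3, 512).astype(np.float16)

print(check_alignment(demo_ids, demo_features))
```

With the real data you would call `check_alignment(photo_ids, photo_features)` right after loading.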

## Define Functions

Here, some important functions for processing the data are defined.



The `encode_search_query` function takes a text description and encodes it into a feature vector using the CLIP model.

In [11]:
def encode_search_query(search_query):
  with torch.no_grad():
    # Encode and normalize the search query using CLIP
    text_encoded = model.encode_text(clip.tokenize(search_query).to(device))
    text_encoded /= text_encoded.norm(dim=-1, keepdim=True)

  # Retrieve the feature vector from the GPU and convert it to a numpy array
  return text_encoded.cpu().numpy()
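The normalization step matters because the dot product of two unit-length vectors equals their cosine similarity. The sketch below illustrates this with plain NumPy on toy 2-element vectors; `l2_normalize` is a hypothetical helper that mirrors the `x / x.norm(dim=-1, keepdim=True)` pattern used above.

```python
import numpy as np

def l2_normalize(vectors):
    """Scale each row to unit length."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / norms

# Two toy feature vectors (not real CLIP features)
a = np.array([[3.0, 4.0]])
b = np.array([[4.0, 3.0]])

# Cosine similarity computed directly from its definition
cosine = (a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

# Dot product of the normalized vectors
dot_of_normalized = l2_normalize(a) @ l2_normalize(b).T

print(np.allclose(cosine, dot_of_normalized))  # True
```

The same identity only carries over to the search if the precomputed photo features are unit-length as well, which is an assumption here.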

The `find_best_matches` function compares the text feature vector to the feature vectors of all images and finds the best matches. The function returns the IDs of the best matching photos.

In [12]:
def find_best_matches(text_features, photo_features, photo_ids, results_count=3):
  # Compute the similarity between the search query and each photo using the Cosine similarity
  similarities = list((text_features @ photo_features.T).squeeze(0))

  # Sort the photos by their similarity score and attach the photo ID to the score
  best_photos = sorted(zip(similarities, photo_ids), key=lambda x: x[0], reverse=True)

  # Return the photo IDs of the best matches
  return [best_photos[i][1] for i in range(results_count)]
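Sorting every score in Python is simple, but it does more work than needed when only a few results are requested. As a possible alternative (not the approach used in this notebook), the sketch below selects the top results with `np.argpartition` and sorts only those. The name `find_best_matches_topk` and the toy data are assumptions made for illustration.

```python
import numpy as np

def find_best_matches_topk(text_features, photo_features, photo_ids, results_count=3):
    # Similarity of the query to every photo, shape (num_photos,)
    similarities = (text_features @ photo_features.T).squeeze(0)

    # Never ask for more results than there are photos
    k = min(results_count, len(photo_ids))
    if k <= 0:
        return []

    # Indices of the k highest scores, in arbitrary order
    top_k = np.argpartition(-similarities, k - 1)[:k]

    # Sort only those k indices from best to worst
    top_k = top_k[np.argsort(-similarities[top_k])]

    return [photo_ids[i] for i in top_k]

# Toy data: one query vector and four photo vectors (all unit-length)
text = np.array([[1.0, 0.0]])
photos = np.array([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8], [0.8, 0.6]])
ids = ["p0", "p1", "p2", "p3"]

print(find_best_matches_topk(text, photos, ids, 2))  # ['p1', 'p3']
```

The function takes the same arguments as `find_best_matches`, so it could be swapped in without changing the callers.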

The `display_photo` function displays a photo from Unsplash given its ID. 

This function needs to call the Unsplash API to get the URL of the photo and some metadata about the photographer. Since I'm [not allowed](https://help.unsplash.com/en/articles/2511245-unsplash-api-guidelines) to share my Unsplash API access key publicly, I created a small proxy that queries the Unsplash API and returns the data (see the code [here](https://github.com/haltakov/unsplash-image-search/tree/main/unsplash-proxy)). In this way you can play around without creating a developer account at Unsplash, while keeping my key private. I hope I don't hit the API rate limit.

If you already have an Unsplash developer account, you can uncomment the relevant code and plug in your own access key.

In [27]:
from IPython.display import HTML, Image, display
from urllib.request import urlopen
import json

def display_photo(photo_id):
  # Proxy for the Unsplash API so that I don't expose my access key
  unsplash_api_url = f"https://haltakov.net/unsplash-proxy/{photo_id}"
  
  # Alternatively, you can use your own Unsplash developer account with this code
  # unsplash_api_url = f"https://api.unsplash.com/photos/{photo_id}?client_id=YOUR_UNSPLASH_ACCESS_KEY"
  
  # Fetch the photo metadata from the Unsplash API
  photo_data = json.loads(urlopen(unsplash_api_url).read().decode("utf-8"))

  # Get the URL of the photo resized to have a width of 320px
  photo_image_url = photo_data["urls"]["raw"] + "&w=320"

  # Display the photo
  display(Image(url=photo_image_url))

  # Display the attribution text
  display(HTML(f'Photo by <a target="_blank" href="https://unsplash.com/@{photo_data["user"]["username"]}?utm_source=ml_image_search&utm_medium=referral">{photo_data["user"]["name"]}</a> on <a target="_blank" href="https://unsplash.com/?utm_source=ml_image_search&utm_medium=referral">Unsplash</a>'))
  print()
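The URL and attribution handling can be tried out without any network access. The sketch below separates those two steps into hypothetical helpers, `build_photo_url` and `build_attribution_html`, and runs them on a made-up metadata dictionary that only imitates the `urls` and `user` fields used above. Unlike `display_photo`, it also covers a raw URL without a query string and escapes the user-provided names, which are additions of this sketch.

```python
from html import escape

def build_photo_url(raw_url, width=320):
    # Use "&" when the URL already has a query string, "?" otherwise
    separator = "&" if "?" in raw_url else "?"
    return f"{raw_url}{separator}w={width}"

def build_attribution_html(user):
    utm = "utm_source=ml_image_search&utm_medium=referral"
    # Escape user-provided values before placing them in HTML
    name = escape(user["name"])
    username = escape(user["username"])
    return (
        f'Photo by <a target="_blank" href="https://unsplash.com/@{username}?{utm}">{name}</a> '
        f'on <a target="_blank" href="https://unsplash.com/?{utm}">Unsplash</a>'
    )

# Made-up metadata for illustration only (not a real API response)
sample = {
    "urls": {"raw": "https://images.example.com/photo-123?ixid=abc"},
    "user": {"name": "Jane Doe", "username": "janedoe"},
}

photo_url = build_photo_url(sample["urls"]["raw"])
attribution = build_attribution_html(sample["user"])

print(photo_url)  # https://images.example.com/photo-123?ixid=abc&w=320
print(attribution)
```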

Finally, we put it all together in one function.

In [28]:
def search_unslash(search_query, photo_features, photo_ids, results_count=3):
  # Encode the search query
  text_features = encode_search_query(search_query)

  # Find the best matches
  best_photo_ids = find_best_matches(text_features, photo_features, photo_ids, results_count)

  # Display the best photos
  for photo_id in best_photo_ids:
    display_photo(photo_id)


## Search Unsplash

Now we are ready to search the dataset using natural language. Check out the examples below and feel free to try out different queries.

### "Two dogs playing in the snow"

In [34]:
search_query = "Two dogs playing in the snow"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The word love written on the wall"

In [30]:
search_query = "The word love written on the wall"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The feeling when your program finally works"

In [31]:
search_query = "The feeling when your program finally works"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The Sydney Opera House and the Harbour Bridge at night"

In [32]:
search_query = "The Sydney Opera House and the Harbour Bridge at night"

search_unslash(search_query, photo_features, photo_ids, 3)








