# Unsplash Image Search

With this notebook, you can search for images from the [Unsplash Dataset](https://unsplash.com/data) using natural language queries. The search is powered by OpenAI's [CLIP](https://github.com/openai/CLIP) neural network.

This notebook uses the precomputed feature vectors for almost 2 million images from the full version of the [Unsplash Dataset](https://unsplash.com/data). If you want to compute the features yourself, see [here](https://github.com/haltakov/natural-language-image-search#on-your-machine).

This project was created by [Vladimir Haltakov](https://twitter.com/haltakov) and the full code is open-sourced on [GitHub](https://github.com/haltakov/natural-language-image-search).

## Setup Environment

In this section we will set up the environment.

First we need to install CLIP and then make sure that we have torch 1.7.1 with CUDA support.

In [1]:
!pip install git+https://github.com/openai/CLIP.git
!pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-av4qe7qp
  Running command git clone -q https://github.com/openai/CLIP.git /tmp/pip-req-build-av4qe7qp
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 542 kB/s
Building wheels for collected packages: clip
  Building wheel for clip (setup.py) ... done
  Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369387 sha256=8b35e6ce1c7373948ddc6cf75facfd7fba83a2d085e05fba7e8693ce675c2fe8
  Stored in directory: /tmp/pip-ephem-wheel-cache-59cthgiv/wheels/fd/b9/c3/5b4470e35ed76e174bff77c92f91da82098d5e35fd5bc8cdac
Successfully built clip
Installing collected packages: ftfy, clip
Successfully installed clip-1.0 ftfy-6.1.1
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.7.1+cu101
  Downloading https://download.pytorch.org/whl/cu101/torch-1.7.1%2Bcu101

We can now load the pretrained public CLIP model.

In [2]:
import clip
import torch

# Load the open CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

100%|████████████████████████████████████████| 338M/338M [00:01<00:00, 212MiB/s]


## Download the Precomputed Data

In this section the precomputed feature vectors for all photos are downloaded.

To compare the photos from the Unsplash dataset to a text query, we need to compute the feature vector of each photo using CLIP. This is a time-consuming task, so you can use the feature vectors that I precomputed and uploaded to Google Drive (with permission from Unsplash). If you want to compute the features yourself, see [here](https://github.com/haltakov/natural-language-image-search#on-your-machine).

We need to download two files:
* `photo_ids.csv` - a list of the photo IDs for all images in the dataset. The photo ID can be used to get the actual photo from Unsplash.
* `features.npy` - a matrix containing the precomputed 512-element feature vector for each photo in the dataset.

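The files are available on [Google Drive](https://drive.google.com/drive/folders/1WQmedVCDIQKA2R33dkS1f980YsJXRZ-q?usp=sharing). The cell below fetches them from the project's GitHub Releases and skips any file that already exists locally.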

In [3]:
from pathlib import Path

# Create a folder for the precomputed features (-p avoids an error if it already exists)
!mkdir -p unsplash-dataset

# Download from GitHub Releases
if not Path('unsplash-dataset/photo_ids.csv').exists():
  !wget https://github.com/haltakov/natural-language-image-search/releases/download/1.0.0/photo_ids.csv -O unsplash-dataset/photo_ids.csv

if not Path('unsplash-dataset/features.npy').exists():
  !wget https://github.com/haltakov/natural-language-image-search/releases/download/1.0.0/features.npy -O unsplash-dataset/features.npy
  

--2022-05-16 21:14:07--  https://github.com/haltakov/natural-language-image-search/releases/download/1.0.0/photo_ids.csv
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/330162907/ea59cda9-85ee-4657-9fb5-ddad20060ccb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220516%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220516T211407Z&X-Amz-Expires=300&X-Amz-Signature=9974e41f05dcf00ed7fedf1ff60acd1ba546af80b561247ac98330a60e4241a5&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=330162907&response-content-disposition=attachment%3B%20filename%3Dphoto_ids.csv&response-content-type=application%2Foctet-stream [following]
--2022-05-16 21:14:07--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/330162907/ea59cda9-85ee-4657-9fb5-

After the files are downloaded we need to load them using `pandas` and `numpy`.

In [4]:
import pandas as pd
import numpy as np

# Load the photo IDs
photo_ids = pd.read_csv("unsplash-dataset/photo_ids.csv")
photo_ids = list(photo_ids['photo_id'])

# Load the feature vectors
photo_features = np.load("unsplash-dataset/features.npy")

# Convert features to Tensors: Float32 on CPU and Float16 on GPU
if device == "cpu":
  photo_features = torch.from_numpy(photo_features).float().to(device)
else:
  photo_features = torch.from_numpy(photo_features).to(device)

# Print some statistics
print(f"Photos loaded: {len(photo_ids)}")

Photos loaded: 1981161
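
As a rough sanity check on the size of the feature matrix, the sketch below estimates its memory footprint from the photo count printed above and the 512-element vector length. The helper name `feature_matrix_bytes` is hypothetical, and the two dtypes are simply the ones named in the code comment above (Float16 and Float32); the estimate assumes a dense matrix with no overhead.

```python
import numpy as np

def feature_matrix_bytes(num_photos, dim, dtype):
    """Return the bytes needed for a dense (num_photos, dim) matrix of `dtype`."""
    return num_photos * dim * np.dtype(dtype).itemsize

num_photos = 1981161  # count printed by the cell above
dim = 512             # feature vector length stated in this notebook

for dtype in (np.float16, np.float32):
    size_gib = feature_matrix_bytes(num_photos, dim, dtype) / 2**30
    print(f"{np.dtype(dtype).name}: {size_gib:.2f} GiB")
```

Under these assumptions, converting to Float32 doubles the footprint, which is worth keeping in mind when running on a CPU-only machine.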


## Define Functions

Some important functions for processing the data are defined here.



The `encode_search_query` function takes a text description and encodes it into a feature vector using the CLIP model.

In [5]:
def encode_search_query(search_query):
  with torch.no_grad():
    # Encode and normalize the search query using CLIP
    text_encoded = model.encode_text(clip.tokenize(search_query).to(device))
    text_encoded /= text_encoded.norm(dim=-1, keepdim=True)

  # Retrieve the feature vector
  return text_encoded
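
The normalization step in `encode_search_query` can be illustrated in isolation. The following is a minimal sketch that uses NumPy in place of torch and a made-up 2-D vector instead of a real CLIP embedding; the helper name `l2_normalize` is hypothetical.

```python
import numpy as np

def l2_normalize(vectors):
    """Scale each row of `vectors` to unit L2 norm."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / norms

# A toy "embedding" with norm 5, standing in for an encoded query
query = np.array([[3.0, 4.0]])
unit_query = l2_normalize(query)

print(unit_query)                           # direction is kept, length becomes 1
print(np.linalg.norm(unit_query, axis=-1))
```

Once vectors have unit length, their dot product equals their cosine similarity, which is what the matching step relies on.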

The `find_best_matches` function compares the text feature vector to the feature vectors of all images and finds the best matches. The function returns the IDs of the best matching photos.

In [6]:
def find_best_matches(text_features, photo_features, photo_ids, results_count=3):
  # Compute the cosine similarity between the search query and each photo
  similarities = (photo_features @ text_features.T).squeeze(1)

  # Sort the photos by their similarity score
  best_photo_idx = (-similarities).argsort()

  # Return the photo IDs of the best matches
  return [photo_ids[i] for i in best_photo_idx[:results_count]]
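
To make the ranking step concrete, here is a small NumPy sketch of the same idea on toy data. It assumes that both the query and the stored feature rows are already unit length, so that the matrix product gives cosine similarities. The function name `top_k_by_cosine` and the photo IDs are hypothetical and exist only for this illustration.

```python
import numpy as np

def top_k_by_cosine(query, features, ids, k=3):
    """Return the k ids whose unit-norm feature rows best match the unit-norm query."""
    similarities = features @ query        # one score per photo
    best = np.argsort(-similarities)[:k]   # indices in descending order of score
    return [ids[i] for i in best]

# Toy data: three made-up photo IDs with 2-D unit feature vectors
ids = ["photo_a", "photo_b", "photo_c"]
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([0.0, 1.0])

print(top_k_by_cosine(query, features, ids, k=2))  # ['photo_b', 'photo_c']
```

The scores here are 0.0, 1.0, and 0.8, so `photo_b` ranks first and `photo_c` second.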

The `display_photo` function takes a photo ID, displays the photo from Unsplash, and adds a link to the original photo page on Unsplash.

In [7]:
# Import everything from IPython.display; importing HTML from IPython.core.display is deprecated
from IPython.display import HTML, Image, display

def display_photo(photo_id):
  # Get the URL of the photo resized to have a width of 320px
  photo_image_url = f"https://unsplash.com/photos/{photo_id}/download?w=320"

  # Display the photo
  display(Image(url=photo_image_url))

  # Display the attribution text
  display(HTML(f'Photo on <a target="_blank" href="https://unsplash.com/photos/{photo_id}">Unsplash</a> '))
  print()
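
The two URL patterns used by `display_photo` can be factored into a small helper, which makes it easy to change the preview width. This is only a sketch: the function name `unsplash_urls` is hypothetical, and the URL formats are copied from the cell above rather than from any official API reference.

```python
def unsplash_urls(photo_id, width=320):
    """Build the resized-download URL and the photo page URL for a photo ID."""
    page_url = f"https://unsplash.com/photos/{photo_id}"
    image_url = f"{page_url}/download?w={width}"
    return image_url, page_url

# Example with a photo ID that appears later in this notebook
image_url, page_url = unsplash_urls("HSsOC5nqurA")
print(image_url)
print(page_url)
```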

The `search_unslash` function puts it all together: it encodes the query, finds the best matches, and displays them.

In [8]:
def search_unslash(search_query, photo_features, photo_ids, results_count=3):
  # Encode the search query
  text_features = encode_search_query(search_query)

  # Find the best matches
  best_photo_ids = find_best_matches(text_features, photo_features, photo_ids, results_count)

  # Display the best photos
  for photo_id in best_photo_ids:
    display_photo(photo_id)


## Search Unsplash

Now we are ready to search the dataset using natural language. Check out the examples below and feel free to try out your own queries.

### "Two dogs playing in the snow"

In [9]:
search_query = "Two dogs playing in the snow"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The word love written on the wall"

In [10]:
search_query = "The word love written on the wall"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The feeling when your program finally works"

In [11]:
search_query = "The feeling when your program finally works"

search_unslash(search_query, photo_features, photo_ids, 3)










### "The Sydney Opera House and the Harbour Bridge at night"

In [12]:
search_query = "The Sydney Opera House and the Harbour Bridge at night"

search_unslash(search_query, photo_features, photo_ids, 3)










## Combine Text and Photo Search Queries

This is another experiment: combining a text query with a photo.

The idea is to run a text search and then add a photo to the query, so that some of the photo's features carry over into the search.

This works by adding the photo's feature vector to the text query's feature vector. The photo features are multiplied by a weight (`photo_weight`) to reduce their influence, so that the text query remains the main signal.

The results are somewhat sensitive to the prompt.

In [13]:
def search_by_text_and_photo(query_text, query_photo_id, photo_weight=0.5):
  # Encode the search query
  text_features = encode_search_query(query_text)

  # Find the feature vector for the specified photo ID
  query_photo_index = photo_ids.index(query_photo_id)
  query_photo_features = photo_features[query_photo_index]

  # Combine the text and photo queries and normalize again
  search_features = text_features + query_photo_features * photo_weight
  search_features /= search_features.norm(dim=-1, keepdim=True)

  # Find the best match
  best_photo_ids = find_best_matches(search_features, photo_features, photo_ids, 1)

  # Display the results
  print("Test search result")
  search_unslash(query_text, photo_features, photo_ids, 1)

  print("Photo query")
  display(Image(url=f"https://unsplash.com/photos/{query_photo_id}/download?w=320"))

  print("Result for text query + photo query")
  display_photo(best_photo_ids[0])
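
The effect of `photo_weight` can be seen on toy vectors. The sketch below uses NumPy and two made-up orthogonal unit vectors standing in for the text and photo features; the helper name `combine_queries` is hypothetical and mirrors only the add-then-normalize step from the function above.

```python
import numpy as np

def combine_queries(text_vec, photo_vec, photo_weight=0.5):
    """Add a weighted photo vector to a text vector and rescale to unit length."""
    combined = text_vec + photo_weight * photo_vec
    return combined / np.linalg.norm(combined)

# Toy unit vectors: the "text" and "photo" directions are orthogonal
text_vec = np.array([1.0, 0.0])
photo_vec = np.array([0.0, 1.0])

for weight in (0.0, 0.5, 1.0):
    combined = combine_queries(text_vec, photo_vec, weight)
    # Similarity of the combined query to the text and to the photo
    print(weight, combined @ text_vec, combined @ photo_vec)
```

With a weight of 0 the combined query is just the text query; as the weight grows toward 1 the photo pulls the query in its direction until both contribute equally. Real CLIP features are not orthogonal, so the actual balance will differ.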

## Results Combining Text and Photo Search Queries

Here are some results for combining text and photo queries.

### Sydney Opera House + night photo

In [14]:
search_by_text_and_photo("Sydney Opera house", "HSsOC5nqurA")

Test search result



Photo query


Result for text query + photo query





### Sydney Opera House + mist photo

In [15]:
search_by_text_and_photo("Sydney Opera house", "MaerUPAjPbs")

Test search result



Photo query


Result for text query + photo query





### Sydney Opera House + rain photo

In [16]:
search_by_text_and_photo("Sydney Opera house", "1pNBJ2zUfn4", 0.4)

Test search result



Photo query


Result for text query + photo query





### Sydney Opera House + sea photo

In [17]:
search_by_text_and_photo("Sydney Opera house", "jnBDclcdZ7A", 0.4)

Test search result



Photo query


Result for text query + photo query



