# Ukraine War Images Search 
### (Powered by CLIP)

## Overview

This example implements natural-language image search over the "Ukraine War Images" dataset.

* You'll be able to use **natural text** queries for image search
* The search will work even **when the image has no such info in the captions** or metadata
* The search will work even when you make a ~typpo~ typo

**Example natural text queries and results**


[![WcUknj.md.png](https://iili.io/WcUknj.md.png)](https://freeimage.host/i/WcUknj)


The possibilities are endless. As long as the dataset contains images that match your natural-text query, you get surprisingly accurate results.

### Install prerequisites

In [None]:
!python --version
!pip install -q clip-client 
!pip install -q ipywidgets # look nice in notebook

### Some nice to have dependencies

In [None]:
# Disables warnings otherwise screen gets flashy
import warnings
warnings.filterwarnings('ignore')

# Install matplotlib for sprite render
!pip install -q matplotlib

## Download the dataset - "Ukraine War Images"


For this example, we are going to use the "Ukraine War Images" dataset available at https://www.kaggle.com/datasets/mathurinache/ukraine-war-images. At the time of writing this example (Apr 2022), the dataset had 2000+ images.

Here, we download the dataset from Kaggle.


**A. If you are not using this notebook on Kaggle Notebooks**

We will need [Kaggle API credentials](https://github.com/Kaggle/kaggle-api#api-credentials).

Download the credentials (`kaggle.json`) from your Kaggle account and upload the file to this Google Colab notebook's folder. Once you're done with that, uncomment and run the following code.

In [None]:
#!pip install kaggle # Install kaggle library
#!mkdir ~/.kaggle # Make a directory for kaggle configs/credentials
#!cp kaggle.json ~/.kaggle/ # Copy kaggle api credentials in .kaggle
#!chmod 600 ~/.kaggle/kaggle.json # Permissions

Download and unzip the data using the `kaggle` package's commands:

In [None]:
# !kaggle datasets download mathurinache/ukraine-war-images
# !unzip ukraine-war-images

**B. If you're using this notebook on Kaggle Notebooks**

Nothing else to do here. Just add the dataset with the `+Add Data` button and reference it at the `../input/ukraine-war-images/` folder.

If you face any issues, [refer to this video](https://youtu.be/VaJEK6fycwM).
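
If you switch between Kaggle and a local or Colab run, the image folder will differ. Here is a minimal sketch that picks a glob pattern depending on which folder exists. The helper name `image_glob` and the local folder name `ukraine_war` are assumptions of mine (check what the unzip step actually produced on your machine); the Kaggle path is the one used later in this notebook.

```python
import os

# Path used by the loading cell later in this notebook (Kaggle Notebooks).
KAGGLE_DIR = "../input/ukraine-war-images/ukraine_war"
# Assumption: folder name created by unzipping the archive locally.
LOCAL_DIR = "ukraine_war"


def image_glob(kaggle_dir=KAGGLE_DIR, local_dir=LOCAL_DIR, ext="png"):
    """Return a glob pattern for the images, preferring the Kaggle folder if it exists."""
    base = kaggle_dir if os.path.isdir(kaggle_dir) else local_dir
    return f"{base}/*.{ext}"


pattern = image_glob()
print(pattern)
```

You could then pass `pattern` to `DocumentArray.from_files` instead of the hard-coded path.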

### Connect to CLIP server

The CLIP server is the "backend of the search": it encodes the image data and the text queries into embeddings so that they can be matched against each other. It uses the pretrained CLIP neural network model.

I'm going to use a hosted demo instance of CLIP server, so all you need to do is run the following code.

In [None]:
from clip_client import Client

host = "grpc://demo-cas.jina.ai:51000"

c = Client(host)

> Alternatively, you could run your own [CLIP server](https://clip-as-service.jina.ai/user-guides/server/) by simply running `python -m clip_server` and replacing `demo-cas.jina.ai:51000` with your server's address.
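
If you do run your own server, the only thing that changes in the client code is the address string. Below is a small sketch that builds it in the same `protocol://host:port` form as the demo address above. The helper name `clip_server_address` is hypothetical, and the default port `51000` is simply copied from the demo address, so treat it as an assumption and use whatever port your server reports on startup.

```python
def clip_server_address(host, port=51000, protocol="grpc"):
    """Build an address string like the one passed to Client(...) above.

    The defaults mirror the demo address in this notebook; they are
    assumptions, not guaranteed settings of your own server.
    """
    if not 0 < int(port) < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"{protocol}://{host}:{int(port)}"


local_host = clip_server_address("127.0.0.1")
print(local_host)
# With a server running locally you would then do:
# c = Client(local_host)
```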

## Load the image data in DocumentArray
DocumentArray is a data structure for unstructured data such as images. It comes with `clip-client` and helps us implement a scalable search solution in just a few lines of code.

In [None]:
from docarray import DocumentArray
img_da = DocumentArray.from_files('../input/ukraine-war-images/ukraine_war/*.png')
# Let's see what our data looks like
img_da.plot_image_sprites()

## Encode the image data
Before we implement search, we need to encode the image data, that is, create vector representations of the images (also known as embeddings).

In [None]:
try:
    # If we already have saved the encoded data on cloud, let's just get it directly from there
    da = DocumentArray.pull('saved_ukraine_war_image_embeddings_on_cloud', show_progress=True, local_cache=True)
except Exception:  # narrower than BaseException, so Ctrl+C still interrupts the cell
    # Otherwise let's encode the data
    da = c.encode(img_da, batch_size=8, show_progress=True)
    # If you want to reuse the embeddings later, push the embeddings to cloud. 
    da.push('saved_ukraine_war_image_embeddings_on_cloud', show_progress=True)
    # Note: cloud stores data only for 1 week

# Show the embeddings data sprites
da.plot_image_sprites()
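
To build some intuition for why embeddings are useful: images and texts that mean similar things end up as vectors pointing in similar directions, and that closeness can be measured with cosine similarity. The sketch below uses made-up 3-dimensional toy vectors (real CLIP embeddings have far more dimensions, and these numbers are not produced by any model); the variable names are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for illustration only.
tank_photo = [0.9, 0.1, 0.0]
tank_text = [0.8, 0.2, 0.1]
snow_text = [0.0, 0.2, 0.9]

# The matching text should score higher than the unrelated one.
print(cosine_similarity(tank_photo, tank_text))
print(cosine_similarity(tank_photo, snow_text))
```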

## Sample search

In [None]:
input_texts = [
    "tanks",
    "rockets",
    "destroyed vehicles",
    "destroyed buildings",
    "fight in the snow",
    "different vehicles used in the war",
    "different weapons used in the war",
    "two tanks in front of each other",
    "man holding weapon"
]

for txt in input_texts:
    print(txt)
    vec = c.encode([txt])
    r = da.find(query=vec, limit=4)
    r.plot_image_sprites()
    print("-----")
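
Conceptually, `da.find(query=vec, limit=4)` ranks every image embedding by how close it is to the query embedding and keeps the best four. The sketch below mimics that idea in plain Python on toy vectors. It is not how docarray implements it, the use of cosine similarity as the ranking measure is my assumption, and `top_k` and `toy_index` are hypothetical names.

```python
import heapq
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, items, limit=4):
    """Return up to `limit` (name, score) pairs, best match first.

    `items` maps a name to its embedding; the score is the dot product of
    unit-length vectors, which equals cosine similarity.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in items.items()
    )
    return [(name, score) for score, name in heapq.nlargest(limit, scored)]


# Toy "index" with invented embeddings, for illustration only.
toy_index = {
    "tank.png": [0.9, 0.1, 0.0],
    "rocket.png": [0.1, 0.9, 0.1],
    "snow.png": [0.0, 0.2, 0.9],
}

for name, score in top_k([0.8, 0.2, 0.1], toy_index, limit=2):
    print(name, round(score, 3))
```

Scanning every vector like this is fine for a couple of thousand images; larger collections usually call for an approximate nearest-neighbour index instead.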

## Conclusion

We implemented natural-language (semantic) text-to-image search over the 2000+ images of the "Ukraine War Images" dataset, using the pretrained `CLIP` neural network model. [`CLIP-as-service`](https://github.com/jina-ai/clip-as-service) made it easy to implement a scalable solution. This search technique is called neural search.


## Resources
* [What is Neural Search](https://www.kdnuggets.com/2021/05/what-neural-search.html)
* [CLIP-as-service Python package](https://github.com/jina-ai/clip-as-service)
* [DocArray - Data structure for unstructured data](https://docarray.jina.ai)