# __Search image by text__

## Preface

- Tutorial Difficulty : ★★☆☆☆
- 7 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_search/search_image_by_text.ipynb
- References : [Unsplash Dataset - Lite](https://unsplash.com/data), [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Text Digitization Techniques</h4>
    <p>For computers to understand natural language, text must first be converted into numbers. Pre-trained models such as <a href="https://en.wikipedia.org/wiki/BERT_(language_model)">BERT</a> and <a href="https://en.wikipedia.org/wiki/GPT-3">GPT-3</a> have recently been studied actively and have shown remarkable results. These models learn the meaning of each sentence through <a href="https://en.wikipedia.org/wiki/Self-supervised_learning">Self-Supervised Learning</a> and represent each sentence as a numerical vector in a low-dimensional space, so that sentences with similar meanings are located close together. They can learn without labels because the training signal is generated from the text itself, for example by randomly shuffling the order of sentences or masking some words and having the model judge whether a sentence/context is correct.</p>
</div>

The problem of handling different forms of input together, such as text and images, is called multi-modal learning. **"CLIP: Connecting Text and Images"** is a representative multi-modal model that represents both text and images numerically in a shared low-dimensional space. Whereas earlier models learned only the <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)">features</a> of the image itself, a multi-modal model takes both images and text as input and also learns the features of the text describing each image. Because text and images are placed together in the same low-dimensional space, the similarity between a text and an image can be measured, and this can be applied as a search algorithm.
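
To make the idea of measuring similarity in a shared space concrete, here is a minimal Python sketch. It assumes cosine similarity as the measure, which is a common choice for CLIP-style embeddings; this tutorial does not state which measure ThanoSQL uses internally. The vectors and file names are made up for illustration and are far shorter than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (illustrative values only).
text_embedding = [0.9, 0.1, 0.0]  # stands in for the text "a black cat"
image_embeddings = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "mountain.jpg": [0.0, 0.3, 0.9],
}

# Score every image against the text and pick the closest one.
scores = {
    name: cosine_similarity(text_embedding, vector)
    for name, vector in image_embeddings.items()
}
best_match = max(scores, key=scores.get)
print(best_match)
```

The image whose vector points in nearly the same direction as the text vector gets the highest score, which is the basis for using the shared space as a search index.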

ThanoSQL uses artificial intelligence algorithms to convert datasets into numerical representations. The resulting values are stored in a DB column. To search for similar images, the input text is converted in the same way and its similarity to the stored image values is calculated.

__The following are example applications of the ThanoSQL text-image search algorithm.__

- Describe the desired scene in text and search your images or videos for the one that is most similar to it. Accept text descriptions, rather than keywords, of the products users are looking for and show the most similar product images.
- Find the right moment to place an advertisement in a video, such as a YouTube video. For example, to place travel advertisements, you can easily search for scenes with mountains or camping and insert the advertisements there.

<div class="admonition note">
    <h4 class="admonition-title">In this tutorial</h4>
    <p>👉 Unsplash has released images from more than 200,000 photographers for free as a dataset for AI. <code>Unsplash Dataset - Lite</code> consists of 25,000 nature-themed images and comes with 25,000 keywords. </p>
</div>

In this tutorial, we will use the text-image search model to search by text for the desired image among the 25,000 images of `Unsplash Dataset - Lite` stored in the ThanoSQL DB.

## __0. Prepare Dataset and Model__

To use the query syntax of ThanoSQL, you must create an API token and run the query below, as mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [None]:
%%thanosql
GET THANOSQL DATASET unsplash_data
OPTIONS (overwrite=True)

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL DATASET</strong>" query syntax to save the desired dataset to the workspace. </li>
        <li>The "<strong>OPTIONS</strong>" query syntax specifies the options to use for <strong>GET THANOSQL DATASET</strong>.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a dataset with the same name if one exists. If True, the old dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

In [None]:
%%thanosql
COPY unsplash_data 
OPTIONS (overwrite=True)
FROM 'thanosql-dataset/unsplash_data/unsplash.csv'

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" query syntax to specify the dataset name to store in the DB. </li>
        <li>"<strong>OPTIONS</strong>" Specifies the options to use for <strong>COPY</strong> query syntax.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a dataset with the same name if one exists in the DB. If True, the existing dataset is replaced with the new dataset (True|False, DEFAULT: False) </li>
        </ul>
        </li>
    </ul>
</div>

### __Prepare the Model__

In [None]:
%%thanosql
GET THANOSQL MODEL tutorial_search_clip
OPTIONS (overwrite=True)
AS tutorial_search_clip

<div class="admonition note">
    <h4 class="admonition-title">Query Details </h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL MODEL</strong>" query syntax to store the desired model in the workspace and DB. </li>
        <li>The "<strong>OPTIONS</strong>" query syntax specifies the options to use for <strong>GET THANOSQL MODEL</strong>.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a model with the same name if one exists. If True, the existing model is replaced with the new model (True|False, DEFAULT: False) </li>
        </ul>
        </li>
        <li>Use the "<strong>AS</strong>" query syntax to name the model. If the AS syntax is omitted, the name of the <code>THANOSQL MODEL</code> is used as-is.</li>
    </ul>
</div>

## __1. Check Dataset__

To create a text-image search model, we use the `unsplash_data` table stored in ThanoSQL DB. Execute the query statement below and check the contents of the table.

In [None]:
%%thanosql
SELECT photo_id, image_path, photo_image_url, photo_description, ai_description
FROM unsplash_data
LIMIT 5

<div class="admonition note">
    <h4 class="admonition-title">Understanding Data</h4>
    <ul>
        <li><code>photo_id</code> : Column containing the unique id of the image </li>
        <li><code>image_path</code> : Column containing the path where the image is located </li>
        <li><code>photo_image_url</code> : Column containing the address of the original image on the Unsplash website </li>
        <li><code>photo_description</code> : Column containing a short human-written description of the image </li>
        <li><code>ai_description</code> : Column containing an AI-generated description of the image </li>
    </ul>
</div>

In [None]:
%%thanosql
PRINT IMAGE 
AS
SELECT image_path 
FROM unsplash_data 
LIMIT 5

## __2. Creating an Image Numerical Model for Text Search__

<div class="admonition danger">
    <h4 class="admonition-title">Notes</h4>
    <p>Because a text-image search algorithm takes a long time to train, this tutorial uses a model pre-trained on 400 million image-text pairs and omits the training step with the "<strong>BUILD MODEL</strong>" query syntax. The <code>tutorial_search_clip</code> model is a pre-trained model imported with <code>clipen</code> as its base algorithm. When the "<strong>CONVERT USING</strong>" query statement is executed, a column holding the numerical representation of each image is created automatically and named "model name (<code>tutorial_search_clip</code>)_base algorithm name (<code>clipen</code>)". When the "<strong>SEARCH IMAGE</strong>" query statement is executed, an image similarity column is created automatically and named "model name (<code>tutorial_search_clip</code>)_base algorithm name (<code>clipen</code>)_similarity" followed by a number (1). The number refers to the texts used in the search: when a search is performed with two or more texts, one column is created per text and the number increases sequentially in the order of the texts. For more details, see below.</p>
</div>
(Expected time to execute query: 3 min)  

<p>Run the following "<strong>CONVERT USING</strong>" query to convert the <code>unsplash_data</code> images into numerical values. The results are stored in the new <mark style="background-color:#D7D0FF">tutorial_search_clip_clipen</mark> column. (The new column is named {model_name}_{base_model_name}.) </p>

In [None]:
%%thanosql
CONVERT USING tutorial_search_clip
OPTIONS (
    image_col="image_path", 
    table_name="unsplash_data", 
    batch_size=128
    )
AS 
SELECT *
FROM unsplash_data

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>The "<strong>CONVERT USING</strong>" query syntax uses the <code>tutorial_search_clip</code> model as the algorithm for converting images into numerical values.</li>
        <li>The "<strong>OPTIONS</strong>" query syntax defines the variables required for the conversion.
        <ul>
            <li>"table_name" : Name of the table to be stored in the ThanoSQL DB</li>
            <li>"image_col" : Name of the column containing the image path </li>
            <li>"batch_size" : The number of images read in one batch. According to the paper, larger batches give better learning performance, but 128 is used here in consideration of memory size. (DEFAULT: 16) </li>
        </ul>
        </li>
    </ul>
</div>
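
As a rough illustration of what `batch_size` controls, the following Python sketch splits a list of image paths into consecutive batches. This is a generic batching pattern, not ThanoSQL's internal implementation; `iter_batches` and the placeholder paths are hypothetical names introduced only for this example.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder paths standing in for the 25,000 images of the dataset.
image_paths = [f"image_{i}.jpg" for i in range(25_000)]

batches = list(iter_batches(image_paths, 128))
num_batches = len(batches)
last_batch_size = len(batches[-1])
print(num_batches, last_batch_size)  # 196 40
```

With 25,000 images and a batch size of 128, the data is processed in 196 batches, the last of which holds the remaining 40 images. A larger batch size means fewer batches but more memory used per batch.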

In [None]:
%%thanosql
SELECT *
FROM unsplash_data
LIMIT 5
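
The column naming rule described in this section can be written down as a small Python sketch. The helper functions below are hypothetical and exist only to illustrate the `{model_name}_{base_model_name}` pattern and its `_similarity` variant; they are not part of ThanoSQL.

```python
def embedding_column(model_name, base_model_name):
    """Name of the column created by CONVERT USING, per the rule in this tutorial."""
    return f"{model_name}_{base_model_name}"


def similarity_column(model_name, base_model_name, text_number):
    """Name of the similarity column created by SEARCH IMAGE for the n-th text."""
    return f"{embedding_column(model_name, base_model_name)}_similarity{text_number}"


embedding_name = embedding_column("tutorial_search_clip", "clipen")
similarity_name = similarity_column("tutorial_search_clip", "clipen", 1)
print(embedding_name)   # tutorial_search_clip_clipen
print(similarity_name)  # tutorial_search_clip_clipen_similarity1
```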

## __3. Search for images by text__

You can perform a text-based image search using the "__SEARCH IMAGE__" query syntax and the `tutorial_search_clip` model prepared above. Execute the following query to calculate the similarity between the text "a black cat" and the embedded `unsplash_data` images. The result is stored in the newly added <mark style="background-color:#D7D0FF ">tutorial_search_clip_clipen_similarity1</mark> column.

In [None]:
%%thanosql
SEARCH IMAGE text="a black cat"
USING tutorial_search_clip
AS 
SELECT * 
FROM unsplash_data

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>The "<strong>SEARCH IMAGE</strong>" query syntax specifies that an image search will be performed. Enter a text description of the image you want to find in the "text" variable. </li>
        <li>The "<strong>USING</strong>" query syntax specifies <code>tutorial_search_clip</code> as the model to use for the search.</li>
    </ul>
</div>

Execute the query below to retrieve the 5 images most similar to the text "a black cat", together with their similarity values.

In [None]:
%%thanosql
SELECT image_path, tutorial_search_clip_clipen_similarity1 
FROM (
    SEARCH IMAGE text="a black cat"
    USING tutorial_search_clip
    AS 
    SELECT * 
    FROM unsplash_data
    )
ORDER BY tutorial_search_clip_clipen_similarity1 DESC 
LIMIT 5

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>The query syntax "<strong>SEARCH IMAGE</strong>" calculates and returns the similarity between the text entered and the image.</li>
        <li>The first "<strong>SELECT</strong>" query syntax selects the <mark style="background-color:#D7D0FF ">image_path</mark> column and the <mark style="background-color:#D7D0FF ">tutorial_search_clip_clipen_similarity1</mark> column from the query result in parentheses.</li>
        <li>The "<strong>ORDER BY</strong>" query syntax sorts the results by the value of the <mark style="background-color:#D7D0FF ">tutorial_search_clip_clipen_similarity1</mark> column in descending order ("<strong>DESC</strong>"), and "<strong>LIMIT</strong>" 5 returns only the top 5 results.</li>
    </ul>
</div>
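
For readers less familiar with SQL, the sort-and-limit step has a direct Python equivalent. The sketch below uses made-up rows and similarity values and a generic `similarity` key in place of the actual column name; it only illustrates the ranking logic, not how ThanoSQL executes the query.

```python
# Hypothetical search results: one row per image with an illustrative score.
rows = [
    {"image_path": "a.jpg", "similarity": 0.21},
    {"image_path": "b.jpg", "similarity": 0.30},
    {"image_path": "c.jpg", "similarity": 0.12},
    {"image_path": "d.jpg", "similarity": 0.27},
    {"image_path": "e.jpg", "similarity": 0.05},
    {"image_path": "f.jpg", "similarity": 0.25},
    {"image_path": "g.jpg", "similarity": 0.18},
]

# ORDER BY similarity DESC: sort from the highest score to the lowest.
ranked = sorted(rows, key=lambda row: row["similarity"], reverse=True)

# LIMIT 5: keep only the first five rows of the sorted result.
top_5 = ranked[:5]
top_paths = [row["image_path"] for row in top_5]
print(top_paths)  # ['b.jpg', 'd.jpg', 'f.jpg', 'a.jpg', 'g.jpg']
```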

By wrapping the previous query in the "__PRINT IMAGE__" statement, you can check the resulting images immediately.

In [None]:
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, tutorial_search_clip_clipen_similarity1 
    FROM (
        SEARCH IMAGE text="a black cat"
        USING tutorial_search_clip
        AS 
        SELECT * 
        FROM unsplash_data
        )
    ORDER BY tutorial_search_clip_clipen_similarity1 DESC 
    LIMIT 5
    )

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
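    <p>This query builds on the query above and consists of three steps.</p>
    <ul>
        <li>The innermost "<strong>SEARCH IMAGE</strong>" query syntax calculates the similarity between the entered text and each image.</li>
        <li>The "<strong>SELECT</strong>" query syntax in the first parentheses produces the result of the step immediately above: the 5 images with the highest similarity.</li>
        <li>The "<strong>PRINT IMAGE</strong>" query syntax prints those images.</li>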
    </ul>
</div>

In [None]:
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, tutorial_search_clip_clipen_similarity1 
    FROM (
        SEARCH IMAGE text="a dog on a chair"
        USING tutorial_search_clip
        AS 
        SELECT * 
        FROM unsplash_data
        )
    ORDER BY tutorial_search_clip_clipen_similarity1 DESC 
    LIMIT 5
    )

In [None]:
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, tutorial_search_clip_clipen_similarity1 
    FROM (
        SEARCH IMAGE text="gloomy photos"
        USING tutorial_search_clip
        AS 
        SELECT * 
        FROM unsplash_data
        )
    ORDER BY tutorial_search_clip_clipen_similarity1 DESC 
    LIMIT 5
    )

In [None]:
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, tutorial_search_clip_clipen_similarity1 
    FROM (
        SEARCH IMAGE text="the feeling when your program finally works"
        USING tutorial_search_clip
        AS 
        SELECT * 
        FROM unsplash_data
        )
    ORDER BY tutorial_search_clip_clipen_similarity1 DESC 
    LIMIT 5
    )

## __4. In Conclusion__

In this tutorial, we used a multi-modal text/image model to search for images by text in the `Unsplash Dataset - Lite` dataset. As this is a beginner's tutorial, we focused on getting visible results through simple queries. If you search with a more detailed and descriptive text, you can get results closer to what you want.

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties in creating your own model using ThanoSQL or applying it to the service, please feel free to contact us below😊</p>
    <p>For inquiries about building a text-image search model: contact@smartmind.team</p>
</div>