# __Search image by keyword__

## Preface

- Tutorial Difficulty : ★☆☆☆☆
- 10 min read
- Languages : [SQL](https://ko.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_search/search_image_by_keyword.ipynb
- References : [Food Image and Nutrition Text Introduction Dataset](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=74)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Keyword-Image Search Service</h4>
    <p>ThanoSQL provides a search function that returns images in response to keywords. It uses a model, such as an image classification model, that is trained with the keywords the user wants as target values, and then adds a column that indexes each image with the trained model's prediction. In other words, keyword-image search finds the images that correspond to the desired target value (category). </p>
</div>

The word "search" has the dictionary meaning of "finding the necessary materials in a book or computer according to its purpose." ThanoSQL keyword-image search works a little differently from searching a DB for records that contain a specific word (keyword). Instead, keyword-based image search builds a model that learns to predict words from image features, then uses that model to return the images with the highest probability of matching a given keyword.
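As a rough illustration of this idea, the plain-Python sketch below ranks images by the probability a classifier assigns to a keyword. It is not ThanoSQL's implementation: it assumes the model outputs one probability per category for each image, and the file names, categories, and probability values are all invented.

```python
# Hypothetical per-image class probabilities, as an image classification
# model might output them (all values invented for illustration).
predictions = {
    "img_001.jpg": {"apple pie": 0.91, "bibimbap": 0.06, "kimchi stew": 0.03},
    "img_002.jpg": {"apple pie": 0.12, "bibimbap": 0.80, "kimchi stew": 0.08},
    "img_003.jpg": {"apple pie": 0.55, "bibimbap": 0.15, "kimchi stew": 0.30},
}


def search_by_keyword(probs_by_image, keyword, top_k=2):
    """Return up to top_k image paths ranked by the probability of `keyword`."""
    scored = [
        (probs.get(keyword, 0.0), path) for path, probs in probs_by_image.items()
    ]
    # Highest probability first; ties broken by path for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for _, path in scored[:top_k]]


print(search_by_keyword(predictions, "apple pie"))  # ['img_001.jpg', 'img_003.jpg']
```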

**Below are some examples of how ThanoSQL keyword-image search can be used.**

- Use shopping categories as training data to build a model, then use that model to create an index column for existing or new images. Combining the index column with numerical attributes such as image registration dates enables more sophisticated searches.
- You can build your own image search service by combining these results with others, such as the similar-image search and text-image search results covered in the next tutorials.
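A minimal sketch of the first use case, assuming a table that already holds the predicted index column next to a registration date. It uses Python's built-in sqlite3 module as a stand-in database, so the table, the `registered_on` column, and every row are hypothetical; in ThanoSQL the equivalent filter would presumably be an ordinary PostgreSQL WHERE clause on the indexed table.

```python
import sqlite3

# In-memory stand-in for an image table. `predicted` plays the role of the
# index column; `registered_on` and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (image_path TEXT, predicted TEXT, registered_on TEXT)"
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("a.jpg", "apple pie", "2022-03-01"),
        ("b.jpg", "apple pie", "2021-11-20"),
        ("c.jpg", "bibimbap", "2022-05-14"),
    ],
)

# Keyword condition on the index column combined with a date condition.
# ISO-8601 date strings compare correctly as text.
rows = conn.execute(
    """
    SELECT image_path
    FROM images
    WHERE predicted = ?
      AND registered_on >= ?
    ORDER BY registered_on
    """,
    ("apple pie", "2022-01-01"),
).fetchall()
print(rows)  # [('a.jpg',)]
```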

<div class="admonition note">
    <h4 class="admonition-title"> In this tutorial</h4>
    <p>👉 Combine ThanoSQL's "<strong>PREDICT</strong>" query statement with the standard PostgreSQL "<strong>SELECT</strong>" statement to search for images that match a specific keyword. </p>
</div>

<div class="admonition tip">
    <h4 class="admonition-title">Dataset Description</h4>
    <p>The <code>Introduction to Food Images and Nutrition Information Text</code> dataset was organized by the Ministry of Science and ICT with support from the Korea Intelligence Information Society Agency. It covers 400 dishes, drawn from the dine-out menu items most consumed in Korea and from traditional Korean dishes, and contains 842,000 images in total. This tutorial uses only a small subset (10 categories, 1,190 images). </p>
</div>

## __0. Prepare Dataset__

To use ThanoSQL query syntax, you must create an API token and run the commands below, as described in [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

```
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
```

### __Prepare Dataset__

```sql
%%thanosql
GET THANOSQL DATASET diet_data
OPTIONS (overwrite=True)
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>GET THANOSQL DATASET</strong>": use this query to save the desired dataset to your workspace. </li>
        <li>"<strong>OPTIONS</strong>": use this clause to specify the options for the <strong>GET THANOSQL DATASET</strong> query.
        <ul>
            <li>"overwrite" : Whether to overwrite a dataset with the same name. If True, the existing dataset is replaced with the new one (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

```sql
%%thanosql
COPY diet
OPTIONS (overwrite=True)
FROM "thanosql-dataset/diet_data/diet.csv"
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" query statement to load the CSV file into a table in the DB (here, <mark style="background-color:#FFEC92">diet</mark>). </li>
        <li>"<strong>OPTIONS</strong>" specifies the options for the <strong>COPY</strong> query statement.
        <ul>
            <li>"overwrite" : Whether to overwrite a table with the same name in the DB. If True, the existing table is replaced with the new data (True|False, DEFAULT: False) </li>
        </ul>
        </li>
    </ul>
</div>
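For readers who want to see what COPY with overwrite does conceptually, here is a rough plain-Python analog that loads a CSV into a SQLite table. It is a sketch, not how ThanoSQL implements COPY: the `copy_csv` helper is hypothetical, the sample rows are invented, the column names are assumed to match the diet table, and since the tutorial does not say what happens when overwrite is False and the table already exists, this sketch simply assumes an error.

```python
import csv
import io
import sqlite3


def copy_csv(conn, table, csv_file, overwrite=False):
    """Load a CSV with a header row into `table` (all columns stored as TEXT).

    Assumed overwrite semantics: if the table exists, replace it when
    overwrite=True, otherwise raise ValueError. `table` and the header names
    are interpolated into SQL, so they must come from a trusted source.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists and not overwrite:
        raise ValueError(f"table {table!r} already exists")
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Hypothetical two-row CSV with the assumed diet.csv columns.
sample = "image_path,label\nimg1.jpg,사과파이\nimg2.jpg,사과파이\n"
conn = sqlite3.connect(":memory:")
copy_csv(conn, "diet", io.StringIO(sample))
copy_csv(conn, "diet", io.StringIO(sample), overwrite=True)  # replaces, no duplicates
print(conn.execute("SELECT COUNT(*) FROM diet").fetchone())  # (2,)
```

Dropping and recreating the table keeps the overwrite case simple: the second load leaves two rows, not four.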

## __1. Check Dataset__

To create a keyword-image search model, we use the <mark style="background-color:#FFEC92">diet</mark> table stored in the ThanoSQL DB. Execute the query below to check the contents of the table.

```sql
%%thanosql
SELECT *
FROM diet
```

<div class="admonition note">
    <h4 class="admonition-title">Understanding the Data Table</h4>
    <p>The <mark style="background-color:#FFEC92">diet</mark> table contains the following information. </p>
    <ul>
        <li><mark style="background-color:#D7D0FF">image_path</mark> : image path </li>
        <li><mark style="background-color:#D7D0FF">label</mark> : food category of the image (used as the target value)</li>
    </ul>
</div>
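To see how many images each category has, a GROUP BY count is a common first check. The sketch below runs it against a toy SQLite table shaped like diet (image_path, label); the rows and the second category name are invented. Since ThanoSQL accepts standard PostgreSQL SELECT statements, the same query text should presumably also work in a %%thanosql cell against the real table, though this tutorial does not show it.

```python
import sqlite3

# Toy rows shaped like the diet table; values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diet (image_path TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO diet VALUES (?, ?)",
    [("p1.jpg", "사과파이"), ("p2.jpg", "사과파이"), ("p3.jpg", "비빔밥")],
)

# Number of images per category, largest first.
counts = conn.execute(
    "SELECT label, COUNT(*) AS n FROM diet GROUP BY label ORDER BY n DESC, label"
).fetchall()
print(counts)  # [('사과파이', 2), ('비빔밥', 1)]
```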

## __2. Create Keyword Search Model__

To search for images, we first need a search criterion, which we obtain by training a model on the existing data table. To do this, we create an image classification model from the dataset above. Execute the query below to create a model named <mark style="background-color:#E9D7FD ">diet_image_classification</mark>.  
(Estimated duration of query execution: 3 min)

```sql
%%thanosql
BUILD MODEL diet_image_classification
USING ConvNeXt_Tiny
OPTIONS (
    image_col='image_path',
    label_col='label',
    epochs=1,
    overwrite=True
)
AS
SELECT *
FROM diet
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Create and train the model <mark style="background-color:#E9D7FD ">diet_image_classification</mark> using the query statement "<strong>BUILD MODEL</strong>".</li>
        <li>Specify <code>ConvNeXt_Tiny</code> to be used as the base model with the query statement "<strong>USING</strong>".</li>
        <li>"<strong>OPTIONS</strong>" specifies the options for the query used to create a model.
        <ul>
            <li>"image_col" : The name of the column containing the image path.</li>
            <li>"label_col" : The name of the column containing the target value (category).</li>
            <li>"epochs" : The number of full passes over the training dataset.</li>
            <li>"overwrite" : Whether to overwrite a model with the same name. If True, the existing model is replaced with the new model (True|False, DEFAULT : False)</li>
        </ul>
        </li>
    </ul>
</div>
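To make `epochs` concrete: one epoch means the model sees every training image once, so the number of training steps grows linearly with epochs. The arithmetic below assumes images are processed in mini-batches and that all 1,190 images are used for training; the batch size of 16 is a hypothetical value, since this tutorial does not say which batch size or train/validation split ThanoSQL uses for ConvNeXt_Tiny.

```python
import math


def training_steps(num_images: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps when every epoch visits each image exactly once."""
    steps_per_epoch = math.ceil(num_images / batch_size)  # last batch may be partial
    return steps_per_epoch * epochs


# 1,190 images as in this tutorial; batch_size=16 is a hypothetical value.
print(training_steps(1190, 16, 1))  # 75
print(training_steps(1190, 16, 3))  # 225
```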

## __3. Use The Generated Model to Check The Keyword-Image Search Model__

Use the image classification model created in the previous step (<mark style="background-color:#E9D7FD ">diet_image_classification</mark>) to predict the target value of each image in the table. Running the query below returns the prediction results in the <mark style="background-color:#D7D0FF">predicted</mark> column.

```sql
%%thanosql
PREDICT USING diet_image_classification
AS
SELECT *
FROM diet
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the <mark style="background-color:#E9D7FD ">diet_image_classification</mark> model created in the previous step for prediction via the query syntax "<strong>PREDICT USING</strong>".</li>
    </ul>
</div>
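Classifiers typically produce a probability for every category, and the predicted value is the most probable one (the argmax). The tutorial does not describe how ThanoSQL derives the predicted column, so treat the sketch below as an assumption-based illustration: the helper `add_predicted_column`, the category names, and the probabilities are all hypothetical.

```python
# Hypothetical per-category probabilities for two images; the category names
# and numbers are placeholders, not output from diet_image_classification.
rows = [
    {"image_path": "img_001.jpg", "probs": {"food_a": 0.7, "food_b": 0.2, "food_c": 0.1}},
    {"image_path": "img_002.jpg", "probs": {"food_a": 0.1, "food_b": 0.3, "food_c": 0.6}},
]


def add_predicted_column(rows):
    """Return new rows that keep every original field and add `predicted`."""
    result = []
    for row in rows:
        probs = row["probs"]
        # argmax: the category with the highest probability becomes the prediction
        predicted = max(probs, key=probs.get)
        result.append({**row, "predicted": predicted})
    return result


predicted_rows = add_predicted_column(rows)
print([r["predicted"] for r in predicted_rows])  # ['food_a', 'food_c']
```

Keeping every original field mirrors how the query above selects all columns of diet and adds predicted alongside them.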

## __4. Search Using The Generated Model__

Now use the "__PREDICT USING__", "__SELECT__", and "__WHERE__" query statements to retrieve data that meets specific conditions. The following query searches for rows whose <mark style="background-color:#D7D0FF">label</mark> is '사과파이' ('apple pie') and whose prediction result is also '사과파이'.

```sql
%%thanosql
SELECT A.image_path, A.label, B.predicted
FROM diet A
LEFT JOIN (
    SELECT *
    FROM (PREDICT USING diet_image_classification
          AS SELECT * FROM diet)) B
ON A.image_path = B.image_path
WHERE A.label = B.predicted
AND A.label LIKE '사과파이'
LIMIT 10
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
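        <li>The "<strong>SELECT * FROM (...)</strong>" subquery selects all results of the query starting with "<strong>PREDICT USING</strong>", and "<strong>LEFT JOIN</strong> ... <strong>ON</strong> A.image_path = B.image_path" attaches each image's prediction to its row in <mark style="background-color:#FFEC92">diet</mark>.</li>
        <li>Set the conditions with the "<strong>WHERE</strong>" query statement. Multiple conditions can be combined with "<strong>AND</strong>".
        <ul>
            <li>"label = predicted" : Returns only rows where the <mark style="background-color:#D7D0FF ">label</mark> and <mark style="background-color:#D7D0FF ">predicted</mark> columns have the same value.</li>
            <li>"label LIKE '사과파이'" : Returns only rows whose <mark style="background-color:#D7D0FF ">label</mark> is '사과파이' (apple pie). Because the pattern contains no wildcards (<code>%</code> or <code>_</code>), LIKE acts as an exact match here.</li>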
        </ul>
        </li>
    </ul>
</div>
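The filtering logic of this query can be reproduced on toy data with Python's sqlite3 module. Here a hypothetical `preds` table stands in for the output of PREDICT USING, and all rows are invented; the SQL is the same as above except that the subquery is replaced by that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `diet` mirrors the tutorial table; `preds` stands in for the PREDICT USING
# output (image_path plus predicted). All rows are invented.
conn.execute("CREATE TABLE diet (image_path TEXT, label TEXT)")
conn.execute("CREATE TABLE preds (image_path TEXT, predicted TEXT)")
conn.executemany(
    "INSERT INTO diet VALUES (?, ?)",
    [("1.jpg", "사과파이"), ("2.jpg", "사과파이"), ("3.jpg", "비빔밥")],
)
conn.executemany(
    "INSERT INTO preds VALUES (?, ?)",
    [
        ("1.jpg", "사과파이"),  # correct prediction for the keyword
        ("2.jpg", "비빔밥"),    # misclassified apple pie: filtered out
        ("3.jpg", "비빔밥"),    # correct, but a different keyword
    ],
)

rows = conn.execute(
    """
    SELECT A.image_path, A.label, B.predicted
    FROM diet A
    LEFT JOIN preds B ON A.image_path = B.image_path
    WHERE A.label = B.predicted
      AND A.label LIKE '사과파이'
    LIMIT 10
    """
).fetchall()
print(rows)  # [('1.jpg', '사과파이', '사과파이')]
```

Rows with no match in B would have a NULL predicted value, which fails the `A.label = B.predicted` condition, so in this query the LEFT JOIN effectively behaves like an inner join.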

## **5. In Conclusion**

In this tutorial, we built a model that searches for food images related to keywords using the `food image dataset`, and used it to run a keyword search. We focused on the workflow rather than on accuracy. You can improve the model's accuracy by adjusting the build options, such as the number of training epochs, and by training on more data. Next, try the image-image and image-text search tutorials to build your own search services.
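One simple way to measure the accuracy mentioned above is the share of rows whose predicted value equals label. The sketch below computes it with SQLite on invented rows; the `results` table is hypothetical and would correspond to the output of the PREDICT query in step 3. Because this tutorial predicts on the same diet table it trains on, such a number is likely to be optimistic; evaluating on images held out from training gives a fairer estimate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (image_path TEXT, label TEXT, predicted TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("1.jpg", "사과파이", "사과파이"),
        ("2.jpg", "사과파이", "비빔밥"),
        ("3.jpg", "비빔밥", "비빔밥"),
        ("4.jpg", "비빔밥", "비빔밥"),
    ],
)

# Overall accuracy: fraction of rows whose prediction matches the label.
(accuracy,) = conn.execute(
    "SELECT AVG(CASE WHEN label = predicted THEN 1.0 ELSE 0.0 END) FROM results"
).fetchone()
print(accuracy)  # 0.75
```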

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊</p>
    <p>For inquiries regarding building keyword-image search models: contact@smartmind.team</p>
</div>