# __Create an image classification model__

## Preface

- Tutorial Difficulty: ★☆☆☆☆
- 10 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_ml/classification/image_classification.ipynb
- References : [(AI-Hub) Product image data](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=64), [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)

## Tutorial Introduction

<div class="admonition note">
   <h4 class="admonition-title">Understanding Classification Operations</h4>
   <p>Classification is a type of <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning</a> task that predicts the category (class) to which a target belongs. Examples include binary classification, which chooses between two categories such as male or female, and multiclass classification, which predicts one of several categories, such as the species of an animal (dog, cat, rabbit, etc.).</p>
</div>
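
As a minimal sketch of the idea in the note above, a classifier can be viewed as assigning a score to each candidate category and returning the highest-scoring one. The category names and scores below are made up for illustration and do not come from a ThanoSQL model.

```python
def predict_category(scores):
    """Return the category with the highest score.

    `scores` maps each category name to a model score (hypothetical values).
    """
    return max(scores, key=scores.get)


# Binary classification: exactly two candidate categories.
binary_scores = {"defective": 0.2, "normal": 0.8}

# Multiclass classification: more than two candidate categories.
multiclass_scores = {"dog": 0.1, "cat": 0.7, "rabbit": 0.2}

binary_prediction = predict_category(binary_scores)
multiclass_prediction = predict_category(multiclass_scores)
print(binary_prediction, multiclass_prediction)
```

The only difference between the two cases is the number of candidate categories; the selection rule stays the same.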

Since 2010, a competition based on [ImageNet](https://en.wikipedia.org/wiki/ImageNet) has been held to classify images using artificial intelligence models. The winning model in the early years of the competition achieved a classification performance of about 72%, whereas the [ResNet](https://arxiv.org/abs/1512.03385) model that won in 2015 reached about 96%, and models began to surpass human classification performance in certain areas.

<div class="admonition tip">
   <p>Human classification performance on the same data is said to be about 95%.</p>
</div>

Accurate image classification normally requires [data labeling](https://en.wikipedia.org/wiki/Labeled_data) on a large dataset. A widely used alternative is to take the weights of a pre-trained AI model and recalibrate (fine-tune) them on a small labeled dataset. This makes it possible to train deep learning models even with a relatively small amount of data.
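
The sketch below illustrates one simple form of this idea in plain Python: a feature extractor is kept frozen and only a small classification head is trained on a handful of labeled samples. It is a toy under stated assumptions, not how ThanoSQL or ConvNeXt is implemented. The `frozen_features` function is a hypothetical stand-in for a pre-trained backbone, and the pixel values and labels are made up.

```python
import math


def frozen_features(pixels):
    """Stand-in for a frozen pre-trained backbone (hypothetical).

    Maps raw pixel intensities to two fixed features: mean brightness and
    intensity range. Nothing in this function changes during training.
    """
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def train_head(images, labels, epochs=200, learning_rate=0.5):
    """Fit a logistic-regression head on top of the frozen features."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in zip(images, labels):
            features = frozen_features(pixels)
            logit = sum(w * f for w, f in zip(weights, features)) + bias
            probability = 1.0 / (1.0 + math.exp(-logit))
            error = probability - label  # gradient of the log loss w.r.t. the logit
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(pixels, weights, bias):
    """Return 1 if the head scores the sample as positive, else 0."""
    features = frozen_features(pixels)
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if logit > 0 else 0


# Tiny labeled dataset (made-up values): 0 = dark image, 1 = bright image.
train_images = [
    [0.10, 0.20, 0.10, 0.20],
    [0.80, 0.90, 0.90, 0.80],
    [0.20, 0.25, 0.30, 0.25],
    [0.70, 0.75, 0.80, 0.75],
]
train_labels = [0, 1, 0, 1]

weights, bias = train_head(train_images, train_labels)
dark_prediction = predict([0.05, 0.10, 0.15, 0.10], weights, bias)
bright_prediction = predict([0.90, 0.95, 0.85, 0.90], weights, bias)
```

Because the feature extractor already produces useful features, four labeled samples are enough to fit the head in this toy setting. Full fine-tuning would also update the backbone weights, which this sketch deliberately leaves out.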

ThanoSQL provides a variety of pre-trained AI models and lets you create models with simple query syntax. With a properly trained image classification model, users can extract insights from images, whose features are otherwise difficult to quantify, and use them in various services.

__The following are example uses of the ThanoSQL image classification model.__

- In online product sales services, the ThanoSQL image classification model reduces the repetitive work of finding a suitable category when registering a product. You can categorize product photos with simple query syntax, and users save time compared with traditional image classification because they only need to correct the misclassified data.

- If you rent or sell artworks, you can use the image classification model to roughly classify works by criteria that are vague and hard to define, such as the feeling, technique, and suitable location of each work.

- In manufacturing plants, it can detect and classify defects such as scratches and damage that were previously checked visually. Signal information such as laser spectra can also be fed to the image classification model after processing such as conversion into a visual representation.

<div class="admonition tip">
   <p>By storing and utilizing the behavioral histories of people who like art (purchase or rental histories, preferences, etc.), you can create a model that predicts preferences by age group (20s, 30s, 40s, etc.), gender (male, female), and exhibition location (home, cafe, company, etc.) using only the image of a work of art.</p>
</div>

<div class="admonition note">
   <h4 class="admonition-title">In this tutorial</h4>
   <p>👉 Build a model that classifies more than 10,000 products using the <code>Product Image</code> dataset from <a href="https://aihub.or.kr/">AI-Hub</a>, a leading AI open data sharing platform. The built model can be used as a detection and identification solution in smart logistics warehouses and unmanned stores. The full dataset covers more than 10,000 products and consists of image and label (ground truth) pairs used to train image classification models, with a total of 1,440,000 images. In this tutorial, you will use only 1,800 training images and 200 test images so that you can learn how to use ThanoSQL and check the results quickly. <br></p>
</div>

<a href="https://docs.thanosql.ai/img/thanosql_ml/classification/image_classification/image_classification_data_intro.png">
   <img alt="Product Image Example" src="https://docs.thanosql.ai/img/thanosql_ml/classification/image_classification/image_classification_data_intro.png">
</a>

<div class="admonition warning">
   <h4 class="admonition-title">Tutorial Precautions</h4>
   <ul>
      <li>The image classification model can be used to predict one target value (Target, Category/Label) from one image.</li>
      <li>A column indicating the path of the image and a column indicating the target value of the image must exist.</li>
      <li>The base model of this image classification model (<code>CONVNEXT</code>) uses a GPU. Depending on the size of the model and the batch size, you may run out of GPU memory. In that case, try using a smaller model or reducing the batch size.</li>
   </ul>
</div>
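
The column requirement above can be checked before training. The following is a minimal sketch, assuming the table rows are available in Python as a list of dictionaries (the tutorial does not show how query results are exported, so that part is an assumption). The helper name `find_invalid_rows` and the sample values are hypothetical; the column names `image_path` and `div_l` are the ones used later in this tutorial.

```python
def find_invalid_rows(rows, image_col="image_path", label_col="div_l"):
    """Return the indexes of rows that lack an image path or a target value."""
    invalid = []
    for index, row in enumerate(rows):
        image_path = row.get(image_col)
        label = row.get(label_col)
        # A usable training row needs exactly one image path and one label.
        if not image_path or not label:
            invalid.append(index)
    return invalid


# Made-up rows for illustration only.
rows = [
    {"image_path": "images/a.jpg", "div_l": "snack"},
    {"image_path": "images/b.jpg", "div_l": ""},  # empty target value
    {"div_l": "beverage"},                        # missing image path
]

invalid_rows = find_invalid_rows(rows)
print(invalid_rows)
```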

## __0. Prepare Dataset and Model__

To use the query syntax of ThanoSQL, you must create an API token and run the query below, as mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [None]:
%%thanosql
GET THANOSQL DATASET product_image_data
OPTIONS (overwrite=True)

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL DATASET</strong>" query syntax to save the desired dataset to the workspace. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for <strong>GET THANOSQL DATASET</strong>.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a dataset with the same name if one already exists. If True, the old dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

In [None]:
%%thanosql
COPY product_image_train
OPTIONS (overwrite=True) 
FROM "thanosql-dataset/product_image_data/product_image_train.csv"

In [None]:
%%thanosql
COPY product_image_test
OPTIONS (overwrite=True) 
FROM "thanosql-dataset/product_image_data/product_image_test.csv"

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" query syntax to specify the name of the dataset to be saved in the DB. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for <strong>COPY</strong>.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a dataset with the same name if one already exists in the DB. If True, the old dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

### __Prepare the Model__

In [None]:
%%thanosql
GET THANOSQL MODEL tutorial_product_classifier
OPTIONS (overwrite=True)
AS tutorial_product_classifier

<div class="admonition note">
    <h4 class="admonition-title">Query Details </h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL MODEL</strong>" query syntax to store the desired model in the workspace and DB. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for <strong>GET THANOSQL MODEL</strong>.
        <ul>
            <li>"overwrite" : Sets whether to overwrite a model with the same name if one already exists. If True, the existing model is replaced with the new model (True|False, DEFAULT: False) </li>
        </ul>
        </li>
        <li>Use the "<strong>AS</strong>" query syntax to name the model. If you do not use the AS syntax, the model keeps the name of the <code>THANOSQL MODEL</code>.</li>
    </ul>
</div>
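
The naming rule for <strong>AS</strong> described above can be summarized in a few lines of Python. This is only an illustration of the rule, not ThanoSQL code; the function name `resolve_model_name` and the alias `my_copy` are hypothetical.

```python
def resolve_model_name(source_model, alias=None):
    """Return the name under which the model is stored.

    If an alias is given (the AS clause), it is used; otherwise the
    original model name is kept.
    """
    return alias if alias else source_model


with_alias = resolve_model_name("tutorial_product_classifier", alias="my_copy")
without_alias = resolve_model_name("tutorial_product_classifier")
print(with_alias, without_alias)
```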

## __1. Check Dataset__

For this tutorial, we use the <mark style="background-color:#FFEC92 ">product_image_train</mark> table stored in ThanoSQL DB. Execute the query statement below to check the table contents.

In [None]:
%%thanosql
SELECT *
FROM product_image_train
LIMIT 5

<div class="admonition note">
   <h4 class="admonition-title">Understanding Data</h4>
   <ul>
      <li><mark style="background-color:#D7D0FF ">image_path</mark> : File location of each image</li>
      <li><mark style="background-color:#D7D0FF ">div_l</mark> : Large classification of products</li>
      <li><mark style="background-color:#D7D0FF ">div_m</mark> : Middle classification of products</li>
      <li><mark style="background-color:#D7D0FF ">div_s</mark> : Subclassification of products</li>
      <li><mark style="background-color:#D7D0FF ">div_n</mark> : Detailed classification of products</li>
      <li><mark style="background-color:#D7D0FF ">comp_nm</mark> : Manufacturer</li>
      <li><mark style="background-color:#D7D0FF ">img_prod_nm</mark> : Product name (image)</li>
      <li><mark style="background-color:#D7D0FF ">multi</mark> : Whether the image contains multiple products</li>
   </ul>
</div>
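
Before training, it can help to see how many images fall into each category of the label column. The sketch below counts the values of <code>div_l</code>, assuming the rows have been brought into Python as a list of dictionaries (an assumption, since the tutorial does not cover exporting results). The sample category names are made up.

```python
from collections import Counter


def count_labels(rows, label_col="div_l"):
    """Count how many rows belong to each category in `label_col`."""
    return Counter(row[label_col] for row in rows)


# Made-up rows for illustration only.
rows = [
    {"image_path": "images/a.jpg", "div_l": "snack"},
    {"image_path": "images/b.jpg", "div_l": "beverage"},
    {"image_path": "images/c.jpg", "div_l": "snack"},
]

label_counts = count_labels(rows)
for label, count in label_counts.most_common():
    print(label, count)
```

A strongly skewed distribution is worth noticing early, because a model can score well on frequent categories while doing poorly on rare ones.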

In [None]:
%%thanosql
PRINT IMAGE
AS
SELECT image_path
FROM product_image_train
LIMIT 5

## __2. Predicting Product Image Classification Results Using Pre-trained Models__

By executing the following query statement, you can quickly predict the results using the <mark style="background-color:#E9D7FD ">tutorial_product_classifier</mark> model, a pre-trained product image classification model.

In [None]:
%%thanosql
PREDICT USING tutorial_product_classifier
AS
SELECT *
FROM product_image_test

## __3. Create an image classification model__

Create an image classification model using the <mark style="background-color:#FFEC92 ">product_image_train</mark> dataset from the previous step. Execute the query syntax below to create a model named <mark style="background-color:#E9D7FD ">my_product_classifier</mark>.  
(Estimated duration of query execution: 5 min)

In [None]:
%%thanosql
BUILD MODEL my_product_classifier
USING ConvNeXt_Tiny
OPTIONS (
  image_col='image_path',
  label_col='div_l',
  epochs=1,
  overwrite=True
  )
AS
SELECT *
FROM product_image_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>BUILD MODEL</strong>" query syntax to create and train the <mark style="background-color:#E9D7FD">my_product_classifier</mark> model.</li>
        <li>"<strong>USING</strong>" specifies <code>ConvNeXt_Tiny</code> as the base model.</li>
        <li>"<strong>OPTIONS</strong>" specifies the options used to create the model.
        <ul>
            <li>"image_col" : Name of the column containing the image path</li>
            <li>"label_col" : Name of the column containing the target value</li>
            <li>"epochs" : Number of times the entire training dataset is used for training</li>
        </ul>
        </li>
    </ul>
</div>

<div class="admonition tip">
    <p>Here, "epochs" is set to 1 so that training finishes quickly. In general, a larger number of epochs takes more computation time, but predictive performance tends to improve as training progresses.</p>
</div>
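
To make the cost of "epochs" concrete, the sketch below counts the number of batch updates a training run performs. The batch size of 16 is an assumed value chosen for illustration (the tutorial does not state the batch size that is actually used), and `training_steps` is a hypothetical helper.

```python
import math


def training_steps(num_samples, batch_size, epochs):
    """Return the total number of batch updates for a training run.

    One epoch is one full pass over the training data, so it takes
    ceil(num_samples / batch_size) updates.
    """
    if num_samples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("num_samples, batch_size and epochs must be positive")
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# 1,800 training images as in this tutorial; batch size 16 is an assumption.
steps_one_epoch = training_steps(1800, batch_size=16, epochs=1)
steps_five_epochs = training_steps(1800, batch_size=16, epochs=5)
print(steps_one_epoch, steps_five_epochs)
```

The number of updates, and therefore the computation time, grows linearly with the number of epochs.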

<div class="admonition note">
    <p>When <strong>overwrite is True </strong>, the user can create a model with the same name as a previously created model.<br>
    On the other hand, when <strong>overwrite is False</strong>, the user cannot create a model with the same name as a previously created model.</p>
</div>
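
The overwrite behavior described above can be sketched with a plain dictionary standing in for the workspace. This is an illustration only: raising an error when the name is taken and overwrite is False is an assumption about how a refusal might surface, and `save_named_object` and `NameAlreadyExistsError` are hypothetical names.

```python
class NameAlreadyExistsError(Exception):
    """Raised when a name is already taken and overwrite is False (hypothetical)."""


def save_named_object(store, name, value, overwrite=False):
    """Save `value` under `name`, replacing an existing entry only if allowed."""
    if name in store and not overwrite:
        raise NameAlreadyExistsError(
            f"'{name}' already exists; set overwrite=True to replace it"
        )
    store[name] = value


store = {}
save_named_object(store, "my_product_classifier", "first version")

# overwrite=True replaces the existing entry.
save_named_object(store, "my_product_classifier", "second version", overwrite=True)

# overwrite=False (the default) refuses to replace it.
try:
    save_named_object(store, "my_product_classifier", "third version")
    rejected = False
except NameAlreadyExistsError:
    rejected = True
```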

## __4. Predict product image classification results using the generated model__

Using the product image classification model created in the previous step (<mark style="background-color:#E9D7FD ">my_product_classifier</mark>), predict the target values of the images in the <mark style="background-color:#FFEC92">product_image_test</mark> table, which was not used for training. After the query below is executed, the prediction results are stored and returned in the <mark style="background-color:#D7D0FF">predicted</mark> column.

In [None]:
%%thanosql
PREDICT USING my_product_classifier
OPTIONS (
    image_col='image_path'
    )
AS
SELECT *
FROM product_image_test

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the <mark style="background-color:#E9D7FD">my_product_classifier</mark> model created in the previous step with the query syntax "<strong>PREDICT USING</strong>".</li>
        <li>Specify the options to use for prediction via the "<strong>OPTIONS</strong>" query syntax.
        <ul>
            <li>"image_col" : The name of the column in which the path of the image to be used for prediction is recorded</li>
        </ul>
        </li>
    </ul>
</div>
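
Once predictions are available, a simple way to judge them is to compare the <code>predicted</code> column with the true label. The sketch below computes accuracy in plain Python, assuming the result rows are available as a list of dictionaries and that the test table also contains the <code>div_l</code> column (both are assumptions). The sample rows are made up and are not actual results.

```python
def accuracy(rows, label_col="div_l", predicted_col="predicted"):
    """Return the fraction of rows whose prediction matches the true label."""
    if not rows:
        raise ValueError("rows must not be empty")
    correct = sum(1 for row in rows if row[label_col] == row[predicted_col])
    return correct / len(rows)


# Made-up prediction results for illustration only.
rows = [
    {"div_l": "snack", "predicted": "snack"},
    {"div_l": "beverage", "predicted": "beverage"},
    {"div_l": "snack", "predicted": "beverage"},
    {"div_l": "beverage", "predicted": "beverage"},
]

test_accuracy = accuracy(rows)
print(test_accuracy)
```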

## __5. In Conclusion__

In this tutorial, we created an image classification model using the <mark style="background-color:#FFD79C">product image</mark> dataset. As this is a beginner-level tutorial, we focused on how the queries work rather than on how to improve accuracy. The accuracy of an image classification model can be improved by fine-tuning it for each platform or service, and satisfactory results can be obtained in most cases even with a small amount of labeled data. It is also possible to train a base model on your own data, or to convert your data into numeric form using a self-supervised model and then deploy it using an automated machine learning (Auto-ML) technique. Create your own model and provide competitive services by combining various unstructured data (audio, video, text, etc.) with numeric data.

The next tutorial, [Creating an Intermediate Image Classification Model], takes an in-depth look at image classification models. If you want to learn more about how to build your own image classification model for your service, please proceed to the following tutorials.

- [How to Upload to ThanoSQL DB](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/data_upload/)
- [Creating an Intermediate Image Classification Model]
- [Image conversion and creating My model using Auto-ML]
- [Deploy My Image Classification model](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/thanosql_api/rest_api_thanosql_query/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties in creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below 😊</p>
    <p>For inquiries about building an image classification model: contact@smartmind.team</p>
</div>