# __Create an image classification model__

## Preface

- Tutorial Difficulty: ★☆☆☆☆
- 10 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_ml/classification/image_classification.ipynb
- References : [(AI-Hub) Product image data](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=64), [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Classification</h4>
    <p>Classification is a form of <a href="https://en.wikipedia.org/wiki/Machine_learning">Machine Learning</a> used to predict the category (class) to which a target belongs. Classification tasks include both binary classification (for example, classifying a person as a man or a woman) and multi-class classification (for example, predicting an animal species such as dog, cat, or rabbit). <br></p>
</div>

The image classification contest ([ImageNet](https://en.wikipedia.org/wiki/ImageNet)) was first held in 2010. The winning model in that first year achieved about 72% top-5 accuracy. In 2015, the winning [ResNet](https://arxiv.org/abs/1512.03385) model achieved about 96% top-5 accuracy, surpassing estimated human classification performance.

<div class="admonition tip">
   <p>Human accuracy in classifying the same data is estimated at about 95%.</p>
</div>

Although [data labeling](https://en.wikipedia.org/wiki/Labeled_data) is important for accurate image classification, methods that adapt the weights of a pre-trained model to the dataset are widely used. These methods make it possible to train deep learning models even with a relatively small amount of data.

ThanoSQL provides a variety of pre-trained models and allows model creation using simple queries. With properly trained models, users can derive insights from images whose features are difficult to quantify and use them in various services.

__The following are examples and applications of the ThanoSQL classification model.__

- The ThanoSQL image classification model shortens the process of finding suitable categories when registering products in online sales services. You can categorize product photos with a simple query, and save time compared to traditional image classification by only correcting the misclassified items.

- When renting or selling artworks, you can use the image classification model to roughly classify works by criteria that are vague and hard to formalize, such as the mood, technique, and suitable location of each work.

- You can detect and classify defective products with scratches or damage that were previously checked visually in manufacturing plants. Signal information such as a laser spectrum can also be fed to the image classification model after it is converted into images with visualization techniques.

<div class="admonition tip">
   <p>You can also use the behavioral trends of people who like art to create a classification model that finds the groups of people most likely to enjoy a particular piece of art. In other words, using only artwork images, you can create a model that predicts preferences by age, gender, place, etc.</p>
</div>

<div class="admonition note">
   <h4 class="admonition-title">In this tutorial</h4>
   <p>👉 Build a model that classifies product images using the <code>Product Image</code> dataset from <a href="https://aihub.or.kr/">AI-Hub</a>, a data sharing platform. The model can be used as a detection and identification solution in smart logistics warehouses and unmanned stores. The full dataset covers more than 10,000 products as image and label (ground truth) pairs for image classification, and contains a total of 1,440,000 images. In this tutorial, you will use 1,800 training data points and 200 test data points to learn how to use ThanoSQL and to obtain results quickly. <br></p>
</div>

<a href="https://docs.thanosql.ai/img/thanosql_ml/classification/image_classification/image_classification_data_intro.png">
   <img alt="Product Image Example" src="https://docs.thanosql.ai/img/thanosql_ml/classification/image_classification/image_classification_data_intro.png">
</a>

<div class="admonition warning">
   <h4 class="admonition-title">Tutorial Precautions</h4>
   <ul>
      <li>The image classification model can be used to predict one target value (Target, Category) from one image.</li>
      <li>Both a column representing the image path and a column representing the target value of the image must exist.</li>
      <li>The base model of this image classification model (<code>CONVNEXT</code>) uses a GPU. Depending on the size of the model and the batch size, GPU memory may be insufficient. In that case, try using a smaller model or reducing the batch size.</li>
   </ul>
</div>
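
The first two precautions can be checked before training. The sketch below is plain Python run on a few hypothetical rows: the file names and category labels are made up for illustration, and `check_rows` is a helper written for this example, not a ThanoSQL function. It only illustrates the requirement that every row carries one image path and exactly one target value.

```python
# Hypothetical rows shaped like the training table: one image path, one label.
rows = [
    {"image_path": "images/a.jpg", "div_l": "Beverages"},
    {"image_path": "images/b.jpg", "div_l": "Snacks"},
    {"image_path": "images/c.jpg", "div_l": ""},  # missing label
]


def check_rows(rows, image_col, label_col):
    """Return indices of rows that lack an image path or a single target value."""
    bad = []
    for i, row in enumerate(rows):
        path = row.get(image_col)
        label = row.get(label_col)
        # Each row needs a non-empty path and a non-empty scalar label.
        if not path or not label or isinstance(label, (list, tuple, set)):
            bad.append(i)
    return bad


invalid = check_rows(rows, image_col="image_path", label_col="div_l")
print(invalid)
```

Rows reported here would need to be fixed or dropped before they are used for training.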

## __0. Prepare Dataset__

To run ThanoSQL queries, you must create an API token and run the code below, as described in the [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace) guide.

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [None]:
%%thanosql
GET THANOSQL DATASET product_image_data
OPTIONS (overwrite=True)

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>GET THANOSQL DATASET</strong>" saves the specified dataset to your workspace environment. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the <strong>GET THANOSQL DATASET</strong> query statement.
        <ul>
            <li>"overwrite" : Determines whether to overwrite a dataset with the same name if it already exists. If set to True, the existing dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

In [None]:
%%thanosql
COPY product_image_train
OPTIONS (overwrite=True)
FROM "thanosql-dataset/product_image_data/product_image_train.csv"

In [None]:
%%thanosql
COPY product_image_test
OPTIONS (overwrite=True)
FROM "thanosql-dataset/product_image_data/product_image_test.csv"

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" clause to specify the dataset to be copied into the DB. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for <strong>COPY</strong> clause.
        <ul>
            <li>"overwrite" : Overwrite if a dataset with the same name exists. If True, the existing dataset is overwritten with the new dataset. (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

### __Prepare the Model__

In [None]:
%%thanosql
GET THANOSQL MODEL tutorial_product_classifier
OPTIONS (overwrite=True)
AS tutorial_product_classifier

<div class="admonition note">
    <h4 class="admonition-title">Query Details </h4>
    <ul>
        <li>"<strong>GET THANOSQL MODEL</strong>" downloads the specified model to the workspace. </li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for the <strong>GET THANOSQL MODEL</strong> clause.
        <ul>
            <li>"overwrite" : Determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (True|False, DEFAULT : False) </li>
        </ul>
        </li>
        <li>"<strong>AS</strong>" names the given model. If not specified, the model will be named as the default <code>THANOSQL MODEL</code> name.</li>
    </ul>
</div>

## __1. Check Dataset__

To create the image classification model, we use the <mark style="background-color:#FFEC92 ">product_image_train</mark> table from the ThanoSQL database. Run the query below to check the contents of the table.

In [None]:
%%thanosql
SELECT *
FROM product_image_train
LIMIT 5

<div class="admonition note">
   <h4 class="admonition-title">Understanding the Data</h4>
   <ul>
      <li><mark style="background-color:#D7D0FF ">image_path</mark>: Path to the image file</li>
      <li><mark style="background-color:#D7D0FF ">div_l</mark> : Large (top-level) product category</li>
      <li><mark style="background-color:#D7D0FF ">div_m</mark> : Medium-level product category</li>
      <li><mark style="background-color:#D7D0FF ">div_s</mark> : Small product category (subcategory)</li>
      <li><mark style="background-color:#D7D0FF ">div_n</mark> : Detailed product category</li>
      <li><mark style="background-color:#D7D0FF ">comp_nm</mark> : Manufacturer</li>
      <li><mark style="background-color:#D7D0FF ">img_prod_nm</mark> : Product name (image)</li>
      <li><mark style="background-color:#D7D0FF ">multi</mark> : Whether image has multiple products</li>
   </ul>
</div>
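
Before training on a label column such as `div_l`, it can help to look at how the rows are spread across classes. The following is a minimal Python sketch on hypothetical label values (the category names are invented for illustration and are not taken from the dataset); it shows the idea of a class-distribution check rather than any ThanoSQL feature.

```python
from collections import Counter

# Hypothetical values of the div_l column; real category names will differ.
div_l_values = ["Beverages", "Snacks", "Beverages", "Household", "Beverages"]

counts = Counter(div_l_values)
total = sum(counts.values())

# Print each class with its share of the rows, most frequent first.
for label, n in counts.most_common():
    print(f"{label}: {n} rows ({n / total:.0%})")
```

A strongly skewed distribution suggests that overall accuracy alone may hide poor results on rare classes.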

In [None]:
%%thanosql
PRINT IMAGE
AS
SELECT image_path
FROM product_image_train
LIMIT 5

## __2. Predicting Product Image Classification Results Using Pre-trained Models__

To predict the results using the <mark style="background-color:#E9D7FD ">tutorial_product_classifier</mark> model, a pre-trained image classification model, run the query below.

In [None]:
%%thanosql
PREDICT USING tutorial_product_classifier
AS
SELECT *
FROM product_image_test

## __3. Create an image classification model__

Create an image classification model using the <mark style="background-color:#FFEC92 ">product_image_train</mark> dataset from the previous step. Execute the query below to create a model named <mark style="background-color:#E9D7FD ">my_product_classifier</mark>.  
(Estimated duration of query execution: 5 min)

In [None]:
%%thanosql
BUILD MODEL my_product_classifier
USING ConvNeXt_Tiny
OPTIONS (
  image_col='image_path',
  label_col='div_l',
  epochs=1,
  overwrite=True
  )
AS
SELECT *
FROM product_image_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>BUILD MODEL</strong>" creates and trains a model named <mark style="background-color:#E9D7FD">my_product_classifier</mark>.</li>
        <li>"<strong>USING</strong>" specifies <code>ConvNeXt_Tiny</code> as the base model.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values used to create the model.
        <ul>
            <li>"image_col" : Name of the column containing the image path</li>
            <li>"label_col" : Name of the column containing the target value</li>
            <li>"epochs" : Number of times training is repeated over the whole dataset</li>
        </ul>
        </li>
    </ul>
</div>

<div class="admonition tip">
    <p>In this example, we set "epochs" to 1 to train the model quickly. In general, a larger number of "epochs" tends to improve prediction performance at the cost of longer computation time.</p>
</div>
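
To make the cost side of the "epochs" trade-off concrete, the sketch below counts how many weight updates a training run performs. It assumes the usual mini-batch setup in which each epoch visits every row once; the batch size of 16 is an assumed value for illustration, not a documented ThanoSQL default, and `training_steps` is a helper written for this example.

```python
import math


def training_steps(num_rows, batch_size, epochs):
    """Number of weight updates when each epoch visits every row once."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * epochs


# 1,800 training rows as in this tutorial; batch size 16 is an assumed value.
for epochs in (1, 5, 10):
    print(epochs, training_steps(1800, 16, epochs))
```

The number of updates, and therefore the training time, grows linearly with "epochs", which is why this tutorial keeps the value at 1.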

<div class="admonition note">
    <p>When <strong>overwrite is set to True</strong>, users can create a model with the same name as a previously created model.<br>
    On the other hand, when <strong>overwrite is set to False</strong>, users cannot create a model with the same name as a previously created model.</p>
</div>

## __4. Predict product image classification results using the generated model__

With the image classification model created in the previous step (<mark style="background-color:#E9D7FD ">my_product_classifier</mark>), try predicting the target value from <mark style="background-color:#FFEC92">product_image_test</mark>. After running the query below, the prediction result is stored in the <mark style="background-color:#D7D0FF">predicted</mark> column.

In [None]:
%%thanosql
PREDICT USING my_product_classifier
OPTIONS (
    image_col='image_path'
    )
AS
SELECT *
FROM product_image_test

<div class="admonition note">
    <h4 class="admonition-title">Query details</h4>
    <ul>
        <li>Use the <mark style="background-color:#E9D7FD">my_product_classifier</mark> model created in the previous step with the "<strong>PREDICT USING</strong>" query.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values used for prediction.
        <ul>
            <li>"image_col" : The name of the column where the path of the image used for prediction is stored.</li>
        </ul>
        </li>
    </ul>
</div>
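
Once predictions are available, a simple way to judge the model is to compare the predicted column with the true label. The sketch below is plain Python on hypothetical rows; it assumes the prediction output keeps the `div_l` label next to the `predicted` column, and the category names are invented for illustration. `accuracy` is a helper written for this example.

```python
# Hypothetical prediction output: true label (div_l) next to the predicted column.
results = [
    {"div_l": "Beverages", "predicted": "Beverages"},
    {"div_l": "Snacks", "predicted": "Beverages"},
    {"div_l": "Household", "predicted": "Household"},
    {"div_l": "Snacks", "predicted": "Snacks"},
]


def accuracy(rows, label_col, pred_col):
    """Fraction of rows whose prediction equals the true label."""
    if not rows:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for r in rows if r[label_col] == r[pred_col])
    return correct / len(rows)


acc = accuracy(results, "div_l", "predicted")
print(f"accuracy: {acc:.2%}")
```

With only 200 test rows and a single training epoch, treat any such figure as a rough indication rather than a benchmark.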

## __5. In Conclusion__

In this tutorial, we created an image classification model using the <mark style="background-color:#FFD79C">product image</mark> dataset. As this is a beginner-level tutorial, we focused on the development process rather than on accuracy. The accuracy of an image classification model can be improved by fine-tuning it for each service, and satisfactory results can be obtained with only a small amount of data labeling. It is also possible to train the base model on your own data, or to vectorize and transform your data using a self-supervised model and then deploy it using automated machine learning (Auto-ML) techniques. Create your own model and provide competitive services by combining various unstructured data (audio, video, text, etc.) with structured data.

The [Creating an Intermediate Image Classification Model] tutorial takes an in-depth look at image classification models. If you want to learn more about building your own image classification model for your service, please proceed with the following tutorials.

- [How to Upload to ThanoSQL DB](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/data_upload/)
- [Creating an Intermediate Image Classification Model]
- [Image conversion and creating My model using Auto-ML]
- [Deploy My Image Classification model](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/thanosql_api/rest_api_thanosql_query/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below 😊</p>
    <p>For inquiries regarding building an image classification model: contact@smartmind.team</p>
</div>