# __Create a classification model using AutoML__

## Preface

- Tutorial difficulty : ★☆☆☆☆
- 4 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_ml/classification/automl_classification.ipynb
- References : [(Kaggle) Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic/overview)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Classification</h4>
    <p>Classification is a form of <a href="https://en.wikipedia.org/wiki/Machine_learning">Machine Learning</a> that predicts the category (class) to which a target belongs. Classification tasks include both binary classification (for example, classifying a person as male or female) and multiclass classification (for example, predicting an animal species such as dog, cat, or rabbit). <br></p>
</div>

To predict whether a potential customer will react positively to a particular marketing promotion, you can use your [Customer Relationship Management (CRM)](https://en.wikipedia.org/wiki/Customer_relationship_management) data (demographic information, customer behavior/search data, etc.). In this case, the <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)">features</a> in the CRM data are used as the input data, and the target value to be predicted is whether the customer's response to the promotion is positive (1 or True) or negative (0 or False). With this classification model, you can predict the reaction of customers who have not yet been exposed to the advertisement and target the appropriate customers, thereby increasing marketing efficiency.
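
As a rough illustration of the split between input features and the target value described above, the following Python sketch separates a target column from a few records. The field names (`age`, `visits_last_month`, `responded`) and the values are hypothetical and are not part of any ThanoSQL dataset.

```python
# Hypothetical CRM records; field names and values are illustrative only.
customers = [
    {"age": 34, "visits_last_month": 12, "responded": 1},
    {"age": 51, "visits_last_month": 2, "responded": 0},
    {"age": 27, "visits_last_month": 7, "responded": 1},
]

def split_features_target(rows, target):
    """Separate the input features from the target column."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels

features, labels = split_features_target(customers, "responded")
print(features[0])  # the inputs for the first customer
print(labels)       # the binary target values to be predicted
```

A classification model learns a mapping from each feature record to its label, so that it can later predict the label for records where the target is unknown.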

__The following are examples and applications of the ThanoSQL classification model.__

- The classification model enables early detection of customer churn, so you can respond proactively. Collected data helps you identify the characteristics of customers who are likely to leave and take appropriate action before they do. This can help prevent customer defections and increase sales.

- You can perform [Market Segmentation](https://en.wikipedia.org/wiki/Market_segmentation) for your online platform. Service users differ in their characteristics, behaviors, and needs. Classification models use these user features to identify fine-grained groups, so you can develop a strategy tailored to each group.

<div class="admonition note">
    <h4 class="admonition-title">In this tutorial</h4>
    <p>👉 Create a survivor prediction model using the <mark style="background-color:#FFD79C"> <strong>Titanic: Machine Learning from Disaster</strong></mark> dataset, a beginner competition on the machine learning contest platform <a href="https://www.kaggle.com/">Kaggle</a>. The goal of this competition is given below.
    (For reference, the data is a list of real passengers who were on board when the Titanic sank on April 15, 1912.)</p>
</div>

__Predicting Passengers Who Would Survive The Titanic Incident__

ThanoSQL provides automated machine learning (__Auto-ML__) tools. This tutorial uses Auto-ML to predict passengers who would survive the Titanic incident. ThanoSQL's Auto-ML automates the process for model development and enables data collection and storage along with machine learning model development and distribution (end-to-end machine learning pipelines) using a single language.

__Automated ML has the following advantages:__

1. Implement and deploy machine learning solutions without extensive programming or data science knowledge
2. Save time and resources when developing and deploying models
3. Quickly solve problems and make decisions using the data you already have

Now let's use ThanoSQL to create a classification model that predicts passengers who would survive the Titanic incident.

## __0. Prepare Dataset__

To run ThanoSQL queries, you must create an API token and run the code below, as mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [None]:
%%thanosql
GET THANOSQL DATASET titanic_data
OPTIONS (overwrite=True)

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL DATASET</strong>" query statement to save the desired dataset to your workspace environment. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the <strong>GET THANOSQL DATASET</strong> query statement.
        <ul>
            <li>"overwrite" : Overwrite if a dataset with the same name exists. If set as True, the existing dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

In [None]:
%%thanosql
COPY titanic_train
OPTIONS (overwrite=True)
FROM "thanosql-dataset/titanic_data/titanic_train.csv"

In [None]:
%%thanosql
COPY titanic_test 
OPTIONS (overwrite=True)
FROM "thanosql-dataset/titanic_data/titanic_test.csv"

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" clause to specify the dataset to be copied into the DB. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for <strong>COPY</strong> clause.
        <ul>
            <li>"overwrite" : Overwrite if a dataset with the same name exists. If True, the existing dataset is overwritten with the new dataset. (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

## __1. Check Dataset__

To create the survivor classification model, we use the <mark style="background-color:#FFEC92 "><strong>titanic_train</strong></mark> table located in the ThanoSQL DB. Run the query below to check the contents of the table.

In [None]:
%%thanosql
SELECT * 
FROM titanic_train
LIMIT 5 

<div class="admonition note">
    <h4 class="admonition-title">Understanding the Data</h4>
    <p>The <mark style="background-color:#FFEC92 "><strong>titanic_train</strong></mark> dataset contains the following columns.</p>
    <ul>
        <li><mark style="background-color:#D7D0FF">passengerid</mark> : Passenger ID</li>
        <li><mark style="background-color:#D7D0FF">survived</mark> : Whether the passenger on board survived</li>
        <li><mark style="background-color:#D7D0FF">pclass</mark> : Passenger ticket class</li>
        <li><mark style="background-color:#D7D0FF">name</mark> : Passenger name</li>
        <li><mark style="background-color:#D7D0FF">sex</mark> : Passenger gender</li>
        <li><mark style="background-color:#D7D0FF">age</mark> : Passenger age</li>
        <li><mark style="background-color:#D7D0FF">sibsp</mark> : Number of siblings or spouses on board</li>
        <li><mark style="background-color:#D7D0FF">parch</mark> : Number of parents or children on board</li>
        <li><mark style="background-color:#D7D0FF">ticket</mark> : Ticket number</li>
        <li><mark style="background-color:#D7D0FF">fare</mark> : Fare</li>
        <li><mark style="background-color:#D7D0FF">cabin</mark> : Cabin number</li>
        <li><mark style="background-color:#D7D0FF">embarked</mark> : Boarding location or port</li>
    </ul>
</div>

In this tutorial, we will exclude the <mark style="background-color:#D7D0FF">name</mark>, <mark style="background-color:#D7D0FF">ticket</mark>, and <mark style="background-color:#D7D0FF">cabin</mark> columns since they require additional data preprocessing. The <mark style="background-color:#D7D0FF">passengerid</mark> column is also excluded, since it is only an identifier.

## __2. Create a classification model__

Create a survivor classification model using the <mark style="background-color:#FFEC92 ">titanic_train</mark> data. Execute the query below to create a model named <mark style="background-color:#E9D7FD ">titanic_automl_classification</mark>.  
(Estimated duration of query execution: 8 min)

In [None]:
%%thanosql
BUILD MODEL titanic_automl_classification
USING AutomlClassifier 
OPTIONS (
    target='survived', 
    impute_type='iterative',  
    features_to_drop=['name', 'ticket', 'passengerid', 'cabin'],
    time_left_for_this_task=300,
    overwrite=True
    ) 
AS 
SELECT * 
FROM titanic_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Create and train a model named <mark style="background-color:#E9D7FD ">titanic_automl_classification</mark> using the "<strong>BUILD MODEL</strong>" query.</li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the model creation.
        <ul>
            <li>"target" : The name of the column containing the target value of the classification model </li>
            <li>"impute_type" : Determines how empty values (NaNs) are handled ('simple'|'iterative' , DEFAULT: 'simple') </li>
            <li>"features_to_drop" : The list of columns to exclude from training </li>
            <li>"time_left_for_this_task" : The total time given to find a suitable classification model (DEFAULT: 300)</li>
            <li>"overwrite" : Overwrite if a model with the same name exists. If True, the existing model is overwritten with the new model (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>
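
To give a feel for what handling empty values means, here is a minimal Python sketch of one common form of simple imputation: replacing each missing value with the mean of the observed values in the same column. This is an assumption made for illustration. The tutorial does not describe how ThanoSQL implements 'simple' or 'iterative' imputation, and the helper `impute_mean` and the sample rows are hypothetical.

```python
from statistics import mean

def impute_mean(rows, column):
    """Return copies of the rows with missing (None) values in `column`
    replaced by the mean of the observed values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

# Made-up rows; the second passenger has no recorded age.
passengers = [
    {"age": 22.0, "fare": 7.25},
    {"age": None, "fare": 8.05},
    {"age": 38.0, "fare": 71.28},
]

filled = impute_mean(passengers, "age")
print(filled[1]["age"])  # mean of the two observed ages
```

Iterative imputation, in general, goes a step further and estimates each missing value from the other columns rather than from a single column statistic, which usually costs more time.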

## __3. Evaluate the model__

Execute the query below to evaluate the performance of the model created in the previous step.

In [None]:
%%thanosql 
EVALUATE USING titanic_automl_classification 
OPTIONS (
    target = 'survived'
    )
AS
SELECT *
FROM titanic_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Evaluate the <mark style="background-color:#E9D7FD ">titanic_automl_classification</mark> model using the "<strong>EVALUATE USING</strong>" query. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the model evaluation.
        <ul>
            <li>"target" : The name of the column containing the target value of the classification model. </li>
        </ul>
        </li>
    </ul>
</div>
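
The tutorial does not list which metrics the evaluation query reports. As a general reference, the Python sketch below shows how three common binary classification metrics (accuracy, precision, and recall) are computed from actual and predicted labels. The label lists are made up for illustration and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(actual, predicted):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels: 1 = survived, 0 = did not survive.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Accuracy alone can be misleading when one class is much more frequent than the other, which is why precision and recall are often reported alongside it.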

<div class="admonition warning">
    <h4 class="admonition-title">Dataset for evaluation</h4>
    <p>Normally, the training dataset should not be used for evaluation, because a model tends to score better on data it has already seen. For this tutorial, however, the training dataset is used for convenience.</p>
</div>
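
The following Python sketch illustrates why evaluating on training data is optimistic. It uses a deliberately overfit toy "model" that memorizes its training rows, then compares its accuracy on the training rows with its accuracy on held-out rows. Everything here (the synthetic rows, `train_test_split`, `fit_memorizer`) is hypothetical and unrelated to how ThanoSQL trains or evaluates models.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_memorizer(train):
    """A deliberately overfit 'model' that memorizes every training label."""
    table = dict(train)
    labels = [label for _, label in train]
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    # Feature tuples never seen in training fall back to the majority class.
    return lambda features: table.get(features, majority)

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(features) == label for features, label in rows) / len(rows)

# Synthetic rows of (feature tuple, label); values are made up for illustration.
rows = [((i, i % 3), i % 2) for i in range(20)]
train, test = train_test_split(rows)
model = fit_memorizer(train)

train_accuracy = accuracy(model, train)  # 1.0 here: every training row was memorized
test_accuracy = accuracy(model, test)    # estimate on rows the model has not seen
print(train_accuracy, test_accuracy)
```

The score on held-out rows is the more honest estimate of how a model will behave on new passengers, which is why a separate evaluation set is the usual practice.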

## __4. Predict survivors using the generated model__

With the model created in the previous step, try predicting each passenger's survival using the <mark style="background-color:#FFEC92 "><strong>titanic_test</strong></mark> dataset, which was not used during training.

In [None]:
%%thanosql 
PREDICT USING titanic_automl_classification
AS 
SELECT * 
FROM titanic_test

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <p>Use the "<strong>PREDICT USING</strong>" query to make predictions with the <mark style="background-color:#E9D7FD ">titanic_automl_classification</mark> model. The "<strong>PREDICT</strong>" clause requires no special options, because it follows the procedures of the generated model.</p>
</div>

## __5. In Conclusion__

In this tutorial, we created a Titanic survivor classification model using the <mark style="background-color:#FFD79C"><strong>Titanic: Machine Learning from Disaster</strong></mark> data from [Kaggle](https://www.kaggle.com/). As this is a beginner-level tutorial, we focused on the development process rather than on accuracy. If you'd like to learn more about building advanced classification models, we recommend the intermediate tutorial.

In the next [Creating an Intermediate Classification Model] tutorial, we'll dive deeper into the "__OPTIONS__" clause and use the various options provided by ThanoSQL's AutoML to build more sophisticated and more accurate classification models. At the advanced level, you can vectorize unstructured data and include it as a training feature in AutoML to create a classification model. After completing the intermediate and advanced levels, try creating a classification model for your own service or product.

- [How to Upload to ThanoSQL DB](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/data_upload/)
- [Creating an Intermediate Image Classification Model]
- [Image conversion and creating My model using Auto-ML]
- [Deploy My Image Classification model](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/thanosql_api/rest_api_thanosql_query/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties in creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊</p>
    <p>For inquiries regarding building a classification model: contact@smartmind.team</p>
</div>