# __Create a regression model using AutoML__

## Preface

- Tutorial Difficulty : ★☆☆☆☆
- 5 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_ml/regression/automl_regression.ipynb
- References : [(Kaggle) Bike Sharing Demand](https://www.kaggle.com/competitions/bike-sharing-demand/overview)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Regression Tasks</h4>
    <p>A regression task is a form of <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning (ML)</a> used to predict a continuous numeric target. For example, given weather data, a regression model can predict tomorrow's temperature. Given property data, it can predict housing prices in a particular area.</p>
</div>
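The plain-Python sketch below makes "predicting a continuous number" concrete. It fits the simplest regression model, a one-feature least-squares line. The temperature and rental values are made up for illustration. This is not the algorithm ThanoSQL's Auto-ML uses. It only shows that a regression model returns a number rather than a class label.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: temperature (°C) -> number of rentals
temps = [10.0, 15.0, 20.0, 25.0]
rentals = [100.0, 150.0, 200.0, 250.0]

slope, intercept = fit_simple_linear_regression(temps, rentals)
prediction = slope * 22.0 + intercept  # a continuous number, not a category
print(prediction)  # 220.0
```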

When a company plans to spend a certain amount on advertising, it can use sales data from similar past campaigns to predict how that advertising will perform. Any <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)">features</a> that can be expressed as data can serve as input. Examples include the characteristics of the advertised product, when it is sold, information about the surrounding market, competitors' sales volumes, the definition of the target customer group, and market trends in the industry. By changing the controllable inputs, you can predict the best achievable sales performance and set the advertising budget to match the forecast. Used this way, regression models can help improve ad performance and grow sales over time.

__Example applications of ThanoSQL regression models:__

 - Stock price prediction using opening, closing, high, and low prices, related stock prices, composite stock indices, related news, etc. (finance)
 - Prediction of equipment failure probability and remaining useful life using sensor data such as temperature, vibration, and sound (manufacturing)
 - Prediction of solar power generation using weather, temperature, cloud cover, solar irradiance, etc. (energy)
 - Demand forecasting using historical demand trends and fluctuations in oil prices and exchange rates (raw materials)

<div class="admonition note">
    <h4 class="admonition-title">In this tutorial</h4>
    <p>👉 Create a regression model that predicts bike rental demand using the <mark style="background-color:#FFD79C"><strong>Bike Sharing Demand</strong></mark> dataset. It comes from a beginner-level competition on <a href="https://www.kaggle.com/">Kaggle</a>, a leading machine learning competition platform. The competition's goal is stated below. For reference, the competition data includes date and time, temperature, humidity, and wind speed from 2011 to 2012.</p>
</div>

__Predicting the number of bike rentals per hour on a specific date__

ThanoSQL provides automated machine learning (__Auto-ML__). In this tutorial, we use Auto-ML to predict the number of bike rentals. ThanoSQL's Auto-ML automates model development. Using a single language (__ThanoSQL__), you can collect and store data and then develop and deploy machine learning models, which covers the entire end-to-end machine learning pipeline.
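This tutorial does not describe how ThanoSQL's Auto-ML works internally. As a rough mental model only, the plain-Python sketch below shows time-budgeted model selection. It fits several candidate models, scores each one on held-out data, and keeps the best one found before the time budget runs out. The `time_left_for_this_task` option in step 2 appears to set a comparable search budget. The candidate models, toy data, and function names here are all hypothetical.

```python
import math
import time


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate: one-feature least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def search_models(candidates, train, valid, time_budget_s):
    """Fit candidates until the time budget runs out; keep the best validation RMSE."""
    deadline = time.monotonic() + time_budget_s
    best_name, best_score = None, math.inf
    for name, fit in candidates:
        if time.monotonic() >= deadline:
            break  # budget exhausted: keep the best model found so far
        model = fit(*train)
        score = rmse(valid[1], [model(x) for x in valid[0]])
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy data: (temperatures, rental counts)
train = ([10.0, 15.0, 20.0, 25.0], [100.0, 150.0, 200.0, 250.0])
valid = ([12.0, 18.0], [120.0, 180.0])
best_name, best_score = search_models(
    [("mean", fit_mean), ("linear", fit_linear)], train, valid, time_budget_s=5.0
)
print(best_name, best_score)  # linear 0.0
```

A real Auto-ML system searches far larger spaces of models and hyperparameters. The overall loop is the same, though: a fixed budget trades search time for a chance at a better model.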

__The advantages of using ThanoSQL's automated machine learning are:__

1. Implement and deploy machine learning solutions without extensive programming or data science knowledge
2. Save time and resources when developing and deploying models
3. Solve problems quickly with the data you already have to support decision-making

Now, let's use ThanoSQL to create a simple regression model to predict the number of bike rentals.

## __0. Prepare Dataset__

To use ThanoSQL query syntax, first create an API token and then run the query below, as described in [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

```python
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
```

### __Prepare Dataset__

```sql
%%thanosql
GET THANOSQL DATASET bike_sharing_data
OPTIONS (overwrite=True)
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>GET THANOSQL DATASET</strong>" query syntax to save the desired dataset to the workspace. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use with <strong>GET THANOSQL DATASET</strong>.
        <ul>
            <li>"overwrite" : Whether to overwrite an existing dataset with the same name. If True, the old dataset is replaced with the new one (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

```sql
%%thanosql
COPY bike_sharing_train 
OPTIONS (overwrite=True)
FROM "thanosql-dataset/bike_sharing_data/bike_sharing_train.csv"
```

```sql
%%thanosql
COPY bike_sharing_test 
OPTIONS (overwrite=True)
FROM "thanosql-dataset/bike_sharing_data/bike_sharing_test.csv"
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>COPY</strong>" query syntax to save the data from the given file in the DB as a table with the specified name. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use with <strong>COPY</strong>.
        <ul>
            <li>"overwrite" : Whether to overwrite an existing table with the same name in the DB. If True, the old table is replaced with the new one (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>
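The `overwrite` behavior of COPY can be pictured with the sketch below. It uses Python's built-in `sqlite3` and `csv` modules rather than ThanoSQL DB, so it only illustrates the semantics and is not how ThanoSQL implements COPY. The `copy_csv` function, the all-untyped columns, and raising an error when the table exists and `overwrite=False` are all assumptions of this sketch.

```python
import csv
import io
import sqlite3


def copy_csv(conn, table, csv_file, overwrite=False):
    """Load a CSV (header row + data rows) into `table` in a SQLite database.

    Mimics the COPY "overwrite" option: an existing table is replaced only
    when overwrite=True; otherwise this sketch raises an error.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists and not overwrite:
        raise RuntimeError(f"table {table} already exists and overwrite=False")

    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits the inserted rows when the block succeeds
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
sample = "datetime,count\n2011-01-01 00:00:00,16\n2011-01-01 01:00:00,40\n"
copy_csv(conn, "bike_sharing_train", io.StringIO(sample))
copy_csv(conn, "bike_sharing_train", io.StringIO(sample), overwrite=True)
print(conn.execute('SELECT COUNT(*) FROM "bike_sharing_train"').fetchone()[0])  # 2
```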

## __1. Check Dataset__

For this tutorial, we use the <mark style="background-color:#FFEC92 ">bike_sharing_train</mark> table stored in ThanoSQL DB. Run the query below to preview five rows of the table.

```sql
%%thanosql
SELECT * 
FROM bike_sharing_train 
LIMIT 5
```

<div class="admonition note">
    <h4 class="admonition-title">Understanding Data</h4>
    <p>The <mark style="background-color:#FFEC92 "><strong>bike_sharing_train</strong></mark> dataset contains hourly bike rental counts together with information such as date and time, temperature, humidity, and wind speed, from January 2011 to December 2012.</p>
    <ul>
        <li><mark style="background-color:#D7D0FF ">datetime</mark> : Date and hour (hourly timestamp)</li>
        <li><mark style="background-color:#D7D0FF ">season</mark> : Seasons (1=spring, 2=summer, 3=fall, 4=winter)</li>
        <li><mark style="background-color:#D7D0FF ">holiday</mark> : Whether the day is a holiday (0 = not a holiday, 1 = holiday, such as a national holiday)</li>
        <li><mark style="background-color:#D7D0FF ">workingday</mark> : Whether the day is a working day (0 = weekend or holiday, 1 = weekday that is not a holiday)</li>
        <li><mark style="background-color:#D7D0FF ">weather</mark> : Weather condition (categorical code)</li>
        <li><mark style="background-color:#D7D0FF ">temp</mark> : Temperature (°C)</li>
        <li><mark style="background-color:#D7D0FF ">atemp</mark> : "Feels like" (apparent) temperature (°C)</li>
        <li><mark style="background-color:#D7D0FF ">humidity</mark> : Relative humidity</li>
        <li><mark style="background-color:#D7D0FF ">windspeed</mark> : Wind speed</li>
        <li><mark style="background-color:#D7D0FF ">count</mark> : Number of rentals in that hour (the prediction target)</li>
    </ul>
</div>
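The `datetime` column is later passed to the model through the `datetime_attribs` option. Demand depends heavily on the hour and day of the week, so a raw timestamp is typically broken into numeric parts before a regressor can use it. The exact features ThanoSQL derives are not documented in this tutorial. The sketch below is a hypothetical illustration that assumes timestamps are formatted like `2011-01-01 05:00:00`, as in the Kaggle CSV files.

```python
from datetime import datetime


def expand_datetime(value):
    """Split an hourly timestamp string into numeric features a regressor can use."""
    ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


features = expand_datetime("2011-01-01 05:00:00")
print(features)  # {'year': 2011, 'month': 1, 'day': 1, 'hour': 5, 'weekday': 5}
```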

## __2. Create a regression model__

Build a bike demand prediction regression model using the <mark style="background-color:#FFEC92 "><strong>bike_sharing_train</strong></mark> dataset checked in the previous step. Run the query below to create a model named <mark style="background-color:#E9D7FD ">bike_regression</mark>.  
(Estimated time required for query execution: 8 min)

```sql
%%thanosql
BUILD MODEL bike_regression
USING AutomlRegressor
OPTIONS (
    target='count', 
    impute_type='simple', 
    datetime_attribs=['datetime'],
    time_left_for_this_task=300,
    overwrite=True
    ) 
AS
SELECT *
FROM bike_sharing_train
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>BUILD MODEL</strong>" query syntax to create and train a model named <mark style="background-color:#E9D7FD ">bike_regression</mark>. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use when creating the model.
        <ul>
            <li>"target" : The name of the column containing the target values the regression model predicts.</li>
            <li>"impute_type" : How missing values (NaNs) in the data table are handled ('simple'|'iterative', DEFAULT: 'simple') </li>
            <li>"datetime_attribs" : List of column names containing date/time data</li>
            <li>"time_left_for_this_task" : Time budget, in seconds, for searching for a suitable regression model (DEFAULT: 300)</li>
            <li>"overwrite" : Whether to overwrite an existing model with the same name. If True, the old model is replaced with the new one (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>
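The `impute_type` option controls how gaps in the data are filled. By analogy with scikit-learn's `SimpleImputer` and `IterativeImputer`, 'simple' most likely fills each column with a single statistic. 'iterative' would instead estimate each column's missing values from the other columns. The tutorial does not say which statistic ThanoSQL uses, so using the mean below is an assumption. The function names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_mean_imputer(rows, columns):
    """Learn each column's mean from the non-missing training values."""
    means = {}
    for col in columns:
        present = [row[col] for row in rows if not is_missing(row[col])]
        means[col] = sum(present) / len(present)
    return means


def apply_imputer(rows, means):
    """Return copies of the rows with missing values replaced by the learned means."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col, mean in means.items():
            if is_missing(new_row[col]):
                new_row[col] = mean
        filled.append(new_row)
    return filled


# Hypothetical rows with gaps in the humidity column
rows = [{"humidity": 80.0}, {"humidity": None}, {"humidity": 60.0}, {"humidity": float("nan")}]
means = fit_mean_imputer(rows, ["humidity"])
print([r["humidity"] for r in apply_imputer(rows, means)])  # [80.0, 70.0, 60.0, 70.0]
```

Learning the fill values once (`fit_mean_imputer`) and applying them separately (`apply_imputer`) means the same values can be reused later on data the model has never seen.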

<div class="admonition warning">
    <h4 class="admonition-title">Warning</h4>
    <p>When creating an Auto-ML regression model, if you pass parameters other than those listed in <a href="https://docs.thanosql.ai/en/how-to_guides/OPTIONS/#2-automlregressor-algorithm">OPTIONS</a>, the model is still created, but the values of those extra parameters are ignored.</p>
</div>

## __3. Evaluate the Generated Model__

Run the query below to evaluate the performance of the model you created in the previous step.

```sql
%%thanosql
EVALUATE USING bike_regression 
OPTIONS (
    target='count'
    ) 
AS
SELECT *
FROM bike_sharing_train
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>EVALUATE USING</strong>" query syntax to evaluate the <mark style="background-color:#E9D7FD ">bike_regression</mark> model. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for evaluation.
        <ul>
            <li>"target" : The name of the column containing the target values the regression model predicts.</li>
        </ul>
        </li>
    </ul>
</div>
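This tutorial does not list which metrics `EVALUATE` reports. For reference, the plain-Python sketch below computes common regression metrics: MAE, RMSE, and R². It also computes RMSLE, which the Kaggle Bike Sharing Demand competition uses for scoring. ThanoSQL's output may use different metrics or names, and the sample values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R² and RMSLE for paired actual/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # raises ZeroDivisionError if all actual values are identical
    # RMSLE compares log(1 + value); it assumes counts and predictions are non-negative.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "RMSLE": rmsle}


# Hypothetical actual vs. predicted hourly rental counts
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
print(metrics)
```

Lower MAE, RMSE, and RMSLE are better, and an R² closer to 1 is better. RMSLE penalizes relative rather than absolute errors, so missing by 10 rentals matters more in a quiet hour than at rush hour.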

<div class="admonition warning">
    <h4 class="admonition-title">Dataset for evaluation</h4>
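    <p>Normally, the evaluation data should be a part of the training dataset that is held out and never used for training. For convenience, this tutorial evaluates on the training data itself, so the scores will likely look better than the model's performance on unseen data.</p>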
</div>
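To follow the recommended practice, hold out part of the training data before building the model. The plain-Python sketch below shows a reproducible random split. The 20% fraction, the seed, and the function name are arbitrary choices for illustration. With hourly data like this, splitting by time (training on earlier hours and evaluating on later ones) is often a more realistic check.

```python
import random


def train_validation_split(rows, valid_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for evaluation only."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split every run
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


train_rows, valid_rows = train_validation_split(range(100))
print(len(train_rows), len(valid_rows))  # 80 20
```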

## __4. Predict bike rentals using the generated model__

Using the demand forecasting model created in the previous steps, predict the number of bike rentals for 10 rows of the test dataset (<mark style="background-color:#FFEC92 ">bike_sharing_test</mark>, a table not used for training).

```sql
%%thanosql
PREDICT USING bike_regression 
AS
SELECT *
FROM bike_sharing_test
LIMIT 10
```

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the "<strong>PREDICT USING</strong>" query syntax to make predictions with the <mark style="background-color:#E9D7FD ">bike_regression</mark> model. </li>
        <li>"<strong>PREDICT</strong>" needs no options, because it reuses the procedure defined when the model was built.</li>
    </ul>
</div>
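One way to read "no options are needed" is that everything decided at build time, such as the feature handling, the imputation values, and the fitted model, is stored with the model. Prediction then only needs the new rows. The sketch below illustrates that idea with a hypothetical `FittedRegressionPipeline` class and `build` function. It is not ThanoSQL's internal design.

```python
import math
from dataclasses import dataclass


def _missing(value):
    return value is None or math.isnan(value)


@dataclass
class FittedRegressionPipeline:
    """Everything learned at build time travels with the model."""
    feature: str
    fill_value: float  # imputation value learned from the training data
    slope: float
    intercept: float

    def predict(self, rows):
        """Apply the stored preprocessing and model; no extra options needed."""
        preds = []
        for row in rows:
            x = row.get(self.feature)
            if _missing(x):
                x = self.fill_value  # same imputation as during training
            preds.append(self.slope * x + self.intercept)
        return preds


def build(rows, feature, target):
    """Hypothetical build step: learn a mean imputation and a one-feature linear fit."""
    observed = [r[feature] for r in rows if not _missing(r.get(feature))]
    fill = sum(observed) / len(observed)
    xs = [fill if _missing(r.get(feature)) else r[feature] for r in rows]
    ys = [r[target] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return FittedRegressionPipeline(feature, fill, slope, my - slope * mx)


model = build(
    [{"temp": 10.0, "count": 100.0}, {"temp": 20.0, "count": 200.0}, {"temp": None, "count": 150.0}],
    feature="temp",
    target="count",
)
# Test rows carry no target column; the stored pipeline handles the missing value.
print(model.predict([{"temp": 22.0}, {"temp": None}]))  # [220.0, 150.0]
```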

## __5. In Conclusion__

In this tutorial, we created a bike demand prediction regression model using the <mark style="background-color:#FFD79C">Bike Sharing Demand</mark> dataset from [Kaggle](https://www.kaggle.com). Because this is a beginner-level tutorial, we focused on the overall process rather than on improving accuracy. To learn how to build more advanced regression models, we recommend continuing with the intermediate tutorial.

In the next tutorial, [Creating an Intermediate Regression Working Model], we'll look more closely at the various "__OPTIONS__" provided by ThanoSQL's Auto-ML and use them to build more accurate, sophisticated regression models. At the advanced level, you can quantify unstructured data and include it as input features when training Auto-ML regression models. After completing the intermediate and advanced tutorials, try creating a regression model for your own service or product.

- [How to Upload to ThanoSQL DB](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/data_upload/)
- [Create an intermediate image classification model]
- [Image conversion and creating My model using Auto-ML]
- [Deploying My image classification model](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/thanosql_api/rest_api_thanosql_query/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties creating your own model with ThanoSQL or applying it to your service, please feel free to contact us below😊</p>
    <p>For inquiries about building a regression model: contact@smartmind.team</p>
</div>