# __Create a regression model using AutoML__

## Preface

- Tutorial Difficulty : ★☆☆☆☆
- 5 min read
- Languages : [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location : tutorial_en/thanosql_ml/regression/automl_regression.ipynb
- References : [(Kaggle) Bike Sharing Demand](https://www.kaggle.com/competitions/bike-sharing-demand/overview)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Regression</h4>
    <p>Regression is a form of <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning (ML)</a> used to predict continuous numeric target values. For example, a regression model can predict tomorrow's temperature or housing prices in a particular area.</p>
</div>
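
To make "predicting a continuous value" concrete, here is a minimal sketch of the simplest regression model, a straight line fitted by ordinary least squares. It is not how ThanoSQL's Auto-ML works internally; the function names and the temperature/rental numbers are made up for illustration.

```python
# Minimal sketch: one-feature least-squares regression, standard library only.
# All data below is hypothetical and is not taken from the tutorial dataset.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous target value for a single input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: temperature (deg C) vs. rentals in that hour.
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
rentals = [50.0, 80.0, 110.0, 140.0, 170.0]

model = fit_line(temps, rentals)
print(predict(model, 22.0))  # a number on a continuous scale, not a class label
```

The output is a number rather than a category, which is what distinguishes regression from classification.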

When a company spends a certain amount on advertising, sales performance data from similar past cases can be used to predict advertising performance. Any <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)">feature</a> that can be converted into data can serve as input: characteristics of the advertised product, the selling period, information about the surrounding market, competitors' sales volumes, the definition of the target customer group, and market trends in the industry. By varying the adjustable inputs, you can predict which settings yield the best sales performance and adjust advertising spend according to the forecast. Regression models of this kind can be used to improve ad performance and increase sales.

__The following are examples and applications of the ThanoSQL regression model.__

 - Stock price prediction using stock market price, closing price, high price, low price, related stocks, KOSPI index, related news, etc. (finance)
 - Prediction of failure probability and lifespan of equipment using sensor data such as temperature, vibration, and sound (manufacturing)
 - Prediction of solar energy generation using weather, temperature, cloudiness, insolation, etc. (energy)
 - Forecasting using demand trends, oil prices, and exchange rate fluctuations (raw materials)

<div class="admonition note">
    <h4 class="admonition-title">In this tutorial</h4>
    <p>👉 Create a bike demand regression model using the <mark style="background-color:#FFD79C"><strong>Bike Sharing Demand</strong></mark> dataset for beginners from <a href="https://www.kaggle.com/">Kaggle</a>, a machine learning contest platform. The data for this competition covers 2011 to 2012 and includes information such as date and time, temperature, humidity, and wind speed. The goal of the contest is as follows:</p>
</div>

__Predicting the number of bike rentals per hour on a specific date__

ThanoSQL provides automated machine learning (__Auto-ML__) tools. This tutorial uses Auto-ML to predict the number of bike rentals. ThanoSQL's Auto-ML automates model development and lets you handle data collection, data storage, model development, and model deployment (an end-to-end machine learning pipeline) in a single language.

__The advantages of using ThanoSQL's automated machine learning are:__

1. Implement and deploy machine learning solutions without extensive programming or data science knowledge
2. Save the time and resources needed to develop and deploy models
3. Quickly solve decision-making problems using the data you already have

Now, let's use ThanoSQL to create a simple regression model to predict the number of bike rentals on a certain date.

## __0. Prepare Dataset__

To run ThanoSQL queries, you must create an API token and run the code below, as mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/en/getting_started/how_to_use_ThanoSQL/#5-thanosql-workspace).

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [None]:
%%thanosql
GET THANOSQL DATASET bike_sharing_data
OPTIONS (overwrite=True)

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>GET THANOSQL DATASET</strong>" downloads the specified dataset to the workspace. </li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for the <strong>GET THANOSQL DATASET</strong> clause.
        <ul>
            <li>"overwrite" : Determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>

In [None]:
%%thanosql
COPY bike_sharing_train 
OPTIONS (overwrite=True)
FROM "thanosql-dataset/bike_sharing_data/bike_sharing_train.csv"

In [None]:
%%thanosql
COPY bike_sharing_test 
OPTIONS (overwrite=True)
FROM "thanosql-dataset/bike_sharing_data/bike_sharing_test.csv"

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>COPY</strong>" specifies the name of the dataset to be saved as a database table. </li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for the <strong>COPY</strong> clause.
        <ul>
           <li>"overwrite" : Determines whether to overwrite a table if it already exists. If set as True, the old table is replaced with the new table (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>
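
The effect of the "overwrite" option can be pictured with an ordinary database. The sketch below uses Python's built-in `sqlite3` as a stand-in and is only an analogy: it is not ThanoSQL's implementation, and the helper `copy_csv_to_table`, the table name `bike_sample`, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical two-row sample; not real rows from the tutorial dataset.
SAMPLE_CSV = "datetime,temp,count\n2011-01-01 00:00:00,9.84,16\n2011-01-01 01:00:00,9.02,40\n"


def copy_csv_to_table(conn, table, csv_text, overwrite=False):
    """Load CSV text into a table, mimicking an overwrite flag (illustrative only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    if exists and not overwrite:
        raise ValueError(f"table {table!r} already exists; pass overwrite=True")
    # With overwrite=True the old table is dropped, so rows are replaced, not appended.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return len(data)


conn = sqlite3.connect(":memory:")
copy_csv_to_table(conn, "bike_sample", SAMPLE_CSV)
copy_csv_to_table(conn, "bike_sample", SAMPLE_CSV, overwrite=True)
print(conn.execute('SELECT COUNT(*) FROM "bike_sample"').fetchone()[0])
```

Setting `overwrite=True` in the tutorial queries makes them safe to rerun; with the default of False, a second run against an existing table would not replace it.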

## __1. Check Dataset__

For this tutorial, we use the <mark style="background-color:#FFEC92 ">bike_sharing_train</mark> table stored in the ThanoSQL database. Run the query below to check the contents of the table.

In [None]:
%%thanosql
SELECT * 
FROM bike_sharing_train 
LIMIT 5

<div class="admonition note">
    <h4 class="admonition-title">Understanding the Data</h4>
    <p>The <mark style="background-color:#FFEC92 "><strong>bike_sharing_train</strong></mark> dataset contains the number of bicycles rented in each hour from January 2011 to December 2012, together with information such as date and time, temperature, humidity, and wind speed.</p>
    <ul>
        <li><mark style="background-color:#D7D0FF ">datetime</mark> : Date by hour</li>
        <li><mark style="background-color:#D7D0FF ">season</mark> : Seasons (1=spring, 2=summer, 3=fall, 4=winter)</li>
        <li><mark style="background-color:#D7D0FF ">holiday</mark> : Holidays (0 = non-holiday, 1 = national holidays, etc.)</li>
        <li><mark style="background-color:#D7D0FF ">workingday</mark> : Working day (0 = weekends and holidays; 1 = weekdays that are not holidays)</li>
        <li><mark style="background-color:#D7D0FF ">weather</mark> : Weather</li>
        <li><mark style="background-color:#D7D0FF ">temp</mark> : Temperature</li>
        <li><mark style="background-color:#D7D0FF ">atemp</mark> : Feels-like temperature</li>
        <li><mark style="background-color:#D7D0FF ">humidity</mark> : Relative humidity</li>
        <li><mark style="background-color:#D7D0FF ">windspeed</mark> : Wind speed</li>
        <li><mark style="background-color:#D7D0FF ">count</mark> : Number of rentals</li>
    </ul>
</div>
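
The <mark style="background-color:#D7D0FF ">datetime</mark> column packs several signals (hour of day, day of week, month) into one value. A common way to expose them to a model is to split the timestamp into numeric parts. The sketch below shows that idea in plain Python; it assumes timestamps formatted like `2011-01-01 00:00:00`, and `expand_datetime` is a hypothetical helper, not a ThanoSQL function. This tutorial does not state how ThanoSQL processes datetime columns internally.

```python
from datetime import datetime


def expand_datetime(value):
    """Split a 'YYYY-MM-DD HH:MM:SS' string into numeric features (illustrative)."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "hour": parsed.hour,
        "weekday": parsed.weekday(),  # Monday = 0 ... Sunday = 6
    }


features = expand_datetime("2011-01-01 08:00:00")
print(features)
```

Features such as `hour` and `weekday` tend to matter for rental demand because commuting and leisure patterns differ by time of day and day of week.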

## __2. Create a regression model__

Create a bike demand regression model using the <mark style="background-color:#FFEC92 "><strong>bike_sharing_train</strong></mark> dataset from the previous step. Run the query below to create a model named <mark style="background-color:#E9D7FD ">bike_regression</mark>.  
(Estimated time required for query execution: 8 min)

In [None]:
%%thanosql
BUILD MODEL bike_regression
USING AutomlRegressor
OPTIONS (
    target='count', 
    impute_type='simple', 
    datetime_attribs=['datetime'],
    time_left_for_this_task=300,
    overwrite=True
    ) 
AS
SELECT *
FROM bike_sharing_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Create and train a model named <mark style="background-color:#E9D7FD ">bike_regression</mark> using the "<strong>BUILD MODEL</strong>" query. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the model creation.
        <ul>
            <li>"target" : The name of the column containing the target value of the regression model </li>
            <li>"impute_type" : Determines how missing values (NaNs) are handled ('simple'|'iterative' , DEFAULT: 'simple') </li>
            <li>"datetime_attribs" : List of column names containing datetime data</li>
            <li>"time_left_for_this_task" : The total time given to find a suitable regression model (DEFAULT: 300)</li>
            <li>"overwrite" : Overwrite if a model with the same name exists. If True, the existing model is overwritten with the new model (True|False, DEFAULT : False) </li>
        </ul>
        </li>
    </ul>
</div>
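
The tutorial describes "impute_type" only as the way empty values are handled. As a rough illustration of what a simple imputation strategy can look like, the sketch below replaces missing entries in a numeric column with the mean of the observed values. This is an assumption about the general technique, not a description of what ThanoSQL does for `impute_type='simple'`, and `impute_mean` is a hypothetical helper.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in values]


# Hypothetical humidity readings with two gaps.
humidity = [80.0, None, 75.0, float("nan"), 70.0]
print(impute_mean(humidity))
```

Iterative strategies instead estimate each missing value from the other columns, which is generally slower but can be more accurate when columns are correlated.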

## __3. Evaluate the Generated Model__

Execute the query below to evaluate the performance of the model created in the previous step.

In [None]:
%%thanosql
EVALUATE USING bike_regression 
OPTIONS (
    target='count'
    ) 
AS
SELECT *
FROM bike_sharing_train

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Evaluate the <mark style="background-color:#E9D7FD ">bike_regression</mark> model using the "<strong>EVALUATE USING</strong>" query. </li>
        <li>"<strong>OPTIONS</strong>" specifies the options to use for the model evaluation.
        <ul>
            <li>"target" : The name of the column containing the target value of the regression model </li>
        </ul>
        </li>
    </ul>
</div>
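
This tutorial does not list which metrics the evaluation reports, so the sketch below simply shows how three common regression metrics are computed: mean absolute error (MAE), root mean squared error (RMSE), and root mean squared logarithmic error (RMSLE), the metric the Kaggle Bike Sharing Demand competition uses for ranking. The function names and sample numbers are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rmsle(actual, predicted):
    """Root mean squared logarithmic error: compares values on a log scale."""
    # log1p(x) = log(1 + x); negative predictions are clipped to 0 first.
    squared = sum(
        (math.log1p(max(p, 0.0)) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    )
    return math.sqrt(squared / len(actual))


# Hypothetical hourly rental counts and model predictions.
actual = [10, 20, 30]
predicted = [12.0, 18.0, 33.0]

print(mae(actual, predicted), rmse(actual, predicted), rmsle(actual, predicted))
```

Lower is better for all three. RMSLE measures relative rather than absolute error, so a miss of 10 rentals counts for more in a quiet hour than in a busy one.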

<div class="admonition warning">
    <h4 class="admonition-title">Dataset for evaluation</h4>
    <p>Normally, the training dataset should not be used for evaluation, because scores measured on data the model has already seen tend to overstate its accuracy. However, for this tutorial, the training dataset is used for convenience.</p>
</div>
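
One common alternative to evaluating on the training data is to hold out part of it. Because this data is a time series, a simple option is to keep the most recent rows for evaluation and train on the earlier ones. The sketch below illustrates the idea on hypothetical rows; `split_by_time` is an illustrative helper, not part of ThanoSQL.

```python
def split_by_time(rows, holdout_fraction=0.2, key="datetime"):
    """Return (train_rows, holdout_rows), keeping the latest rows for evaluation."""
    # ISO-style 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as plain text.
    ordered = sorted(rows, key=lambda row: row[key])
    holdout_size = max(1, round(len(ordered) * holdout_fraction))
    cut = len(ordered) - holdout_size
    return ordered[:cut], ordered[cut:]


# Hypothetical rows: one record per hour, deliberately out of order.
rows = [
    {"datetime": f"2011-01-01 {hour:02d}:00:00", "count": 10 + hour}
    for hour in (3, 0, 7, 1, 9, 4, 2, 8, 5, 6)
]

train_rows, holdout_rows = split_by_time(rows)
print(len(train_rows), len(holdout_rows))
print(holdout_rows[0]["datetime"])
```

The model would then be built on the earlier rows and evaluated on the held-out rows, which gives a more realistic estimate of how it performs on unseen data.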

## __4. Predict bike rental quantity using generated model__

With the model created in the previous step, try predicting the number of bike rentals using 10 data points from <mark style="background-color:#FFEC92 ">bike_sharing_test</mark>.

In [None]:
%%thanosql
PREDICT USING bike_regression 
AS
SELECT *
FROM bike_sharing_test
LIMIT 10

<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>Use the <mark style="background-color:#E9D7FD ">bike_regression</mark> model for prediction using the "<strong>PREDICT USING</strong>" query. </li>
        <li>The "<strong>PREDICT</strong>" clause requires no additional options, because prediction follows the procedure of the generated model.</li>
    </ul>
</div>
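
A regression model outputs real numbers, while a rental count is a non-negative whole number. If the predictions you get back contain fractional or negative values, a small post-processing step like the one below makes them easier to read. This is a generic sketch on made-up values; `to_rental_counts` is a hypothetical helper, and the format of the prediction output is not specified in this tutorial.

```python
def to_rental_counts(raw_predictions):
    """Round predictions to whole numbers and clip negatives to zero."""
    return [max(0, round(value)) for value in raw_predictions]


# Hypothetical raw model outputs for four hours.
raw = [-3.2, 0.4, 17.6, 120.0]
print(to_rental_counts(raw))
```

Keep the raw values when computing evaluation metrics, since rounding discards information; the cleaned values are mainly useful for reporting.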

## __5. In Conclusion__

In this tutorial, we created a bicycle demand regression model using the <mark style="background-color:#FFD79C">Bike Sharing Demand</mark> dataset from [Kaggle](https://www.kaggle.com). As this is a beginner-level tutorial, we focused on the development process rather than on accuracy. If you'd like to learn more about building advanced regression models, we recommend going through the intermediate tutorial.

In the next [Creating an Intermediate Classification Model] tutorial, we'll dive deeper into the "__OPTIONS__" clause to improve accuracy. In the intermediate tutorial, we will create more sophisticated regression models using the various "__OPTIONS__" provided by ThanoSQL's AutoML. At the advanced level, you can vectorize unstructured data and include it as a training feature in AutoML to create a regression model. After completing the intermediate and advanced levels, try creating a regression model for your own service or product.

- [How to Upload to ThanoSQL DB](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/data_upload/)
- [Create an intermediate image classification model]
- [Image conversion and creating My model using Auto-ML]
- [Deploying My image classification model](https://docs.thanosql.ai/en/how-to_guides/ThanoSQL_connecting/thanosql_api/rest_api_thanosql_query/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries about deploying a model for your own service</h4>
    <p>If you have any difficulties in creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊</p>
    <p>For inquiries regarding building a regression model: contact@smartmind.team</p>
</div>