georgia-tech-db · xzdandy · Sep 26, 2023 · Sep 23, 2023 · Sep 23, 2023 · Sep 25, 2023
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -36,6 +36,8 @@ parts:
         title: Emotion Analysis
       - file: source/usecases/homesale-forecast.rst
         title: Home Sale Forecasting
+      - file: source/usecases/homerental-predict.rst
+        title: Home Rental Prediction
       # - file: source/usecases/privategpt.rst
       #   title: PrivateGPT
 
@@ -69,8 +71,10 @@ parts:
       - file: source/reference/ai/index
         title: AI Engines
         sections:
-          - file: source/reference/ai/model-train
-            title: Model Training
+          - file: source/reference/ai/model-train-ludwig
+            title: Model Training with Ludwig
+          - file: source/reference/ai/model-train-sklearn
+            title: Model Training with Sklearn
           - file: source/reference/ai/model-forecasting
             title: Time Series Forecasting
           - file: source/reference/ai/hf

diff --git a/docs/source/overview/model-inference.rst b/docs/source/overview/model-inference.rst
@@ -43,7 +43,7 @@ In EvaDB, we can also use models in joins.
 The most powerful usecase is lateral join combined with ``UNNEST``, which is very helpful to flatten the output from `one-to-many` models.
 The key idea here is a model could give multiple outputs (e.g., bounding box) stored in an array. This syntax is used to unroll elements from the array into multiple rows.
 Typical examples are `face detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/face_detector.py>`_ and `object detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/fastrcnn_object_detector.py>`_. 
-In the below example, we use `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>_` to detect emotions from faces in the movie, where a single scene can contain multiple faces. 
+In the below example, we use `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>`_ to detect emotions from faces in the movie, where a single scene can contain multiple faces. 
 
 .. code-block:: sql
 

diff --git a/docs/source/reference/ai/model-forecasting.rst b/docs/source/reference/ai/model-forecasting.rst
@@ -47,7 +47,7 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
 .. list-table:: Available Parameters
    :widths: 25 75
 
-   * - PREDICT (required) 
+   * - PREDICT (**required**) 
      - The name of the column we wish to forecast.
    * - TIME
      - The name of the column that contains the datestamp, which should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. Please visit the `pandas documentation <https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html>`_ for details. If not provided, an auto increasing ID column will be used.

diff --git a/docs/source/reference/ai/model-train-ludwig.rst b/docs/source/reference/ai/model-train-ludwig.rst
@@ -0,0 +1,65 @@
+.. _ludwig:
+
+Model Training with Ludwig
+==========================
+
+1. Installation
+---------------
+
+To use the `Ludwig framework <https://ludwig.ai/latest/>`_, install the extra Ludwig dependency in your EvaDB virtual environment.
+
+.. code-block:: bash
+
+   pip install evadb[ludwig]
+
+2. Example Query
+----------------
+
+.. code-block:: sql
+
+   CREATE OR REPLACE FUNCTION PredictHouseRent FROM
+   ( SELECT sqft, location, rental_price FROM HomeRentals )
+   TYPE Ludwig
+   PREDICT 'rental_price'
+   TIME_LIMIT 120;
+
+In the above query, you are creating a new customized function by automatically training a model from the ``HomeRentals`` table.
+The ``rental_price`` column will be the target column for prediction, while ``sqft`` and ``location`` are the inputs.
+
+You can also pass all other columns in ``HomeRentals`` as inputs and let the underlying AutoML framework figure out which ones to use. Below is an example query:
+
+.. code-block:: sql
+
+   CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM
+   ( SELECT * FROM HomeRentals )
+   TYPE Ludwig
+   PREDICT 'rental_price'
+   TIME_LIMIT 120;
+
+.. note::
+
+   Check out our :ref:`homerental-predict` for a working example.
+
+3. Model Training Parameters
+----------------------------
+
+.. list-table:: Available Parameters
+   :widths: 25 75
+
+   * - PREDICT (**required**)
+     - The name of the column we wish to predict.
+   * - TIME_LIMIT
+     - Time limit to train the model in seconds. Default: 120.
+   * - TUNE_FOR_MEMORY
+     - Whether to refine hyperopt search space for available host / GPU memory. Default: False.    
+
+Below is an example query specifying the above parameters:
+
+.. code-block:: sql
+
+   CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM
+   ( SELECT * FROM HomeRentals )
+   TYPE Ludwig
+   PREDICT 'rental_price'
+   TIME_LIMIT 3600
+   TUNE_FOR_MEMORY True;
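The parameters in the table above are appended to the statement as ``key value`` pairs. As a minimal sketch of that layout, the Python snippet below assembles the same statement from keyword arguments. The helper name ``build_train_query`` is hypothetical and is not part of EvaDB; it only builds a string and does not validate parameter names or values.

```python
def build_train_query(function_name, table, target, engine="Ludwig", **params):
    """Assemble a CREATE FUNCTION training statement as a string.

    Hypothetical helper for illustration only; it is not part of EvaDB.
    """
    lines = [
        f"CREATE FUNCTION IF NOT EXISTS {function_name} FROM",
        f"( SELECT * FROM {table} )",
        f"TYPE {engine}",
        f"PREDICT '{target}'",
    ]
    # Each extra parameter becomes one "KEY value" line, e.g. TIME_LIMIT 3600.
    for key, value in params.items():
        lines.append(f"{key.upper()} {value}")
    return "\n".join(lines) + ";"


query = build_train_query(
    "PredictHouseRent",
    "HomeRentals",
    "rental_price",
    time_limit=3600,
    tune_for_memory=True,
)
print(query)
```

The printed statement matches the example query above line for line, and the resulting string can be submitted through whichever EvaDB client you use.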
diff --git a/docs/source/reference/ai/model-train-sklearn.rst b/docs/source/reference/ai/model-train-sklearn.rst
@@ -0,0 +1,26 @@
+.. _sklearn:
+
+Model Training with Sklearn
+============================
+
+1. Installation
+---------------
+
+To use the `Sklearn framework <https://scikit-learn.org/stable/>`_, install the extra sklearn dependency in your EvaDB virtual environment.
+
+.. code-block:: bash
+
+   pip install evadb[sklearn]
+
+2. Example Query
+----------------
+
+.. code-block:: sql
+
+   CREATE OR REPLACE FUNCTION PredictHouseRent FROM
+   ( SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals )
+   TYPE Sklearn
+   PREDICT 'rental_price';
+
+In the above query, you are creating a new customized function by training a model from the ``HomeRentals`` table using the ``Sklearn`` framework.
+The ``rental_price`` column will be the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs.
diff --git a/docs/source/reference/ai/model-train.rst b/docs/source/reference/ai/model-train.rst
diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst
@@ -117,7 +117,7 @@ Where the `parameter` is ``key value`` pair.
 
 .. note::
 
-   Go over :ref:`hf`, :ref:`predict`, and :ref:`forecast` to check examples for creating function via type.
+   Go over :ref:`hf`, :ref:`ludwig`, and :ref:`forecast` to check examples for creating function via type.
 
 CREATE MATERIALIZED VIEW
 ------------------------

diff --git a/docs/source/usecases/homerental-predict.rst b/docs/source/usecases/homerental-predict.rst
@@ -0,0 +1,124 @@
+.. _homerental-predict:
+
+Home Rental Prediction
+=======================
+
+.. raw:: html
+
+    <embed>
+    <table align="left">
+    <td>
+        <a target="_blank" href="https://colab.research.google.com/github/georgia-tech-db/eva/blob/staging/tutorials/17-home-rental-prediction.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run on Google Colab</a>
+    </td>
+    <td>
+        <a target="_blank" href="https://github.com/georgia-tech-db/eva/blob/staging/tutorials/17-home-rental-prediction.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> View source on GitHub</a>
+    </td>
+    <td>
+        <a target="_blank" href="https://github.com/georgia-tech-db/eva/raw/staging/tutorials/17-home-rental-prediction.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" /> Download notebook</a>
+    </td>
+    </table><br><br>
+    </embed>
+
+Introduction
+------------
+
+In this tutorial, we present how to use :ref:`Prediction AI Engines<ludwig>` in EvaDB to predict home rental prices. EvaDB makes it easy to do predictions using its built-in AutoML engines with your existing databases.
+
+.. include:: ../shared/evadb.rst
+
+.. include:: ../shared/postgresql.rst
+
+We will assume that the input data is loaded into a ``PostgreSQL`` database. 
+To load the home rental data into your database, see the complete `home rental prediction notebook on Colab <https://colab.research.google.com/github/georgia-tech-db/eva/blob/staging/tutorials/17-home-rental-prediction.ipynb>`_.
+
+Preview the Home Rental Data
+-------------------------------------------
+
+We use the `home rental data <https://www.dropbox.com/scl/fi/gy2682i66a8l2tqsowm5x/home_rentals.csv?rlkey=e080k02rv5205h4ullfjdr8lw&raw=1>`_ in this usecase. The data contains eight columns: ``number_of_rooms``, ``number_of_bathrooms``, ``sqft``, ``location``, ``days_on_market``, ``initial_price``, ``neighborhood``, and ``rental_price``.
+
+.. code-block:: sql
+
+   SELECT * FROM postgres_data.home_rentals LIMIT 3;
+
+This query previews the data in the ``home_rentals`` table:
+
+.. code-block:: 
+
+    +------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+
+    | home_rentals.number_of_rooms | home_rentals.number_of_bathrooms | home_rentals.sqft | home_rentals.location | home_rentals.days_on_market | home_rentals.initial_price | home_rentals.neighborhood | home_rentals.rental_price |
+    |------------------------------|----------------------------------|-------------------|-----------------------|-----------------------------|----------------------------|---------------------------|---------------------------|
+    |                            1 |                                1 |               674 |                  good |                           1 |                       2167 |                  downtown |                      2167 |
+    |                            1 |                                1 |               554 |                  poor |                          19 |                       1883 |                  westbrae |                      1883 |
+    |                            0 |                                1 |               529 |                 great |                           3 |                       2431 |                south_side |                      2431 | 
+    +------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+
+
+Train a Home Rental Prediction Model
+-------------------------------------
+
+Next, let's train a prediction model from the ``home_rentals`` table using EvaDB's ``CREATE FUNCTION`` query.
+We will use the built-in :ref:`Ludwig<ludwig>` engine for this task.
+
+.. code-block:: sql
+
+  CREATE OR REPLACE FUNCTION PredictHouseRent FROM
+  ( SELECT * FROM postgres_data.home_rentals )
+  TYPE Ludwig
+  PREDICT 'rental_price'
+  TIME_LIMIT 3600;
+
+In the above query, we use all the columns (except ``rental_price``) from the ``home_rentals`` table to predict the ``rental_price`` column.
+We set the training time limit to 3600 seconds.
+
+.. note::
+
+   See the :ref:`ludwig` page for all configurable parameters of the model training framework.
+
+.. code-block:: 
+
+   +----------------------------------------------+
+   | Function PredictHouseRent successfully added |
+   +----------------------------------------------+
+
+Predict the Home Rental Price using the Trained Model
+-----------------------------------------------------
+
+Next, we use the trained ``PredictHouseRent`` function to predict the home rental price.
+
+.. code-block:: sql
+
+   SELECT PredictHouseRent(*) FROM postgres_data.home_rentals LIMIT 3;
+
+We use ``*`` to simply pass all columns into the ``PredictHouseRent`` function.
+
+.. code-block::
+
+   +-------------------------------------------+
+   | predicthouserent.rental_price_predictions |
+   +-------------------------------------------+
+   |                               2087.763672 |
+   |                               1793.570190 |
+   |                               2346.319824 |
+   +-------------------------------------------+
+
+We can also use a ``LATERAL JOIN`` to compare the actual rental prices in the ``home_rentals`` table with the prices predicted by the trained ``PredictHouseRent`` function.
+
+.. code-block:: sql
+
+   SELECT rental_price, predicted_rental_price
+   FROM postgres_data.home_rentals
+   JOIN LATERAL PredictHouseRent(*) AS Predicted(predicted_rental_price)
+   LIMIT 3;
+
+Here is the query's output:
+
+.. code-block::
+
+   +---------------------------+----------------------------------+
+   | home_rentals.rental_price | Predicted.predicted_rental_price |
+   +---------------------------+----------------------------------+
+   |                      2167 |                      2087.763672 |
+   |                      1883 |                      1793.570190 |
+   |                      2431 |                      2346.319824 |
+   +---------------------------+----------------------------------+
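To put a number on how close the predictions are, the Python sketch below computes the mean absolute error (MAE) and mean absolute percentage error (MAPE) for the three rows shown above. The values are copied from that output; three rows are far too few to evaluate the model, so treat this only as an illustration of the arithmetic.

```python
# Actual and predicted rental prices copied from the three rows above.
actual = [2167, 1883, 2431]
predicted = [2087.763672, 1793.570190, 2346.319824]

# Absolute error per row.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# Mean absolute error, in the same unit as rental_price.
mae = sum(abs_errors) / len(abs_errors)

# Mean absolute percentage error, relative to the actual price.
mape = 100 * sum(e / a for e, a in zip(abs_errors, actual)) / len(actual)

print(f"MAE:  {mae:.2f}")
print(f"MAPE: {mape:.2f}%")
```

For these three rows the MAE is about 84.45 and the MAPE is about 3.96%.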
+
+.. include:: ../shared/footer.rst
diff --git a/docs/source/usecases/homesale-forecast.rst b/docs/source/usecases/homesale-forecast.rst
@@ -22,7 +22,7 @@ Home Sale Forecasting
 Introduction
 ------------
 
-In this tutorial, we present how to use :ref:`forecasting models<forecast>` in EvaDB to predict home sale price. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function.
+In this tutorial, we present how to use :ref:`Forecasting AI Engines<forecast>` in EvaDB to predict home sale price. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function.
 
 .. include:: ../shared/evadb.rst
 
@@ -34,7 +34,7 @@ To load the home sales data into your database, see the complete `home sale fore
 Preview the Home Sales Data
 -------------------------------------------
 
-We use the `raw_sales.csv of the House Property Sales Time Series <https://www.kaggle.com/datasets/htagholdings/property-sales?resource=download>`_ in this usecase. The data contains five columns: postcode, price, bedrooms, datesold, and propertytype.
+We use the `raw_sales.csv of the House Property Sales Time Series <https://www.kaggle.com/datasets/htagholdings/property-sales?resource=download>`_ in this usecase. The data contains five columns: ``postcode``, ``price``, ``bedrooms``, ``datesold``, and ``propertytype``.
 
 .. code-block:: sql
 
@@ -74,7 +74,7 @@ Particularly, we are interested in the price of the properties that have three b
 
 In the ``home_sales`` dataset, we have two different property types, houses and units, and the price gap between them is large.
 We'd like to ask EvaDB to analyze the price of houses and units independently. 
-To do so, we specify the ``propertytype`` column as the ``ID `` of the time series data, which represents an identifier for the series.
+To do so, we specify the ``propertytype`` column as the ``ID`` of the time series data, which represents an identifier for the series.
 Here is the query's output ``DataFrame``:
 
 .. note::

diff --git a/script/test/test.sh b/script/test/test.sh
@@ -88,7 +88,7 @@ long_integration_test() {
 }
 
 notebook_test() {
-  PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb" --ignore="tutorials/16-homesale-forecasting.ipynb"
+  PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb" --ignore="tutorials/16-homesale-forecasting.ipynb" --ignore="tutorials/17-home-rental-prediction.ipynb"
   code=$?
   print_error_code $code "NOTEBOOK TEST"
 }