Add support for Neuralforecast (#1115)
Adding support for `neuralforecast`. Fixes #1112.

```sql
DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP FUNCTION IF EXISTS Forecast;

CREATE FUNCTION Forecast FROM
(SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
PREDICT 'y'
HORIZON 12
LIBRARY 'neuralforecast';

SELECT Forecast(12);
```
One quick issue here is that `neuralforecast` needs `horizon` as a
parameter while training, unlike `statsforecast`. Thus, a better way to
call the UDF would be simply `SELECT Forecast();`, which is currently
unsupported. @xzdandy Please let me know your thoughts.
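
The difference in where the horizon is supplied can be sketched with two toy classes. These are hypothetical stand-ins written for illustration, not the `statsforecast` or `neuralforecast` APIs, and the naive last-value forecast exists only to make the sketch runnable. The second class also shows one way a model trained for `h` steps could serve a lower horizon: by truncating its output.

```python
from typing import List, Optional


class PredictTimeHorizonModel:
    """Toy model whose horizon is chosen at predict time (hypothetical)."""

    def fit(self, y: List[float]) -> "PredictTimeHorizonModel":
        self.last_value = y[-1]
        return self

    def predict(self, h: int) -> List[float]:
        # Naive forecast: repeat the last observed value h times.
        return [self.last_value] * h


class TrainTimeHorizonModel:
    """Toy model whose horizon is fixed when it is trained (hypothetical)."""

    def __init__(self, h: int) -> None:
        self.h = h

    def fit(self, y: List[float]) -> "TrainTimeHorizonModel":
        # The full forecast is produced once, for the trained horizon.
        self.forecast = [y[-1]] * self.h
        return self

    def predict(self, h: Optional[int] = None) -> List[float]:
        # With no argument, fall back to the trained horizon.
        steps = self.h if h is None else h
        if steps > self.h:
            raise ValueError(f"horizon {steps} exceeds trained horizon {self.h}")
        # A lower horizon reuses the trained model by truncating its output.
        return self.forecast[:steps]


history = [112.0, 118.0, 132.0]
per_call = PredictTimeHorizonModel().fit(history)
fixed = TrainTimeHorizonModel(h=12).fit(history)
print(len(per_call.predict(12)), len(fixed.predict()), len(fixed.predict(6)))
```

With the horizon fixed at training time, a call without arguments (the `SELECT Forecast();` shape suggested above) is the natural default, and asking for more steps than were trained has to be rejected.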

List of stuff yet to be done:

- [x] Incorporate `neuralforecast`
- [x] Fix `HORIZON` redundancy (UPDATE: Being fixed in #1121)
- [x] Reuse a trained model for a lower horizon number
- [x] Add support for ~multivariate forecasting~ exogenous variables
- [x] Add tests
- [x] Add docs

---------

Co-authored-by: xzdandy <xzdandy@gmail.com>
americast and xzdandy committed Sep 30, 2023
1 parent f5a7c92 commit e8a181c
Showing 10 changed files with 664 additions and 201 deletions.
289 changes: 289 additions & 0 deletions data/forecasting/AirPassengersPanel.csv
@@ -0,0 +1,289 @@
ds,unique_id,y,trend,ylagged
1949-01-31,Airline1,112.0,0,112.0
1949-02-28,Airline1,118.0,1,118.0
1949-03-31,Airline1,132.0,2,132.0
1949-04-30,Airline1,129.0,3,129.0
1949-05-31,Airline1,121.0,4,121.0
1949-06-30,Airline1,135.0,5,135.0
1949-07-31,Airline1,148.0,6,148.0
1949-08-31,Airline1,148.0,7,148.0
1949-09-30,Airline1,136.0,8,136.0
1949-10-31,Airline1,119.0,9,119.0
1949-11-30,Airline1,104.0,10,104.0
1949-12-31,Airline1,118.0,11,118.0
1950-01-31,Airline1,115.0,12,112.0
1950-02-28,Airline1,126.0,13,118.0
1950-03-31,Airline1,141.0,14,132.0
1950-04-30,Airline1,135.0,15,129.0
1950-05-31,Airline1,125.0,16,121.0
1950-06-30,Airline1,149.0,17,135.0
1950-07-31,Airline1,170.0,18,148.0
1950-08-31,Airline1,170.0,19,148.0
1950-09-30,Airline1,158.0,20,136.0
1950-10-31,Airline1,133.0,21,119.0
1950-11-30,Airline1,114.0,22,104.0
1950-12-31,Airline1,140.0,23,118.0
1951-01-31,Airline1,145.0,24,115.0
1951-02-28,Airline1,150.0,25,126.0
1951-03-31,Airline1,178.0,26,141.0
1951-04-30,Airline1,163.0,27,135.0
1951-05-31,Airline1,172.0,28,125.0
1951-06-30,Airline1,178.0,29,149.0
1951-07-31,Airline1,199.0,30,170.0
1951-08-31,Airline1,199.0,31,170.0
1951-09-30,Airline1,184.0,32,158.0
1951-10-31,Airline1,162.0,33,133.0
1951-11-30,Airline1,146.0,34,114.0
1951-12-31,Airline1,166.0,35,140.0
1952-01-31,Airline1,171.0,36,145.0
1952-02-29,Airline1,180.0,37,150.0
1952-03-31,Airline1,193.0,38,178.0
1952-04-30,Airline1,181.0,39,163.0
1952-05-31,Airline1,183.0,40,172.0
1952-06-30,Airline1,218.0,41,178.0
1952-07-31,Airline1,230.0,42,199.0
1952-08-31,Airline1,242.0,43,199.0
1952-09-30,Airline1,209.0,44,184.0
1952-10-31,Airline1,191.0,45,162.0
1952-11-30,Airline1,172.0,46,146.0
1952-12-31,Airline1,194.0,47,166.0
1953-01-31,Airline1,196.0,48,171.0
1953-02-28,Airline1,196.0,49,180.0
1953-03-31,Airline1,236.0,50,193.0
1953-04-30,Airline1,235.0,51,181.0
1953-05-31,Airline1,229.0,52,183.0
1953-06-30,Airline1,243.0,53,218.0
1953-07-31,Airline1,264.0,54,230.0
1953-08-31,Airline1,272.0,55,242.0
1953-09-30,Airline1,237.0,56,209.0
1953-10-31,Airline1,211.0,57,191.0
1953-11-30,Airline1,180.0,58,172.0
1953-12-31,Airline1,201.0,59,194.0
1954-01-31,Airline1,204.0,60,196.0
1954-02-28,Airline1,188.0,61,196.0
1954-03-31,Airline1,235.0,62,236.0
1954-04-30,Airline1,227.0,63,235.0
1954-05-31,Airline1,234.0,64,229.0
1954-06-30,Airline1,264.0,65,243.0
1954-07-31,Airline1,302.0,66,264.0
1954-08-31,Airline1,293.0,67,272.0
1954-09-30,Airline1,259.0,68,237.0
1954-10-31,Airline1,229.0,69,211.0
1954-11-30,Airline1,203.0,70,180.0
1954-12-31,Airline1,229.0,71,201.0
1955-01-31,Airline1,242.0,72,204.0
1955-02-28,Airline1,233.0,73,188.0
1955-03-31,Airline1,267.0,74,235.0
1955-04-30,Airline1,269.0,75,227.0
1955-05-31,Airline1,270.0,76,234.0
1955-06-30,Airline1,315.0,77,264.0
1955-07-31,Airline1,364.0,78,302.0
1955-08-31,Airline1,347.0,79,293.0
1955-09-30,Airline1,312.0,80,259.0
1955-10-31,Airline1,274.0,81,229.0
1955-11-30,Airline1,237.0,82,203.0
1955-12-31,Airline1,278.0,83,229.0
1956-01-31,Airline1,284.0,84,242.0
1956-02-29,Airline1,277.0,85,233.0
1956-03-31,Airline1,317.0,86,267.0
1956-04-30,Airline1,313.0,87,269.0
1956-05-31,Airline1,318.0,88,270.0
1956-06-30,Airline1,374.0,89,315.0
1956-07-31,Airline1,413.0,90,364.0
1956-08-31,Airline1,405.0,91,347.0
1956-09-30,Airline1,355.0,92,312.0
1956-10-31,Airline1,306.0,93,274.0
1956-11-30,Airline1,271.0,94,237.0
1956-12-31,Airline1,306.0,95,278.0
1957-01-31,Airline1,315.0,96,284.0
1957-02-28,Airline1,301.0,97,277.0
1957-03-31,Airline1,356.0,98,317.0
1957-04-30,Airline1,348.0,99,313.0
1957-05-31,Airline1,355.0,100,318.0
1957-06-30,Airline1,422.0,101,374.0
1957-07-31,Airline1,465.0,102,413.0
1957-08-31,Airline1,467.0,103,405.0
1957-09-30,Airline1,404.0,104,355.0
1957-10-31,Airline1,347.0,105,306.0
1957-11-30,Airline1,305.0,106,271.0
1957-12-31,Airline1,336.0,107,306.0
1958-01-31,Airline1,340.0,108,315.0
1958-02-28,Airline1,318.0,109,301.0
1958-03-31,Airline1,362.0,110,356.0
1958-04-30,Airline1,348.0,111,348.0
1958-05-31,Airline1,363.0,112,355.0
1958-06-30,Airline1,435.0,113,422.0
1958-07-31,Airline1,491.0,114,465.0
1958-08-31,Airline1,505.0,115,467.0
1958-09-30,Airline1,404.0,116,404.0
1958-10-31,Airline1,359.0,117,347.0
1958-11-30,Airline1,310.0,118,305.0
1958-12-31,Airline1,337.0,119,336.0
1959-01-31,Airline1,360.0,120,340.0
1959-02-28,Airline1,342.0,121,318.0
1959-03-31,Airline1,406.0,122,362.0
1959-04-30,Airline1,396.0,123,348.0
1959-05-31,Airline1,420.0,124,363.0
1959-06-30,Airline1,472.0,125,435.0
1959-07-31,Airline1,548.0,126,491.0
1959-08-31,Airline1,559.0,127,505.0
1959-09-30,Airline1,463.0,128,404.0
1959-10-31,Airline1,407.0,129,359.0
1959-11-30,Airline1,362.0,130,310.0
1959-12-31,Airline1,405.0,131,337.0
1960-01-31,Airline1,417.0,132,360.0
1960-02-29,Airline1,391.0,133,342.0
1960-03-31,Airline1,419.0,134,406.0
1960-04-30,Airline1,461.0,135,396.0
1960-05-31,Airline1,472.0,136,420.0
1960-06-30,Airline1,535.0,137,472.0
1960-07-31,Airline1,622.0,138,548.0
1960-08-31,Airline1,606.0,139,559.0
1960-09-30,Airline1,508.0,140,463.0
1960-10-31,Airline1,461.0,141,407.0
1960-11-30,Airline1,390.0,142,362.0
1960-12-31,Airline1,432.0,143,405.0
1949-01-31,Airline2,412.0,144,412.0
1949-02-28,Airline2,418.0,145,418.0
1949-03-31,Airline2,432.0,146,432.0
1949-04-30,Airline2,429.0,147,429.0
1949-05-31,Airline2,421.0,148,421.0
1949-06-30,Airline2,435.0,149,435.0
1949-07-31,Airline2,448.0,150,448.0
1949-08-31,Airline2,448.0,151,448.0
1949-09-30,Airline2,436.0,152,436.0
1949-10-31,Airline2,419.0,153,419.0
1949-11-30,Airline2,404.0,154,404.0
1949-12-31,Airline2,418.0,155,418.0
1950-01-31,Airline2,415.0,156,412.0
1950-02-28,Airline2,426.0,157,418.0
1950-03-31,Airline2,441.0,158,432.0
1950-04-30,Airline2,435.0,159,429.0
1950-05-31,Airline2,425.0,160,421.0
1950-06-30,Airline2,449.0,161,435.0
1950-07-31,Airline2,470.0,162,448.0
1950-08-31,Airline2,470.0,163,448.0
1950-09-30,Airline2,458.0,164,436.0
1950-10-31,Airline2,433.0,165,419.0
1950-11-30,Airline2,414.0,166,404.0
1950-12-31,Airline2,440.0,167,418.0
1951-01-31,Airline2,445.0,168,415.0
1951-02-28,Airline2,450.0,169,426.0
1951-03-31,Airline2,478.0,170,441.0
1951-04-30,Airline2,463.0,171,435.0
1951-05-31,Airline2,472.0,172,425.0
1951-06-30,Airline2,478.0,173,449.0
1951-07-31,Airline2,499.0,174,470.0
1951-08-31,Airline2,499.0,175,470.0
1951-09-30,Airline2,484.0,176,458.0
1951-10-31,Airline2,462.0,177,433.0
1951-11-30,Airline2,446.0,178,414.0
1951-12-31,Airline2,466.0,179,440.0
1952-01-31,Airline2,471.0,180,445.0
1952-02-29,Airline2,480.0,181,450.0
1952-03-31,Airline2,493.0,182,478.0
1952-04-30,Airline2,481.0,183,463.0
1952-05-31,Airline2,483.0,184,472.0
1952-06-30,Airline2,518.0,185,478.0
1952-07-31,Airline2,530.0,186,499.0
1952-08-31,Airline2,542.0,187,499.0
1952-09-30,Airline2,509.0,188,484.0
1952-10-31,Airline2,491.0,189,462.0
1952-11-30,Airline2,472.0,190,446.0
1952-12-31,Airline2,494.0,191,466.0
1953-01-31,Airline2,496.0,192,471.0
1953-02-28,Airline2,496.0,193,480.0
1953-03-31,Airline2,536.0,194,493.0
1953-04-30,Airline2,535.0,195,481.0
1953-05-31,Airline2,529.0,196,483.0
1953-06-30,Airline2,543.0,197,518.0
1953-07-31,Airline2,564.0,198,530.0
1953-08-31,Airline2,572.0,199,542.0
1953-09-30,Airline2,537.0,200,509.0
1953-10-31,Airline2,511.0,201,491.0
1953-11-30,Airline2,480.0,202,472.0
1953-12-31,Airline2,501.0,203,494.0
1954-01-31,Airline2,504.0,204,496.0
1954-02-28,Airline2,488.0,205,496.0
1954-03-31,Airline2,535.0,206,536.0
1954-04-30,Airline2,527.0,207,535.0
1954-05-31,Airline2,534.0,208,529.0
1954-06-30,Airline2,564.0,209,543.0
1954-07-31,Airline2,602.0,210,564.0
1954-08-31,Airline2,593.0,211,572.0
1954-09-30,Airline2,559.0,212,537.0
1954-10-31,Airline2,529.0,213,511.0
1954-11-30,Airline2,503.0,214,480.0
1954-12-31,Airline2,529.0,215,501.0
1955-01-31,Airline2,542.0,216,504.0
1955-02-28,Airline2,533.0,217,488.0
1955-03-31,Airline2,567.0,218,535.0
1955-04-30,Airline2,569.0,219,527.0
1955-05-31,Airline2,570.0,220,534.0
1955-06-30,Airline2,615.0,221,564.0
1955-07-31,Airline2,664.0,222,602.0
1955-08-31,Airline2,647.0,223,593.0
1955-09-30,Airline2,612.0,224,559.0
1955-10-31,Airline2,574.0,225,529.0
1955-11-30,Airline2,537.0,226,503.0
1955-12-31,Airline2,578.0,227,529.0
1956-01-31,Airline2,584.0,228,542.0
1956-02-29,Airline2,577.0,229,533.0
1956-03-31,Airline2,617.0,230,567.0
1956-04-30,Airline2,613.0,231,569.0
1956-05-31,Airline2,618.0,232,570.0
1956-06-30,Airline2,674.0,233,615.0
1956-07-31,Airline2,713.0,234,664.0
1956-08-31,Airline2,705.0,235,647.0
1956-09-30,Airline2,655.0,236,612.0
1956-10-31,Airline2,606.0,237,574.0
1956-11-30,Airline2,571.0,238,537.0
1956-12-31,Airline2,606.0,239,578.0
1957-01-31,Airline2,615.0,240,584.0
1957-02-28,Airline2,601.0,241,577.0
1957-03-31,Airline2,656.0,242,617.0
1957-04-30,Airline2,648.0,243,613.0
1957-05-31,Airline2,655.0,244,618.0
1957-06-30,Airline2,722.0,245,674.0
1957-07-31,Airline2,765.0,246,713.0
1957-08-31,Airline2,767.0,247,705.0
1957-09-30,Airline2,704.0,248,655.0
1957-10-31,Airline2,647.0,249,606.0
1957-11-30,Airline2,605.0,250,571.0
1957-12-31,Airline2,636.0,251,606.0
1958-01-31,Airline2,640.0,252,615.0
1958-02-28,Airline2,618.0,253,601.0
1958-03-31,Airline2,662.0,254,656.0
1958-04-30,Airline2,648.0,255,648.0
1958-05-31,Airline2,663.0,256,655.0
1958-06-30,Airline2,735.0,257,722.0
1958-07-31,Airline2,791.0,258,765.0
1958-08-31,Airline2,805.0,259,767.0
1958-09-30,Airline2,704.0,260,704.0
1958-10-31,Airline2,659.0,261,647.0
1958-11-30,Airline2,610.0,262,605.0
1958-12-31,Airline2,637.0,263,636.0
1959-01-31,Airline2,660.0,264,640.0
1959-02-28,Airline2,642.0,265,618.0
1959-03-31,Airline2,706.0,266,662.0
1959-04-30,Airline2,696.0,267,648.0
1959-05-31,Airline2,720.0,268,663.0
1959-06-30,Airline2,772.0,269,735.0
1959-07-31,Airline2,848.0,270,791.0
1959-08-31,Airline2,859.0,271,805.0
1959-09-30,Airline2,763.0,272,704.0
1959-10-31,Airline2,707.0,273,659.0
1959-11-30,Airline2,662.0,274,610.0
1959-12-31,Airline2,705.0,275,637.0
1960-01-31,Airline2,717.0,276,660.0
1960-02-29,Airline2,691.0,277,642.0
1960-03-31,Airline2,719.0,278,706.0
1960-04-30,Airline2,761.0,279,696.0
1960-05-31,Airline2,772.0,280,720.0
1960-06-30,Airline2,835.0,281,772.0
1960-07-31,Airline2,922.0,282,848.0
1960-08-31,Airline2,906.0,283,859.0
1960-09-30,Airline2,808.0,284,763.0
1960-10-31,Airline2,761.0,285,707.0
1960-11-30,Airline2,690.0,286,662.0
1960-12-31,Airline2,732.0,287,705.0
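
Reading the rows above, `ylagged` appears to be `y` from the same month one year earlier within the same series (the first year falls back to `y` itself), and `trend` appears to be a running row index across the whole file. This is inferred from the data, not stated in the commit. A minimal sketch that checks the lag relationship on a few rows copied from the file, using only the standard library (the function name is hypothetical):

```python
import csv
import io
from typing import Dict, List, Tuple

# A few rows copied from AirPassengersPanel.csv: header plus six records.
SAMPLE = """ds,unique_id,y,trend,ylagged
1949-01-31,Airline1,112.0,0,112.0
1949-02-28,Airline1,118.0,1,118.0
1950-01-31,Airline1,115.0,12,112.0
1950-02-28,Airline1,126.0,13,118.0
1949-01-31,Airline2,412.0,144,412.0
1950-01-31,Airline2,415.0,156,412.0
"""


def find_lag_mismatches(text: str) -> List[Tuple[str, str]]:
    """Return (unique_id, ds) for rows whose ylagged is not y from 12 months earlier."""
    rows = list(csv.DictReader(io.StringIO(text)))

    # Index y by (series, year, month); the day is ignored because
    # month-end dates differ between leap and non-leap years.
    y_by_key: Dict[Tuple[str, int, int], float] = {}
    for row in rows:
        year, month, _day = row["ds"].split("-")
        y_by_key[(row["unique_id"], int(year), int(month))] = float(row["y"])

    mismatches = []
    for row in rows:
        year, month, _day = row["ds"].split("-")
        current = (row["unique_id"], int(year), int(month))
        previous = (row["unique_id"], int(year) - 1, int(month))
        # Assumed rule: with no earlier year available, ylagged equals y.
        expected = y_by_key.get(previous, y_by_key[current])
        if float(row["ylagged"]) != expected:
            mismatches.append((row["unique_id"], row["ds"]))
    return mismatches


print(find_lag_mismatches(SAMPLE))
```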
41 changes: 31 additions & 10 deletions docs/source/reference/ai/model-forecasting.rst
@@ -47,16 +47,24 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
.. list-table:: Available Parameters
:widths: 25 75

* - PREDICT (**required**)
* - PREDICT (str, required)
- The name of the column we wish to forecast.
* - TIME
- The name of the column that contains the datestamp, wihch should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. Please visit the `pandas documentation <https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html>`_ for details. If not provided, an auto increasing ID column will be used.
* - ID
- The name of column that represents an identifier for the series. If not provided, the whole table is considered as one series of data.
* - MODEL
- We can select one of AutoARIMA, AutoCES, AutoETS, AutoTheta. The default is AutoARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models.
* - Frequency
- A string indicating the frequency of the data. The common used ones are D, W, M, Y, which repestively represents day-, week-, month- and year- end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies.
* - HORIZON (int, required)
- The number of steps into the future we wish to forecast.
* - TIME (str, default: 'ds')
- The name of the column that contains the datestamp, which should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. Please visit the `pandas documentation <https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html>`_ for details. If the relevant column is not found, an auto-incrementing ID column will be used.
* - ID (str, default: 'unique_id')
- The name of the column that represents an identifier for the series. If the relevant column is not found, the whole table is considered as one series of data.
* - LIBRARY (str, default: 'statsforecast')
- We can select one of `statsforecast` (default) or `neuralforecast`. `statsforecast` provides access to statistical forecasting methods, while `neuralforecast` gives access to deep-learning based forecasting methods.
* - MODEL (str, default: 'ARIMA')
- If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
* - AUTO (str, default: 'T')
- If set to 'T', it enables automatic hyperparameter optimization. Must be set to 'T' for the `statsforecast` library. One may disable it (for instance `AUTO 'f'`, as in the `neuralforecast` example below) if LIBRARY is `neuralforecast` for faster (but less reliable) results.
* - Frequency (str, default: 'auto')
- A string indicating the frequency of the data. The commonly used ones are D, W, M, Y, which respectively represent day-, week-, month- and year-end frequency. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, the frequency is inferred automatically where possible.
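
The defaults and constraints in the table can be summarized as a small resolver. This is only a sketch of the documented rules, not EvaDB's implementation: the function name and dictionary layout are hypothetical, and treating any value that starts with "t" or "T" as "AUTO enabled" is an assumption, since the accepted spellings are not listed here.

```python
from typing import Any, Dict

# Defaults as listed in the parameter table.
DEFAULTS = {
    "time": "ds",
    "id": "unique_id",
    "library": "statsforecast",
    "auto": "T",
    "frequency": "auto",
}

# Model choices per library; the first entry is the documented default.
MODELS = {
    "statsforecast": ["ARIMA", "CES", "ETS", "Theta"],
    "neuralforecast": ["NBEATS", "NHITS"],
}


def resolve_forecast_args(user_args: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in documented defaults and check the documented constraints."""
    for required in ("predict", "horizon"):
        if required not in user_args:
            raise ValueError(f"{required.upper()} is required")

    args = {**DEFAULTS, **user_args}
    if args["library"] not in MODELS:
        raise ValueError(f"unknown LIBRARY {args['library']!r}")

    choices = MODELS[args["library"]]
    args.setdefault("model", choices[0])
    if args["model"] not in choices:
        raise ValueError(f"MODEL {args['model']!r} is not available for {args['library']}")

    # Assumption: values starting with "t"/"T" enable AUTO, anything else disables it.
    auto_enabled = str(args["auto"]).lower().startswith("t")
    if args["library"] == "statsforecast" and not auto_enabled:
        raise ValueError("AUTO must be 'T' for statsforecast")
    args["auto"] = auto_enabled
    return args


print(resolve_forecast_args({"predict": "y", "horizon": 12}))
```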

Note: If columns other than the ones listed above are passed while creating the function, they are treated as exogenous variables if LIBRARY is `neuralforecast`. Otherwise, they are ignored.

Below is an example query specifying the above parameters:

@@ -65,8 +73,21 @@ Below is an example query specifying the above parameters:
CREATE FUNCTION IF NOT EXISTS HomeRentalForecast FROM
(SELECT saledate, ma, type FROM HomeData)
TYPE Forecasting
HORIZON 12
PREDICT 'ma'
TIME 'saledate'
ID 'type'
MODEL 'AutoCES'
Frequency 'W';
Below is an example query that uses `neuralforecast`, treats the `trend` column as an exogenous variable, and disables automatic hyperparameter optimization:

.. code-block:: sql
CREATE FUNCTION AirPanelForecast FROM
(SELECT unique_id, ds, y, trend FROM AirDataPanel)
TYPE Forecasting
HORIZON 12
PREDICT 'y'
LIBRARY 'neuralforecast'
AUTO 'f'
FREQUENCY 'M';
4 changes: 0 additions & 4 deletions evadb/binder/statement_binder.py
@@ -126,10 +126,6 @@ def _bind_create_function_statement(self, node: CreateFunctionStatement):
elif column.name == arg_map.get("predict", "y"):
outputs.append(column)
required_columns.remove(column.name)
else:
raise BinderError(
f"Unexpected column {column.name} found for forecasting function."
)
assert (
len(required_columns) == 0
), f"Missing required {required_columns} columns for forecasting function."
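
The effect of dropping the `else: raise BinderError(...)` branch can be sketched on plain column names. This is a simplified, hypothetical stand-in for the binder logic: it works on strings rather than EvaDB column objects, assumes only the `predict` column is required, and raises `ValueError` where the real code uses an `assert`. Columns that match none of the known roles are now collected instead of rejected.

```python
from typing import Dict, List, Tuple


def split_forecast_columns(
    column_names: List[str], arg_map: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split columns into (inputs, outputs, extras) for a forecasting function."""
    predict = arg_map.get("predict", "y")
    # Assumed defaults for the time and id columns, taken from the docs change.
    known_inputs = {arg_map.get("time", "ds"), arg_map.get("id", "unique_id")}

    required = [predict]
    inputs: List[str] = []
    outputs: List[str] = []
    extras: List[str] = []
    for name in column_names:
        if name in known_inputs:
            inputs.append(name)
        elif name == predict:
            outputs.append(name)
            if name in required:
                required.remove(name)
        else:
            # Before this commit, an unexpected column raised an error here.
            extras.append(name)

    if required:
        raise ValueError(f"Missing required {required} columns for forecasting function.")
    return inputs, outputs, extras


print(split_forecast_columns(["unique_id", "ds", "y", "trend"], {}))
```

Here `trend` ends up in the extras list, which matches the documented behaviour of passing additional columns through as exogenous variables for `neuralforecast`.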