CREATE OR REPLACE FUNCTION (#1146)
- [x] Support from parser to executor.
- [x] Fix existing unit tests and short integration tests. 
- [x] Add unit tests for `CREATE OR REPLACE FUNCTION`
- [x] Update documentation
- [x] Add integration test
- [x] Check no long integration test failed due to this PR

Close #1131
xzdandy committed Sep 18, 2023
1 parent c95219f commit d4fe198
Showing 17 changed files with 312 additions and 55 deletions.
27 changes: 16 additions & 11 deletions docs/source/overview/model-inference.rst
@@ -3,23 +3,25 @@
Model Inference
===============

In EvaDB, every model is a function. We can compose SQL queries using functions as building units similar to conventional SQL functions. EvaDB's `cascades optimizer <https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s21/slides/22-cascades.pdf>` will optimize the evaluation of user-defined functions for lower latency. Go over :ref:`optimizations` for more details.
In EvaDB, every model is a function. We can compose SQL queries using functions as building units similar to conventional SQL functions. EvaDB's `cascades optimizer <https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s21/slides/22-cascades.pdf>`_ will optimize the evaluation of user-defined functions for lower latency. Go over :ref:`optimizations` for more details.

.. note::

EvaDB ships with a variety of builtin user-defined functions. Go over :ref:`models` to check them. Did not find the desired model? Go over :ref:`udf` to create your own user-defined functions and contribute to EvaDB.

1. Projection
-------------

The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MINST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.
The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MINST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.

.. code-block:: sql
SELECT MnistImageClassifier(data).label FROM minst_vid;
2. Selection
------------

Another common usecases are model inference in selections. In the below example, we use ``TextSummarizer`` and ``TextClassifier`` from :ref:`HuggingFace<hf>` to summarize the negative food reviews.
Another common usecases are model inference in selections. In the below example, we use ``TextSummarizer`` and ``TextClassifier`` from :ref:`HuggingFace<hf>` to summarize the negative food reviews.

.. code-block:: sql
@@ -35,12 +37,13 @@ EvaDB also provides specialized array operators to construct queries. Go over bu
WHERE ObjectDetector(data).labels @> ['person', 'car'];
3. Lateral Join
---------------

In EvaDB, we can also use models in joins.
The most powerful usecase is lateral join combined with ``UNNEST``, which is very helpful to flatten the output from `one-to-many` models.
The key idea here is a model could give multiple outputs (e.g., bounding box) stored in an array. This syntax is used to unroll elements from the array into multiple rows.
Typical examples are `face detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/face_detector.py>`_ and `object detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/fastrcnn_object_detector.py>`_.
In the below example, we use `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>_` to detect emotions from faces in the movie, where a single scene can contain multiple faces.
In EvaDB, we can also use models in joins.
The most powerful usecase is lateral join combined with ``UNNEST``, which is very helpful to flatten the output from `one-to-many` models.
The key idea here is a model could give multiple outputs (e.g., bounding box) stored in an array. This syntax is used to unroll elements from the array into multiple rows.
Typical examples are `face detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/face_detector.py>`_ and `object detectors <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/fastrcnn_object_detector.py>`_.
In the below example, we use `emotion detector <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/emotion_detector.py>_` to detect emotions from faces in the movie, where a single scene can contain multiple faces.

.. code-block:: sql
@@ -49,8 +52,9 @@ EvaDB also provides specialized array operators to construct queries. Go over bu
LATERAL JOIN UNNEST(FaceDetector(data)) AS Face(bbox, conf);
4. Aggregate Functions
----------------------

Models can also be executed on a sequence of frames, particularly for action detection. This can be accomplished by utilizing ``GROUP BY`` and ``SEGMENT`` to concatenate consecutive frames into a single segment.
Models can also be executed on a sequence of frames, particularly for action detection. This can be accomplished by utilizing ``GROUP BY`` and ``SEGMENT`` to concatenate consecutive frames into a single segment.

.. code-block:: sql
@@ -66,9 +70,10 @@ Here is another example grouping paragraphs from PDFs:
SELECT SEGMENT(data) FROM MyPDFs GROUP BY '10 paragraphs';
5. Order By
-----------

Models (typically feature extractors) can also be used in the ``ORDER BY`` for embedding-based similarity search. EvaDB also has index support to facilitate this type of queries.
In the below examples, we use the `SentenceFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sentence_feature_extractor.py>`_ to find relevant context `When was the NATO created` from a collection of pdfs as the knowledge base. Go over `PrivateGPT notebook <https://github.com/georgia-tech-db/evadb/blob/staging/tutorials/13-privategpt.ipynb>`_ for more details.
Models (typically feature extractors) can also be used in the ``ORDER BY`` for embedding-based similarity search. EvaDB also has index support to facilitate this type of queries.
In the below examples, we use the `SentenceFeatureExtractor <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/sentence_feature_extractor.py>`_ to find relevant context `When was the NATO created` from a collection of pdfs as the knowledge base. Go over `PrivateGPT notebook <https://github.com/georgia-tech-db/evadb/blob/staging/tutorials/13-privategpt.ipynb>`_ for more details.

.. code-block:: sql
6 changes: 2 additions & 4 deletions docs/source/reference/ai/model-train.rst
@@ -1,3 +1,5 @@
.. _predict:

Training and Finetuning
========================

@@ -27,10 +29,6 @@ You can also simply give all other columns in `HomeRentals` as inputs and let th
PREDICT 'rental_price'
TIME_LIMIT 120;
.. note::

Check :ref:`create-udf-train` for available configurations for training models.

2. After training completes, you can use the `PredictHouseRent` like all other functions in EvaDB

.. code-block:: sql
26 changes: 15 additions & 11 deletions docs/source/reference/evaql/create.rst
@@ -99,21 +99,25 @@ To register an user-defined function, specify the implementation details of the
TYPE Classification
IMPL 'evadb/functions/fastrcnn_object_detector.py';
.. _create-udf-train:

CREATE FUNCTION via Training
CREATE FUNCTION via Type
----------------------------

To register an user-defined function by training a predication model.

.. code-block:: sql
CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM
(SELECT * FROM HomeRentals)
TYPE Ludwig
PREDICT 'rental_price'
TIME_LIST 120;
TUNE_FOR_MEMORY False;
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] function_name
[ FROM ( select ) ]
TYPE function_type
[ parameter [ ...] ]
Where each `parameter` is a ``key value`` pair.

.. warning::

For one ``CREATE FUNCTION`` query, we can specify ``OR REPLACE`` or ``IF NOT EXISTS`` or neither, but not both.

.. note::

Go over :ref:`hf`, :ref:`predict`, and :ref:`forecast` to check examples for creating function via type.

CREATE MATERIALIZED VIEW
------------------------
29 changes: 24 additions & 5 deletions evadb/executor/create_function_executor.py
@@ -278,12 +278,30 @@ def exec(self, *args, **kwargs):
Calls the catalog to insert a function catalog entry.
"""
assert (
self.node.if_not_exists and self.node.or_replace
) is False, (
"OR REPLACE and IF NOT EXISTS can not be both set for CREATE FUNCTION."
)

overwrite = False
# check catalog if it already has this function entry
if self.catalog().get_function_catalog_entry_by_name(self.node.name):
if self.node.if_not_exists:
msg = f"Function {self.node.name} already exists, nothing added."
yield Batch(pd.DataFrame([msg]))
return
elif self.node.or_replace:
# We use DropObjectExecutor to avoid bookkeeping the code. The drop function should be moved to catalog.
from evadb.executor.drop_object_executor import DropObjectExecutor

drop_executor = DropObjectExecutor(self.db, None)
try:
drop_executor._handle_drop_function(self.node.name, if_exists=False)
except RuntimeError:
pass
else:
overwrite = True
else:
msg = f"Function {self.node.name} already exists."
logger.error(msg)
@@ -334,11 +352,12 @@ def exec(self, *args, **kwargs):
self.catalog().insert_function_catalog_entry(
name, impl_path, function_type, io_list, metadata
)
yield Batch(
pd.DataFrame(
[f"Function {self.node.name} successfully added to the database."]
)
)

if overwrite:
msg = f"Function {self.node.name} overwritten."
else:
msg = f"Function {self.node.name} added to the database."
yield Batch(pd.DataFrame([msg]))

def _try_initializing_function(
self, impl_path: str, function_args: Dict = {}
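The branching that this hunk adds to `exec` can be read as a small decision table. The following is a minimal sketch of that table, not the executor's actual code: `CreateAction` and `resolve_create_action` are hypothetical names introduced here, the sketch raises `ValueError` where the executor uses an `assert`, and it leaves out the catalog lookup and the `DropObjectExecutor` call.

```python
from enum import Enum


class CreateAction(Enum):
    """Hypothetical labels for the outcomes of a CREATE FUNCTION query."""

    ADD = "add"          # no existing entry: insert a new one
    SKIP = "skip"        # entry exists and IF NOT EXISTS was given
    REPLACE = "replace"  # entry exists and OR REPLACE was given
    ERROR = "error"      # entry exists and neither flag was given


def resolve_create_action(
    exists: bool, or_replace: bool, if_not_exists: bool
) -> CreateAction:
    # The two flags are mutually exclusive, whether or not the function exists.
    if or_replace and if_not_exists:
        raise ValueError(
            "OR REPLACE and IF NOT EXISTS can not be both set for CREATE FUNCTION."
        )
    if not exists:
        return CreateAction.ADD
    if if_not_exists:
        return CreateAction.SKIP
    if or_replace:
        return CreateAction.REPLACE
    return CreateAction.ERROR
```

In the commit itself, the replace path drops the old entry first and then falls through to the normal insert, which is why only the final message differs between "added" and "overwritten".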
13 changes: 11 additions & 2 deletions evadb/optimizer/operators.py
@@ -641,9 +641,10 @@ class LogicalCreateFunction(Operator):
Attributes:
name: str
function_name provided by the user required
or_replace: bool
if true should overwrite if function with same name exists
if_not_exists: bool
if true should throw an error if function with same name exists
else will replace the existing
if true should skip if function with same name exists
inputs: List[FunctionIOCatalogEntry]
function inputs, annotated list similar to table columns
outputs: List[FunctionIOCatalogEntry]
@@ -659,6 +660,7 @@ class LogicalCreateFunction(Operator):
def __init__(
self,
name: str,
or_replace: bool,
if_not_exists: bool,
inputs: List[FunctionIOCatalogEntry],
outputs: List[FunctionIOCatalogEntry],
@@ -669,6 +671,7 @@ def __init__(
):
super().__init__(OperatorType.LOGICALCREATEFUNCTION, children)
self._name = name
self._or_replace = or_replace
self._if_not_exists = if_not_exists
self._inputs = inputs
self._outputs = outputs
@@ -680,6 +683,10 @@ def __init__(
def name(self):
return self._name

@property
def or_replace(self):
return self._or_replace

@property
def if_not_exists(self):
return self._if_not_exists
@@ -711,6 +718,7 @@ def __eq__(self, other):
return (
is_subtree_equal
and self.name == other.name
and self.or_replace == other.or_replace
and self.if_not_exists == other.if_not_exists
and self.inputs == other.inputs
and self.outputs == other.outputs
@@ -724,6 +732,7 @@ def __hash__(self) -> int:
(
super().__hash__(),
self.name,
self.or_replace,
self.if_not_exists,
tuple(self.inputs),
tuple(self.outputs),
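Adding `or_replace` to both `__eq__` and `__hash__` keeps the two methods consistent: operators that compare equal must also hash equally, and two operators that differ only in `or_replace` should not be treated as the same plan. A minimal sketch of that property, assuming a frozen dataclass as a stand-in; `CreateFunctionKey` is a hypothetical name and covers only a subset of the `LogicalCreateFunction` fields shown above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateFunctionKey:
    """Hypothetical stand-in for the identity fields of the operator."""

    name: str
    or_replace: bool
    if_not_exists: bool


# Same name, different or_replace flag: these must be distinct.
replace_plan = CreateFunctionKey("PredictHouseRent", True, False)
plain_plan = CreateFunctionKey("PredictHouseRent", False, False)

# An identical copy must compare equal and hash to the same value.
replace_copy = CreateFunctionKey("PredictHouseRent", True, False)
```

A frozen dataclass derives `__eq__` and `__hash__` from the same field list, which is the invariant the hand-written methods in the diff maintain by listing `or_replace` in both places.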
2 changes: 2 additions & 0 deletions evadb/optimizer/rules/rules.py
@@ -755,6 +755,7 @@ def check(self, before: Operator, context: OptimizerContext):
def apply(self, before: LogicalCreateFunction, context: OptimizerContext):
after = CreateFunctionPlan(
before.name,
before.or_replace,
before.if_not_exists,
before.inputs,
before.outputs,
@@ -782,6 +783,7 @@ def check(self, before: Operator, context: OptimizerContext):
def apply(self, before: LogicalCreateFunction, context: OptimizerContext):
after = CreateFunctionPlan(
before.name,
before.or_replace,
before.if_not_exists,
before.inputs,
before.outputs,
1 change: 1 addition & 0 deletions evadb/optimizer/statement_to_opr_converter.py
@@ -269,6 +269,7 @@ def visit_create_function(self, statement: CreateFunctionStatement):

create_function_opr = LogicalCreateFunction(
statement.name,
statement.or_replace,
statement.if_not_exists,
annotated_inputs,
annotated_outputs,
15 changes: 14 additions & 1 deletion evadb/parser/create_function_statement.py
@@ -49,6 +49,7 @@ class CreateFunctionStatement(AbstractStatement):
def __init__(
self,
name: str,
or_replace: bool,
if_not_exists: bool,
impl_path: str,
inputs: List[ColumnDefinition] = [],
@@ -59,6 +60,7 @@ def __init__(
):
super().__init__(StatementType.CREATE_FUNCTION)
self._name = name
self._or_replace = or_replace
self._if_not_exists = if_not_exists
self._inputs = inputs
self._outputs = outputs
@@ -68,7 +70,12 @@ def __init__(
self._metadata = metadata

def __str__(self) -> str:
s = "CREATE FUNCTION"
s = "CREATE"

if self._or_replace:
s += " OR REPLACE"

s += " " + "FUNCTION"

if self._if_not_exists:
s += " IF NOT EXISTS"
@@ -95,6 +102,10 @@ def __str__(self) -> str:
def name(self):
return self._name

@property
def or_replace(self):
return self._or_replace

@property
def if_not_exists(self):
return self._if_not_exists
@@ -136,6 +147,7 @@ def __eq__(self, other):
return False
return (
self.name == other.name
and self.or_replace == other.or_replace
and self.if_not_exists == other.if_not_exists
and self.inputs == other.inputs
and self.outputs == other.outputs
@@ -150,6 +162,7 @@ def __hash__(self) -> int:
(
super().__hash__(),
self.name,
self.or_replace,
self.if_not_exists,
tuple(self.inputs),
tuple(self.outputs),
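The `__str__` change places `OR REPLACE` between `CREATE` and `FUNCTION`, mirroring the grammar. A minimal sketch of the resulting prefix, assuming the function name follows the optional `IF NOT EXISTS` clause (that part of `__str__` is not shown in the hunk); `create_function_prefix` is a hypothetical helper, not part of `CreateFunctionStatement`:

```python
def create_function_prefix(
    name: str, or_replace: bool = False, if_not_exists: bool = False
) -> str:
    """Build the leading part of a CREATE FUNCTION statement string."""
    s = "CREATE"
    if or_replace:
        s += " OR REPLACE"
    s += " FUNCTION"
    if if_not_exists:
        s += " IF NOT EXISTS"
    # Assumption: the name is appended right after the optional clauses.
    return s + " " + name


print(create_function_prefix("PredictHouseRent", or_replace=True))
# prints: CREATE OR REPLACE FUNCTION PredictHouseRent
```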
13 changes: 8 additions & 5 deletions evadb/parser/evadb.lark
@@ -35,10 +35,10 @@ create_table: CREATE TABLE if_not_exists? table_name (create_definitions | (AS s
rename_table: RENAME TABLE table_name TO table_name

// Create Functions
create_function: CREATE FUNCTION if_not_exists? function_name INPUT create_definitions OUTPUT create_definitions TYPE function_type IMPL function_impl function_metadata*
| CREATE FUNCTION if_not_exists? function_name IMPL function_impl function_metadata*
| CREATE FUNCTION if_not_exists? function_name TYPE function_type function_metadata*
| CREATE FUNCTION if_not_exists? function_name FROM LR_BRACKET select_statement RR_BRACKET TYPE function_type function_metadata*
create_function: CREATE or_replace? FUNCTION if_not_exists? function_name INPUT create_definitions OUTPUT create_definitions TYPE function_type IMPL function_impl function_metadata*
| CREATE or_replace? FUNCTION if_not_exists? function_name IMPL function_impl function_metadata*
| CREATE or_replace? FUNCTION if_not_exists? function_name TYPE function_type function_metadata*
| CREATE or_replace? FUNCTION if_not_exists? function_name FROM LR_BRACKET select_statement RR_BRACKET TYPE function_type function_metadata*

// Details
function_name: uid
@@ -265,6 +265,8 @@ if_exists: IF EXISTS

if_not_exists: IF NOT EXISTS

or_replace: OR REPLACE

// Functions

function_call: function ->function_call
@@ -373,11 +375,12 @@ PARAMETERS: "PARAMETERS"i
PRIMARY: "PRIMARY"i
REFERENCES: "REFERENCES"i
RENAME: "RENAME"i
REPLACE: "REPLACE"i
USE: "USE"i
SAMPLE: "SAMPLE"i
IFRAMES: "IFRAMES"i
AUDIORATE: "AUDIORATE"i
SELECT: "SELECT"i
SELECT: "SELECT"i
SET: "SET"i
SHUTDOWN: "SHUTDOWN"i
SHOW: "SHOW"i
4 changes: 4 additions & 0 deletions evadb/parser/lark_visitor/_functions.py
@@ -60,6 +60,7 @@ def function_args(self, tree):
# Create function
def create_function(self, tree):
function_name = None
or_replace = False
if_not_exists = False
input_definitions = []
output_definitions = []
@@ -73,6 +74,8 @@ def create_function(self, tree):
if isinstance(child, Tree):
if child.data == "function_name":
function_name = self.visit(child)
elif child.data == "or_replace":
or_replace = True
elif child.data == "if_not_exists":
if_not_exists = True
elif child.data == "create_definitions":
@@ -103,6 +106,7 @@ def create_function(self, tree):

return CreateFunctionStatement(
function_name,
or_replace,
if_not_exists,
impl_path,
input_definitions,
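The visitor change treats the mere presence of an `or_replace` subtree as the flag, the same way it already treats `if_not_exists`. A minimal sketch of that pattern, assuming a simplified tree type: `Node` is a hypothetical stand-in for the parser's tree class (only a `data` label and `children` are modeled), and `extract_create_flags` is not a function in the repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node."""

    data: str
    children: List["Node"] = field(default_factory=list)


def extract_create_flags(tree: Node) -> Tuple[bool, bool]:
    """Return (or_replace, if_not_exists) for a create_function node."""
    or_replace = False
    if_not_exists = False
    for child in tree.children:
        # The optional rules only appear in the tree when the clause was written.
        if child.data == "or_replace":
            or_replace = True
        elif child.data == "if_not_exists":
            if_not_exists = True
    return or_replace, if_not_exists


tree = Node("create_function", [Node("or_replace"), Node("function_name")])
flags = extract_create_flags(tree)
```

Because the grammar marks both clauses as optional and independent, a query carrying both still parses; the conflict is rejected later, by the assertion in the executor.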