Merge branch 'staging' into csv-column-names
Andy Xu committed Oct 7, 2023
2 parents f2f6805 + f0116f1 commit 8fe769d
Showing 23 changed files with 246 additions and 38 deletions.
2 changes: 1 addition & 1 deletion docs/_toc.yml
@@ -22,7 +22,7 @@ parts:
chapters:
- file: source/usecases/forecasting.rst
title: Forecasting
- file: source/usecases/prediction.rst
- file: source/usecases/classification.rst
title: Classification
- file: source/usecases/sentiment-analysis.rst
title: Sentiment Analysis
2 changes: 1 addition & 1 deletion docs/source/benchmarks/text_summarization.rst
@@ -88,7 +88,7 @@ Setup SQLite Database
Install MindsDB
~~~~~~~~~~~~~~~

Follow the `MindsDB nstallation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.
Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.

.. note::

2 changes: 1 addition & 1 deletion docs/source/dev-guide/release/release-steps.rst
@@ -25,7 +25,7 @@ Simply point ``master`` head to the latest commit of ``staging``.
Setup Credentials
~~~~~~~~~~~~~~~~~~

Please check :ref:`setup_pypi_account` about how to setup PyPi account.
Please check :ref:`setup_pypi_account` about how to setup PyPI account.

Setup Github token. You can obtain a personal token from Github.

6 changes: 3 additions & 3 deletions docs/source/overview/model-inference.rst
@@ -12,11 +12,11 @@ In EvaDB, every model is a function. We can compose SQL queries using functions
1. Projection
-------------

The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MINST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.
The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MNIST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video.

.. code-block:: sql
SELECT MnistImageClassifier(data).label FROM minst_vid;
SELECT MnistImageClassifier(data).label FROM mnist_vid;
2. Selection
------------
@@ -96,4 +96,4 @@ We can also use the `SiftFeatureExtractor <https://github.com/georgia-tech-db/ev
.. note::

Go over our :ref:`Usecases<sentiment-analysis>` to check more ways of utlizing models in real-world use cases.
Go over our :ref:`Usecases<sentiment-analysis>` to check more ways of utilizing models in real-world use cases.
4 changes: 2 additions & 2 deletions docs/source/reference/ai/model-forecasting.rst
@@ -58,11 +58,11 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
* - LIBRARY (str, default: 'statsforecast')
- We can select one of `statsforecast` (default) or `neuralforecast`. `statsforecast` provides access to statistical forecasting methods, while `neuralforecast` gives access to deep-learning based forecasting methods.
* - MODEL (str, default: 'ARIMA')
- If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
- If LIBRARY is `statsforecast`, we can select one of ARIMA, ting, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
* - AUTO (str, default: 'T')
- If set to 'T', it enables automatic hyperparameter optimization. Must be set to 'T' for `statsforecast` library. One may set this parameter to `false` if LIBRARY is `neuralforecast` for faster (but less reliable) results.
* - Frequency (str, default: 'auto')
- A string indicating the frequency of the data. The common used ones are D, W, M, Y, which repestively represents day-, week-, month- and year- end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, the frequency is attempted to be determined automatically.
- A string indicating the frequency of the data. The common used ones are D, W, M, Y, which respectively represents day-, week-, month- and year- end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, the frequency is attempted to be determined automatically.

Note: If columns other than the ones required as mentioned above are passed while creating the function, they will be treated as exogenous variables if LIBRARY is `neuralforecast`. Otherwise, they would be ignored.

2 changes: 1 addition & 1 deletion docs/source/reference/ai/model-train-sklearn.rst
@@ -23,4 +23,4 @@ To use the `Sklearn framework <https://scikit-learn.org/stable/>`_, we need to i
PREDICT 'rental_price';
In the above query, you are creating a new customized function by training a model from the ``HomeRentals`` table using the ``Sklearn`` framework.
The ``rental_price`` column will be the target column for predication, while the rest columns from the ``SELET`` query are the inputs.
The ``rental_price`` column will be the target column for predication, while the rest columns from the ``SELECT`` query are the inputs.
4 changes: 2 additions & 2 deletions docs/source/reference/evaql/create.rst
@@ -68,7 +68,7 @@ The index can be created on either a column of a table directly or outputs from
* [index_name] is the name the of constructed index.
* [table_name] is the name of the table, on which the index is created.
* [column_name] is the name of one of the column in the table. We currently only support creating index on single column of a table.
* [function_name] is an optional parameter that can be added if the index needs to be construsted on results of a funciton.
* [function_name] is an optional parameter that can be added if the index needs to be constructed on results of a function.

Examples
~~~~~~~~
@@ -104,7 +104,7 @@ CREATE FUNCTION via Type

.. code-block:: sql
CREATE [OR REPALCE] FUNCTION [IF NOT EXISTS] function_name
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] function_name
[ FROM ( select ) ]
TYPE function_type
[ parameter [ ...] ]
2 changes: 1 addition & 1 deletion docs/source/reference/vector_stores/pinecone.rst
@@ -2,7 +2,7 @@ Pinecone
==========

Pinecone is a managed, cloud-native vector database with a simple API and no infrastructure hassles.
The connection to Pincone is based on the `pinecone-client <https://docs.pinecone.io/docs/python-client>`_ library.
The connection to Pinecone is based on the `pinecone-client <https://docs.pinecone.io/docs/python-client>`_ library.

Dependency
----------
2 changes: 1 addition & 1 deletion docs/source/usecases/classification.rst
@@ -70,7 +70,7 @@ We set the training time out to be ``3600`` seconds.

.. note::

The :ref:`ludwig` page lists all the configurable paramters for the model training framework.
The :ref:`ludwig` page lists all the configurable parameters for the model training framework.

This query returns the trained model:

2 changes: 1 addition & 1 deletion docs/source/usecases/forecasting.rst
@@ -84,7 +84,7 @@ This query returns the trained model:
.. note::

The :ref:`forecast` page lists all the configurable paramters for the forecasting model.
The :ref:`forecast` page lists all the configurable parameters for the forecasting model.

In the ``home_sales`` dataset, we have two different types of properties -- houses and units, and price gap between them is large. To get better forecasts,
we specify the ``propertytype`` column as the ``ID`` of the time series data.
2 changes: 1 addition & 1 deletion docs/source/usecases/image-search.rst
@@ -88,7 +88,7 @@ Similar Image Search Powered By Vector Index

EvaQL supports the ``ORDER BY`` and ``LIMIT`` clauses to retrieve the ``top-k`` most similar images for a given image.

EvaDB contains a built-in ``Similarity(x, y)`` function that computets the Euclidean distance between ``x`` and ``y``. We will use this function to compare the feature vector of image being search (i.e., the given image) and the feature vectors of all the images in the dataset that is stored in the vector index.
EvaDB contains a built-in ``Similarity(x, y)`` function that computes the Euclidean distance between ``x`` and ``y``. We will use this function to compare the feature vector of image being search (i.e., the given image) and the feature vectors of all the images in the dataset that is stored in the vector index.

EvaDB's query optimizer automatically picks the correct vector index to accelerate a given EvaQL query. It uses the vector index created in the prior step to accelerate the following image search query:
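
Reviewer note: the paragraph above describes ``Similarity(x, y)`` as a Euclidean distance used with ``ORDER BY`` and ``LIMIT`` to get the ``top-k`` closest images. The sketch below illustrates that idea as a brute-force scan in plain Python. It is not EvaDB's implementation and does not use a vector index; the function names and the sample feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.dist(x, y)


def top_k_similar(
    query: Sequence[float],
    features: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k entries closest to `query`, nearest first.

    Conceptually this mirrors ORDER BY distance LIMIT k, but as a linear scan
    over every stored vector (hypothetical helper, no index involved).
    """
    scored = [(name, euclidean_distance(query, vec)) for name, vec in features.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Made-up 2-D feature vectors, for illustration only.
    dataset = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
    print(top_k_similar([0.0, 0.0], dataset, k=2))
```

A linear scan costs one distance computation per stored image, which is the work a vector index is meant to avoid on large datasets.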

2 changes: 1 addition & 1 deletion evadb/binder/create_index_statement_binder.py
@@ -95,7 +95,7 @@ def bind_create_index(binder: StatementBinder, node: CreateIndexStatement):
len(output.array_dimensions) == 2
), "Index input needs to be 2 dimensional."

# Vector type speciic check.
# Vector type specific check.
if node.vector_store_type == VectorStoreType.FAISS:
assert (
output.array_type == NdArrayType.FLOAT32
2 changes: 1 addition & 1 deletion evadb/executor/create_executor.py
@@ -34,7 +34,7 @@ def __init__(self, db: EvaDBDatabase, node: CreatePlan):
super().__init__(db, node)

def exec(self, *args, **kwargs):
# create a table in the ative database if set
# create a table in the active database if set
is_native_table = self.node.table_info.database_name is not None

check_if_exists = handle_if_not_exists(
8 changes: 4 additions & 4 deletions evadb/executor/create_function_executor.py
@@ -252,7 +252,7 @@ def handle_forecasting_function(self):
frequency = arg_map["frequency"]
if frequency is None:
raise RuntimeError(
f"Can not infer the frequency for {self.node.name}. Please explictly set it."
f"Can not infer the frequency for {self.node.name}. Please explicitly set it."
)

season_dict = { # https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
@@ -393,7 +393,7 @@ def handle_forecasting_function(self):
if int(x.split("horizon")[1].split(".pkl")[0]) >= horizon
]
if len(existing_model_files) == 0:
print("Training, please wait...")
logger.info("Training, please wait...")
if library == "neuralforecast":
model.fit(df=data, val_size=horizon)
else:
@@ -471,9 +471,9 @@ def exec(self, *args, **kwargs):
# We use DropObjectExecutor to avoid bookkeeping the code. The drop function should be moved to catalog.
from evadb.executor.drop_object_executor import DropObjectExecutor

drop_exectuor = DropObjectExecutor(self.db, None)
drop_executor = DropObjectExecutor(self.db, None)
try:
drop_exectuor._handle_drop_function(self.node.name, if_exists=False)
drop_executor._handle_drop_function(self.node.name, if_exists=False)
except RuntimeError:
pass
else:
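
Reviewer note: one hunk in this file replaces ``print("Training, please wait...")`` with ``logger.info(...)``. The sketch below shows the practical difference using only the standard ``logging`` module: the message can be filtered by level or routed to a handler, which ``print`` does not allow. The logger name ``forecast_demo`` and the helper ``train_with_progress`` are hypothetical; EvaDB's actual ``logger`` object is defined elsewhere and is not shown in this diff.

```python
import logging

# Hypothetical logger for illustration; not EvaDB's own logger object.
logger = logging.getLogger("forecast_demo")


def train_with_progress(existing_model_files, fit) -> bool:
    """Call fit() only when no cached model file exists.

    Progress is reported through the logger, so callers decide via the
    logging configuration whether the message is shown and where it goes.
    Returns True if training ran, False if a cached model was found.
    """
    if len(existing_model_files) == 0:
        logger.info("Training, please wait...")
        fit()
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    train_with_progress([], lambda: None)
```

With the level set to ``WARNING`` instead of ``INFO``, the same call trains silently, which is useful when the code runs inside a library or test suite.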
4 changes: 2 additions & 2 deletions evadb/executor/executor_utils.py
@@ -182,14 +182,14 @@ def handle_vector_store_params(
def create_table_catalog_entry_for_native_table(
table_info: TableInfo, column_list: List[ColumnDefinition]
):
column_catalog_entires = xform_column_definitions_to_catalog_entries(column_list)
column_catalog_entries = xform_column_definitions_to_catalog_entries(column_list)

# Assemble table.
table_catalog_entry = TableCatalogEntry(
name=table_info.table_name,
file_url=None,
table_type=TableType.NATIVE_DATA,
columns=column_catalog_entires,
columns=column_catalog_entries,
database_name=table_info.database_name,
)
return table_catalog_entry
2 changes: 1 addition & 1 deletion evadb/executor/set_executor.py
@@ -32,7 +32,7 @@ def exec(self, *args, **kwargs):
https://www.postgresql.org/docs/7.0/sql-set.htm
https://duckdb.org/docs/sql/configuration.html
This design change for configuation manager will be taken care of
This design change for configuration manager will be taken care of
as a separate PR for the issue #1140, where all instances of config use
will be replaced
"""
2 changes: 1 addition & 1 deletion evadb/third_party/databases/mariadb/mariadb_handler.py
@@ -26,7 +26,7 @@ class MariaDbHandler(DBHandler):

"""
Class for implementing the Maria DB handler as a backend store for
EvaDb.
EvaDB.
"""

def __init__(self, name: str, **kwargs):
2 changes: 1 addition & 1 deletion evadb/third_party/databases/types.py
@@ -89,7 +89,7 @@ def get_sqlalchmey_uri(self) -> str:

def is_sqlalchmey_compatible(self) -> bool:
"""
Return whether the data source is sqlaclemy compatible
Return whether the data source is sqlaclchemy compatible
Returns:
A True / False boolean value..
12 changes: 6 additions & 6 deletions script/formatting/formatter.py
@@ -461,22 +461,22 @@ def check_file(file):

# CODESPELL
#LOG.info("Codespell")
subprocess.check_output("codespell 'evadb/*.py'",
subprocess.check_output(""" codespell "evadb/*.py" """,
shell=True,
universal_newlines=True)
subprocess.check_output("codespell 'evadb/*/*.py'",
subprocess.check_output(""" codespell "evadb/*/*.py" """,
shell=True,
universal_newlines=True)
subprocess.check_output("codespell 'docs/source/*/*.rst'",
subprocess.check_output(""" codespell "docs/source/*/*.rst" """,
shell=True,
universal_newlines=True)
subprocess.check_output("codespell 'docs/source/*.rst'",
subprocess.check_output(""" codespell "docs/source/*.rst" """,
shell=True,
universal_newlines=True)
subprocess.check_output("codespell '*.md'",
subprocess.check_output(""" codespell "*.md" """,
shell=True,
universal_newlines=True)
subprocess.check_output("codespell 'evadb/*.md'",
subprocess.check_output(""" codespell "evadb/*.md" """,
shell=True,
universal_newlines=True)
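
Reviewer note: the six ``codespell`` calls above differ only in their glob pattern, and this hunk changes the quoting of each pattern from single to double quotes. The commit does not state the reason for the quoting change. As a sketch of one possible follow-up (not part of this commit), the calls could be generated from a single list of patterns. ``CODESPELL_PATTERNS``, ``build_codespell_command`` and ``run_codespell`` are hypothetical names.

```python
import subprocess
from typing import Sequence

# Glob patterns copied from the six calls in this hunk.
CODESPELL_PATTERNS = (
    "evadb/*.py",
    "evadb/*/*.py",
    "docs/source/*/*.rst",
    "docs/source/*.rst",
    "*.md",
    "evadb/*.md",
)


def build_codespell_command(pattern: str) -> str:
    """Return the shell command for one pattern, double-quoted as in the new lines."""
    return f'codespell "{pattern}"'


def run_codespell(patterns: Sequence[str] = CODESPELL_PATTERNS) -> None:
    """Run codespell once per pattern.

    subprocess.check_output raises CalledProcessError when a command exits
    with a non-zero status, so the first failing pattern stops the loop.
    """
    for pattern in patterns:
        subprocess.check_output(
            build_codespell_command(pattern),
            shell=True,
            universal_newlines=True,
        )
```

Keeping the patterns in one tuple means a new directory needs a single added line rather than another copied call.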
