Merge branch 'staging' into csv-column-names

georgia-tech-db · Oct 7, 2023 · 8fe769d
2 parents f2f6805 + f0116f1
commit 8fe769d

Showing 23 changed files with 246 additions and 38 deletions.
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -22,7 +22,7 @@ parts:
     chapters:
       - file: source/usecases/forecasting.rst
         title: Forecasting
-      - file: source/usecases/prediction.rst
+      - file: source/usecases/classification.rst
         title: Classification
       - file: source/usecases/sentiment-analysis.rst
         title: Sentiment Analysis

diff --git a/docs/source/benchmarks/text_summarization.rst b/docs/source/benchmarks/text_summarization.rst
@@ -88,7 +88,7 @@ Setup SQLite Database
 Install MindsDB
 ~~~~~~~~~~~~~~~
 
-Follow the `MindsDB nstallation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.
+Follow the `MindsDB installation guide <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install it via ``pip``.
 
 .. note::
 

diff --git a/docs/source/dev-guide/release/release-steps.rst b/docs/source/dev-guide/release/release-steps.rst
@@ -25,7 +25,7 @@ Simply point ``master`` head to the latest commit of ``staging``.
 Setup Credentials
 ~~~~~~~~~~~~~~~~~~
 
-Please check :ref:`setup_pypi_account` about how to setup PyPi account.
+Please check :ref:`setup_pypi_account` about how to setup PyPI account.
 
 Setup Github token. You can obtain a personal token from Github.
 

diff --git a/docs/source/overview/model-inference.rst b/docs/source/overview/model-inference.rst
@@ -12,11 +12,11 @@ In EvaDB, every model is a function. We can compose SQL queries using functions
 1. Projection
 -------------
 
-The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MINST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video. 
+The most common usecases are model inference in projections. For example, we can use the `MnistImageClassifier <https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/mnist_image_classifier.py>`_ to identify numbers from the `MNIST <https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4>`_ video. 
 
 .. code-block:: sql
 
-   SELECT MnistImageClassifier(data).label FROM minst_vid;
+   SELECT MnistImageClassifier(data).label FROM mnist_vid;
 
 2. Selection
 ------------
@@ -96,4 +96,4 @@ We can also use the `SiftFeatureExtractor <https://github.com/georgia-tech-db/ev
 
 .. note::
 
-   Go over our :ref:`Usecases<sentiment-analysis>` to check more ways of utlizing models in real-world use cases.
+   Go over our :ref:`Usecases<sentiment-analysis>` to check more ways of utilizing models in real-world use cases.
diff --git a/docs/source/reference/ai/model-forecasting.rst b/docs/source/reference/ai/model-forecasting.rst
@@ -58,11 +58,11 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
    * - LIBRARY (str, default: 'statsforecast')
      - We can select one of `statsforecast` (default) or `neuralforecast`. `statsforecast` provides access to statistical forecasting methods, while `neuralforecast` gives access to deep-learning based forecasting methods.
    * - MODEL (str, default: 'ARIMA')
-     - If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
+     - If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
    * - AUTO (str, default: 'T')
      - If set to 'T', it enables automatic hyperparameter optimization. Must be set to 'T' for `statsforecast` library. One may set this parameter to `false` if LIBRARY is `neuralforecast` for faster (but less reliable) results.
    * - Frequency (str, default: 'auto')
-     - A string indicating the frequency of the data. The common used ones are D, W, M, Y, which repestively represents day-, week-, month- and year- end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, the frequency is attempted to be determined automatically.
+     - A string indicating the frequency of the data. The commonly used ones are D, W, M, Y, which respectively represent day-, week-, month- and year-end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, the frequency is attempted to be determined automatically.
 
 Note: If columns other than the ones required as mentioned above are passed while creating the function, they will be treated as exogenous variables if LIBRARY is `neuralforecast`. Otherwise, they would be ignored.
 

diff --git a/docs/source/reference/ai/model-train-sklearn.rst b/docs/source/reference/ai/model-train-sklearn.rst
@@ -23,4 +23,4 @@ To use the `Sklearn framework <https://scikit-learn.org/stable/>`_, we need to i
    PREDICT 'rental_price';
 
 In the above query, you are creating a new customized function by training a model from the ``HomeRentals`` table using the ``Sklearn`` framework.
-The ``rental_price`` column will be the target column for predication, while the rest columns from the ``SELET`` query are the inputs. 
+The ``rental_price`` column will be the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs.
diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst
@@ -68,7 +68,7 @@ The index can be created on either a column of a table directly or outputs from
 * [index_name] is the name the of constructed index.
 * [table_name] is the name of the table, on which the index is created.
 * [column_name] is the name of one of the column in the table. We currently only support creating index on single column of a table.
-* [function_name] is an optional parameter that can be added if the index needs to be construsted on results of a funciton.
+* [function_name] is an optional parameter that can be added if the index needs to be constructed on results of a function.
 
 Examples
 ~~~~~~~~
@@ -104,7 +104,7 @@ CREATE FUNCTION via Type
 
 .. code-block:: sql
 
-   CREATE [OR REPALCE] FUNCTION [IF NOT EXISTS] function_name
+   CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] function_name
    [ FROM ( select ) ]
    TYPE function_type
    [ parameter [ ...] ]

diff --git a/docs/source/reference/vector_stores/pinecone.rst b/docs/source/reference/vector_stores/pinecone.rst
@@ -2,7 +2,7 @@ Pinecone
 ==========
 
 Pinecone is a managed, cloud-native vector database with a simple API and no infrastructure hassles.
-The connection to Pincone is based on the `pinecone-client <https://docs.pinecone.io/docs/python-client>`_ library.
+The connection to Pinecone is based on the `pinecone-client <https://docs.pinecone.io/docs/python-client>`_ library.
 
 Dependency
 ----------

diff --git a/docs/source/usecases/classification.rst b/docs/source/usecases/classification.rst
@@ -70,7 +70,7 @@ We set the training time out to be ``3600`` seconds.
 
 .. note::
 
-   The :ref:`ludwig` page lists all the configurable paramters for the model training framework.
+   The :ref:`ludwig` page lists all the configurable parameters for the model training framework.
 
 This query returns the trained model:
 

diff --git a/docs/source/usecases/forecasting.rst b/docs/source/usecases/forecasting.rst
@@ -84,7 +84,7 @@ This query returns the trained model:
 
 .. note::
 
-   The :ref:`forecast` page lists all the configurable paramters for the forecasting model.
+   The :ref:`forecast` page lists all the configurable parameters for the forecasting model.
 
 In the ``home_sales`` dataset, we have two different types of properties -- houses and units, and price gap between them is large. To get better forecasts,
 we specify the ``propertytype`` column as the ``ID`` of the time series data.

diff --git a/docs/source/usecases/image-search.rst b/docs/source/usecases/image-search.rst
@@ -88,7 +88,7 @@ Similar Image Search Powered By Vector Index
 
 EvaQL supports the ``ORDER BY`` and ``LIMIT`` clauses to retrieve the ``top-k`` most similar images for a given image. 
 
-EvaDB contains a built-in ``Similarity(x, y)`` function that computets the Euclidean distance between ``x`` and ``y``. We will use this function to compare the feature vector of image being search (i.e., the given image) and the feature vectors of all the images in the dataset that is stored in the vector index.
+EvaDB contains a built-in ``Similarity(x, y)`` function that computes the Euclidean distance between ``x`` and ``y``. We will use this function to compare the feature vector of image being search (i.e., the given image) and the feature vectors of all the images in the dataset that is stored in the vector index.
 
 EvaDB's query optimizer automatically picks the correct vector index to accelerate a given EvaQL query. It uses the vector index created in the prior step to accelerate the following image search query:
 

diff --git a/evadb/binder/create_index_statement_binder.py b/evadb/binder/create_index_statement_binder.py
@@ -95,7 +95,7 @@ def bind_create_index(binder: StatementBinder, node: CreateIndexStatement):
                 len(output.array_dimensions) == 2
             ), "Index input needs to be 2 dimensional."
 
-            # Vector type speciic check.
+            # Vector type specific check.
             if node.vector_store_type == VectorStoreType.FAISS:
                 assert (
                     output.array_type == NdArrayType.FLOAT32

diff --git a/evadb/executor/create_executor.py b/evadb/executor/create_executor.py
@@ -34,7 +34,7 @@ def __init__(self, db: EvaDBDatabase, node: CreatePlan):
         super().__init__(db, node)
 
     def exec(self, *args, **kwargs):
-        # create a table in the ative database if set
+        # create a table in the active database if set
         is_native_table = self.node.table_info.database_name is not None
 
         check_if_exists = handle_if_not_exists(

diff --git a/evadb/executor/create_function_executor.py b/evadb/executor/create_function_executor.py
@@ -252,7 +252,7 @@ def handle_forecasting_function(self):
         frequency = arg_map["frequency"]
         if frequency is None:
             raise RuntimeError(
-                f"Can not infer the frequency for {self.node.name}. Please explictly set it."
+                f"Can not infer the frequency for {self.node.name}. Please explicitly set it."
             )
 
         season_dict = {  # https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
@@ -393,7 +393,7 @@ def handle_forecasting_function(self):
             if int(x.split("horizon")[1].split(".pkl")[0]) >= horizon
         ]
         if len(existing_model_files) == 0:
-            print("Training, please wait...")
+            logger.info("Training, please wait...")
             if library == "neuralforecast":
                 model.fit(df=data, val_size=horizon)
             else:
@@ -471,9 +471,9 @@ def exec(self, *args, **kwargs):
                 # We use DropObjectExecutor to avoid bookkeeping the code. The drop function should be moved to catalog.
                 from evadb.executor.drop_object_executor import DropObjectExecutor
 
-                drop_exectuor = DropObjectExecutor(self.db, None)
+                drop_executor = DropObjectExecutor(self.db, None)
                 try:
-                    drop_exectuor._handle_drop_function(self.node.name, if_exists=False)
+                    drop_executor._handle_drop_function(self.node.name, if_exists=False)
                 except RuntimeError:
                     pass
                 else:

diff --git a/evadb/executor/executor_utils.py b/evadb/executor/executor_utils.py
@@ -182,14 +182,14 @@ def handle_vector_store_params(
 def create_table_catalog_entry_for_native_table(
     table_info: TableInfo, column_list: List[ColumnDefinition]
 ):
-    column_catalog_entires = xform_column_definitions_to_catalog_entries(column_list)
+    column_catalog_entries = xform_column_definitions_to_catalog_entries(column_list)
 
     # Assemble table.
     table_catalog_entry = TableCatalogEntry(
         name=table_info.table_name,
         file_url=None,
         table_type=TableType.NATIVE_DATA,
-        columns=column_catalog_entires,
+        columns=column_catalog_entries,
         database_name=table_info.database_name,
     )
     return table_catalog_entry
diff --git a/evadb/executor/set_executor.py b/evadb/executor/set_executor.py
@@ -32,7 +32,7 @@ def exec(self, *args, **kwargs):
         https://www.postgresql.org/docs/7.0/sql-set.htm
         https://duckdb.org/docs/sql/configuration.html
 
-        This design change for configuation manager will be taken care of
+        This design change for configuration manager will be taken care of
         as a separate PR for the issue #1140, where all instances of config use
         will be replaced
         """

diff --git a/evadb/third_party/databases/mariadb/mariadb_handler.py b/evadb/third_party/databases/mariadb/mariadb_handler.py
@@ -26,7 +26,7 @@ class MariaDbHandler(DBHandler):
 
     """
     Class for implementing the Maria DB handler as a backend store for
-    EvaDb.
+    EvaDB.
     """
 
     def __init__(self, name: str, **kwargs):

diff --git a/evadb/third_party/databases/types.py b/evadb/third_party/databases/types.py
@@ -89,7 +89,7 @@ def get_sqlalchmey_uri(self) -> str:
 
     def is_sqlalchmey_compatible(self) -> bool:
         """
-        Return  whether the data source is sqlaclemy compatible
+        Return whether the data source is sqlalchemy compatible
 
         Returns:
             A True / False boolean value..

diff --git a/script/formatting/formatter.py b/script/formatting/formatter.py
@@ -461,22 +461,22 @@ def check_file(file):
 
         # CODESPELL
         #LOG.info("Codespell")
-        subprocess.check_output("codespell 'evadb/*.py'", 
+        subprocess.check_output(""" codespell "evadb/*.py" """, 
                 shell=True, 
                 universal_newlines=True)
-        subprocess.check_output("codespell 'evadb/*/*.py'", 
+        subprocess.check_output(""" codespell "evadb/*/*.py" """, 
                 shell=True, 
                 universal_newlines=True)
-        subprocess.check_output("codespell 'docs/source/*/*.rst'", 
+        subprocess.check_output(""" codespell "docs/source/*/*.rst" """, 
                 shell=True, 
                 universal_newlines=True)
-        subprocess.check_output("codespell 'docs/source/*.rst'", 
+        subprocess.check_output(""" codespell "docs/source/*.rst" """, 
                 shell=True, 
                 universal_newlines=True)
-        subprocess.check_output("codespell '*.md'", 
+        subprocess.check_output(""" codespell "*.md" """, 
                 shell=True, 
                 universal_newlines=True)
-        subprocess.check_output("codespell 'evadb/*.md'", 
+        subprocess.check_output(""" codespell "evadb/*.md" """, 
                 shell=True, 
                 universal_newlines=True)
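
Review note: the six near-identical `codespell` calls in `check_file` could be driven from one list of patterns. The sketch below is only an illustration, not the project's code: `CODESPELL_PATTERNS`, `build_codespell_command`, and `run_codespell` are hypothetical names, and it assumes it is acceptable to expand the globs in Python and pass the resulting file list to `codespell` without a shell, which differs from the quoted-pattern approach in the diff.

```python
import glob
import subprocess
from typing import Callable, List, Sequence

# Patterns copied from the six calls in check_file() above.
CODESPELL_PATTERNS = [
    "evadb/*.py",
    "evadb/*/*.py",
    "docs/source/*/*.rst",
    "docs/source/*.rst",
    "*.md",
    "evadb/*.md",
]


def build_codespell_command(
    pattern: str, expand: Callable[[str], List[str]] = glob.glob
) -> List[str]:
    """Return the argv for one codespell run, or [] if nothing matches."""
    files = sorted(expand(pattern))
    return ["codespell", *files] if files else []


def run_codespell(
    patterns: Sequence[str] = CODESPELL_PATTERNS,
    runner: Callable = subprocess.check_output,
) -> None:
    """Run codespell once per pattern, skipping patterns with no matches."""
    for pattern in patterns:
        command = build_codespell_command(pattern)
        if command:
            # Passing a list avoids shell quoting differences across platforms.
            runner(command, universal_newlines=True)
```

Like the original calls, `subprocess.check_output` raises `CalledProcessError` when `codespell` exits with a non-zero status, so a failing run would still abort the check.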