georgia-tech-db · xzdandy · Aug 31, 2023
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -7,6 +7,8 @@ parts:
         sections:
           - file: source/overview/getting-started/install-guide
             title: Installation Guide
+          - file: source/overview/getting-started/data-source
+            title: Integrate Data Source
       - file: source/overview/concepts
         #- file: source/overview/faq
 
@@ -45,6 +47,12 @@ parts:
           - file: source/reference/evaql/insert
           - file: source/reference/evaql/delete
           - file: source/reference/evaql/rename
+          - file: source/reference/evaql/use
+
+      - file: source/reference/databases/index
+        title: Data Sources
+        sections: 
+          - file: source/reference/databases/postgres 
 
       - file: source/reference/udfs/index
         title: Models

diff --git a/docs/source/dev-guide/contribute.rst b/docs/source/dev-guide/contribute.rst
@@ -1,3 +1,5 @@
+.. _contributing:
+
 Contributing
 ----------------
 

diff --git a/docs/source/dev-guide/extend/new-data-source.rst b/docs/source/dev-guide/extend/new-data-source.rst
@@ -1,3 +1,5 @@
+.. _add-data-source:
+
 Structured Data Source Integration
 ==================================
 This document details steps involved in adding a new structured data source integration in EvaDB.

diff --git a/docs/source/overview/getting-started/data-source.rst b/docs/source/overview/getting-started/data-source.rst
@@ -0,0 +1,59 @@
+Integrate Data Source
+=====================
+
+EvaDB supports a wide range of data sources for both structured and unstructured data.
+
+1. Connect to an existing structured data source.
+
+.. code-block:: python
+
+   cursor.query("""
+       CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
+           "user": "eva",
+           "password": "password",
+           "host": "localhost",
+           "port": "5432",
+           "database": "evadb"
+       };""").df()
+
+.. note::
+
+   Check the :ref:`CREATE DATABASE statement<sql-create-database>` for syntax documentation and :ref:`Data Sources<data-sources>` for all supported data source engines.
+
+The above query connects to an existing Postgres database, which allows us to build AI applications in EvaDB without data migration.
+For example, the following query previews the available data using :ref:`SELECT<sql-select>`.
+
+.. code-block:: python
+
+   cursor.query("SELECT * FROM postgres_data.food_review;").df()
+
+We can also run native queries in the connected database with the :ref:`USE<sql-use>` statement.
+
+.. code-block:: python
+
+   cursor.query("""
+        USE postgres_data {
+                INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
+        };""").df()
+
+
+2. Load unstructured data. EvaDB supports a wide range of unstructured data types. Below are some examples:
+
+.. code-block:: python
+
+   cursor.query(
+       "LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;"
+   ).df()
+
+The query above loads the local Reddit image dataset into EvaDB.
+
+.. code-block:: python
+
+   cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df()
+
+The query above loads the MNIST video from an S3 bucket into EvaDB.
+
+.. note::
+
+   Check :ref:`LOAD statement<sql-load>` for all types of supported unstructured data.
+
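As a reviewer-side illustration of the connection step documented above, the statement can also be composed from a Python dictionary. This is only a sketch: ``build_create_database_query`` is a hypothetical helper, not part of the EvaDB API, and it assumes the ``PARAMETERS`` block accepts JSON-style key-value pairs, as the double-quoted example in this file suggests.

```python
import json


def build_create_database_query(name: str, engine: str, params: dict) -> str:
    """Compose a CREATE DATABASE statement (hypothetical helper, not an EvaDB API)."""
    if not name.isidentifier():
        raise ValueError(f"invalid connection name: {name!r}")
    # Assumption: PARAMETERS accepts JSON-style key-value pairs, matching
    # the double-quoted form used in the documentation example.
    rendered = json.dumps(params, indent=4)
    return f"CREATE DATABASE {name} WITH ENGINE = '{engine}', PARAMETERS = {rendered};"


query = build_create_database_query(
    "postgres_data",
    "postgres",
    {
        "user": "eva",
        "password": "password",
        "host": "localhost",
        "port": "5432",
        "database": "evadb",
    },
)
print(query)
```

The resulting string could then be passed to ``cursor.query(query).df()`` in the same way as the literal query in the documentation.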
diff --git a/docs/source/overview/getting-started/install-guide.rst b/docs/source/overview/getting-started/install-guide.rst
@@ -5,57 +5,52 @@ Installation Guide
 
 EvaDB provides couple different installation options to allow easy extension to rich functionalities. 
 
-Default
+Use pip
 -------
 
-By Default, EvaDB installs only the minimal requirements.
+EvaDB supports Python (versions >= 3.8). We recommend installing with ``pip`` within an `isolated virtual environment <https://docs.python-guide.org/dev/virtualenvs/>`_.
 
-.. code-block::
+.. code-block:: bash
 
+    python -m venv evadb-venv
+    source evadb-venv/bin/activate
+    pip install --upgrade pip
     pip install evadb
 
-Vision Capability
------------------
+Install additional packages
+---------------------------
 
-You can install EvaDB with the vision extension. 
-With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc.
+* ``evadb[vision]`` for vision dependencies, which let us run image classification, object detection, and emotion analysis queries.
+* ``evadb[document]`` for LLM dependencies, which let us use an LLM to summarize documents or answer questions about them.
+* ``evadb[qdrant]`` for embedding-based similarity search.
+* ``evadb[ludwig]`` for model training and finetuning.
+* ``evadb[ray]`` for distributed execution on Ray.
 
-.. code-block::
+Install from source
+-------------------
 
-    pip install evadb[vision]
+.. code-block:: bash
 
-Documents Summarization with LLM
---------------------------------
+   git clone https://github.com/georgia-tech-db/evadb.git
+   cd evadb
+   pip install -e .
 
-You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents.
+.. note::
 
-.. code-block::
+   Check :ref:`Contribution Guide<contributing>` for more details.
 
-    pip install evadb[document]
-
-Additional Vector Index
------------------------
-
-EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature.
-
-.. code-block::
-
-    pip install evadb[qdrant]
-
-Training or Finetuning Model
-----------------------------
-
-Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension.
+Run your first SQL query in EvaDB
+----------------------------------
 
-.. code-block::
+To run a SQL query in EvaDB, we first need to create a ``cursor`` object. The following query lists all the built-in user-defined functions.
 
-    pip install evadb[ludwig]
+.. code-block:: python
 
-Better Performance and Scalability
-----------------------------------
+   import evadb
+   cursor = evadb.connect().cursor()
+   print(cursor.query("SHOW UDFS;").df())
 
-EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries.
+.. note::
 
-.. code-block::
+   Check :ref:`Python APIs<python-api>` for connection and cursor-related documentation.
 
-    pip install evadb[ray]
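The install guide above states that EvaDB supports Python 3.8 and later. A minimal sketch of checking the interpreter before running ``pip install`` could look as follows; ``meets_minimum_python`` is a hypothetical helper written for this illustration and is not shipped with EvaDB.

```python
import sys

# Minimum version taken from the install guide (Python >= 3.8).
MINIMUM_PYTHON = (3, 8)


def meets_minimum_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum_python():
    print("Python version is supported.")
else:
    print("Python 3.8 or newer is required.")
```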
diff --git a/docs/source/reference/api.rst b/docs/source/reference/api.rst
@@ -1,3 +1,5 @@
+.. _python-api:
+
 Basic API
 ==========
 
@@ -74,4 +76,4 @@ EvaDBQuery Interface
     ~evadb.EvaDBQuery.order
     ~evadb.EvaDBQuery.show
     ~evadb.EvaDBQuery.sql_query
-    ~evadb.EvaDBQuery.execute
+    ~evadb.EvaDBQuery.execute
diff --git a/docs/source/reference/databases/index.rst b/docs/source/reference/databases/index.rst
@@ -0,0 +1,9 @@
+.. _data-sources:
+
+Data Sources
+=============
+
+Below are all the data sources that EvaDB supports. We welcome new data source integrations; check :ref:`add-data-source` for guidance.
+
+
+.. tableofcontents::
diff --git a/docs/source/reference/databases/postgres.rst b/docs/source/reference/databases/postgres.rst
@@ -0,0 +1,36 @@
+PostgreSQL
+==========
+
+The connection to PostgreSQL is based on the `psycopg2 <https://pypi.org/project/psycopg2/>`_ library.
+
+Dependency
+----------
+
+* psycopg2
+
+
+Parameters
+----------
+
+Required:
+
+* ``user`` is the database user.
+* ``password`` is the database password.
+* ``host`` is the host name, IP address, or URL.
+* ``port`` is the port used to make the TCP/IP connection.
+* ``database`` is the database name.
+
+
+Create Connection
+-----------------
+
+.. code-block:: text
+
+   CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
+        "user": "eva", 
+        "password": "password",
+        "host": "localhost",
+        "port": "5432", 
+        "database": "evadb"
+   };
+
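Since all five PostgreSQL parameters listed above are required, it may help to check a parameter dictionary before composing the ``CREATE DATABASE`` statement. The sketch below is an assumption-laden illustration: ``missing_postgres_params`` is a hypothetical helper, and it only checks that each key is present, not that its value is valid.

```python
# Required keys, as listed in the Parameters section of postgres.rst.
REQUIRED_POSTGRES_PARAMS = ("user", "password", "host", "port", "database")


def missing_postgres_params(params: dict) -> list:
    """Return the required keys that are absent from `params` (hypothetical helper)."""
    return [key for key in REQUIRED_POSTGRES_PARAMS if key not in params]


params = {"user": "eva", "password": "password", "host": "localhost"}
missing = missing_postgres_params(params)
print(missing)
```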
diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst
@@ -1,6 +1,37 @@
 CREATE 
 ======
 
+.. _sql-create-database:
+
+CREATE DATABASE
+---------------
+
+The CREATE DATABASE statement allows us to connect to an external structured data store in EvaDB.
+
+.. code:: text
+
+   CREATE DATABASE [database_connection]
+        WITH ENGINE = [database_engine],
+        PARAMETERS = [key_value_parameters];
+
+* [database_connection] is the name of the database connection. ``[database_connection].[table_name]`` is used as the table name when composing SQL queries in EvaDB.
+* [database_engine] is the supported database engine. Check :ref:`supported data sources<data-sources>` for all engines and their available configuration parameters.
+* [key_value_parameters] is a list of key-value pairs passed as arguments to establish a connection.
+
+
+Examples
+~~~~~~~~
+
+.. code:: text
+
+   CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
+        "user": "eva", 
+        "password": "password",
+        "host": "localhost",
+        "port": "5432", 
+        "database": "evadb"
+   };
+
 CREATE TABLE
 ------------
 

diff --git a/docs/source/reference/evaql/load.rst b/docs/source/reference/evaql/load.rst
@@ -1,3 +1,5 @@
+.. _sql-load:
+
 LOAD
 ====
 

diff --git a/docs/source/reference/evaql/select.rst b/docs/source/reference/evaql/select.rst
@@ -1,3 +1,5 @@
+.. _sql-select:
+
 SELECT
 ======
 

diff --git a/docs/source/reference/evaql/use.rst b/docs/source/reference/evaql/use.rst
@@ -0,0 +1,36 @@
+.. _sql-use:
+
+USE
+===
+
+The USE statement allows us to run arbitrary native queries in the connected database.
+
+.. code:: text
+
+   USE [database_connection] { [native_query] };
+
+* [database_connection] is an external database connection created by the :ref:`CREATE DATABASE statement<sql-create-database>`.
+* [native_query] is an arbitrary SQL query supported by the [database_connection].
+
+.. warning::
+
+   Currently, EvaDB supports only a single query per USE statement. The [native_query] should not end with a semicolon.
+
+Examples
+--------
+
+.. code:: text
+
+   USE postgres_data {
+     DROP TABLE IF EXISTS food_review
+   };
+
+   USE postgres_data {
+     CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000))
+   };
+
+   USE postgres_data {
+     INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
+   };
+
+
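The warning in use.rst (a single query per statement, no trailing semicolon) can be enforced on the client side before the statement reaches EvaDB. The following is a rough sketch under stated assumptions: ``build_use_query`` is a hypothetical helper, not an EvaDB function, and its multi-statement check is deliberately naive, so it also rejects semicolons that appear inside string literals.

```python
def build_use_query(connection: str, native_query: str) -> str:
    """Wrap one native query in a USE statement (hypothetical helper)."""
    # Drop surrounding whitespace and a trailing semicolon, since the
    # native query should not end with one.
    cleaned = native_query.strip().rstrip(";").rstrip()
    if not cleaned:
        raise ValueError("native query is empty")
    # Naive check: any remaining semicolon is treated as a second statement.
    # This also rejects semicolons inside string literals.
    if ";" in cleaned:
        raise ValueError("only a single native query is supported per USE statement")
    return f"USE {connection} {{ {cleaned} }};"


statement = build_use_query("postgres_data", "DROP TABLE IF EXISTS food_review;")
print(statement)
```

The returned string could be passed to ``cursor.query(statement).df()``, mirroring the USE example in the getting-started page.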