Revamp getting started. (#1016)
Inspired by https://ludwig.ai/latest/getting_started/.

Topics covered in this PR:

- Installation Guide
- Integrate Data Source
- USE SQL API
- CREATE DATABASE SQL API
xzdandy committed Aug 31, 2023
1 parent 94b3793 commit 3f05ee8
Showing 12 changed files with 220 additions and 36 deletions.
8 changes: 8 additions & 0 deletions docs/_toc.yml
@@ -7,6 +7,8 @@ parts:
sections:
- file: source/overview/getting-started/install-guide
title: Installation Guide
- file: source/overview/getting-started/data-source
title: Integrate Data Source
- file: source/overview/concepts
#- file: source/overview/faq

@@ -50,6 +52,12 @@ parts:
- file: source/reference/evaql/insert
- file: source/reference/evaql/delete
- file: source/reference/evaql/rename
- file: source/reference/evaql/use

- file: source/reference/databases/index
title: Data Sources
sections:
- file: source/reference/databases/postgres

- file: source/reference/udfs/index
title: Models
2 changes: 2 additions & 0 deletions docs/source/dev-guide/contribute.rst
@@ -1,3 +1,5 @@
.. _contributing:

Contributing
----------------

2 changes: 2 additions & 0 deletions docs/source/dev-guide/extend/new-data-source.rst
@@ -1,3 +1,5 @@
.. _add-data-source:

Structured Data Source Integration
==================================
This document details steps involved in adding a new structured data source integration in EvaDB.
59 changes: 59 additions & 0 deletions docs/source/overview/getting-started/data-source.rst
@@ -0,0 +1,59 @@
Integrate Data Source
=====================

EvaDB supports an extensive range of data sources for both structured and unstructured data.

1. Connect to an existing structured data source.

.. code-block:: python

   cursor.query("""
      CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
          "user": "eva",
          "password": "password",
          "host": "localhost",
          "port": "5432",
          "database": "evadb"
      };""").df()

.. note::

   Check :ref:`CREATE DATABASE statement<sql-create-database>` for syntax documentation and :ref:`Data Sources<data-sources>` for all supported data source engines.

The above query connects to an existing Postgres database, which allows us to build AI applications in EvaDB without data migration.
For example, the following query previews the available data using :ref:`SELECT<sql-select>`.

.. code-block:: python

   cursor.query("SELECT * FROM postgres_data.food_review;").df()

We can also run native queries in the connected database with the :ref:`USE<sql-use>` statement.

.. code-block:: python

   cursor.query("""
      USE postgres_data {
          INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
      };""").df()

2. Load unstructured data. EvaDB supports a wide range of unstructured data types. Below are some examples:

.. code-block:: python

   cursor.query(
      "LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;"
   ).df()

We load the local Reddit image dataset into EvaDB.

.. code-block:: python

   cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df()

We load the MNIST video from an S3 bucket into EvaDB.

.. note::

   Check :ref:`LOAD statement<sql-load>` for all types of supported unstructured data.
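
When loading many files, it can help to build the LOAD statement in Python instead of typing the type keyword by hand. The sketch below uses a hypothetical helper, ``build_load_query``, which is not part of EvaDB; it infers ``IMAGE`` or ``VIDEO`` from the file extension. The extension mapping is an assumption for illustration only; the LOAD statement reference lists what EvaDB actually accepts.

.. code-block:: python

   from pathlib import PurePosixPath

   # Hypothetical extension-to-type mapping (an assumption, not EvaDB's own list).
   # Only IMAGE and VIDEO appear in the examples above; other types are omitted.
   LOAD_TYPES = {
       ".jpg": "IMAGE",
       ".jpeg": "IMAGE",
       ".png": "IMAGE",
       ".mp4": "VIDEO",
   }


   def build_load_query(path: str, table: str) -> str:
       """Compose a LOAD statement whose type keyword is inferred from the path."""
       # PurePosixPath only parses the string: globs such as '*.jpg' and
       # URIs such as 's3://...' are never resolved on the local filesystem.
       suffix = PurePosixPath(path).suffix.lower()
       if suffix not in LOAD_TYPES:
           raise ValueError(f"cannot infer LOAD type for {path!r}")
       return f"LOAD {LOAD_TYPES[suffix]} '{path}' INTO {table};"


   print(build_load_query("reddit-images/*.jpg", "reddit_dataset"))
   print(build_load_query("s3://bucket/eva_videos/mnist.mp4", "MNISTVid"))

Each returned string can be passed to ``cursor.query(...).df()`` in the same way as the hand-written queries above.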

65 changes: 30 additions & 35 deletions docs/source/overview/getting-started/install-guide.rst
@@ -5,57 +5,52 @@ Installation Guide

 EvaDB provides a couple of different installation options to allow easy extension to rich functionalities.

-Default
+Use pip
 -------

-By Default, EvaDB installs only the minimal requirements.
 EvaDB supports Python (versions >= 3.8). We recommend installing with ``pip`` within an `isolated virtual environment <https://docs.python-guide.org/dev/virtualenvs/>`_.

-.. code-block::
+.. code-block:: bash

+   python -m venv evadb-venv
+   source evadb-venv/bin/activate
+   pip install --upgrade pip
    pip install evadb

-Vision Capability
------------------
+Install additional packages
+---------------------------

-You can install EvaDB with the vision extension.
-With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc.
+* ``evadb[vision]`` for vision dependencies. With vision dependencies, we can run queries for image classification, object detection, emotion analysis, and similar workloads.
+* ``evadb[document]`` for LLM dependencies. With LLM dependencies, we can use LLMs to summarize documents or answer questions about them.
+* ``evadb[qdrant]`` for embedding-based similarity search.
+* ``evadb[ludwig]`` for model training and fine-tuning.
+* ``evadb[ray]`` for distributed execution on Ray.

-.. code-block::
+Install from source
+-------------------

-   pip install evadb[vision]
+.. code-block:: bash

-Documents Summarization with LLM
---------------------------------
+   git clone https://github.com/georgia-tech-db/evadb.git
+   cd evadb
+   pip install -e .

-You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents.
+.. note::

-.. code-block::
+   Check :ref:`Contribution Guide<contributing>` for more details.

-   pip install evadb[document]
-
-Additional Vector Index
------------------------
-
-EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature.
-
-.. code-block::
-
-   pip install evadb[qdrant]
-
-Training or Finetuning Model
------------------------------
-
-Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension.
+Run your first SQL query in EvaDB
+----------------------------------

-.. code-block::
+To run a SQL query in EvaDB, we first need to create a ``cursor`` object. The following query lists all the built-in user-defined functions.

-   pip install evadb[ludwig]
+.. code-block:: python

-Better Performance and Scalability
------------------------------------
+   import evadb
+   cursor = evadb.connect().cursor()
+   print(cursor.query("SHOW UDFS;").df())

-EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries.
+.. note::

-.. code-block::
+   Check :ref:`Python APIs<python-api>` for connection and cursor-related documentation.

-   pip install evadb[ray]
4 changes: 3 additions & 1 deletion docs/source/reference/api.rst
@@ -1,3 +1,5 @@
.. _python-api:

Basic API
==========

@@ -74,4 +76,4 @@ EvaDBQuery Interface
~evadb.EvaDBQuery.order
~evadb.EvaDBQuery.show
~evadb.EvaDBQuery.sql_query
~evadb.EvaDBQuery.execute
9 changes: 9 additions & 0 deletions docs/source/reference/databases/index.rst
@@ -0,0 +1,9 @@
.. _data-sources:

Data Sources
=============

Below are all supported data sources for EvaDB. We welcome adding new data source integrations in EvaDB. Check :ref:`add-data-source` for guidance.


.. tableofcontents::
36 changes: 36 additions & 0 deletions docs/source/reference/databases/postgres.rst
@@ -0,0 +1,36 @@
PostgreSQL
==========

The connection to PostgreSQL is based on the `psycopg2 <https://pypi.org/project/psycopg2/>`_ library.

Dependency
----------

* psycopg2


Parameters
----------

Required:

* ``user`` is the database user.
* ``password`` is the database password.
* ``host`` is the host name, IP address, or URL.
* ``port`` is the port used to make the TCP/IP connection.
* ``database`` is the database name.
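
Since all five parameters are required, a ``CREATE DATABASE`` call with a missing key is expected to fail. A minimal pre-flight check can catch this earlier. The sketch below uses a hypothetical helper, ``missing_postgres_params``, which is not part of EvaDB; it only inspects the dictionary and never contacts the server.

.. code-block:: python

   # The five required parameters listed above.
   REQUIRED_POSTGRES_PARAMS = ("user", "password", "host", "port", "database")


   def missing_postgres_params(parameters: dict) -> list:
       """Return the required keys that are absent or empty (hypothetical helper)."""
       return [key for key in REQUIRED_POSTGRES_PARAMS if not parameters.get(key)]


   params = {"user": "eva", "password": "password", "host": "localhost", "port": "5432"}
   print(missing_postgres_params(params))  # ['database']

The check treats empty strings as missing, which is stricter than PostgreSQL itself (a server configured for trust authentication, for example, accepts an empty password); relax it if that matters for your setup.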


Create Connection
-----------------

.. code-block:: text

   CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
       "user": "eva",
       "password": "password",
       "host": "localhost",
       "port": "5432",
       "database": "evadb"
   };

31 changes: 31 additions & 0 deletions docs/source/reference/evaql/create.rst
@@ -1,6 +1,37 @@
CREATE
======

.. _sql-create-database:

CREATE DATABASE
---------------

The CREATE DATABASE statement allows us to connect to an external structured data store in EvaDB.

.. code:: text

   CREATE DATABASE [database_connection]
   WITH ENGINE = [database_engine],
   PARAMETERS = [key_value_parameters];

* [database_connection] is the name of the database connection. ``[database_connection].[table_name]`` is used as the table name when composing SQL queries in EvaDB.
* [database_engine] is the supported database engine. Check :ref:`supported data sources<data-sources>` for all engines and their available configuration parameters.
* [key_value_parameters] is a list of key-value pairs used as arguments to establish a connection.
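
When the connection parameters come from configuration rather than being typed by hand, the statement can be assembled programmatically. The helper below, ``build_create_database``, is a hypothetical sketch and not an EvaDB API. It assumes the parameter values are plain strings and that the connection and engine names are simple identifiers, and it uses ``json.dumps`` so that keys and values are double-quoted as in the example that follows.

.. code-block:: python

   import json


   def build_create_database(connection: str, engine: str, parameters: dict) -> str:
       """Compose a CREATE DATABASE statement (hypothetical helper, not EvaDB API)."""
       # Reject names that would break the statement; limiting them to Python
       # identifiers is an assumption made for this sketch.
       for name in (connection, engine):
           if not name.isidentifier():
               raise ValueError(f"unexpected name: {name!r}")
       # json.dumps renders the key-value parameters with double quotes.
       params = json.dumps(parameters, indent=4)
       return (
           f"CREATE DATABASE {connection} WITH ENGINE = '{engine}', "
           f"PARAMETERS = {params};"
       )


   query = build_create_database(
       "postgres_data",
       "postgres",
       {
           "user": "eva",
           "password": "password",
           "host": "localhost",
           "port": "5432",
           "database": "evadb",
       },
   )
   print(query)

The resulting string can then be run with ``cursor.query(query).df()``.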


Examples
~~~~~~~~

.. code:: text

   CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
       "user": "eva",
       "password": "password",
       "host": "localhost",
       "port": "5432",
       "database": "evadb"
   };

CREATE TABLE
------------

2 changes: 2 additions & 0 deletions docs/source/reference/evaql/load.rst
@@ -1,3 +1,5 @@
.. _sql-load:

LOAD
====

2 changes: 2 additions & 0 deletions docs/source/reference/evaql/select.rst
@@ -1,3 +1,5 @@
.. _sql-select:

SELECT
======

36 changes: 36 additions & 0 deletions docs/source/reference/evaql/use.rst
@@ -0,0 +1,36 @@
.. _sql-use:

USE
===

The USE statement allows us to run arbitrary native queries in the connected database.

.. code:: text

   USE [database_connection] { [native_query] };

* [database_connection] is an external database connection created by the :ref:`CREATE DATABASE statement<sql-create-database>`.
* [native_query] is an arbitrary SQL query supported by the [database_connection].

.. warning::

   Currently, EvaDB supports only a single query per USE statement. The [native_query] should not end with a semicolon.
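
Because of the single-query restriction above, it may help to normalize a native query before wrapping it. The sketch below uses a hypothetical helper, ``build_use_query``, which is not part of EvaDB. It strips trailing semicolons and rejects any remaining semicolon as a likely second statement; this naive check also rejects semicolons inside string literals, so it errs on the side of caution.

.. code-block:: python

   def build_use_query(connection: str, native_query: str) -> str:
       """Wrap a single native query in a USE statement (hypothetical helper)."""
       # The native query must not end with a semicolon, so drop trailing ones.
       body = native_query.strip().rstrip(";").rstrip()
       # Naive check for a second statement: any semicolon left in the body.
       # This also rejects semicolons inside string literals (conservative).
       if not body or ";" in body:
           raise ValueError("USE expects exactly one native query")
       return f"USE {connection} {{\n    {body}\n}};"


   print(build_use_query("postgres_data", "DROP TABLE IF EXISTS food_review;"))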

Examples
--------

.. code:: text

   USE postgres_data {
       DROP TABLE IF EXISTS food_review
   };

   USE postgres_data {
       CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000))
   };

   USE postgres_data {
       INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
   };
