Revamp getting started. #1016

Merged
merged 9 commits into from
Aug 31, 2023
Changes from 7 commits
8 changes: 8 additions & 0 deletions docs/_toc.yml
Original file line number Diff line number Diff line change
@@ -7,6 +7,8 @@ parts:
sections:
- file: source/overview/getting-started/install-guide
title: Installation Guide
- file: source/overview/getting-started/data-source
title: Integrate Data Source
- file: source/overview/concepts
#- file: source/overview/faq

@@ -50,6 +52,12 @@ parts:
- file: source/reference/evaql/insert
- file: source/reference/evaql/delete
- file: source/reference/evaql/rename
- file: source/reference/evaql/use

- file: source/reference/databases/index
title: Data Sources
sections:
- file: source/reference/databases/postgres

- file: source/reference/udfs/index
title: Models
2 changes: 2 additions & 0 deletions docs/source/dev-guide/contribute.rst
@@ -1,3 +1,5 @@
.. _contributing:

Contributing
----------------

2 changes: 2 additions & 0 deletions docs/source/dev-guide/extend/new-data-source.rst
@@ -1,3 +1,5 @@
.. _add-data-source:

Structured Data Source Integration
==================================
This document details steps involved in adding a new structured data source integration in EvaDB.
59 changes: 59 additions & 0 deletions docs/source/overview/getting-started/data-source.rst
@@ -0,0 +1,59 @@
Integrate Data Source
=====================

EvaDB supports an extensive range of data sources for both structured and unstructured data.

1. Connect to an existing structured data source.

.. code-block:: python

cursor.query("""
CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "evadb"
};""").df()

.. note::

Check :ref:`CREATE DATABASE statement<sql-create-database>` for syntax documentation and :ref:`Data Sources<data-sources>` for all supported data source engines.

The above query connects to an existing Postgres database, which allows us to build AI applications in EvaDB without data migration.
For example, the following query previews the available data using :ref:`SELECT<sql-select>`.

.. code-block:: python

cursor.query("SELECT * FROM postgres_data.food_review;").df()

We can also run native queries in the connected database with the :ref:`USE<sql-use>` statement.

.. code-block:: python

cursor.query("""
USE postgres_data {
INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
};""").df()


2. Load unstructured data. EvaDB supports a wide range of unstructured data types. Below are some examples:

.. code-block:: python

cursor.query(
"LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;"
).df()

The query above loads the local Reddit image dataset into EvaDB.

.. code-block:: python

cursor.query("LOAD VIDEO 's3://bucket/eva_videos/mnist.mp4' INTO MNISTVid;").df()

The query above loads the MNIST video from an S3 bucket into EvaDB.

.. note::

Check :ref:`LOAD statement<sql-load>` for all types of supported unstructured data.
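
The two ``LOAD`` statements above share one shape: a media type, a quoted path, and a target table. A minimal sketch of composing that string in Python follows. The helper name ``build_load_query`` is hypothetical and not part of EvaDB, and the sketch only covers the ``IMAGE`` and ``VIDEO`` forms shown in this guide. The resulting string would then be passed to ``cursor.query(...)``.

```python
def build_load_query(media_type: str, path: str, table: str) -> str:
    """Return a LOAD statement string (hypothetical helper, not an EvaDB API)."""
    media = media_type.upper()
    # Only the two media types shown in this guide are accepted in this sketch.
    if media not in {"IMAGE", "VIDEO"}:
        raise ValueError(f"unsupported media type in this sketch: {media_type}")
    # Single quotes delimit the path in the statement, so reject them in the path.
    if "'" in path:
        raise ValueError("path must not contain a single quote")
    return f"LOAD {media} '{path}' INTO {table};"


image_query = build_load_query("image", "reddit-images/*.jpg", "reddit_dataset")
video_query = build_load_query("video", "s3://bucket/eva_videos/mnist.mp4", "MNISTVid")
print(image_query)
print(video_query)
```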

65 changes: 30 additions & 35 deletions docs/source/overview/getting-started/install-guide.rst
@@ -5,57 +5,52 @@ Installation Guide

EvaDB provides a couple of different installation options to allow easy extension with rich functionality.

Default
Use pip
-------

By Default, EvaDB installs only the minimal requirements.
EvaDB supports Python (versions >= 3.8). We recommend installing with `pip` within an `isolated virtual environment <https://docs.python-guide.org/dev/virtualenvs/>`_.

.. code-block::
.. code-block:: bash

python -m venv evadb-venv
source evadb-venv/bin/activate
pip install --upgrade pip
pip install evadb
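
Before running the commands above, it can help to confirm that the interpreter meets the stated Python >= 3.8 requirement and that a virtual environment is active. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the virtual environment check assumes an environment created with ``venv`` (or a recent ``virtualenv``), where ``sys.prefix`` differs from ``sys.base_prefix``.

```python
import sys

MINIMUM_VERSION = (3, 8)  # taken from the requirement stated in this guide


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is at least 3.8."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


def in_virtual_environment() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("supported Python:", python_is_supported())
print("inside a virtual environment:", in_virtual_environment())
```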

Vision Capability
-----------------
Install additional packages
---------------------------

You can install EvaDB with the vision extension.
With vision extension, you can run queries to do image classification, object detection, and emotion analysis workloads, etc.
* `evadb[vision]` for vision dependencies. With vision dependencies, we can run queries to do image classification, object detection, and emotion analysis workloads, etc.
Member

Would it be better to use ``evadb[vision]``? A single ` (rendered as italic) is hard to see in the doc.

* `evadb[document]` for LLM dependencies. With LLM dependencies, we can leverage the capability of LLM to summarize or do question answering for documents.
* `evadb[qdrant]` for embedding-based similarity search.
* `evadb[ludwig]` for model training and finetuning.
* `evadb[ray]` for distributed execution on ray.
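
Several of the extras listed above can be requested in a single ``pip install`` by separating them with commas inside the brackets, which is standard pip syntax. A minimal sketch of building that requirement string follows. The helper name ``pip_requirement`` is hypothetical, and the set of known extras is copied from the list above rather than read from the package metadata.

```python
KNOWN_EXTRAS = ("vision", "document", "qdrant", "ludwig", "ray")  # from the list above


def pip_requirement(extras) -> str:
    """Return a pip requirement string such as 'evadb[vision,document]'."""
    extras = list(extras)
    unknown = [name for name in extras if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"extras not listed in this guide: {unknown}")
    if not extras:
        return "evadb"
    return f"evadb[{','.join(extras)}]"


requirement = pip_requirement(["vision", "document"])
# Quote the requirement on the command line so the shell does not expand the brackets.
print(f'pip install "{requirement}"')
```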

.. code-block::
Install from source
-------------------

pip install evadb[vision]
.. code-block:: bash

Documents Summarization with LLM
--------------------------------
git clone https://github.com/georgia-tech-db/evadb.git
cd evadb
pip install -e .

You can also use EvaDB to leverage the capability of LLM to summarize or do question answering for documents.
.. note::

.. code-block::
Check :ref:`Contribution Guide<contributing>` for more details.

pip install evadb[document]

Additional Vector Index
-----------------------

EvaDB installs ``faiss`` vector index by default, but users can also install other index library such as ``qdrant`` for similarity search feature.

.. code-block::

pip install evadb[qdrant]

Training or Finetuning Model
----------------------------

Instead of using existing models for only inference, you can also train a customized function inside EvaDB with the ``ludwig`` extension.
Run your first SQL query in EvaDB
Member

Others look good to me. Is showing how to run EvaDB necessary here, since we already have that on the Getting Started page?

Collaborator Author

The motivation is to show users how to run a SQL query in EvaDB, since we do not yet have a command-line client along the lines of the mysql CLI. We probably also want to modify the Getting Started page (example: https://ludwig.ai/latest/getting_started/) after we complete all the sections under it.

----------------------------------

.. code-block::
To run a SQL query in EvaDB, we first need to create a `cursor` object. The following query lists all the built-in user-defined functions.

pip install evadb[ludwig]
.. code-block:: python

Better Performance and Scalability
----------------------------------
import evadb
cursor = evadb.connect().cursor()
print(cursor.query("SHOW UDFS;").df())

EvaDB also allows users to improve the query performance by using ``ray`` to parallelize queries.
.. note::

.. code-block::
Check :ref:`Python APIs<python-api>` for connection and cursor-related documentation.
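
Every example in these docs follows the same ``cursor.query(...).df()`` call chain. A minimal sketch of running several statements through that chain is shown below. To keep it runnable without an EvaDB installation, it uses ``StubCursor`` and ``StubResult``, which are hypothetical stand-ins written for this example and not EvaDB classes. With EvaDB installed, the cursor would instead come from ``evadb.connect().cursor()`` as shown above.

```python
class StubResult:
    """Stand-in for the object returned by cursor.query(); not an EvaDB class."""

    def __init__(self, sql: str):
        self.sql = sql

    def df(self) -> str:
        # The real call returns query results; the stub just echoes the statement.
        return f"result of: {self.sql}"


class StubCursor:
    """Stand-in for an EvaDB cursor that only mimics the query() method."""

    def query(self, sql: str) -> StubResult:
        return StubResult(sql)


def run_all(cursor, statements):
    """Run each statement through cursor.query(...).df() and collect the results."""
    return [cursor.query(sql).df() for sql in statements]


results = run_all(StubCursor(), ["SHOW UDFS;"])
print(results)
```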

pip install evadb[ray]
4 changes: 3 additions & 1 deletion docs/source/reference/api.rst
@@ -1,3 +1,5 @@
.. _python-api:

Basic API
==========

@@ -74,4 +76,4 @@ EvaDBQuery Interface
~evadb.EvaDBQuery.order
~evadb.EvaDBQuery.show
~evadb.EvaDBQuery.sql_query
~evadb.EvaDBQuery.execute
~evadb.EvaDBQuery.execute
9 changes: 9 additions & 0 deletions docs/source/reference/databases/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _data-sources:

Data Sources
=============

Below are all supported data sources for EvaDB. We welcome adding new data source integrations in EvaDB. Check :ref:`add-data-source` for guidance.


.. tableofcontents::
36 changes: 36 additions & 0 deletions docs/source/reference/databases/postgres.rst
@@ -0,0 +1,36 @@
PostgreSQL
==========

The connection to PostgreSQL is based on the `psycopg2 <https://pypi.org/project/psycopg2/>`_ library.

Dependency
----------

* psycopg2


Parameters
----------

Required:

* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make the TCP/IP connection.
* `database` is the database name.
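
Since all five parameters above are required, a connection dictionary can be checked before it is placed into a ``CREATE DATABASE`` statement. The following is a minimal sketch of such a check. The helper ``missing_parameters`` is hypothetical and not part of EvaDB; it only compares keys against the list above and does not validate the values.

```python
REQUIRED_KEYS = ("user", "password", "host", "port", "database")  # from the list above


def missing_parameters(parameters: dict) -> list:
    """Return the required keys that are absent, in the order listed above."""
    return [key for key in REQUIRED_KEYS if key not in parameters]


complete = {
    "user": "eva",
    "password": "password",
    "host": "localhost",
    "port": "5432",
    "database": "evadb",
}
partial = {"user": "eva", "host": "localhost"}

print(missing_parameters(complete))
print(missing_parameters(partial))
```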


Create Connection
-----------------

.. code-block:: text

CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "evadb"
};

31 changes: 31 additions & 0 deletions docs/source/reference/evaql/create.rst
@@ -1,6 +1,37 @@
CREATE
======

.. _sql-create-database:

CREATE DATABASE
---------------

The CREATE DATABASE statement allows us to connect to an external structured data store in EvaDB.

.. code:: text

CREATE DATABASE [database_connection]
WITH ENGINE = [database_engine],
PARAMETERS = [key_value_parameters];

* [database_connection] is the name of the database connection. `[database_connection].[table_name]` will be used as table name to compose SQL queries in EvaDB.
* [database_engine] is the supported database engine. Check :ref:`supported data sources<data-sources>` for all engines and their available configuration parameters.
* [key_value_parameters] is a list of key-value pairs as arguments to establish a connection.
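
The three placeholders above can be filled in programmatically. The sketch below composes the statement from a connection name, an engine name, and a Python dictionary, and also shows the ``[database_connection].[table_name]`` naming. Both helper names are hypothetical and not part of EvaDB. It assumes, based on the example in this section, that the ``PARAMETERS`` block uses JSON-style double-quoted keys and values, so ``json.dumps`` is used to render it.

```python
import json


def build_create_database_query(connection: str, engine: str, parameters: dict) -> str:
    """Compose a CREATE DATABASE statement (hypothetical helper, not an EvaDB API)."""
    # json.dumps renders double-quoted keys and values, one pair per line.
    rendered = json.dumps(parameters, indent=4)
    return (
        f"CREATE DATABASE {connection} "
        f"WITH ENGINE = '{engine}', "
        f"PARAMETERS = {rendered};"
    )


def qualified_table(connection: str, table: str) -> str:
    """Return the [database_connection].[table_name] form used in EvaDB queries."""
    return f"{connection}.{table}"


query = build_create_database_query(
    "postgres_data",
    "postgres",
    {"user": "eva", "password": "password", "host": "localhost",
     "port": "5432", "database": "evadb"},
)
print(query)
print(f"SELECT * FROM {qualified_table('postgres_data', 'food_review')};")
```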


Examples
~~~~~~~~

.. code:: text

CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "evadb"
};

CREATE TABLE
------------

2 changes: 2 additions & 0 deletions docs/source/reference/evaql/load.rst
@@ -1,3 +1,5 @@
.. _sql-load:

LOAD
====

2 changes: 2 additions & 0 deletions docs/source/reference/evaql/select.rst
@@ -1,3 +1,5 @@
.. _sql-select:

SELECT
======

36 changes: 36 additions & 0 deletions docs/source/reference/evaql/use.rst
@@ -0,0 +1,36 @@
.. _sql-use:

USE
===

The USE statement allows us to run arbitrary native queries in the connected database.

.. code:: text

USE [database_connection] { [native_query] };

* [database_connection] is an external database connection created by the `CREATE DATABASE statement`.
* [native_query] is an arbitrary SQL query supported by the [database_connection].

.. warning::

Currently, EvaDB supports only a single query per USE statement. The [native_query] should not end with a semicolon.
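
The two constraints in this warning can be enforced when the statement is built. The following is a minimal sketch with a hypothetical helper, ``build_use_query``, which is not part of EvaDB. It strips a trailing semicolon from the native query and rejects input that still contains one. That second check is deliberately naive: a semicolon inside a string literal would also be rejected.

```python
def build_use_query(connection: str, native_query: str) -> str:
    """Wrap one native query in a USE statement (hypothetical helper)."""
    # Drop surrounding whitespace and any trailing semicolons.
    query = native_query.strip().rstrip(";").strip()
    if not query:
        raise ValueError("native query is empty")
    # A remaining semicolon suggests more than one statement (naive check).
    if ";" in query:
        raise ValueError("only a single native query is supported per USE statement")
    return f"USE {connection} {{ {query} }};"


use_query = build_use_query("postgres_data", "DROP TABLE IF EXISTS food_review;")
print(use_query)
```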

Examples
--------

.. code:: text

USE postgres_data {
DROP TABLE IF EXISTS food_review
};

USE postgres_data {
CREATE TABLE food_review (name VARCHAR(10), review VARCHAR(1000))
};

USE postgres_data {
INSERT INTO food_review (name, review) VALUES ('Customer 1', 'I ordered fried rice but it is too salty.')
};