Skip to content

Commit

Permalink
Update to executor documentation. (#363)
Browse files Browse the repository at this point in the history
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: NiStannum <52202164+NiStannum@users.noreply.github.com>
  • Loading branch information
4 people committed Apr 17, 2021
1 parent eab1982 commit 084cf77
Show file tree
Hide file tree
Showing 6 changed files with 57 additions and 32 deletions.
8 changes: 5 additions & 3 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,14 +39,16 @@ This website contains pages that overview of the basic and advanced functionalit
:maxdepth: 1
:caption: Advanced Topics

source/advanced/map
source/advanced/datatype
source/advanced/date
source/advanced/map
source/advanced/indexgroup
source/advanced/custom
source/advanced/architecture
source/advanced/interestingness
source/advanced/executor
source/advanced/datatype
source/advanced/interestingness



.. toctree::
:maxdepth: 1
Expand Down
6 changes: 3 additions & 3 deletions doc/source/advanced/date.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
********************************
Working with Dates
********************************
***************************************
Working with Temporal Datetime Columns
***************************************

.. note:: You can follow along this tutorial in a Jupyter notebook. [`Github <https://github.com/lux-org/lux-binder/blob/master/tutorial/tutorial/5-datetime.ipynb>`_] [`Binder <https://mybinder.org/v2/gh/lux-org/lux-binder/master?urlpath=tree/tutorial/5-datetime.ipynb>`_]

Expand Down
61 changes: 42 additions & 19 deletions doc/source/advanced/executor.rst
Original file line number Diff line number Diff line change
@@ -1,47 +1,66 @@
****************
Execution Engine
****************

Fetching the data required for generating visualizations can be computationally expensive, especially on large datasets. Lux provides a extensible framework for users to pick their own execution backend for data processing. We currently support Pandas (default, :mod:`lux.executor.PandasExecutor`) and SQL (:mod:`lux.executor.SQLExecutor`). In this tutorial, we explain how to use switch to SQL as an execution backend, as an example of how you can use a different data processing mechanism in Lux.

Please refer to :mod:`lux.executor.Executor`, if you are interested in extending Lux for your own execution backend.
**************************
Working with SQL Databases
**************************

Lux provides an extensible framework for users to pick their own execution backend for data processing. We currently support Pandas (:mod:`lux.executor.PandasExecutor`) and SQL (:mod:`lux.executor.SQLExecutor`) as the execution engine. By default, Lux leverages Pandas as its execution backend; in other words, the data processing code is performed as a set of Pandas operations on top of dataframe. In this tutorial, we further explain how Lux can be used with SQL with tables inside a Postgres database.

.. note:: You can follow a tutorial describing how Lux can be used with data inside a Postgres database in a Jupyter notebook. [`Github <https://github.com/lux-org/lux-binder-sql/blob/master/notebooks/Using%20Lux%20with%20SQL%20Databases.ipynb>`_] [`Binder <https://mybinder.org/v2/gh/lux-org/lux-binder-sql/HEAD>`_]

SQL Executor
=============

Lux extends its visualization exploration operations to data within SQL databases. By using the SQL Executor, users can specify a SQL database to connect a LuxSQLTable for generating all the visualizations recommended in Lux.
Lux extends its visualization capabilities to SQL within Postgres databases. By using the SQLExecutor, users can create a :code:`LuxSQLTable` that connects to a Postgres database. When the :code:`LuxSQLTable` object is printed out, Lux displays a subset of the data and recommends a default set of visualizations to display.

.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/SQLexecutor1.gif?raw=true
:width: 900
:align: center


What is the SQL Executor?
==========================

It is common for data to be stored within a relational database, such as Postgres.
The execution engine in Lux processes the data in order to generate the data required for the visualization. By default, Lux uses Pandas as its execution engine.
However, fetching the data required for generating visualizations can be computationally expensive. Database users may not be able to pull in the entire dataset, either due to a lack of permissions or due to the data being too large to work with on a local machine. Thus, in order to leverage Lux's capabilities, you can use the :code:`LuxSQLTable` to work with data stored inside a Postgres database. A :code:`LuxSQLTable` represents a SQL table with the Postgres database. The :code:`LuxSQLTable` contains a skeleton of the dataframe schema and does not store the entire data in the database. (Underneath the hoods, :code:`LuxSQLTable` is a database that serve as the LuxDataFrame for a table. However, note that since :code:`LuxSQLTable` is not a dataframe, you cannot use the usual Pandas Dataframe functions on :code:`LuxSQLTable`.)

Connecting Lux to a Database
----------------------------

Before Lux can operate on data within a Postgresql database, users have to connect their LuxSQLTable to their database.
To do this, users first need to specify a connection to their SQL database. This can be done using the psycopg2 package's functionality.
.. note:: To run these examples with your own Postgresql database locally, please follow `these instructions <https://github.com/lux-org/lux-binder-sql/blob/master/notebooks/Using%20Lux%20with%20SQL%20Databases.ipynb>`_ how to set up and populate the appropriate example database and table.

Before Lux can operate on data within a Postgres database, users have to connect their LuxSQLTable to their database.
To do this, users first need to specify a connection to their SQL database. This can be done using `psycopg2 <https://pypi.org/project/psycopg2/>`_ or `sqlalchemy <https://www.sqlalchemy.org/>`_ SQL database connectors, shown as follows:

.. code-block:: python
import psycopg2
connection = psycopg2.connect("dbname=example_database user=example_user, password=example_password")
connection = psycopg2.connect("dbname=postgres_db_name user=example_user password=example_user_password")
Once this connection is created, users can connect the lux config to the database using the set_SQL_connection command.
.. code-block:: python
from sqlalchemy import create_engine
engine = create_engine("postgresql://postgres:lux@localhost:5432")
Note that users will have to install these packages on their own if they want to connect Lux to their databases.
Once this connection is created, users can connect the lux config to the database using the :code:`set_SQL_connection` command.

.. code-block:: python
lux.config.set_SQL_connection(connection)
When the set_SQL_connection function is called, Lux will then populate the LuxSQLTable with all the metadata it needs to run its intent from the database table.
After the SQL connection is set, Lux fetches the details required to connect to your PostgreSQL database and generate useful recommendations.

Connecting a LuxSQLTable to a Table/View
--------------------------
----------------------------------------

LuxSQLTables can be connected to individual tables or views created within your Postgresql database. This can be done by either specifying the table/view name in the constructor.
LuxSQLTables can be connected to individual tables or views created within your Postgresql database. This can be done by specifying the table or view name in the constructor.
.. We are actively working on supporting joins between multiple tables. But as of now, the functionality is limited to one table or view per LuxSQLTable object only.
.. code-block:: python
sql_tbl = LuxSQLTable(table_name = "my_table")
You can also connect a LuxSQLTable to a table/view by using the set_SQL_table function.
Alternatively, you can also connect a LuxSQLTable to a table or view by using :code:`set_SQL_table`:

.. code-block:: python
Expand All @@ -51,17 +70,21 @@ You can also connect a LuxSQLTable to a table/view by using the set_SQL_table fu
Choosing an Executor
--------------------------


Once a user has created a connection to their Postgresql database, they need to change Lux's execution engine so that the system can collect and process the data properly.
By default Lux uses the Pandas executor to process local data in the LuxDataframe, but users will use the SQL executor when their LuxSQLTable is connected to a database.
By default, Lux uses the Pandas executor to process local data in the LuxDataframe, but users will use the SQL executor when their LuxSQLTable is connected to a database.
Users can specify the executor that Lux will use via the set_executor_type function as follows:

.. code-block:: python
lux_df.set_executor_type("SQL")
Once a LuxSQLTable has been connected to a Postgresql table and set to use the SQL Executor, users can take full advantage of Lux's visual exploration capabilities as-is. Users can set their intent to specify which variables they are most interested in and discover insightful visualizations from their database.
Once a LuxSQLTable has been connected to a Postgresql table and set to use the SQL Executor, users can take full advantage of Lux's visual exploration capabilities as-is to discover insightful visualizations from their database.

SQL Executor Limitations
--------------------------

While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' Dataframe functions to manipulate the data in the LuxSQLTable object. Since the Lux SQL Executor delegates most data processing to the Postgresql database, it does not pull in the entire dataset into the Lux Dataframe. As such there is no actual data within the LuxSQLTable to manipulate, only the relevant metadata required to for Lux to manage its intent. Thus, if users are interested in manipulating or querying their data, this needs to be done through SQL or an alternative RDBMS interface.
While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' Dataframe functions to manipulate the data in the LuxSQLTable object. Since the Lux SQL Executor delegates most data processing to the Postgresql database, it does not pull in the entire dataset into the Lux Dataframe. As such there is no actual data within the LuxSQLTable to manipulate, only the relevant metadata required for Lux to manage its intent. Thus, if users are interested in manipulating or querying their data, this needs to be done through SQL or an alternative RDBMS interface.

Currently, Lux's SQLExecutor does not support JOIN operation on SQL tables. Therefore, you cannot explore data and create recommended visualizations across multiple SQL tables only through Lux. We are consistently working on expanding the SQL capabilities of Lux, please let us know about how you're using the SQLExecutor and how we can improve the functionality `here <https://github.com/lux-org/lux/issues>`_ !

6 changes: 3 additions & 3 deletions doc/source/advanced/map.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
********************************
Working with Geographic Data
********************************
************************************
Working with Geographic Data Columns
************************************

This tutorial describes how geographic attributes can be visualized automatically with Lux.
Lux recognizes any columns named :code:`state` and :code:`country` that contains US States or worldwide countries as geographic attributes.
Expand Down
4 changes: 2 additions & 2 deletions doc/source/guide/export.rst
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ To allow further edits of visualizations, visualizations can be exported to code
Exporting Visualizations to Matplotlib
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We can also be export the visualization as code in `Matplotlib <https://matplotlib.org/>`_.
We can also export the visualization as code in `Matplotlib <https://matplotlib.org/>`_.

.. code-block:: python
Expand All @@ -177,7 +177,7 @@ We can also be export the visualization as code in `Matplotlib <https://matplotl
:align: center
:alt: add screenshot

This can be copy-and-pasted back into a new notebook cell for further editing.
This code can be copy-and-pasted back into a new notebook cell for further editing.

.. code-block:: python
Expand Down
4 changes: 2 additions & 2 deletions doc/source/reference/gen/lux.vis.Vis.Vis.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
lux.vis.Vis.Vis
lux.vis.Vis.Vis
===============

.. currentmodule:: lux.vis.Vis
Expand All @@ -24,9 +24,9 @@ lux.vis.Vis.Vis
~Vis.remove_filter_from_spec
~Vis.set_intent
~Vis.to_altair
~Vis.to_vegalite
~Vis.to_code
~Vis.to_matplotlib
~Vis.to_vegalite



Expand Down

0 comments on commit 084cf77

Please sign in to comment.