From 084cf77d08d755640dc1e5fd857d0f2c6a5bd8dd Mon Sep 17 00:00:00 2001 From: thyneb19 Date: Fri, 16 Apr 2021 20:39:21 -0700 Subject: [PATCH] Update to executor documentation. (#363) Co-authored-by: 19thyneb Co-authored-by: Doris Lee Co-authored-by: NiStannum <52202164+NiStannum@users.noreply.github.com> --- doc/index.rst | 8 ++- doc/source/advanced/date.rst | 6 +- doc/source/advanced/executor.rst | 61 ++++++++++++++------ doc/source/advanced/map.rst | 6 +- doc/source/guide/export.rst | 4 +- doc/source/reference/gen/lux.vis.Vis.Vis.rst | 4 +- 6 files changed, 57 insertions(+), 32 deletions(-) diff --git a/doc/index.rst b/doc/index.rst index fa9f0da9..2ed0c498 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -39,14 +39,16 @@ This website contains pages that overview of the basic and advanced functionalit :maxdepth: 1 :caption: Advanced Topics - source/advanced/map + source/advanced/datatype source/advanced/date + source/advanced/map source/advanced/indexgroup source/advanced/custom source/advanced/architecture - source/advanced/interestingness source/advanced/executor - source/advanced/datatype + source/advanced/interestingness + + .. toctree:: :maxdepth: 1 diff --git a/doc/source/advanced/date.rst b/doc/source/advanced/date.rst index 901ecf84..6ecc9c69 100644 --- a/doc/source/advanced/date.rst +++ b/doc/source/advanced/date.rst @@ -1,6 +1,6 @@ -******************************** -Working with Dates -******************************** +*************************************** +Working with Temporal Datetime Columns +*************************************** .. note:: You can follow along this tutorial in a Jupyter notebook. 
[`Github `_] [`Binder `_] diff --git a/doc/source/advanced/executor.rst b/doc/source/advanced/executor.rst index e5347e98..071affbb 100644 --- a/doc/source/advanced/executor.rst +++ b/doc/source/advanced/executor.rst @@ -1,47 +1,66 @@ -**************** -Execution Engine -**************** - -Fetching the data required for generating visualizations can be computationally expensive, especially on large datasets. Lux provides a extensible framework for users to pick their own execution backend for data processing. We currently support Pandas (default, :mod:`lux.executor.PandasExecutor`) and SQL (:mod:`lux.executor.SQLExecutor`). In this tutorial, we explain how to use switch to SQL as an execution backend, as an example of how you can use a different data processing mechanism in Lux. - -Please refer to :mod:`lux.executor.Executor`, if you are interested in extending Lux for your own execution backend. +************************** +Working with SQL Databases +************************** +Lux provides an extensible framework for users to pick their own execution backend for data processing. We currently support Pandas (:mod:`lux.executor.PandasExecutor`) and SQL (:mod:`lux.executor.SQLExecutor`) as execution engines. By default, Lux leverages Pandas as its execution backend; in other words, data processing is performed as a set of Pandas operations on top of the dataframe. In this tutorial, we explain how Lux can be used with tables inside a Postgres database via the SQL executor. +.. note:: You can follow along with a Jupyter notebook tutorial describing how Lux can be used with data inside a Postgres database. [`Github `_] [`Binder `_] SQL Executor ============= -Lux extends its visualization exploration operations to data within SQL databases. By using the SQL Executor, users can specify a SQL database to connect a LuxSQLTable for generating all the visualizations recommended in Lux. +Lux extends its visualization capabilities to data stored within Postgres databases.
By using the SQLExecutor, users can create a :code:`LuxSQLTable` that connects to a Postgres database. When the :code:`LuxSQLTable` object is printed out, Lux displays a subset of the data and recommends a default set of visualizations. + +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/SQLexecutor1.gif?raw=true + :width: 900 + :align: center + + +What is the SQL Executor? +========================== + +It is common for data to be stored within a relational database, such as Postgres. +The execution engine in Lux processes the underlying data to produce the data that each visualization requires. By default, Lux uses Pandas as its execution engine. +However, fetching the data required for generating visualizations can be computationally expensive. Database users may not be able to pull in the entire dataset, either due to a lack of permissions or because the data is too large to work with on a local machine. In such cases, you can use a :code:`LuxSQLTable` to leverage Lux's capabilities on data stored inside a Postgres database. A :code:`LuxSQLTable` represents a SQL table within the Postgres database. It contains only a skeleton of the table schema and does not store the full contents of the database table. (Under the hood, :code:`LuxSQLTable` serves as the LuxDataFrame counterpart for a database table. However, since :code:`LuxSQLTable` is not a dataframe, you cannot use the usual Pandas DataFrame functions on it.) Connecting Lux to a Database ---------------------------- +..
note:: To run these examples with your own PostgreSQL database locally, please follow `these instructions `_ on how to set up and populate the appropriate example database and table. + +Before Lux can operate on data within a Postgres database, users have to connect their LuxSQLTable to their database. +To do this, users first need to specify a connection to their SQL database. This can be done using the `psycopg2 `_ or `sqlalchemy `_ database connectors, as follows: .. code-block:: python import psycopg2 - connection = psycopg2.connect("dbname=example_database user=example_user, password=example_password") + connection = psycopg2.connect("dbname=postgres_db_name user=example_user password=example_user_password") -Once this connection is created, users can connect the lux config to the database using the set_SQL_connection command. +.. code-block:: python + + from sqlalchemy import create_engine + engine = create_engine("postgresql://postgres:lux@localhost:5432") + +Note that users will have to install these packages themselves if they want to connect Lux to their databases. +Once this connection is created, users can register it with Lux's config using the :code:`set_SQL_connection` function. .. code-block:: python lux.config.set_SQL_connection(connection) -When the set_SQL_connection function is called, Lux will then populate the LuxSQLTable with all the metadata it needs to run its intent from the database table. +After the SQL connection is set, Lux queries your PostgreSQL database for the metadata it needs to generate useful recommendations. Connecting a LuxSQLTable to a Table/View --------------------------- +---------------------------------------- -LuxSQLTables can be connected to individual tables or views created within your Postgresql database. This can be done by either specifying the table/view name in the constructor. +LuxSQLTables can be connected to individual tables or views created within your PostgreSQL database.
This can be done by specifying the table or view name in the constructor. +.. We are actively working on supporting joins between multiple tables. As of now, however, each LuxSQLTable object is limited to a single table or view. .. code-block:: python sql_tbl = LuxSQLTable(table_name = "my_table") -You can also connect a LuxSQLTable to a table/view by using the set_SQL_table function. +Alternatively, you can connect a LuxSQLTable to a table or view using :code:`set_SQL_table`: .. code-block:: python @@ -51,17 +70,21 @@ You can also connect a LuxSQLTable to a table/view by using the set_SQL_table fu Choosing an Executor -------------------------- + Once a user has created a connection to their Postgresql database, they need to change Lux's execution engine so that the system can collect and process the data properly. -By default Lux uses the Pandas executor to process local data in the LuxDataframe, but users will use the SQL executor when their LuxSQLTable is connected to a database. +By default, Lux uses the Pandas executor to process local data in the LuxDataFrame, but users should switch to the SQL executor when their LuxSQLTable is connected to a database. Users can specify the executor that Lux will use via the set_executor_type function as follows: .. code-block:: python lux_df.set_executor_type("SQL") -Once a LuxSQLTable has been connected to a Postgresql table and set to use the SQL Executor, users can take full advantage of Lux's visual exploration capabilities as-is. Users can set their intent to specify which variables they are most interested in and discover insightful visualizations from their database. +Once a LuxSQLTable has been connected to a PostgreSQL table and set to use the SQL executor, users can take full advantage of Lux's visual exploration capabilities as-is to discover insightful visualizations from their database.
SQL Executor Limitations -------------------------- -While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' Dataframe functions to manipulate the data in the LuxSQLTable object. Since the Lux SQL Executor delegates most data processing to the Postgresql database, it does not pull in the entire dataset into the Lux Dataframe. As such there is no actual data within the LuxSQLTable to manipulate, only the relevant metadata required to for Lux to manage its intent. Thus, if users are interested in manipulating or querying their data, this needs to be done through SQL or an alternative RDBMS interface. \ No newline at end of file +While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' DataFrame functions to manipulate the data in the LuxSQLTable object. Since the Lux SQL executor delegates most data processing to the PostgreSQL database, it does not pull the entire dataset into a LuxDataFrame. As such, there is no actual data within the LuxSQLTable to manipulate, only the metadata Lux needs to manage its intent. Thus, users who want to manipulate or query their data need to do so through SQL or another RDBMS interface. + +Currently, Lux's SQLExecutor does not support JOIN operations across SQL tables. Therefore, you cannot use Lux alone to explore data or generate recommended visualizations that span multiple SQL tables. We are continually working on expanding the SQL capabilities of Lux; please let us know how you're using the SQLExecutor and how we can improve its functionality `here `_!
+ diff --git a/doc/source/advanced/map.rst b/doc/source/advanced/map.rst index 94611ed0..1128ed23 100644 --- a/doc/source/advanced/map.rst +++ b/doc/source/advanced/map.rst @@ -1,6 +1,6 @@ -******************************** -Working with Geographic Data -******************************** +************************************ +Working with Geographic Data Columns +************************************ This tutorial describes how geographic attributes can be visualized automatically with Lux. Lux recognizes any columns named :code:`state` and :code:`country` that contains US States or worldwide countries as geographic attributes. diff --git a/doc/source/guide/export.rst b/doc/source/guide/export.rst index 7af15a91..c0571fd8 100644 --- a/doc/source/guide/export.rst +++ b/doc/source/guide/export.rst @@ -166,7 +166,7 @@ To allow further edits of visualizations, visualizations can be exported to code Exporting Visualizations to Matplotlib ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We can also be export the visualization as code in `Matplotlib `_. +We can also export the visualization as code in `Matplotlib `_. .. code-block:: python @@ -177,7 +177,7 @@ We can also be export the visualization as code in `Matplotlib