Documentation for Lux Architecture (#16)
* Added Sum and Max aggregate functionality to SQL Executor

* Added Documentation for Lux Architecture
19thyneb committed Jun 22, 2020
1 parent 4592cb8 commit f1ab029
Showing 2 changed files with 101 additions and 4 deletions.
105 changes: 101 additions & 4 deletions doc/source/advanced/architecture.rst
System Architecture
********************************

Overview of Lux Architecture
=================================
Lux is composed of multiple modules, each with a distinct responsibility. The
architecture can be described in layers: the user interface layer, the input validation
and parsing layer, the query processing layer, the data execution layer, and finally the
analytics layer. The modules are loosely coupled so that each layer can be extended independently of the others.

.. image:: ../guide/Lux_Architecture.PNG
   :width: 400
   :align: center
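
The layered design can be illustrated with a minimal Python sketch, assuming each layer is a plain function whose output feeds the next one. The stage functions below (``parse_input``, ``validate_specs``, ``compile_views``, ``execute_views``, ``rank_views``) are hypothetical stand-ins for the modules described in this document, not Lux's API.

```python
from functools import reduce
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def run_pipeline(stages: List[Stage], user_input: Any) -> Any:
    # Pass the output of each layer to the next one, in order
    return reduce(lambda data, stage: stage(data), stages, user_input)


# Hypothetical stand-ins for the parsing, validation, compilation,
# execution, and analytics layers
def parse_input(text: str) -> List[str]:
    return [token.strip() for token in text.split(",")]


def validate_specs(specs: List[str]) -> List[str]:
    if not all(specs):
        raise ValueError("empty specification")
    return specs


def compile_views(specs: List[str]) -> List[dict]:
    return [{"attributes": specs}]


def execute_views(views: List[dict]) -> List[dict]:
    # A real executor would attach data fetched from the dataframe
    return [dict(view, data=[]) for view in views]


def rank_views(views: List[dict]) -> List[dict]:
    # Placeholder score: the number of attributes in the view
    return [dict(view, score=len(view["attributes"])) for view in views]


STAGES = [parse_input, validate_specs, compile_views, execute_views, rank_views]
result = run_pipeline(STAGES, "Horsepower, Origin")
```

Because each stage only sees the previous stage's output, one layer (for example, ``execute_views``) can be replaced by a different implementation without touching the others.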

Lux Data Structures
=================================
This section introduces the essential data structures in Lux, which serve as the building
blocks for the rest of the system.

Lux Dataframe
--------------------------------
To preserve the convenience of Pandas dataframes,
Lux is designed to integrate tightly with Pandas.
The central piece of Lux's data model is the Lux Dataframe (LDF),
a subclass of the Pandas dataframe that supports all dataframe operations
while also holding the additional variables and functions used to generate visual recommendations.
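
A minimal sketch of how a dataframe subclass can carry extra state, using the subclassing hooks that pandas documents (``_metadata`` and ``_constructor``). The class name ``LuxDataFrameSketch``, its ``context`` attribute, and ``set_context`` are hypothetical and do not reproduce Lux's actual implementation.

```python
import pandas as pd


class LuxDataFrameSketch(pd.DataFrame):
    # Attributes listed here are stored as metadata rather than columns;
    # pandas copies them onto the results of many operations
    _metadata = ["context"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.context = []  # hypothetical: current user specifications

    @property
    def _constructor(self):
        # Make pandas operations return this subclass instead of pd.DataFrame
        return LuxDataFrameSketch

    def set_context(self, specs):
        # Hypothetical helper that replaces the current specifications
        self.context = list(specs)


df = LuxDataFrameSketch({"Origin": ["USA", "Japan", "USA"], "Horsepower": [130, 95, 150]})
df.set_context(["Horsepower", "Origin=USA"])
high_power = df[df["Horsepower"] > 100]  # still a LuxDataFrameSketch
```

Overriding ``_constructor`` is what keeps ordinary dataframe operations, such as the boolean filter above, returning the subclass rather than a plain ``pd.DataFrame``.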

Spec/Context
--------------------------------
A Spec object represents a single unit of user specification. A Spec can be either an
attribute that designates a column or a filter value that selects rows in the dataset. The LDF
stores these objects in a list named the Context, which holds all current specifications used to
generate recommendations. An essential job of the LDF is to maintain the Specs within
the Context so that the generated visual recommendations stay up to date with the user's input.
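
The relationship between Specs and the Context can be sketched as follows, assuming a Spec holds an attribute name and an optional filter value. The ``Spec`` fields and the ``ContextHolder`` class, which stands in for the part of the LDF that maintains the Context, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Spec:
    attribute: str  # column name, or "?" for a wildcard
    value: Optional[Any] = None  # filter value; None for attribute-only Specs

    @property
    def is_filter(self) -> bool:
        return self.value is not None


class ContextHolder:
    """Hypothetical stand-in for the part of the LDF that maintains the Context."""

    def __init__(self) -> None:
        self.context: List[Spec] = []
        self.recommendations_stale = False

    def set_context(self, specs: List[Spec]) -> None:
        self.context = list(specs)
        # Changing the Context invalidates recommendations computed earlier
        self.recommendations_stale = True


holder = ContextHolder()
holder.set_context([Spec("Horsepower"), Spec("Origin", "USA")])
```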

View/ViewCollection
--------------------------------
Since Lux maintains sets of visualizations, it needs a data structure that encapsulates
each visualization and its properties so that visualizations can be scored, ranked, and displayed later. Hence,
we define a View object for each visualization, representing all of the information required
for data fetching and rendering. The LDF stores multiple Views in a ViewCollection, which
represents a set of visualizations to display to the user. Since fetching data
for a View is expensive, Lux's Views are decoupled from the
data, which makes it easier to modify or transfer Views during the query processing stages.
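
A sketch of the two structures as Python dataclasses, assuming a View keeps only its attributes, mark, and score, and that its ``data`` field stays empty until the executor fills it. The field names and ``top_k`` are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class View:
    attributes: List[str]
    mark: str = ""
    score: float = 0.0
    data: Optional[object] = None  # filled in later by the executor


@dataclass
class ViewCollection:
    views: List[View] = field(default_factory=list)

    def top_k(self, k: int) -> List[View]:
        # Rank Views by score, highest first
        return sorted(self.views, key=lambda view: view.score, reverse=True)[:k]


collection = ViewCollection([
    View(["Horsepower"], score=0.2),
    View(["Horsepower", "Weight"], score=0.9),
])
# No data is attached yet, so deriving a modified View is cheap
bar_view = replace(collection.views[0], mark="bar")
```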

Lux System
=================================
With these data structures defined, we now give an overview of the system, module by
module. The following sections trace how Lux interprets the user's analytical intent,
fetches the relevant data, and performs analytics to generate visualizations.

Widget (ipywidgets)
--------------------------------
Lux outputs visualizations to Jupyter through custom widgets built with ipywidgets,
a framework for creating interactive HTML representations of Python objects
within Jupyter. Displaying Lux's output through widgets makes the
visualizations interactive: users can select particular visualizations of interest
and save them for later use.
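
As a rough illustration only, the plain-Python sketch below mimics two behaviors described above without using ipywidgets: it exposes an HTML representation through ``_repr_html_``, the hook IPython uses for rich HTML display in notebooks, and it records which visualizations the user selected so they can be exported later. The class ``RecommendationWidgetSketch`` and its methods are hypothetical, not Lux's widget API.

```python
import html
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecommendationWidgetSketch:
    vis_titles: List[str]
    selected: List[int] = field(default_factory=list)

    def _repr_html_(self) -> str:
        # IPython calls this method to render the object as HTML in a notebook
        items = "".join(f"<li>{html.escape(title)}</li>" for title in self.vis_titles)
        return f"<ul>{items}</ul>"

    def select(self, index: int) -> None:
        # Record a visualization the user marked as interesting
        if index not in self.selected:
            self.selected.append(index)

    def export(self) -> List[str]:
        # Return the saved visualizations for later use
        return [self.vis_titles[i] for i in self.selected]


widget = RecommendationWidgetSketch(["Horsepower vs. Weight", "Count of Origin"])
widget.select(1)
```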

Parser
--------------------------------
The Parser lets users specify the variable relationships they want to explore
without having to explicitly create Spec objects.
Before any other processing happens, Lux parses the user's input strings into Spec
objects for the Context. All syntax rules are applied to the user input at this stage.
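
The following sketch parses strings into Spec-like objects, assuming a simple syntax in which a bare string names an attribute, ``?`` is a wildcard, and strings such as ``Origin=USA`` or ``Weight >= 3000`` are filters. This syntax, the ``Spec`` fields, and ``parse_specs`` are assumptions for illustration; Lux's actual parsing rules may differ.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical filter syntax: <attribute><operator><value>
FILTER_PATTERN = re.compile(r"^\s*([^<>=!]+?)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


@dataclass
class Spec:
    attribute: str
    filter_op: Optional[str] = None  # None for attribute-only Specs
    value: Optional[str] = None


def parse_specs(inputs: List[str]) -> List[Spec]:
    specs = []
    for text in inputs:
        match = FILTER_PATTERN.match(text)
        if match:
            attribute, op, value = match.groups()
            specs.append(Spec(attribute, op, value))
        else:
            # A bare attribute name or the "?" wildcard
            specs.append(Spec(text.strip()))
    return specs


context = parse_specs(["Horsepower", "Origin=USA", "Weight >= 3000", "?"])
```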

Validator
--------------------------------
Input validation catches inconsistencies between an LDF's Specs and the dataset. With
this feature, data scientists can discover mistakes early in their exploration and correct
them. For example, if there is a filter specification where the attribute "Origin" is
equal to "USA", the validation stage checks whether the value "USA" exists for the attribute
"Origin" in the dataset.
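
A minimal validation sketch over a pandas dataframe, assuming Specs are given as hypothetical ``(attribute, value)`` pairs where ``value`` is ``None`` for attribute-only Specs. It performs the check described above: a missing column, or a filter value that never occurs in its column, is reported.

```python
from typing import List, Optional, Tuple

import pandas as pd


def validate_specs(df: pd.DataFrame, specs: List[Tuple[str, Optional[object]]]) -> List[str]:
    errors = []
    for attribute, value in specs:
        if attribute == "?":
            continue  # wildcards are resolved later, during compilation
        if attribute not in df.columns:
            errors.append(f"Attribute '{attribute}' not found in the dataset")
        elif value is not None and not (df[attribute] == value).any():
            errors.append(f"Value '{value}' not found for attribute '{attribute}'")
    return errors


cars = pd.DataFrame({"Origin": ["USA", "Japan", "Europe"], "Horsepower": [130, 95, 110]})
errors = validate_specs(cars, [("Origin", "USA"), ("Origin", "Mars"), ("Cylinders", None)])
```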

Compiler
--------------------------------
Lux allows users to provide only the bare minimum of input specifications. As a result,
Spec objects often require additional processing before they can be used to create Views.
Any underspecified information in the Specs within the Context is inferred during the compilation
stage. Transforming these Specs into Views is a three-step process.

1. **View Collection generation**: The system generates a list of Views from the Specs in the Context, which may be fully or partially specified. In the fully specified case, there is no ambiguity about which attributes the user wants to visualize. In the partially specified case, the system locates any Spec objects that contain wildcards, denoted by a question mark (``?``), and enumerates all candidate Views in which each wildcard is replaced by an explicit Spec. The result is a list of Views, one for each visualization that will be displayed in the frontend.
2. **Infer data type and data model information**: The system fills in missing details for each View. Each View holds Specs corresponding to the attributes of a visualization, and each of these Specs is populated with the attribute's data type and data model information. This information is needed to encode the data into the correct visual elements.
3. **Visual Encoding**: The final compilation step automatically determines how data is mapped to visual elements. The system infers the mark type, channels, and other details that the input specifications may leave underspecified, using a set of encoding rules that select the marks and channels of each visualization from the data properties determined in step 2, as shown in the table below.
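
Steps 1 and 2 can be sketched as follows, assuming Specs are reduced to plain attribute names, that each ``?`` wildcard ranges over the columns not already named explicitly, and that numeric columns are treated as measures and all others as dimensions. These rules and the function names ``enumerate_views`` and ``infer_data_model`` are assumptions for illustration, not Lux's exact inference logic.

```python
from itertools import combinations
from typing import List

import pandas as pd


def enumerate_views(context: List[str], columns: List[str]) -> List[List[str]]:
    # Step 1: expand each "?" wildcard into every column not already used
    explicit = [attr for attr in context if attr != "?"]
    n_wildcards = context.count("?")
    candidates = [col for col in columns if col not in explicit]
    # With no wildcards, combinations(..., 0) yields one empty tuple,
    # so a fully specified context produces exactly one View
    return [explicit + list(combo) for combo in combinations(candidates, n_wildcards)]


def infer_data_model(df: pd.DataFrame, attribute: str) -> str:
    # Step 2 (simplified): numeric columns are measures, others are dimensions
    return "measure" if pd.api.types.is_numeric_dtype(df[attribute]) else "dimension"


cars = pd.DataFrame({"Horsepower": [130, 95], "Weight": [3500, 2200], "Origin": ["USA", "Japan"]})
views = enumerate_views(["Horsepower", "?"], list(cars.columns))
models = {attr: infer_data_model(cars, attr) for attr in cars.columns}
```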

========================== ========================== ==========================
Number of Dimensions       Number of Measures         Mark Type
========================== ========================== ==========================
0                          1                          Histogram
1 (ordinal)                0, 1                       Line Chart
1 (categorical)            0, 1                       Bar Chart
2 (ordinal)                0, 1                       Line Chart
2 (categorical)            0, 1                       Line Chart
0                          2                          Scatter plot
1                          2                          Scatter plot
0                          3                          Scatter plot
========================== ========================== ==========================
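
The table can be encoded as a small rule lookup. This sketch mirrors the rows above; the names ``MARK_RULES`` and ``infer_mark``, and the choice to return ``None`` for combinations the table does not cover, are assumptions.

```python
from typing import Optional

# (number of dimensions, dimension type or None for "any", allowed measure counts, mark)
MARK_RULES = [
    (0, None, {1}, "Histogram"),
    (1, "ordinal", {0, 1}, "Line Chart"),
    (1, "categorical", {0, 1}, "Bar Chart"),
    (2, "ordinal", {0, 1}, "Line Chart"),
    (2, "categorical", {0, 1}, "Line Chart"),
    (0, None, {2}, "Scatter plot"),
    (1, None, {2}, "Scatter plot"),
    (0, None, {3}, "Scatter plot"),
]


def infer_mark(n_dimensions: int, n_measures: int,
               dimension_type: Optional[str] = None) -> Optional[str]:
    # Return the mark of the first rule that matches, in table order
    for dims, dim_type, measures, mark in MARK_RULES:
        if (dims == n_dimensions and n_measures in measures
                and (dim_type is None or dim_type == dimension_type)):
            return mark
    return None  # combination not covered by the table
```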

Executor
--------------------------------
The data executor populates each View with the subset of the dataframe described by that View's
specifications. More details on Lux's execution engines are available in the
`executor documentation <https://lux-api.readthedocs.io/en/dfapi/source/guide/executor.html>`_.
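
A minimal pandas-based sketch of what populating a View can involve, assuming a View is described by one dimension, one measure, an aggregation name that pandas accepts (such as ``"mean"``, ``"sum"``, or ``"max"``), and a list of equality filters. The function ``execute_view`` is hypothetical and is not the code of Lux's executors.

```python
from typing import Sequence, Tuple

import pandas as pd


def execute_view(
    df: pd.DataFrame,
    dimension: str,
    measure: str,
    aggregation: str = "mean",
    filters: Sequence[Tuple[str, object]] = (),
) -> pd.DataFrame:
    subset = df
    # Keep only the rows that satisfy every equality filter
    for attribute, value in filters:
        subset = subset[subset[attribute] == value]
    # Group by the dimension and aggregate the measure
    return subset.groupby(dimension)[measure].agg(aggregation).reset_index()


cars = pd.DataFrame({
    "Origin": ["USA", "USA", "Japan", "USA"],
    "Cylinders": [4, 8, 4, 8],
    "Horsepower": [90, 150, 95, 200],
})
max_hp = execute_view(cars, "Cylinders", "Horsepower", "max", [("Origin", "USA")])
```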