In this demo, we will be analyzing the [IBM HR Employee Attrition dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) to understand how employee characteristics might influence attrition. 

Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. 

To enable Lux, simply add `import lux` along with your Pandas import statement.

In [1]:
import pandas as pd
import lux

In [5]:
help(lux)

Help on package lux:

NAME
    lux

DESCRIPTION
    #  Copyright 2019-2020 The Lux Authors.
    #
    #  Licensed under the Apache License, Version 2.0 (the "License");
    #  you may not use this file except in compliance with the License.
    #  You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    #  Unless required by applicable law or agreed to in writing, software
    #  distributed under the License is distributed on an "AS IS" BASIS,
    #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    #  See the License for the specific language governing permissions and
    #  limitations under the License.

PACKAGE CONTENTS
    _config (package)
    _version
    action (package)
    core (package)
    executor (package)
    history (package)
    interestingness (package)
    processor (package)
    utils (package)
    vis (package)
    vislib (package)

DATA
    config = <lux._config.config.Config object>


In [2]:
# Collecting basic usage statistics for Lux (For more information, see: https://tinyurl.com/logging-consent)
lux.logger = True # Remove this line if you do not want your interactions recorded

<div>
    <img align="left" src="https://github.com/lux-org/lux-resources/blob/master/icon/table.png?raw=True" width="30">
    <h1 style="padding-left: 40px;">Visualizations of dataframes beyond simple tables</h1>
</div>

Lux preserves Pandas dataframe semantics, which means that you can apply any command from the Pandas API to dataframes in Lux and expect the same behavior. For example, we can load the dataset with the standard Pandas `read_csv` command.

In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv")

To get an overview of the dataframe, simply print out the dataframe `df`. By clicking on the Toggle button, you can now explore the data visually through Lux. You should see three tabs of visualizations recommended to you. 

In [4]:
df

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



The visualizations are displayed in different tabs as [actions](https://lux-api.readthedocs.io/en/latest/source/getting_started/overview.html#visualizing-dataframes-with-recommendations).
By inspecting the Correlation action, we see several visualizations with a sharp triangular pattern. This matches our intuition that an employee's total working years cannot exceed their age, and that no employee can stay at a company longer than their total working years.
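
The triangular shape follows from two ordering constraints between the columns. As a minimal sketch, the check below verifies those constraints on a small hand-made frame (the rows are invented for illustration, not taken from the dataset); the same boolean expressions could be run against `df`.

```python
import pandas as pd

# Toy rows invented for illustration; only the column names match the dataset.
toy = pd.DataFrame({
    "Age": [25, 40, 58],
    "TotalWorkingYears": [3, 18, 35],
    "YearsAtCompany": [3, 6, 20],
})

# Constraint 1: nobody has worked for more years than they have been alive.
career_within_age = (toy["TotalWorkingYears"] <= toy["Age"]).all()

# Constraint 2: tenure at this company cannot exceed the whole career.
tenure_within_career = (toy["YearsAtCompany"] <= toy["TotalWorkingYears"]).all()

print(career_within_age, tenure_within_career)
```

Any row that violates either constraint would fall outside the triangle and would be worth inspecting as a possible data-entry error.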

<div>
    <img align="left" src="https://github.com/lux-org/lux-resources/blob/master/icon/steering-wheel.png?raw=True" width="30">
    <h1 style="padding-left: 40px;">Steering analysis with intent</h1>
</div>

Let's say that we want to investigate factors that influence employee attrition. Beyond these basic recommendations, you can further specify your analysis *intent*, i.e., the data attributes and values that you are interested in visualizing. 

In [6]:
df.intent = ["Attrition"]

Upon printing the dataframe again, Lux leverages the analysis intent to steer the recommendations towards what the user might be interested in.

In [7]:
df

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



On the left, the visualization based on the specified intent shows that around 15% of employees leave the company. On the right, in the Enhance action, we learn that employees who leave have typically spent about four fewer years, both in their working life overall and at the company, than their counterparts.
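
The two readings above correspond to simple aggregations. A rough sketch of the equivalent Pandas computations follows, using a small invented sample rather than the real data, so the numbers will not match the charts.

```python
import pandas as pd

# Invented sample; only the column names come from the dataset.
sample = pd.DataFrame({
    "Attrition": ["No", "No", "No", "Yes", "No", "Yes"],
    "TotalWorkingYears": [10, 12, 8, 5, 14, 7],
    "YearsAtCompany": [6, 9, 4, 2, 10, 3],
})

# Share of employees in each Attrition category (the shares sum to 1.0).
attrition_share = sample["Attrition"].value_counts(normalize=True)

# Average career length and tenure for employees who left vs. those who stayed.
mean_years = sample.groupby("Attrition")[["TotalWorkingYears", "YearsAtCompany"]].mean()

print(attrition_share)
print(mean_years)
```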

ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/intent.html#) to learn more about how to specify intent in Lux.

<div>
    <img align="left" src="https://github.com/lux-org/lux-resources/blob/master/icon/search.png?raw=True" width="30">
    <h1 style="padding-left: 40px;">Quick inspection of 1-D Series</h1>
</div>

Based on this insight, we want to derive a new column that captures what share of each employee's working years has been spent at this company. We quickly divide the two columns to inspect the Series visualization.

In [8]:
df["YearsAtCompany"] / df["TotalWorkingYears"]

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



We print out the result of dividing the two columns. The Series visualization enables us to quickly verify that the values lie between 0 and 1. Somewhat surprisingly, we also find a large group of employees who have spent almost all of their working life at this company.
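
One caveat with this ratio: an employee with zero total working years produces a `0/0` division, which Pandas evaluates to `NaN` rather than raising an error. Whether the dataset actually contains such rows is worth checking. The sketch below uses invented values to show the behavior and one way to count the affected rows.

```python
import pandas as pd

# Invented values; the first row mimics a new hire with no work history.
toy = pd.DataFrame({
    "YearsAtCompany": [0, 2, 5, 10],
    "TotalWorkingYears": [0, 4, 5, 20],
})

ratio = toy["YearsAtCompany"] / toy["TotalWorkingYears"]

# 0/0 yields NaN; the remaining values fall between 0 and 1.
print(ratio.tolist())
print("rows with undefined ratio:", int(ratio.isna().sum()))
print("valid values within [0, 1]:", bool(ratio.dropna().between(0, 1).all()))
```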

We want to learn more, so we create a new column `%WorkingYearsAtCompany` to capture this metric.

In [9]:
df["%WorkingYearsAtCompany"] = df["YearsAtCompany"] / df["TotalWorkingYears"]

<div>
    <img align="left" src="https://github.com/lux-org/lux-resources/blob/master/icon/flask.png?raw=True" width="30">
    <h1 style="padding-left: 40px;">Quick-and-dirty experimentation with visualizations</h1>
</div>


Lux is built on the principle that users should always be able to visualize and explore anything they specify, without having to think about what the visualization should look like.

Continuing our analysis, we are interested in whether the percentage of working years spent at the company differs between younger and older employees. To investigate this hypothesis, we generate a Vis object showing the relationship between `Age` and `%WorkingYearsAtCompany`.

In [10]:
from lux.vis.Vis import Vis
Vis(["%WorkingYearsAtCompany", "Age"], df)

LuxWidget(current_vis={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'axis': {'labelCo…

<Vis  (x: %WorkingYearsAtCompany, y: Age) mark: scatter, score: 0.0 >

The visualization does not show a clear trend. To look into this further, we can binarize the `Age` variable based on whether the employee is above or below the average age across employees.

In [11]:
df["Age"]

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



In [12]:
df["IsOld"] = df["Age"] > df["Age"].mean()

In [13]:
df["IsOld"]

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



Now, we look again at whether younger or older employees tend to stay at the company for longer. The visualization generated by Lux shows that older employees have actually spent a smaller percentage of their working years at the company.

In [14]:
Vis(["%WorkingYearsAtCompany", "IsOld"], df)

LuxWidget(current_vis={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'axis': {'labelCo…

<Vis  (x: MEAN(%WorkingYearsAtCompany), y: IsOld) mark: bar, score: 0.0 >
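
The bar chart aggregates `%WorkingYearsAtCompany` by its mean within each `IsOld` group. A minimal sketch of the same computation in plain Pandas is shown below, on invented rows whose values are illustrative only.

```python
import pandas as pd

# Invented rows; only the column names match the dataframe above.
toy = pd.DataFrame({
    "Age": [24, 30, 45, 52],
    "%WorkingYearsAtCompany": [1.0, 0.8, 0.5, 0.3],
})

# Flag employees older than the mean age (37.75 for these rows).
toy["IsOld"] = toy["Age"] > toy["Age"].mean()

# Mean share of career spent at the company, per age group.
group_means = toy.groupby("IsOld")["%WorkingYearsAtCompany"].mean()
print(group_means)
```

Note that splitting at the mean age is one of several reasonable cut-offs; the median or a fixed age threshold would give differently sized groups.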

We are also interested in whether the salaries of older and younger employees differ significantly. We take a look at the columns in the dataframe and see that there is a group of columns whose names contain the word `Rate`.

In [15]:
df.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount',
       'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',
       'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
       'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',
       'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',
       'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',
       'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
       'YearsWithCurrManager', '%WorkingYearsAtCompany', 'IsOld'],
      dtype='object')

In [16]:
compensation = [col for col in df.columns if "Rate" in col]
compensation

['DailyRate', 'HourlyRate', 'MonthlyRate']

We then plot these compensation-related attributes against the `IsOld` variable and find that the difference in compensation between older and younger employees is not significant.

In [17]:
from lux.vis.VisList import VisList

In [18]:
VisList([compensation, "IsOld"], df)

LuxWidget(recommendations=[{'action': 'Vis List', 'description': 'Shows a vis list defined by the intent', 'vs…

[<Vis  (x: MEAN(DailyRate)  , y: IsOld) mark: bar, score: 0.00 >,
 <Vis  (x: MEAN(HourlyRate) , y: IsOld) mark: bar, score: 0.00 >,
 <Vis  (x: MEAN(MonthlyRate), y: IsOld) mark: bar, score: 0.00 >]
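
Each bar chart in the list compares a group mean, so a rough numeric counterpart is a grouped mean over the `Rate` columns. The sketch below uses invented rows, so its numbers say nothing about the real dataset. It also only compares means and is not a statistical significance test.

```python
import pandas as pd

# Invented rows; JobLevel is included to show that non-Rate columns are skipped.
toy = pd.DataFrame({
    "IsOld": [False, False, True, True],
    "DailyRate": [800, 1000, 850, 990],
    "HourlyRate": [60, 70, 64, 66],
    "MonthlyRate": [14000, 16000, 15500, 14500],
    "JobLevel": [1, 2, 3, 4],
})

# Select every column whose name contains "Rate".
compensation = [col for col in toy.columns if "Rate" in col]

# Mean of each compensation column per age group.
means = toy.groupby("IsOld")[compensation].mean()

# Gap between the larger and smaller group mean, per column.
gap = means.max() - means.min()
print(means)
print(gap)
```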

The programmatic generation of Vis and VisList provides a quick-and-dirty way to ask specific questions about the dataframe.

ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) to learn more about how to create Vis and VisList in Lux.

<div>
    <img align="left" style="margin-top: -10px;" src="https://github.com/lux-org/lux-resources/blob/master/icon/present.png?raw=True" width="50">
    <h1 style="padding-left: 55px;">Exporting visualization insight to edit and share</h1>
</div>

By skimming through the recommended visualizations again, we find that distance from home to work is an important factor in employee attrition.

In [19]:
df

Button(description='Toggle Pandas/Lux', layout=Layout(top='5px', width='140px'), style=ButtonStyle())

Output()



We can click on the bar chart visualization of `DistanceFromHome` vs. `Attrition` and export it as code.

In [20]:
# df.exported stays empty until a visualization is selected and exported in the Lux widget
exported = df.exported
if len(exported) > 0:
    vis = exported[0]
    print(vis.to_code(language="altair"))

No visualization selected to export.
See more: https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips

We can then make minor stylistic changes before exporting this visualization to a slide deck to share with colleagues.

In [None]:
import altair as alt
visData = pd.DataFrame({'Attrition': {0: 'No', 1: 'Yes'}, 
                        'DistanceFromHome': {0: 8.915652879156529, 1: 10.632911392405063}})

chart = alt.Chart(visData,title="Important Factor for Employee Attrition!!").mark_bar().encode(
    y = alt.Y('Attrition', type= 'nominal', axis=alt.Axis(labelOverlap=True, title='Attrition')),
    x = alt.X('DistanceFromHome', type= 'quantitative', axis=alt.Axis(title='Average Distance from Home')),
)
chart = chart.configure_mark(tooltip=alt.TooltipContent('encoding'))
chart = chart.configure_title(fontWeight=500,fontSize=13,font='Helvetica Neue')
chart = chart.configure_axis(titleFontWeight=500,titleFontSize=11,titleFont='Helvetica Neue',
            labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue',labelColor='#505050')
chart = chart.configure_legend(titleFontWeight=500,titleFontSize=10,titleFont='Helvetica Neue',
            labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue')
chart = chart.properties(width=250,height=70)

chart

ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/export.html) to learn more about exporting visualizations in Lux.


# Try out Lux! 

To get started, Lux can be installed through [PyPI](https://pypi.org/project/lux-api/). 

```bash
pip install lux-api
``` 


To use Lux in [Jupyter Notebook](https://github.com/jupyter/notebook) or [VS Code](https://code.visualstudio.com/docs/python/jupyter-support), activate the notebook extension:

```bash
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget
```

To use Lux in [JupyterLab](https://github.com/jupyterlab/jupyterlab), activate the lab extension:

```bash
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget
```

If you encounter issues with the installation, please refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips) to troubleshoot the installation.

### More information: 

- Follow us on [Twitter](https://twitter.com/lux_api) for discussion and updates.
- Sign up for the early-user [mailing list](https://forms.gle/XKv3ejrshkCi3FJE6) to stay tuned for upcoming releases, updates, or user studies. 
- Visit [Read the Docs](https://lux-api.readthedocs.io/en/latest/) for more detailed documentation.
- Clone [lux-binder](https://github.com/lux-org/lux-binder) to try out these [hands-on exercises](https://github.com/lux-org/lux-binder/tree/master/exercise) or [tutorial series](https://github.com/lux-org/lux-binder/tree/master/tutorial) on how to use Lux.
- Report any bugs, issues, or requests through [GitHub Issues](https://github.com/lux-org/lux/issues).

<div style="color:#bfbfbf">Icons made by <a href="https://www.freepik.com" title="Freepik" style="color:#b1bbe6">Freepik</a> from <a href="https://www.flaticon.com/" title="Flaticon" style="color:#b1bbe6">www.flaticon.com</a></div>