In [None]:
!pip install lux-api
!jupyter nbextension install --py luxwidget
!jupyter nbextension enable --py luxwidget

In [None]:
!pip install dtale

In [3]:
import lux # Lux must be imported before pandas
import pandas as pd

import dtale

In [4]:
import sklearn
from sklearn import datasets

import os
import numpy as np
using_colab = 'google.colab' in str(get_ipython())
using_binder = any('binder' in x.lower() for x in os.environ)

In [5]:
dataset = sklearn.datasets.fetch_california_housing(as_frame=True)

In [6]:
df = dataset['data'] # X
df['MedHouseVal'] = dataset['target'] # y

# D-Tale

According to the developers: "D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex."

See the GitHub repo for up-to-date details: https://github.com/man-group/dtale

In general, this is very useful for EDA!  However, for reproducibility, any final analysis should still be scripted. D-Tale remains a powerful tool for quickly viewing and summarizing data, identifying possible issues, and generating hypotheses, which is exactly what EDA is supposed to do.

In [7]:
if using_colab:
  import dtale.app as dtale_app
  dtale_app.USE_COLAB = True
elif using_binder:
  !pip install jupyter-server-proxy
  !pip install nbserverproxy
  !jupyter serverextension enable --sys-prefix jupyter_server_proxy

  import dtale.app as dtale_app
  dtale_app.JUPYTER_SERVER_PROXY = True

In [None]:
df # In Pandas

In [9]:
dtale.show(df) # In D-Tale!

To look at the dataframe in a new browser tab, try the code below.  Note that this does not seem to work when logged
into a remote Jupyter server. However, there are lots of configuration options, and even a way to use this on Colab.
Refer to the documentation for the most up-to-date examples: https://github.com/man-group/dtale

In [10]:
# dtale.show(df).open_browser()
# Alternatively, click the '>' icon in the top left and select "Open in New Tab"

Some things to try:
    
1. Click on a column header to get a description of its rows, type, skew, outliers, etc.
2. Select 'Filter outliers' at the bottom of the pop-up to remove rows that are considered outliers.
3. Click 'Describe' to see descriptive statistics - also check out the histograms and Q-Q plot!
4. View Duplicates.
5. Convert the column type.
6. Select 'Heat Map' to highlight rows based on values.
7. Use 'Replacements' to replace NaN, etc. using various tools including sklearn imputers.
8. Use filters at the bottom to filter rows based on criteria; click on the icon to get a pop-up that lets you build more complex criteria!
9. Double-click on a cell to change its value!
10. Under the '>' icon in the top left, select 'Clean Columns' to explore options to clean data.
11. Under the '>' icon in the top left, select 'Feature Analysis by Correlation' to see how columns are correlated with one another.
12. Under the '>' icon in the top left, select 'Charts' to make various plots of features against each other.
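
Since the final analysis should be scripted, it helps to know rough pandas equivalents of a few of the interactive steps above. The sketch below uses a small, made-up dataframe (`toy` and its values are hypothetical, not the housing data), and the 1.5 × IQR outlier rule is an assumption; D-Tale's own outlier criterion may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataframe.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "MedInc": rng.normal(4.0, 1.5, size=200),
    "HouseAge": rng.integers(1, 52, size=200).astype(float),
})
toy["MedHouseVal"] = 0.4 * toy["MedInc"] + rng.normal(0.0, 0.3, size=200)
toy.loc[0, "MedInc"] = 50.0                                 # plant one obvious outlier
toy = pd.concat([toy, toy.iloc[[1]]], ignore_index=True)    # plant one duplicate row

# 'Describe': count, mean, std, min, quartiles, max per column.
summary = toy.describe()

# 'Duplicates': number of rows that repeat an earlier row exactly.
n_duplicates = int(toy.duplicated().sum())

# 'Filter outliers' (assumed rule): keep rows within 1.5 * IQR of the quartiles.
q1, q3 = toy["MedInc"].quantile([0.25, 0.75])
iqr = q3 - q1
inliers = toy[toy["MedInc"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Correlations between columns: pairwise Pearson correlation matrix.
corr = toy.corr()

print(summary)
print("duplicate rows:", n_duplicates)
print("rows kept after outlier filter:", len(inliers), "of", len(toy))
print(corr)
```

Anything you discover by clicking around in D-Tale can be re-checked this way before it goes into a final analysis.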

D-Tale is also "smart" about inferring things from your column labels. For example, if you look at the Describe options for Latitude or Longitude, it will detect the other and give a "Geolocation" option.  Try it out!

Check out Animations under different types of Charts!

# Lux

According to the developers: "Lux is a Python library that makes data science easier by automating certain aspects of the data exploration process. Lux is designed to facilitate faster experimentation with data, even when the user does not have a clear idea of what they are looking for."

See the documentation for up-to-date information: https://lux-api.readthedocs.io/en/latest/index.html

See the installation instructions here: https://lux-api.readthedocs.io/en/latest/source/getting_started/installation.html

If using Anaconda, install with:

```bash
$ conda activate myenv
$ conda install -c conda-forge lux-api
$ sudo mkdir /usr/local/share/jupyter; sudo chown -R $USER /usr/local/share/jupyter
$ jupyter nbextension install --py luxwidget
$ jupyter nbextension enable --py luxwidget
```

Note: You need to **create the dataframe AFTER the `import lux` statement** (at least at the time of writing); dataframes created before the import will not display with Lux.

In [11]:
if using_colab:
  from google.colab import output
  output.enable_custom_widget_manager()

In [16]:
df = dataset['data'] # X
df['MedHouseVal'] = dataset['target'] # y

In [17]:
df.default_display = "lux" # Set Lux as default display for this dataframe

In [None]:
df

By default, up to three types of ["analytical actions"](https://lux-api.readthedocs.io/en/latest/source/reference/lux.action.html) are used to determine what is "interesting".

* The Correlation Tab shows pairwise relationships between quantitative attributes from most linearly correlated (Pearson's correlation score) to least correlated.
* The Distribution Tab shows univariate distributions ordered from most skewed to least skewed. 
* The Occurrence Tab appears for categorical features and displays a series of bar charts. (This dataset has no categorical features, so there are no examples here.)
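
To make the ordering in the Correlation and Distribution tabs concrete, here is a minimal pandas sketch of the same idea: rank column pairs by the absolute value of Pearson's correlation, and rank columns by absolute skewness. This is not Lux's internal code, and Lux's exact scoring may differ; the dataframe `toy` and its columns are made up for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical data: 'b' is built from 'a', 'c' is independent, 'd' is right-skewed.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
toy = pd.DataFrame({
    "a": x,
    "b": 2.0 * x + rng.normal(scale=0.1, size=500),
    "c": rng.normal(size=500),
    "d": rng.exponential(size=500),
})

# Correlation-style ranking: each unordered column pair once,
# sorted from most to least linearly correlated (by |r|).
corr = toy.corr(method="pearson")
pairs = pd.Series({(u, v): corr.loc[u, v] for u, v in combinations(toy.columns, 2)})
ranked_pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Distribution-style ranking: columns sorted from most to least skewed (by |skew|).
ranked_skew = toy.skew().abs().sort_values(ascending=False)

print(ranked_pairs)
print(ranked_skew)
```

With this toy data, the pair `('a', 'b')` comes first in the correlation ranking and `d` comes first in the skew ranking, which mirrors what you would expect to see at the front of each tab.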

You can "steer" these defaults by suggesting an "intent" described in more detail [here](https://lux-api.readthedocs.io/en/latest/source/getting_started/overview.html#steering-recommendations-via-user-intent).

In [19]:
df.intent = ['MedInc','MedHouseVal']

In [None]:
df

In [None]:
# Select one or more visualizations in the widget above and click the export button (top right);
# the selected visualizations are then available here
df.exported

In [None]:
# You can see recommendations of "interesting things" from Lux 
df.recommendation

In [None]:
# You can also access these plots
df.recommendation['Enhance'][0]

In [24]:
# The great thing about Lux is that you can get the raw code to reproduce these plots,
# so you can modify them further as needed
code = df.recommendation['Enhance'][0].to_code('matplotlib')

In [None]:
print(code)