Configuration for topk and sort order (#206)
* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* various fixes to support int columns

* skip series vis for df.iterrows series element

* config setting for modifying top K and sorting

* note about regenerated config
dorisjlee committed Jan 9, 2021
1 parent 9dc0958 commit 623fb51
Showing 19 changed files with 188 additions and 22 deletions.
55 changes: 54 additions & 1 deletion doc/source/reference/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,28 @@
Configuration Settings
***********************

In Lux, users can customize various global settings to configure the behavior of Lux through :py:class:`lux.config.Config`. This page documents some of the configurations that you can apply in Lux.
In Lux, users can customize various global settings to configure the behavior of Lux through :py:class:`lux.config.Config`. These configurations are applied across all dataframes in the session. This page documents some of the configurations that you can apply in Lux.

.. note::

Lux caches previously generated recommendations, so if you have already printed the dataframe, the recommendations will not be regenerated when you change a config property. For the new config properties to take effect, you need to explicitly expire the recommendations:

.. code-block:: python

    df = pd.read_csv("..")
    df # recommendations already generated here
    df.expire_recs()
    lux.config.SOME_SETTING = "..."
    df # recommendations will be generated again here

Alternatively, you can apply the config settings before you print the dataframe for the first time:

.. code-block:: python

    df = pd.read_csv("..")
    lux.config.SOME_SETTING = "..."
    df # recommendations generated for the first time with config

Change the default display of Lux
Expand Down Expand Up @@ -108,3 +129,35 @@ The above results in the following changes:

See `this page <https://lux-api.readthedocs.io/en/latest/source/guide/style.html>`__ for more details.

Modify Sorting and Ranking in Recommendations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Lux, we select a small subset of visualizations to display in each action tab to avoid displaying too many charts at once.
Certain recommendation categories rank the visualizations and select the top K most interesting ones to display.
You can modify the sorting order and selection cutoff via :code:`lux.config`.
By default, the recommendations are sorted in :code:`"descending"` order based on their interestingness score. You can reverse the ordering by setting the sort order as:

.. code-block:: python

    lux.config.sort = "ascending"

To turn off score-based sorting completely and ensure that the visualizations show up in the same order across all dataframes, set the sort order to :code:`"none"`:

.. code-block:: python

    lux.config.sort = "none"

For recommendation actions that generate a lot of visualizations, the default cutoff is the top 15 visualizations. If you would like to see only the top 6 visualizations, you can set:

.. code-block:: python

    lux.config.topk = 6

If you would like to turn off the selection criteria completely and display everything, you can disable the top K selection with:

.. code-block:: python

    lux.config.topk = False

Beware that this may generate a large number of visualizations (e.g., for 10 quantitative variables, this will generate 45 scatterplots in the Correlation action).
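
The figure of 45 is the number of unordered pairs among 10 variables. As a rough sketch for estimating how many scatterplots to expect with top K turned off, assuming one chart per unordered pair of quantitative columns (the helper name ``n_pairwise_charts`` is hypothetical and not part of Lux):

```python
from math import comb


def n_pairwise_charts(n_quantitative: int) -> int:
    """Count unordered column pairs, assuming one scatterplot per pair."""
    return comb(n_quantitative, 2)


# Estimate for a dataframe with 10 quantitative columns.
print(n_pairwise_charts(10))
```

The count grows quadratically with the number of columns, so it may be worth checking before setting ``lux.config.topk = False`` on a wide dataframe.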

4 changes: 4 additions & 0 deletions doc/source/reference/gen/lux._config.config.Config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ lux.\_config.config.Config
.. autosummary::

~Config.__init__
~Config.register_action
~Config.remove_action
~Config.set_SQL_connection
~Config.set_executor_type

Expand All @@ -30,5 +32,7 @@ lux.\_config.config.Config
~Config.sampling
~Config.sampling_cap
~Config.sampling_start
~Config.sort
~Config.topk


1 change: 0 additions & 1 deletion doc/source/reference/gen/lux.core.series.LuxSeries.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,6 @@ lux.core.series.LuxSeries
~LuxSeries.cumsum
~LuxSeries.describe
~LuxSeries.diff
~LuxSeries.display_pandas
~LuxSeries.div
~LuxSeries.divide
~LuxSeries.divmod
Expand Down
3 changes: 1 addition & 2 deletions doc/source/reference/gen/lux.vis.VisList.VisList.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@ lux.vis.VisList.VisList
.. autosummary::

~VisList.__init__
~VisList.bottomK
~VisList.get
~VisList.map
~VisList.normalize_score
Expand All @@ -23,8 +22,8 @@ lux.vis.VisList.VisList
~VisList.remove_index
~VisList.set
~VisList.set_intent
~VisList.showK
~VisList.sort
~VisList.topK



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ lux.vislib.altair.AltairChart.AltairChart
~AltairChart.apply_default_config
~AltairChart.encode_color
~AltairChart.initialize_chart
~AltairChart.sanitize_dataframe



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ lux.vislib.altair.BarChart.BarChart
~BarChart.apply_default_config
~BarChart.encode_color
~BarChart.initialize_chart
~BarChart.sanitize_dataframe



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ lux.vislib.altair.Histogram.Histogram
~Histogram.apply_default_config
~Histogram.encode_color
~Histogram.initialize_chart
~Histogram.sanitize_dataframe



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ lux.vislib.altair.LineChart.LineChart
~LineChart.apply_default_config
~LineChart.encode_color
~LineChart.initialize_chart
~LineChart.sanitize_dataframe



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ lux.vislib.altair.ScatterChart.ScatterChart
~ScatterChart.apply_default_config
~ScatterChart.encode_color
~ScatterChart.initialize_chart
~ScatterChart.sanitize_dataframe



Expand Down
53 changes: 51 additions & 2 deletions lux/_config/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@
 For more resources, see https://github.com/pandas-dev/pandas/blob/master/pandas/_config
 """
 from collections import namedtuple
-from typing import Any, Callable, Dict, Iterable, List, Optional
-import warnings
+from typing import Any, Callable, Dict, Iterable, List, Optional, Union
 import lux
+import warnings

 RegisteredOption = namedtuple("RegisteredOption", "name action display_condition args")

Expand All @@ -30,6 +30,55 @@ def __init__(self):
         self._sampling_cap = 30000
         self._sampling_flag = True
         self._heatmap_flag = True
+        self._topk = 15
+        self._sort = "descending"
+
+    @property
+    def topk(self):
+        return self._topk
+
+    @topk.setter
+    def topk(self, k: Union[int, bool]):
+        """
+        Setting parameter to display top k visualizations in each action
+        Parameters
+        ----------
+        k : Union[int,bool]
+            False: if display all visualizations (no top-k)
+            k: number of visualizations to display
+        """
+        if isinstance(k, int) or isinstance(k, bool):
+            self._topk = k
+        else:
+            warnings.warn(
+                "Parameter to lux.config.topk must be an integer or a boolean.",
+                stacklevel=2,
+            )
+
+    @property
+    def sort(self):
+        return self._sort
+
+    @sort.setter
+    def sort(self, flag: Union[str]):
+        """
+        Setting parameter to determine sort order of each action
+        Parameters
+        ----------
+        flag : Union[str]
+            "none", "ascending","descending"
+            No sorting, sort by ascending order, sort by descending order
+        """
+        flag = flag.lower()
+        if isinstance(flag, str) and flag in ["none", "ascending", "descending"]:
+            self._sort = flag
+        else:
+            warnings.warn(
+                "Parameter to lux.config.sort must be one of the following: 'none', 'ascending', or 'descending'.",
+                stacklevel=2,
+            )
+
     @property
     def sampling_cap(self):
Expand Down
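
One detail of the `sort` setter in this hunk is worth noting: `flag.lower()` runs before the `isinstance(flag, str)` check, so a non-string argument such as `3` would raise `AttributeError` instead of reaching the warning branch. The following is a minimal standalone sketch of the same two settings with the type check performed first. `SketchConfig` and `VALID_SORT_ORDERS` are hypothetical names for illustration, not part of Lux, and the warning messages are shortened.

```python
import warnings

# Hypothetical constant; the diff inlines this list in the setter.
VALID_SORT_ORDERS = ("none", "ascending", "descending")


class SketchConfig:
    """Hypothetical stand-in for the two settings added in this commit."""

    def __init__(self):
        self._topk = 15
        self._sort = "descending"

    @property
    def topk(self):
        return self._topk

    @topk.setter
    def topk(self, k):
        # bool is a subclass of int, so one isinstance check covers both.
        if isinstance(k, int):
            self._topk = k
        else:
            warnings.warn("topk must be an integer or a boolean.", stacklevel=2)

    @property
    def sort(self):
        return self._sort

    @sort.setter
    def sort(self, flag):
        # Check the type first so that non-strings warn instead of raising.
        if isinstance(flag, str) and flag.lower() in VALID_SORT_ORDERS:
            self._sort = flag.lower()
        else:
            warnings.warn(
                "sort must be one of 'none', 'ascending', or 'descending'.",
                stacklevel=2,
            )


config = SketchConfig()
config.sort = "Ascending"  # accepted and normalized to lower case
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config.sort = 3        # rejected with a warning, value unchanged
    config.topk = "six"    # rejected with a warning, value unchanged
```

Warning rather than raising keeps the previous value in place, which matches the behavior of the setters in the diff for invalid `topk` values.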
3 changes: 2 additions & 1 deletion lux/action/correlation.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,8 @@ def correlation(ldf: LuxDataFrame, ignore_transpose: bool = True):
     if ignore_rec_flag:
         recommendation["collection"] = []
         return recommendation
-    vlist = vlist.topK(15)
+    vlist.sort()
+    vlist = vlist.showK()
     recommendation["collection"] = vlist
     return recommendation

Expand Down
3 changes: 2 additions & 1 deletion lux/action/enhance.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ def enhance(ldf):
     for vis in vlist:
         vis.score = interestingness(vis, ldf)

-    vlist = vlist.topK(15)
+    vlist.sort()
+    vlist = vlist.showK()
     recommendation["collection"] = vlist
     return recommendation
3 changes: 2 additions & 1 deletion lux/action/filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,8 @@ def get_complementary_ops(fltr_op):
     vlist_copy = lux.vis.VisList.VisList(output, ldf)
     for i in range(len(vlist_copy)):
         vlist[i].score = interestingness(vlist_copy[i], ldf)
-    vlist = vlist.topK(15)
+    vlist.sort()
+    vlist = vlist.showK()
     if recommendation["action"] == "Similarity":
         recommendation["collection"] = vlist[1:]
     else:
Expand Down
1 change: 1 addition & 0 deletions lux/action/generalize.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,5 +93,6 @@ def generalize(ldf):

vlist.remove_duplicates()
vlist.sort(remove_invalid=True)
vlist._collection = list(filter(lambda x: x.score != -1, vlist._collection))
recommendation["collection"] = vlist
return recommendation
1 change: 0 additions & 1 deletion lux/action/univariate.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,6 @@ def univariate(ldf, *args):
     vlist = VisList(intent, ldf)
     for vis in vlist:
         vis.score = interestingness(vis, ldf)
-    # vlist = vlist.topK(15) # Basic visualizations should not be capped
     vlist.sort()
     recommendation["collection"] = vlist
     return recommendation
6 changes: 5 additions & 1 deletion lux/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,8 +84,12 @@ def __repr__(self):
         ldf = LuxDataFrame(self)

         try:
+            # Ignore recommendations when Series a results of:
+            # 1) Values of the series are of dtype objects (df.dtypes)
             is_dtype_series = all(isinstance(val, np.dtype) for val in self.values)
-            if ldf._pandas_only or is_dtype_series:
+            # 2) Mixed type, often a result of a "row" acting as a series (df.iterrows, df.iloc[0])
+            mixed_dtype = len(set([type(val) for val in self.values])) > 1
+            if ldf._pandas_only or is_dtype_series or mixed_dtype:
                 print(series_repr)
                 ldf._pandas_only = False
             else:
Expand Down
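
The `mixed_dtype` check added above counts the distinct Python types among the Series values. The same idea can be sketched in isolation, with plain lists standing in for `Series.values` (the helper name `has_mixed_types` and the sample values are hypothetical):

```python
def has_mixed_types(values) -> bool:
    """Return True when the values do not all share one Python type."""
    return len({type(val) for val in values}) > 1


# A "row" pulled out of a dataframe typically mixes strings and numbers,
# while a single column usually holds values of one type.
row_like = ["Some College", 4.5, 1200]
column_like = [1.0, 2.5, 3.0]

print(has_mixed_types(row_like), has_mixed_types(column_like))
```

Because the check compares exact types, an `int` next to a `float` already counts as mixed, which is the situation for a row that spans integer and float columns.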
22 changes: 13 additions & 9 deletions lux/vis/VisList.py
Original file line number Diff line number Diff line change
Expand Up @@ -233,18 +233,22 @@ def sort(self, remove_invalid=True, descending=True):
         # remove the items that have invalid (-1) score
         if remove_invalid:
             self._collection = list(filter(lambda x: x.score != -1, self._collection))
+        if lux.config.sort == "none":
+            return
+        elif lux.config.sort == "ascending":
+            descending = False
+        elif lux.config.sort == "descending":
+            descending = True
         # sort in-place by “score” by default if available, otherwise user-specified field to sort by
         self._collection.sort(key=lambda x: x.score, reverse=descending)

-    def topK(self, k):
-        # sort and truncate list to first K items
-        self.sort(remove_invalid=True)
-        return VisList(self._collection[:k])
-
-    def bottomK(self, k):
-        # sort and truncate list to first K items
-        self.sort(descending=False, remove_invalid=True)
-        return VisList(self._collection[:k])
+    def showK(self):
+        k = lux.config.topk
+        if k == False:
+            return self
+        elif isinstance(k, int):
+            k = abs(k)
+            return VisList(self._collection[:k])

     def normalize_score(self, invert_order=False):
         max_score = max(list(self.get("score")))
Expand Down
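
The combined effect of `sort()` followed by `showK()` can be illustrated on a plain list of scores. This is a simplified sketch under stated assumptions, not the Lux implementation: `sort_and_show` is a hypothetical function, the settings are passed as arguments instead of being read from `lux.config`, and it tests `topk is False` by identity. The diff compares with `k == False`, under which `topk = 0` would also disable the cutoff, because `0 == False` is true in Python.

```python
def sort_and_show(scores, sort="descending", topk=15):
    """Mimic the sort-then-truncate flow on a plain list of scores."""
    # Drop invalid (-1) scores, as sort(remove_invalid=True) does.
    valid = [s for s in scores if s != -1]
    # "none" keeps the incoming order; otherwise sort by score.
    if sort != "none":
        valid.sort(reverse=(sort == "descending"))
    # False turns the cutoff off; any other value truncates to abs(topk) items.
    if topk is False:
        return valid
    return valid[: abs(topk)]


scores = [0.1, 0.9, -1, 0.5, 0.3]
print(sort_and_show(scores, topk=2))
print(sort_and_show(scores, sort="ascending", topk=2))
print(sort_and_show(scores, sort="none", topk=False))
```

Separating the sort from the cutoff is what lets `"ascending"` show the lowest-scoring charts and `"none"` keep a stable order, which the removed `topK` and `bottomK` methods could not express through configuration.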
41 changes: 39 additions & 2 deletions tests/test_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@ def random_categorical(ldf):
vlist = VisList(intent, ldf)
for vis in vlist:
vis.score = 10
vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
return {
"action": "bars",
"description": "Random list of Bar charts",
Expand Down Expand Up @@ -105,7 +106,8 @@ def random_categorical(ldf):
vlist = VisList(intent, ldf)
for vis in vlist:
vis.score = 10
vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
return {
"action": "bars",
"description": "Random list of Bar charts",
Expand Down Expand Up @@ -235,6 +237,41 @@ def test_heatmap_flag_config():
lux.config.heatmap = True


+def test_topk(global_var):
+    df = pd.read_csv("lux/data/college.csv")
+    lux.config.topk = False
+    df._repr_html_()
+    assert len(df.recommendation["Correlation"]) == 45, "Turn off top K"
+    lux.config.topk = 20
+    df = pd.read_csv("lux/data/college.csv")
+    df._repr_html_()
+    assert len(df.recommendation["Correlation"]) == 20, "Show top 20"
+    for vis in df.recommendation["Correlation"]:
+        assert vis.score > 0.2
+
+
+def test_sort(global_var):
+    df = pd.read_csv("lux/data/college.csv")
+    lux.config.topk = 15
+    df._repr_html_()
+    assert len(df.recommendation["Correlation"]) == 15, "Show top 15"
+    for vis in df.recommendation["Correlation"]:
+        assert vis.score > 0.2
+    df = pd.read_csv("lux/data/college.csv")
+    lux.config.sort = "ascending"
+    df._repr_html_()
+    assert len(df.recommendation["Correlation"]) == 15, "Show bottom 15"
+    for vis in df.recommendation["Correlation"]:
+        assert vis.score < 0.2
+
+    lux.config.sort = "none"
+    df = pd.read_csv("lux/data/college.csv")
+    df._repr_html_()
+    scorelst = [x.score for x in df.recommendation["Distribution"]]
+    assert sorted(scorelst) != scorelst, "unsorted setting"
+    lux.config.sort = "descending"


# TODO: This test does not pass in pytest but is working in Jupyter notebook.
# def test_plot_setting(global_var):
# df = pytest.car_df
Expand Down
9 changes: 9 additions & 0 deletions tests/test_series.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,3 +51,12 @@ def test_print_dtypes(global_var):
     with warnings.catch_warnings(record=True) as w:
         print(df.dtypes)
         assert len(w) == 0, "Warning displayed when printing dtypes"
+
+
+def test_print_iterrow(global_var):
+    df = pytest.college_df
+    with warnings.catch_warnings(record=True) as w:
+        for index, row in df.iterrows():
+            print(row)
+            break
+        assert len(w) == 0, "Warning displayed when printing iterrow"
