Updating sql-engine after merge with the travis build fix (#213)

* Similarity as a default action (#182)
* similarity formatting fixed
* added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case)
* filter and similarity together
* filter and similarity together
* remove filter
* black line length
* file reorg and clean; change sim metric

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* bump numpy min version for travis
* Special character issue (#184)
* rename col
* broken
* fixed period replacement bug
* add tests
* refine tests
* refine tests
* remove cols
* fix tests
* add agg
* fixed tests
* clean up PR

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Colored bar interestingness bug (#189)
* rewrote chi2 contingency with pd.crosstab
* catching KeyError issue with chi2 contingency
* padding interestingness with warning instead of error
* interestingness now reuses ndim and nmsr computed in Compiler
* bug fix for parser with int values
* improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses)
* Add sampling parameters as a global config (#192)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Coalesce all data_type attributes of frame into one (#185)
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* Update CONTRIBUTING.md
* Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191)
* Moved Executor Parameters to Global Config
* Black formatting
* Moved table_name parameter to frame.py. Removed executor_type parameter executor_type parameter no longer necessary to maintain
* Fixed reference to table_name parameter table_name is now a parameter within frame.py
* Adjusted Functions to Set SQL Connection Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe
* Update SQLExecutor name parameter
* Fix Executor Reference Update current_vis() to reference lux.config.executor
* Update frame.py
* Moved set functions to global config
* Fixed Index Issue in Pandas Executor Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate()
* Added tests for set_index functions
* Black formatting
* Update Pandas Executor to handle NA values Readded missing dropna parameter within execute_aggregate() groupby function call
* Updated Pandas Coverage Tests Commented out set_index case which has not been addressed yet
* Black Formatting
* Update to Pandas Executor Index Handling Cleaned up how execute_aggregrate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns. Created separate test function for when user specifies an index in read_csv.

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Initialize Config once only during __init__ (#194)
* basic matplotlib chart example
* migrate register default action to init
* config class
* move actions
* fixed tests
* changes
* alright
* fix plot_config
* black reformat
* black reformat

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>

* Update README.md
* Series Bugfix for describe and convert_dtypes (#197)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* Update Lux Docs (#195)
* add black to travis
* reformat all code and adjust test
* remove .idea
* fix contributing doc
* small change in contributing
* update
* reformat, update command to fix version
* remove dev dependencies
* first pass -- inline comments
* _config/config.py
* delete test notebook
* action
* line length 105
* executor
* interestingness
* processor
* vislib
* tests, travis, CONTRIBUTING
* .format () changed
* replace tabs with escape chars
* update using black
* more rewrites and merges into single line
* update pyproject.toml and makefile
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* config doc updated
* fix link for executor
* more links
* fixed overview
* more links fixed
* pandas methods no longer included
* updates to some docstrings
* black reformat
* minor fixes
* minor fix

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Supporting dataframe with integer columns (#203)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* fixed merge conflict issues. vis.data shows None DF.
* Override Pandas DataFrames created from I/O pandas operations (#207)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits
* add pd.io equalities for DataFrames

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Merge master into sql-engine + minor mergeconflict fixes
* Removing the PYNB
* Cleaning up obsolete code
* Configuration for topk and sort order (#206)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* skip series vis for df.iterrows series element
* config setting for modifying top K and sorting
* note about regenerated config
* Version lock for jupyter-client (#211)
* move to single requirements-dev without lux-widget install manually
* pin jedi version
* pin jupyter-client version
* add back old travis and requirement-dev
* Mixed dtype issue (#205)
* coalesce data_types into data_type_lookup
* merge fixed
* merge conflicts
* add warning and suggestion on how to fix
* formatting for warnings version
* change to internal data
* legibility update
* test added
* update test
* test updated
* xlrd in dev reqs
* black
* update link
* changes to test logic, minor string format for warning

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Fixes issue where value_counts was not returning LuxSeries (#210)
* add series equality and value counts test
* black formatting
* fix old value counts test instead
* minor fix

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* bump version
* update README

Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Kunal Agarwal <32151899+westernguy2@users.noreply.github.com>
Co-authored-by: jinimukh <46768380+jinimukh@users.noreply.github.com>
Co-authored-by: thyneb19 <thyneboonmark@berkeley.edu>
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>
lux-org · Jan 10, 2021 · af0e742
1 parent 289f670
commit af0e742

Showing 21 changed files with 86 additions and 39 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -4,18 +4,10 @@ python:
 services:
   - postgresql
 install:
+  - pip install jupyter-client==6.1.6
   - pip install -r requirements.txt
   - pip install -r requirements-dev.txt
-  - pip install git+https://github.com/lux-org/lux-widget
-  #- npm i lux-widget
-before_script:
-   - psql -c "ALTER USER postgres WITH PASSWORD 'lux';" -U postgres 
-   - psql -c "ALTER USER postgres WITH SUPERUSER;" -U postgres
-   - psql -c "ALTER DATABASE postgres OWNER TO travis;"
-   - psql -c "DROP schema public cascade;" -U postgres
-   - psql -c "CREATE schema public;" -U postgres
-#   - psql -c "CREATE DATABASE postgres;" -U postgres
-# command to upload data to test environment SQL database and run tests
+# command to run tests
 script:
   - python lux/data/upload_car_data.py
   - black --target-version py37 --line-length 105 --check .

diff --git a/README.md b/README.md
@@ -157,8 +157,7 @@ To use Lux in [Jupyter Lab](https://github.com/jupyterlab/jupyterlab), activate
     jupyter labextension install @jupyter-widgets/jupyterlab-manager
     jupyter labextension install luxwidget
 ```
-
-Note that JupyterLab and VSCode is supported only for lux-widget version >=0.1.2, if you have an earlier version, please upgrade to the latest version of [lux-widget](https://pypi.org/project/lux-widget/). Lux currently only works with the Chrome browser. 
+Lux is only compatible with Jupyter Lab version 2.2.9 and below. Support for the recent [JupyterLab 3](https://blog.jupyter.org/jupyterlab-3-0-is-out-4f58385e25bb) will come soon. Note that JupyterLab and VSCode is supported only for lux-widget version >=0.1.2, if you have an earlier version, please upgrade to the latest version of [lux-widget](https://pypi.org/project/lux-widget/). Lux currently only works with the Chrome browser. 
 
 If you encounter issues with the installation, please refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips) to troubleshoot the installation. Follow [these instructions](https://lux-api.readthedocs.io/en/latest/source/getting_started/installation.html#manual-installation-dev-setup) to set up Lux for development purposes.
 

diff --git a/doc/source/reference/gen/lux.vis.VisList.VisList.rst b/doc/source/reference/gen/lux.vis.VisList.VisList.rst
@@ -14,7 +14,6 @@ lux.vis.VisList.VisList
    .. autosummary::
 
       ~VisList.__init__
-      ~VisList.bottomK
       ~VisList.get
       ~VisList.map
       ~VisList.normalize_score
@@ -23,8 +22,8 @@ lux.vis.VisList.VisList
       ~VisList.remove_index
       ~VisList.set
       ~VisList.set_intent
+      ~VisList.showK
       ~VisList.sort
-      ~VisList.topK
 
 
 

diff --git a/doc/source/reference/gen/lux.vislib.altair.AltairChart.AltairChart.rst b/doc/source/reference/gen/lux.vislib.altair.AltairChart.AltairChart.rst
@@ -19,6 +19,7 @@ lux.vislib.altair.AltairChart.AltairChart
       ~AltairChart.apply_default_config
       ~AltairChart.encode_color
       ~AltairChart.initialize_chart
+      ~AltairChart.sanitize_dataframe
 
 
 

diff --git a/doc/source/reference/gen/lux.vislib.altair.BarChart.BarChart.rst b/doc/source/reference/gen/lux.vislib.altair.BarChart.BarChart.rst
@@ -20,6 +20,7 @@ lux.vislib.altair.BarChart.BarChart
       ~BarChart.apply_default_config
       ~BarChart.encode_color
       ~BarChart.initialize_chart
+      ~BarChart.sanitize_dataframe
 
 
 

diff --git a/doc/source/reference/gen/lux.vislib.altair.Histogram.Histogram.rst b/doc/source/reference/gen/lux.vislib.altair.Histogram.Histogram.rst
@@ -19,6 +19,7 @@ lux.vislib.altair.Histogram.Histogram
       ~Histogram.apply_default_config
       ~Histogram.encode_color
       ~Histogram.initialize_chart
+      ~Histogram.sanitize_dataframe
 
 
 

diff --git a/doc/source/reference/gen/lux.vislib.altair.LineChart.LineChart.rst b/doc/source/reference/gen/lux.vislib.altair.LineChart.LineChart.rst
@@ -19,6 +19,7 @@ lux.vislib.altair.LineChart.LineChart
       ~LineChart.apply_default_config
       ~LineChart.encode_color
       ~LineChart.initialize_chart
+      ~LineChart.sanitize_dataframe
 
 
 

diff --git a/doc/source/reference/gen/lux.vislib.altair.ScatterChart.ScatterChart.rst b/doc/source/reference/gen/lux.vislib.altair.ScatterChart.ScatterChart.rst
@@ -19,6 +19,7 @@ lux.vislib.altair.ScatterChart.ScatterChart
       ~ScatterChart.apply_default_config
       ~ScatterChart.encode_color
       ~ScatterChart.initialize_chart
+      ~ScatterChart.sanitize_dataframe
 
 
 

diff --git a/lux/_config/config.py b/lux/_config/config.py
@@ -3,7 +3,8 @@
 For more resources, see https://github.com/pandas-dev/pandas/blob/master/pandas/_config
 """
 from collections import namedtuple
-from typing import Any, Callable, Dict, Iterable, List, Optional
+from typing import Any, Callable, Dict, Iterable, List, Optional, Union
+import lux
 import warnings
 import lux
 

diff --git a/lux/_version.py b/lux/_version.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python
 # coding: utf-8
 
-version_info = (0, 2, 1, 2)
+version_info = (0, 2, 2)
 __version__ = ".".join(map(str, version_info))
diff --git a/lux/action/correlation.py b/lux/action/correlation.py
@@ -76,7 +76,8 @@ def correlation(ldf: LuxDataFrame, ignore_transpose: bool = True):
     if ignore_rec_flag:
         recommendation["collection"] = []
         return recommendation
-    vlist = vlist.topK(15)
+    vlist.sort()
+    vlist = vlist.showK()
     recommendation["collection"] = vlist
     return recommendation
 

diff --git a/lux/action/enhance.py b/lux/action/enhance.py
@@ -66,6 +66,7 @@ def enhance(ldf):
     for vis in vlist:
         vis.score = interestingness(vis, ldf)
 
-    vlist = vlist.topK(15)
+    vlist.sort()
+    vlist = vlist.showK()
     recommendation["collection"] = vlist
     return recommendation
diff --git a/lux/action/generalize.py b/lux/action/generalize.py
@@ -93,5 +93,6 @@ def generalize(ldf):
 
     vlist.remove_duplicates()
     vlist.sort(remove_invalid=True)
+    vlist._collection = list(filter(lambda x: x.score != -1, vlist._collection))
     recommendation["collection"] = vlist
     return recommendation
diff --git a/lux/action/univariate.py b/lux/action/univariate.py
@@ -82,7 +82,6 @@ def univariate(ldf, *args):
     vlist = VisList(intent, ldf)
     for vis in vlist:
         vis.score = interestingness(vis, ldf)
-    # vlist = vlist.topK(15) # Basic visualizations should not be capped
     vlist.sort()
     recommendation["collection"] = vlist
     return recommendation
diff --git a/lux/core/__init__.py b/lux/core/__init__.py
@@ -26,8 +26,38 @@ def setOption(overridePandas=True):
     if overridePandas:
         pd.DataFrame = (
             pd.io.json._json.DataFrame
-        ) = pd.io.parsers.DataFrame = pd.core.frame.DataFrame = LuxDataFrame
-        pd.Series = LuxSeries
+        ) = (
+            pd.io.parsers.DataFrame
+        ) = (
+            pd.io.sql.DataFrame
+        ) = (
+            pd.io.excel.DataFrame
+        ) = (
+            pd.io.formats.DataFrame
+        ) = (
+            pd.io.sas.DataFrame
+        ) = (
+            pd.io.clipboards.DataFrame
+        ) = (
+            pd.io.common.DataFrame
+        ) = (
+            pd.io.feather_format.DataFrame
+        ) = (
+            pd.io.gbq.DataFrame
+        ) = (
+            pd.io.html.DataFrame
+        ) = (
+            pd.io.orc.DataFrame
+        ) = (
+            pd.io.parquet.DataFrame
+        ) = (
+            pd.io.pickle.DataFrame
+        ) = (
+            pd.io.pytables.DataFrame
+        ) = (
+            pd.io.spss.DataFrame
+        ) = pd.io.stata.DataFrame = pd.io.api.DataFrame = pd.core.frame.DataFrame = LuxDataFrame
+        pd.Series = pd.core.series.Series = LuxSeries
     else:
         pd.DataFrame = pd.io.parsers.DataFrame = pd.core.frame.DataFrame = originalDF
         pd.Series = originalSeries
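
Note on the hunk above: the long chained assignment is needed because a `from ... import DataFrame` statement binds the class to a name inside the importing module, so rebinding the name in one module does not change what another module's functions see. The sketch below illustrates that behavior with standard-library modules only. The `core` and `reader` modules and the `Base`, `Patched`, and `read` names are hypothetical stand-ins, not part of pandas or Lux.

```python
import types


class Base:
    """Stand-in for the original class (hypothetical, not pandas.DataFrame)."""


class Patched(Base):
    """Stand-in for the subclass that should replace it (hypothetical)."""


# Two throwaway modules: `core` defines the class, `reader` imported it by name.
core = types.ModuleType("core")
core.Frame = Base

reader = types.ModuleType("reader")
reader.Frame = core.Frame  # what `from core import Frame` does inside `reader`


def read():
    # Models a reader function that looks up `Frame` in its own module namespace.
    return reader.Frame()


# Rebinding the name in `core` alone leaves `reader` pointing at the old class.
core.Frame = Patched
only_core_patched = type(read())

# Rebinding the name in `reader` as well is what changes the result.
reader.Frame = Patched
both_patched = type(read())

print(only_core_patched.__name__)  # Base
print(both_patched.__name__)       # Patched
```

Under this model, every module that holds its own reference to the class has to be rebound individually, which is consistent with the one-name-per-module pattern in the diff.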

diff --git a/lux/executor/PandasExecutor.py b/lux/executor/PandasExecutor.py
@@ -237,10 +237,20 @@ def execute_aggregate(vis: Vis, isFiltered=True):
                         assert (
                             len(list(vis.data[groupby_attr.attribute])) == N_unique_vals
                         ), f"Aggregated data missing values compared to original range of values of `{groupby_attr.attribute}`."
-            vis._vis_data = vis.data.dropna(subset=[measure_attr.attribute])
-            vis._vis_data = vis.data.sort_values(by=groupby_attr.attribute, ascending=True)
-            vis._vis_data = vis.data.reset_index()
-            vis._vis_data = vis.data.drop(columns="index")
+
+            vis._vis_data = vis._vis_data.dropna(subset=[measure_attr.attribute])
+            try:
+                vis._vis_data = vis._vis_data.sort_values(by=groupby_attr.attribute, ascending=True)
+            except TypeError:
+                warnings.warn(
+                    f"\nLux detects that the attribute '{groupby_attr.attribute}' maybe contain mixed type."
+                    + f"\nTo visualize this attribute, you may want to convert the '{groupby_attr.attribute}' into a uniform type as follows:"
+                    + f"\n\tdf['{groupby_attr.attribute}'] = df['{groupby_attr.attribute}'].astype(str)"
+                )
+                vis._vis_data[groupby_attr.attribute] = vis._vis_data[groupby_attr.attribute].astype(str)
+                vis._vis_data = vis._vis_data.sort_values(by=groupby_attr.attribute, ascending=True)
+            vis._vis_data = vis._vis_data.reset_index()
+            vis._vis_data = vis._vis_data.drop(columns="index")
 
     @staticmethod
     def execute_binning(vis: Vis):
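
Note on the hunk above: the new `try`/`except TypeError` branch handles a group-by column whose values cannot be compared with each other, by warning, casting the column to `str`, and sorting again. A plain-Python analogue of that fallback is sketched below. It sorts a list rather than a DataFrame, and the helper name `sort_with_str_fallback` and its warning text are illustrative assumptions, not Lux code.

```python
import warnings


def sort_with_str_fallback(values, label="column"):
    """Sort values; if they are not mutually comparable, sort their string forms."""
    try:
        return sorted(values)
    except TypeError:
        # Comparing e.g. int with str raises TypeError in Python 3.
        warnings.warn(
            f"'{label}' may contain mixed types; sorting its values as strings instead."
        )
        return sorted(str(v) for v in values)


print(sort_with_str_fallback([2, 1, 3]))  # [1, 2, 3]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mixed = sort_with_str_fallback([3, "a", 1], label="Brand")

print(mixed)        # ['1', '3', 'a']
print(len(caught))  # 1
```

As in the diff, the fallback changes the values to strings rather than dropping the offending rows, so the result has a different element type than the input.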

diff --git a/lux/vis/VisList.py b/lux/vis/VisList.py
@@ -233,18 +233,22 @@ def sort(self, remove_invalid=True, descending=True):
         # remove the items that have invalid (-1) score
         if remove_invalid:
             self._collection = list(filter(lambda x: x.score != -1, self._collection))
+        if lux.config.sort == "none":
+            return
+        elif lux.config.sort == "ascending":
+            descending = False
+        elif lux.config.sort == "descending":
+            descending = True
         # sort in-place by “score” by default if available, otherwise user-specified field to sort by
         self._collection.sort(key=lambda x: x.score, reverse=descending)
 
-    def topK(self, k):
-        # sort and truncate list to first K items
-        self.sort(remove_invalid=True)
-        return VisList(self._collection[:k])
-
-    def bottomK(self, k):
-        # sort and truncate list to first K items
-        self.sort(descending=False, remove_invalid=True)
-        return VisList(self._collection[:k])
+    def showK(self):
+        k = lux.config.topk
+        if k == False:
+            return self
+        elif isinstance(k, int):
+            k = abs(k)
+            return VisList(self._collection[:k])
 
     def normalize_score(self, invert_order=False):
         max_score = max(list(self.get("score")))
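
Note on the hunk above: `topK(k)` and `bottomK(k)` are replaced by a config-driven `sort()` followed by `showK()`, so the sort order and the cutoff come from `lux.config.sort` and `lux.config.topk` instead of call-site arguments. The sketch below models that control flow on a plain list. The `Config` and `Item` classes and their default values are hypothetical stand-ins for `lux.config` and `Vis`; the real defaults are not shown in this diff.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Config:
    # Hypothetical stand-in for lux.config; these defaults are assumptions.
    sort: str = "descending"      # "none", "ascending", or "descending"
    topk: Union[int, bool] = 15   # False disables truncation


@dataclass
class Item:
    name: str
    score: float


def sort_items(items: List[Item], config: Config, remove_invalid: bool = True) -> List[Item]:
    """Drop invalid (-1) scores, then order by score according to config.sort."""
    if remove_invalid:
        items = [i for i in items if i.score != -1]
    if config.sort == "none":
        return items  # keep the incoming order
    # Any value other than "ascending" falls back to descending order.
    return sorted(items, key=lambda i: i.score, reverse=(config.sort != "ascending"))


def show_k(items: List[Item], config: Config) -> List[Item]:
    """Truncate to the first |topk| items unless topk is False."""
    k = config.topk
    # The diff tests `k == False`, so topk=0 also disables truncation there;
    # this sketch uses an identity check instead.
    if k is False:
        return items
    return items[: abs(k)]


items = [Item("a", 0.2), Item("b", -1), Item("c", 0.9), Item("d", 0.5)]
cfg = Config(sort="descending", topk=2)
top = show_k(sort_items(items, cfg), cfg)
print([i.name for i in top])  # ['c', 'd']
```

Because truncation takes the first items of whatever order is current, `sort()` has to run before `showK()` for the cutoff to mean "top K", which matches the call order used in the action files in this commit.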

diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -2,4 +2,5 @@ pytest>=5.3.1
 pytest-cov>=2.8.1
 Sphinx>=3.0.2
 sphinx-rtd-theme>=0.4.3
+xlrd
 black
diff --git a/tests/test_config.py b/tests/test_config.py
@@ -29,7 +29,8 @@ def random_categorical(ldf):
         vlist = VisList(intent, ldf)
         for vis in vlist:
             vis.score = 10
-        vlist = vlist.topK(15)
+        vlist.sort()
+        vlist = vlist.showK()
         return {
             "action": "bars",
             "description": "Random list of Bar charts",
@@ -106,7 +107,8 @@ def random_categorical(ldf):
         vlist = VisList(intent, ldf)
         for vis in vlist:
             vis.score = 10
-        vlist = vlist.topK(15)
+        vlist.sort()
+        vlist = vlist.showK()
         return {
             "action": "bars",
             "description": "Random list of Bar charts",

diff --git a/tests/test_pandas_coverage.py b/tests/test_pandas_coverage.py
@@ -15,6 +15,8 @@
 from .context import lux
 import pytest
 import pandas as pd
+import numpy as np
+import warnings
 
 ###################
 # DataFrame Tests #
@@ -605,7 +607,7 @@ def test_value_counts(global_var):
     assert df.cardinality is not None
     series = df["Weight"]
     series.value_counts()
-    assert isinstance(series, lux.core.series.LuxSeries), "Derived series is type LuxSeries."
+    assert type(df["Brand"].value_counts()) == lux.core.series.LuxSeries
     assert df["Weight"]._metadata == [
         "_intent",
         "data_type",
@@ -677,4 +679,4 @@ def test_read_sas(global_var):
     df = pd.read_sas(url, format="sas7bdat")
     df._repr_html_()
     assert list(df.recommendation.keys()) == ["Correlation", "Distribution", "Temporal"]
-    assert len(df.data_type) == 6
+    assert len(df.data_type) == 6
diff --git a/tests/test_series.py b/tests/test_series.py
@@ -50,4 +50,4 @@ def test_print_dtypes(global_var):
     df = pytest.college_df
     with warnings.catch_warnings(record=True) as w:
         print(df.dtypes)
-        assert len(w) == 0, "Warning displayed when printing dtypes"
+        assert len(w) == 0, "Warning displayed when printing dtypes"