Updating sql-engine after merge with the travis build fix (#213)
* Similarity as a default action (#182)

* similarity formatting fixed

* added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case)

* filter and similarity together

* filter and similarity together

* remove filter

* black line length

* file reorg and clean; change sim metric

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* bump numpy min version for travis

* Special character issue (#184)

* rename col

* broken

* fixed period replacement bug

* add tests

* refine tests

* refine tests

* remove cols

* fix tests

* add agg

* fixed tests

* clean up PR

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Colored bar interestingness bug (#189)
* rewrote chi2 contingency with pd.crosstab
* catching KeyError issue with chi2 contingency
* padding interestingness with warning instead of error
* interestingness now reuses ndim and nmsr computed in Compiler
* bug fix for parser with int values
* improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses)

* Add sampling parameters as a global config (#192)

* update export tutorial to add explanation for standalone argument

* minor fixes and remove cell output in notebooks

* added contributing doc

* fix bugs and uncomment some tests

* remove raise warning

* remove unnecessary import

* split up rename test into two parts

* fix setting warning, fix data_type bugs and add relevant tests

* remove ordinal data type

* add test for small dataframe resetting index

* add loc and iloc tests

* fix attribute access directly to dataframe

* add small changes to code

* added test for qcut and cut

* add check if dtype is Interval

* added qcut test

* fix Record KeyError

* add tests

* take care of reset_index case

* small edits

* add data_model to column_group Clause

* small edits for row_group

* fixes to row group

* add config for start and cap for samples

* finish sampling config and tests

* black formatting

* add documentation for sampling config

* remove small added issues

* minor changes to docs

* implement heatmap flag and add tests

* black formatting and documentation edits

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Coalesce all data_type attributes of frame into one (#185)

* coalesce data_types into data_type_lookup

* black reformat

* changed to better variable names

* lux not defined error

* fixed

* black format

* Update CONTRIBUTING.md

* Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191)

* Moved Executor Parameters to Global Config

* Black formatting

* Moved table_name parameter to frame.py. Removed executor_type parameter

executor_type parameter no longer necessary to maintain

* Fixed reference to table_name parameter

table_name is now a parameter within frame.py

* Adjusted Functions to Set SQL Connection

Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe

* Update SQLExecutor name parameter

* Fix Executor Reference

Update current_vis() to reference lux.config.executor

* Update frame.py

* Moved set functions to global config

* Fixed Index Issue in Pandas Executor

Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate()

* Added tests for set_index functions

* Black formatting

* Update Pandas Executor to handle NA values

Re-added missing dropna parameter within execute_aggregate() groupby function call

* Updated Pandas Coverage Tests

Commented out set_index case which has not been addressed yet

* Black Formatting

* Update to Pandas Executor Index Handling

Cleaned up how execute_aggregate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns.

Created separate test function for when user specifies an index in read_csv.

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Initialize Config once only during __init__ (#194)

* basic matplotlib chart example

* migrate register default action to init

* config class

* move actions

* fixed tests

* changes

* alright

* fix plot_config

* black reformat

* black reformat

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>

* Update README.md

* Series Bugfix for describe and convert_dtypes (#197)

* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* Update Lux Docs (#195)

* add black to travis

* reformat all code and adjust test

* remove .idea

* fix contributing doc

* small change in contributing

* update

* reformat, update command to fix version

* remove dev dependencies

* first pass -- inline comments

* _config/config.py

* delete test notebook

* action

* line length 105

* executor

* interestingness

* processor

* vislib

* tests, travis, CONTRIBUTING

* .format() changed

* replace tabs with escape chars

* update using black

* more rewrites and merges into single line

* update pyproject.toml and makefile

* coalesce data_types into data_type_lookup

* black reformat

* changed to better variable names

* lux not defined error

* fixed

* black format

* config doc updated

* fix link for executor

* more links

* fixed overview

* more links fixed

* pandas methods no longer included

* updates to some docstrings

* black reformat

* minor fixes

* minor fix

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Supporting dataframe with integer columns  (#203)

* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* various fixes to support int columns

* fixed merge conflict issues. vis.data shows None DF.

* Override Pandas DataFrames created from I/O pandas operations (#207)

* update export tutorial to add explanation for standalone argument

* minor fixes and remove cell output in notebooks

* added contributing doc

* fix bugs and uncomment some tests

* remove raise warning

* remove unnecessary import

* split up rename test into two parts

* fix setting warning, fix data_type bugs and add relevant tests

* remove ordinal data type

* add test for small dataframe resetting index

* add loc and iloc tests

* fix attribute access directly to dataframe

* add small changes to code

* added test for qcut and cut

* add check if dtype is Interval

* added qcut test

* fix Record KeyError

* add tests

* take care of reset_index case

* small edits

* add data_model to column_group Clause

* small edits for row_group

* fixes to row group

* add config for start and cap for samples

* finish sampling config and tests

* black formatting

* add documentation for sampling config

* remove small added issues

* minor changes to docs

* implement heatmap flag and add tests

* black formatting and documentation edits

* add pd.io equalities for DataFrames

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Merge master into sql-engine + minor merge conflict fixes

* Removing the PYNB

* Cleaning up obsolete code

* Configuration for topk and sort order (#206)

* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* various fixes to support int columns

* skip series vis for df.iterrows series element

* config setting for modifying top K and sorting

* note about regenerated config

* Version lock for jupyter-client (#211)

* move to single requirements-dev without lux-widget install manually

* pin jedi version

* pin jupyter-client version

* add back old travis and requirement-dev

* Mixed dtype issue (#205)

* coalesce data_types into data_type_lookup

* merge fixed

* merge conflicts

* add warning and suggestion on how to fix

* formatting for warnings version

* change to internal data

* legibility update

* test added

* update test

* test updated

* xlrd in dev reqs

* black

* update link

* changes to test logic, minor string format for warning

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Fixes issue where value_counts was not returning LuxSeries (#210)

* add series equality and value counts test

* black formatting

* fix old value counts test instead

* minor fix

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* bump version

* update README

Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Kunal Agarwal <32151899+westernguy2@users.noreply.github.com>
Co-authored-by: jinimukh <46768380+jinimukh@users.noreply.github.com>
Co-authored-by: thyneb19 <thyneboonmark@berkeley.edu>
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>
9 people committed Jan 10, 2021
1 parent 289f670 commit af0e742
Showing 21 changed files with 86 additions and 39 deletions.
12 changes: 2 additions & 10 deletions .travis.yml
@@ -4,18 +4,10 @@ python:
services:
- postgresql
install:
- pip install jupyter-client==6.1.6
- pip install -r requirements.txt
- pip install -r requirements-dev.txt
- pip install git+https://github.com/lux-org/lux-widget
#- npm i lux-widget
before_script:
- psql -c "ALTER USER postgres WITH PASSWORD 'lux';" -U postgres
- psql -c "ALTER USER postgres WITH SUPERUSER;" -U postgres
- psql -c "ALTER DATABASE postgres OWNER TO travis;"
- psql -c "DROP schema public cascade;" -U postgres
- psql -c "CREATE schema public;" -U postgres
# - psql -c "CREATE DATABASE postgres;" -U postgres
# command to upload data to test environment SQL database and run tests
# command to run tests
script:
- python lux/data/upload_car_data.py
- black --target-version py37 --line-length 105 --check .
3 changes: 1 addition & 2 deletions README.md
@@ -157,8 +157,7 @@ To use Lux in [Jupyter Lab](https://github.com/jupyterlab/jupyterlab), activate
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget
```

Note that JupyterLab and VSCode is supported only for lux-widget version >=0.1.2, if you have an earlier version, please upgrade to the latest version of [lux-widget](https://pypi.org/project/lux-widget/). Lux currently only works with the Chrome browser.
Lux is only compatible with Jupyter Lab version 2.2.9 and below. Support for the recent [JupyterLab 3](https://blog.jupyter.org/jupyterlab-3-0-is-out-4f58385e25bb) will come soon. Note that JupyterLab and VSCode is supported only for lux-widget version >=0.1.2, if you have an earlier version, please upgrade to the latest version of [lux-widget](https://pypi.org/project/lux-widget/). Lux currently only works with the Chrome browser.

If you encounter issues with the installation, please refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips) to troubleshoot the installation. Follow [these instructions](https://lux-api.readthedocs.io/en/latest/source/getting_started/installation.html#manual-installation-dev-setup) to set up Lux for development purposes.

3 changes: 1 addition & 2 deletions doc/source/reference/gen/lux.vis.VisList.VisList.rst
@@ -14,7 +14,6 @@ lux.vis.VisList.VisList
.. autosummary::

~VisList.__init__
~VisList.bottomK
~VisList.get
~VisList.map
~VisList.normalize_score
@@ -23,8 +22,8 @@ lux.vis.VisList.VisList
~VisList.remove_index
~VisList.set
~VisList.set_intent
~VisList.showK
~VisList.sort
~VisList.topK



@@ -19,6 +19,7 @@ lux.vislib.altair.AltairChart.AltairChart
~AltairChart.apply_default_config
~AltairChart.encode_color
~AltairChart.initialize_chart
~AltairChart.sanitize_dataframe



@@ -20,6 +20,7 @@ lux.vislib.altair.BarChart.BarChart
~BarChart.apply_default_config
~BarChart.encode_color
~BarChart.initialize_chart
~BarChart.sanitize_dataframe



@@ -19,6 +19,7 @@ lux.vislib.altair.Histogram.Histogram
~Histogram.apply_default_config
~Histogram.encode_color
~Histogram.initialize_chart
~Histogram.sanitize_dataframe



@@ -19,6 +19,7 @@ lux.vislib.altair.LineChart.LineChart
~LineChart.apply_default_config
~LineChart.encode_color
~LineChart.initialize_chart
~LineChart.sanitize_dataframe



@@ -19,6 +19,7 @@ lux.vislib.altair.ScatterChart.ScatterChart
~ScatterChart.apply_default_config
~ScatterChart.encode_color
~ScatterChart.initialize_chart
~ScatterChart.sanitize_dataframe



3 changes: 2 additions & 1 deletion lux/_config/config.py
@@ -3,7 +3,8 @@
For more resources, see https://github.com/pandas-dev/pandas/blob/master/pandas/_config
"""
from collections import namedtuple
from typing import Any, Callable, Dict, Iterable, List, Optional
from typing import Any, Callable, Dict, Iterable, List, Optional, Union
import lux
import warnings
import lux

2 changes: 1 addition & 1 deletion lux/_version.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
# coding: utf-8

version_info = (0, 2, 1, 2)
version_info = (0, 2, 2)
__version__ = ".".join(map(str, version_info))
3 changes: 2 additions & 1 deletion lux/action/correlation.py
@@ -76,7 +76,8 @@ def correlation(ldf: LuxDataFrame, ignore_transpose: bool = True):
if ignore_rec_flag:
recommendation["collection"] = []
return recommendation
vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
recommendation["collection"] = vlist
return recommendation

3 changes: 2 additions & 1 deletion lux/action/enhance.py
@@ -66,6 +66,7 @@ def enhance(ldf):
for vis in vlist:
vis.score = interestingness(vis, ldf)

vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
recommendation["collection"] = vlist
return recommendation
1 change: 1 addition & 0 deletions lux/action/generalize.py
@@ -93,5 +93,6 @@ def generalize(ldf):

vlist.remove_duplicates()
vlist.sort(remove_invalid=True)
vlist._collection = list(filter(lambda x: x.score != -1, vlist._collection))
recommendation["collection"] = vlist
return recommendation
1 change: 0 additions & 1 deletion lux/action/univariate.py
@@ -82,7 +82,6 @@ def univariate(ldf, *args):
vlist = VisList(intent, ldf)
for vis in vlist:
vis.score = interestingness(vis, ldf)
# vlist = vlist.topK(15) # Basic visualizations should not be capped
vlist.sort()
recommendation["collection"] = vlist
return recommendation
34 changes: 32 additions & 2 deletions lux/core/__init__.py
@@ -26,8 +26,38 @@ def setOption(overridePandas=True):
if overridePandas:
pd.DataFrame = (
pd.io.json._json.DataFrame
) = pd.io.parsers.DataFrame = pd.core.frame.DataFrame = LuxDataFrame
pd.Series = LuxSeries
) = (
pd.io.parsers.DataFrame
) = (
pd.io.sql.DataFrame
) = (
pd.io.excel.DataFrame
) = (
pd.io.formats.DataFrame
) = (
pd.io.sas.DataFrame
) = (
pd.io.clipboards.DataFrame
) = (
pd.io.common.DataFrame
) = (
pd.io.feather_format.DataFrame
) = (
pd.io.gbq.DataFrame
) = (
pd.io.html.DataFrame
) = (
pd.io.orc.DataFrame
) = (
pd.io.parquet.DataFrame
) = (
pd.io.pickle.DataFrame
) = (
pd.io.pytables.DataFrame
) = (
pd.io.spss.DataFrame
) = pd.io.stata.DataFrame = pd.io.api.DataFrame = pd.core.frame.DataFrame = LuxDataFrame
pd.Series = pd.core.series.Series = LuxSeries
else:
pd.DataFrame = pd.io.parsers.DataFrame = pd.core.frame.DataFrame = originalDF
pd.Series = originalSeries
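
The long assignment chain above rebinds `DataFrame` in every `pd.io.*` module that holds its own reference to the class. A plausible reading (not stated in the commit itself) is that a name imported into a module is a separate binding, so rebinding only `pd.DataFrame` would leave the I/O readers constructing plain pandas frames. The sketch below illustrates that Python behaviour with hypothetical stand-in modules and classes (`core`, `reader`, `Frame`, `FancyFrame`); none of these names come from pandas or Lux.

```python
import types


class Frame:
    """Stand-in for pandas.DataFrame (hypothetical)."""


class FancyFrame(Frame):
    """Stand-in for LuxDataFrame (hypothetical)."""


# Hypothetical modules: `core` defines the class, while `reader` holds its
# own reference to it, as it would after `from core import Frame`.
core = types.ModuleType("core")
core.Frame = Frame
reader = types.ModuleType("reader")
reader.Frame = core.Frame


def read():
    # Resolves the name on `reader`, the way a function defined inside
    # that module would resolve its module-level global.
    return reader.Frame()


core.Frame = FancyFrame    # rebinding the defining module only...
before = read()            # ...still constructs the original class
reader.Frame = FancyFrame  # rebinding the importing module as well
after = read()

print(type(before).__name__, type(after).__name__)  # prints "Frame FancyFrame"
```

Under that reading, each additional `pd.io.<module>.DataFrame` target in the chain plays the role of `reader.Frame` here.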
18 changes: 14 additions & 4 deletions lux/executor/PandasExecutor.py
@@ -237,10 +237,20 @@ def execute_aggregate(vis: Vis, isFiltered=True):
assert (
len(list(vis.data[groupby_attr.attribute])) == N_unique_vals
), f"Aggregated data missing values compared to original range of values of `{groupby_attr.attribute}`."
vis._vis_data = vis.data.dropna(subset=[measure_attr.attribute])
vis._vis_data = vis.data.sort_values(by=groupby_attr.attribute, ascending=True)
vis._vis_data = vis.data.reset_index()
vis._vis_data = vis.data.drop(columns="index")

vis._vis_data = vis._vis_data.dropna(subset=[measure_attr.attribute])
try:
vis._vis_data = vis._vis_data.sort_values(by=groupby_attr.attribute, ascending=True)
except TypeError:
warnings.warn(
f"\nLux detects that the attribute '{groupby_attr.attribute}' maybe contain mixed type."
+ f"\nTo visualize this attribute, you may want to convert the '{groupby_attr.attribute}' into a uniform type as follows:"
+ f"\n\tdf['{groupby_attr.attribute}'] = df['{groupby_attr.attribute}'].astype(str)"
)
vis._vis_data[groupby_attr.attribute] = vis._vis_data[groupby_attr.attribute].astype(str)
vis._vis_data = vis._vis_data.sort_values(by=groupby_attr.attribute, ascending=True)
vis._vis_data = vis._vis_data.reset_index()
vis._vis_data = vis._vis_data.drop(columns="index")

@staticmethod
def execute_binning(vis: Vis):
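
The hunk above wraps `sort_values` in a `try`/`except TypeError` and falls back to casting the group-by column to `str` when its values cannot be compared with each other. A minimal standalone sketch of that pattern follows; the helper name `sort_with_mixed_type_fallback` and the sample data are illustrative assumptions, not part of Lux.

```python
import warnings

import pandas as pd


def sort_with_mixed_type_fallback(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Sort `df` by `column`; if the values are not mutually comparable,
    warn, cast the column to str, and sort again. (Hypothetical helper.)"""
    try:
        ordered = df.sort_values(by=column, ascending=True)
    except TypeError:
        warnings.warn(
            f"Column '{column}' appears to contain mixed types; "
            "casting to str before sorting."
        )
        ordered = df.copy()
        ordered[column] = ordered[column].astype(str)
        ordered = ordered.sort_values(by=column, ascending=True)
    # Discard the old index so rows are numbered in sorted order.
    return ordered.reset_index(drop=True)


# An object column mixing ints and strings cannot be ordered directly.
mixed = pd.DataFrame({"key": [3, "b", 1, "a"], "value": [10, 20, 30, 40]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sort_with_mixed_type_fallback(mixed, "key")

print(list(result["key"]))
```

Note that the cast changes the ordering semantics: after `astype(str)` the values are compared lexicographically, so `"10"` would sort before `"2"`.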
22 changes: 13 additions & 9 deletions lux/vis/VisList.py
@@ -233,18 +233,22 @@ def sort(self, remove_invalid=True, descending=True):
# remove the items that have invalid (-1) score
if remove_invalid:
self._collection = list(filter(lambda x: x.score != -1, self._collection))
if lux.config.sort == "none":
return
elif lux.config.sort == "ascending":
descending = False
elif lux.config.sort == "descending":
descending = True
# sort in-place by “score” by default if available, otherwise user-specified field to sort by
self._collection.sort(key=lambda x: x.score, reverse=descending)

def topK(self, k):
# sort and truncate list to first K items
self.sort(remove_invalid=True)
return VisList(self._collection[:k])

def bottomK(self, k):
# sort and truncate list to first K items
self.sort(descending=False, remove_invalid=True)
return VisList(self._collection[:k])
def showK(self):
k = lux.config.topk
if k == False:
return self
elif isinstance(k, int):
k = abs(k)
return VisList(self._collection[:k])

def normalize_score(self, invert_order=False):
max_score = max(list(self.get("score")))
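
The hunk above replaces the hard-coded `topK(15)` and `bottomK` with a `sort()` that honours `lux.config.sort` and a `showK()` that honours `lux.config.topk`. The sketch below mimics that flow on a plain list of scores. The `Config` dataclass and `sort_and_truncate` function are hypothetical stand-ins, and the default of 15 is assumed from the previous `topK(15)` calls rather than taken from the config source.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Config:
    """Hypothetical stand-in for the two settings read from lux.config."""

    topk: Union[int, bool] = 15  # assumed default, mirroring the old topK(15)
    sort: str = "descending"     # "none", "ascending", or "descending"


def sort_and_truncate(scores: List[float], config: Config) -> List[float]:
    # Drop invalid entries, which the diff marks with a score of -1.
    valid = [s for s in scores if s != -1]
    # "none" keeps the existing order; otherwise sort by score.
    if config.sort != "none":
        valid.sort(reverse=(config.sort == "descending"))
    # This sketch treats only the literal False as "no cap"; the diff
    # compares with `k == False`, so there 0 also disables the cap.
    if config.topk is False:
        return valid
    # A negative k is folded to its absolute value, as in showK().
    return valid[: abs(config.topk)]


scores = [0.2, -1, 0.9, 0.5]
print(sort_and_truncate(scores, Config(topk=2)))
print(sort_and_truncate(scores, Config(topk=False, sort="ascending")))
print(sort_and_truncate(scores, Config(topk=-3, sort="none")))
```

Keeping the sort and the cap as separate steps is what lets `univariate` call `sort()` alone, while `correlation` and `enhance` call `sort()` followed by `showK()`.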
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -2,4 +2,5 @@ pytest>=5.3.1
pytest-cov>=2.8.1
Sphinx>=3.0.2
sphinx-rtd-theme>=0.4.3
xlrd
black
6 changes: 4 additions & 2 deletions tests/test_config.py
@@ -29,7 +29,8 @@ def random_categorical(ldf):
vlist = VisList(intent, ldf)
for vis in vlist:
vis.score = 10
vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
return {
"action": "bars",
"description": "Random list of Bar charts",
@@ -106,7 +107,8 @@ def random_categorical(ldf):
vlist = VisList(intent, ldf)
for vis in vlist:
vis.score = 10
vlist = vlist.topK(15)
vlist.sort()
vlist = vlist.showK()
return {
"action": "bars",
"description": "Random list of Bar charts",
6 changes: 4 additions & 2 deletions tests/test_pandas_coverage.py
@@ -15,6 +15,8 @@
from .context import lux
import pytest
import pandas as pd
import numpy as np
import warnings

###################
# DataFrame Tests #
@@ -605,7 +607,7 @@ def test_value_counts(global_var):
assert df.cardinality is not None
series = df["Weight"]
series.value_counts()
assert isinstance(series, lux.core.series.LuxSeries), "Derived series is type LuxSeries."
assert type(df["Brand"].value_counts()) == lux.core.series.LuxSeries
assert df["Weight"]._metadata == [
"_intent",
"data_type",
@@ -677,4 +679,4 @@ def test_read_sas(global_var):
df = pd.read_sas(url, format="sas7bdat")
df._repr_html_()
assert list(df.recommendation.keys()) == ["Correlation", "Distribution", "Temporal"]
assert len(df.data_type) == 6
assert len(df.data_type) == 6
2 changes: 1 addition & 1 deletion tests/test_series.py
@@ -50,4 +50,4 @@ def test_print_dtypes(global_var):
df = pytest.college_df
with warnings.catch_warnings(record=True) as w:
print(df.dtypes)
assert len(w) == 0, "Warning displayed when printing dtypes"
assert len(w) == 0, "Warning displayed when printing dtypes"