diff --git a/README.md b/README.md index 86746406..cc854e62 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ Documentation Status - + Slack @@ -147,9 +147,9 @@ conda install -c conda-forge lux-api Both the PyPI and conda installations include the Lux Jupyter widget frontend, [lux-widget](https://pypi.org/project/lux-widget/). -## Setup in Jupyter Notebook, VSCode +## Setup in Jupyter Notebook, VSCode, JupyterHub -To use Lux in [Jupyter notebook](https://github.com/jupyter/notebook) or [VSCode](https://code.visualstudio.com/docs/python/jupyter-support), activate the notebook extension: +To use Lux with any Jupyter notebook-based frontends (e.g., [Jupyter notebook](https://github.com/jupyter/notebook), [JupyterHub](https://github.com/jupyterhub/jupyterhub), or [VSCode](https://code.visualstudio.com/docs/python/jupyter-support)), activate the notebook extension: ```bash jupyter nbextension install --py luxwidget @@ -179,5 +179,5 @@ Other additional resources: - Sign up for the early-user [mailing list](https://forms.gle/XKv3ejrshkCi3FJE6) to stay tuned for upcoming releases, updates, or user studies. - Visit [ReadTheDoc](https://lux-api.readthedocs.io/en/latest/) for more detailed documentation. - Try out these hands-on [exercises](https://mybinder.org/v2/gh/lux-org/lux-binder/master?urlpath=tree/exercise) or [tutorials](https://mybinder.org/v2/gh/lux-org/lux-binder/master?urlpath=tree/tutorial) on [Binder](https://mybinder.org/v2/gh/lux-org/lux-binder/master). Or clone and run [lux-binder](https://github.com/lux-org/lux-binder) locally. -- Join our community [Slack](https://lux-project.slack.com/join/shared_invite/zt-lilu4e87-TM4EDTq9HWzlDRycFsrkLg) to discuss and ask questions. +- Join our community [Slack](https://communityinviter.com/apps/lux-project/lux) to discuss and ask questions. - Report any bugs, issues, or requests through [Github Issues](https://github.com/lux-org/lux/issues). 
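The doc changes that follow center on Lux's `register_action` / `remove_action` / `actions.get` API. As a rough, stdlib-only sketch of the registry pattern those docs describe — the `ActionRegistry` class below is hypothetical, for illustration only, not Lux's actual implementation:

```python
# Hypothetical sketch of a recommendation-action registry, illustrating the
# register_action / remove_action lifecycle documented for lux.config.
# This is NOT Lux code; names and behavior are simplified for illustration.

class ActionRegistry:
    def __init__(self):
        self._actions = {}

    def register_action(self, name, action, display_condition=None):
        # `action` computes recommendations for a dataframe; the optional
        # `display_condition` predicate decides whether the action applies.
        self._actions[name] = {"action": action, "condition": display_condition}

    def remove_action(self, name):
        # Drop the action so it no longer displays with the dataframe.
        self._actions.pop(name, None)

    def get(self, name):
        # Look up a single registered action by its name.
        return self._actions.get(name)

    def applicable(self, df):
        # Names of registered actions whose display condition passes for df.
        return [
            name
            for name, entry in self._actions.items()
            if entry["condition"] is None or entry["condition"](df)
        ]


registry = ActionRegistry()
registry.register_action(
    "Compare industrialized",
    lambda df: {"action": "Compare", "collection": []},
    lambda df: "G10" in df,  # only show for tables with a G10 column
)
print(registry.applicable({"G10": [True, False]}))  # ['Compare industrialized']
registry.remove_action("Compare industrialized")
print(registry.applicable({"G10": [True, False]}))  # []
```

Lux's real `lux.config.register_action` additionally accepts extra arguments that are forwarded to the action function; the sketch only captures the register/lookup/remove lifecycle.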
diff --git a/doc/source/advanced/custom.rst b/doc/source/advanced/custom.rst index 59836ea6..cd8bf8c2 100644 --- a/doc/source/advanced/custom.rst +++ b/doc/source/advanced/custom.rst @@ -11,37 +11,51 @@ In this tutorial, we will look at how you can register custom recommendation act df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/hpi.csv") df["G10"] = df["Country"].isin(["Belgium","Canada","France","Germany","Italy","Japan","Netherlands","United Kingdom","Switzerland","Sweden","United States"]) lux.config.default_display = "lux" + +As we can see, Lux registers a set of default recommendations to display to users, such as Correlation, Distribution, etc. + +.. code-block:: python + df -As we can see, Lux displays several recommendation actions, such as Correlation and Distributions, which is globally registered by default. +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-3.png?raw=true + :width: 700 + :align: center + :alt: Displays default actions after printing df. Registering Custom Actions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Let's define a custom function to generate the recommendations on the dataframe. In this example, we register a custom action called `G10` to generate a collection of visualizations that showcases numerical measures that differs significantly across `G10 `_ and non-G10 countries. In other words, we want to understand how the G10 and non-G10 countries differs based on the measures present in the dataframe. +Let's define a custom function to generate the recommendations on the dataframe. In this example, we register a custom action that showcases numerical measures that differ significantly across G10 and non-G10 countries. 
`G10 countries` are composed of the ten most industrialized countries in the world, so comparing G10 and non-G10 countries allows us to understand how industrialized and non-industrialized economies differ based on the measures present in the dataframe. Here, we first generate a VisList that looks at how various quantitative attributes break down between G10 and non-G10 countries. Then, we score and rank these visualizations by calculating the percentage difference in means across G10 vs. non-G10 countries. .. code-block:: python from lux.vis.VisList import VisList + # Create a VisList containing G10 with respect to all possible quantitative columns in the dataframe intent = [lux.Clause("?",data_type="quantitative"),lux.Clause("G10")] vlist = VisList(intent,df) - + for vis in vlist: - # Percentage Change Between G10 v.s. non-G10 countries + # Percentage Change Between G10 v.s. non-G10 countries a = vis.data.iloc[0,1] b = vis.data.iloc[1,1] vis.score = (b-a)/a - lux.config.topK = 15 - vlist = vlist.showK() + vlist.sort() + vlist.showK() + +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-0.png?raw=true + :width: 700 + :align: center + :alt: Custom VisList of G10 v.s. non-G10 countries -Let's define a custom function to generate the recommendations on the dataframe. In this example, we will use G10 to generate a VisList to calculate the percentage change of means Between G10 v.s. non-G10 countries. +To define a custom action, we simply wrap our earlier VisList example into a function. We can even use short texts and emojis as the title to display on the tabs for the custom recommendation. ..
code-block:: python def G10_mean_difference(ldf): - # Define a VisList of quantitative distribution between G10 and non-G10 countries + # Define a VisList of quantitative distribution between G10 and non-G10 countries intent = [lux.Clause("?",data_type="quantitative"),lux.Clause("G10")] vlist = VisList(intent,ldf) @@ -50,11 +64,13 @@ Let's define a custom function to generate the recommendations on the dataframe. a = vis.data.iloc[0,1] b = vis.data.iloc[1,1] vis.score = (b-a)/a - lux.config.topK = 15 - vlist = vlist.showK() - return {"action":"G10", "description": "Percentage Change of Means Between G10 v.s. non-G10 countries", "collection": vlist} + vlist.sort() + vlist.showK() + return {"action":"Compare 🏭🏦🌎", + "description": "Percentage Change of Means Between G10 v.s. non-G10 countries", + "collection": vlist} -In the code below, we define a display condition function to determine whether or not we want to generate recommendations for the custom action. In this example, we simply check if we are using the HPI dataset to generate recommendations for the custom action `G10`. +In the code below, we define a display condition function to determine whether or not we want to generate recommendations for the custom action. In this example, we simply check if we are using the HPI dataset to generate recommendations for the `Compare industrialized` action. .. code-block:: python @@ -68,13 +84,13 @@ In the code below, we define a display condition function to determine whether o except: return False -To register the `G10` action in Lux, we apply the `register_action` function, which takes a name and action as inputs, as well as a display condition and additional arguments as optional parameters. +To register the `Compare industrialized` action in Lux, we apply the :code:`register_action` function, which takes a name and action as inputs, as well as a display condition and additional arguments as optional parameters. .. 
code-block:: python - lux.config.register_action("G10", G10_mean_difference, is_G10_hpi_dataset) + lux.config.register_action("Compare industrialized", G10_mean_difference, is_G10_hpi_dataset) -After registering the action, the G10 recomendation action is automatically generated when we display the Lux dataframe again. +After registering the action, the custom action is automatically generated when we display the Lux dataframe again. .. code-block:: python df @@ -93,7 +109,7 @@ Since the registered action is globally defined, the G10 action is displayed whe df[df["GDPPerCapita"]>40000] -.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-1.png?raw=true +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-1-filtered.png?raw=true :width: 700 :align: center :alt: Displays countries with GDPPerCapita > 40000 to compare G10 results. @@ -103,17 +119,22 @@ As we can see, there is less of a distinction between G10 and non-G10 countrie Navigating the Action Manager ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can inspect a list of actions that are currently registered in the Lux Action Manager. The following code displays both default and user-defined actions. +You can inspect a list of actions that are currently registered in Lux's Action Manager. The following code displays both default and user-defined actions. .. code-block:: python lux.config.actions +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-5.png?raw=true + :width: 700 + :align: center + :alt: Retrieves a list of actions from Lux's action manager. + You can also get a single action attribute by calling this function with the action's name. .. code-block:: python - lux.config.actions.get("G10") + lux.config.actions.get("Compare industrialized") ..
image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-2.png?raw=true :width: 700 @@ -123,19 +144,20 @@ You can also get a single action attribute by calling this function with the act Removing Custom Actions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Let's say that we are no longer in looking at the `G10` action, the `remove_action` function allows you to remove from Lux's action manager an action with its id. The action will no longer display with the Lux dataframe. +Let's say that we are no longer interested in the `Compare industrialized` action. The `remove_action` function allows you to remove an action from Lux's action manager by its id. The action will no longer display with the Lux dataframe. .. code-block:: python - lux.config.remove_action("G10") + lux.config.remove_action("Compare industrialized") + +After removing the action, when we print the dataframe again, the `Compare industrialized` action is no longer displayed. -After removing the action, when we print the dataframe again, the `G10` action is no longer displayed. .. code-block:: python df -.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-4.png?raw=true +.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/custom-3.png?raw=true :width: 700 :align: center :alt: Demonstrates removing custom action from Lux Action Manager. \ No newline at end of file diff --git a/doc/source/advanced/executor.rst b/doc/source/advanced/executor.rst index 93dadd84..e5347e98 100644 --- a/doc/source/advanced/executor.rst +++ b/doc/source/advanced/executor.rst @@ -11,12 +11,12 @@ Please refer to :mod:`lux.executor.Executor`, if you are interested in extending SQL Executor ============= -Lux extends its visualization exploration operations to data within SQL databases. By using the SQL Executor, users can specify a SQL database to connect a Lux Dataframe for generating all the visualizations recommended in Lux. 
+Lux extends its visualization exploration operations to data within SQL databases. By using the SQL Executor, users can specify a SQL database to connect a LuxSQLTable for generating all the visualizations recommended in Lux. Connecting Lux to a Database ---------------------------- -Before Lux can operate on data within a Postgresql database, users have to connect their Lux Dataframe to their database. +Before Lux can operate on data within a Postgresql database, users have to connect their LuxSQLTable to their database. To do this, users first need to specify a connection to their SQL database. This can be done using the psycopg2 package's functionality. .. code-block:: python @@ -24,28 +24,44 @@ To do this, users first need to specify a connection to their SQL database. This import psycopg2 connection = psycopg2.connect("dbname=example_database user=example_user password=example_password") -Once this connection is created, users can connect their Lux Dataframe to the database using the Lux Dataframe's set_SQL_connection command. +Once this connection is created, users can connect Lux to the database using the set_SQL_connection command. .. code-block:: python - lux_df.set_SQL_connection(connection, "my_table") + lux.config.set_SQL_connection(connection) -When the set_SQL_connection function is called, Lux will then populate the Dataframe with all the metadata it needs to run its intent from the database table. +When the set_SQL_connection function is called, Lux will then populate the LuxSQLTable with all the metadata it needs to run its intent from the database table. + +Connecting a LuxSQLTable to a Table/View +---------------------------------------- + +LuxSQLTables can be connected to individual tables or views created within your Postgresql database. This can be done by specifying the table/view name in the constructor. + +.. 
code-block:: python + + sql_tbl = LuxSQLTable(table_name = "my_table") + +You can also connect a LuxSQLTable to a table/view by using the set_SQL_table function. + +.. code-block:: python + + sql_tbl = LuxSQLTable() + sql_tbl.set_SQL_table("my_table") Choosing an Executor -------------------------- Once a user has created a connection to their Postgresql database, they need to change Lux's execution engine so that the system can collect and process the data properly. -By default Lux uses the Pandas executor to process local data in the Lux Dataframe, but users need to use the SQL executor when their Lux Dataframe is connected to a database. -Users can specify the executor that a Lux Dataframe will use via the set_executor_type function as follows: +By default Lux uses the Pandas executor to process local data in the LuxDataframe, but users will use the SQL executor when their LuxSQLTable is connected to a database. +Users can specify the executor that Lux will use via the set_executor_type function as follows: .. code-block:: python lux_df.set_executor_type("SQL") -Once a Lux Dataframe has been connected to a Postgresql table and set to use the SQL Executor, users can take full advantage of Lux's visual exploration capabilities as-is. Users can set their intent to specify which variables they are most interested in and discover insightful visualizations from their database. +Once a LuxSQLTable has been connected to a Postgresql table and set to use the SQL Executor, users can take full advantage of Lux's visual exploration capabilities as-is. Users can set their intent to specify which variables they are most interested in and discover insightful visualizations from their database. SQL Executor Limitations -------------------------- -While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' Dataframe functions to manipulate the data. 
Since the Lux SQL Executor delegates most data processing to the Postgresql database, it does not pull in the entire dataset into the Lux Dataframe. As such there is no actual data within the Lux Dataframe to manipulate, only the relevant metadata required to for Lux to manage its intent. Thus, if users are interested in manipulating or querying their data, this needs to be done through SQL or an alternative RDBMS interface. \ No newline at end of file +While users can make full use of Lux's functionalities on data within a database table, they will not be able to use any of Pandas' Dataframe functions to manipulate the data in the LuxSQLTable object. Since the Lux SQL Executor delegates most data processing to the Postgresql database, it does not pull the entire dataset into the LuxSQLTable. As such, there is no actual data within the LuxSQLTable to manipulate, only the relevant metadata required for Lux to manage its intent. Thus, if users are interested in manipulating or querying their data, this needs to be done through SQL or an alternative RDBMS interface. \ No newline at end of file diff --git a/doc/source/guide/FAQ.rst b/doc/source/guide/FAQ.rst index 80095abb..cd5a0433 100644 --- a/doc/source/guide/FAQ.rst +++ b/doc/source/guide/FAQ.rst @@ -12,7 +12,7 @@ Note that you must perform :code:`import lux` before you load in or create the d What if my data is stored in a relational database? """""""""""""""""""""""""""""""""""""""""""""""""""""""" - Lux has `some limited support `__ for SQL (currently only tested for Postgres). We are actively working on extending Lux to databases. If you are interested in using this feature, please `contact us `_ for more information. + Lux has `some limited support `__ for SQL (currently only tested for Postgres). We are actively working on extending Lux to databases. If you are interested in using this feature, please `contact us `_ for more information. What do I do with date-related attributes in my dataset? 
"""""""""""""""""""""""""""""""""""""""""""""""""""""""" @@ -153,7 +153,7 @@ I'm not able to export my visualizations via the :code:`exported` property. I have an issue that is not addressed by any of the FAQs. """""""""""""""""""""""""""""""""""""""""""""""""""""""""" -Please submit a `Github Issue `__ or ask a question on `Slack `__. +Please submit a `Github Issue `__ or ask a question on `Slack `__. .. Not Currently Supported .. - What do I do if I want to change the data type of an attribute? diff --git a/lux/_config/config.py b/lux/_config/config.py index 12db034a..89c9a941 100644 --- a/lux/_config/config.py +++ b/lux/_config/config.py @@ -364,6 +364,8 @@ def set_executor_type(self, exe): self.SQLconnection = "" self.executor = PandasExecutor() + else: + raise ValueError("Executor type must be either 'Pandas' or 'SQL'") def warning_format(message, category, filename, lineno, file=None, line=None): diff --git a/lux/action/correlation.py b/lux/action/correlation.py index 6679b538..94b7c429 100644 --- a/lux/action/correlation.py +++ b/lux/action/correlation.py @@ -62,7 +62,7 @@ def correlation(ldf: LuxDataFrame, ignore_transpose: bool = True): } ignore_rec_flag = False # Doesn't make sense to compute correlation if less than 4 data values - if ldf.length < 5: + if ldf._length < 5: ignore_rec_flag = True # Then use the data populated in the vis list to compute score for vis in vlist: diff --git a/lux/action/univariate.py b/lux/action/univariate.py index 978c726f..24cd9e66 100644 --- a/lux/action/univariate.py +++ b/lux/action/univariate.py @@ -46,9 +46,7 @@ def univariate(ldf, *args): ignore_rec_flag = False if data_type_constraint == "quantitative": possible_attributes = [ - c - for c in ldf.columns - if ldf.data_type[c] == "quantitative" and ldf.cardinality[c] > 5 and c != "Number of Records" + c for c in ldf.columns if ldf.data_type[c] == "quantitative" and c != "Number of Records" ] intent = [lux.Clause(possible_attributes)] intent.extend(filter_specs) @@ 
-61,18 +59,16 @@ def univariate(ldf, *args): "long_description": f"Distribution displays univariate histogram distributions of all quantitative attributes{examples}. Visualizations are ranked from most to least skewed.", } # Doesn't make sense to generate a histogram if there is less than 5 datapoints (pre-aggregated) - if ldf.length < 5: + if ldf._length < 5: ignore_rec_flag = True elif data_type_constraint == "nominal": possible_attributes = [ - c - for c in ldf.columns - if ldf.data_type[c] == "nominal" and ldf.cardinality[c] > 5 and c != "Number of Records" + c for c in ldf.columns if ldf.data_type[c] == "nominal" and c != "Number of Records" ] examples = "" if len(possible_attributes) >= 1: examples = f" (e.g., {possible_attributes[0]})" - intent = [lux.Clause("?", data_type="nominal")] + intent = [lux.Clause(possible_attributes)] intent.extend(filter_specs) recommendation = { "action": "Occurrence", @@ -81,9 +77,7 @@ def univariate(ldf, *args): } elif data_type_constraint == "geographical": possible_attributes = [ - c - for c in ldf.columns - if ldf.data_type[c] == "geographical" and ldf.cardinality[c] > 5 and c != "Number of Records" + c for c in ldf.columns if ldf.data_type[c] == "geographical" and c != "Number of Records" ] examples = "" if len(possible_attributes) >= 1: @@ -104,7 +98,7 @@ def univariate(ldf, *args): "long_description": "Temporal displays line charts for all attributes related to datetimes in the dataframe.", } # Doesn't make sense to generate a line chart if there is less than 3 datapoints (pre-aggregated) - if ldf.length < 3: + if ldf._length < 3: ignore_rec_flag = True if ignore_rec_flag: recommendation["collection"] = [] diff --git a/lux/core/frame.py b/lux/core/frame.py index 4514ed4d..82e0d4cc 100644 --- a/lux/core/frame.py +++ b/lux/core/frame.py @@ -82,6 +82,7 @@ def __init__(self, *args, **kw): self._toggle_pandas_display = True self._message = Message() self._pandas_only = False + self._length = len(self) # Metadata 
self._data_type = {} self.unique_values = None @@ -337,10 +338,6 @@ def current_vis(self): def current_vis(self, current_vis: Dict): self._current_vis = current_vis - def __repr__(self): - # TODO: _repr_ gets called from _repr_html, need to get rid of this call - return "" - def _append_rec(self, rec_infolist, recommendations: Dict): if recommendations["collection"] is not None and len(recommendations["collection"]) > 0: rec_infolist.append(recommendations) @@ -537,7 +534,7 @@ def set_intent_on_click(self, change): self._widget.observe(self.remove_deleted_recs, names="deletedIndices") self._widget.observe(self.set_intent_on_click, names="selectedIntentIndex") - def _repr_html_(self): + def _ipython_display_(self): from IPython.display import display from IPython.display import clear_output import ipywidgets as widgets diff --git a/lux/core/series.py b/lux/core/series.py index 3c717642..2b902730 100644 --- a/lux/core/series.py +++ b/lux/core/series.py @@ -50,7 +50,6 @@ class LuxSeries(pd.Series): "_pandas_only", "pre_aggregated", "_type_override", - "name", ] _default_metadata = { @@ -106,7 +105,7 @@ def to_pandas(self) -> pd.Series: return lux.core.originalSeries(self, copy=False) - def __repr__(self): + def _ipython_display_(self): from IPython.display import display from IPython.display import clear_output import ipywidgets as widgets @@ -189,7 +188,6 @@ def on_button_clicked(b): ) warnings.warn(traceback.format_exc()) display(self.to_pandas()) - return "" @property def recommendation(self): diff --git a/lux/core/sqltable.py b/lux/core/sqltable.py index e51b8c09..07bab347 100644 --- a/lux/core/sqltable.py +++ b/lux/core/sqltable.py @@ -31,7 +31,7 @@ class LuxSQLTable(lux.LuxDataFrame): """ - A subclass of pd.DataFrame that supports all dataframe operations while housing other variables and functions for generating visual recommendations. + A subclass of Lux.LuxDataFrame that houses other variables and functions for generating visual recommendations. 
Does not support normal pandas functionality. """ # MUST register here for new properties!! @@ -63,15 +63,19 @@ def __init__(self, *args, table_name="", **kw): lux.config.executor = SQLExecutor() + self._length = 0 if table_name != "": self.set_SQL_table(table_name) warnings.formatwarning = lux.warning_format + def len(self): + return self._length + def set_SQL_table(self, t_name): # function that ties the Lux Dataframe to a SQL database table if self.table_name != "": warnings.warn( - f"\nThis dataframe is already tied to a database table. Please create a new Lux dataframe and connect it to your table '{t_name}'.", + f"\nThis LuxSQLTable is already tied to a database table. Please create a new Lux dataframe and connect it to your table '{t_name}'.", stacklevel=2, ) else: @@ -88,8 +92,8 @@ def set_SQL_table(self, t_name): stacklevel=2, ) - def _repr_html_(self): - from IPython.display import HTML, display + def _ipython_display_(self): + from IPython.display import HTML, Markdown, display from IPython.display import clear_output import ipywidgets as widgets @@ -130,11 +134,28 @@ def on_button_clicked(b): if b: self._toggle_pandas_display = not self._toggle_pandas_display clear_output() + + # create connection string to display + connect_str = self.table_name + connection_type = str(type(lux.config.SQLconnection)) + if "psycopg2.extensions.connection" in connection_type: + connection_dsn = lux.config.SQLconnection.get_dsn_parameters() + host_name = connection_dsn["host"] + host_port = connection_dsn["port"] + dbname = connection_dsn["dbname"] + connect_str = host_name + ":" + host_port + "/" + dbname + + elif "sqlalchemy.engine.base.Engine" in connection_type: + db_connection = str(lux.config.SQLconnection) + db_start = db_connection.index("@") + 1 + db_end = len(db_connection) - 1 + connect_str = db_connection[db_start:db_end] + if self._toggle_pandas_display: - notification = widgets.Label( - value="Preview of the database table: " + self.table_name + notification 
= "Here is a preview of the **{}** database table: **{}**".format( + self.table_name, connect_str ) - display(notification, self._sampled.display_pandas()) + display(Markdown(notification), self._sampled.display_pandas()) else: # b.layout.display = "none" display(self._widget) diff --git a/lux/executor/PandasExecutor.py b/lux/executor/PandasExecutor.py index e5534821..58c8335e 100644 --- a/lux/executor/PandasExecutor.py +++ b/lux/executor/PandasExecutor.py @@ -132,6 +132,7 @@ def execute_aggregate(vis: Vis, isFiltered=True): has_color = False groupby_attr = "" measure_attr = "" + attr_unique_vals = [] if x_attr.aggregation is None or y_attr.aggregation is None: return if y_attr.aggregation != "": @@ -143,7 +144,7 @@ def execute_aggregate(vis: Vis, isFiltered=True): measure_attr = x_attr agg_func = x_attr.aggregation if groupby_attr.attribute in vis.data.unique_values.keys(): - attr_unique_vals = vis.data.unique_values[groupby_attr.attribute] + attr_unique_vals = vis.data.unique_values.get(groupby_attr.attribute) # checks if color is specified in the Vis if len(vis.get_attr_by_channel("color")) == 1: color_attr = vis.get_attr_by_channel("color")[0] @@ -426,7 +427,7 @@ def compute_data_type(self, ldf: LuxDataFrame): if ( convertible2int and ldf.cardinality[attr] != len(ldf) - and ldf.cardinality[attr] < 20 + and (len(ldf[attr].convert_dtypes().unique()) < 20) ): ldf._data_type[attr] = "nominal" else: @@ -515,7 +516,7 @@ def compute_stats(self, ldf: LuxDataFrame): ldf.unique_values = {} ldf._min_max = {} ldf.cardinality = {} - ldf.length = len(ldf) + ldf._length = len(ldf) for attribute in ldf.columns: @@ -525,8 +526,11 @@ def compute_stats(self, ldf: LuxDataFrame): else: attribute_repr = attribute - ldf.unique_values[attribute_repr] = list(ldf[attribute_repr].unique()) - ldf.cardinality[attribute_repr] = len(ldf.unique_values[attribute_repr]) + if ldf.dtypes[attribute] != "float64" or ldf[attribute].isnull().values.any(): + ldf.unique_values[attribute_repr] = 
list(ldf[attribute].unique()) + ldf.cardinality[attribute_repr] = len(ldf.unique_values[attribute_repr]) + else: + ldf.cardinality[attribute_repr] = 999 # special value for float attributes (unique values not stored) if pd.api.types.is_float_dtype(ldf.dtypes[attribute]) or pd.api.types.is_integer_dtype( ldf.dtypes[attribute] diff --git a/lux/executor/SQLExecutor.py b/lux/executor/SQLExecutor.py index abf43f06..5031a4df 100644 --- a/lux/executor/SQLExecutor.py +++ b/lux/executor/SQLExecutor.py @@ -133,7 +133,7 @@ def add_quotes(var_name): query = "SELECT {} FROM {} {}".format(required_variables, tbl.table_name, where_clause) data = pandas.read_sql(query, lux.config.SQLconnection) view._vis_data = utils.pandas_to_lux(data) - view._vis_data.length = list(length_query["length"])[0] + # view._vis_data.length = list(length_query["length"])[0] tbl._message.add_unique( f"Large scatterplots detected: Lux is automatically binning scatterplots to heatmaps.", @@ -217,7 +217,7 @@ def execute_aggregate(view: Vis, tbl: LuxSQLTable, isFiltered=True): view._vis_data = pandas.read_sql(count_query, lux.config.SQLconnection) view._vis_data = view._vis_data.rename(columns={"count": "Record"}) view._vis_data = utils.pandas_to_lux(view._vis_data) - view._vis_data.length = list(length_query["length"])[0] + # view._vis_data.length = list(length_query["length"])[0] # aggregate barchart case, need aggregate data (mean, sum, max) for each group else: where_clause, filterVars = SQLExecutor.execute_filter(view) @@ -358,7 +358,7 @@ def execute_aggregate(view: Vis, tbl: LuxSQLTable, isFiltered=True): view._vis_data = view._vis_data.sort_values(by=groupby_attr.attribute, ascending=True) view._vis_data = view._vis_data.reset_index() view._vis_data = view._vis_data.drop(columns="index") - view._vis_data.length = list(length_query["length"])[0] + # view._vis_data.length = list(length_query["length"])[0] @staticmethod def execute_binning(view: Vis, tbl: LuxSQLTable): @@ -443,7 +443,7 @@ def execute_binning(view: Vis, tbl: 
LuxSQLTable): columns=[bin_attribute.attribute, "Number of Records"], ) view._vis_data = utils.pandas_to_lux(view.data) - view._vis_data.length = list(length_query["length"])[0] + # view._vis_data.length = list(length_query["length"])[0] @staticmethod def execute_2D_binning(view: Vis, tbl: LuxSQLTable): @@ -535,9 +535,13 @@ def execute_filter(view: Vis): filter_vars: list of strings list of variables that have been used as filters """ - where_clause = [] filters = utils.get_filter_specs(view._inferred_intent) + return SQLExecutor.create_where_clause(filters, view=view) + + def create_where_clause(filter_specs, view=""): + where_clause = [] filter_vars = [] + filters = filter_specs if filters: for f in range(0, len(filters)): if f == 0: @@ -555,23 +559,23 @@ def execute_filter(view: Vis): ) if filters[f].attribute not in filter_vars: filter_vars.append(filters[f].attribute) - - attributes = utils.get_attrs_specs(view._inferred_intent) - - # need to ensure that no null values are included in the data - # null values breaks binning queries - for a in attributes: - if a.attribute != "Record": - if where_clause == []: - where_clause.append("WHERE") - else: - where_clause.append("AND") - where_clause.extend( - [ - '"' + str(a.attribute) + '"', - "IS NOT NULL", - ] - ) + if view != "": + attributes = utils.get_attrs_specs(view._inferred_intent) + + # need to ensure that no null values are included in the data + # null values breaks binning queries + for a in attributes: + if a.attribute != "Record": + if where_clause == []: + where_clause.append("WHERE") + else: + where_clause.append("AND") + where_clause.extend( + [ + '"' + str(a.attribute) + '"', + "IS NOT NULL", + ] + ) if where_clause == []: return ("", []) @@ -579,6 +583,16 @@ def execute_filter(view: Vis): where_clause = " ".join(where_clause) return (where_clause, filter_vars) + def get_filtered_size(filter_specs, tbl): + clause_info = SQLExecutor.create_where_clause(filter_specs=filter_specs, view="") + 
where_clause = clause_info[0] + filter_intents = filter_specs[0] + filtered_length = pandas.read_sql( + "SELECT COUNT(1) as length FROM {} {}".format(tbl.table_name, where_clause), + lux.config.SQLconnection, + ) + return list(filtered_length["length"])[0] + ####################################################### ########## Metadata, type, model schema ############### ####################################################### @@ -652,7 +666,7 @@ def compute_stats(self, tbl: LuxSQLTable): "SELECT COUNT(1) as length FROM {}".format(tbl.table_name), lux.config.SQLconnection, ) - tbl.length = list(length_query["length"])[0] + tbl._length = list(length_query["length"])[0] self.get_unique_values(tbl) for attribute in tbl.columns: diff --git a/lux/interestingness/interestingness.py b/lux/interestingness/interestingness.py index 32b0f1db..33a06545 100644 --- a/lux/interestingness/interestingness.py +++ b/lux/interestingness/interestingness.py @@ -231,9 +231,11 @@ def deviation_from_overall( vdata = vis.data v_filter_size = get_filtered_size(filter_specs, ldf) v_size = len(vis.data) - else: - v_filter_size = vis._vis_data.length - v_size = ldf.length + elif lux.config.executor.name == "SQLExecutor": + from lux.executor.SQLExecutor import SQLExecutor + + v_filter_size = SQLExecutor.get_filtered_size(filter_specs, ldf) + v_size = ldf.len() vdata = vis.data v_filter = vdata[msr_attribute] total = v_filter.sum() @@ -360,7 +362,7 @@ def monotonicity(vis: Vis, attr_specs: list, ignore_identity: bool = True) -> in warnings.filterwarnings("error") try: score = np.abs(pearsonr(v_x, v_y)[0]) - except (RuntimeWarning): + except: # RuntimeWarning: invalid value encountered in true_divide (occurs when v_x and v_y are uniform, stdev in denominator is zero, leading to spearman's correlation as nan), ignore these cases. 
score = -1 diff --git a/lux/vis/Vis.py b/lux/vis/Vis.py index f70abd71..77b26c38 100644 --- a/lux/vis/Vis.py +++ b/lux/vis/Vis.py @@ -111,7 +111,7 @@ def set_intent(self, intent: List[Clause]) -> None: self._intent = intent self.refresh_source(self._source) - def _repr_html_(self): + def _ipython_display_(self): from IPython.display import display check_import_lux_widget() @@ -351,17 +351,7 @@ def refresh_source(self, ldf): # -> Vis: self._source = ldf self._inferred_intent = Parser.parse(self._intent) Validator.validate_intent(self._inferred_intent, ldf) - vlist = [Compiler.compile_vis(ldf, self)] - lux.config.executor.execute(vlist, ldf) - # Copying properties over since we can not redefine `self` within class function - if len(vlist) > 0: - vis = vlist[0] - self.title = vis.title - self._mark = vis._mark - self._inferred_intent = vis._inferred_intent - self._vis_data = vis.data - self._min_max = vis._min_max - self._postbin = vis._postbin + Compiler.compile_vis(ldf, self) lux.config.executor.execute([self], ldf) diff --git a/lux/vis/VisList.py b/lux/vis/VisList.py index e3bdfa3e..d1746f68 100644 --- a/lux/vis/VisList.py +++ b/lux/vis/VisList.py @@ -257,7 +257,7 @@ def normalize_score(self, invert_order=False): if invert_order: dobj.score = 1 - dobj.score - def _repr_html_(self): + def _ipython_display_(self): self._widget = None from IPython.display import display from lux.core.frame import LuxDataFrame diff --git a/lux/vislib/altair/Choropleth.py b/lux/vislib/altair/Choropleth.py index bf71b010..37bf8695 100644 --- a/lux/vislib/altair/Choropleth.py +++ b/lux/vislib/altair/Choropleth.py @@ -69,9 +69,12 @@ def initialize_chart(self): alt.Chart(geo_map) .mark_geoshape() .encode( - color=f"{y_attr_abv}:Q", + color=f"{str(y_attr.attribute)}:Q", + ) + .transform_lookup( + lookup="id", + from_=alt.LookupData(self.data, str(x_attr.attribute), [str(y_attr.attribute)]), ) - .transform_lookup(lookup="id", from_=alt.LookupData(self.data, x_attr_abv, [y_attr_abv])) 
.project(type=map_type) .properties( width=width, height=height, title=f"Mean of {y_attr_abv} across {geographical_name}" ) @@ -91,10 +94,10 @@ def initialize_chart(self): background = {background_str} points = alt.Chart({geo_map_str}).mark_geoshape().encode( - color='{y_attr_abv}:Q', + color='{str(y_attr.attribute)}:Q', ).transform_lookup( lookup='id', - from_=alt.LookupData({dfname}, "{x_attr_abv}", ["{y_attr_abv}"]) + from_=alt.LookupData({dfname}, "{str(x_attr.attribute)}", ["{str(y_attr.attribute)}"]) ).project( type="{map_type}" ).properties( diff --git a/lux/vislib/altair/Histogram.py b/lux/vislib/altair/Histogram.py index 9673ac9b..01d846ce 100644 --- a/lux/vislib/altair/Histogram.py +++ b/lux/vislib/altair/Histogram.py @@ -53,7 +53,7 @@ def initialize_chart(self): # Default when bin too small if markbar < (x_range / 24): - markbar = (x_max - x_min) / 12 + markbar = (x_max - x_min) / 12 self.data = AltairChart.sanitize_dataframe(self.data) end_attr_abv = str(msr_attr.attribute) + "_end" diff --git a/tests/conftest.py b/tests/conftest.py index da02ea60..8ee3ddbb 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -30,5 +30,4 @@ def global_var(): "_pandas_only", "pre_aggregated", "_type_override", - "name", ] diff --git a/tests/test_action.py b/tests/test_action.py index 893c13b7..5e22938c 100644 --- a/tests/test_action.py +++ b/tests/test_action.py @@ -23,7 +23,7 @@ def test_vary_filter_val(global_var): df = pytest.olympic vis = Vis(["Height", "SportType=Ball"], df) df.set_intent_as_vis(vis) - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Filter"]) == len(df["SportType"].unique()) - 1 linechart = list(filter(lambda x: x.mark == "line", df.recommendation["Enhance"]))[0] assert ( @@ -42,7 +42,7 @@ def test_filter_inequality(global_var): df.set_intent( [ lux.Clause(attribute="Acceleration", filter_op=">", value=10), ] ) - df._repr_html_() + df._ipython_display_() from lux.utils.utils import get_filter_specs @@ -59,7 +59,7 @@ def
test_generalize_action(global_var): df["Year"], format="%Y" ) # change pandas dtype for the column "Year" to datetype df.set_intent(["Acceleration", "MilesPerGal", "Cylinders", "Origin=USA"]) - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Generalize"]) == 4 v1 = df.recommendation["Generalize"][0] v2 = df.recommendation["Generalize"][1] @@ -86,14 +86,14 @@ def test_row_column_group(global_var): tseries[tseries.columns.min()] = tseries[tseries.columns.min()].fillna(0) tseries[tseries.columns.max()] = tseries[tseries.columns.max()].fillna(tseries.max(axis=1)) tseries = tseries.interpolate("zero", axis=1) - tseries._repr_html_() + tseries._ipython_display_() assert list(tseries.recommendation.keys()) == ["Temporal"] def test_groupby(global_var): df = pytest.college_df groupbyResult = df.groupby("Region").sum() - groupbyResult._repr_html_() + groupbyResult._ipython_display_() assert list(groupbyResult.recommendation.keys()) == ["Column Groups"] @@ -160,7 +160,7 @@ def test_crosstab(): df = pd.DataFrame(d, columns=["Name", "Exam", "Subject", "Result"]) result = pd.crosstab([df.Exam], df.Result) - result._repr_html_() + result._ipython_display_() assert list(result.recommendation.keys()) == ["Row Groups", "Column Groups"] @@ -169,7 +169,7 @@ def test_custom_aggregation(global_var): df = pytest.college_df df.set_intent(["HighestDegree", lux.Clause("AverageCost", aggregation=np.ptp)]) - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == ["Enhance", "Filter", "Generalize"] df.clear_intent() @@ -178,7 +178,7 @@ def test_year_filter_value(global_var): df = pytest.car_df df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent(["Acceleration", "Horsepower"]) - df._repr_html_() + df._ipython_display_() list_of_vis_with_year_filter = list( filter( lambda vis: len( @@ -210,7 +210,7 @@ def test_similarity(global_var): lux.Clause("Origin=USA"), ] ) - df._repr_html_() + df._ipython_display_() assert 
len(df.recommendation["Similarity"]) == 2 ranked_list = df.recommendation["Similarity"] @@ -264,7 +264,7 @@ def test_similarity2(): def test_intent_retained(): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv") df.intent = ["Attrition"] - df._repr_html_() + df._ipython_display_() df["%WorkingYearsAtCompany"] = df["YearsAtCompany"] / df["TotalWorkingYears"] assert df.current_vis != None @@ -272,5 +272,5 @@ def test_intent_retained(): assert df._recs_fresh == False assert df._metadata_fresh == False - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == ["Enhance", "Filter"] diff --git a/tests/test_compiler.py b/tests/test_compiler.py index 28e80f8a..d425e41d 100644 --- a/tests/test_compiler.py +++ b/tests/test_compiler.py @@ -109,10 +109,10 @@ def test_underspecified_single_vis(global_var, test_recs): def test_set_intent_as_vis(global_var, test_recs): lux.config.set_executor_type("Pandas") df = pytest.car_df - df._repr_html_() + df._ipython_display_() vis = df.recommendation["Correlation"][0] df.intent = vis - df._repr_html_() + df._ipython_display_() test_recs(df, ["Enhance", "Filter", "Generalize"]) connection = psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") @@ -128,7 +128,7 @@ def test_set_intent_as_vis(global_var, test_recs): @pytest.fixture def test_recs(): def test_recs_function(df, actions): - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 recKeys = list(df.recommendation.keys()) list_equal(recKeys, actions) @@ -633,7 +633,7 @@ def test_populate_options(global_var): lux.Clause(attribute="MilesPerGal"), ] ) - df._repr_html_() + df._ipython_display_() col_set = set() for specOptions in Compiler.populate_wildcard_options(df._intent, df)["attributes"]: for clause in specOptions: @@ -683,7 +683,7 @@ def test_remove_all_invalid(global_var): lux.Clause(attribute="Origin"), ] ) - df._repr_html_() + df._ipython_display_() assert 
len(df.current_vis) == 0 df.clear_intent() diff --git a/tests/test_config.py b/tests/test_config.py index 23bf65d8..bd61366d 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -52,7 +52,7 @@ def contain_horsepower(df): def test_default_actions_registered(global_var): lux.config.set_executor_type("Pandas") df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert "Distribution" in df.recommendation assert len(df.recommendation["Distribution"]) > 0 @@ -68,7 +68,7 @@ def test_default_actions_registered(global_var): def test_fail_validator(): df = register_new_action() - df._repr_html_() + df._ipython_display_() assert ( "bars" not in df.recommendation, "Bars should not be rendered when there is no intent 'horsepower' specified.", @@ -78,7 +78,7 @@ def test_fail_validator(): def test_pass_validator(): df = register_new_action() df.set_intent(["Acceleration", "Horsepower"]) - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["bars"]) > 0 assert ( "bars" in df.recommendation, @@ -88,7 +88,7 @@ def test_pass_validator(): def test_no_validator(): df = register_new_action(False) - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["bars"]) > 0 assert "bars" in df.recommendation @@ -122,7 +122,7 @@ def random_categorical(ldf): def test_remove_action(): df = register_new_action() df.set_intent(["Acceleration", "Horsepower"]) - df._repr_html_() + df._ipython_display_() assert ( "bars" in df.recommendation, "Bars should be rendered after it has been registered with correct intent.", @@ -132,7 +132,7 @@ def test_remove_action(): "Bars should be rendered after it has been registered with correct intent.", ) lux.config.remove_action("bars") - df._repr_html_() + df._ipython_display_() assert ( "bars" not in df.recommendation, "Bars should not be rendered after it has been removed.", @@ -149,22 +149,22 @@ def test_remove_invalid_action(global_var): # TODO: This test does not pass in pytest but is working in Jupyter 
notebook. def test_remove_default_actions(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() lux.config.remove_action("distribution") - df._repr_html_() + df._ipython_display_() assert "Distribution" not in df.recommendation lux.config.remove_action("occurrence") - df._repr_html_() + df._ipython_display_() assert "Occurrence" not in df.recommendation lux.config.remove_action("temporal") - df._repr_html_() + df._ipython_display_() assert "Temporal" not in df.recommendation lux.config.remove_action("correlation") - df._repr_html_() + df._ipython_display_() assert "Correlation" not in df.recommendation assert ( @@ -174,7 +174,7 @@ def test_remove_default_actions(global_var): df = register_new_action() df.set_intent(["Acceleration", "Horsepower"]) - df._repr_html_() + df._ipython_display_() assert ( "bars" in df.recommendation, "Bars should be rendered after it has been registered with correct intent.", @@ -196,7 +196,7 @@ def add_title(fig, ax): df = pd.read_csv("lux/data/car.csv") lux.config.plotting_style = add_title - df._repr_html_() + df._ipython_display_() title_addition = 'ax.set_title("Test Title")' exported_code_str = df.recommendation["Correlation"][0].to_Altair() assert title_addition in exported_code_str @@ -212,7 +212,7 @@ def change_color_make_transparent_add_title(chart): df = pd.read_csv("lux/data/car.csv") lux.config.plotting_style = change_color_make_transparent_add_title - df._repr_html_() + df._ipython_display_() config_mark_addition = 'chart = chart.configure_mark(color="green", opacity=0.2)' title_addition = 'chart.title = "Test Title"' exported_code_str = df.recommendation["Correlation"][0].to_Altair() @@ -222,23 +222,23 @@ def change_color_make_transparent_add_title(chart): def test_sampling_flag_config(): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") - df._repr_html_() + df._ipython_display_() assert df.recommendation["Correlation"][0].data.shape[0] == 30000 
lux.config.sampling = False df = df.copy() - df._repr_html_() + df._ipython_display_() assert df.recommendation["Correlation"][0].data.shape[0] == 48895 lux.config.sampling = True def test_sampling_parameters_config(): df = pd.read_csv("lux/data/car.csv") - df._repr_html_() + df._ipython_display_() assert df.recommendation["Correlation"][0].data.shape[0] == 392 lux.config.sampling_start = 50 lux.config.sampling_cap = 100 df = pd.read_csv("lux/data/car.csv") - df._repr_html_() + df._ipython_display_() assert df.recommendation["Correlation"][0].data.shape[0] == 100 lux.config.sampling_cap = 30000 lux.config.sampling_start = 10000 @@ -246,11 +246,11 @@ def test_sampling_parameters_config(): def test_heatmap_flag_config(): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") - df._repr_html_() + df._ipython_display_() assert df.recommendation["Correlation"][0]._postbin lux.config.heatmap = False df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") - df._repr_html_() + df._ipython_display_() assert not df.recommendation["Correlation"][0]._postbin lux.config.heatmap = True @@ -258,11 +258,11 @@ def test_heatmap_flag_config(): def test_topk(global_var): df = pd.read_csv("lux/data/college.csv") lux.config.topk = False - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Correlation"]) == 45, "Turn off top K" lux.config.topk = 20 df = pd.read_csv("lux/data/college.csv") - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Correlation"]) == 20, "Show top 20" for vis in df.recommendation["Correlation"]: assert vis.score > 0.2 @@ -271,20 +271,20 @@ def test_topk(global_var): def test_sort(global_var): df = pd.read_csv("lux/data/college.csv") lux.config.topk = 15 - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Correlation"]) == 15, "Show top 15" for vis in df.recommendation["Correlation"]: assert vis.score > 0.5 df 
= pd.read_csv("lux/data/college.csv") lux.config.sort = "ascending" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Correlation"]) == 15, "Show bottom 15" for vis in df.recommendation["Correlation"]: assert vis.score < 0.35 lux.config.sort = "none" df = pd.read_csv("lux/data/college.csv") - df._repr_html_() + df._ipython_display_() scorelst = [x.score for x in df.recommendation["Distribution"]] assert sorted(scorelst) != scorelst, "unsorted setting" lux.config.sort = "descending" @@ -301,7 +301,7 @@ def test_sort(global_var): # df.plot_config = change_color_add_title -# df._repr_html_() +# df._ipython_display_() # vis_code = df.recommendation["Correlation"][0].to_Altair() # print (vis_code) diff --git a/tests/test_dates.py b/tests/test_dates.py index dc530fc7..d6295b05 100644 --- a/tests/test_dates.py +++ b/tests/test_dates.py @@ -88,7 +88,7 @@ def test_refresh_inplace(): } ) with pytest.warns(UserWarning, match="Lux detects that the attribute 'date' may be temporal."): - df._repr_html_() + df._ipython_display_() assert df.data_type["date"] == "temporal" from lux.vis.Vis import Vis diff --git a/tests/test_display.py b/tests/test_display.py index 7716867c..96ee644e 100644 --- a/tests/test_display.py +++ b/tests/test_display.py @@ -26,16 +26,22 @@ def test_to_pandas(global_var): def test_display_LuxDataframe(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() def test_display_Vis(global_var): df = pytest.car_df vis = Vis(["Horsepower", "Acceleration"], df) - vis._repr_html_() + vis._ipython_display_() def test_display_VisList(global_var): df = pytest.car_df vislist = VisList(["?", "Acceleration"], df) - vislist._repr_html_() + vislist._ipython_display_() + + +def test_repr(global_var): + df = pytest.car_df + output = df.__repr__() + assert "MilesPerGal" in output diff --git a/tests/test_error_warning.py b/tests/test_error_warning.py index 6106bfb8..d4d72859 100644 --- a/tests/test_error_warning.py +++ 
b/tests/test_error_warning.py @@ -56,7 +56,7 @@ def test_vis_private_properties(global_var): df = pytest.car_df vis = Vis(["Horsepower", "Weight"], df) - vis._repr_html_() + vis._ipython_display_() assert isinstance(vis.data, lux.core.frame.LuxDataFrame) with pytest.raises(AttributeError, match="can't set attribute"): vis.data = "some val" @@ -77,10 +77,10 @@ def test_vis_private_properties(global_var): # Test DataFrame Properties give Lux Warning but not UserWarning def test_lux_warnings(global_var): df = pd.DataFrame() - df._repr_html_() + df._ipython_display_() assert df._widget.message == f"
  • Lux cannot operate on an empty DataFrame.
" df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) - df._repr_html_() + df._ipython_display_() assert ( df._widget.message == f"
  • The DataFrame is too small to visualize. To generate visualizations in Lux, the DataFrame must contain at least 5 rows.
" @@ -88,7 +88,7 @@ def test_lux_warnings(global_var): df = pytest.car_df df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.set_index(["Name", "Cylinders"]) - new_df._repr_html_() + new_df._ipython_display_() assert ( new_df._widget.message == f"
  • Lux does not currently support visualizations in a DataFrame with hierarchical indexes.\nPlease convert the DataFrame into a flat table via pandas.DataFrame.reset_index.
" diff --git a/tests/test_groupby.py b/tests/test_groupby.py index b5841cc8..724c524a 100644 --- a/tests/test_groupby.py +++ b/tests/test_groupby.py @@ -19,54 +19,54 @@ def test_agg(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df[["Horsepower", "Brand"]].groupby("Brand").agg(sum) - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert new_df.pre_aggregated def test_shortcut_agg(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df[["MilesPerGal", "Brand"]].groupby("Brand").sum() - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert new_df.pre_aggregated def test_agg_mean(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df.groupby("Origin").mean() - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert new_df.pre_aggregated def test_agg_size(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df.groupby("Brand").size().to_frame() - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert new_df.pre_aggregated def test_filter(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df.groupby("Origin").filter(lambda x: x["Weight"].mean() > 3000) - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert not new_df.pre_aggregated def test_get_group(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() new_df = df.groupby("Origin").get_group("Japan") - new_df._repr_html_() + new_df._ipython_display_() assert new_df.history[0].name == "groupby" assert not new_df.pre_aggregated diff --git a/tests/test_interestingness.py b/tests/test_interestingness.py index 7e6036f9..c2d017b5 100644 --- a/tests/test_interestingness.py +++ b/tests/test_interestingness.py @@ 
-16,6 +16,7 @@ import pytest import pandas as pd import numpy as np +import psycopg2 from lux.interestingness.interestingness import interestingness @@ -25,7 +26,7 @@ def test_interestingness_1_0_0(global_var): df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent([lux.Clause(attribute="Origin")]) - df._repr_html_() + df._ipython_display_() # check that top recommended enhance graph score is not none and that ordering makes intuitive sense assert interestingness(df.recommendation["Enhance"][0], df) != None rank1 = -1 @@ -69,17 +70,35 @@ def test_interestingness_1_0_1(global_var): lux.Clause(attribute="Cylinders"), ] ) - df._repr_html_() + df._ipython_display_() assert df.current_vis[0].score == 0 df.clear_intent() + connection = psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") + tbl = lux.LuxSQLTable() + lux.config.set_SQL_connection(connection) + tbl.set_SQL_table("car") + + tbl.set_intent( + [ + lux.Clause(attribute="Origin", filter_op="=", value="USA"), + lux.Clause(attribute="Cylinders"), + ] + ) + tbl._repr_html_() + filter_score = tbl.recommendation["Filter"][0].score + assert tbl.current_vis[0].score == 0 + assert filter_score > 0 + tbl.clear_intent() + def test_interestingness_0_1_0(global_var): + lux.config.set_executor_type("Pandas") df = pytest.car_df df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent([lux.Clause(attribute="Horsepower")]) - df._repr_html_() + df._ipython_display_() # check that top recommended enhance graph score is not none and that ordering makes intuitive sense assert interestingness(df.recommendation["Enhance"][0], df) != None rank1 = -1 @@ -129,18 +148,35 @@ def test_interestingness_0_1_1(global_var): lux.Clause(attribute="MilesPerGal"), ] ) - df._repr_html_() + df._ipython_display_() assert interestingness(df.recommendation["Current Vis"][0], df) != None assert str(df.recommendation["Current Vis"][0]._inferred_intent[2].value) == "USA" df.clear_intent() + connection = 
psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") + tbl = lux.LuxSQLTable() + lux.config.set_SQL_connection(connection) + tbl.set_SQL_table("car") + + tbl.set_intent( + [ + lux.Clause(attribute="Origin", filter_op="=", value="?"), + lux.Clause(attribute="MilesPerGal"), + ] + ) + tbl._repr_html_() + assert interestingness(tbl.recommendation["Current Vis"][0], tbl) != None + assert str(tbl.recommendation["Current Vis"][0]._inferred_intent[2].value) == "USA" + tbl.clear_intent() + def test_interestingness_1_1_0(global_var): + lux.config.set_executor_type("Pandas") df = pytest.car_df df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent([lux.Clause(attribute="Horsepower"), lux.Clause(attribute="Year")]) - df._repr_html_() + df._ipython_display_() # check that top recommended Enhance graph score is not none (all graphs here have same score) assert interestingness(df.recommendation["Enhance"][0], df) != None @@ -176,7 +212,7 @@ def test_interestingness_1_1_1(global_var): lux.Clause(attribute="Origin", filter_op="=", value="USA", bin_size=20), ] ) - df._repr_html_() + df._ipython_display_() # check that top recommended Enhance graph score is not none and that ordering makes intuitive sense assert interestingness(df.recommendation["Enhance"][0], df) != None rank1 = -1 @@ -204,12 +240,31 @@ def test_interestingness_1_1_1(global_var): assert interestingness(df.recommendation["Filter"][0], df) != None df.clear_intent() + connection = psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") + tbl = lux.LuxSQLTable() + lux.config.set_SQL_connection(connection) + tbl.set_SQL_table("car") + + tbl.set_intent( + [ + lux.Clause(attribute="Horsepower"), + lux.Clause(attribute="Origin", filter_op="=", value="USA", bin_size=20), + ] + ) + tbl._repr_html_() + assert interestingness(tbl.recommendation["Enhance"][0], tbl) != None + + # check for top recommended Filter graph score is not none + assert 
interestingness(tbl.recommendation["Filter"][0], tbl) != None + tbl.clear_intent() + def test_interestingness_1_2_0(global_var): from lux.vis.Vis import Vis from lux.vis.Vis import Clause from lux.interestingness.interestingness import interestingness + lux.config.set_executor_type("Pandas") df = pytest.car_df y_clause = Clause(attribute="Name", channel="y") color_clause = Clause(attribute="Cylinders", channel="color") @@ -227,7 +282,7 @@ def test_interestingness_0_2_0(global_var): df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent([lux.Clause(attribute="Horsepower"), lux.Clause(attribute="Acceleration")]) - df._repr_html_() + df._ipython_display_() # check that top recommended enhance graph score is not none and that ordering makes intuitive sense assert interestingness(df.recommendation["Enhance"][0], df) != None rank1 = -1 @@ -263,7 +318,7 @@ def test_interestingness_0_2_1(global_var): lux.Clause(attribute="Acceleration", filter_op=">", value=10), ] ) - df._repr_html_() + df._ipython_display_() # check that top recommended Generalize graph score is not none assert interestingness(df.recommendation["Generalize"][0], df) != None df.clear_intent() diff --git a/tests/test_maintainence.py b/tests/test_maintainence.py index 4e18994a..447fff6f 100644 --- a/tests/test_maintainence.py +++ b/tests/test_maintainence.py @@ -21,15 +21,15 @@ def test_metadata_subsequent_display(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert df._metadata_fresh == True, "Failed to maintain metadata after display df" - df._repr_html_() + df._ipython_display_() assert df._metadata_fresh == True, "Failed to maintain metadata after display df" def test_metadata_subsequent_vis(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert df._metadata_fresh == True, "Failed to maintain metadata after display df" vis = Vis(["Acceleration", "Horsepower"], df) assert df._metadata_fresh == True, "Failed to maintain metadata after 
display df" @@ -37,7 +37,7 @@ def test_metadata_subsequent_vis(global_var): def test_metadata_inplace_operation(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert df._metadata_fresh == True, "Failed to maintain metadata after display df" df.dropna(inplace=True) assert df._metadata_fresh == False, "Failed to expire metadata after in-place Pandas operation" @@ -45,7 +45,7 @@ def test_metadata_inplace_operation(global_var): def test_metadata_new_df_operation(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert df._metadata_fresh == True, "Failed to maintain metadata after display df" df[["MilesPerGal", "Acceleration"]] assert df._metadata_fresh == True, "Failed to maintain metadata after display df" @@ -61,7 +61,7 @@ def test_metadata_column_group_reset_df(global_var): result = df.groupby("Cylinders").mean() assert not hasattr(result, "_metadata_fresh") # Note that this should trigger two compute metadata (one for df, and one for an intermediate df.reset_index used to feed inside created Vis) - result._repr_html_() + result._ipython_display_() assert result._metadata_fresh == True, "Failed to maintain metadata after display df" colgroup_recs = result.recommendation["Column Groups"] @@ -72,12 +72,12 @@ def test_metadata_column_group_reset_df(global_var): def test_recs_inplace_operation(global_var): df = pytest.college_df - df._repr_html_() + df._ipython_display_() assert df._recs_fresh == True, "Failed to maintain recommendation after display df" assert len(df.recommendation["Occurrence"]) == 6 df.drop(columns=["Name"], inplace=True) assert "Name" not in df.columns, "Failed to perform `drop` operation in-place" assert df._recs_fresh == False, "Failed to maintain recommendation after in-place Pandas operation" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation["Occurrence"]) == 5 assert df._recs_fresh == True, "Failed to maintain recommendation after display df" diff --git 
a/tests/test_nan.py b/tests/test_nan.py index 1701215f..7003468c 100644 --- a/tests/test_nan.py +++ b/tests/test_nan.py @@ -24,7 +24,7 @@ def test_nan_column(global_var): df = pytest.college_df old_geo = df["Geography"] df["Geography"] = np.nan - df._repr_html_() + df._ipython_display_() for visList in df.recommendation.keys(): for vis in df.recommendation[visList]: assert vis.get_attr_by_attr_name("Geography") == [] @@ -84,7 +84,7 @@ def test_apply_nan_filter(): test = pd.DataFrame(dataset) vis = Vis(["some_nan", "some_nan2=nan"], test) - vis._repr_html_() + vis._ipython_display_() assert vis.mark == "bar" @@ -111,5 +111,5 @@ def test_nan_series_occurence(): } nan_series = LuxSeries(dvalues) ldf = pd.DataFrame(nan_series, columns=["col"]) - ldf._repr_html_() + ldf._ipython_display_() assert ldf.recommendation["Occurrence"][0].mark == "bar" diff --git a/tests/test_pandas.py b/tests/test_pandas.py index 4c8f896a..db0f9584 100644 --- a/tests/test_pandas.py +++ b/tests/test_pandas.py @@ -19,16 +19,16 @@ def test_head_tail(global_var): df = pytest.car_df - df._repr_html_() + df._ipython_display_() assert df._message.to_html() == "" - df.head()._repr_html_() + df.head()._ipython_display_() assert ( "Lux is visualizing the previous version of the dataframe before you applied head." in df._message.to_html() ) - df._repr_html_() + df._ipython_display_() assert df._message.to_html() == "" - df.tail()._repr_html_() + df.tail()._ipython_display_() assert ( "Lux is visualizing the previous version of the dataframe before you applied tail." 
in df._message.to_html() @@ -38,12 +38,12 @@ def test_head_tail(global_var): def test_describe(global_var): df = pytest.college_df summary = df.describe() - summary._repr_html_() + summary._ipython_display_() assert len(summary.columns) == 10 def test_convert_dtype(global_var): df = pytest.college_df cdf = df.convert_dtypes() - cdf._repr_html_() + cdf._ipython_display_() assert list(cdf.recommendation.keys()) == ["Correlation", "Distribution", "Occurrence"] diff --git a/tests/test_pandas_coverage.py b/tests/test_pandas_coverage.py index 21014f60..c1039591 100644 --- a/tests/test_pandas_coverage.py +++ b/tests/test_pandas_coverage.py @@ -26,20 +26,20 @@ def test_deepcopy(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") - df._repr_html_() + df._ipython_display_() saved_df = df.copy(deep=True) - saved_df._repr_html_() + saved_df._ipython_display_() check_metadata_equal(df, saved_df) def test_rename_inplace(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") - df._repr_html_() + df._ipython_display_() new_df = df.copy(deep=True) df.rename(columns={"Name": "Car Name"}, inplace=True) - df._repr_html_() - new_df._repr_html_() + df._ipython_display_() + new_df._ipython_display_() # new_df is the old dataframe (df) with the new column name changed inplace new_df, df = df, new_df @@ -79,9 +79,9 @@ def test_rename_inplace(global_var): def test_rename(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") - df._repr_html_() + df._ipython_display_() new_df = df.rename(columns={"Name": "Car Name"}, inplace=False) - new_df._repr_html_() + new_df._ipython_display_() assert df.data_type != new_df.data_type assert df.data_type["Name"] == new_df.data_type["Car Name"] @@ -131,7 +131,7 @@ def test_rename3(global_var): "col9", "col10", ] - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ 
"Correlation", "Distribution", @@ -147,7 +147,7 @@ def test_concat(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = pd.concat([df.loc[:, "Name":"Cylinders"], df.loc[:, "Year":"Origin"]], axis="columns") - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Distribution", "Occurrence", @@ -160,7 +160,7 @@ def test_groupby_agg(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.groupby("Year").agg(sum) - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 7 @@ -168,7 +168,7 @@ def test_groupby_agg(global_var): def test_groupby_agg_big(global_var): df = pd.read_csv("lux/data/car.csv") new_df = df.groupby("Brand").agg(sum) - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 8 year_vis = list( @@ -180,7 +180,7 @@ def test_groupby_agg_big(global_var): assert year_vis.mark == "bar" assert year_vis.get_attr_by_channel("x")[0].attribute == "Year" new_df = new_df.T - new_df._repr_html_() + new_df._ipython_display_() year_vis = list( filter( lambda vis: vis.get_attr_by_attr_name("Year") != [], @@ -195,13 +195,13 @@ def test_qcut(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") df["Weight"] = pd.qcut(df["Weight"], q=3) - df._repr_html_() + df._ipython_display_() def test_cut(global_var): df = pd.read_csv("lux/data/car.csv") df["Weight"] = pd.cut(df["Weight"], bins=[0, 2500, 7500, 10000], labels=["small", "medium", "large"]) - df._repr_html_() + df._ipython_display_() def test_groupby_agg_very_small(global_var): @@ -209,7 +209,7 @@ def test_groupby_agg_very_small(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") 
new_df = df.groupby("Origin").agg(sum).reset_index() - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 7 @@ -219,7 +219,7 @@ def test_groupby_agg_very_small(global_var): # df = pd.read_csv(url) # df["Year"] = pd.to_datetime(df["Year"], format='%Y') # new_df = df.groupby(["Year", "Cylinders"]).agg(sum).stack().reset_index() -# new_df._repr_html_() +# new_df._ipython_display_() # assert list(new_df.recommendation.keys() ) == ['Column Groups'] # TODO # assert len(new_df.cardinality) == 7 # TODO @@ -228,7 +228,7 @@ def test_query(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.query("Weight > 3000") - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -242,7 +242,7 @@ def test_pop(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.pop("Weight") - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -256,8 +256,8 @@ def test_transform(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.iloc[:, 1:].groupby("Origin").transform(sum) - new_df._repr_html_() - assert list(new_df.recommendation.keys()) == ["Correlation", "Occurrence"] + new_df._ipython_display_() + assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution", "Occurrence"] assert len(new_df.cardinality) == 7 @@ -266,7 +266,7 @@ def test_get_group(global_var): df["Year"] = pd.to_datetime(df["Year"], format="%Y") gbobj = df.groupby("Origin") new_df = gbobj.get_group("Japan") - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -281,7 +281,7 @@ def test_applymap(global_var): df["Year"] = 
pd.to_datetime(df["Year"], format="%Y") mapping = {"USA": 0, "Europe": 1, "Japan": 2} df["Origin"] = df[["Origin"]].applymap(mapping.get) - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -295,7 +295,7 @@ def test_strcat(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") df["combined"] = df["Origin"].str.cat(df["Brand"], sep=", ") - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -313,7 +313,7 @@ def test_named_agg(global_var): max_weight=("Weight", "max"), mean_displacement=("Displacement", "mean"), ) - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 4 @@ -322,7 +322,7 @@ def test_change_dtype(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") df["Cylinders"] = pd.Series(df["Cylinders"], dtype="Int64") - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -336,7 +336,7 @@ def test_get_dummies(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = pd.get_dummies(df) - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -351,7 +351,7 @@ def test_drop(global_var): df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.drop([0, 1, 2], axis="rows") new_df2 = new_df.drop(["Name", "MilesPerGal", "Cylinders"], axis="columns") - new_df2._repr_html_() + new_df2._ipython_display_() assert list(new_df2.recommendation.keys()) == [ "Correlation", "Distribution", @@ -366,7 +366,7 @@ def test_merge(global_var): df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.drop([0, 1, 2], axis="rows") new_df2 = pd.merge(df, 
new_df, how="left", indicator=True) - new_df2._repr_html_() + new_df2._ipython_display_() assert list(new_df2.recommendation.keys()) == [ "Correlation", "Distribution", @@ -380,7 +380,7 @@ def test_prefix(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.add_prefix("1_") - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -395,7 +395,7 @@ def test_loc(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.loc[:, "Displacement":"Origin"] - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -404,18 +404,18 @@ def test_loc(global_var): ] assert len(new_df.cardinality) == 6 new_df = df.loc[0:10, "Displacement":"Origin"] - new_df._repr_html_() - assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution"] + new_df._ipython_display_() + assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution", "Occurrence"] assert len(new_df.cardinality) == 6 new_df = df.loc[0:10, "Displacement":"Horsepower"] - new_df._repr_html_() - assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution"] + new_df._ipython_display_() + assert list(new_df.recommendation.keys()) == ["Distribution", "Occurrence"] assert len(new_df.cardinality) == 2 import numpy as np inter_df = df.groupby("Brand")[["Acceleration", "Weight", "Horsepower"]].agg(np.mean) new_df = inter_df.loc["chevrolet":"fiat", "Acceleration":"Weight"] - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 3 @@ -424,7 +424,7 @@ def test_iloc(global_var): df = pd.read_csv("lux/data/car.csv") df["Year"] = pd.to_datetime(df["Year"], format="%Y") new_df = df.iloc[:, 3:9] - new_df._repr_html_() + 
new_df._ipython_display_() assert list(new_df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -433,18 +433,18 @@ def test_iloc(global_var): ] assert len(new_df.cardinality) == 6 new_df = df.iloc[0:11, 3:9] - new_df._repr_html_() - assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution"] + new_df._ipython_display_() + assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution", "Occurrence"] assert len(new_df.cardinality) == 6 new_df = df.iloc[0:11, 3:5] - new_df._repr_html_() - assert list(new_df.recommendation.keys()) == ["Correlation", "Distribution"] + new_df._ipython_display_() + assert list(new_df.recommendation.keys()) == ["Distribution", "Occurrence"] assert len(new_df.cardinality) == 2 import numpy as np inter_df = df.groupby("Brand")[["Acceleration", "Weight", "Horsepower"]].agg(np.mean) new_df = inter_df.iloc[5:10, 0:2] - new_df._repr_html_() + new_df._ipython_display_() assert list(new_df.recommendation.keys()) == ["Column Groups"] assert len(new_df.cardinality) == 3 @@ -521,29 +521,29 @@ def test_index(global_var): df = df.set_index(["Name"]) # if this assert fails, then the index column has not properly been removed from the dataframe's column and registered as an index assert "Name" not in df.columns and df.index.name == "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 df = df.reset_index() assert "Name" in df.columns and df.index.name != "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 df.set_index(["Name"], inplace=True) assert "Name" not in df.columns and df.index.name == "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 df.reset_index(inplace=True) assert "Name" in df.columns and df.index.name != "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 df = df.set_index(["Name"]) assert "Name" not in df.columns and df.index.name == "Name" - df._repr_html_() + 
df._ipython_display_() assert len(df.recommendation) > 0 df = df.reset_index(drop=True) assert "Name" not in df.columns and df.index.name != "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 @@ -551,17 +551,17 @@ def test_index_col(global_var): df = pd.read_csv("lux/data/car.csv", index_col="Name") # if this assert fails, then the index column has not properly been removed from the dataframe's column and registered as an index assert "Name" not in df.columns and df.index.name == "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 df = df.reset_index() assert "Name" in df.columns and df.index.name != "Name" - df._repr_html_() + df._ipython_display_() assert len(df.recommendation) > 0 # this case is not yet addressed, need to have a check that eliminates bar charts with duplicate column names # df = df.set_index(["Name"], drop=False) # assert "Name" not in df.columns and df.index.name == "Name" - # df._repr_html_() + # df._ipython_display_() # assert len(df.recommendation) > 0 # df = df.reset_index(drop=True) # assert "Name" not in df.columns and df.index.name != "Name" @@ -575,7 +575,7 @@ def test_index_col(global_var): def test_df_to_series(global_var): # Ensure metadata is kept when going from df to series df = pd.read_csv("lux/data/car.csv") - df._repr_html_() # compute metadata + df._ipython_display_() # compute metadata assert df.cardinality is not None series = df["Weight"] assert isinstance(series, lux.core.series.LuxSeries), "Derived series is type LuxSeries." 
@@ -589,7 +589,7 @@ def test_df_to_series(global_var): def test_value_counts(global_var): df = pd.read_csv("lux/data/car.csv") - df._repr_html_() # compute metadata + df._ipython_display_() # compute metadata assert df.cardinality is not None series = df["Weight"] series.value_counts() @@ -603,7 +603,7 @@ def test_value_counts(global_var): def test_str_replace(global_var): df = pd.read_csv("lux/data/car.csv") - df._repr_html_() # compute metadata + df._ipython_display_() # compute metadata assert df.cardinality is not None series = df["Brand"].str.replace("chevrolet", "chevy") assert isinstance(series, lux.core.series.LuxSeries), "Derived series is type LuxSeries." @@ -622,7 +622,7 @@ def test_str_replace(global_var): def test_read_json(global_var): url = "https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/car.json" df = pd.read_json(url) - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == [ "Correlation", "Distribution", @@ -635,7 +635,7 @@ def test_read_json(global_var): def test_read_sas(global_var): url = "https://github.com/lux-org/lux-datasets/blob/master/data/airline.sas7bdat?raw=true" df = pd.read_sas(url, format="sas7bdat") - df._repr_html_() + df._ipython_display_() assert list(df.recommendation.keys()) == ["Correlation", "Distribution", "Temporal"] assert len(df.data_type) == 6 @@ -644,5 +644,5 @@ def test_read_multi_dtype(global_var): url = "https://github.com/lux-org/lux-datasets/blob/master/data/car-data.xls?raw=true" df = pd.read_excel(url) with pytest.warns(UserWarning, match="mixed type") as w: - df._repr_html_() + df._ipython_display_() assert "df['Car Type'] = df['Car Type'].astype(str)" in str(w[-1].message) diff --git a/tests/test_parser.py b/tests/test_parser.py index 333977aa..bac6d45d 100644 --- a/tests/test_parser.py +++ b/tests/test_parser.py @@ -78,7 +78,7 @@ def test_case5(global_var): def test_case6(global_var): df = pytest.car_df df.set_intent(["Horsepower", "Origin=?"]) - 
df._repr_html_() + df._ipython_display_() assert type(df._intent[0]) is lux.Clause assert df._intent[0].attribute == "Horsepower" assert type(df._intent[1]) is lux.Clause @@ -90,7 +90,7 @@ def test_case6(global_var): def test_case7(global_var): df = pytest.car_df df.intent = [["Horsepower", "MilesPerGal", "Acceleration"], "Origin"] - df._repr_html_() + df._ipython_display_() assert len(df.current_vis) == 3 df.clear_intent() diff --git a/tests/test_performance.py b/tests/test_performance.py index 256d45fb..fca6a1c6 100644 --- a/tests/test_performance.py +++ b/tests/test_performance.py @@ -23,10 +23,10 @@ def test_q1_performance_census(global_var): df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/census.csv?raw=true") tic = time.perf_counter() - df._repr_html_() + df._ipython_display_() toc = time.perf_counter() delta = toc - tic - df._repr_html_() + df._ipython_display_() toc2 = time.perf_counter() delta2 = toc2 - toc print(f"1st display Performance: {delta:0.4f} seconds") diff --git a/tests/test_series.py b/tests/test_series.py index f8ff4fad..83b6b818 100644 --- a/tests/test_series.py +++ b/tests/test_series.py @@ -20,7 +20,7 @@ def test_df_to_series(): # Ensure metadata is kept when going from df to series df = pd.read_csv("lux/data/car.csv") - df._repr_html_() # compute metadata + df._ipython_display_() # compute metadata assert df.cardinality is not None series = df["Weight"] assert isinstance(series, lux.core.series.LuxSeries), "Derived series is type LuxSeries." @@ -47,7 +47,6 @@ def test_df_to_series(): "_pandas_only", "pre_aggregated", "_type_override", - "name", ], "Metadata is lost when going from Dataframe to Series." assert df.cardinality is not None, "Metadata is lost when going from Dataframe to Series." assert series.name == "Weight", "Pandas Series original `name` property not retained." 
diff --git a/tests/test_type.py b/tests/test_type.py index f70bed11..5395c661 100644 --- a/tests/test_type.py +++ b/tests/test_type.py @@ -39,7 +39,7 @@ def test_check_int_id(): df = pd.read_csv( "https://github.com/lux-org/lux-datasets/blob/master/data/instacart_sample.csv?raw=true" ) - df._repr_html_() + df._ipython_display_() inverted_data_type = lux.config.executor.invert_data_type(df.data_type) assert len(inverted_data_type["id"]) == 3 assert ( @@ -50,7 +50,7 @@ def test_check_int_id(): def test_check_str_id(): df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/churn.csv?raw=true") - df._repr_html_() + df._ipython_display_() assert ( "customerID is not visualized since it resembles an ID field." in df._message.to_html() @@ -229,7 +229,7 @@ def test_set_data_type(): "https://github.com/lux-org/lux-datasets/blob/master/data/real_estate_tutorial.csv?raw=true" ) with pytest.warns(UserWarning) as w: - df._repr_html_() + df._ipython_display_() assert "starter template that you can use" in str(w[-1].message) assert "df.set_data_type" in str(w[-1].message) @@ -238,7 +238,7 @@ def test_set_data_type(): assert df.data_type["Year"] == "nominal" with warnings.catch_warnings() as w: warnings.simplefilter("always") - df._repr_html_() + df._ipython_display_() assert not w diff --git a/tests/test_vis.py b/tests/test_vis.py index 4514be42..4caf74ea 100644 --- a/tests/test_vis.py +++ b/tests/test_vis.py @@ -79,7 +79,7 @@ def test_refresh_collection(global_var): df = pytest.car_df df["Year"] = pd.to_datetime(df["Year"], format="%Y") df.set_intent([lux.Clause(attribute="Acceleration"), lux.Clause(attribute="Horsepower")]) - df._repr_html_() + df._ipython_display_() enhanceCollection = df.recommendation["Enhance"] enhanceCollection.refresh_source(df[df["Origin"] == "USA"]) df.clear_intent() @@ -168,10 +168,10 @@ def test_vis_set_intent(global_var): df = pytest.car_df vis = Vis(["Weight", "Horsepower"], df) - vis._repr_html_() + vis._ipython_display_() 
assert "Horsepower" in str(vis._code) vis.intent = ["Weight", "MilesPerGal"] - vis._repr_html_() + vis._ipython_display_() assert "MilesPerGal" in str(vis._code) @@ -180,11 +180,11 @@ def test_vis_list_set_intent(global_var): df = pytest.car_df vislist = VisList(["Horsepower", "?"], df) - vislist._repr_html_() + vislist._ipython_display_() for vis in vislist: assert vis.get_attr_by_attr_name("Horsepower") != [] vislist.intent = ["Weight", "?"] - vislist._repr_html_() + vislist._ipython_display_() for vis in vislist: assert vis.get_attr_by_attr_name("Weight") != [] @@ -194,7 +194,7 @@ def test_text_not_overridden(): df = pd.read_csv("lux/data/college.csv") vis = Vis(["Region", "Geography"], df) - vis._repr_html_() + vis._ipython_display_() code = vis.to_Altair() assert 'color = "#ff8e04"' in code @@ -247,6 +247,15 @@ def test_colored_bar_chart(global_var): assert "ax.set_ylabel('Cylinders')" in vis_code +def test_bar_uniform(): + df = pd.read_csv("lux/data/car.csv") + df["Year"] = pd.to_datetime(df["Year"], format="%Y") + df["Type"] = "A" + vis = Vis(["Type"], df) + vis_code = vis.to_Altair() + assert "y = alt.Y('Type', type= 'nominal'" in vis_code + + def test_scatter_chart(global_var): df = pytest.car_df lux.config.plotting_backend = "vegalite" @@ -361,6 +370,15 @@ def test_histogram_chart(global_var): assert "ax.set_ylabel('Number of Records')" in vis_code +def test_histogram_uniform(): + df = pd.read_csv("lux/data/car.csv") + df["Year"] = pd.to_datetime(df["Year"], format="%Y") + df["Units"] = 4.0 + vis = Vis(["Units"], df) + vis_code = vis.to_Altair() + assert "y = alt.Y('Units', type= 'nominal'" in vis_code + + def test_heatmap_chart(global_var): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") lux.config.plotting_backend = "vegalite" @@ -420,7 +438,7 @@ def test_colored_heatmap_chart(global_var): def test_vegalite_default_actions_registered(global_var): df = pytest.car_df lux.config.plotting_backend = 
"vegalite" - df._repr_html_() + df._ipython_display_() # Histogram Chart assert "Distribution" in df.recommendation assert len(df.recommendation["Distribution"]) > 0 @@ -463,7 +481,7 @@ def test_vegalite_default_actions_registered_2(global_var): def test_matplotlib_default_actions_registered(global_var): lux.config.plotting_backend = "matplotlib" df = pytest.car_df - df._repr_html_() + df._ipython_display_() # Histogram Chart assert "Distribution" in df.recommendation assert len(df.recommendation["Distribution"]) > 0 @@ -484,7 +502,7 @@ def test_matplotlib_default_actions_registered(global_var): def test_vegalite_heatmap_flag_config(): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") lux.config.plotting_backend = "vegalite" - df._repr_html_() + df._ipython_display_() # Heatmap Chart assert df.recommendation["Correlation"][0]._postbin lux.config.heatmap = False @@ -498,7 +516,7 @@ def test_vegalite_heatmap_flag_config(): def test_matplotlib_heatmap_flag_config(): df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/airbnb_nyc.csv") lux.config.plotting_backend = "matplotlib" - df._repr_html_() + df._ipython_display_() # Heatmap Chart assert df.recommendation["Correlation"][0]._postbin lux.config.heatmap = False