Added Interestingness Scoring for Colored Bar and Line charts #59

19thyneb · 2020-08-09T18:03:57Z

Added a test case to test_interestingness.py under the test_interestingness_1_2_0 function, addressing issue #52.

Added scoring functions for skew, kurtosis, and number of peaks

Also added test for this case in test_interestingness.py

Fixed bug where vis' stats and metadata were not being calculated in a specific case

19thyneb · 2020-08-09T20:46:25Z

Fixed a bug in the Pandas executor where metadata for colored bar/line charts was not being calculated. This also slowed performance on the census data by around 1.5 seconds.

dorisjlee · 2020-08-10T14:45:53Z

This seems to be causing an issue in test_q1_performance_census? Not sure why it's doing this.

19thyneb · 2020-08-11T20:01:30Z

@dorisjlee so the error in the build check was due to the metadata of certain Vis objects not being calculated. So in that case, the vis.data.cardinality object was of NoneType, which caused the key error.

I've fixed this issue and the test runs fine locally. I'm not sure why the build isn't working properly after I pushed my changes, though.

Updated PandasExecutor to recompute stats and metadata for colored charts since non-colored charts do not require this data to compute interestingness scores. Reverted test_performance to previous version since performance was improved
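The guard described in this commit can be sketched in isolation. This is a minimal illustration, not the PandasExecutor code: `VisData`, `compute_cardinality`, and `color_cardinality` are hypothetical stand-ins, and the metadata is assumed to be a plain dict of distinct-value counts per attribute.

```python
class VisData:
    """Hypothetical stand-in for a view's data; not a Lux class."""

    def __init__(self, rows):
        self.rows = rows          # list of dicts, one per record
        self.cardinality = None   # metadata starts out uncomputed


def compute_cardinality(rows):
    """Count distinct values per attribute (assumed meaning of cardinality)."""
    distinct = {}
    for row in rows:
        for attr, value in row.items():
            distinct.setdefault(attr, set()).add(value)
    return {attr: len(values) for attr, values in distinct.items()}


def color_cardinality(data, color_attr, has_color):
    """Return the color attribute's cardinality, computing metadata lazily."""
    if not has_color:
        return None  # non-colored charts skip the metadata work entirely
    if data.cardinality is None:
        # Metadata was never computed for this view, so compute it now
        # instead of indexing into None.
        data.cardinality = compute_cardinality(data.rows)
    return data.cardinality[color_attr]
```

Only the colored path pays for the metadata computation, which matches the stated reason for restricting the recompute to colored charts.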

dorisjlee · 2020-08-12T22:15:29Z

lux/executor/PandasExecutor.py

@@ -200,6 +200,11 @@ def execute_aggregate(view: Vis,isFiltered = True):
                    for col in columns[1:]:
                        view.data[col] = view.data[col].fillna(0)
                    assert len(list(view.data[groupby_attr.attribute])) == len(all_attr_vals), f"Aggregated data missing values compared to original range of values of `{groupby_attr.attribute}`."
+            #need to compute the statistics and metadata for the view's data if no new rows were added
+            else:
+                if view.data.cardinality is None and has_color:


Is there a reason why we need to recompute the metadata here?

dorisjlee · 2020-08-14T12:08:42Z

Merging this in for now; will see whether the new metadata refresh changes fix this.

dorisjlee · 2020-08-14T14:26:05Z

This issue is fixed after pulling in the latest code for metadata maintenance. I tried to remove the hack on lines 203-205 of PandasExecutor.py, but it was still required. This is because when we do
view.data = groupby_result.agg(agg_func).reset_index()
the metadata is not propagated over in __finalize__. In particular, the metadata is lost in the step where we perform the aggregation groupby_result.agg(agg_func).
This happens because groupby_result is a GroupBy object, so the metadata is lost when the DataFrame gets transformed into the GroupBy object.
I resolved this bug in commit eaed32a by explicitly passing the original metadata to the intermediate result using __finalize__:

groupby_result = groupby_result.agg(agg_func)
intermediate = groupby_result.reset_index()
view.data = intermediate.__finalize__(view.data)
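The same pattern can be tried outside Lux. The sketch below is an illustration under stated assumptions, not the project's code: `MetaFrame` and `aggregate_keep_metadata` are hypothetical names, and `cardinality` stands in for whatever custom fields a DataFrame subclass declares in `_metadata`.

```python
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying one custom metadata field."""

    _metadata = ["cardinality"]
    cardinality = None  # class-level default so the attribute always exists

    @property
    def _constructor(self):
        # Ask pandas to build results of DataFrame operations as MetaFrame.
        return MetaFrame


def aggregate_keep_metadata(data, groupby_col, agg_func):
    """Aggregate via a GroupBy object, then copy metadata from the input."""
    grouped = data.groupby(groupby_col)   # a GroupBy object, not a DataFrame
    aggregated = grouped.agg(agg_func)
    # Re-wrap defensively so the result is a MetaFrame in any case.
    intermediate = MetaFrame(aggregated.reset_index())
    # Copy the fields named in _metadata from the original frame.
    return intermediate.__finalize__(data)


df = MetaFrame({"group": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
df.cardinality = {"group": 2}
result = aggregate_keep_metadata(df, "group", "mean")
print(result.cardinality)
```

Whether a groupby result carries custom attributes on its own may depend on the pandas version, so the sketch calls `__finalize__` explicitly rather than relying on automatic propagation.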

…g#59)

* Modular Scores
  Added scoring functions for skew, kurtosis, and number of peaks
* Correlation, Mutual Information, Skew
* Removing old unused files
* Added Interestingness Scoring for Colored Bar and Line charts
  Also added test for this case in test_interestingness.py
* Bug fix Pandas Executor
  Fixed bug where vis' stats and metadata were not being calculated in a specific case
* Updated PandasExecutor
  Updated PandasExecutor to recompute stats and metadata for colored charts since non-colored charts do not require this data to compute interestingness scores. Reverted test_performance to previous version since performance was improved

Former-commit-id: b04472d

…PandasExecutor by explicitly passing _metadata from GroupBy object to DataFrame view.data

Former-commit-id: eaed32a

19thyneb added 6 commits February 26, 2020 17:35

Modular Scores

946f657

Added scoring functions for skew, kurtosis, and number of peaks

Correlation, Mutual Information, Skew

2a7b9d8

Merge branch 'Interestingness' of https://github.com/thyneb19/lux int…

c88cdfb

…o Interestingness

Removing old unused files

560c49d

Added Interestingness Scoring for Colored Bar and Line charts

0385f16

Also added test for this case in test_interestingness.py

Bug fix Pandas Executor

6947002

Fixed bug where vis' stats and metadata were not being calculated in a specific case

Updated PandasExecutor

c08909a

Updated PandasExecutor to recompute stats and metadata for colored charts since non-colored charts do not require this data to compute interestingness scores. Reverted test_performance to previous version since performance was improved

dorisjlee reviewed Aug 12, 2020

dorisjlee merged commit b04472d into lux-org:master Aug 14, 2020

dorisjlee added a commit that referenced this pull request Aug 14, 2020

remove the bugfix from #59 to avoid computing metadata within PandasE…

eaed32a

…xecutor by explicitly passing _metadata from GroupBy object to DataFrame view.data

dorisjlee mentioned this pull request Aug 14, 2020

Metadata propagation not reliable if object type changes #65

Open

thyneb19 deleted the Interestingness branch October 23, 2020 21:01
