Added Interestingness Scoring for Colored Bar and Line charts #59

Merged: 7 commits, Aug 14, 2020


19thyneb
Contributor

@19thyneb 19thyneb commented Aug 9, 2020

Added a test case to test_interestingness.py under the test_interestingness_1_2_0 function, addressing issue #52.

Added scoring functions for skew, kurtosis, and number of peaks
Also added test for this case in test_interestingness.py
Fixed bug where vis' stats and metadata were not being calculated in a specific case
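
For readers unfamiliar with these measures, here is a minimal sketch of what skew, kurtosis, and peak-count scores could look like. It is not the code from this PR: shape_scores and count_peaks are hypothetical names, and treating a "peak" as a strict local maximum is an assumption made for illustration.

```python
import numpy as np
import pandas as pd


def count_peaks(values):
    """Count strict local maxima: interior points larger than both neighbours.

    Hypothetical helper; the PR's actual peak definition is not shown in this thread.
    """
    v = np.asarray(values, dtype=float)
    if v.size < 3:
        return 0
    interior = v[1:-1]
    return int(np.sum((interior > v[:-2]) & (interior > v[2:])))


def shape_scores(values):
    """Return skew, kurtosis, and peak count for a 1-D sequence of numbers."""
    s = pd.Series(values, dtype=float)
    return {
        "skew": float(s.skew()),      # sample skewness
        "kurtosis": float(s.kurt()),  # sample excess kurtosis
        "n_peaks": count_peaks(s.to_numpy()),
    }


zigzag = shape_scores([1, 3, 1, 4, 1])        # two interior maxima
symmetric = shape_scores([1, 2, 3, 4, 5])     # symmetric, so skew is about 0
right_tail = shape_scores([1, 1, 1, 1, 10])   # long right tail, positive skew
```

Returning all three measures in one dict keeps them easy to combine into a single score later, but that is a design choice for this sketch only.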
@19thyneb
Contributor Author

19thyneb commented Aug 9, 2020

Fixed a bug in the Pandas executor where metadata for colored bar/line charts was not being calculated. This fix also slowed the census data performance test by around 1.5 seconds.

@dorisjlee
Member

This seems to be causing an issue in test_q1_performance_census. Not sure why it's doing this.

@19thyneb
Contributor Author

@dorisjlee The error in the build check was caused by the metadata of certain Vis objects not being calculated. In that case, vis.data.cardinality was None, which caused the key error.

I've fixed this issue and the test runs fine locally. I'm not sure why the build isn't working properly after I pushed my changes, though.
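
As a rough illustration of the failure mode described above, the sketch below guards a cardinality lookup against metadata that was never computed. This is an assumption-laden example, not the PandasExecutor code: compute_cardinality and color_cardinality are hypothetical names, and cardinality is modeled as a plain dict of distinct-value counts per column.

```python
import pandas as pd


def compute_cardinality(df):
    """Hypothetical helper: number of distinct values in each column."""
    return {col: int(df[col].nunique()) for col in df.columns}


def color_cardinality(df, cardinality, color_attr):
    """Look up the cardinality of the color attribute.

    If the metadata was never computed (cardinality is None), recompute it
    first instead of failing on the lookup.
    """
    if cardinality is None:
        cardinality = compute_cardinality(df)
    return cardinality[color_attr]


data = pd.DataFrame(
    {"dept": ["a", "a", "b"], "color": ["x", "y", "x"], "pay": [1.0, 2.0, 3.0]}
)

# Metadata missing: the guard recomputes it before the lookup.
from_missing = color_cardinality(data, None, "color")

# Metadata present: the stored value is used as-is.
from_stored = color_cardinality(data, {"color": 2}, "color")
```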

Updated PandasExecutor to recompute stats and metadata for colored charts since non-colored charts do not require this data to compute interestingness scores.

Reverted test_performance to previous version since performance was improved
@@ -200,6 +200,11 @@ def execute_aggregate(view: Vis,isFiltered = True):
    for col in columns[1:]:
        view.data[col] = view.data[col].fillna(0)
    assert len(list(view.data[groupby_attr.attribute])) == len(all_attr_vals), f"Aggregated data missing values compared to original range of values of `{groupby_attr.attribute}`."
# need to compute the statistics and metadata for the view's data if no new rows were added
else:
    if view.data.cardinality is None and has_color:
Member

Is there a reason why we need to recompute the metadata here?

@dorisjlee dorisjlee merged commit b04472d into lux-org:master Aug 14, 2020
@dorisjlee
Member

Merging this in for now; we'll see whether the new metadata refresh changes fix this.

dorisjlee added a commit that referenced this pull request Aug 14, 2020
…xecutor by explicitly passing _metadata from GroupBy object to DataFrame view.data
@dorisjlee
Member

dorisjlee commented Aug 14, 2020

This issue is fixed after pulling in the latest code for metadata maintenance. I tried to remove the hack at lines 203-205 in PandasExecutor.py, but it was still required. This happens because when we do
view.data = groupby_result.agg(agg_func).reset_index()
the metadata is not propagated by __finalize__. In particular, the metadata is lost in the aggregation step, groupby_result.agg(agg_func).
The cause is that groupby_result is a GroupBy object rather than a DataFrame: the metadata is dropped when the DataFrame is transformed into the GroupBy object, so it never reaches the aggregated result.
I resolved this bug in commit eaed32a by explicitly passing the original metadata to the intermediate result using __finalize__.

groupby_result = groupby_result.agg(agg_func)
intermediate = groupby_result.reset_index()
view.data = intermediate.__finalize__(view.data)
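
The same pattern can be tried outside Lux with a small DataFrame subclass. The sketch below is only an illustration under stated assumptions: MetaFrame is a hypothetical stand-in for the Lux DataFrame, cardinality is a plain dict, and the explicit MetaFrame(...) wrap is a defensive step added here so the example does not depend on whether a given pandas version preserves the subclass through groupby. Whether the metadata survives the aggregation on its own can vary by pandas version, so the example only relies on what __finalize__ does.

```python
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that carries extra metadata."""

    # Attribute names listed in _metadata are copied by DataFrame.__finalize__.
    _metadata = ["cardinality"]
    cardinality = None  # default when no metadata has been computed

    @property
    def _constructor(self):
        return MetaFrame


df = MetaFrame(
    {"dept": ["a", "a", "b"], "color": ["x", "y", "x"], "pay": [1.0, 2.0, 3.0]}
)
df.cardinality = {"dept": 2, "color": 2}

# Aggregate through a GroupBy object, as in the comment above.
groupby_result = df.groupby(["dept", "color"]).agg("mean")

# Defensive wrap (an assumption of this sketch): make sure the intermediate
# result is a MetaFrame so that "cardinality" is listed in its _metadata.
intermediate = MetaFrame(groupby_result.reset_index())

# Explicitly copy the metadata from the original frame onto the result.
restored = intermediate.__finalize__(df)
```

After the last line, restored.cardinality holds the same dict as df.cardinality. Note that __finalize__ copies the reference rather than the contents, so mutating one dict affects both frames.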

westernguy2 pushed a commit to westernguy2/lux that referenced this pull request Sep 2, 2020
…g#59)

* Modular Scores

Added scoring functions for skew, kurtosis, and number of peaks

* Correlation, Mutual Information, Skew

* Removing old unused files

* Added Interestingness Scoring for Colored Bar and Line charts

Also added test for this case in test_interestingness.py

* Bug fix Pandas Executor

Fixed bug where vis' stats and metadata were not being calculated in a specific case

* Updated PandasExecutor

Updated PandasExecutor to recompute stats and metadata for colored charts since non-colored charts do not require this data to compute interestingness scores.

Reverted test_performance to previous version since performance was improved

Former-commit-id: b04472d
westernguy2 pushed a commit to westernguy2/lux that referenced this pull request Sep 2, 2020
…PandasExecutor by explicitly passing _metadata from GroupBy object to DataFrame view.data

Former-commit-id: eaed32a
@thyneb19 thyneb19 deleted the Interestingness branch October 23, 2020 21:01