Revert Cardinality Requirement for Histograms #301

jinimukh · 2021-03-12T07:42:22Z

Overview

Now that we have a new mechanism for finding bin widths, we can remove the cardinality requirement that was put in place to fix a divide-by-zero error that occurred when the range of values was 0.
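
For illustration, the divide-by-zero described above can be reproduced with a naive equal-width binning routine. This is a minimal sketch, not Lux's actual code: `naive_bin_index` and `safe_bin_index` are hypothetical helpers, and falling back to a single bin when the range is 0 is an assumption about one reasonable way to handle that case.

```python
def naive_bin_index(value, x_min, x_max, n_bins=10):
    """Equal-width binning that breaks when x_max == x_min."""
    bin_width = (x_max - x_min) / n_bins
    # bin_width is 0.0 when every value is identical, so this divides by zero
    return min(int((value - x_min) / bin_width), n_bins - 1)


def safe_bin_index(value, x_min, x_max, n_bins=10):
    """Same binning, but a zero range falls back to a single bin (assumed behavior)."""
    if x_max == x_min:
        return 0
    bin_width = (x_max - x_min) / n_bins
    return min(int((value - x_min) / bin_width), n_bins - 1)


column = [4.0, 4.0, 4.0]  # every value identical, so the range is 0
lo, hi = min(column), max(column)

try:
    naive_bin_index(column[0], lo, hi)
    naive_failed = False
except ZeroDivisionError:
    naive_failed = True

safe_bins = [safe_bin_index(v, lo, hi) for v in column]
```

With a guard of this kind, a constant column can still be drawn as a single bar instead of being excluded up front.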

Changes

remove cardinality requirement in univariate.py
take the absolute value for the default markbar in histogram.py
adjust tests to current output

Example Output

Now, we can see uniform distributions. Here is an example:

micahtyong · 2021-03-12T07:50:05Z

lux/vislib/altair/Histogram.py

@@ -53,7 +53,7 @@ def initialize_chart(self):

        # Default when bin too small
        if markbar < (x_range / 24):
-            markbar = (x_max - x_min) / 12
+            markbar = abs(x_max - x_min) / 12


Would there ever be a case such that calling abs is necessary? By construction, shouldn't x_max be greater than or equal to x_min?

I agree. I added this as a final check against integer overflow, so that our code at least doesn't error; a negative range would just mean the user's inputs are too extreme (positive or negative) to handle. @dorisjlee, should we even worry about this?

I think xmax will always be larger than xmin.
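
To make the overflow concern in this thread concrete, here is a hedged sketch of how a fixed-width subtraction could produce a negative range even though x_max >= x_min. `wrap_int64` and `default_markbar` are hypothetical helpers written for this illustration; the wraparound is simulated in pure Python and does not claim to reflect how Lux or its dependencies actually store these values.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(value):
    """Simulate two's-complement wraparound of a 64-bit signed integer."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def default_markbar(x_min, x_max):
    """Fallback bar width from the diff above, with the abs() guard."""
    return abs(x_max - x_min) / 12


# Ordinary inputs: x_max >= x_min, so abs() changes nothing.
ordinary = default_markbar(0, 24)

# Extreme inputs: in fixed-width arithmetic the subtraction would wrap around
# and come out negative, which is the only case where abs() has an effect.
wrapped_range = wrap_int64(INT64_MAX - INT64_MIN)
```

For everyday data the guard is a no-op, which is consistent with the later "remove abs value" commit in this PR.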

codecov · 2021-03-12T07:52:46Z

Codecov Report

Merging #301 (6fa19d7) into master (127806f) will increase coverage by 0.01%.
The diff coverage is 88.88%.

❗ Current head 6fa19d7 differs from pull request most recent head 853844a. Consider uploading reports for the commit 853844a to get more accurate results

@@            Coverage Diff             @@
##           master     #301      +/-   ##
==========================================
+ Coverage   80.96%   80.98%   +0.01%     
==========================================
  Files          50       50              
  Lines        3615     3618       +3     
==========================================
+ Hits         2927     2930       +3     
  Misses        688      688

Impacted Files                            Coverage Δ
lux/interestingness/interestingness.py    86.48% <0.00%> (ø)
lux/action/univariate.py                  90.38% <100.00%> (ø)
lux/executor/PandasExecutor.py            96.01% <100.00%> (+0.04%) ⬆️
lux/vislib/altair/Histogram.py            89.58% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 127806f...853844a.

micahtyong · 2021-03-12T07:54:40Z

lux/action/univariate.py

-            c
-            for c in ldf.columns
-            if ldf.data_type[c] == "quantitative" and ldf.cardinality[c] > 5 and c != "Number of Records"
+            c for c in ldf.columns if ldf.data_type[c] == "quantitative" and c != "Number of Records"


Just to clarify, does ldf.cardinality[c] return the number of rows for a given column? In this case, I wonder if removing this check would lead to some irregular behavior when there are very few rows. Is there some other part in the code that checks for this?

No, it returns the number of unique values. We only compute histograms for dataframes with more than 5 rows.
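
The distinction between cardinality and row count, and the effect of dropping the check, can be sketched as follows. This is an illustrative stand-in only: the `cardinality` helper and the plain dictionaries `columns` and `data_type` are hypothetical substitutes for the metadata Lux keeps on a dataframe, and the two list comprehensions mirror the conditions in the diff above.

```python
def cardinality(values):
    """Number of distinct values in a column (not the number of rows)."""
    return len(set(values))


# Hypothetical stand-in for the per-column metadata on a Lux dataframe.
columns = {
    "Units": [4.0, 4.0, 4.0, 4.0, 4.0, 4.0],  # 6 rows, 1 unique value
    "Price": [1.5, 2.0, 3.5, 4.0, 5.5, 6.0],  # 6 rows, 6 unique values
}
data_type = {"Units": "quantitative", "Price": "quantitative"}

# Old filter: quantitative columns with 5 or fewer unique values are skipped.
old_selection = [
    c
    for c in columns
    if data_type[c] == "quantitative" and cardinality(columns[c]) > 5 and c != "Number of Records"
]

# New filter: the cardinality condition is dropped.
new_selection = [
    c for c in columns if data_type[c] == "quantitative" and c != "Number of Records"
]
```

Under these assumptions the constant `Units` column is excluded by the old filter and included by the new one, even though both columns have the same number of rows.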

dorisjlee · 2021-03-12T12:02:28Z

Thanks @jinimukh! Could you explain in the PR what the benefits of removing the cardinality check are? In particular, what issue did the original implementation have with columns of cardinality lower than 5? Would the charts simply not display, or was there an error? Also, this change seems to have been applied to both bars and histograms (for both quantitative and nominal in univariate.py). If that's the case, we can update the description accordingly.

jinimukh · 2021-03-15T01:27:38Z

@dorisjlee If the cardinality was lower than 5, the histogram/bar graph would not show. We talked about this with respect to series: if they all have the same value, I think it would be nice to show a uniform distribution. Was there an initial reason not to compute it in the first place, other than having one (or a few) fewer visualizations to compute?

dorisjlee · 2021-03-15T12:09:20Z

Yes, in this case what you are showing, i.e., the uniform distribution (as a bar chart), makes the most sense. I think the original concern was that a single column of floats, e.g. Units = 4.0, might be recognized as a quantitative column --> plotting a histogram where the range is 0 causes errors (I believe we already fixed this in #99). Could you add a few low-cardinality test cases for each of the data types (quantitative, nominal, etc.) that you modified as part of the PR?

dorisjlee · 2021-03-22T02:53:48Z

Thanks @jinimukh!

jinimukh added 16 commits December 22, 2020 20:55

coalesce data_types into data_type_lookup

2cef000

Merge branch 'master' of https://github.com/lux-org/lux

cad3e84

merge

f197884

merge fixed

1ed9655

merge conflicts

c56f79d

merged

c0388df

Merge branch 'master' of https://github.com/jinimukh/lux

cf045de

Merge branch 'master' into foo

6ae9767

merge upstream

0db6376

first commit

1e6f572

conflicts

7836abf

requirements.txt updated for pandas 1.2.2

9c17abb

Merge branch 'master' of https://github.com/jinimukh/lux

b3509f6

Merge branch 'master' of https://github.com/lux-org/lux

d428289

revert cardinality requiremment

3e6389f

black reformat

f6d344c

micahtyong reviewed Mar 12, 2021

jinimukh and others added 7 commits March 20, 2021 18:44

Merge branch 'master' of https://github.com/lux-org/lux into patch/un…

e32f50c

…iform_bar_graphs

all tests passing with cardinality optimization

33b0ad3

remove abs value

cf947e0

tests added

fce74a3

black reformat

f9d6b59

minor fixes

d9d39c2

black

853844a

dorisjlee merged commit 950eba6 into lux-org:master Mar 22, 2021
