Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove mizani as dependency, re-implement logic internally #271

Merged
merged 9 commits into from
Apr 10, 2024

Conversation

machow
Copy link
Collaborator

@machow machow commented Apr 9, 2024

This PR removes mizani as a dependency, so that we don't transitively depend on scipy and pandas. Note that our internal implementation does not depend on numpy, so that we can drop it as a dependency in a later PR.

This implementation is not as elegant as mizani's, or quite as optimized, but should do well for our purposes.

Speed

Our default implementation was about 50% slower than mizani in a simple test. But both are very fast for simple table displays (1.52 ms for 1,000 points for ours, 1ms for mizani.)

from great_tables._data_color.palettes import GradientPalette, CoeffSequence
from mizani.palettes import gradient_n_pal

palette = GradientPalette(["red", "orange", "blue", "grey", "yellow", "red", "orange"])
palette2 = GradientPalette(["red", "orange", "blue", "grey", "yellow", "red", "orange"], cls_coeff_sequence=CoeffSequence)
palette3 = gradient_n_pal(["red", "orange", "blue", "grey", "yellow", "red", "orange"])


%%timeit
# internal + bisect lookup: 1.52 ms ± 5.85 µs per loop
palette([x / 1000. for x in range(1000)])

%%timeit
# 1.76 ms ± 6.27 µs per loop
palette2([x / 1000. for x in range(1000)])

%%timeit
# mizani: 1.03 ms ± 9.82 µs per loop
palette3([x / 1000. for x in range(1000)])

The main difference is that we're using a simple bisect lookup to find cutoff corresponding to a value (in order to get coefficients for transforming within a cutoff band). Mizani uses a table lookup, which cuts the input/response space into 256 bins.

Fixes: #7

@github-actions github-actions bot temporarily deployed to pr-271 April 10, 2024 18:28 Destroyed
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT changes to snapshots are due to differences in rounding (and/or the resolution of mizani's table lookup).

@github-actions github-actions bot temporarily deployed to pr-271 April 10, 2024 18:40 Destroyed
@codecov-commenter
Copy link

codecov-commenter commented Apr 10, 2024

Codecov Report

Attention: Patch coverage is 93.02326% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 81.87%. Comparing base (ece39e8) to head (05578a6).
Report is 125 commits behind head on main.

Files Patch % Lines
great_tables/_data_color/palettes.py 92.77% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
+ Coverage   77.01%   81.87%   +4.86%     
==========================================
  Files          40       41       +1     
  Lines        4229     4310      +81     
==========================================
+ Hits         3257     3529     +272     
+ Misses        972      781     -191     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot temporarily deployed to pr-271 April 10, 2024 18:50 Destroyed
@github-actions github-actions bot temporarily deployed to pr-271 April 10, 2024 19:54 Destroyed
@github-actions github-actions bot temporarily deployed to pr-271 April 10, 2024 19:58 Destroyed
@rich-iannone
Copy link
Member

Thank you for the huge amount of work you put into this and the related PR! Will review shortly.

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@machow
Copy link
Collaborator Author

machow commented Apr 10, 2024

@abstractqqq in case it's useful for polars_ds, this PR should remove pandas as a dependency (except for importing from great_tables.data, which we'll tackle later!). We should be able to release soon. Definitely let us know if you run into any issues with it.

@machow machow merged commit ed09f9d into main Apr 10, 2024
9 checks passed
@rich-iannone rich-iannone deleted the feat-gradient-func branch April 11, 2024 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

epic: make pandas an optional dependency
3 participants