Add skrub.Report
#984
|
Nitpick: I think that it should be called "TableReport" :)
|
> Nitpick: I think that it should be called "TableReport" :)

good point -- we may want to add other kinds of reports eg on models
|
|
ok I'll have another look tomorrow but I think I have addressed most comments from @GaelVaroquaux
|
Do you want to work on your online visualizer and add it to the docs first, or should we merge before that (probably a good idea)?
This reverts commit a5cdf17.
|
I think we can merge this PR.
If at some point we are happy enough with the online visualizer to add
it to the docs we can do another PR. It might be a good idea to wait for
the online visualizer before advertising or releasing the reports, but
not before merging the PR IMO.
I also want to exclude the `js_tests/` folder from the distribution
package in a `MANIFEST.in` (it saves a few kB in the wheel), but I'll do
that in a dedicated PR because there are other files we may want to
exclude -- e.g. we can divide the size of the source tarball by more
than 5 if we exclude the parquet files in `benchmarks/`
|
|
Failing test (a quick look suggests that it is related to the PR). Playing with the report, I just had an idea for what I think is a useful feature that's probably very easy to implement: if there are very similar columns (threshold to be defined, I'd say 0.9), add to the drop-down menu that selects which columns to display an entry named "very similar columns" that would select only the columns whose similarity measure with another column is above the threshold.
|
> Failing test (a quick look suggests that it is related to the PR).

yes I'm on it :) I made a small change to how labels are rotated to
prevent overlap but apparently used something the oldest supported
matplotlib doesn't like

> Playing with the report, I just had an idea for what I think is a useful feature that's probably very easy to implement: if there are very similar columns (threshold to be defined, I'd say 0.9), add to the drop-down menu that selects which columns to display an entry named "very similar columns" that would select only the columns whose similarity measure with another column is above the threshold.

I like it! Indeed it's just a question of adding a list of those column
names in a dictionary somewhere so I'll do it in this PR
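
For readers following along, here is a minimal sketch of the idea discussed above: collect the columns whose similarity with some other column reaches the threshold, and store that list in a dictionary next to the other drop-down entries. The function name, the `(left, right, score)` triples, the example column names and the `column_filters` dictionary are all assumptions made for illustration; they are not taken from the skrub code base.

```python
def columns_with_high_similarity(associations, threshold=0.9):
    """Return the sorted names of columns similar to at least one other column.

    `associations` is a hypothetical list of (left, right, score) triples,
    one per pair of columns; a pair counts when its score is >= threshold.
    """
    selected = set()
    for left, right, score in associations:
        if left != right and score >= threshold:
            selected.update((left, right))
    return sorted(selected)


# Hypothetical pairwise similarity scores between columns.
associations = [
    ("city", "zip_code", 0.97),
    ("city", "salary", 0.12),
    ("hire_date", "salary", 0.35),
]

# Hypothetical mapping from drop-down entries to the columns they select.
column_filters = {
    "All columns": ["city", "hire_date", "salary", "zip_code"],
    "Columns with high similarity": columns_with_high_similarity(associations),
}
```

With the made-up scores above, only `city` and `zip_code` end up under the "Columns with high similarity" entry; lowering `threshold` would let more pairs through.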
|
Awesome. I definitely see myself using this functionality
|
> Awesome. I definitely see myself using this functionality

any suggestion on a short description of those columns for the drop-down?
"Columns with high similarity"?
|
|
> "Columns with high similarity"?

👍
|
|
you can see the new filter in action here. The matplotlib thingy is fixed too, so the PR should be ok now
GaelVaroquaux
left a comment
Two tiny comments and then merge
skrub/_dataframe/_common.py
Outdated
    # anything with them; cast raises an exception.
    # polars emits a performance warning when using map_elements
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
I think that it would be nice if we were a bit more specific about the warning that we catch
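
As an illustration of the suggestion, here is a minimal sketch of silencing one warning category instead of every warning. `InefficientMapWarning`, `noisy_map` and `quiet_map` are hypothetical stand-ins so that the example runs without polars; the actual category emitted by polars would need to be checked and substituted.

```python
import warnings


class InefficientMapWarning(UserWarning):
    """Hypothetical stand-in for the performance warning polars emits."""


def noisy_map(values):
    # Stand-in for a call that triggers the performance warning.
    warnings.warn("element-wise mapping is slow", InefficientMapWarning)
    return [v * 2 for v in values]


def quiet_map(values):
    with warnings.catch_warnings():
        # Ignore only the expected category; any other warning still surfaces.
        warnings.simplefilter("ignore", category=InefficientMapWarning)
        return noisy_map(values)
```

Passing `category=` keeps unrelated warnings (deprecations, for instance) visible, which a bare `simplefilter("ignore")` would hide.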
Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>
|
Hurray, merged!!!
moving skrubview into skrub