
pandas index & multiindex in TableReport #1083

Merged
TheooJ merged 9 commits into skrub-data:main from jeromedockes:pandas-multiindex
Sep 23, 2024

@jeromedockes
Member

@jeromedockes jeromedockes commented Sep 23, 2024

supersedes #1074

adds display of the pandas index, correctly handles MultiIndex (for both the index and the columns), and shows index/column level names if any are present

allows controlling how many rows are displayed in the sample table

also some refactoring of how the HTML table is constructed
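As a rough, pandas-only illustration of what displaying MultiIndex axes and level names involves (this is not skrub's actual rendering code; `header_rows` is a hypothetical helper), one header row can be built per column level, with the column level's name in the cell just left of the data and the index level names on an extra row:

```python
import pandas as pd


def header_rows(df: pd.DataFrame) -> list[list[str]]:
    """Hypothetical helper: header cells for a table whose axes may be MultiIndexes.

    One row per column level: the leading cells sit above the index columns and
    the last of them holds that column level's name (empty if unnamed). If any
    index level is named, a final row lists the index level names.
    """
    n_index = df.index.nlevels
    rows = []
    for level in range(df.columns.nlevels):
        name = df.columns.names[level]
        lead = [""] * (n_index - 1) + ["" if name is None else str(name)]
        labels = [str(v) for v in df.columns.get_level_values(level)]
        rows.append(lead + labels)
    if any(name is not None for name in df.index.names):
        names = ["" if name is None else str(name) for name in df.index.names]
        rows.append(names + [""] * df.shape[1])
    return rows


# Two levels on each axis; the first column level is unnamed, like a pivot_table result
columns = pd.MultiIndex.from_product([["sum", "mean"], ["one", "two"]], names=[None, "A"])
index = pd.MultiIndex.from_tuples([("A", "foo"), ("B", "bar")], names=["B", "C"])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=index, columns=columns)

for row in header_rows(df):
    print(row)
```

A plain DataFrame (single unnamed index level) falls through to a single header row, so the same code path covers both cases.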

@jeromedockes jeromedockes marked this pull request as draft September 23, 2024 10:16
@jeromedockes
Member Author

jeromedockes commented Sep 23, 2024

todo:

  • keep only one parameter for controlling number of displayed rows

@jeromedockes jeromedockes marked this pull request as ready for review September 23, 2024 10:46
@jeromedockes
Member Author

I reduced the number of rows from 10 to 5 on the skrub homepage.

BTW, not really related to this PR, but if we want the report on the home page to take even less vertical space, one way is to give it a bit more horizontal space (either by making the sk-landing-page a bit wider or by giving the report more room relative to the code snippet on its right), because then the column filter select will move up onto the same line as the tabs.

@jeromedockes
Member Author

ok this one is ready for review

here is an example to generate a report with multiple column and index levels:

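```python
# %%
import datetime

import pandas as pd
import numpy as np

from skrub import TableReport

# Toy data with 24 rows; only A, B, C and E are used by the pivot below
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "three"] * 6,
        "B": ["A", "B", "C"] * 8,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
        "D": np.random.randn(24),
        "E": np.random.randn(24),
        "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)]
        + [datetime.datetime(2013, i, 15) for i in range(1, 13)],
    }
)

# Pivot E so that both axes become MultiIndexes: the index has levels B and C,
# the columns have an unnamed level for the aggregation function and a level named A
df = pd.pivot_table(
    df,
    values="E",
    index=["B", "C"],
    columns=["A"],
    aggfunc=["sum", "mean"],
)

# In a notebook, displaying the report renders it inline
report = TableReport(df)
report

# %%
# Open the same report in a web browser
report.open()
```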
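To see the structure the report has to render without installing skrub, the same pivot can be inspected with plain pandas. This sketch drops the unused columns D and F and uses a seeded NumPy generator instead of `np.random.randn` so runs are reproducible; the level counts and names do not depend on the random values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {
        "A": ["one", "one", "two", "three"] * 6,
        "B": ["A", "B", "C"] * 8,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
        "E": rng.standard_normal(24),
    }
)
pivoted = pd.pivot_table(
    raw, values="E", index=["B", "C"], columns=["A"], aggfunc=["sum", "mean"]
)

# 6 (B, C) combinations as rows; 2 aggregations x 3 values of A as columns
print(pivoted.shape)
# Index levels and their names
print(pivoted.index.nlevels, list(pivoted.index.names))
# Column levels: the aggregation-function level is unnamed, the second is "A"
print(pivoted.columns.nlevels, list(pivoted.columns.names))
```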

@jeromedockes jeromedockes added this to the 0.3.1 milestone Sep 23, 2024
```python
order_by=None,
with_plots=True,
title=None,
max_top_slice_size=5,
```
Contributor


Maybe top_rows and bottom_rows? Or n_top_rows?
max_top_slice_size feels kind of long.

Member Author


Indeed it is long, but there are quite a few "rows" flying around, and we need to distinguish between the maximum number of displayed rows and the actual number, so I wanted to be very explicit here.

Note that this is a parameter of a private helper; the parameter exposed to the user is just n_rows.
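To make the "maximum vs. actual" distinction concrete, here is a minimal sketch, assuming the displayed sample is a top slice plus a bottom slice of the table and that the top slice gets the larger half. `split_displayed_rows` is a hypothetical name, and the split rule is an assumption, not skrub's private helper.

```python
def split_displayed_rows(n_total: int, n_rows: int) -> tuple[range, range]:
    """Hypothetical helper: pick the rows to display given a user-facing n_rows.

    n_rows is only an upper bound: when the table has fewer rows, all of them
    go in the top slice and the bottom slice is empty.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_total <= n_rows:
        return range(n_total), range(n_total, n_total)
    # Assumption: the top slice gets the larger half when n_rows is odd
    max_top_slice_size = (n_rows + 1) // 2
    max_bottom_slice_size = n_rows - max_top_slice_size
    top = range(0, max_top_slice_size)
    bottom = range(n_total - max_bottom_slice_size, n_total)
    return top, bottom


top, bottom = split_displayed_rows(100, 10)
print(list(top), list(bottom))

# Short table: the actual number of displayed rows (3) is below n_rows (10)
top, bottom = split_displayed_rows(3, 10)
print(len(top) + len(bottom))
```

Keeping one public knob and deriving the slice sizes internally is what lets the user-facing API stay at a single parameter while the helper still has to track both the maximum and the actual slice sizes.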

Member

@GaelVaroquaux GaelVaroquaux left a comment


LGTM, but I would prefer if the "!important" in CSS could be removed

@jeromedockes
Member Author

> LGTM, but I would prefer if the "!important" in CSS could be removed

Yep, I removed it already. It's just that pure already has a fairly specific selector (table class name + nth pseudo-class for alternating rows + element), and I didn't want to come up with an artificially specific one, but then I remembered that I wrapped the report in an element with an id, which comes in handy for that.
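Why the id helps comes down to CSS specificity, which browsers compare as (ids, classes/attributes/pseudo-classes, elements) tuples, left to right. The rough calculator below is a sketch only: it handles simple selectors and does not handle pseudo-elements, `:not()`/`:is()`/`:where()`, or the universal selector. The selectors `.striped-table tr:nth-child(2n) td` and `#report td` are hypothetical stand-ins shaped like the ones described in the comment, not the actual pure or skrub CSS.

```python
import re


def specificity(selector: str) -> tuple[int, int, int]:
    """Approximate CSS specificity (a, b, c) of a simple selector.

    a = id selectors, b = classes + attribute selectors + pseudo-classes,
    c = type (element) selectors.
    """
    # Drop arguments of functional pseudo-classes such as :nth-child(2n)
    sel = re.sub(r"\([^)]*\)", "", selector)
    # Count attribute selectors, then remove them so their contents are not re-counted
    attributes = len(re.findall(r"\[[^\]]*\]", sel))
    sel = re.sub(r"\[[^\]]*\]", " ", sel)
    ids = len(re.findall(r"#[\w-]+", sel))
    classes = len(re.findall(r"\.[\w-]+", sel))
    # Single-colon pseudo-classes only; "::" pseudo-elements are skipped
    pseudo_classes = len(re.findall(r"(?<!:):[\w-]+", sel))
    # Whatever identifiers remain after removing the tokens above are element names
    rest = re.sub(r"#[\w-]+|\.[\w-]+|::?[\w-]+", " ", sel)
    elements = len(re.findall(r"[A-Za-z][\w-]*", rest))
    return ids, classes + attributes + pseudo_classes, elements


striped = ".striped-table tr:nth-child(2n) td"  # hypothetical library rule
scoped = "#report td"  # hypothetical rule scoped under the report's wrapper id

print(specificity(striped))
print(specificity(scoped))
# Tuples compare left to right, so a single id outranks any number of classes
print(specificity(scoped) > specificity(striped))
```

This is why scoping rules under the wrapper's id can override a class-heavy library selector without resorting to `!important` or inventing a longer chain of classes.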

@TheooJ TheooJ merged commit f074d00 into skrub-data:main Sep 23, 2024
@TheooJ
Contributor

TheooJ commented Sep 23, 2024

Merged, thanks @jeromedockes !

@jeromedockes jeromedockes deleted the pandas-multiindex branch September 23, 2024 14:20
jeromedockes added a commit to jeromedockes/skrub that referenced this pull request Sep 25, 2024

3 participants