Provide an executive summary in the reports #5

Open
richierocks opened this issue Mar 27, 2017 · 5 comments
@richierocks

It would be useful to have an executive summary page near the start of the report generated by clean that provides an overview of any problems found. (This is particularly useful for larger datasets.)

This summary could contain:

  • How many columns of each type the dataset contains.
  • The distribution of the fraction of missing values for each column.
  • The names of the top 5 most problematic columns.
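
A minimal sketch of what such a summary might compute, written in plain Python for illustration only (this is not dataMaid's R code or API). The function name `executive_summary`, the dict-of-lists data layout, the use of `None` for missing values, and the choice of "fraction missing" as the measure of how problematic a column is are all assumptions made for the example.

```python
from collections import Counter


def executive_summary(columns, top_n=5):
    """Summarise a dataset given as {column name: list of values}.

    Hypothetical helper: missing values are represented as None.
    """
    type_counts = Counter()
    missing_fraction = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Label the column by the type of its first non-missing value.
        kind = type(present[0]).__name__ if present else "empty"
        type_counts[kind] += 1
        missing_fraction[name] = (
            (len(values) - len(present)) / len(values) if values else 0.0
        )
    # Rank columns by fraction missing, worst first; ties broken by name.
    ranked = sorted(missing_fraction, key=lambda n: (-missing_fraction[n], n))
    return {
        "type_counts": dict(type_counts),
        "missing_fraction": missing_fraction,
        "most_problematic": ranked[:top_n],
    }
```

Calling `executive_summary({"age": [31, None, 45, None], "name": ["a", "b", None, "d"]})` would rank `age` ahead of `name`, since half of its values are missing.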
@ekstroem
Owner

ekstroem commented Mar 27, 2017

Something along those lines was added about a week ago. If you have suggestions or ideas for what to include in the table, please let us know.

@richierocks
Author

I've just installed the development version, and I see that the number of rows and columns in the dataset are shown, along with a table of which checks were performed.

The point of an executive summary would be to minimize the time for readers to find the big problems with their dataset. So a table of the top 5 columns ordered by most rows failing their checks would be useful.

You could also look at individual checks, and see which columns have the most rows failing that particular check. For example, the top 5 columns with the most rows failing the "missing values" check.
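
The two rankings described above could be sketched roughly as follows, in plain Python for illustration only (not dataMaid's R code or API). The function name `rank_columns` and the input layout (check name mapped to per-column counts of failing rows) are assumptions. Note that the overall total simply sums counts across checks, so a row that fails several checks is counted once per check.

```python
def rank_columns(failures, top_n=5):
    """Rank columns by number of failing rows, overall and per check.

    Hypothetical input: {check name: {column name: failing row count}}.
    """
    per_check = {}
    totals = {}
    for check, counts in failures.items():
        # Most failing rows first; ties broken alphabetically by column name.
        ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        per_check[check] = ordered[:top_n]
        for column, n in counts.items():
            totals[column] = totals.get(column, 0) + n
    overall = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return overall, per_check
```

Here `overall` would feed the "top 5 columns" table and `per_check["missing values"]` the per-check table.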

Hyperlinks to the sections describing those columns would be a bonus.

@richierocks
Author

Oh, hang on, I see that there is a summary table with missing values by column now too.

I want something like this, but ordered by decreasing number of missing values. And the same for other things, like the number of duplicates.

@ekstroem
Owner

The names in the table are already hyperlinks to the relevant place in the document, but they aren't currently underlined.
For HTML output, it might be an idea to render this table as an HTML datatable, where the user can sort the table by different columns.

@ekstroem ekstroem self-assigned this Sep 3, 2018
@ekstroem ekstroem added this to the 1.2 milestone Sep 3, 2018
@ekstroem ekstroem removed this from the 1.2 milestone Oct 3, 2018
@ekstroem
Owner

ekstroem commented Oct 3, 2018

I looked a bit into including this table as a DT widget, so that might be a path to pursue. However, inserting widgets didn't work out of the box, so this might take some extra time to implement.
