Skip to content
This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

Ability to export data as excel or csv #313

Closed
chriddyp opened this issue Dec 19, 2018 · 8 comments · Fixed by #499
Closed

Ability to export data as excel or csv #313

chriddyp opened this issue Dec 19, 2018 · 8 comments · Fixed by #499
Assignees
Labels
dash-meta-sponsored Development that has been sponsored by an organization https://plot.ly/products/consulting-and-oem/ dash-type-enhancement New feature or request size: 8
Milestone

Comments

@chriddyp
Copy link
Member

  • For Excel files, only XLSX (not XLS) will be supported
  • Only the data will be exported, formatting will not be exported
  • The export will include the data in the current view. For example, if columns are hidden, sorted, or filtered, then the exported file will display the current view.
  • Export will not protect users from “CSV Injection” attacks (https://www.owasp.org/index.php/CSV_Injection)
    Exporting Excel files may require a large 3rd party open source library. This library is too large to include in the default version of dash-table, so we will need to engineer dynamic JavaScript module loading as part of this requirement.

We will consider the UI for this feature in a separate issue (we need to design a UI needs encompasses all of the new features that we're adding).

@chriddyp chriddyp added the dash-meta-sponsored Development that has been sponsored by an organization https://plot.ly/products/consulting-and-oem/ label Dec 19, 2018
@chriddyp chriddyp changed the title dash-table - Ability to export data as excel or csv [Sponsored: Due April 1] Ability to export data as excel or csv [Sponsored: Due April 1] Dec 19, 2018
@Marc-Andre-Rivet
Copy link
Contributor

Marc-Andre-Rivet commented Jun 17, 2019

XLSX: One option that seems well supported https://github.com/SheetJS/js-xlsx -- also has a pro/commercial tier with unknown conditions. The community version is under Apache License; the implications of that might need to be checked out to be safe.

CSV: Many options exist Including SheetJS above.

As for other optional operations on the table, implement this in the most isolated manner possible to keep it well isolated from the table's implementation. Internally should be its own component.

The feature's implementation must be purely front-end and will involve some JS magic to initiate what looks like a file download -- this is a known problem / been solved may times.

The implementation should take into consideration the possibility of exporting the data as it is displayed in the table for future feature/improvement.

Lazy-loading would be great but is not needed for this feature to be implemented. Will increase the table's bundle size significantly though.

With a purely functional UI, this shouldn't take more than a few days to implement and test.

@Marc-Andre-Rivet
Copy link
Contributor

Just dropping this here, the pro version of xlsx-js offers styling support
https://sheetjs.com/pro

No idea about licensing.

@Marc-Andre-Rivet
Copy link
Contributor

Marc-Andre-Rivet commented Jun 21, 2019

@alinastarkov From our previous discussion, I'd say the CSV should export with the column id's and the XLSX should export with the column header names (multi-row / merged if required)

The idea being that if the user does a CSV export I would expect him to be somewhat aware of the data structure or planning on further processing the data in system x-y-z. Exporting headers would potentially cause the CSV to have multiple header rows which is not processed by default (in pandas, can do headers=[0,1] for rows 0 and 1 being used as headers) -- also, exporting the headers might result in duplicated names.

For XLSX, the user would arguably want to present or further process the data manually, not programmatically, and in this case the header names are more relevant.

@alexcjohnson, additional insight?

@chriddyp
Copy link
Member Author

chriddyp commented Jul 9, 2019

Similar to my comment in #318 (comment), the behaviour should be predictable and customizable by the Dash developer to start. The Dash developer will define the import/export UI for their App to serve a particular, structured workflow.

So, to start:

  • CSV & Excel should have the same behavior
  • Header names by default.
  • A follow-up PR could make this customizable to header IDs programatically
  • Merged rows should be expanded to use duplicated values
  • Export type should be customizable - Excel or CSV to start.

In the future, we could enable the end-user to choose the behavior when they export as part of #316. However, that's not needed for the first pass.

A possible API could look like:

DataTable(
    export_enabled=True,
    export_filetype='csv' # | 'xlsx'
    export_headers='names' # | 'ids' <- this isn't needed for first release
)

The UI could be simple to start - display an "export" button at the bottom of the table, similar to pagination.

To not lock us into this UI, we could require the users to specify the button location as a property so that when we implement #316 in the future, we could change the default. i.e.

DataTable(
    export_ui='button',  # this default might become 'menu' in the future with #316 
    export_position='bottom'
)

@Marc-Andre-Rivet
Copy link
Contributor

Marc-Andre-Rivet commented Jul 9, 2019

DataTable(
    export_enabled=True,
    export_filetype='csv' # | 'xlsx'
    export_headers='names' # | 'ids' <- this isn't needed for first release
)

With our 1.0 stability/maturity statements in mind, I would advice against creating new props that only take one value and wait for the feature/support to be added before adding them. With that in mind I would defer export_position, export_ui. Both could become moot, defaults misaligned or lack the desired flexibility / lock us in.

export_enabled and export_filetype is a prop combo that is similar to what we used to do for filter/paging/sorting -- I would suggest a similar approach to what we currently do for these and have a export_action that can be 'none'|'csv'|'excel', we can later add support for multiple export types by additionally supporting arrays of values.

  • CSV & Excel should have the same behavior
  • Header names by default.

Good point for consistency. Although, I’m wondering if CSV and Excel exports should actually be consistent to start with as they might represent different use cases. Hence the differential behavior suggested above. @chriddyp could you delve deeper on the motivation behind this feature and usage scenarios?

Merged rows should be expanded to use duplicated values

If the table had merged header rows and we are currently capable of merging those rows, my feeling is that we should make the output (XLSX) match the source as best we can. I'd even envision us exporting the cell styling in the future.

@chriddyp
Copy link
Member Author

chriddyp commented Jul 9, 2019

they might represent different use cases

There are equally valid and common use cases for human readable CSVs and machine-readable, non-duplicated CSVs, so I vote for picking one of them, keeping it consistent with excel, and allowing it to be overridable.
Header names is more intuitive, so I suggest that is the default.

@chriddyp
Copy link
Member Author

chriddyp commented Jul 9, 2019

If the table had merged header rows and we are currently capable of merging those rows, my feeling is that we should make the output (XLSX) match the source as best we can.

I’d say this is another case where it’ll depend on the structured workflow defined by the dash developer, and as such something we should make configurable. But to start, we can just expand it out so that it matches CSV.

@alexcjohnson
Copy link
Collaborator

From talking with @Marc-Andre-Rivet yesterday it seems like the ability to choose export_headers=('ids'|'names'|'display'|'none') is nearly done already, so we needn't sacrifice API design for speed of shipping. (by 'display' I mean "what we displayed, ie with merged cells" - name up for discussion - so 'names' would be exactly as provided.)

Yes there are valid use-cases for human-readable CSV and machine-readable XLSX, but I'd argue the most common use cases, enough to warrant setting a filetype-dependent default for headers, are machine-readable CSV ('ids') and human-readable XLSX ('display'). The contrary use cases AFAICT are nearly all "fitting into an existing workflow that already expects this format" whereas if you're designing something new you'd choose CSV to feed to a machine and XLSX for human consumption. In particular, there are a lot of existing tools for CSV reading that will break with multiline headers, so it seems more robust to ask users to explicitly choose names if that's what they want.

(Incidentally MAR and I discussed eventually even allowing merged headers with CSV, taking into account left/right alignment... not for v1 though, CSV/display would for now behave as CSV/names)

@Marc-Andre-Rivet Marc-Andre-Rivet added dash-type-enhancement New feature or request size: 8 and removed size: 10 labels Jul 11, 2019
@Marc-Andre-Rivet Marc-Andre-Rivet changed the title Ability to export data as excel or csv [Sponsored: Due April 1] Ability to export data as excel or csv Jul 17, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
dash-meta-sponsored Development that has been sponsored by an organization https://plot.ly/products/consulting-and-oem/ dash-type-enhancement New feature or request size: 8
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants