Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Improve CSV/TSV Export #2714

Closed
BeezyT opened this Issue · 37 comments

3 participants

Timo B. Matthieu Aubry Julien Moumné
Timo B.
Collaborator

The improvements include the following API extensions

  • expanded=1 should work for CSV/TSV exports. Labels have to be put together in a path.
  • New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well. If set to 1, they are present and the report is easier to read for humans. If 0, they are left out and it's easier to process the report, e.g. calculate sums.
  • New parameter translateColumnNames to get localized names of metrics.

The export link in Piwik should be adjusted

  • expanded=1 by default
  • translateColumnNames=1 by default for CSV and TSV format
  • Use the API_datatable_default_limit configuration for export links
Matthieu Aubry
Owner

Use the API_datatable_default_limit configuration for export links
What do you mean?

New parameter includeInnerNodes to choose whether inner nodes (e.g. folders in the pages report) should be exported as well.
Why would we want not to export some parts of the report? Not sure I understand correctly what you meant.

  • Please add unit tests for these new features.
  • The "expanded=1" feature request is covered in the ticket #2005
Timo B.
Collaborator
  1. When you click the export button, filter_limit=100 is added to the URL. The value is hardcoded in the template. This should be configurable (e.g. using API_datatable_default_limit).

  2. When we export folders (with aggregated values for metrics), this can be useful in some cases, but not always. For example, if you want to process the export further (e.g. calculate the sum of page views in excel), this might lead to unexpected/wrong results.

Matthieu Aubry
Owner

1) sounds good, definitely a bug

2) would includeInnerNodes=0 same as "hideRowsWithSubtable=1" ?
Not sure how useful is this feature, what exactly is the use case? Because to calculate the sum of page views would still require the folders to be in the output (since folders "nb_actions" is the aggregate page views of all pages inside this folder)?

Timo B.
Collaborator

2) Folders that are tracked, are reported under /path/to/folder/index. The folder /path/to/folder/ has only aggregated values.

Here's an example:

Actual views:

  • /path/to/folder 3 times
  • /path/to/folder/foo.html 2 times

Full report:

  • /path/to/folder 5 views (3+2)
  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

If we export it like this, it's easier to read for humans and might be useful in some cases. But if you want to sum the pageviews, the sum is 8, which is incorrect.

If we export without inner nodes, it looks like this:

  • /path/to/folder/index 3 views
  • /path/to/folder/foo.html 2 views

The sum is 5, which is correct.

hideRowsWithSubtable=1 would hide the entire row "/path/to/folder" with its subtable, right? I want to hide only the aggregated row and keep the subtable.

Matthieu Aubry
Owner

I finally understand, nothing beats a good example ;)

Maybe the parameter could be called "hideExpandedRows" ?

Timo B.
Collaborator

"hideExpandedRows" sounds a lot like "hideRowsWithSubtable". The name should make clear that only the inner nodes of the tree (i.e. the rows with aggregated values) but not their children are hidden. I couldn't come up with anything better than "includeInnerNodes"...

Timo B.
Collaborator

In general, the default for includeInnerNodes should be 0, since this is the version of the output that is most useful for further processing.

But what about the export icon in the Piwik UI? Should it add includeInnerNodes=1 by default?

Timo B.
Collaborator

(In [5334]) refs #2714 improved csv/tsv export

Timo B.
Collaborator

There are columns in the export that are not used in the UI. Therefore, they are not documented (and translated) in the metadata. I added generic translations to the CSV-Renderer (Piwik_DataTable_Renderer_Csv::translateColumnNames), which reduces the impact of this ticket on other parts of Piwik. Please review and consider alternative options: we could add some of the metrics to Piwik_API_API::getDefaultProcessedMetrics or extend the metadata output of all plugins.

Timo B.
Collaborator

On the build server, a unit test is failing that passes on my machine.

When exporting with language=de, the number format changes (1,7 instead of 1.7) on my machine but not on the build server. I think the behavior of the build server is correct, any ideas why the outputs differ?

The comma in the number also breaks the CSV format where commas are used to separate columns.

Timo B.
Collaborator

(In [5335]) refs #2714 changed expected number format of csv export according to behavior of build server

Timo B.
Collaborator

(In [5336]) refs #2714 using API_datatable_default_limit config for export icons, upping default filter limit to 100 (which was the value previously hardcoded in the datatable template)

Matthieu Aubry
Owner

(In [5345]) Refs #2714 remove echo

Matthieu Aubry
Owner

Cool!

Feedback

  • includeInnerNodes = false by default sounds good. This way we don't have to document or expose this parameter publicly.
  • For consistency, JSON and HTML exports (not XML) should also support the new parameters translateColumnNames... so the translation code could be refactored in the parent renderer class.
Timo B.
Collaborator

A general problem with CSV/TSV is, that the download starts right away and the URL can't be modified afterwards (e.g. to change includeInnerNodes). Since the possible parameters are getting more and more complex, it might be a good idea to introduce an export dialog. It could look like this:

  • When the export icon (floppy disc) is clicked, a dialog opens
  • You can select the format and general parameters
  • Depending on the format you get more parameters (e.g. translateColumnNames when it's available)
  • You can either copy the link or open the URL
  • The dialog closes

This way, normal users don't have to read the API documentation (which will probably be too hard to read for them) and still can use all the cool features.

What do you think?

Matthieu Aubry
Owner

Sounds interesting, but I think a bit too complicated. The simple approach of putting expanded=1&translateColumnNames=1 in the URL is good. Also as proposed, the includeInnerNodes would always be set to 0 (when expanded=1) but this is done in the code, not in the URL. Sane defaults are better than offering too much control ;)

Julien Moumné
Collaborator

Concerning comment:15, how about creating a new ticket/place holder on the subject of defining an advanced export page in Piwik ?

Matthieu Aubry
Owner

(In [5382]) Refs #2714 Remove idsubtable metadata column from output being tested since it has different value in build server and local dev station

Timo B.
Collaborator

(In [5419]) refs #2714 additional column translations and number format fix for csv exports, refs #1454 metadata for API.get report (needed for csv column translations)

Timo B.
Collaborator

(In [5420]) refs #2714, #1454 test fix for previous commit

Timo B.
Collaborator

At the moment, there's a list of translations in Piwik_DataTable_Renderer_Csv::translateColumnNames. This list is needed because plugins only supply translations for the metrics that are displayed and the API exposes more columns.

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

Matthieu Aubry
Owner

I'm not happy with this solution because the renderer should not care about the concrete translations. Also, when we add more "invisible" columns, we'll have to add them to the renderer, which is not intuitive at all. I think, we should include the invlisible columns in the meta data output. Adding them to "metrics" could be problematic because "metrics" should only contain columns that are relevant for the UI. How about adding a new array? We already have "metrics" and "processedMetrics", maybe "additionalMetrics"?

I agree that column labels in renderer is not ideal... but adding data in the output, that we don't yet use is maybe not better (performance overhead).

As a quick fix, maybe you can move the "Core/translation" logic out of the renderer into a public static function in API/API.php (in the plugin class rather than public API) - so that at least all logic is in the same place. I think it'd enough?

Timo B.
Collaborator

(In [5458]) refs #2714 removed outdated comment

Timo B.
Collaborator

(In [5463]) refs #2714 column translations for html and rss export, general fine tuning of export column translations

Timo B.
Collaborator

As far as I'm concerned, we can close this ticket as soon as #2765 is fixed.

Matthieu Aubry
Owner
  • '&includeInnerNodes=1'

we mentioned we didn't need to show this parameter publicly, can we remove from link and assume ==1 by default in code?

Otherwise, excellent changes! It feels good to see very old features being polished :)

Matthieu Aubry
Owner

(In [5468]) Refs #2714 Prevent warning when cookie not init

Timo B.
Collaborator

I still think we might need both options (includeInnerNodes=(0|1)). Leaving the inner nodes out is better for further processing because the sums are right. On the other hand, some reports contain valueable information on inner nodes only. For example, the search engines report has conversion metrics for the search engines but not for the keywords. If we leave inner nodes out, only the keywords are exported and we don't have the metrics.

At the moment includeInnerNodes is 1 when the export icon is clicked but the API default is 0. I did it this way because it's more likely that an auto-fetched report is processed further. Still, I'm not entirely happy...

  • We could ask the user whether inner nodes should be included when he clicks the export icon (simple yes/no-js-dialog)
  • We could always default to 1 (also in the API) for consistency
  • We could add another export link XTSV (extended TSV) that includes the inner nodes
  • We could introduce an export dialog (as mentioned before). The user can also configure whether column labels should be translated, which is not possible at the moment.
Matthieu Aubry
Owner

Thx for clarifications.

We could always default to 1 (also in the API) for consistency

Sounds good to me... There is no known problem with that, we will add a FAQ if users ask about it.

It is the easiest so the way to go maybe since we want to KISS...

Timo B.
Collaborator

(In [5477]) fixes #2714 includeInnerNodes=1 by default, nice export filename for API.get report

Timo B.
Collaborator

The realted ticket #2765 is still open. If it was closed, CSV export would be polished completely. Matt, did you work on it yet?

Matthieu Aubry
Owner

No and I will probably not fix it for the next release I'm afraid...

Timo B.
Collaborator

(In [5495]) refs #2714 use default language of user for export file name

Timo B.
Collaborator

(In [5496]) refs #2714 export filename compatible with firefox

Timo B.
Collaborator

(In [5529]) fixes #2809, refs #2714: making signature of Piwik_DataTable_Renderer::renderHeader() compatible across subclasses

Matthieu Aubry
Owner

(In [5838]) Refs #2714
Exporting TSV/RSS links with human readable column names, from the API page.

Timo B.
Collaborator

(In [6181]) refs #2714, refs #3073

  • flat settings are passed to graph views and exports
  • csv renderer does not do flattening anymore. the new api parameters can be used instead - js and tests updated accordingly
  • aggregate rows are italic (imo, the plus logo would be misleading because it would suggest clickability)
  • disable row evolution on flat tables for performance reasons
  • minor language adjustments
Timo B. BeezyT added this to the 1.7 Piwik 1.7 milestone
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.