The goal is to implement segmentation in Piwik: first a simpler version allowing pre-chosen segments and reports, then evolving into a more open segmentation model similar to what GA offers.
All get* functions returning analytics reports will get a new parameter $segment. When this parameter is empty (the default), we return data for all visitors (current behavior).
When set, e.g. $segment = "country==fr", the report returned is restricted to visits matching that value.
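As a rough illustration of the $segment semantics described above, here is a minimal Python sketch (not Piwik code). The names parse_segment and SegmentCondition are made up for this example, and only the '==' operator from the proposal is handled.

```python
import re
from typing import NamedTuple, Optional


class SegmentCondition(NamedTuple):
    """One parsed condition; a hypothetical type used only in this sketch."""
    dimension: str
    operator: str
    value: str


# Only '==' is described in the proposal, so that is all this pattern accepts.
_CONDITION = re.compile(r"^(?P<dimension>[A-Za-z0-9_]+)==(?P<value>.+)$")


def parse_segment(segment: str) -> Optional[SegmentCondition]:
    """Return None for an empty segment (all visitors), else the parsed condition."""
    if segment == "":
        return None
    match = _CONDITION.match(segment)
    if match is None:
        raise ValueError(f"Invalid segment: {segment!r}")
    return SegmentCondition(match["dimension"], "==", match["value"])
```

An empty string keeps the current behavior (no filtering), while "country==fr" yields a single condition on the country dimension.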
We must define and document a list of available dimensions to query, eg.:
Later we can imagine extending the syntax to support AND and OR, e.g. $segment = "customName1==loggedIn,customValue1==yes", or even to support 'contains', 'is not' and similar operators.
Archiving reports for segments means: filter the visits matching the segment, then aggregate.
The queries doing the Piwik archiving have to be updated to add:
AND segment1 = $value
GROUP BY label, segment1
For example, with segment = "country==fr", the query for best browsers would become: select count ... where ... and location_country = 'fr' ... group by visitor_browser, location_country
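The query change above could be sketched as follows, in Python for illustration only. The table name log_visit, the SEGMENT_COLUMNS mapping and the function build_archiving_query are assumptions for this example; the column names visitor_browser and location_country are taken from the example above. The segment value is passed as a bound parameter rather than concatenated into the SQL.

```python
# Hypothetical mapping from public dimension names to log table columns.
SEGMENT_COLUMNS = {"country": "location_country"}


def build_archiving_query(label_column, idsite, segment=None):
    """Return (sql, params) counting visits per label, optionally segmented.

    `segment` is None (all visitors) or a (dimension, value) pair.
    """
    where = ["idsite = %s"]
    group_by = [label_column]
    params = [idsite]
    if segment is not None:
        dimension, value = segment
        column = SEGMENT_COLUMNS[dimension]  # KeyError for unknown dimensions
        where.append(f"{column} = %s")       # AND segment1 = $value
        group_by.append(column)              # GROUP BY label, segment1
        params.append(value)
    sql = (
        f"SELECT {label_column} AS label, COUNT(*) AS nb_visits "
        f"FROM log_visit WHERE {' AND '.join(where)} "
        f"GROUP BY {', '.join(group_by)}"
    )
    return sql, params
```

Restricting dimensions to a fixed mapping also keeps arbitrary column names out of the generated SQL.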
Stored archives must also somehow record the segment query alongside site, date1 and date2. Do we need a new key in the archive tables? More likely this can be encoded in the archive.name attribute somehow (one more reason to keep CONCAT(custom names + values) short).
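One possible way to encode the segment in the archive name while keeping it short is to append a fixed-length hash of the segment string. This is only a sketch of the idea, not a decision: the function archive_name and the "name_hash" layout are assumptions for this example.

```python
import hashlib


def archive_name(report_name: str, segment: str = "") -> str:
    """Return the archive name, suffixed with a hash of the segment if any."""
    if segment == "":
        # No segment: keep the existing name so current archives stay valid.
        return report_name
    # md5 is used here only as a short, stable fingerprint, not for security.
    digest = hashlib.md5(segment.encode("utf-8")).hexdigest()
    return f"{report_name}_{digest}"
```

The suffix is always 32 hexadecimal characters, however long the segment string is, so the name length stays bounded.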
Segmentation: Known segments control list
To keep things fast for 'known segments', we can allow the user to create a list of segments he is interested in. The user would also provide a list of reports to pre-process, rather than pre-processing all reports.
'keyword==Piwik' // known valid segment
=> array('Referer.getKeywords','Page.getPageUrls', .. ) // reports forced to be pre-processed
In this case Piwik will, during archiving, only process these reports and no others. If the user later wants access to more reports, and the logs are still available, he can change the list, which will affect reports going forward.
In V1, if a requested segment/report pair wasn't pre-processed, archiving will return no data.
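The V1 behavior for the control list could look roughly like this Python sketch. KNOWN_SEGMENTS, is_preprocessed and get_report are hypothetical names, load_archive stands in for whatever actually reads the stored archive, and the list contains only the two reports named in the example above.

```python
# Hypothetical control list: segment string -> reports pre-processed for it.
KNOWN_SEGMENTS = {
    "keyword==Piwik": ["Referer.getKeywords", "Page.getPageUrls"],
}


def is_preprocessed(segment: str, report: str) -> bool:
    """True if the segment/report pair is archived during pre-processing."""
    if segment == "":
        return True  # no segment: all reports are processed as today
    return report in KNOWN_SEGMENTS.get(segment, [])


def get_report(segment: str, report: str, load_archive):
    """Return the archived rows, or no data for a pair that was not pre-processed."""
    if not is_preprocessed(segment, report):
        return []
    return load_archive(segment, report)
```

Here a request for an unlisted segment or report returns an empty result without touching the logs.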
The performance cost is possibly huge, depending on the data set. Real-time performance is likely to be slow with MySQL, so pre-processing reports is highly recommended (by defining a list of known reports, see above).
See other ticket #2092
Please submit any feedback/question.
I am going to work on Phase 1: API / Archiving modification to allow custom segment querying.
Phase 2 (post 1.2 release) will include UI modifications to create, edit, delete and visualize segments.
(In ) Cosmetic changes/refactoring preparing for code reuse for Segmentation Refs #1736
(In ) Various code changes to prepare for Segmentation refs #1736
(In ) Refs #1736
(In ) Refs #1736 Sorting segments list to prevent random order test fail
Renaming one segment
(In ) Refs #1736 Adding new setting to disable Segmentation for Anonymous user, as a preventive measure
(In ) Refs #1736 Adding new setting to force the list of Segments to process during cron execution.
Example in config.ini.php
; Pre-process the visitor types segment
(In ) Refs #1736 Only showing the widget "Top Keywords for Page" when segmentation is enabled (ie. if anonymous user, check setting)
(In ) Refs #1736 - Segmentation doc now online: http://piwik.org/docs/analytics-api/segmentation/
(In ) Refs #1736
V1 implemented, see #2092 for the next iteration