The information contained in the _log_* tables should be purged automatically and regularly. Keeping the logs for all time in a single table significantly slows down the stats logging process (MySQL has to rebuild indices, selecting from a table with millions of rows is time consuming, maintenance is hard, etc.).
The goal of this task is to provide an automatic purge of the tracking logs, every day or every month, with an optional backup into a yearly table (configurable through UI settings).
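To make the purge-with-backup idea concrete, here is a minimal sketch using SQLite in place of MySQL. The simplified `log_visit` schema, the `visit_time` column, the `log_visit_backup_<year>` naming and the `purge_logs_before` function are all assumptions made for illustration; none of them is the actual Piwik schema or implementation.

```python
import sqlite3


def purge_logs_before(conn, cutoff, backup=True):
    """Delete log_visit rows older than `cutoff` (a 'YYYY-MM-DD HH:MM:SS' string).

    If `backup` is true, the rows are first copied into one table per year.
    Table and column names are hypothetical, not the real Piwik schema.
    """
    cur = conn.cursor()
    if backup:
        # Find which years the rows about to be purged belong to.
        years = [row[0] for row in cur.execute(
            "SELECT DISTINCT strftime('%Y', visit_time) FROM log_visit "
            "WHERE visit_time < ?", (cutoff,))]
        for year in years:
            if year is None or not year.isdigit():
                continue  # skip rows with an unparseable timestamp
            table = f"log_visit_backup_{year}"
            # Create an empty yearly table with the same columns if needed.
            cur.execute(
                f"CREATE TABLE IF NOT EXISTS {table} AS "
                "SELECT * FROM log_visit WHERE 0")
            cur.execute(
                f"INSERT INTO {table} SELECT * FROM log_visit "
                "WHERE visit_time < ? AND strftime('%Y', visit_time) = ?",
                (cutoff, year))
    cur.execute("DELETE FROM log_visit WHERE visit_time < ?", (cutoff,))
    deleted = cur.rowcount
    conn.commit()
    return deleted
```

Copying before deleting inside one transaction means a failure during the backup leaves the raw logs untouched. A MySQL version would probably want to delete in batches to avoid long locks on a large table.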
Processing uniques over weeks / months without using logs
In #409, we will implement a cookie store mechanism based on a MySQL lookup table. This table will contain enough data (idvisitor, ip, idsite, date first visit, date last visit) to process unique visitors over a week or a month.
Log purge execution
The purge task would be triggered during the ‘Maintenance process’ (see #1184). It would execute at most once a day and try to purge the logs for the previous day (or month).
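The "at most once a day" rule could be sketched as follows. `PurgeScheduler` and `on_maintenance` are hypothetical names, and the last-run day is held in memory here, whereas a real implementation would need to persist it (for example in an option table).

```python
from datetime import date, datetime, timedelta


class PurgeScheduler:
    """Runs a purge callback at most once per calendar day (illustrative only)."""

    def __init__(self, purge):
        self.purge = purge          # callable receiving the day to purge
        self.last_run_day = None    # a real system would persist this value

    def on_maintenance(self, now: datetime) -> bool:
        """Called on every maintenance run; returns True if a purge ran."""
        today: date = now.date()
        if self.last_run_day == today:
            return False            # already purged today, do nothing
        # Purge the logs of the day before the current one.
        self.purge(today - timedelta(days=1))
        self.last_run_day = today
        return True
```

Recording the run only after the callback returns means a purge that raises is retried on the next maintenance run.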
New Super User admin settings
A few interesting resources
After pruning, we can also use MySQL's "COMPRESS" on the corresponding archive tables. A side-effect is that the archive table is read-only, but that's ok if the raw visit information no longer exists to regenerate those archives.
The risk of this task is high, and time is too limited. Performance is fine long term; the downside of not purging logs is of course much higher disk space usage, but this is less of an issue for 1.0 than performance.
Delaying to post 1.0.
This should probably be done at same time as #53
Note that when Purging is enabled, we should review the "Unique Visitors" processing or disable it when logs are not available for the requested date range.
Be careful when implementing this with regard to archives that can be deleted after the fact in another ticket, #2328. Probably, in #2328, we should delete archives for days where the logs have already been deleted.
When this is implemented, we should update the various FAQs that explain that it is safe to delete archive tables and have them re-processed from logs. This will no longer be the case if logs have been deleted, and this should be clearly stated in all FAQs and documentation.
(In ) PrivacyManager / Delete old statistics from database; Refs #2233, #53, #5
(In ) Fixes #2233, Refs #5425
(In ) Refs #2233, #53, #5
Fixed, we can always open another ticket for more advanced "pruning" and to move data to "archived" tables. Great to have this feature in 1.5!!
I see a potential bug: the task is set to "Not yet rescheduled" after I clicked "Yes" and clicked Save, and after reload the task is not scheduled. Thoughts?
(see attached screenshot)
Attachment: Delete old logs piwik - Not yet rescheduled task
not yet rescheduler.png
(In ) Fixes #5425 - refactored the display "last run" and "next scheduled run" method. - Since the Scheduled Tasks Timetable is not immediately updated when the plugin is enabled, we cannot rely on that value. Now the displayed times should be correct and we don't have to compensate for the not-yet-set Schedule-Timetable value with the "not yet rescheduled" phrase.
(In ) Refs #5425 - Make sure log deletion will not be triggered before the calculated "next scheduled deletion" time.
See follow-up ticket #2805 to purge log_action.
Make Git the default VCS for future builds.