Tables log_ are not purged after archiving is done #5425

Closed
mattab opened this Issue Jan 3, 2008 · 16 comments

Comments

Projects
None yet
4 participants
@mattab
Member

mattab commented Jan 3, 2008

The information contained in the tables log\* should be purged automatically and regularly. Keeping all time logs in a single table significantly slows down the stats logging process (mysql having to rebuild indices, selecting from this million rows table is time consuming, maintenance is hard, etc.).

The goal of this task is to provide automatic purge of the tracking logs, every day or every month and with optional backup in a yearly table (customized with UI settings).

Processing uniques over weeks / months without using logs
In #409, we will implement a cookie store mechanism based on a mysql lookup table. This table will contain enough data (idvisitor, ip, idsite, date first visit, date last visit) to process unique visitors over a week or a month

Log purge execution
The purge task would be triggered during the ‘Maintenance process’ (see #1184), and once a day maximum it would execute and try to purge logs for the day (or month) before.
- read settings (purge every day? every month?)
- ensure all archives have been processed for the day archives contained in the log tables. Weeks/months are not necessary to be processed as they are the sum of daily archives (and uniques can be processed using the cookie store rather than keeping the full visitor logs for the period)
- purge tracker for all websites at once
- backs up the logs into a yearly archive table
– INSERT INTO log_X_backups_2010 SELECT \* FROM log_X
– one backup per purged table (log_visit, log_conversion, etc.)
- record the auto increment values (result of SELECT MAX
- purge logs (TRUNCATE TABLE)
- inserts last auto increment in each table

New Super User admin settings
- Purge traffic logs and process reports at least every day/month
- Backup traffic logs into a backup table containing all logs for the year (table is piwik_log_visit_backup2010)?
- Could also add admin setting for ‘Keep visitor cookies for 1 year’ (from #409)

A few interesting resources
- How to write efficient archiving and purging jobs in SQL http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/
- MK-archiver tool http://www.maatkit.org/doc/mk-archiver.htm

@robocoder

This comment has been minimized.

Show comment
Hide comment
@robocoder

robocoder Feb 3, 2010

Contributor

After pruning, we can also use MySQL's "COMPRESS" on the corresponding archive tables. A side-effect is that the archive table is read-only, but that's ok if the raw visit information no longer exists to regenerate those archives.

Contributor

robocoder commented Feb 3, 2010

After pruning, we can also use MySQL's "COMPRESS" on the corresponding archive tables. A side-effect is that the archive table is read-only, but that's ok if the raw visit information no longer exists to regenerate those archives.

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Jul 27, 2010

Member

Risk of this task is high, and time is too limited. Performance are fine long term, the downside of not purging logs is of course much higher disk space usage, but this is less an issue for 1.0 than performance.

delaying to post 1.0.

Member

mattab commented Jul 27, 2010

Risk of this task is high, and time is too limited. Performance are fine long term, the downside of not purging logs is of course much higher disk space usage, but this is less an issue for 1.0 than performance.

delaying to post 1.0.

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Dec 28, 2010

Member

This should probably be done at same time as #53

Member

mattab commented Dec 28, 2010

This should probably be done at same time as #53

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Mar 30, 2011

Member

Note that when Purging is enabled, we should review the "Unique Visitors" processing or disable it when logs are not available for the requested date range.

http://piwik.org/faq/how-to/#faq_113

Member

mattab commented Mar 30, 2011

Note that when Purging is enabled, we should review the "Unique Visitors" processing or disable it when logs are not available for the requested date range.

http://piwik.org/faq/how-to/#faq_113

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Apr 23, 2011

Member

Be careful when implementing this, with regards to archives that can be deleted after the fact in another ticket: #2328 - probably, in #2328, we should delete archives for days where the logs have already been deleted

Member

mattab commented Apr 23, 2011

Be careful when implementing this, with regards to archives that can be deleted after the fact in another ticket: #2328 - probably, in #2328, we should delete archives for days where the logs have already been deleted

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Apr 23, 2011

Member

When this is implemented, we should update various FAQs that explain that it is safe to delete archive tables & have them re-processed from logs. It will not be the case, if logs have been deleted, and this should be clearly stated in all FAQs and documentations.

Member

mattab commented Apr 23, 2011

When this is implemented, we should update various FAQs that explain that it is safe to delete archive tables & have them re-processed from logs. It will not be the case, if logs have been deleted, and this should be clearly stated in all FAQs and documentations.

@peterbo

This comment has been minimized.

Show comment
Hide comment
@peterbo

peterbo Jun 3, 2011

Contributor

(In [4856]) PrivacyManager / Delete old statistics from database; Refs #2233, #53, #5

Contributor

peterbo commented Jun 3, 2011

(In [4856]) PrivacyManager / Delete old statistics from database; Refs #2233, #53, #5

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Jun 4, 2011

Member

(In [4861]) Fixes #2233, Refs #5425

  • enable new plugin on upgrade
  • Display message "your changes have been saved"
  • fix link redirect without idSite by using smarty function {url ...}
Member

mattab commented Jun 4, 2011

(In [4861]) Fixes #2233, Refs #5425

  • enable new plugin on upgrade
  • Display message "your changes have been saved"
  • fix link redirect without idSite by using smarty function {url ...}
@peterbo

This comment has been minimized.

Show comment
Hide comment
@peterbo

peterbo Jun 4, 2011

Contributor

(In [4868]) Refs #2233, #53, #5

  • tweaking / optimizing / commenting
Contributor

peterbo commented Jun 4, 2011

(In [4868]) Refs #2233, #53, #5

  • tweaking / optimizing / commenting
@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Jun 5, 2011

Member

Fixed, we can always open another ticket for more advanced "pruning" and to move data to "archived" tables. Great to have this feature in 1.5!!

Member

mattab commented Jun 5, 2011

Fixed, we can always open another ticket for more advanced "pruning" and to move data to "archived" tables. Great to have this feature in 1.5!!

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Jun 6, 2011

Member

I see a potential bug: the task is set to "Not yet rescheduled" after I clicked "Yes" and clicked Save, and after reload the task is not scheduled. Thoughts?

(see attached screenshot)

Member

mattab commented Jun 6, 2011

I see a potential bug: the task is set to "Not yet rescheduled" after I clicked "Yes" and clicked Save, and after reload the task is not scheduled. Thoughts?

(see attached screenshot)

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Jun 6, 2011

Member

Attachment: Delete old logs piwik - Not yet rescheduled task
not yet rescheduler.png

Member

mattab commented Jun 6, 2011

Attachment: Delete old logs piwik - Not yet rescheduled task
not yet rescheduler.png

@peterbo

This comment has been minimized.

Show comment
Hide comment
@peterbo

peterbo Jun 6, 2011

Contributor

(In [4884]) Fixes #5425 - refactored the display "last run" and "next scheduled run" method. - Since the Scheduled Tasks Timetable is not immediately updated when the plugin is enabled, we can not rely on that value. Now the displayed times should be correct and we don't have to compensate the not yet set Schedule-Timetable value with the "not yet rescheduled" phrase.

Contributor

peterbo commented Jun 6, 2011

(In [4884]) Fixes #5425 - refactored the display "last run" and "next scheduled run" method. - Since the Scheduled Tasks Timetable is not immediately updated when the plugin is enabled, we can not rely on that value. Now the displayed times should be correct and we don't have to compensate the not yet set Schedule-Timetable value with the "not yet rescheduled" phrase.

@peterbo

This comment has been minimized.

Show comment
Hide comment
@peterbo

peterbo Jun 6, 2011

Contributor

(In [4886]) Refs #5425 - Make sure, log deletion will not be triggered before calculated "next scheduled deletion" time.

Contributor

peterbo commented Jun 6, 2011

(In [4886]) Refs #5425 - Make sure, log deletion will not be triggered before calculated "next scheduled deletion" time.

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Feb 27, 2012

Member

see follow up ticket to purge log_action #2805

Member

mattab commented Feb 27, 2012

see follow up ticket to purge log_action #2805

@anonymous-piwik-user

This comment has been minimized.

Show comment
Hide comment
@anonymous-piwik-user

anonymous-piwik-user Feb 3, 2013

In [changeset:"3e371089ad99c853d639432e8865958bfd409214/puppetpiwik"]:
Make Git the default VCS for future builds.

refs #5425

In [changeset:"3e371089ad99c853d639432e8865958bfd409214/puppetpiwik"]:
Make Git the default VCS for future builds.

refs #5425

@mattab mattab added this to the 1.x - Piwik 1.x milestone Jul 8, 2014

This issue was closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment