Archiving Memory exhausted error - Archiving memory leaks #766
In PHP 5.3 with zend.enable_gc, the garbage collector only runs when the root buffer is full (default max is 10,000 entries). We need to know whether a collection cycle is forced when the memory limit is reached, or whether we should manually call gc_collect_cycles().
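For reference, a minimal sketch of forcing collection manually at checkpoints in a long-running archiving loop (the loop and the 50-period interval are illustrative assumptions, not Piwik's actual code):

```php
// Hedged sketch: with PHP 5.3's cycle collector enabled, force a collection at
// regular checkpoints instead of waiting for the root buffer (10,000 entries
// by default) to fill up.
gc_enable();

$periodsToArchive = range(1, 500);  // placeholder for the real list of periods
foreach ($periodsToArchive as $i => $period) {
    // ... per-period archiving work would go here ...

    // Every 50 periods, free circular references and report memory usage.
    if ($i > 0 && $i % 50 === 0) {
        $collected = gc_collect_cycles();
        printf("collected %d cycles, memory: %.1f MB\n",
               $collected, memory_get_usage(true) / 1048576);
    }
}
```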
Has there been any resolution to this? I just wanted to play with Piwik on my site, but it's waaaaaay too busy for Piwik apparently. We're at 80k unique visitors a day or thereabouts... I set the memory limit in php.ini to 1GB, but no love. Help?
I changed the CLI php.ini setting to 2GB and it appears to be working now, but that's really not a great idea... I fear that the server is going to lock up at some point due to memory usage...
jawrat, maybe check that you don't have thousands of unique URLs to store (with unique session IDs, etc.); you can exclude URL parameters in the settings. Otherwise, this bug still exists, but very few people need to increase the limit beyond 1GB.
Well, it's been moving along now for the last 8 weeks or so since I switched it to 2GB and hasn't choked, so for my situation it appears that 2GB works. The issue is that there are several thousand unique URLs to store... at any given time, we might have 50k listings active on the site... it's really nice to know which ones are the most active. BTW, thanks for all the effort you guys have put into this... it's pretty awesome (memory issues aside).
+1 I totally agree with jawrat, both on the issue of you guys doing a great job, but also on the issue that 2GB is too big a memory limit. We would really like to run this locally, but we have 1.5M+ pageviews... Hope to see this cleared up soon, even before 1.0 ;) But hey! Keep up the good work! The description of this issue is right to the point, very clear!
Do you have issues with 1.5M pageviews? There are Piwik users running that much traffic with a 1GB memory limit, but maybe your data set is different (e.g. more unique URLs to record)?
Replying to matt:
Is there a query I can run on the db that will tell me how many uniques there are? I'm sure it's quite a few, but I have no idea how many. matt, contact me offline if you want to poke around... jawrat AT gmail DAHT com
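One rough way to get that number, assuming the default "piwik_" table prefix, that page URLs are stored in log_action with type = 1 (TYPE_ACTION_URL), and that the snippet runs inside Piwik's environment where the Zend_Db adapter is registered as 'db':

```php
// Hypothetical sanity check; adjust the prefix/type if your setup differs.
$db = Zend_Registry::get('db');
$count = $db->fetchOne('SELECT COUNT(*) FROM piwik_log_action WHERE type = 1');
echo "distinct page URLs tracked: $count\n";
```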
I have the problem with 2M+ now, maybe a big dataset, so yes, as jawrat asked: how do we see this? ...and thank you for the priority upgrade!
Okay, so I finally had to shut Piwik off as it was completely killing the server... MySQL was complaining about too many connections (with max_connections set to 500)... and the dashboard just times out trying to get the data. I wish I had time to dig into the code and help out here, but I'm up to my ass in alligators with a .NET codebase that was dumped on me with little to no documentation... Shoot me a heads-up, matt, if you want more info or would like to discuss... we did see a large 20% increase in traffic over the last week or so, so I think that may have been what knocked it over...
Could we implement an alternate archiving algorithm for resource-constrained environments (e.g. use temporary tables in MySQL to collect results)? There's a recent forum post from a HostGator user who reports his shared hosting account has the max memory limit set at 8M.
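A hedged sketch of the temporary-table idea, assuming Piwik's log_link_visit_action schema, the default "piwik_" prefix, and a Zend_Db adapter in $db; intermediate counts stay in MySQL and only the truncated result reaches PHP:

```php
$dateStart = '2011-01-01 00:00:00';  // placeholder bounds
$dateEnd   = '2011-01-01 23:59:59';

$db->query('CREATE TEMPORARY TABLE tmp_archive_actions (
                idaction INT UNSIGNED NOT NULL PRIMARY KEY,
                nb_hits  INT UNSIGNED NOT NULL
            )');
$db->query('INSERT INTO tmp_archive_actions
            SELECT idaction_url, COUNT(*)
              FROM piwik_log_link_visit_action
             WHERE server_time BETWEEN ? AND ?
               AND idaction_url IS NOT NULL
             GROUP BY idaction_url',
           array($dateStart, $dateEnd));

// Only the top rows are ever loaded into PHP memory.
$topRows = $db->fetchAll('SELECT idaction, nb_hits
                            FROM tmp_archive_actions
                           ORDER BY nb_hits DESC
                           LIMIT 500');
```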
PHP + website optimizations: http://ilia.ws/files/zend_performance.pdf
See also a report in #1777.
jawrat, can you confirm that you set up the cron archiving? http://piwik.org/docs/setup-auto-archiving/ What kind of traffic did you receive on the day it broke?
matt, for sure it was set up to auto-archive with a crontab. As far as traffic, we'd just seen a general rise of about 20% over the course of a week or two: 85k visits, 535k pageviews...
jawrat, do you have the full backtrace of the error?
Unfortunately no, the server fell over when it ran during peak traffic periods. :(
To debug this issue, it is first necessary to upgrade to PHP 5.3, which has better memory management features. Then Facebook's XHProf profiler could be useful: http://mirror.facebook.net/facebook/xhprof/doc.html
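A minimal way to capture an XHProf run around the archiving code, assuming the xhprof extension and its bundled xhprof_lib/xhprof_html are installed (the include path is an assumption):

```php
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

// ... run the archiving code to profile here ...

$profilerData = xhprof_disable();

include_once '/path/to/xhprof_lib/utils/xhprof_runs.php';
$runs  = new XHProfRuns_Default();
$runId = $runs->save_run($profilerData, 'piwik_archiving');
echo "XHProf run saved: $runId\n";  // inspect it via xhprof_html/index.php
```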
(In [3670]) Fix some memory leaks, thanks to XHProf: search for the top-level archiving functions (archiveDay) and sort by inclusive memory use DESC - more than zero inclusive memory means memory was leaked somewhere in the call, which is a priori not expected. refs #766
If we can't fix it completely, we should at least document known memory_limit requirements and publish examples of large-scale setups.
Improvement patch posted upstream for Zend_Db: http://framework.zend.com/issues/browse/ZF-10930 - if it's not fixed in Zend we can hack around it in piwik_db.
Looks like the patch was rejected upstream. Because of the inheritance tree, it'll have to be implemented in the subclassed adapters. Something like this?
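A rough illustration of that idea - caching and reusing prepared statement objects per SQL string in a subclassed adapter; the class and property names are hypothetical, not the actual patch:

```php
class Piwik_Db_Adapter_Pdo_Mysql_CachedStatements extends Zend_Db_Adapter_Pdo_Mysql
{
    private $cachedStatements = array();

    public function prepare($sql)
    {
        // Only plain string SQL is cached; anything else falls through.
        if (is_string($sql) && isset($this->cachedStatements[$sql])) {
            return $this->cachedStatements[$sql];
        }
        $statement = parent::prepare($sql);
        if (is_string($sql)) {
            $this->cachedStatements[$sql] = $statement;
        }
        return $statement;
    }
}
```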
Yes, patch rejected, since we should reuse the statement objects... which makes sense. vipsoft, something like this, but I think we have to copy-paste it in all adapters, which is a shame, but it saves us a boring refactoring ;)
OK, your patch is probably right actually; I forgot to look at the Zend code again. Looks good to me!
Do we recommend the latest PHP version in the FAQ? The PHP 5.3.3 changelog has this entry:
OK, I updated the doc at: http://piwik.org/docs/requirements/
(In [3716]) Refs #766 - performance optimization, caching prepared statements in mysqli and PDO/mysql
(In [3717]) refs #766 - revert optimization from Mysqli.php (spurious unit test failures and exceptions thrown); tweak code in Pdo/Mysql.php to work around a segfault (now reported as PHP Bug 53716)
Opened ZF-10943 to track down the cause of the failure in Mysqli.php.
Thanks for the report. 20,000 unique IDs - do you mean 20k unique URLs? If so, we have an idea on how to solve this problem nicely; see the proposal in #766. Finally, if you have some time and are willing to help, it would be very interesting to install the PHP profiler XHProf from Facebook on your server and give me access to it; it might help to find other problems and fixes that could help your case. Thanks. Dupes #766
I got confused here; I meant to comment on #3066.
Replying to matt:
Sorry, but we are an enterprise datacenter, so there is no way to provide 3rd-party developers access to the system. But we have asked our customer if we could forward some example logfiles with faked IPs.
No success on the customer front; we could not provide sample data. The workaround in the issue description is not usable for us, as we really want to replace AWStats and need the full data, not only the top X. tbe
We're also having major memory issues with Piwik; we're unable to run an archive process at all. I feel like I've been pretty thorough in investigating most of the suggestions, but we still don't have a workable solution.

We have implemented the Piwik front end on a medium Amazon EC2 instance with nginx in place of Apache for performance, and the database is on a large Amazon RDS. We can see the real-time data coming in correctly, so we know data is being collated.

Our site is in online publishing; we have around 350,000 unique URLs, and we're looking at 20-25M page views / month. We have a requirement for accessing detailed analytics data from our system via APIs, so every unique URL's page views are important, but for reporting through Piwik that is far less important to us, so your 3rd proposal above could suit us (as I said, provided the raw page views for all URLs were still available via the API).

We are in the evaluation stages of finding the right analytics fit for our needs, so we are open to experimenting and losing or tainting the current Piwik data while we work out a method to make it work. We are also open to the idea above of installing XHProf if that would help give insight into causes and possible solutions. I'm going to make a forum post as well in case someone in the community can provide some direction for us. Thanks,
@Rafe and others, if you hit a wall with Piwik and are stuck, please consider contacting Piwik Professional Services. We help power users run Piwik with 1M requests per day and more, on dedicated servers. Please get in touch: http://piwik.org/consulting/
I am going to build a class that allows "3) Process aggregation in MySQL" from the ticket description. I created a separate ticket #3330 where I will commit code and where the implementation can be discussed.
(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:
Research work around this is finished; we have identified the two main tasks remaining to fix the archiving memory errors:
If you can sponsor these optimization improvements, or need any other tuning done, please contact us: http://piwik.org/consulting/
Users get memory exhausted errors when running Piwik archiving. There are two distinct use cases:
The error looks like:
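Typically something of this form (byte counts and file path vary per setup):

```
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate NN bytes)
in /path/to/piwik/... on line NN
```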
-> We highly suggest that users experiencing issues upgrade to PHP 5.3, which has some improvements around memory management.
Ideas to solve this issue (least time-consuming first):
1) Monthly archiving should use the smallest number of weeks and days inside the month
Currently, archiving a month in Piwik sums the reports for all 30 days together. This causes most of the PHP memory issues, which typically show up when summing all datatables for Page URLs over a month, for example.
Instead, we should reuse the algorithm used to process date ranges, which selects the minimum number of periods needed to cover the enclosing period. A month would typically use 3 or 4 full weeks + the remaining ~5-10 days. This would result in up to 70% fewer datatables to select and sum!
i.e. archiving a month would select 4 week archives + 3 day archives, or 3 week archives + 10 day archives, so either 7 or 13 archives instead of 30...
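A minimal sketch of that decomposition, assuming weeks start on Monday; the function name and period representation are illustrative, not Piwik's actual Period classes:

```php
// Decompose a month into full weeks plus leftover days, so far fewer archives
// need to be selected and summed.
function decomposeMonth($year, $month)
{
    $periods = array();
    $day     = new DateTime(sprintf('%04d-%02d-01', $year, $month));
    $lastDay = new DateTime($day->format('Y-m-t'));

    while ($day <= $lastDay) {
        $weekEnd = clone $day;
        $weekEnd->modify('+6 days');
        if ($day->format('N') == 1 && $weekEnd <= $lastDay) {
            // A full Monday-Sunday week fits in the month: one week archive.
            $periods[] = array('week', $day->format('Y-m-d'));
            $day->modify('+7 days');
        } else {
            // Otherwise fall back to a single day archive.
            $periods[] = array('day', $day->format('Y-m-d'));
            $day->modify('+1 day');
        }
    }
    return $periods;  // e.g. ~4 week archives + a handful of day archives
}
```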
2) Fix remaining memory leaks
3) Process aggregation in MySQL
Currently, to process "Top Page URLs" report, we select ALL URLs, then truncate to 500 URLs (see datatable_archiving_maximum_rows_actions in global.ini.php)
Loading, for example, 50,000 URLs into memory in PHP results in obvious memory crashes. The solution here would be to aggregate only the top 500 URLs in MySQL (including a final "Others" row which aggregates all the other "truncated" rows).
Example: a query that selects the sum of visits per hour, then truncates in MySQL to the first 15 rows, aggregating all rows from 16-23 into the label "Others". Imagine doing the same thing for Page URLs and only returning the top 1000 pages instead of all pages!
Here I only sum the number of visits, but we could automatically rewrite all aggregate queries in ArchiveProcessing/Day.php to use this mechanism?
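A hedged sketch of such a query, assuming Piwik's log_visit schema and the default "piwik_" prefix; the bind values and the Piwik_FetchAll helper are illustrative:

```php
$idSite    = 1;
$dateStart = '2011-01-01 00:00:00';
$dateEnd   = '2011-01-01 23:59:59';
$limit     = 15;

// Rank the per-hour rows with a user variable, keep the top $limit labels and
// fold everything else into an "Others" row - all inside MySQL.
$query = "
    SELECT IF(row_num <= $limit, label, 'Others') AS label,
           SUM(nb_visits) AS nb_visits
      FROM (
            SELECT label, nb_visits, @row := @row + 1 AS row_num
              FROM (
                    SELECT HOUR(visit_last_action_time) AS label,
                           COUNT(*) AS nb_visits
                      FROM piwik_log_visit
                     WHERE idsite = ?
                       AND visit_last_action_time BETWEEN ? AND ?
                     GROUP BY label
                     ORDER BY nb_visits DESC
                   ) AS ranked,
                   (SELECT @row := 0) AS init
           ) AS numbered
     GROUP BY IF(row_num <= $limit, label, 'Others')
     ORDER BY nb_visits DESC";

$rows = Piwik_FetchAll($query, array($idSite, $dateStart, $dateEnd));
```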
Obviously truncating the "Hourly" report is a bad example, but replace it with Page URLs to get the picture :)
This will save a lot of bandwidth and CPU processing, especially for websites that have more than 500 unique Page URLs per day, or external keywords, etc.