SEE NEW TICKET AT: #1999
This ticket has been closed.
This ticket is a placeholder for all performance-related notes, thoughts and feedback.
There are already a number of interesting tickets about performance improvements.
Here is the list:
Building a regression and performance testing environment for Piwik
Partly described in #134.
Once we have a system to assess performance, we could answer a few of the more common questions on a dedicated documentation page.
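As a rough illustration of what such a regression check could look like, here is a minimal sketch, assuming the test harness simply times a workload and compares its median runtime against a stored baseline with a tolerance. The function names, the tolerance and the workload are all hypothetical; Piwik's actual environment (see #134) may work quite differently.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off hiccups than the mean.
    return statistics.median(timings)


def check_regression(fn: Callable[[], object], baseline_s: float,
                     tolerance: float = 0.2, repeats: int = 5) -> bool:
    """Return True if fn's median runtime stays within baseline * (1 + tolerance)."""
    return median_runtime(fn, repeats) <= baseline_s * (1 + tolerance)
```

In practice the baseline would be recorded once per reference machine (e.g. the time to archive a fixed day of logs) and each new build would be checked against it.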
Interesting read on scalability & performance
Keywords: performance fast scalability high traffic
getDateStart and getDateEnd in *period are not optimized; their results could be cached.
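A minimal sketch of the caching idea, assuming a period's start and end dates never change after it is created, so each can be computed once and memoized. The `WeekPeriod` class and its Monday-to-Sunday definition are hypothetical stand-ins for Piwik's period classes, written in Python for illustration only.

```python
from datetime import date, timedelta
from functools import cached_property


class WeekPeriod:
    """Hypothetical week period: the Monday..Sunday range containing `day`."""

    def __init__(self, day: date) -> None:
        self.day = day
        self.computations = 0  # how many times the start date was actually computed

    @cached_property
    def date_start(self) -> date:
        # Computed on first access only; later accesses return the cached value.
        self.computations += 1
        return self.day - timedelta(days=self.day.weekday())

    @cached_property
    def date_end(self) -> date:
        # Reuses the (cached) start date instead of recomputing it.
        return self.date_start + timedelta(days=6)
```

For example, `WeekPeriod(date(2010, 2, 17))` has `date_start == date(2010, 2, 15)` and `date_end == date(2010, 2, 21)`, and repeated accesses do not recompute either value.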
When plugins are not used in the piwik.php logging script, don't load the related files.
(that was #19)
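One way to decide which plugin files the logging script needs is to load only plugins that listen to at least one tracker event. The sketch below assumes each plugin declares its events in a manifest; the manifest, the plugin names and the "Tracker." event prefix are all hypothetical, not Piwik's real hook list.

```python
from typing import Dict, List, Set

# Hypothetical manifest: plugin name -> events (hooks) the plugin listens to.
# Plugin and event names are illustrative only.
PLUGIN_HOOKS: Dict[str, Set[str]] = {
    "GeoLookup": {"Tracker.newVisit", "Menu.add"},
    "ReferrerParser": {"Tracker.newVisit"},
    "DashboardUI": {"Menu.add", "Widget.add"},
    "GoalTracking": {"Tracker.recordAction", "Menu.add"},
}


def plugins_for_tracker(manifest: Dict[str, Set[str]], prefix: str = "Tracker.") -> List[str]:
    """Return (sorted) the plugins that listen to at least one tracker event."""
    return sorted(
        name for name, hooks in manifest.items()
        if any(hook.startswith(prefix) for hook in hooks)
    )
```

Here `DashboardUI` would never be loaded by the logging script, since it only hooks into UI events.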
Find memory leaks in PHP
If the image processing extension uses emalloc()-style allocation, you can compile PHP with --enable-debug and get a report of all leaks and their locations at the end of the script. Note that the script must end with "return" (or nothing), not with "exit", to get the report.
But that won't pick up malloc() leaks. For those, you can use Valgrind (http://www.valgrind.org) if you are running on Unix.
Xdebug trace setting (Type: integer, Default value: 0):
When this setting is non-zero, Xdebug's human-readable trace files show the difference in memory usage between function calls. If Xdebug is configured to generate computer-readable trace files, they always include this information.
Another idea would be to remove the count(distinct idvisitor) from the archiving query. Other products, such as Google Analytics (GA), don't report unique counts for every metric; counting uniques could eventually become a setting.
We would still count uniques for the global stats, per day, week and month.
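A minimal sketch of making the unique count optional, using SQLite in memory as a stand-in for MySQL. The `log_visit` table and `idvisitor` column come from this ticket; the `referer_name` dimension column and both function names are hypothetical. Per-dimension reports only pay for `COUNT(DISTINCT idvisitor)` when the setting is on, while the global stat always counts uniques.

```python
import sqlite3
from typing import List, Tuple

# Only whitelisted column names are ever interpolated into SQL.
ALLOWED_DIMENSIONS = {"referer_name"}  # hypothetical dimension column


def report_by_dimension(conn: sqlite3.Connection, dimension: str,
                        count_uniques: bool) -> List[Tuple]:
    """Per-dimension report; the costly COUNT(DISTINCT idvisitor) is optional."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    columns = f"{dimension}, COUNT(*)"
    if count_uniques:
        columns += ", COUNT(DISTINCT idvisitor)"
    sql = f"SELECT {columns} FROM log_visit GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql).fetchall()


def global_unique_visitors(conn: sqlite3.Connection) -> int:
    """The global stat for the period always counts unique visitors."""
    return conn.execute("SELECT COUNT(DISTINCT idvisitor) FROM log_visit").fetchone()[0]
```

With visits (v1, google), (v1, google), (v2, google), (v3, bing), the report without uniques is `[("bing", 1), ("google", 3)]`, the one with uniques is `[("bing", 1, 1), ("google", 3, 2)]`, and the global unique count is 3.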
Optimizing Large Databases Using InnoDB Clustered Indexes
> I have set up a git branch on GitHub to help port Piwik to PostgreSQL:
Edit: I made a mistake with the branch naming.
The GitHub repository is here: http://github.com/klando/pgpiwik/tree/master
To grab it: git clone git://github.com/klando/pgpiwik.git
See also #620: Piwik should use autoload to automatically load all classes instead of using require_once.
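The core of an autoloader is a mapping from class name to file, plus a guarantee that each file is loaded at most once and only on first use. The sketch below assumes a naming convention where `Piwik_DataTable_Renderer` lives in `core/DataTable/Renderer.php`; that convention, the `core` root and the `Autoloader` class are assumptions for illustration, written in Python rather than as an actual PHP autoload function.

```python
from typing import Callable, Set


def class_to_path(class_name: str, root: str = "core") -> str:
    """Map 'Piwik_DataTable_Renderer' to 'core/DataTable/Renderer.php' (assumed convention)."""
    prefix = "Piwik_"
    if not class_name.startswith(prefix):
        raise ValueError(f"not a Piwik class: {class_name}")
    parts = class_name[len(prefix):].split("_")
    return "/".join([root, *parts]) + ".php"


class Autoloader:
    """Loads each class file at most once, and only when the class is first requested."""

    def __init__(self, load_file: Callable[[str], None]) -> None:
        self.load_file = load_file  # plays the role of PHP's require_once
        self.loaded: Set[str] = set()

    def require(self, class_name: str) -> None:
        path = class_to_path(class_name)
        if path not in self.loaded:
            self.load_file(path)
            self.loaded.add(path)
```

The benefit for the logging script is that classes it never touches are never parsed, instead of being pulled in up front by require_once chains.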
plandem, see the piwik-hackers thread for some thoughts on alternative NoSQL databases in Piwik: http://lists.piwik.org/pipermail/piwik-hackers/2010-February/000829.html
There's also an interesting FAQ/blog post about InfiniDB's column-oriented storage engine for MySQL vs. "NoSQL":
InfiniDB sounds like something we should definitely investigate first, as it might be much (much) easier to use with the current Piwik architecture. Are there limitations when dropping it in as a replacement for MySQL?
There's now a migration guide for InfiniDB. The relevant section starts at page 17.
The only limitation of the open-source Community Edition is that it is restricted to a single machine (there is no limit on CPUs, RAM, or concurrent users). Theoretically, you can build a fairly powerful box (think multi-core, multi-processor boards) before you have to think about adding nodes (and license fees for the Enterprise Edition).
It looks like you have a good handle on where to start looking at InfiniDB. I took a quick look at your schema and don't see any fundamental problem with the log_visit fact table or the queries that reference it above. We do not (yet) support BLOB columns, which your archive tables use. The only other quick note I would add is that we aren't optimal for web/OLTP-style loads. However, you could easily SELECT * INTO OUTFILE from the existing schema and LOAD DATA INFILE into InfiniDB to get good load rates, or use SELECT INTO OUTFILE plus cpimport (our bulk loader) to get excellent load rates: 100k or more rows/second is possible, but this will vary significantly with disk and table definition.
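The export-then-bulk-load pattern described above can be sketched as follows. This is only an analogue: SQLite stands in for both MySQL and InfiniDB, a tab-separated string stands in for the OUTFILE, and batched `executemany` inserts stand in for LOAD DATA INFILE or cpimport. The function names and batch size are hypothetical; table names must come from trusted code, not user input.

```python
import csv
import io
import sqlite3


def export_table(conn: sqlite3.Connection, table: str) -> str:
    """Dump all rows of `table` as tab-separated text (analogue of SELECT ... INTO OUTFILE)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in conn.execute(f"SELECT * FROM {table}"):
        writer.writerow(row)
    return buf.getvalue()


def bulk_load(conn: sqlite3.Connection, table: str, tsv: str,
              ncols: int, batch_size: int = 10_000) -> int:
    """Insert rows from tab-separated text in batches (analogue of LOAD DATA INFILE)."""
    placeholders = ", ".join("?" * ncols)
    sql = f"INSERT INTO {table} VALUES ({placeholders})"
    batch, loaded = [], 0
    for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            loaded += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        loaded += len(batch)
    conn.commit()
    return loaded
```

The point of the pattern is to avoid one INSERT round trip per row; the real tools (LOAD DATA INFILE, cpimport) go further by bypassing most of the per-statement overhead.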
One additional note: our current parallelization distributes ranges of 8 million rows to each thread, so smaller tables won't show the same benefits from many cores as larger tables.
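To make the arithmetic concrete: under a simplified model where each thread scans one 8-million-row range, the number of cores that can be kept busy is ceil(rows / 8,000,000), capped at the core count. The model and the function name are assumptions for illustration, not InfiniDB's actual scheduler.

```python
ROWS_PER_RANGE = 8_000_000  # range size per thread, as stated in the comment above


def usable_threads(row_count: int, cores: int) -> int:
    """Cores that can be busy if each thread gets one 8M-row range (simplified model)."""
    if row_count <= 0:
        return 0
    ranges = (row_count + ROWS_PER_RANGE - 1) // ROWS_PER_RANGE  # integer ceiling
    return min(ranges, cores)
```

On an 8-core box, a 5-million-row table fits in a single range and uses 1 core, a 20-million-row table uses 3, and only tables above 56 million rows can occupy all 8.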
Anyway, I'm signed up to follow this discussion; let me know if you have any questions or comments. Thanks, Jim Tommaney
See the Percona paper on goal-driven performance optimization: http://www.percona.com/files/white-papers/goal-driven-performance-optimization.pdf
We will tackle the critical issues (#409, probably #1077) and postpone the others until after 1.0.
It might be good to look into storing JSON-encoded data tables rather than serialized PHP data tables. This would improve portability. See http://stackoverflow.com/questions/1306740/json-vs-serialized-array-in-database for reference. The speed of JSON decoding vs. unserialize on large arrays should be tested.
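The shape of such a benchmark could look like the sketch below: build a large report-like table, check that both encodings round-trip it unchanged, then compare encoded size and decode time. Python has no PHP `serialize`, so `pickle` is swapped in purely as the "native serialization" side; the numbers will not transfer to PHP, where the real test should compare `json_decode` against `unserialize`. The table layout and function names are hypothetical.

```python
import json
import pickle
import timeit


def make_table(rows: int) -> list:
    """Hypothetical report data table: one dict per row."""
    return [{"label": f"keyword {i}", "nb_visits": i, "nb_actions": 2 * i} for i in range(rows)]


def compare_decoding(rows: int = 10_000, number: int = 20) -> dict:
    table = make_table(rows)
    as_json = json.dumps(table)
    as_pickle = pickle.dumps(table)
    # Timing is only meaningful if both formats round-trip the data unchanged.
    if json.loads(as_json) != table or pickle.loads(as_pickle) != table:
        raise ValueError("round-trip mismatch")
    return {
        "json_bytes": len(as_json.encode("utf-8")),
        "pickle_bytes": len(as_pickle),
        "json_decode_s": timeit.timeit(lambda: json.loads(as_json), number=number),
        "pickle_decode_s": timeit.timeit(lambda: pickle.loads(as_pickle), number=number),
    }
```

Measuring size alongside speed matters here, since archive tables are stored in the database and a smaller encoding may offset a slower decoder.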
I created a summary ticket from this one, as this ticket became unclear. See #1999