Add delegate for query logging #739
Comments
This is a pretty neat idea, I wonder if Profile Devkit could also use it so that queries weren't logged every time for the sake of it. |
In addition, it would also be nice to have some sort of query grouping. Some back-end operations require 5+ queries to be executed in order to commit them completely. Right now, the CDI extension logs each query as an individual operation and is not aware of any dependencies between queries. If there is an error executing query 4, it cannot determine that queries 1, 2 and 3 belong to the same operation. To prevent database schema corruption, the CDI extension should be aware of the entire transaction. It could then group the queries and execute them as one. |
That's a tricky one, I don't think Symphony itself has that much of an idea of what's going on :D |
Yeah, it's a bit of a pain... especially because some of these operations rely on the "mysql_insert_id()" value which is returned by... the last commit. I don't see a solution to this dependency yet, but it should be possible :) The dependency on the auto increment value is a pain anyway when working with multiple environments. I'm currently working on improving the CDI extension with a submodule that checks the auto_increment values of tables on "slave" instances to see if they are compatible with the committed changes. There is a lot to keep in sync when working with multiple environments! |
We should try to do this, beginning with a proof of concept build... |
Delegate

I considered adding a delegate some time ago when DB Sync was first created. At the time each

Would there be value in multiple delegates, one for SELECT, one for INSERT, one for DELETE etc? I'm not sure, and it might complicate things unnecessarily. Stick with a single

Grouping

This is kind of specific to the extensions but I'll address it here. DB Sync attempts to group queries in the SQL file by page load. The first query that's written has the date, author name and originating URL prepended as a comment when persisted to the file. While this isn't sufficient for very granular inspection, it has been sufficient when debugging what actions a user has taken in the backend, much like Craig's Tracker extension (which subscribes to most UI delegates and logs when they occurred, thereby building a log of backend user activity).

Since each backend "action" is generally one page load (e.g. saving a section means re-saving all fields) I decided it was safe for a "group" of queries to be all of those executed during a single page lifecycle. If you dig through the commit history of DB Sync, when I used to store the query history in the database rather than a text file, you'll see I used to have two tables: queries and events. An event was, as I've just explained above, a page load. I'd create a new event row (date, author, URL and a unique hash for an ID) and then use this as the foreign key for queries.

I decided to store the queries to a text file rather than the database purely because I was getting frustrated by having to fix the

I'm not sure what more could be done to add transactional/rollback capability, or adding more query grouping, without rewriting and breaking quite a lot.

Profiling
Agreed. Am I right in saying that presently every query is logged by Symphony into an array, regardless of context? This array is used in two places:
The former is only available to sessions logged in to Symphony. The latter is visible to anyone, and it has been pointed out that this should not be the case (we shouldn't be exposing this information to the public!). I suggest that we add an option to the config, |
Delegate

I would stick with the single "DatabaseQuery" delegate. If desired you can add optional parameters that specify what kind of query is being executed.

Grouping

I've fixed the issue with mysql_insert_id interference by executing the registration of the query prior to executing the query itself. This has the additional advantage that the original query is not executed when an error occurs while registering the query to the CDI tables. This ensures the integrity of the CDI log.

Your point about grouping the queries based on page load is interesting. I've seen this in the DBSync extension and have actually copied it to the DBSync implementation in CDI. However, how would one go about understanding the action that was performed by the user and see the dependencies between the queries?

BTW: I'm still contemplating the transactional / rollback implementation. The problem lies within the dependency on mysql_insert_id(). There is currently something floating in my head involving the scaffolds extension and symphony object notation. Basically, you would want to communicate "create field X" instead of communicating "Run 20 queries based on these auto_increment values". If you can communicate in a meta language, you can also rollback / revert changes. "create field X" can be translated to "delete field x". |
One delegate but we could pass the query type as a parameter to this delegate?
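To make the "one delegate, query type as a parameter" idea concrete, here is a minimal language-neutral sketch in Python. It is not Symphony's delegate API: `subscribe`, `notify_query`, the `query_type` context key and the `entries` table are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical context shape; Symphony's real delegate payload may differ.
Context = Dict[str, str]

_subscribers: List[Callable[[Context], None]] = []


def subscribe(callback: Callable[[Context], None]) -> None:
    """Register a callback for the single query delegate."""
    _subscribers.append(callback)


def query_type(sql: str) -> str:
    """Return the leading SQL keyword in upper case, or 'UNKNOWN'."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match.group(1).upper() if match else "UNKNOWN"


def notify_query(sql: str) -> None:
    """Announce one delegate per query, passing the type as a parameter."""
    context = {"query": sql, "query_type": query_type(sql)}
    for callback in _subscribers:
        callback(context)


# A subscriber that only cares about writes filters on the parameter itself.
seen: List[str] = []


def only_writes(context: Context) -> None:
    if context["query_type"] != "SELECT":
        seen.append(context["query_type"])


subscribe(only_writes)
notify_query("SELECT * FROM entries")
notify_query("  insert into entries (id) VALUES (1)")
```

With this shape, an extension that only wants INSERT or DELETE statements does its own filtering, so the core announces a single delegate instead of one per statement type.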
Correct, and on complex sites, it actually is a noticeable overhead. A config setting is one idea, but I'm beginning to get weary of 'adding settings' and starting to think we should just do some of these things by default (
This is exactly the reason why Scaffolds works the way it does. It's just a puppet to pull UI strings at the moment, so it's quite high level and doesn't worry about IDs at all. In terms of your 'meta language', you may actually be able to track the high level changes that are occurring. When a new Field is added to a section, a delegate is fired for pre/post create/edit, so theoretically you could build a listing of these sorts of actions and the reverse 'import' for parsing 'Field was created with (object)'. In fact, I think @czheng's Tracker already does this to build a human readable log of what is happening, so it is potentially an exercise in parsing this log. |
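The "meta language" idea, where "create field X" can be reverted as "delete field X", could be sketched roughly as below. The verbs `create_field` and `delete_field` and the `Action` type are invented for this illustration; neither Scaffolds nor Tracker is known to define such a vocabulary.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action verbs and their opposites (assumption, not a Symphony API).
INVERSE = {"create_field": "delete_field", "delete_field": "create_field"}


@dataclass(frozen=True)
class Action:
    verb: str
    target: str

    def inverse(self) -> "Action":
        """Return the action that undoes this one."""
        return Action(INVERSE[self.verb], self.target)


def rollback_plan(log: List[Action]) -> List[Action]:
    """Undo a log of actions newest-first by emitting each action's inverse."""
    return [action.inverse() for action in reversed(log)]


log = [Action("create_field", "title"), Action("create_field", "body")]
plan = rollback_plan(log)
```

A real implementation would need more than a verb and a name: undoing a delete, for example, requires the full definition of the field that was removed.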
Just noticed there is a |
This is what I used originally, but the problem is there's no way of knowing when a page execution has finished (to log the queries). Symphony sometimes performs an action and then redirects to a new URL. So I had to log the query as it runs, rather than relying on some kind of "page finished" callback, since a redirect could occur at any moment. |
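A rough sketch of logging each query as it runs while still grouping by page load: the header comment is written with the first query, so nothing depends on a "page finished" callback. The class name, the in-memory `lines` list standing in for the SQL file, the example URL, and putting the unique hash in the header are all assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import List


class PageLoadLog:
    """Writes each query immediately, grouped under one header per page load."""

    def __init__(self, author: str, url: str) -> None:
        self.author = author
        self.url = url
        self.event_id = uuid.uuid4().hex  # stand-in for the "unique hash"
        self.lines: List[str] = []        # stand-in for the SQL file
        self._header_written = False

    def log(self, sql: str) -> None:
        # The first query of the page load carries date, author and URL as a comment.
        if not self._header_written:
            stamp = datetime.now(timezone.utc).isoformat()
            self.lines.append(f"-- {stamp}, {self.author}, {self.url}, {self.event_id}")
            self._header_written = True
        # Persist right away: a redirect may end the request at any moment.
        self.lines.append(sql.rstrip(";") + ";")


page = PageLoadLog(author="admin", url="/symphony/blueprints/sections/edit/1/")
page.log("UPDATE sections SET name = 'Articles' WHERE id = 1")
page.log("DELETE FROM fields WHERE parent_section = 1;")
```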
Good point |
So in early testing, this delegate actually negatively affects performance. While the memory usage is lower, the time to generate pages is longer. An apachebench indicates a loss of around 10% (55/rps to 50/rps) compared to not having it (a 2.2.3 install is 52/rps) using the default ensemble. Any thoughts? |
How does this compare to the current implementation of hacking the class.mysql.php file? Did you try that as well? I can imagine that there is a performance hit to log the query execution, compared to the current situation (Symphony 2.2.3). The question is which method performs best, and if it is worth the hit to make it generally available or just stick with the hacking for those who seek this functionality :) |
I cannot say anything regarding performance, but I like to note that I seek this functionality but won't hack the core. |
A 10% deficit doesn't sound good to me. I suppose we could fall back to a config option which enables this delegate on a per environment basis (disable it on live) but that gets a bit messy. I'm sure I considered this when developing DB Sync, but what about a trigger? We'd lose the finer control over processing that we get with PHP (logging the author name, environment variables etc), but it's a more unobtrusive way of achieving the same thing. |
Do you have access to the entire SQL statement within a trigger? I always thought you only have access to the changed/added values, or the ID of a deleted row... |
Oh blegh you're right. |
I didn't try it, I would think the 'hacking' method would be faster as it's bypassing the
I agree.
I'm afraid I don't know too much about the finer details of Triggers, but it looks like something basic could be done. The other thing to keep in mind is that this delegate (with the two above commits) prevents the logging of queries for every page. At the moment the backend queries are logged, even though it's not possible (out of the box) to view this information. The benefit of this 'continuous' logging is that when something goes wrong, we have the query backtrace. I haven't yet implemented the query backtrace into the current code, but I have a sneaking suspicion this will add overhead too. I think the idea has merit and we should probably look into it further |
Another approach would be to save the query and any performance data to a This behavior can be enabled / disabled from configuration for those |
Another point to raise, the delegate currently fails if the callback contains database queries (recursion!). I'm guessing this is going to be a likely use case considering you are looking to get more information about the context that the query was fired from. While you could pass a |
This is indeed true, the CDI extension logs the query to database. To prevent recursion, it checks if the query contains the name of the CDI table Another point to make is that you will need a solution for the If you want to provide information of query execution (success/failure, performance data, etc) you will need to add two delegates :) Or... you rewrite the part where you store the |
Yeah it's a unique situation as I believe this is the first delegate that doesn't operate at the UI level. |
After you created the branch I will fork it to work on the approach of logging the query meta information (query, performance data, etc) into a Symphony table. I want to try using a trigger and see if this is a valid approach for extensions. If so, this would eliminate the need of a delegate and rework regarding |
Ok... I've been working on the "logquery-db" implementation and I'm running into some issues:
I'm still working on this solution, but for now, the implementation feels like hacking the core. It's not as smooth as core code should be. |
Yeah, but what happens if there is an error while logging the query? That way it would make it into your development database but not in the CDI log, with possible schema corruption on other instances as a result. Ideally, we would have three delegates:
If I read this correctly, your suggestion is to go with the developer responsibility approach. So Symphony will not prevent the possibility of recursion but leave it up to the extension developer to ensure it does not happen. I think this is a valid approach: in most cases, preventing recursion is easy (see current implementation in CDI)
I would not do this because it will allow extension A to prevent delegate execution for extension B. See the first example in my earlier comment. So am I right to say that the conclusion of this thread is that we will go with delegate(s) only? |
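The table-name check mentioned for CDI can be sketched like this. It is not CDI's actual code: `cdi_log`, `run_query` and `on_query` are hypothetical names, and the "database" is just a list.

```python
import hashlib
from typing import List

LOG_TABLE = "cdi_log"  # hypothetical table name, not taken from CDI

executed: List[str] = []  # stand-in for the database


def run_query(sql: str) -> None:
    """Execute a query, then fire the delegate for it (including logger queries)."""
    executed.append(sql)
    on_query(sql)


def on_query(sql: str) -> None:
    """Subscriber that logs to the database without recursing forever."""
    if LOG_TABLE in sql:
        return  # this query came from the logger itself, so skip it
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    run_query(f"INSERT INTO {LOG_TABLE} (query_hash) VALUES ('{digest}')")


run_query("UPDATE fields SET label = 'Title' WHERE id = 3")
```

Without the early return, the logger's own INSERT would trigger the delegate again and the calls would never terminate. The guard lives in the extension, which matches the developer responsibility approach.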
Fair point. But this feels like over-engineering. Is it a legitimate problem that needs a solution, or a hypothetical one?
Sorry, I didn't fully read your description above. You're right, it could break things if used irresponsibly. But in the same way a UI developer could break something by fiddling with the DOM and forgetting to put some markup back. Looking for the table name (as CDI does, and DBSync used to when it used a database, it'd prepend a If you wanted to use, say,
I think so. The simplest solution:
|
Well... the CDI implementation will block further query execution if it fails to log the query. This is to ensure that the log will always be in sync with the actual database changes. If the log succeeds, but the actual query execution fails, there is rollback functionality to remove the query from the log. This scheme is core functionality for CDI because the database schema must be the same (especially when considering keeping auto number increments in sync). |
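As a hedged illustration of the order described here (log first, execute second, remove the log entry if execution fails), the following sketch uses an in-memory SQLite database and a plain list as the log. `guarded_execute` is a made-up helper, not part of CDI.

```python
import sqlite3
from typing import List


def guarded_execute(conn: sqlite3.Connection, log: List[str], sql: str) -> bool:
    """Register the query, run it, and undo the registration on failure."""
    # 1. Log before executing. If this step raised, the query below would never run.
    log.append(sql)
    try:
        # 2. Run the actual query.
        conn.execute(sql)
    except sqlite3.Error:
        # 3. Execution failed: roll the entry back out of the log.
        log.pop()
        return False
    return True


conn = sqlite3.connect(":memory:")
log: List[str] = []
ok = guarded_execute(conn, log, "CREATE TABLE fields (id INTEGER PRIMARY KEY, label TEXT)")
failed = guarded_execute(conn, log, "INSERT INTO missing_table VALUES (1)")
```

After both calls the log holds only the statement that actually changed the database, which is the property the log needs in order to be replayed on another instance.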
Right... so what to do? |
Shall I wrap this up in a pull request based on the 'logquery-db' branch which can be tested by @brendo on performance? |
Sorry @remie, missed this one. Yeah please do, it's definitely something I'd like to see resolved, if not in this release, a future release, so the more groundwork we get done now the better IMO. As Nick said, the ideal solution will be a single delegate, as announcing delegates seems to incur a cost regardless of if an extension 'uses' them or not. |
Can I add the other delegates as optional based on config settings? That way developers can decide if they accept the performance hit for added functionality.
Is this going to be reviewed? Perhaps a different delegate implementation will not involve this penalty and would encourage extension developers to add custom delegates for their extension without the fear of degrading performance. |
Hopefully not. I hate configuration, and the less the better. If it is a tipping point between turning it on/off, then it should be an extension. Symphony is great because it tends to keep its configuration small and just go by convention. It's an area I'd like to further improve actually.
Yes, but not in the scope of this release. |
@brendo: this should be it! The changes include your first commit of 'LogQuery' delegate, which I renamed to 'PostQueryExecution'. Do you want me to create a pull request or do you want to test it first? |
Would be a good idea to benchmark this before and after to see what sort of impact this has. If a page runs 200 queries, might an additional 200 function calls decrease performance? By how much, maybe negligible, but I think this deserves some testing. |
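One way to frame such a before/after measurement is a micro-benchmark like the sketch below. It times Python function calls, so the numbers say nothing about Symphony or PHP; it only shows the shape of the comparison (200 "queries" per page, with and without a notification call that has no subscribers). All names are hypothetical.

```python
import timeit
from typing import Callable, List

subscribers: List[Callable[[str], None]] = []  # no extension subscribed


def notify(sql: str) -> None:
    """Announce the delegate; costs a call even when nobody listens."""
    for callback in subscribers:
        callback(sql)


def page_without_delegate(n: int = 200) -> None:
    for i in range(n):
        _ = f"SELECT {i}"  # stand-in for running a query


def page_with_delegate(n: int = 200) -> None:
    for i in range(n):
        notify(f"SELECT {i}")  # same work plus one notification per query


runs = 1000
baseline = timeit.timeit(page_without_delegate, number=runs)
with_delegate = timeit.timeit(page_with_delegate, number=runs)
overhead_per_call = (with_delegate - baseline) / (runs * 200)
print(f"baseline={baseline:.4f}s with_delegate={with_delegate:.4f}s")
```

For Symphony itself, an end-to-end tool such as the apachebench run mentioned earlier in the thread remains the more meaningful test.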
Yeah the original benchmarks for this were surprising as it was slower to announce the delegate than it was to log all queries regardless. I have not had a chance yet to test @remie's changes, but am keen to. I'm very cautious of any performance impact but I understand that some degradation may be required to allow greater flexibility for extensions. As a sidenote I've been tracking performance of the default ensemble for a while and am worried at the downward trend of the |
Have finally got around to checking out @remie's work. @remie's implementation is good, it only announces the delegates when the The So the only concern now is that this current implementation will remove the Database Query Log from exception pages. The solution is to remove the if/else nature of the commit. This means Symphony will fire the Delegate and maintain the log of all queries in memory. It's what I've committed to my local branch, but I'm not happy with that solution just yet. |
Pushed another commit that just keeps the last 5 queries in Symphony's memory. Any concerns with this approach? |
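Keeping only the most recent queries is a bounded-buffer pattern. A minimal Python sketch, assuming a limit of 5 as in the commit described above (the variable name and example queries are illustrative only):

```python
from collections import deque

# A deque with maxlen drops the oldest entry once the limit is reached.
recent_queries = deque(maxlen=5)

for i in range(1, 9):
    recent_queries.append(f"SELECT * FROM entries WHERE id = {i}")
```

Memory use stays constant no matter how many queries a page runs, and an exception page can still show the handful of queries that led up to the error.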
Personally I never use this stacktrace. Most of the time the error message is clear enough. BTW, perhaps there is already a generic config setting for this? Something like "Debug Mode = True"? |
Likewise, or the majority of the time the last couple of queries tell the story, hence why I went with the last 5. It is possible that an extension would be able to provide full debugging if required with the delegate, so I'm leaning towards we merge as is. |
Ah yes.. this is definitely something that can be achieved by an extension (using the delegate :)) so if someone needs it they can introduce the functionality again. In that case I vote for removing the Database Query Log and do not store the query information in memory. This means better performance for those who don't need it and the ability to add functionality through an extension for those who desire it. |
I agree. I have pushed an updated commit
|
Does this mean it will be included in the 2.3 release? |
Yes it will. Going to slightly change the behaviour to log queries for logged in users only, thus maintaining the backtrace for exceptions without the performance overhead for the public on the frontend. I will merge these changes tonight. |
…delegate and adding variables to context
Changes merged! Thanks for your assistance and time @remie, much appreciated. |
More than happy to have helped out, can't wait to implement it in the CDI extension! |
Would it be possible to add a new delegate that allows extensions like DB Sync or CDI to work without core modifications?
/cc @brendo, @nickdunn, @remie