Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add delegate for query logging #739

Closed
nilshoerrmann opened this issue Aug 12, 2011 · 67 comments
Closed

Add delegate for query logging #739

nilshoerrmann opened this issue Aug 12, 2011 · 67 comments

Comments

@nilshoerrmann
Copy link
Contributor

Would it be possible to add a new delegate that allows extensions like DB Sync or CDI to work without core modifications?

/cc @brendo, @nickdunn, @remie

@brendo
Copy link
Member

brendo commented Aug 12, 2011

This is a pretty neat idea, I wonder if Profile Devkit could also use it so that queries weren't logged every time for the sake of it.
Of course some sort of rolling log would need to be kept at all times (perhaps last 5) in case of an expection, but I think its worth investigating, you'd be suprised just how much overhead the logging has at the moment.

@remie
Copy link
Contributor

remie commented Aug 12, 2011

In addition, it would also be nice if there is some sort of query grouping. Some back-end operations require 5+ queries to be executed in order to commit them completely. Right now, the CDI extension logs each query as an individual operation and is not aware of any dependencies between queries. If there is an error executing query 4, it will not be able to determine query 3,2 and 1.

To prevent database scheme corruption, the CDI extension should be aware of the entire transaction. It could than group the queries and execute them as one.

@brendo
Copy link
Member

brendo commented Aug 12, 2011

Thats a tricky one, I don't think Symphony itself has that much of an idea of whats going on :D
It would be a nice enhancement to use MySQL transactions though to start and end operations like that

@remie
Copy link
Contributor

remie commented Aug 12, 2011

Yeah, it's a bit of a pain... especially because some of these operations rely on the "mysql_insert_id()" value which is returned by... the last commit. I don't see a solution to this dependency yet, but it should be possible :)

The dependency on the auto increment value is a pain anyway when working with multiple environment. I'm currently working on improving the CDI extension with a submodule that checks the auto_increment values of tables on "slave" instances to see if they are compatible with the committed changes.

There is a lot to keep in sync when working with multiple environments!

@cz
Copy link
Contributor

cz commented Aug 15, 2011

We should try to do this, beginning with a proof of concept build...

@nickdunn
Copy link
Contributor

Delegate

I considered adding a delegate some time ago when DB Sync was first created. At the time each notifyMembers() call (which fires a delegate) would look up in the database to see whether any extensions were subscribing to it. That meant a doubling in the number of queries executed! However since something like Symphony 2.0.7, delegate subscriptions are read from the table once and stored in an array. So provided the overhead of adding a new delegate is low, I'm all for it.

Would there be value in multiple delegates, one for SELECT, one for INSERT, one for DELETE etc? I'm not sure, and it might complicate things unnecessarily. Stick with a single DatabaseQuery delegate?

Grouping

This is kind of specific to the extensions but I'll address it here. DB Sync attempts to group queries in the SQL file by page load. The first query that's written has the date, author name and originating URL prepended as a comment when persisted to the file. While this isn't sufficient for very granular inspection, it has been sufficient when debugging what actions a user has taken in the backend — much like Craig's Tracker extension (which subscribes to most UI delegates and logs when they occurred, thereby building a log of backend user activity).

Since each backend "action" is generally one page load (e.g. saving a section means re-saving all fields) I decided it was safe for a "group" of queries to be all of those executed during a single page lifecycle.

If you dig through the commit history of DB Sync, when I used to store the query history in the database rather than a text file, you'll see I used to have two tables: queries and events. An event was, as I've just explained above, a page load. I'd create a new event row (date, author, URL and a unique hash for an ID) and then use this as the foreign key for queries.

I decided to store the queries to a text file rather than the database purely because I was getting frustrated by having to fix the mysql_insert_id problems, since inserting the row to log the query was falsely incrementing the result Symphony expected. If you've been able to solve that in CDI then that's great :-) It also solves the problem that flat file introduced, in that two people could not work on local web servers while their Symphony builds connect to a shared database, since they'd each have a SQL file created locally. That was a major downside of moving to flat .sql files.

I'm not sure what more could be done to add transactional/rollback capability, or adding more query grouping, without rewriting and breaking quite a lot.

Profiling

This is a pretty neat idea, I wonder if Profile Devkit could also use it so that queries weren't logged every time for the sake of it. Of course some sort of rolling log would need to be kept at all times (perhaps last 5) in case of an expection, but I think its worth investigating, you'd be suprised just how much overhead the logging has at the moment.

Agreed. Am I right in saying that presently every query is logged by Symphony into an array, regardless of context? This array is used in two places:

  • in the Profile Devkit (?profile) to inspect all and slow queries
  • when an exception occurs and the backtrace is rendered

The former is only available to sessions logged in to Symphony. The latter is visible to anyone, and has been commented that it should not be the case (we shouldn't be exposing this information to the public!). I suggest that we add an option to the config, profile_queries perhaps, which is no by default. When disabled, queries will only be profiled for users logged in to Symphony (author or developer), so that if an exception occurs they see the list, and can happily view the ?profile data, but this logging into an array doesn't occur for anyone else. If yes, the existing behaviour of logging for all is used.

@remie
Copy link
Contributor

remie commented Aug 16, 2011

Delegate

I would stick with the single "DatabaseQuery" delegate. If desired you can add optional parameters that specify what kind of query is being executed.

Grouping

I've fixed the issue with mysql_insert_id interference by executing the registration of the query prior to executing the query itself. This has the additional advantage that the original query is not executed when an error occurs while registering the query to the CDI tables. This ensures the integrity of the CDI log.

Your point about grouping the queries based on page load is interesting. I've seen this in the DBSync extension and have actually copied it to the DBSync implementation in CDI. However, how would one go about understanding the action that was performed by the user and see the dependencies between the queries?

BTW: I'm still contemplating about the transactional / rollback implementation. The problem lies within the dependency of mysql_insert_id().

There is currently something floating in my head involving the scaffolds extension and symphony object notation. Basically, you would want to communicate "create field X" instead of communicating "Run 20 queries based on these auto_increment values". If you can communicate in a meta language, you can also rollback / revert changes."create field X" can be translated to "delete field x".

@brendo
Copy link
Member

brendo commented Aug 16, 2011

Would there be value in multiple delegates, one for SELECT, one for INSERT, one for DELETE etc? I'm not sure, and it might complicate things unnecessarily. Stick with a single DatabaseQuery delegate?

One delegate but we could pass the query type as a parameter to this delegate?

Agreed. Am I right in saying that presently every query is logged by Symphony into an array, regardless of context?

Correct, and on complex sites, it actually is a noticeable overhead. A config setting is one idea, but I'm beginning to get weary of 'adding settings' and starting to think we should just do some of these things by default (strict_error_reporting is another culprit here). That said, it's potentially a quick solution to the problem!

There is currently something floating in my head involving the scaffolds extension and symphony object notation. Basically, you would want to communicate "create field X" instead of communicating "Run 20 queries based on these auto_increment values". If you can communicate in a meta language, you can also rollback / revert changes."create field X" can be translated to "delete field x".

This is exactly the reason why Scaffolds works the way it does. It's just a puppet to pull UI strings at the moment, so it's quite high level and doesn't worry about ID's at all.

In terms of your 'meta language', you may actually be able to track the high level changes that are occurring. When a new Field is added to a section, a delegate is fired for pre/post create/edit, so theoretically you could build a listing of these sorts of actions and the reverse 'import' for parsing 'Field was created with (object)'. In fact, I think @czheng's Tracker already does this to build a human readable log of what is happening, so it is potentially an exercise in parsing this log.

@brendo
Copy link
Member

brendo commented Aug 18, 2011

Just noticed there is a MySQL->debug() method which returns all queries that have run and how long they took to run for a single page load, would that prove useful for CDI/DB Sync to use instead of logging each individual query?

@nickdunn
Copy link
Contributor

This is what I used originally, but the problem is there's no way of knowing when a page execution has finished (to log the queries). Symphony sometimes performs an action and then redirects to a new URL. So I had to log the query as it runs, rather than relying on some kind of "page finished" callback, since a redirect could occur at any moment.

@brendo
Copy link
Member

brendo commented Aug 18, 2011

Good point

@brendo
Copy link
Member

brendo commented Aug 23, 2011

So in early testing, this delegate actually negatively affects performance. While the memory usage is lower, the time to generate pages is longer. An apachebench indicates a loss of around 10% (55/rps to 50/rps) compared to not having it (a 2.2.3 install is 52/rps) using the default ensemble.

Any thoughts?

@remie
Copy link
Contributor

remie commented Aug 23, 2011

How does this compare to the current implementation of hacking the class.mysql.php file? Did you try that as well? I can imagine that there is a performance hit to log the query execution, compared to the current situation (Symphony 2.2.3). The question is which method performs best, and if it is worth the hit to make it generally available or just stick with the hacking for those who seek this functionality :)

@nilshoerrmann
Copy link
Contributor Author

… if it is worth the hit to make it generally available or just stick with the hacking for those who seek this functionality :)

I cannot say anything regarding performance, but I like to note that I seek this functionality but won't hack the core.

@nickdunn
Copy link
Contributor

A 10% deficit doesn't sound good to me. I suppose we could fall back to a config option which enables this delegate on a per environment basis (disable it on live) but that gets a bit messy.

I'm sure I considered this when developing DB Sync, but what about a trigger? We'd lose the finer control over processing that we get with PHP (logging the author name, environment variables etc), but it's a more unobtrusive way of achieving the same thing.

@remie
Copy link
Contributor

remie commented Aug 23, 2011

Do you have access to the entire SQL statement within a trigger? I always thought you only have access to the changed/added values, or the ID of an deleted row...

@nickdunn
Copy link
Contributor

Oh blegh you're right.

@brendo
Copy link
Member

brendo commented Aug 23, 2011

How does this compare to the current implementation of hacking the class.mysql.php file? Did you try that as well? I can imagine that there is a performance hit to log the query execution, compared to the current situation (Symphony 2.2.3)

I didn't try it, I would think the 'hacking' method would be faster as it's bypassing the notifyMembers logic of finding what extensions have subscribed to this delegate and then proceeding to run their callbacks.

I cannot say anything regarding performance, but I like to note that I seek this functionality but won't hack the core.

I agree.

I'm sure I considered this when developing DB Sync, but what about a trigger?

I'm afraid I don't know too much about the finer details of Triggers, but it looks like something basic could be done.

The other thing to keep in mind is that this delegate (with the two above commits) prevents the logging of queries for every page. At the moment the backend queries are logged, even though it's not possible (out of the box) to view this information. The benefit of this 'continuous' logging is that when something goes wrong, we have the query backtrace. I haven't yet implemented the query backtrace into the current code, but I have a sneaky suspicion this will add overhead too.

I think the idea has merit and we should probably look into it further

@remie
Copy link
Contributor

remie commented Aug 23, 2011

Another approach would be to save the query and any performance data to a
symphony database table. The extension could then get away with adding a
trigger to that table.

This behavior can be enabled / disabled from configuration for those
instances that do not wish to save this meta information.

@brendo
Copy link
Member

brendo commented Aug 23, 2011

Another point to raise, the delegate current fails is the callback contains database queries (recursion!).

I'm guessing this is going to be a likely use case considering you are looking to get more information about the context that the query was fired from.

While you could pass a $log = false parameter to the query() function, it's a bit ugly, and then wouldn't be a true indication of the queries executed. Back to the old drawing board ;)

@remie
Copy link
Contributor

remie commented Aug 24, 2011

Another point to raise, the delegate current fails is the callback contains database queries (recursion!).

I'm guessing this is going to be a likely use case considering you are looking to get more information about the context that >the query was fired from.

This is indeed true, the CDI extension logs the query to database. To prevent recursion, it checks if the query contains the name of the CDI table tbl_cdi_log. If so, it does not log those queries, stopping the recursion.

Another point to make is that you will need a solution for the mysql_insert_id() return value. If you call the delegate after the query execution, and the delegate function also includes database operations (like CDI), this would interfere with the value of the last inserted id. So you will need the raise the event prior to executing the query.

If you want to provide information of query execution (success/failure, performance data, etc) you will need to add two delegates :)

Or... you rewrite the part were you store the mysql_insert_id() value in a static property and add it to the event information. That would also mean that you need to refactor other parts of the Symphony core were there is a dependency on the last inserted id (for instance, when creating a section with fields). It should register a delegate function as well.

@brendo
Copy link
Member

brendo commented Aug 24, 2011

Yeah its a unique situation as I believe this is the first delegate that doesn't operate at the UI level.
This evening I'm going to pull this functionality out into its own branch so we can try some different solutions while allowing integration work to continue (something I really should of done in the first place! doh)
Thats a really good point about the insert ID, thanks for raising it!

@remie
Copy link
Contributor

remie commented Aug 24, 2011

After you created the branch I will fork it to work on the approach of logging the query meta information (query, performance data, etc) into a Symphony table. I want to try using a trigger and see if this is a valid approach for extensions. If so, this would eliminate the need of a delegate and rework regarding mysql_insert_id().

@brendo
Copy link
Member

brendo commented Aug 24, 2011

@remie, the branch is available here. Let us know how you get on :)

@remie
Copy link
Contributor

remie commented Aug 30, 2011

Ok... i've been working on the "logquery-db" implementation and i'm running into some issues:

  • The install.sql file is obviously executed using class.mysql.php. This is a problem because the creation of the tbl_log_query is created... in the install.sql. So I needed a way to skip logging of the initial installation statements, which is done by checking if the Administration class exists. That is a poor workaround. Not only does it limit to logging backend execution (which might be a good thing), it is a strange place to create a dependency on Symphony logic in a class that appears to be rather abstract.
  • The dependency on the class.mysql.php goes beyond the install.sql script: the extensions will need to create triggers through this class. In addition, the extension will need to execute queries to retrieve the log files. I can fix this by adding a check to see if there is a read / write operation to the tbl_log_queries and prevent this from entering the log. This is exactly what I'm doing now in the CDI extension. However, it feels weird to add this exception at this point in code.

I'm still working on this solution, but for now, the implementation feels like hacking the core. It's not as smooth as core code should be.

@remie
Copy link
Contributor

remie commented Sep 21, 2011

With regards to the mysql_insert_id, I like the idea of executing the logging query before the actual query. However if a problem occur with the second (actual) query, won't it already have been logged? Shouldn't the query only be logged if it was executed correctly?

Yeah, but what happens if there is an error while logging the query? That way it would make it into your development database but not in the CDI log, with possible schema corruption on other instances as a result. Ideally, we would have three delegates: PreQueryExection, PostQueryExecution, QueryExectionError. The first can be dropped for performance reason if we have QueryExectionError delegate. The extension can add rollback functionality in the delegate handler in case of error.

And then we document the LogQuery delegate with a warning explaining what you need to achieve. Developers even contemplating this delegate are going to be technically savvy and so will understand the cause and effect if it is documented.

If I read this correctly, your suggestion is to go with the developer responsibility approach. So Symphony will not prevent the posibility of recursion but leave it up to the extension developer to ensure it does not happen.

I think this is a valid approach: in most cases, preventing recursion is easy (see current implementation in CDI)

The MySQL class is static, so could there be a static accessor to disable further logging?

I would not do this because it will allow extension A to prevent delegate execution for extension B. See the first example in my earlier comment.

So I'm I right to say that the conclusion of this thread is that we will go with delegate(s) only?

@nickdunn
Copy link
Contributor

[Remie] Ideally, we would have three delegates: PreQueryExection, PostQueryExecution, QueryExectionError. The first can be dropped for performance reason if we have QueryExectionError delegate. The extension can add rollback functionality in the delegate handler in case of error

Fair point. But this feels like over-engineering. Is it a legitimate problem that needs a solution, or a hypothetical one?

[Nick] The MySQL class is static, so could there be a static accessor to disable further logging?

[Remie] I would not do this because it will allow extension A to prevent delegate execution for extension B

Sorry, I didn't fully read your description above. You're right, it could break things if used irresponsibly. But in the same way a UI developer could break something by fiddling with the DOM and forgetting to put some markup back. Looking for the table name (as CDI does, and DBSync used to when it used a database, it'd prepend a -- DBSync comment to each SQL line to prevent logging) to prevent recursion will indeed work in most cases, but only for queries that the extension can control.

If you wanted to use, say, EntryManager::fetch(...) to get some entries in your callback, then you're executing Symphony queries that you can't filter out. Quite why you'd want to do this I don't know, and to be honest we can likely discount this mega edge use case until someone actually needs it.

[Remie] So I'm I right to say that the conclusion of this thread is that we will go with delegate(s) only?

I think so. The simplest solution:

  • single delegate
  • the implementing extension checks for recursion in whatever way it sees fit, such as CDI looking for the table name in the query string

@remie
Copy link
Contributor

remie commented Sep 21, 2011

Is it a legitimate problem that needs a solution, or a hypothetical one?

Well... the CDI implementation will block further query execution if it fails to log the query. This is to ensure that the log will always be in sync with the actual database changes. If the log succeeds, but the actual query execution fails, there is rollback functionality to remove the query from the log.

This scheme is core functionality for CDI because the database scheme must be the same (especially when considering keeping auto number increments in sync).

@nickdunn
Copy link
Contributor

nickdunn commented Oct 7, 2011

Right... so what to do?

@remie
Copy link
Contributor

remie commented Oct 7, 2011

Shall I wrap this up in a pull request based on the 'logquery-db' branch which can be tested by @brendo on performance?

@brendo
Copy link
Member

brendo commented Oct 9, 2011

Sorry @remie, missed this one. Yeah please do, it's definitely something I'd like to see resolved, if not in this release, a future release, so the more groundwork we get done now the better IMO.

As Nick said, the ideal solution will be a single delegate, as announcing delegates seems to incur a cost regardless of if an extension 'uses' them or not.

@remie
Copy link
Contributor

remie commented Oct 10, 2011

Can I add the other delegates as optional based on config settings? That way developers can decide if they accept the performance hit for added functionality.

delegates seems to incur a cost regardless of if an extension 'uses' them or not.

Is this going to be reviewed? Perhaps a difference delegate implementation will not involve this penalty and would encourge extension developers to add custom delegates for their extension without the fear of degrading performance.

@brendo
Copy link
Member

brendo commented Oct 11, 2011

Can I add the other delegates as optional based on config settings?

Hopefully not. I hate configuration, and the less the better. If it a tipping point between turning it on/off, then it should be an extension. Symphony is great because it tends to keep it's configuration small and just go by convention. It's an area I'd like to further improve actually.

Is this going to be reviewed?

Yes, but not in the scope of this release.

@remie
Copy link
Contributor

remie commented Oct 20, 2011

@brendo: this should be it!

The changes include your first commit of 'LogQuery' delegate, which I renamed to 'PostQueryExecution'.
However, because this delegate will never be fired if an MySQL error occurs, i've also created the 'QueryExecutionError' delegate. This should be sufficient for extension developers and should not cause any performance problems because only one of the delegates will fire.

Do you want me to create a pull request or do you want to test it first?

@nickdunn
Copy link
Contributor

nickdunn commented Dec 2, 2011

Would be a good idea to benchmark this before and after to see what sort of impact this has. If a page runs 200 queries, might an additional 200 function calls decrease performance? By how much, maybe negligible, but I think this deserves some testing.

@brendo
Copy link
Member

brendo commented Dec 2, 2011

Yeah the original benchmarks for this were surprising as it was slowly to announce the delegate than it was to log all queries regardless. I have not had a chance yet to test @remie's changes, but am keen to.

I'm very cautious of any performance impact but I understand that some degradation maybe required to allow greater flexibility for extensions. As a sidenote I've been tracking performance of the default ensemble for awhile and am worried at the downward trend of the 2.2.x branch. I desperately want to correct this in the 2.3.x branch and push the trend in an upward direction.

@brendo
Copy link
Member

brendo commented Dec 10, 2011

Have finally got around to checking out @remie's work.

@remie's implementation is good, it only announces the delegates when the ExtensionManager exists and if it exists, Symphony no longer keeps track of them. By doing this, performance is slightly increased (as we aren't storing an array of queries in memory), but we lose the Database Query Log when an Exception occurs.

The ?profile stats will be skewed once the Profiler is updated for the delegate as the query log will be stored in memory only ?profile is activated. Looking at apachebench times will give a more true indication of the performance impact (which is actually positive in my testing!)

So the only concern now is that this current implementation will remove the Database Query Log from exception pages. The solution is remove the if/else nature of the commit. This means Symphony will fire the Delegate and maintain the log of all queries in memory. It's what I've committed to my local branch, but I'm not happy with that solution just yet.

@brendo
Copy link
Member

brendo commented Dec 10, 2011

Pushed another commit that just keeps the last 5 queries in Symphony's memory. Any concerns of this approach?

@remie
Copy link
Contributor

remie commented Dec 13, 2011

So the only concern now is that this current implementation will remove the Database Query Log from exception pages

Personally I never use this stacktrace. Most of the time the error message is clear enough.
I know you hate it, but I would suggest that we make this debug option configurable: do not show the Database Query Log in the exception pages by default. If a certain config setting is set, the application can start logging the queries and display them in case of error.

BTW, perhaps there is already a generic config setting for this? Something like "Debug Mode = True"?

@brendo
Copy link
Member

brendo commented Dec 14, 2011

Personally I never use this stacktrace. Most of the time the error message is clear enough.

Likewise, or the majority of the time the last couple of queries tell the story, hence why I went with the last 5.

It is possible that an extension would be able to provide full debugging if required with the delegate, so I'm leaning towards we merge as is.

@remie
Copy link
Contributor

remie commented Dec 15, 2011

Ah yes.. this is definitely something that can be achieved by an extension (using the delegate :)) so if someone needs it they can introduce the functionality again. In that case I vote for removing the Database Query Log and do not store the query information in memory. This means better performance for those who don't need it and the ability to add functionality through an extension for those who desire it.

@brendo
Copy link
Member

brendo commented Dec 16, 2011

I agree. I have pushed an updated commit

  • Removed MySQL::getLastError(), this is not needed an Exception is thrown when a database error occurs which contains the same information
  • Updated the /install and GenericErrorHandler classes to reflect the above change
  • Removed Symphony maintaining a record of database queries. The only queries logged now are those that occur when the ExtensionManager is not available.

@remie
Copy link
Contributor

remie commented Dec 16, 2011

Does this mean it will be included in the 2.3 release?
If so I will start making changes to the CDI extension.

@brendo
Copy link
Member

brendo commented Dec 16, 2011

Does this mean it will be included in the 2.3 release?

Yes it will.

Going to slightly change the behaviour to log queries for logged in users only, thus maintaining the backtrace for exceptions without the performance overhead for the public on the frontend.

I will merge these changes tonight.

@brendo
Copy link
Member

brendo commented Dec 16, 2011

Changes merged! Thanks for your assistance and time @remie, much appreciated.

@brendo brendo closed this as completed Dec 16, 2011
@remie
Copy link
Contributor

remie commented Dec 16, 2011

More than happy to have helped out, can't wait to implement it in the CDI extension!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants