WIP: (PDB-950) Catch and express postgresql regexp failures as better errors#1142
WIP: (PDB-950) Catch and express postgresql regexp failures as better errors#1142nikitiuk0 wants to merge 2 commits intopuppetlabs:masterfrom
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Hmm interesting.
The general concern I have with this facility, is that it enforces Java regular expressions, which isn't POSIX regexp which is ultimately what we are exposing here: http://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-POSIX-TABLE. Although in practice I'm sure this catches a lot of the problems, what if java regex doesn't support something PG does for example. Also it will allow through Java regexp forms, even though the engine will throw an error.
While HSQLDB supports Java regexp, we aren't as concerned about that database platform.
There was a problem hiding this comment.
I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?
Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.
I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?
There was a problem hiding this comment.
I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?
Yes we do, for good or bad. And what's more, HSQLDB is not the major case we support ... PG is. We've been reducing support for HSQLDB if anything.
Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.
We'd need to dig in deep to ensure this at least, or I'd be very very careful at least to say that its safe to make a 1 to 1 mapping. I'm wary basically :-).
I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?
Well, its okay to validate this way, but you might as well do the full query at the same time and instead throw a more meaningful error when it fails. This is not validation up front really, having said that I wouldn't expect a query to pause if a regexp is incorrect, it should throw an error quite quickly. Maybe this is just overall a more 'correct' approach? The JDBC driver should be expressing a reasonably correct regular expression error. The problem stems from the fact that its then hard to tell the user which query in the context of our AST style querying language that actually failed, because once its SQL its hard to track it back to the original part.
Overall this is an interesting problem.
There was a problem hiding this comment.
How about this, why not go ahead and add the re-pattern style validation for everything, but also try and catch the exception for PG like you see in PDB-950 at the same time. At least if someone tries to use something that is too complex for PG we have a fallback. It seems like a reasonable approach for now, but I think both steps are desirable for PG and to solve PDB-950. We'll meditate separately on whether this is a good idea for PG ... what I'm generally wary of is removing features people might be using ultimately. In the worst case we could apply the up front validation only for HSQLDB perhaps, that should just be an extra (when ...) statement somewhere, so shouldn't be a massive change later on.
There was a problem hiding this comment.
Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error? (I don't think that we can return 4xx error in this case, even if we check exception message to match "ERROR: invalid regular expression". As you already said - it is hard to say whether it was user's invalid regexp or our internal one)
There was a problem hiding this comment.
Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error?
What I said in my previous comment was to just make it work for both, and we can figure out with some functional testing and perhaps some more thought whether we need a conditional. You (or us) can change this later, but I'd just raise a patch that tried to do the validation for now for both db's. That way we get it in front of us and we can see how it feels when working. You never know, this might be enough.
(I don't think that we can return 4xx error in this case, as you already said - it is hard to say whether it was user's invalid regexp or our internal one)
I don't think I said anything about determining the difference between a user regexp or an internal one ... I'm not sure what you mean here, or perhaps I wasn't clear. No, the goal of PDB-950 is to not make it a 500 error and reflect it as a user error status code, ultimately this must be our destination otherwise we're just wasting time and moving the problem around. That is, I'm not sure what the value of any extra validation if all we do is still return a 500 status. We must be able to differentiate from consoles whether the user must take action, or whether its a fault in our system.
Perhaps what you misunderstood is the difference between validating it up front, versus validating it by catching the exception. Validation up front can allow us at some point to point to the user and say "this part of the query broke it" for example they may have done multiple regexp queries and only one failed. On the flip-side if we just catch the exception later on however, we can't express this to a user in the form of their query ... ie. 'something broke' but its up to the user to determine which. Either way, there is a much greater chance that its still a user problem that they need to fix, so a 500 would not be valid here, and a 400 would apply instead.
Half the trick would just be making sure that the exception we catch from the system is analyzed correctly for this particular failure. Strings aren't great, but I believe if we analyze the PSQLException there is probably an error code (getErrorCode) or state (getSQLState) we can match on for this kind of failure instead, that should be far more stable and reliable. Either way, matching this kind of thing can be tricky, so a unit test that actually shakes up the PSQL failure would be needed to ensure this doesn't regress I think.
There was a problem hiding this comment.
If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).
If query was ok - then we proceed this as normal request and not fail because of some of POSIX regexp features are not supported by Java regexps.
Right?
PS: Could we connect on Skype or other IM? It is easier to discuss such things there.
There was a problem hiding this comment.
If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).
Almost ... put in the check and if it fails just fail for both regardless, ie. disregard my concern for now. Return a 400 early. We'll then do some research and testing on this end, to see where it might not be quite 'perfect'.
Now as obviously Java regexp is a superset most probably of some POSIX re features, that check isn't enough for PG, so we need to also do a validation at query runtime as we described. Somehow we need to identify if its just a regexp query or some other fatal PSQLException, that might be tricky but worth the research into status codes and error codes and such (matching the error string would be the very last thing I'd try, since thats not the most stable of API's).
I can talk on skype if you like ... kenbarber_geek ... I'm about to go to lunch soon but send me a request and we can chat later on. I'm in the UK.
|
I changed couple of things:
However, there is a special case that results in 200 HTTP code and empty response. I'd like to add, that any SQL exception that occurs while executing "main" query in a separate thread when "count-query" does not exist will result in same response and exception in the log. So, it seems that with current architecture it is not possible to handle those exceptions and return 4xx error when regexp is invalid. I think that the case I described above is pretty rare, since to reproduce it, user will need to use Java-specific regexp syntax with Postgres DB and also call a query that does not have "count-query". In all other cases user should get 4xx error. Any comments? |
|
@DemonShi for query 1) we only need this to work for v4, so doing whatever is needed for that is all that is necessary. v2/v3 will be retired in master soon, before we release. As far as problem 2, I almost get it, perhaps you want to split out your patches into two PR's, so we can focus on these two issues separately? I think feature 1 should be easy enough to merge in on its own. If you can make a patch with the problem you describe in for #2 then I can probably look at that on its own a little easier. |
|
Hi @kbarber , I moved simple validation (query 1) into PR-1164 and reverted all up-front validation here as you asked :) I will be waiting on your feedback for both PRs. One question - are we stopped of making fixes for v3 API targeted for the next release (as this API will be retired before the next release)? Thanks, |
|
Yes, PDB-695 will cover the removal work. We're pegging this for next sprint (2 weeks starting wednesday) most probably, along with other drastic changes like converting all fields to use underscores instead of hyphens (PDB-698). |
|
Can one of the admins verify this patch? |
1 similar comment
|
Can one of the admins verify this patch? |
|
@pljenkinsro test this please |
|
Test PASSed. |
There was a problem hiding this comment.
The more I think about this case of checking the exception within the streaming thread, the more I think its not enough, and probably not massively useful in its current case. In that, the result is more or less the same with or without this code. What I mean is, at the end of the day the exception is still logged as an exception, and the user still receives a 200 regardless. What has changed? We've added some code, and modified how we log it. I'm not 100% certain this should go in as it stands.
Its a shame we don't have a decent way of expressing this exception back to the HTTP layer somehow, perhaps we need to do more work here to get the streaming code to achieve better messaging between the front & backend threads so exceptions can be expressed correctly - somehow.
|
I think generally, for this bug to be truly fixed we need to see if we can get the errors back from the thread somehow. I have zero ideas on this at the moment, what did you investigation lead you to for this @DemonShi ? |
|
@kbarber I did not investigate this topic much. But here are some of the ideas in my mind on how to solve this problem (and other ones when we log an exception and return 200 code with empty body):
|
|
@DemonShi I think we should investigate this more deeply. I'm probably okay merging in the changes specifically for include-total for now, but its not a complete 'fix' for PDB-950, not across the board anyway. This is a hard problem to solve, and we need to be mindful of the performance setup we have now with a thread that performs the cursor & pg streaming ... there is a specific reason why it works the way it does and I bet thats going to get in the way of thinking about reducing the thread (your idea #1). As far as idea #2 ... this is going to be interesting and complex, but we might need to start thinking in this direction. Blocking until the thread finishes however might not be great, we want to start streaming as soon as results are available, and by then we usually have already setup the header. Perhaps we need to block until we have 1 response from the streaming thread somehow? I like the thinking around this problem, and the problem you've dragged up in this patch, its very evident to me now the design makes it hard for anyone trying to solve this. I guess this is what happens when you optimize for performance :-). |
|
@kbarber I totally understand and agree with your points. I will investigate this deeply. Hopefully, I will have more free time than I do now :) BTW, do we have any performance tests with long responses which I could use to check if there is a performance degradation? As for this PR, are we going to merge it while I will be working on a threading problem in the other PR or just hold this one until the threading problem solution is available? |
|
@DemonShi I think if we want to merge anything, we shouldn't merge anything that is YAGNI today. So reducing this to work with just include-total on the HTTP thread isn't a bad idea for now, I could probably agree to that at least as its a start. Since I see some risk in merging the alternate code on the inner thread I say avoid that for now ... because ultimately a) it might change anyway and b) it doesn't actually change that much as it stands. That is, we should be wary of merging in something not quite complete and liable to change anyway. |
|
@kbarber Okay, I will remove inner thread exception handler from this PR. Do you know any info on deadline and performance tests questions? |
|
@DemonShi I've been absent and missed your last question: I have no specific deadline on this one. In regards to performance testing, we have nothing specific for queries today that we use, its usually bespoke - one could use an external testing tool, or do the test in clojure without the networking traffic so you can iterate in a tight loop. I'm not sure if this change needs something heavier than this today, its easy enough to observe a potential change and conjecture as well :-). |
|
@kbarber Oh, I've been pretty busy and also had pretty long holidays here too. According to the PR: I developed a solution that tries to take a first item from result-set lazy sequence under try/catch and resolves a promise with nil. If there is an exception, I resolve that promise with exception from DB. This approach also allows to handle all other types of exceptions that occur in separate thread during query execution. Those currently result in empty 200 response. Here is code snippet just to get your opinion of the approach: I still need to refactor it a bit, check corner cases and cover it with tests, but the idea of the approach should be clear. What do you think? |
|
Waiting for CLA signature by @DemonShi @DemonShi - We require a Contributor License Agreement (CLA) for people who contribute to Puppet, but we have an easy click-through license with instructions, which is available at https://cla.puppetlabs.com/ Note: if your contribution is trivial and you think it may be exempt from the CLA, please post a short reply to this comment with details. http://docs.puppetlabs.com/community/trivial_patch_exemption.html |
|
@puppetcla, I signed CLA |
|
CLA signed by all contributors. |
|
@DemonShi I think this sounds interesting, it looks to me like this should be okay in theory - and it looks like this is only done on the first item in the response only which is great. As long as it doesn't try to realize the lazy-seq at any point (ie. it keeps it as a lazy-seq and just does this on the first item for example) then it should be fine. |
…valid regexp exception handler which returns 4xx status code. Covered with tests.
|
@pljenkinsro test this please |
|
I implemented the approach discussed above. Program evaluates first element from lazy-seq forcing statement execution and catching any SQL exceptions which can be handled later. Couple of questions:
|
|
@pljenkinsro retest this please |
|
@DemonShi to answer your questions:
|
|
Test FAILed. |
|
thats a PASS, only a rubygems transient is being thrown ^^. |
|
Merged manually, with some small cleanups and the extra tests. |
This commit adds a validation for regexps in queries as was proposed in PDB-950