WIP: (PDB-950) Catch and express postgresql regexp failures as better errors by nikitiuk0 · Pull Request #1142 · puppetlabs/puppetdb

nikitiuk0 · 2014-11-06T00:43:52Z

This commit adds a validation for regexps in queries as was proposed in PDB-950

pljenkinsro · 2014-11-06T00:46:02Z

Can one of the admins verify this patch?

kbarber · 2014-11-06T01:03:08Z

src/puppetlabs/puppetdb/query_eng.clj

Hmm interesting.

The general concern I have with this facility, is that it enforces Java regular expressions, which isn't POSIX regexp which is ultimately what we are exposing here: http://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-POSIX-TABLE. Although in practice I'm sure this catches a lot of the problems, what if java regex doesn't support something PG does for example. Also it will allow through Java regexp forms, even though the engine will throw an error.

While HSQLDB supports Java regexp, we aren't as concerned about that database platform.

I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?

Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.

I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?

I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?

Yes we do, for good or bad. And what's more, HSQLDB is not the major case we support ... PG is. We've been reducing support for HSQLDB if anything.

Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.

We'd need to dig in deep to ensure this at least, or I'd be very very careful at least to say that its safe to make a 1 to 1 mapping. I'm wary basically :-).

I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?

Well, its okay to validate this way, but you might as well do the full query at the same time and instead throw a more meaningful error when it fails. This is not validation up front really, having said that I wouldn't expect a query to pause if a regexp is incorrect, it should throw an error quite quickly. Maybe this is just overall a more 'correct' approach? The JDBC driver should be expressing a reasonably correct regular expression error. The problem stems from the fact that its then hard to tell the user which query in the context of our AST style querying language that actually failed, because once its SQL its hard to track it back to the original part.

Overall this is an interesting problem.

How about this, why not go ahead and add the re-pattern style validation for everything, but also try and catch the exception for PG like you see in PDB-950 at the same time. At least if someone tries to use something that is too complex for PG we have a fallback. It seems like a reasonable approach for now, but I think both steps are desirable for PG and to solve PDB-950. We'll meditate separately on whether this is a good idea for PG ... what I'm generally wary of is removing features people might be using ultimately. In the worst case we could apply the up front validation only for HSQLDB perhaps, that should just be an extra (when ...) statement somewhere, so shouldn't be a massive change later on.

Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error? (I don't think that we can return 4xx error in this case, even if we check exception message to match "ERROR: invalid regular expression". As you already said - it is hard to say whether it was user's invalid regexp or our internal one)

Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error?

What I said in my previous comment was to just make it work for both, and we can figure out with some functional testing and perhaps some more thought whether we need a conditional. You (or us) can change this later, but I'd just raise a patch that tried to do the validation for now for both db's. That way we get it in front of us and we can see how it feels when working. You never know, this might be enough.

(I don't think that we can return 4xx error in this case, as you already said - it is hard to say whether it was user's invalid regexp or our internal one)

I don't think I said anything about determining the difference between a user regexp or an internal one ... I'm not sure what you mean here, or perhaps I wasn't clear. No, the goal of PDB-950 is to not make it a 500 error and reflect it as a user error status code, ultimately this must be our destination otherwise we're just wasting time and moving the problem around. That is, I'm not sure what the value of any extra validation if all we do is still return a 500 status. We must be able to differentiate from consoles whether the user must take action, or whether its a fault in our system.

Perhaps what you misunderstood is the difference between validating it up front, versus validating it by catching the exception. Validation up front can allow us at some point to point to the user and say "this part of the query broke it" for example they may have done multiple regexp queries and only one failed. On the flip-side if we just catch the exception later on however, we can't express this to a user in the form of their query ... ie. 'something broke' but its up to the user to determine which. Either way, there is a much greater chance that its still a user problem that they need to fix, so a 500 would not be valid here, and a 400 would apply instead.

Half the trick would just be making sure that the exception we catch from the system is analyzed correctly for this particular failure. Strings aren't great, but I believe if we analyze the PSQLException there is probably an error code (getErrorCode) or state (getSQLState) we can match on for this kind of failure instead, that should be far more stable and reliable. Either way, matching this kind of thing can be tricky, so a unit test that actually shakes up the PSQL failure would be needed to ensure this doesn't regress I think.

If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).
If query was ok - then we proceed this as normal request and not fail because of some of POSIX regexp features are not supported by Java regexps.
Right?

PS: Could we connect on Skype or other IM? It is easier to discuss such things there.

If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).

Almost ... put in the check and if it fails just fail for both regardless, ie. disregard my concern for now. Return a 400 early. We'll then do some research and testing on this end, to see where it might not be quite 'perfect'.

Now as obviously Java regexp is a superset most probably of some POSIX re features, that check isn't enough for PG, so we need to also do a validation at query runtime as we described. Somehow we need to identify if its just a regexp query or some other fatal PSQLException, that might be tricky but worth the research into status codes and error codes and such (matching the error string would be the very last thing I'd try, since thats not the most stable of API's).

I can talk on skype if you like ... kenbarber_geek ... I'm about to go to lunch soon but send me a request and we can chat later on. I'm in the UK.

nikitiuk0 · 2014-11-11T22:27:04Z

I changed couple of things:

I integrated regexp validation function with existing v4 and v2//v3 validators.
Unfortunately "validate-binary-operators" is not used by v2/v3 API. Also, compilation and validation of queries in v2/v3 is implemented differently than in v4, so I couldn't just add "validate-binary-operators" to older APIs. Instead I integrated this check to a term compilation function.
Since Java and POSIX regexp syntax are slightly different and we can only validate regexps (in easy way) by Java syntax, there is a small chance that user will use Java-specific construction with PostgreSQL DB which will cause the same error as described in this issue. To avoid this, I added PSQL exception handler that returns 4xx HTTP error when sees "2201B" error code (http://www.postgresql.org/docs/9.3/static/errcodes-appendix.html).

However, there is a special case that results in 200 HTTP code and empty response.
If you check query_eng.clj, there is a "resp" variable that is assigned to a result of "pl-http/stream-json-response" function. The main idea here is that it should immediately return ring response object with 200 status code and execute "main" query in a separate thread. The result of this query occurs in InputStream that is assigned to (:body resp).
In the mean time if "count-query" exists, it will be executed in a current thread. But since "count-query" is optional, user might get 200 HTTP code and empty response when "count-query" does not exist.

I'd like to add, that any SQL exception that occurs while executing "main" query in a separate thread when "count-query" does not exist will result in same response and exception in the log.
If you check macro "pl-http/streamed-response", it states:
As 'body' is executed in a separate thread, it's not possible for the caller to catch exceptions thrown by 'body'. Errors are instead logged.
I wrapped this body with a similar exception handler which prints special error message instead of printing exception with a stack trace when receiving "invalid_regexp".

So, it seems that with current architecture it is not possible to handle those exceptions and return 4xx error when regexp is invalid.
We should probably create another jira-ticket to add a possibility of error handling here. It may also be related to adding normal error handling of other errors, instead of generating HTML page with 500 status code.

I think that the case I described above is pretty rare, since to reproduce it, user will need to use Java-specific regexp syntax with Postgres DB and also call a query that does not have "count-query". In all other cases user should get 4xx error.
Anyway, in this case he should get an error in the log saying: "Processed invalid PostgreSQL regexp. Unable to return bad request code from a different thread."

Any comments?

kbarber · 2014-11-26T18:42:56Z

@DemonShi for query 1) we only need this to work for v4, so doing whatever is needed for that is all that is necessary. v2/v3 will be retired in master soon, before we release.

As far as problem 2, I almost get it, perhaps you want to split out your patches into two PR's, so we can focus on these two issues separately? I think feature 1 should be easy enough to merge in on its own.

If you can make a patch with the problem you describe in for #2 then I can probably look at that on its own a little easier.

nikitiuk0 · 2014-11-26T23:04:34Z

Hi @kbarber ,

I moved simple validation (query 1) into PR-1164 and reverted all up-front validation here as you asked :)

I will be waiting on your feedback for both PRs.

One question - are we stopped of making fixes for v3 API targeted for the next release (as this API will be retired before the next release)?

Thanks,
Andrii

kbarber · 2014-11-27T13:25:25Z

Yes, PDB-695 will cover the removal work. We're pegging this for next sprint (2 weeks starting wednesday) most probably, along with other drastic changes like converting all fields to use underscores instead of hyphens (PDB-698).

pljenkinsro · 2014-12-01T14:46:07Z

Can one of the admins verify this patch?

pljenkinsro · 2014-12-02T18:39:06Z

Can one of the admins verify this patch?

kbarber · 2014-12-03T13:32:05Z

@pljenkinsro test this please

pljenkinsro · 2014-12-03T16:27:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://jenkins.puppetlabs.com/job/platform_puppetdb_intn-sys_pr/260/

kbarber · 2014-12-10T14:01:51Z

src/puppetlabs/puppetdb/query_eng.clj

The more I think about this case of checking the exception within the streaming thread, the more I think its not enough, and probably not massively useful in its current case. In that, the result is more or less the same with or without this code. What I mean is, at the end of the day the exception is still logged as an exception, and the user still receives a 200 regardless. What has changed? We've added some code, and modified how we log it. I'm not 100% certain this should go in as it stands.

Its a shame we don't have a decent way of expressing this exception back to the HTTP layer somehow, perhaps we need to do more work here to get the streaming code to achieve better messaging between the front & backend threads so exceptions can be expressed correctly - somehow.

kbarber · 2014-12-10T14:12:23Z

I think generally, for this bug to be truly fixed we need to see if we can get the errors back from the thread somehow. I have zero ideas on this at the moment, what did you investigation lead you to for this @DemonShi ?

nikitiuk0 · 2014-12-10T14:20:18Z

@kbarber I did not investigate this topic much.

But here are some of the ideas in my mind on how to solve this problem (and other ones when we log an exception and return 200 code with empty body):

Don't execute results-query in a separate thread when include-total is not passed. This might solve the problem, but only if count-query is semantically the same as results-query (e.g. both count-query and result-query will always fail in a same way). I need to double check that.
Pass status code back from the results-query thread. This will probably require blocking ring thread until results-query thread finishes. I will need to study more of how ring manages its threads and if there are other ways of passing status code back (maybe again though something similar to InputStream)

kbarber · 2014-12-10T14:30:16Z

@DemonShi I think we should investigate this more deeply. I'm probably okay merging in the changes specifically for include-total for now, but its not a complete 'fix' for PDB-950, not across the board anyway.

This is a hard problem to solve, and we need to be mindful of the performance setup we have now with a thread that performs the cursor & pg streaming ... there is a specific reason why it works the way it does and I bet thats going to get in the way of thinking about reducing the thread (your idea #1). As far as idea #2 ... this is going to be interesting and complex, but we might need to start thinking in this direction. Blocking until the thread finishes however might not be great, we want to start streaming as soon as results are available, and by then we usually have already setup the header. Perhaps we need to block until we have 1 response from the streaming thread somehow?

I like the thinking around this problem, and the problem you've dragged up in this patch, its very evident to me now the design makes it hard for anyone trying to solve this. I guess this is what happens when you optimize for performance :-).

nikitiuk0 · 2014-12-10T14:41:25Z

@kbarber I totally understand and agree with your points. I will investigate this deeply. Hopefully, I will have more free time than I do now :)
Do we have any hard deadline for this issue?

BTW, do we have any performance tests with long responses which I could use to check if there is a performance degradation?

As for this PR, are we going to merge it while I will be working on a threading problem in the other PR or just hold this one until the threading problem solution is available?

kbarber · 2014-12-10T14:44:55Z

@DemonShi I think if we want to merge anything, we shouldn't merge anything that is YAGNI today. So reducing this to work with just include-total on the HTTP thread isn't a bad idea for now, I could probably agree to that at least as its a start.

Since I see some risk in merging the alternate code on the inner thread I say avoid that for now ... because ultimately a) it might change anyway and b) it doesn't actually change that much as it stands. That is, we should be wary of merging in something not quite complete and liable to change anyway.

nikitiuk0 · 2014-12-10T14:54:22Z

@kbarber Okay, I will remove inner thread exception handler from this PR.

Do you know any info on deadline and performance tests questions?

kbarber · 2015-01-05T14:22:39Z

@DemonShi I've been absent and missed your last question: I have no specific deadline on this one. In regards to performance testing, we have nothing specific for queries today that we use, its usually bespoke - one could use an external testing tool, or do the test in clojure without the networking traffic so you can iterate in a tight loop. I'm not sure if this change needs something heavier than this today, its easy enough to observe a potential change and conjecture as well :-).

nikitiuk0 · 2015-01-06T01:31:29Z

@kbarber Oh, I've been pretty busy and also had pretty long holidays here too.

According to the PR: I developed a solution that tries to take a first item from result-set lazy sequence under try/catch and resolves a promise with nil. If there is an exception, I resolve that promise with exception from DB.
Then from the "ring" thread we can wait until promise is ready (basically, that is the moment right after we took a first item from result-set or nothing - when result-set is empty) and say whether query was successful or not. If query failed - we re-throw the exception and follow normal error processing (the same way as if there was a count-query). If query is okay - continue execution.

This approach also allows to handle all other types of exceptions that occur in separate thread during query execution. Those currently result in empty 200 response.

Here is code snippet just to get your opinion of the approach:

(jdbc/with-transacted-connection db
      (let [{[sql & params] :results-query
             count-query :count-query
             projections :projections} (query->sql version query
                                                   paging-options)
             resp-error (promise)
             resp (output-fn
                   (fn [f]
                     (try
                       (jdbc/with-transacted-connection db
                         (query/streamed-query-result version sql params
                           (comp
                             f
                             #(do
                                (first %)
                                (deliver resp-error nil)
                                %)
                             (munge-fn version projections))))
                       (catch Exception e
                         (deliver resp-error e)
                         (throw e)))))]
        (when @resp-error
          (println (str "SQL error occured: " (.getMessage @resp-error)))
          (throw @resp-error))
        (if count-query
          (http/add-headers resp {:count (jdbc/get-result-count count-query)})
          resp)))))

I still need to refactor it a bit, check corner cases and cover it with tests, but the idea of the approach should be clear.

What do you think?

puppetcla · 2015-01-06T18:00:34Z

Waiting for CLA signature by @DemonShi

@DemonShi - We require a Contributor License Agreement (CLA) for people who contribute to Puppet, but we have an easy click-through license with instructions, which is available at https://cla.puppetlabs.com/

Note: if your contribution is trivial and you think it may be exempt from the CLA, please post a short reply to this comment with details. http://docs.puppetlabs.com/community/trivial_patch_exemption.html

nikitiuk0 · 2015-01-12T00:28:58Z

@puppetcla, I signed CLA

puppetcla · 2015-01-12T01:00:20Z

CLA signed by all contributors.

kbarber · 2015-01-12T19:34:46Z

@DemonShi I think this sounds interesting, it looks to me like this should be okay in theory - and it looks like this is only done on the first item in the response only which is great. As long as it doesn't try to realize the lazy-seq at any point (ie. it keeps it as a lazy-seq and just does this on the first item for example) then it should be fine.

…valid regexp exception handler which returns 4xx status code. Covered with tests.

nikitiuk0 · 2015-01-14T10:24:59Z

@pljenkinsro test this please

nikitiuk0 · 2015-01-14T10:27:02Z

I implemented the approach discussed above. Program evaluates first element from lazy-seq forcing statement execution and catching any SQL exceptions which can be handled later.
Please take a look at my above comment for more detailed description.

Couple of questions:

Should I write http tests for each endpoint that supports regexps?
Should I combine my main commit with travis-related commit?
I cannot remove WIP label myself, can I?

kbarber · 2015-02-03T11:52:52Z

@pljenkinsro retest this please

kbarber · 2015-02-03T12:40:09Z

@DemonShi to answer your questions:

no, this is enough
no, this is fine for now
yes, well - you should be able to? No idea, I can look into github permissions.

pljenkinsro · 2015-02-03T13:05:24Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://jenkins.puppetlabs.com/job/platform_puppetdb_intn-sys_pr/547/

kbarber · 2015-02-03T13:08:30Z

thats a PASS, only a rubygems transient is being thrown ^^.

kbarber · 2015-02-03T13:40:09Z

Merged manually, with some small cleanups and the extra tests.

kbarber reviewed Nov 6, 2014
View reviewed changes

kbarber changed the title ~~(PDB-950) Added validation for regexps in queries~~ WIP: (PDB-950) Added validation for regexps in queries Nov 7, 2014

nikitiuk0 mentioned this pull request Nov 26, 2014

WIP: (PDB-950) Added upfront validation for regexps in queries in API v4 for HSQLDB #1164

Closed

kbarber changed the title ~~WIP: (PDB-950) Added validation for regexps in queries~~ (PDB-950) Catch and express postgresql regexp failures as better errors Nov 28, 2014

kbarber reviewed Dec 10, 2014
View reviewed changes

kbarber changed the title ~~(PDB-950) Catch and express postgresql regexp failures as better errors~~ WIP: (PDB-950) Catch and express postgresql regexp failures as better errors Dec 10, 2014

kbarber added the work in progress (...and please don't merge) label Jan 5, 2015

nikitiuk0 force-pushed the master branch from f8b0159 to 2a243b6 Compare January 14, 2015 00:04

(PDB-950) Added an ability to handle query exceptions. Added PGSQL in…

581ff98

…valid regexp exception handler which returns 4xx status code. Covered with tests.

nikitiuk0 force-pushed the master branch from 2a243b6 to 581ff98 Compare January 14, 2015 00:53

(PDB-950) Switched to PostgreSQL 9.3 instead of default 9.1 in Travis CI

d37337b

kbarber removed the work in progress (...and please don't merge) label Feb 3, 2015

kbarber closed this Feb 3, 2015

Conversation

nikitiuk0 commented Nov 6, 2014

Uh oh!

pljenkinsro commented Nov 6, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikitiuk0 commented Nov 11, 2014

Uh oh!

kbarber commented Nov 26, 2014

Uh oh!

nikitiuk0 commented Nov 26, 2014

Uh oh!

kbarber commented Nov 27, 2014

Uh oh!

pljenkinsro commented Dec 1, 2014

Uh oh!

pljenkinsro commented Dec 2, 2014

Uh oh!

kbarber commented Dec 3, 2014

Uh oh!

pljenkinsro commented Dec 3, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbarber commented Dec 10, 2014

Uh oh!

nikitiuk0 commented Dec 10, 2014

Uh oh!

kbarber commented Dec 10, 2014

Uh oh!

nikitiuk0 commented Dec 10, 2014

Uh oh!

kbarber commented Dec 10, 2014

Uh oh!

nikitiuk0 commented Dec 10, 2014

Uh oh!

kbarber commented Jan 5, 2015

Uh oh!

nikitiuk0 commented Jan 6, 2015

Uh oh!

puppetcla commented Jan 6, 2015

Uh oh!

nikitiuk0 commented Jan 12, 2015

Uh oh!

puppetcla commented Jan 12, 2015

Uh oh!

kbarber commented Jan 12, 2015

Uh oh!

nikitiuk0 commented Jan 14, 2015

Uh oh!

nikitiuk0 commented Jan 14, 2015

Uh oh!

kbarber commented Feb 3, 2015

Uh oh!

kbarber commented Feb 3, 2015

Uh oh!

pljenkinsro commented Feb 3, 2015

Uh oh!

kbarber commented Feb 3, 2015

Uh oh!

kbarber commented Feb 3, 2015

Uh oh!

Reviewers

Assignees

Labels