Skip to content

WIP: (PDB-950) Catch and express postgresql regexp failures as better errors#1142

Closed
nikitiuk0 wants to merge 2 commits intopuppetlabs:masterfrom
nikitiuk0:master
Closed

WIP: (PDB-950) Catch and express postgresql regexp failures as better errors#1142
nikitiuk0 wants to merge 2 commits intopuppetlabs:masterfrom
nikitiuk0:master

Conversation

@nikitiuk0
Copy link

This commit adds a validation for regexps in queries as was proposed in PDB-950

@pljenkinsro
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting.

The general concern I have with this facility, is that it enforces Java regular expressions, which isn't POSIX regexp which is ultimately what we are exposing here: http://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-POSIX-TABLE. Although in practice I'm sure this catches a lot of the problems, what if java regex doesn't support something PG does for example. Also it will allow through Java regexp forms, even though the engine will throw an error.

While HSQLDB supports Java regexp, we aren't as concerned about that database platform.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?

Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.

I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it too, but HSQLDB supports Java regexp rules. Do we support 2 different regexp syntaxes depending on which DB is used?

Yes we do, for good or bad. And what's more, HSQLDB is not the major case we support ... PG is. We've been reducing support for HSQLDB if anything.

Just found a difference between Java and POSIX regexp syntax in character classes: http://www.regular-expressions.info/posixbrackets.html
But overall they look similar to each other.

We'd need to dig in deep to ensure this at least, or I'd be very very careful at least to say that its safe to make a 1 to 1 mapping. I'm wary basically :-).

I think I can create 2 different validators: for HSQLDB and for PG. However, I am not sure if it is possible to validate PG(POSIX) regexp without executing DB query (which can probably affect the performance). Do you want me to do this?

Well, its okay to validate this way, but you might as well do the full query at the same time and instead throw a more meaningful error when it fails. This is not validation up front really, having said that I wouldn't expect a query to pause if a regexp is incorrect, it should throw an error quite quickly. Maybe this is just overall a more 'correct' approach? The JDBC driver should be expressing a reasonably correct regular expression error. The problem stems from the fact that its then hard to tell the user which query in the context of our AST style querying language that actually failed, because once its SQL its hard to track it back to the original part.

Overall this is an interesting problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this, why not go ahead and add the re-pattern style validation for everything, but also try and catch the exception for PG like you see in PDB-950 at the same time. At least if someone tries to use something that is too complex for PG we have a fallback. It seems like a reasonable approach for now, but I think both steps are desirable for PG and to solve PDB-950. We'll meditate separately on whether this is a good idea for PG ... what I'm generally wary of is removing features people might be using ultimately. In the worst case we could apply the up front validation only for HSQLDB perhaps, that should just be an extra (when ...) statement somewhere, so shouldn't be a massive change later on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error? (I don't think that we can return 4xx error in this case, even if we check exception message to match "ERROR: invalid regular expression". As you already said - it is hard to say whether it was user's invalid regexp or our internal one)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what if we leave that check for HSQLDB only and for PG instead of sending another request just for validation, simply catch org.postgresql.util.PSQLException and generate normal 5xx error?

What I said in my previous comment was to just make it work for both, and we can figure out with some functional testing and perhaps some more thought whether we need a conditional. You (or us) can change this later, but I'd just raise a patch that tried to do the validation for now for both db's. That way we get it in front of us and we can see how it feels when working. You never know, this might be enough.

(I don't think that we can return 4xx error in this case, as you already said - it is hard to say whether it was user's invalid regexp or our internal one)

I don't think I said anything about determining the difference between a user regexp or an internal one ... I'm not sure what you mean here, or perhaps I wasn't clear. No, the goal of PDB-950 is to not make it a 500 error and reflect it as a user error status code, ultimately this must be our destination otherwise we're just wasting time and moving the problem around. That is, I'm not sure what the value of any extra validation if all we do is still return a 500 status. We must be able to differentiate from consoles whether the user must take action, or whether its a fault in our system.

Perhaps what you misunderstood is the difference between validating it up front, versus validating it by catching the exception. Validation up front can allow us at some point to point to the user and say "this part of the query broke it" for example they may have done multiple regexp queries and only one failed. On the flip-side if we just catch the exception later on however, we can't express this to a user in the form of their query ... ie. 'something broke' but its up to the user to determine which. Either way, there is a much greater chance that its still a user problem that they need to fix, so a 500 would not be valid here, and a 400 would apply instead.

Half the trick would just be making sure that the exception we catch from the system is analyzed correctly for this particular failure. Strings aren't great, but I believe if we analyze the PSQLException there is probably an error code (getErrorCode) or state (getSQLState) we can match on for this kind of failure instead, that should be far more stable and reliable. Either way, matching this kind of thing can be tricky, so a unit test that actually shakes up the PSQL failure would be needed to ensure this doesn't regress I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).
If query was ok - then we proceed this as normal request and not fail because of some of POSIX regexp features are not supported by Java regexps.
Right?

PS: Could we connect on Skype or other IM? It is easier to discuss such things there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood you correctly, you suggests to leave this check for both DB, but don't fail PG request immediately and rather perform a query. Then if we see that query failed - generate a 4xx error (we already know that this is because of invalid regexp).

Almost ... put in the check and if it fails just fail for both regardless, ie. disregard my concern for now. Return a 400 early. We'll then do some research and testing on this end, to see where it might not be quite 'perfect'.

Now as obviously Java regexp is a superset most probably of some POSIX re features, that check isn't enough for PG, so we need to also do a validation at query runtime as we described. Somehow we need to identify if its just a regexp query or some other fatal PSQLException, that might be tricky but worth the research into status codes and error codes and such (matching the error string would be the very last thing I'd try, since thats not the most stable of API's).

I can talk on skype if you like ... kenbarber_geek ... I'm about to go to lunch soon but send me a request and we can chat later on. I'm in the UK.

@kbarber kbarber changed the title (PDB-950) Added validation for regexps in queries WIP: (PDB-950) Added validation for regexps in queries Nov 7, 2014
@nikitiuk0
Copy link
Author

I changed couple of things:

  1. I integrated regexp validation function with existing v4 and v2//v3 validators.
    Unfortunately "validate-binary-operators" is not used by v2/v3 API. Also, compilation and validation of queries in v2/v3 is implemented differently than in v4, so I couldn't just add "validate-binary-operators" to older APIs. Instead I integrated this check to a term compilation function.

  2. Since Java and POSIX regexp syntax are slightly different and we can only validate regexps (in easy way) by Java syntax, there is a small chance that user will use Java-specific construction with PostgreSQL DB which will cause the same error as described in this issue. To avoid this, I added PSQL exception handler that returns 4xx HTTP error when sees "2201B" error code (http://www.postgresql.org/docs/9.3/static/errcodes-appendix.html).

However, there is a special case that results in 200 HTTP code and empty response.
If you check query_eng.clj, there is a "resp" variable that is assigned to a result of "pl-http/stream-json-response" function. The main idea here is that it should immediately return ring response object with 200 status code and execute "main" query in a separate thread. The result of this query occurs in InputStream that is assigned to (:body resp).
In the mean time if "count-query" exists, it will be executed in a current thread. But since "count-query" is optional, user might get 200 HTTP code and empty response when "count-query" does not exist.

I'd like to add, that any SQL exception that occurs while executing "main" query in a separate thread when "count-query" does not exist will result in same response and exception in the log.
If you check macro "pl-http/streamed-response", it states:
As 'body' is executed in a separate thread, it's not possible for the caller to catch exceptions thrown by 'body'. Errors are instead logged.
I wrapped this body with a similar exception handler which prints special error message instead of printing exception with a stack trace when receiving "invalid_regexp".

So, it seems that with current architecture it is not possible to handle those exceptions and return 4xx error when regexp is invalid.
We should probably create another jira-ticket to add a possibility of error handling here. It may also be related to adding normal error handling of other errors, instead of generating HTML page with 500 status code.

I think that the case I described above is pretty rare, since to reproduce it, user will need to use Java-specific regexp syntax with Postgres DB and also call a query that does not have "count-query". In all other cases user should get 4xx error.
Anyway, in this case he should get an error in the log saying: "Processed invalid PostgreSQL regexp. Unable to return bad request code from a different thread."

Any comments?

@kbarber
Copy link
Contributor

kbarber commented Nov 26, 2014

@DemonShi for query 1) we only need this to work for v4, so doing whatever is needed for that is all that is necessary. v2/v3 will be retired in master soon, before we release.

As far as problem 2, I almost get it, perhaps you want to split out your patches into two PR's, so we can focus on these two issues separately? I think feature 1 should be easy enough to merge in on its own.

If you can make a patch with the problem you describe in for #2 then I can probably look at that on its own a little easier.

@nikitiuk0
Copy link
Author

Hi @kbarber ,

I moved simple validation (query 1) into PR-1164 and reverted all up-front validation here as you asked :)

I will be waiting on your feedback for both PRs.

One question - are we stopped of making fixes for v3 API targeted for the next release (as this API will be retired before the next release)?

Thanks,
Andrii

@kbarber
Copy link
Contributor

kbarber commented Nov 27, 2014

Yes, PDB-695 will cover the removal work. We're pegging this for next sprint (2 weeks starting wednesday) most probably, along with other drastic changes like converting all fields to use underscores instead of hyphens (PDB-698).

@kbarber kbarber changed the title WIP: (PDB-950) Added validation for regexps in queries (PDB-950) Catch and express postgresql regexp failures as better errors Nov 28, 2014
@pljenkinsro
Copy link

Can one of the admins verify this patch?

1 similar comment
@pljenkinsro
Copy link

Can one of the admins verify this patch?

@kbarber
Copy link
Contributor

kbarber commented Dec 3, 2014

@pljenkinsro test this please

@pljenkinsro
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://jenkins.puppetlabs.com/job/platform_puppetdb_intn-sys_pr/260/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this case of checking the exception within the streaming thread, the more I think its not enough, and probably not massively useful in its current case. In that, the result is more or less the same with or without this code. What I mean is, at the end of the day the exception is still logged as an exception, and the user still receives a 200 regardless. What has changed? We've added some code, and modified how we log it. I'm not 100% certain this should go in as it stands.

Its a shame we don't have a decent way of expressing this exception back to the HTTP layer somehow, perhaps we need to do more work here to get the streaming code to achieve better messaging between the front & backend threads so exceptions can be expressed correctly - somehow.

@kbarber
Copy link
Contributor

kbarber commented Dec 10, 2014

I think generally, for this bug to be truly fixed we need to see if we can get the errors back from the thread somehow. I have zero ideas on this at the moment, what did you investigation lead you to for this @DemonShi ?

@nikitiuk0
Copy link
Author

@kbarber I did not investigate this topic much.

But here are some of the ideas in my mind on how to solve this problem (and other ones when we log an exception and return 200 code with empty body):

  • Don't execute results-query in a separate thread when include-total is not passed. This might solve the problem, but only if count-query is semantically the same as results-query (e.g. both count-query and result-query will always fail in a same way). I need to double check that.
  • Pass status code back from the results-query thread. This will probably require blocking ring thread until results-query thread finishes. I will need to study more of how ring manages its threads and if there are other ways of passing status code back (maybe again though something similar to InputStream)

@kbarber
Copy link
Contributor

kbarber commented Dec 10, 2014

@DemonShi I think we should investigate this more deeply. I'm probably okay merging in the changes specifically for include-total for now, but its not a complete 'fix' for PDB-950, not across the board anyway.

This is a hard problem to solve, and we need to be mindful of the performance setup we have now with a thread that performs the cursor & pg streaming ... there is a specific reason why it works the way it does and I bet thats going to get in the way of thinking about reducing the thread (your idea #1). As far as idea #2 ... this is going to be interesting and complex, but we might need to start thinking in this direction. Blocking until the thread finishes however might not be great, we want to start streaming as soon as results are available, and by then we usually have already setup the header. Perhaps we need to block until we have 1 response from the streaming thread somehow?

I like the thinking around this problem, and the problem you've dragged up in this patch, its very evident to me now the design makes it hard for anyone trying to solve this. I guess this is what happens when you optimize for performance :-).

@kbarber kbarber changed the title (PDB-950) Catch and express postgresql regexp failures as better errors WIP: (PDB-950) Catch and express postgresql regexp failures as better errors Dec 10, 2014
@nikitiuk0
Copy link
Author

@kbarber I totally understand and agree with your points. I will investigate this deeply. Hopefully, I will have more free time than I do now :)
Do we have any hard deadline for this issue?

BTW, do we have any performance tests with long responses which I could use to check if there is a performance degradation?

As for this PR, are we going to merge it while I will be working on a threading problem in the other PR or just hold this one until the threading problem solution is available?

@kbarber
Copy link
Contributor

kbarber commented Dec 10, 2014

@DemonShi I think if we want to merge anything, we shouldn't merge anything that is YAGNI today. So reducing this to work with just include-total on the HTTP thread isn't a bad idea for now, I could probably agree to that at least as its a start.

Since I see some risk in merging the alternate code on the inner thread I say avoid that for now ... because ultimately a) it might change anyway and b) it doesn't actually change that much as it stands. That is, we should be wary of merging in something not quite complete and liable to change anyway.

@nikitiuk0
Copy link
Author

@kbarber Okay, I will remove inner thread exception handler from this PR.

Do you know any info on deadline and performance tests questions?

@kbarber
Copy link
Contributor

kbarber commented Jan 5, 2015

@DemonShi I've been absent and missed your last question: I have no specific deadline on this one. In regards to performance testing, we have nothing specific for queries today that we use, its usually bespoke - one could use an external testing tool, or do the test in clojure without the networking traffic so you can iterate in a tight loop. I'm not sure if this change needs something heavier than this today, its easy enough to observe a potential change and conjecture as well :-).

@kbarber kbarber added the work in progress (...and please don't merge) label Jan 5, 2015
@nikitiuk0
Copy link
Author

@kbarber Oh, I've been pretty busy and also had pretty long holidays here too.

According to the PR: I developed a solution that tries to take a first item from result-set lazy sequence under try/catch and resolves a promise with nil. If there is an exception, I resolve that promise with exception from DB.
Then from the "ring" thread we can wait until promise is ready (basically, that is the moment right after we took a first item from result-set or nothing - when result-set is empty) and say whether query was successful or not. If query failed - we re-throw the exception and follow normal error processing (the same way as if there was a count-query). If query is okay - continue execution.

This approach also allows to handle all other types of exceptions that occur in separate thread during query execution. Those currently result in empty 200 response.

Here is code snippet just to get your opinion of the approach:

(jdbc/with-transacted-connection db
      (let [{[sql & params] :results-query
             count-query :count-query
             projections :projections} (query->sql version query
                                                   paging-options)
             resp-error (promise)
             resp (output-fn
                   (fn [f]
                     (try
                       (jdbc/with-transacted-connection db
                         (query/streamed-query-result version sql params
                           (comp
                             f
                             #(do
                                (first %)
                                (deliver resp-error nil)
                                %)
                             (munge-fn version projections))))
                       (catch Exception e
                         (deliver resp-error e)
                         (throw e)))))]
        (when @resp-error
          (println (str "SQL error occured: " (.getMessage @resp-error)))
          (throw @resp-error))
        (if count-query
          (http/add-headers resp {:count (jdbc/get-result-count count-query)})
          resp)))))

I still need to refactor it a bit, check corner cases and cover it with tests, but the idea of the approach should be clear.

What do you think?

@puppetcla
Copy link

Waiting for CLA signature by @DemonShi

@DemonShi - We require a Contributor License Agreement (CLA) for people who contribute to Puppet, but we have an easy click-through license with instructions, which is available at https://cla.puppetlabs.com/

Note: if your contribution is trivial and you think it may be exempt from the CLA, please post a short reply to this comment with details. http://docs.puppetlabs.com/community/trivial_patch_exemption.html

@nikitiuk0
Copy link
Author

@puppetcla, I signed CLA

@puppetcla
Copy link

CLA signed by all contributors.

@kbarber
Copy link
Contributor

kbarber commented Jan 12, 2015

@DemonShi I think this sounds interesting, it looks to me like this should be okay in theory - and it looks like this is only done on the first item in the response only which is great. As long as it doesn't try to realize the lazy-seq at any point (ie. it keeps it as a lazy-seq and just does this on the first item for example) then it should be fine.

…valid regexp exception handler which returns 4xx status code. Covered with tests.
@nikitiuk0
Copy link
Author

@pljenkinsro test this please

@nikitiuk0
Copy link
Author

I implemented the approach discussed above. Program evaluates first element from lazy-seq forcing statement execution and catching any SQL exceptions which can be handled later.
Please take a look at my above comment for more detailed description.

Couple of questions:

  1. Should I write http tests for each endpoint that supports regexps?
  2. Should I combine my main commit with travis-related commit?
  3. I cannot remove WIP label myself, can I?

@kbarber kbarber removed the work in progress (...and please don't merge) label Feb 3, 2015
@kbarber
Copy link
Contributor

kbarber commented Feb 3, 2015

@pljenkinsro retest this please

@kbarber
Copy link
Contributor

kbarber commented Feb 3, 2015

@DemonShi to answer your questions:

  1. no, this is enough
  2. no, this is fine for now
  3. yes, well - you should be able to? No idea, I can look into github permissions.

@pljenkinsro
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://jenkins.puppetlabs.com/job/platform_puppetdb_intn-sys_pr/547/

@kbarber
Copy link
Contributor

kbarber commented Feb 3, 2015

thats a PASS, only a rubygems transient is being thrown ^^.

@kbarber
Copy link
Contributor

kbarber commented Feb 3, 2015

Merged manually, with some small cleanups and the extra tests.

@kbarber kbarber closed this Feb 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants