Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate out handling of Solr timeout errors with it's own exception class #2597

Merged
merged 5 commits into from
Feb 9, 2022

Conversation

jrochkind
Copy link
Member

@jrochkind jrochkind commented Dec 15, 2021

context

Almost any RSolr exception is wrapped by Blacklight in a Blacklight::Exceptions::InvalidRequest

A Blacklight Catalog controller rescues this exception with a method that just re-displays the search page, with the message "Sorry, I don't understand your search."

"Connection refused" errors are handled differently -- which is good, it would make no sense to prompt the user to search again with "Sorry, I don't understand your search" to network errors like connection refused.

What about a timeout error, where Solr did not return in a response within the timeout value RSolr had configured? Prior to this PR, that was handled the same as any Blacklight::Exceptions::InvalidRequest -- redisplay search page to the user, with message "Sorry, I don't understand your search."

That is not a desirable UX, and also causes Solr timeout errors to generally not make it to many error monitoring and logging systems, since Blacklight rescues them and displays a 200.

change

With the release of RSolr 2.4.0, RSolr uses a specific exception class to raise timeout errors, so Blacklight can rescue them and handle them differently.

After this PR, they are handled analogously to connection refused errors, with a custom exception that won't be rescued as a Blacklight::Exceptions::InvalidRequest.

a defined? check is done so Blacklight can still work with earlier versions of RSolr, and just won't do this new rescue.

weird code in test

RSolr::Error api has some really rough edges, which requires some weirdness in setting up the test.

to be done in future

I formatted the message of this exception just like the existing Blacklight::Exceptions::ECONNREFUSED... even though that formatting isn't necessarily so useful, there's a lot of stuff in there.

But I opted to do it consistently.

…class

== context ==

Almost any RSolr exception is wrapped by Blacklight in a `Blacklight::Exceptions::InvalidRequest`

A Blacklight Catalog controller [rescues](https://github.com/projectblacklight/blacklight/blob/main/app/controllers/concerns/blacklight/catalog.rb#L21) this exception with a [method](https://github.com/projectblacklight/blacklight/blob/main/app/controllers/concerns/blacklight/catalog.rb#L282-L299) that just re-displays the search page, with the message "Sorry, I don't understand your search."

"Connection refused" errors are handled differently -- which is good, it would make no sense to prompt the user to search again with "Sorry, I don't understand your search" to network errors like connection refused.

What about a timeout error, where Solr did not return in a response within the timeout value RSolr had configured?  Prior to this PR, that was handled the same as any `Blacklight::Exceptions::InvalidRequest` -- redisplay search page to the user, with message "Sorry, I don't understand your search."

That is not a desirable UX, and also causes Solr timeout errors to generally not make it to many error monitoring and logging systems, since Blacklight rescues them and displays a 200.

== change ==

With the release of RSolr 2.4.0, RSolr uses a specific exception class to raise timeout errors, so Blacklight can rescue them and handle them differently.

After this PR, they are handled analogously to connection refused errors, with a custom exception that won't be rescued as a `Blacklight::Exceptions::InvalidRequest`.

a `defined?` check is done so Blacklight can still work with earlier versions of RSolr, and just won't do this new rescue.

== weird code in test ==

RSolr::Error api has some really rough edges, which requires some weirdness in setting up the test.

== to be done in future ==

I formatted the message of this exception just like the existing `Blacklight::Exceptions::ECONNREFUSED`... even though that formatting isn't necessarily so useful, there's a lot of stuff in there.

But I opted to do it consistently.
@jrochkind
Copy link
Member Author

I don't know what to do about the linter failure. Any advice?

lib/blacklight/solr/repository.rb:60:5: C: Metrics/AbcSize: Assignment Branch Condition size for send_and_receive is too high. [<6, 43, 10> 44.55/43]
def send_and_receive(path, solr_params = {}) ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The defined? check must be what's making it decide the complexity measurement is too high. That's what's necessary to only put in this code if we have a recent enough RSolr for the exception class.

I guess I could extract the thing to like:

def rsolr_timeout_exceptions
  [(defined?(RSolr::Error::Timeout) ? RSolr::Error::Timeout : nil)].compact
end

Does that improve readability? Should I do that?

@barmintor
Copy link
Contributor

that is pretty hairy, although I'm hard-pressed to think of an alternative. Moving it to a method would probably work, but I wonder whether it's worth just bumping the rsolr dependency?

@jrochkind
Copy link
Member Author

Thanks for the comments, @barmintor!

My understanding is that there are still people using Blacklight with Rsolr 1.x -- which it currently works with. Is it a problem if we force them to upgrade to RSolr 2.x? I'm not sure. RSolr 2.x introduces faraday for the first time, so would introduce that dependency into an app. But latest RSolr currently allows faraday 2.x as well as 1.x (it's still possible there are problems we haven't found yet with faraday 2.x).

To avoid this conditional would mean bumping the RSolr dependency all the way to minimum RSolr 2.4 though. Is that a problem? I don't know.

With something as "legacy" as Blacklight, where an install base has built up over years, including at sites we don't necessarily hear from a lot or whom might have trouble dealing with anything that does go wrong -- I tend to be ultra-conservative with backwards compat, including bumping intermediate dependency requirements like this.

The other option of course is deciding that this PR is solving a problem that does not need solving, and no change is needed. That's part of a larger conversation about how I think Blacklgiht's current behavior of swallowing all RSolr errors does not serve us well generally... but this was one thing specific thing I encountered that caused me big problems, that could be specifically fixed, that I had spent a lot of time figuring out and fixing and had hoped to be able to spare other users that pain.

@@ -384,6 +384,15 @@
expect { service.repository.search }.to raise_exception(/Unable to connect to Solr instance/)
end

it "raises a Blacklight exception if RSolr raises a timeout error connecting to Solr instance" do
# rsolr api is mess, let's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what I was thinking a month ago. Let's mock a timeout to give us a way to test it? Or shall I just delete this comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just remove it.

@barmintor
Copy link
Contributor

I totally agree re: backwards compatibility @jrochkind - in this case I'm not sure, tho, that the intersection of sites running on current Blacklight but requiring rsolr 1.x could be very high? Like you, I don't know - but it might be worth asking. The easier punt here is to move that logic to a well-commented method (including an rsolr version note), as you suggested.

@jrochkind
Copy link
Member Author

jrochkind commented Jan 13, 2022

I know one person I was talking on Slack was using RSolr 1 with Blacklight 7(!), I forget who though (it wasn't you again @barmintor ? :) )

I can move it to a separate method to make rubocop happy; my concern was it would actually make the code harder, not easier, to read, to do that, and I hate making code harder to read for rubocop rules, it seems counter-productive. But if others think it would be the same or easlier, no problem!

@barmintor
Copy link
Contributor

I can move it to a separate method to make rubocop happy; my concern was it would actually make the code harder, not easier, to read, to do that, and I hate making code harder to read for rubocop rules, it seems counter-productive.

I think I'm personally in the "same or easier" camp on this one - this:

rescue *[(defined?(RSolr::Error::Timeout) ? RSolr::Error::Timeout : nil)].compact => e

... is just not going to be an easy read, even if it's the best implementation. Moving into a method would let you at least call the method defined_rsolr_timeout_exceptions, which would at least prompt the reader. Would you like a PR to your PR?

@jrochkind
Copy link
Member Author

jrochkind commented Jan 13, 2022

Thank you for the offer, but I can just quick do that change now, I have a minute!

(If we were going to do that, a direct edit on the branch would probably be a better way to collaborate than a literal PR to PR?)

… method

Thanks @barmintor, I think you're right this is more clear.
@barmintor
Copy link
Contributor

Well, that linter failure feels like a betrayal. For what it's worth @jrochkind I feel like that commit you added makes things easier to parse.

We should have been able to should have been able to see the linter issue before the push - I am noticing just as far as execution branching that the references to BLACKLIGHT_VERBOSE_LOGGING (which are not in the scope of this PR) might be adding some cost here as well, and that constant appears to only be referred to here, and it also appears to duplicate the functions of calling the logger's debug method. Should I see what the linter says about this PR with that change, and open a separate PR if it would unblock this?

@jrochkind
Copy link
Member Author

Yeah I agree that was a good change. And I hate fighting with rubocop like this.

Just adding the additional rescue alone is enough to maek rubocop say "Assignment Branch Condition size" is too big, is that it? Cause that's the only change I made to the method now, right?

But an additional rescue is not a problem.

At this point, @barmintor or anyone else please feel free to commit to the branch directly to try to make rubocop happy and/or tell rubocop to ignore it (which I don't know how to do but think there's a way?), whatever is appropriate, I don't know.

@barmintor
Copy link
Contributor

Let me see what I can do.

@barmintor
Copy link
Contributor

Ok, the particular linter rule we're hitting here is Metrics/ABCSize (link), (original paper)

I'm tuning the limit for the linter to enforce down to 20 to get feedback here, these scores are typically set to a max of 43. I'm getting the metric score by running rubocop lib/blacklight/solr/repository.rb.

Before this change, the metric was [<5, 38, 9> 39.37/20]
After the change, the metric was [<6, 43, 9> 44.34/20]

The tuple indicates the component scores <Assignment, Branching, Conditionals>. So the branching score (which was already pretty high) is the problem.

Since the point of this change is to add some branching, the best route to remediating this is to refactor some other branching. I think the most likely candidate is the way the actual rsolr request is build and connection.send_and_receive is called. I'm going to add a commit that pulls that out into a private method.

@barmintor
Copy link
Contributor

It looks like that change decreases the metric score significantly: [<5, 23, 5> 24.06/20]

I'm adding some comments and running the CI suite.

@jrochkind
Copy link
Member Author

jrochkind commented Jan 13, 2022

It seems inappropriate to me to put unrelated logic into this PR in order to make rubocop happy, no? It seems inappropriate to me to say that adding another rescue clause makes things "too complicated", rescue clauses are meant to be stacked and do not inhibit readability in my mind, I think the rubocop analyzer is wrong.

But I'll defer to y'all, I'm kind of overwhelmed by it all at this point.

@barmintor
Copy link
Contributor

I try not to think of it as putting unrelated logic into this PR "for the linter", but rather that a reasonable PR has prompted the linter to flag a previously existing problem with the code that we should fix. Because I think the linter is an important safeguard here and I think you have a good change, I'm going to propose a fix to the problem. I'm making progress!

@barmintor barmintor force-pushed the separate_out_solr_timeout_errors branch from e911c69 to a5395f6 Compare January 13, 2022 20:28
@barmintor
Copy link
Contributor

😆 and it looks like my version of the linting rules is out of date!

@barmintor
Copy link
Contributor

@jcoyne is this looking ok to you? I could do some git surgery to put the ABC refactor at the front, so the commits are all separately green... but maybe it's good enough?

Copy link
Contributor

@barmintor barmintor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, but as I've contributed to the PR I'd like @jcoyne to give it a 👍 before merging.

@barmintor barmintor requested a review from jcoyne January 13, 2022 23:31
Copy link
Member

@tpendragon tpendragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. In the future PRs reviewed and supported by additional code in PRs by committers will be able to be merged by said committer, for maximum welcoming.

@tpendragon tpendragon merged commit 3b36a7a into main Feb 9, 2022
@tpendragon tpendragon deleted the separate_out_solr_timeout_errors branch February 9, 2022 20:29
@jrochkind
Copy link
Member Author

Thank you @tpendragon !

@cbeer cbeer added the Backport label Feb 25, 2022
jcoyne added a commit that referenced this pull request Feb 25, 2022
Backport #2597 (Handle RSolr timeouts with its own class)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants