Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change search to use POST instead of GET to accommodate long queries #121

Closed
wants to merge 14 commits into from

Conversation

cbollerud
Copy link

HTTP POST for search(select) #110

The code below switches searches to use POST instead of GET, allowing for very long query strings.
It should be trivial to add a switch in search that would allow choosing either option based on either query length or configuration.

@csabapalfi
Copy link
Contributor

I'm not the author of the module but a few things:

  • it can be dangerous to switch to POSTs without a flag/switch as some people might rely on GETs that'll result in their queries included in access logs, etc
  • the build is likely failing because the Solr the tests are connecting to is down (I think @lbdremy is looking into it)

@cbollerud
Copy link
Author

Agreed. I'll update to make GET the default.

@cbollerud
Copy link
Author

I didn't see anyplace to document the new config.json parameter. if "searchMethod": "POST" is added to the the options passed into Client, then searches will use POST, otherwise they will use "GET".
For example:
{
"client": {
"host": "127.0.0.1",
"port": 8983,
"path": "/solr",
"core": "default",
"searchMethod": "POST"
}
}

I duplicated the existing search tests so that both GET and POST are tested.
The tests all pass locally, it seems all tests are failing on Travis, most likely due to SOLR being down on the test machine.

@cbollerud
Copy link
Author

@lbdremy,

Is there anything else you need to complete the pull?

Chris

@kfitzgerald
Copy link
Contributor

Didn't see this until I already rolled my own. Submitted as PR #129

Guessing that since this has been sitting on ice for a few months there is no plan on supporting this any time soon? Side note: there are now three PRs requesting this feature.

Also, regarding GET by default, POST if you want to – I would solidly advocate going the other way around. The folks who are actually interested in logging GETS (i would guess) are substantially less than those who want a reliable search queries.

@cbollerud
Copy link
Author

I’d really like to see one of these pulled into the main branch, it’s definitely a need for us. I do think that the default needs to be GET to ensure backward compatibility. I put a switch in PR #121 to allow using POST for those who need it.

Chris

On Apr 17, 2015, at 2:49 PM, Kevin M Fitzgerald notifications@github.com wrote:

Didn't see this until I already rolled my own. Submitted as PR #129 #129
Guessing that since this has been sitting on ice for a few months there is no plan on supporting this any time soon? Side note: there are now three PRs requesting this feature.

Also, regarding GET by default, POST if you want to – I would solidly advocate going the other way around. The folks who are actually interested in logging GETS (i would guess) are substantially less than those who want a reliable search queries.


Reply to this email directly or view it on GitHub #121 (comment).

@kfitzgerald
Copy link
Contributor

@cbollerud I don't dislike the backwards compatibility, however I think folks are more likely to discover (the hard way) that things randomly stop working when a large query shows up if GET is used by default over POST. Everything maybe working fine until those one or two queries don't go through, and depending on the application, might not get any indication that something went wrong.

There's always the possibility of a hybrid automatic approach, that would check the query to see if it's longer than the standard length (say 2048 chars) then automatically use POST instead of GET so prevent a likely failure. Would be pretty easy to do.

Regardless, I would love to see search via POST in master in any form I can get it! :D

@cbollerud
Copy link
Author

I suppose using GET until the search string is over a particular length and then automatically switching to POST could be considered backward compatible. Might be nice to keep the switch as an option as well. I’m fine either way, @lbdremy can make that call. I just would like to see a solution in master.

@lbdremy, can you give some feedback? It sounds like Kevin or I could complete the update if that is needed.

Chris

On Apr 20, 2015, at 6:36 AM, Kevin M Fitzgerald notifications@github.com wrote:

@cbollerud https://github.com/cbollerud I don't dislike the backwards compatibility, however I think folks are more likely to discover (the hard way) that things randomly stop working when a large query shows up if GET is used by default over POST. Everything maybe working fine until those one or two queries don't go through, and depending on the application, might not get any indication that something went wrong.

There's always the possibility of a hybrid automatic approach, that would check the query to see if it's longer than the standard length (say 2048 chars) then automatically use POST instead of GET so prevent a likely failure. Would be pretty easy to do.

Regardless, I would love to see search via POST in master in any form I can get it! :D


Reply to this email directly or view it on GitHub #121 (comment).

@kfitzgerald
Copy link
Contributor

@cbollerud Just for fun, I rolled that hybrid change up on my branch (#129). Tested on our code base and seems to be working great. Added a configurable option get_max to set the threshold GETs are escalated to POSTs (default: 2048).

Should be the best of both worlds, while maintaining the set-it-and-forget-it attitude. :)

@cbollerud
Copy link
Author

Sounds Great!

On Apr 21, 2015, at 7:16 AM, Kevin M Fitzgerald notifications@github.com wrote:

@cbollerud https://github.com/cbollerud Just for fun, I rolled that hybrid change up on my branch (#129 #129). Tested on our code base and seems to be working great. Added a configurable option get_max to set the threshold GETs are escalated to POSTs (default: 2048).

Should be the best of both worlds, while maintaining the set-it-and-forget-it attitude. :)


Reply to this email directly or view it on GitHub #121 (comment).

@nicolasembleton
Copy link
Collaborator

Hi guys, I've ran into the same problem and of course used the same thing (but made it much more lightweight and too opinionated so I won't make a PR out of it, especially since you guys already made 2, and @csabapalfi 1st comment was to-the-point)
I'm not a maintainer (yet), but please allow me to give my feedbacks.

But that said, I've been looking at the code changes @cbollerud, and it includes a lot of things that I think shouldn't be included into this PR. Allow me to list them:

  • All the re-formatting is one, you're using a cleaner format (I use something close), but you shouldn't mix those.
  • You're changing some of the package versions, I don't think we should do that (unless necessary)
  • You changed format.escapeAndEncode to encodeURIComponent (which is I think a good choice) but could you document why? I don't think we should mix this PR with this change. This PR should really focus on allowing searching via POST
  • Finally, and more importantly, you're introducing breaking changes and changing postJSON to postDATA, while also changing the content-type and not defaulting it to application/json

To summarize, I think your PR is great, but it is including too many things and some design decisions that should be discussed before going into the PR. I am again not a maintainer (yet :)) but I'd suggest doing some clean-up and/or starting discussion about this postDATA thing. Which sound like it should be optional, opt-in and go along with a deprecation for postJSON if needed (but I think postJSON is just fine). Or you could just make postDATA = postJSON = function () and make both methods having the same code. Defaulting the content-type to application/json or to your new option if available, and setting the data to pass as .data if available or .json otherwise. To play better with the rest.

@nicolasembleton
Copy link
Collaborator

@kfitzgerald I also like your get_max (although it should probably be named get_max_request_entity_size and commented in the constructor). Could we get something with both of your work into 1 cleaner PR? I hope I'm not stepping out of my turf, but I think it'd be beneficial and easier to get the PR through.

@kfitzgerald
Copy link
Contributor

@nicolasembleton Updated PR #129 per your feedback, and passes CI tests.

@luketaverne
Copy link
Collaborator

@nicolasembleton @kfitzgerald, this can be closed, right? I think PR #129 implemented the same.

@nicolasembleton
Copy link
Collaborator

@luketaverne Correct. Closing as this was already handled by #129

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants