Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to support 5.4 #1321

Merged
merged 16 commits into from
Jun 13, 2017
Merged

Update to support 5.4 #1321

merged 16 commits into from
Jun 13, 2017

Conversation

mtdavidson
Copy link
Contributor

  • Update test container to use 5.4.1
  • Fix multi search tests to pass since DFS_QUERY_AND_FETCH / QUERY_AND_FETCH have been removed in 5.3.0 ( https://www.elastic.co/guide/en/elasticsearch/reference/5.3/release-notes-5.3.0.html )
  • OPTION_SEARCH_TYPE_DFS_QUERY_AND_FETCH and OPTION_SEARCH_TYPE_QUERY_AND_FETCH have been removed from Elastica\Search. We could keep these to allow for their use in 5.2 and below but I don't think they are highly used so maintaining the backwards compatibility is not key.

@mtdavidson mtdavidson changed the title Update to support 5.4 [WIP] Update to support 5.4 Jun 4, 2017
@ruflin
Copy link
Owner

ruflin commented Jun 6, 2017

@mtdavidson I restarted the build as the xdebug failures were not related to this change. But now there seem to be some other problems.

@mtdavidson
Copy link
Contributor Author

mtdavidson commented Jun 6, 2017

Thanks @ruflin I am working on the problems. Its a bit of a mess of issues, mainly due to content type checking. elastic/elasticsearch#23452

Since we set the default type to be application/jsonwe currently see issues anywhere we are using the body as part of the request such as with _analyze.

Changing the default content/type to text/plain solves ~99% of the issues.

diff --git a/lib/Elastica/Request.php b/lib/Elastica/Request.php
index d81702ea..33b2f1e7 100644
--- a/lib/Elastica/Request.php
+++ b/lib/Elastica/Request.php
@@ -15,7 +15,7 @@ class Request extends Param
     const PUT = 'PUT';
     const GET = 'GET';
     const DELETE = 'DELETE';
-    const DEFAULT_CONTENT_TYPE = 'application/json';
+    const DEFAULT_CONTENT_TYPE = 'text/plain';
     const NDJSON_CONTENT_TYPE = 'application/x-ndjson';

But I'm not convinced that is the best approach. The better approach would probably be to keep that default and start getting rid of any uses of setBody but thats going to result in breaking a lot of peoples code. So not sure what the best approach is.

@ruflin
Copy link
Owner

ruflin commented Jun 12, 2017

We introduced the content type here: #1302 Perhaps @nomoa can also comment here. Reverting this change will probably mean more deprecated errors logged on the elasticsearch 5.x side but will break it with 6.0

About setBody: Can you link to some code where this causes issues?

@nomoa
Copy link
Contributor

nomoa commented Jun 12, 2017

@mtdavidson when $data is an array, using text/plain is certainly wrong because the array is converted to a json body by the Transport classes. This will likely break in the near future.
The problem here (I think) is that we allow mixed type on $data (array or string).

  • When a client send a string it's ambiguous, the client should provide a content-type
  • When a client send an array it's not ambiguous, application/json must be used.

It's why I've set the default to application/json, using text/plain by default does not seem right to me since JSON is used by Transport.
Ideally we should force clients to override the content-type when not using arrays but I don't see a clean way to do this without introducing a breaking change. The data should not be separated from its content-type, in an ideal world we should introduce a Body class where clients could set the data and its content/type. Sadly it's huge refactoring.

More practically I'd be for tracking everywhere we send a string and specify the content/type to use on the call sites.
I'm not very familiar with elasticsearch-php but since we use AbstractEnpoint as a pivot class before we call Transport on Client::requestEndpoint(), I'd find a way to keep track of the content-type in AbstractEndpoint classes so that we could send it back to the Elastica Transport.

I quickly checked elasticsearch-php but could not find where Content-Type is handled... maybe with :
Index.php:

    public function analyze($text, $args = [])
    {
        $endpoint = new Analyze();
        $endpoint->setBody($text);
        $args['client']['headers'][] = 'Content-Type: text/plain';
        $endpoint->setParams($args);

        $data = $this->requestEndpoint($endpoint)->getData();

        // Support for "Explain" parameter, that returns a different response structure from Elastic
        // @see: https://www.elastic.co/guide/en/elasticsearch/reference/current/_explain_analyze.html
        if (isset($args['explain']) && $args['explain']) {
            return $data['detail'];
        }

        return $data['tokens'];
    }
}

Then in Client.php

    public function requestEndpoint(AbstractEndpoint $endpoint)
    {
        $options = $endpoint->getOptions();
        $contentType = Request::DEFAULT_CONTENT_TYPE;
        if(!empty($options['client']['headers'])) {
                // TODO: a method to find and extract the Content-Type header
                // from a list of headers
                $contentType = findContentType($options['client']['headers']);
        }
        return $this->request(
            ltrim($endpoint->getURI(), '/'),
            $endpoint->getMethod(),
            null === $endpoint->getBody() ? [] : $endpoint->getBody(),
            $endpoint->getParams(),
            $contentType
        );
    }

I'm not sure that this option (client/headers) is compatible with elasticsearch-php but this maybe works?

@ruflin
Copy link
Owner

ruflin commented Jun 13, 2017

@nomoa Thanks a lot for investigating this and sharing the details.

One thing that is not clear to me yet is how many place do we have where Elastica sends a string?

@nomoa
Copy link
Contributor

nomoa commented Jun 13, 2017

I don't think we have many of them, elasticsearch REST endpoints that accept non-json are rare:

My feeling is that elastic wants to deprecate and remove all these ambiguities by accepting only bodies that are in a known XContent.

So one easy fix I haven't thought yesterday would be to always send an array so that it always matches content-type/json (except in the case of bulk where it's properly handled with a custom content/type)
I think it's possible in most of the cases, if not overriding content-type would be the solution.

@mtdavidson would you mind testing if changing Index::analyze to :

     public function analyze($text, $args = [])
     {
         $endpoint = new Analyze();
         $endpoint->setBody(['text' => $text]); // use an array instead of a plain string
         $endpoint->setParams($args);
// [...]

fixes the issue without changing the default content/type to text/plain

@mtdavidson
Copy link
Contributor Author

@nomoa Sure thing I'll give it a go.

@mtdavidson
Copy link
Contributor Author

mtdavidson commented Jun 13, 2017

That does work @nomoa

However does mean people will have to change anywhere they have called setBody directly like

public function testRequestEndpoint()
{
     ...
    $endpoint = new Analyze();
    $endpoint->setIndex('fooIndex');
    $endpoint->setBody('foo');
    ...
}

might just have to accept its going to be a bit of a breaking upgrade on a few things.

@ruflin

To answer your previous question at least from a test perspective not to many things that use a text body these are the ones we see errors with

There were 8 errors:

1) Elastica\Test\IndexTest::testAnalyze
Elastica\Exception\ResponseException: Failed to parse request body

2) Elastica\Test\IndexTest::testRequestEndpoint
Elastica\Exception\ResponseException: Failed to parse request body

3) Elastica\Test\IndexTest::testAnalyzeExplain
Elastica\Exception\ResponseException: Failed to parse request body

4) Elastica\Test\Query\MultiMatchTest::testZeroTerm
Elastica\Exception\Connection\HttpException: Operation timed out

5) Elastica\Test\ScrollTest::testForeach
Elastica\Exception\ResponseException: Failed to parse request body

6) Elastica\Test\ScrollTest::testSearchRevert
Elastica\Exception\ResponseException: Failed to parse request body

7) Elastica\Test\SearchTest::testSearchScrollRequest
Elastica\Exception\ResponseException: Failed to parse request body

8) Elastica\Test\Tool\CrossIndexTest::testReindexTypeOption
Elastica\Exception\ResponseException: Failed to parse request body

@nomoa
Copy link
Contributor

nomoa commented Jun 13, 2017

@mtdavidson thanks!
I think that the use of elasticsearch-php Enpoints is pretty new in Elastica so I don't think Elastica users started to use them directly.

Note that the issues related to Scroll have just been fixed. You might just merge master again to your PR.

@mtdavidson
Copy link
Contributor Author

Great call on the merge @nomoa that and the change to use setBody(['text' => $text]) gets us to green.

@ruflin
Copy link
Owner

ruflin commented Jun 13, 2017

+1 on moving away on using strings in all places. I also assume endpoints so far are rarely used and the above is more a "bug" on our implementation side.

Anyone wants to do a follow up PR to fix the other "problems"?

@ruflin ruflin merged commit 302b9da into ruflin:master Jun 13, 2017
@ruflin
Copy link
Owner

ruflin commented Jun 13, 2017

@mtdavidson Merged. Thanks a lot for the work on this one.

@mtdavidson mtdavidson changed the title [WIP] Update to support 5.4 Update to support 5.4 Jun 13, 2017
@mtdavidson
Copy link
Contributor Author

No problem glad we got there in the end with a good solution thanks to @nomoa

I can get back to adding Adjacency Matrix Aggregation which was the main point of the upgrade.

@mickaelandrieu
Copy link

Hello,

big thanks for this contribution @mtdavidson!

@ruflin do you plan to publish a new release - including the support of ES 5.4 - soon? I know you release often, this is why I'm wondering if I can rely on master branch right now and use ES 5.4 in production :)

Have a nice day

Mickaël

@ruflin
Copy link
Owner

ruflin commented Jun 19, 2017

@mickaelandrieu Not sure what statement I should make here ;-) I do plan a release in the near future but obviously can't guarantee that it will not contain any breaking changes. As there will be some breaking changes for 6.0 I probably will branch off before we introduce the first breaking change. You should be save with master until a 5.3 or 5.4 release is out, but obviously at your own risk ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants