Java high-level REST client completeness #27205

Open
javanna opened this issue Nov 1, 2017 · 70 comments

@javanna
Member

commented Nov 1, 2017

This is a meta issue to track the completeness of the Java high-level REST client in terms of supported APIs. The following list includes all the REST APIs that Elasticsearch exposes to date and that are also exposed by the Transport Client. The ones marked as done are already supported by the high-level REST client, while the others need to be added. Every group is sorted by estimated importance, from more important to less important. Each API is also assigned a rank (easy, medium, hard) that expresses how difficult adding support for it is expected to be.

The APIs listed as "Not required" don't need to be supported before the transport client is removed from the master branch (next major version). These are mainly administrative APIs that are not likely to be used from a Java application. They generally return heavy responses, and it is hard to reuse the transport client's response objects for them because those objects expose internal classes that in some cases cannot be fully parsed back from the information returned at REST. We considered returning those responses as maps of maps, but that is also easy to achieve using the low-level REST client, hence we decided not to implement them for the time being.

Top-level APIs

  • ping (easy)
  • info (easy)
  • index (medium)
  • update (medium)
  • delete (medium)
  • bulk (hard)
  • get (medium)
  • exists (easy)
  • multi get (medium) #27337
  • search (very hard)
  • search scroll (easy)
  • clear scroll (easy)
  • multi search (hard) #27274
  • update by query (medium) @sohaibiftikhar
  • delete by query (medium) @sohaibiftikhar
  • reindex (medium) @sohaibiftikhar
  • [ ] reindex with wait_for_completion=false creates task @pgomulka
  • rethrottle (reindex, update by query, delete by query) #33951
  • search template (medium) #30473
  • render search template (easy) (included in search template API) #30473
  • multi search templates (medium) #30836
  • term vectors (hard) #33447
  • multi term vectors (hard) @mayya-sharipova #35266
  • explain (medium) #31387
  • field caps (easy) #29664
  • put stored script (easy) #31323
  • delete stored script (easy) #31355
  • get stored script (medium) #31355

Indices API

  • create index (easy)
  • delete index (easy)
  • indices exist (easy) #27384
  • update alias (medium) #27876
  • exists alias (easy) #28332
  • get alias (medium) #28799
  • [ ] types exist (easy)
  • put mapping (easy) #27869
  • open index (easy)
  • close index (easy)
  • refresh (easy) #27799
  • flush (easy) #28852
  • update index settings (easy) #28892
  • get index settings (easy) #29229
  • clear cache (easy) #28866
  • force merge (easy) #28896
  • shrink (easy) #28425
  • split (easy) #28425
  • rollover (easy) #28698
  • synced flush (medium) (exposes ShardRouting, hard to reconstruct the whole response from info returned via REST) #30650
  • get index (medium) #31703
  • get mappings (easy) #30889
  • get field mappings (medium) #31423
  • put index template (medium) #30400
  • delete index template (easy) #36320
  • get index templates (medium) #31161
  • validate query (medium) #31077
  • analyze (hard) #31577

Not required

  • shard stores (medium)
  • upgrade (easy) (to be removed?)
  • upgrade status (easy) (to be removed?)
  • segments (hard) (exposes ShardRouting)
  • recoveries (hard) (exposes ShardRouting, DiscoveryNode)
  • indices stats (hard) (exposes ShardRouting and a lot of other objects)

Snapshot API

Ingest API

  • put ingest pipeline (easy) #30793
  • delete ingest pipeline (easy) #30865
  • get ingest pipeline (easy) #30847
  • simulate ingest pipeline (medium) #31158

Tasks API

Cluster API

  • cluster health (medium) #29331
  • update cluster settings (easy) #28633
  • get cluster settings (medium) #31706 (doesn't have its own Response object, exposed at REST only)

Not required

  • search shards (medium) (exposes ShardRouting, DiscoveryNode and requires parsing back QueryBuilder)
  • pending cluster tasks (easy)
  • allocation explain (hard) (exposes ShardRouting)
  • cluster state (hard) (exposes ClusterState)
  • reroute (easy if done after cluster state API, returns the entire cluster state)
  • nodes info (hard) (exposes DiscoveryNode and a lot of other objects)
  • nodes stats (hard) (exposes ShardRouting and a lot of other objects)
  • cluster stats (hard) (exposes DiscoveryNode, requires nodes info and nodes stats)
  • hot threads (easy) (exposes DiscoveryNode)
  • nodes usage (medium) (exposes DiscoveryNode)

REST only API

There are a number of APIs that are exposed via REST but not via the Transport Client. They don't necessarily have to be implemented if the goal is feature parity with the Transport Client, yet we should probably look at why they were not added to the Transport Client and whether it makes sense to support them in the high-level REST client. I don't think it makes sense to add support for the cat API and ingest processor grok, hence I took them out already.

  • cluster remote info
  • count #31868
  • get source
  • source exists #34519
  • delete alias @DaveCTurner
  • indices template exist @andyb-elastic (#36132)
  • get upgrade
  • ingest processor grok
  • cat API: aliases, allocation, count, fielddata, health, help, indices, master, nodeattrs, nodes, pending tasks, plugins, recovery, repositories, segments, shards, snapshots, tasks, templates, threadpool

How to add support for a new API

Look at some of the already supported APIs and the existing PRs that have been merged:

  • Add Index API to High Level Rest Client (#23040)
  • Add BulkRequest support to High Level Rest client (#23312)
  • Add delete API to the High Level Rest Client (#23187)
  • Add UpdateRequest support to High Level Rest client (#23266)
  • Added Delete Index support to high-level REST client (#27019)

The common tasks in each of the above PRs are:

  • Add a fromXContent method to the existing response class currently used by the transport client, along with unit tests that use field shuffling and random field insertion (in order to test forward compatibility). That usually means adding a test for the response object that extends AbstractXContentTestCase where supportsUnknownFields() returns true as well as assertToXContentEquivalence. In some cases random fields can't be inserted everywhere; the test then also needs to override the getRandomFieldsExcludeFilter() method, which returns the paths that should be excluded when injecting random fields. Given the randomization applied, it makes sense to run this type of test locally with the -Dtests.iters=50 argument to make sure that it is consistently green.
  • Add a new method to the Request class that translates the input request into the internal REST request representation, which holds the method, URL, endpoint, params etc., and add corresponding tests to RequestTests.
  • Add a new method to RestHighLevelClient, and possibly its async variant when it makes sense. We may not want to add async variants to every single method, so we decide case by case. The name of the new method must match what is defined in our REST spec, including the namespace.
  • Add an integration test that extends ESRestHighLevelClientTestCase and tests the new method end-to-end by sending REST requests to an external cluster.
  • Add a docs page. To check how the docs are rendered and whether the links between docs pages and docs snippets work, run /path/to/elastic/docs/build_docs.pl --doc docs/java-rest/index.asciidoc --chunk 1 --out ~/temp/asciidoc --open from the root of your local checkout of the Elasticsearch repository. This also requires a local checkout of the docs repository, where the perl script is located.
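As a rough illustration of the second step above (translating an input request into a method, endpoint and params), here is a minimal self-contained sketch. It does not use the real client classes: RestRequestSketch, OpenIndexRequestSketch and RequestConvertersSketch are hypothetical names made up for this example, and the real Request class may look quite different. Only the POST /{index}/_open endpoint and its timeout parameter are taken from the REST API itself.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical stand-in for the client's internal REST request representation. */
final class RestRequestSketch {
    final String method;
    final String endpoint;
    final Map<String, String> params;

    RestRequestSketch(String method, String endpoint, Map<String, String> params) {
        this.method = method;
        this.endpoint = endpoint;
        // Defensive copy so the converted request cannot be mutated afterwards.
        this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
    }
}

/** Hypothetical input request, loosely modelled on an "open index" call. */
final class OpenIndexRequestSketch {
    final String timeout; // null means "use the server default"
    final String[] indices;

    OpenIndexRequestSketch(String timeout, String... indices) {
        this.timeout = timeout;
        this.indices = indices;
    }
}

/** Hypothetical converter class: one static method per supported API. */
final class RequestConvertersSketch {
    private RequestConvertersSketch() {}

    /** Translates the input request into method + endpoint + params. */
    static RestRequestSketch openIndex(OpenIndexRequestSketch request) {
        // Builds "/index1,index2/_open".
        StringJoiner endpoint = new StringJoiner("/", "/", "");
        endpoint.add(String.join(",", request.indices));
        endpoint.add("_open");

        // Only send parameters that were explicitly set.
        Map<String, String> params = new LinkedHashMap<>();
        if (request.timeout != null) {
            params.put("timeout", request.timeout);
        }
        return new RestRequestSketch("POST", endpoint.toString(), params);
    }
}
```

Keeping the conversion in a pure static method like this is what makes the "corresponding tests" cheap to write: a test only has to build a request and compare the resulting method, endpoint and params, without any cluster running.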

Relates #29827

@javanna

Member Author

commented Nov 7, 2017

I updated the description of the issue by assigning each API a rank from 1 to 3 based on how difficult it should be to add support for it to the high-level REST client. The criteria were mainly how big the request is to serialize and how big the response is to parse back.

@clintongormley

Member

commented Nov 8, 2017

thanks @javanna - I've separated the APIs into "important" and "optional" lists, where optional APIs are ones that will seldom be used from applications other than monitoring applications or tests. If anybody disagrees with my selection, feel free to mention which APIs should be marked as important.

@slovdahl


commented Nov 9, 2017

Not using the high-level REST client yet, but I would really have expected that multi-get was supported.

@hariso

Contributor

commented Nov 10, 2017

@javanna This might be a bloody stupid question, but how does someone pick up an API and start working on it without risking that someone else is already doing the same? :)

@nik9000

Contributor

commented Nov 10, 2017

@javanna This might be a bloody stupid question, but how does someone pick up an API and start working on it without risking that someone else is already doing the same? :)

You add a comment here saying you are working on it.

@hariso

Contributor

commented Nov 10, 2017

Thanks @nik9000 !

@catalin-ursachi

Contributor

commented Nov 11, 2017

I've picked up Create Index.

@hariso

Contributor

commented Nov 14, 2017

I have picked up "indices exist".

@hariso

Contributor

commented Nov 14, 2017

For questions related to code (how to run a test, which tests are (not) needed, whether we need the async version of a method, etc.), do I ask here, in a separate issue, or on the forums? Or somewhere else? :)

@javanna

Member Author

commented Nov 14, 2017

hi @hariso it depends on the question :) Probably better to open a PR even though it is work in progress, so we can discuss your questions there. Would that work for you?

@hariso

Contributor

commented Nov 14, 2017

It definitely would. Thanks for the answer!

javanna added a commit that referenced this issue Dec 7, 2017

Add Open Index API to the high level REST client (#27574)
Add _open to the high level REST client

Relates to #27205

javanna added a commit that referenced this issue Dec 10, 2017

Add Open Index API to the high level REST client (#27574)
Add _open to the high level REST client

Relates to #27205
@hub-cap

Contributor

commented Oct 26, 2018

I'm ++ for not throwing exceptions too. I did a random sampling of 4 things that return 404s on the server if they are not found, and almost all of them have an "exists" method of some sort. The alias one is just a bit different due to its API. I'm sure there are cases where we throw an exception, but it seems that we should not be doing that.

There is a concept of a StatusToXContentObject, which returns a RestStatus to the consumer. We have no standard for "what was the status of the call I made" in the codebase currently, so it might make sense to add one. I'm keen to add something to the responses for this. We have roughly 15 responses that are a StatusToXContentObject, bulk/index being some of those. These still rely on either 1) the status saved in the response, or 2) some boolean used to say whether it's OK or NOT_FOUND. The latter is how get pipelines does it. The former is what is stored in the get aliases response.

I'm not keen on the null response. I'd rather have someone do an if (status check) than an if (null check), but that could be because of my Scala days. I do think an Optional would also work if we want to get a little functional, as @hariso mentions :)

My .02 would be to either have a way to say "isFound", as translated from some internal status code, or just save the REST status code internally and let the user reason about it. The former ensures we can say "well, this non-200 status code is actually 'ok'", but I don't know if we have a reason for that. The latter gives the user the flexibility and the foot gun. I would also be fine with an Optional.

The methods I checked:

Alias - getAlias has a getException() which is validated against if there are any exceptions in the call.
Pipelines - get pipeline response is a StatusToXContentObject
Get - 404 if index not found, isExists (which is a bool set on the GetResult nested in the response set by ShardService) if doc not found
Delete watch - isFound, which is set directly by the transport action
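
To make the two options discussed in this comment concrete, here is a minimal sketch of what each could look like from the caller's side. Everything in it is hypothetical and self-contained: NotFoundSketch, StatusResponse and toOptional are names invented for this example and are not classes or methods of the client.

```java
import java.util.Optional;
import java.util.function.Function;

final class NotFoundSketch {
    private NotFoundSketch() {}

    /** Option 1: keep the REST status in the response and expose isFound(). */
    static final class StatusResponse {
        private final int status;
        private final String body;

        StatusResponse(int status, String body) {
            this.status = status;
            this.body = body;
        }

        /** Raw status, for users who want to reason about it themselves. */
        int status() {
            return status;
        }

        /** Convenience flag translated from the status code. */
        boolean isFound() {
            return status != 404;
        }

        String body() {
            return body;
        }
    }

    /** Option 2: an empty Optional on 404, a parsed value otherwise. */
    static <T> Optional<T> toOptional(int status, String body, Function<String, T> parser) {
        if (status == 404) {
            return Optional.empty();
        }
        return Optional.of(parser.apply(body));
    }
}
```

With option 1 the caller writes an if (response.isFound()) check; with option 2 the caller gets an Optional and can use ifPresent or orElse. Neither variant returns null or throws for a plain "not found".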

kcm added a commit that referenced this issue Oct 30, 2018

HLRC API for _termvectors (#33447)
* HLRC API for _termvectors

relates to #27205

kcm added a commit that referenced this issue Oct 30, 2018

HLRC - add support for source exists API (#34519)
HLRC - add support for source exists API
API re-uses the GetRequest object (following the precedent set by the plain “exists” api).

Relates to #27205
@markharwood

Contributor

commented Oct 31, 2018

@hub-cap One more for the list - the explain API uses the "isExists" approach too.

This looks to be where the "isExists or exception" design choice is forced. The ignores parameter can be used to declare that 404 status codes are to be expected, but the logic in this helper method uses the same responseConvertor.apply method for parsing both healthy responses and any "ignored" error codes, so the same type of response object is returned. This steers us towards using a "FooResponse" object with an "isExists" property of some sort.
The alternative use of this method is to call it without listing 404s in the ignores parameter, in which case a more generic exception is thrown (ElasticsearchStatusException with status = 404).
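
A simplified sketch of the behaviour described above, as I read it, may help. It is not the actual helper method: IgnoresSketch, RawResponse, StatusException, FooResponse and perform are hypothetical names, and StatusException merely stands in for the generic status exception mentioned in the comment.

```java
import java.util.Set;
import java.util.function.Function;

final class IgnoresSketch {
    private IgnoresSketch() {}

    /** Hypothetical raw HTTP response: status code plus body. */
    static final class RawResponse {
        final int status;
        final String body;

        RawResponse(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Stand-in for a generic exception that carries the status code. */
    static final class StatusException extends RuntimeException {
        final int status;

        StatusException(int status) {
            super("status=" + status);
            this.status = status;
        }
    }

    /** Hypothetical "FooResponse" whose exists flag is derived from the status. */
    static final class FooResponse {
        final boolean exists;

        FooResponse(boolean exists) {
            this.exists = exists;
        }

        static FooResponse fromRaw(RawResponse raw) {
            return new FooResponse(raw.status != 404);
        }
    }

    /**
     * The same converter handles both successful responses and ignored
     * error codes, so both paths must produce the same response type.
     * Any other status ends up as an exception.
     */
    static <T> T perform(RawResponse raw, Function<RawResponse, T> responseConverter, Set<Integer> ignores) {
        boolean success = raw.status >= 200 && raw.status < 300;
        if (success || ignores.contains(raw.status)) {
            return responseConverter.apply(raw);
        }
        throw new StatusException(raw.status);
    }
}
```

Calling perform with Collections.singleton(404) as the ignores set yields a FooResponse with exists set to false for a 404, while calling it with an empty set throws instead. That is the fork between the two designs.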

markharwood added a commit to markharwood/elasticsearch that referenced this issue Nov 1, 2018

HLRC support for getTask.
Given a GetTaskRequest the API returns an Optional which is empty in the case of 404s or returns a TaskInfo object if found.
Added Helper methods in RestHighLevelClient for returning empty Optionals when hitting 404s

Relates to elastic#27205

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 2, 2018

HLRC: reindex API with wait_for_completion false
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskID and results
can be queried with Task API

refers: elastic#27205
@markharwood

Contributor

commented Nov 2, 2018

Perhaps another general "java convention" to consider @hub-cap.

How do we map potentially long-running wait_for_completion=false style REST APIs to our notion of sync and async Java calls?

I hit this trying to find a long-running task that could be used in my getTask tests. It looks like HLRC's reindex has been written without any support for returning task IDs. This means reindex and getTask can't be practically used together in HLRC. Reindex needs to find a way to offer more of the async features.
In discussions with @pgomulka we came up with this candidate general convention for mapping async REST APIs to Java:

  • Foo syncFoo(...) and void asyncFoo(..., listener) would map to REST calls without wait_for_completion params (the majority of our existing APIs)
  • FooTask submitFooTask(...) would map to the REST equivalents with wait_for_completion set to false. It's a synchronous call to a REST api with async features.

Does this make sense? It probably applies to more than reindex.
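
For illustration, the proposed convention could take roughly the following shape for reindex. This is only a sketch under assumptions: ReindexClientSketch, ReindexResult, ReindexTask and the method names are hypothetical, the "server" is faked in memory, and the task id format is made up.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class ReindexClientSketch {

    /** Hypothetical result of a completed reindex. */
    static final class ReindexResult {
        final String source;
        final String dest;

        ReindexResult(String source, String dest) {
            this.source = source;
            this.dest = dest;
        }
    }

    /** Hypothetical handle returned when the server runs the work as a task. */
    static final class ReindexTask {
        final String taskId;

        ReindexTask(String taskId) {
            this.taskId = taskId;
        }
    }

    private final Executor executor;
    private final AtomicLong taskIds = new AtomicLong();

    ReindexClientSketch(Executor executor) {
        this.executor = executor;
    }

    /** "Foo syncFoo(...)": blocks until the operation has completed. */
    ReindexResult reindex(String source, String dest) {
        return new ReindexResult(source, dest); // faked: no server involved
    }

    /** "void asyncFoo(..., listener)": same operation, result delivered to a listener. */
    void reindexAsync(String source, String dest, Consumer<ReindexResult> listener) {
        executor.execute(() -> listener.accept(reindex(source, dest)));
    }

    /**
     * "FooTask submitFooTask(...)": a synchronous Java call that maps to
     * wait_for_completion=false, so it returns a task handle, not a result.
     */
    ReindexTask submitReindexTask(String source, String dest) {
        return new ReindexTask("sketch-node:" + taskIds.incrementAndGet());
    }
}
```

The point of the third method is that "async" on the Java side (listener versus blocking) and "async" on the server side (task versus inline execution) are independent choices, so they get separate method names.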

hub-cap added a commit that referenced this issue Nov 2, 2018

HLRC: Add document _count API (#34267)
Add `count()` api method, `CountRequest` and `CountResponse` classes to HLRC. Code in server module is unchanged.

Relates to #27205

@hub-cap

Contributor

commented Nov 5, 2018

@markharwood As mentioned above, I don't think throwing exceptions is the way to go, so I don't think we should be removing it from ignores. Just to reiterate, based on the work I saw in your other review, #35166, I think the use of Optional works well here.

Also, I agree with @pgomulka and your assessment of sync/async/submit, :shipit:

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 5, 2018

pgomulka added a commit that referenced this issue Nov 8, 2018

HLRC: reindex API with wait_for_completion false (#35202)
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: #27205

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 13, 2018

HLRC: reindex API with wait_for_completion false (elastic#35202)
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: elastic#27205

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 14, 2018

HLRC: reindex API with wait_for_completion false (elastic#35202)
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: elastic#27205

pgomulka added a commit that referenced this issue Nov 14, 2018

HLRC: reindex API with wait_for_completion false
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: #27205
Original PR against master #35202 
Back Port PR #35527

mayya-sharipova added a commit that referenced this issue Nov 19, 2018

dnhatn added a commit that referenced this issue Dec 7, 2018

@utkarsh4G

This comment was marked as off-topic.


commented Dec 11, 2018

Upgrading Elasticsearch from 2.x to 5.x requires replacing Shield security with X-Pack security, and the Java Transport client with the Java high-level or low-level REST client.
Where can I find information on how to use X-Pack with the Elasticsearch Java REST client for version 5.6.2? And if no such information exists, how do I do it? Please help :)

@tvernum

Contributor

commented Dec 11, 2018

@utkarsh4G Could you please ask this question on our discuss forum?
Elastic uses GitHub issues for tracking work that needs to be undertaken, such as bugs and feature requests, and we use the forums for questions such as yours.
