
SMILE support for the _bulk API #94319

Open
rockdaboot opened this issue Mar 6, 2023 · 25 comments
Labels
:Core/Infra/REST API REST infrastructure and utilities >docs General docs changes >enhancement Team:Core/Infra Meta label for core/infra team Team:Docs Meta label for docs team

Comments

@rockdaboot (Contributor)

Description

In Universal Profiling we store lots of documents in a short time with many binary values.
The _bulk API currently only accepts the JSON (NDJSON) format, which forces us to convert the binary values to base64.
Using the SMILE format instead of JSON means less overhead in terms of CPU, allocated objects (GC pressure), and bytes of output. Fewer bytes also mean less CPU overhead when piping the data through a (gzip) compressor before it is sent.
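For a sense of scale, base64 inflates binary payloads by roughly a third even before JSON string quoting is added; a quick illustration:

```python
import base64

# 1 KiB of arbitrary binary data (stand-in for any binary field value).
payload = bytes(range(256)) * 4

encoded = base64.b64encode(payload)
print(len(payload), len(encoded))  # 1024 1368 -- about a 4/3 size increase
```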

It is somewhat hard to understand why almost every ES API supports SMILE while the _bulk API does not. Users of this API may benefit the most from not using JSON.

We would really like to switch away from JSON to SMILE. The _bulk API not allowing this is a blocker.

@rockdaboot rockdaboot added >enhancement needs:triage Requires assignment of a team area label labels Mar 6, 2023
@DaveCTurner (Contributor)

I think this is a docs issue: although it is not mentioned in the manual, the _bulk API does support SMILE-format input. More precisely, it accepts a sequence of SMILE docs that follow the same structure as the NDJSON endpoint, using a 0xff byte as the separator between docs rather than 0x0a.

It doesn't support CBOR because there's no good choice for a separator byte there - see #8481.
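Schematically, assuming each doc is already a SMILE-encoded byte blob, the framing could be sketched like this (an illustrative sketch, not the actual ES parser code; the doc bytes here are placeholders):

```python
SEPARATOR = b"\xff"  # doc separator for SMILE bulk bodies; NDJSON uses b"\n" (0x0a)

def frame_smile_bulk(docs):
    """Join pre-encoded SMILE docs into one _bulk body, 0xff after each doc."""
    return b"".join(doc + SEPARATOR for doc in docs)

def split_smile_bulk(body):
    """Server-side view: split the body back into docs at each 0xff marker."""
    return [part for part in body.split(SEPARATOR) if part]

# Round-trips as long as no doc contains a raw 0xff byte itself.
# (0x3a 0x29 0x0a is the SMILE header ":)\n"; the rest is placeholder bytes.)
docs = [b"\x3a\x29\x0a...doc-one", b"\x3a\x29\x0a...doc-two"]
assert split_smile_bulk(frame_smile_bulk(docs)) == docs
```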

@DaveCTurner DaveCTurner added >docs General docs changes :Core/Infra/REST API REST infrastructure and utilities and removed needs:triage Requires assignment of a team area label labels Mar 6, 2023
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team Team:Docs Meta label for docs team labels Mar 6, 2023
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rockdaboot (Contributor, Author)

More precisely, it accepts a sequence of SMILE docs that follow the same structure as the NDJSON endpoint, using a 0xff byte as the separator between docs rather than 0x0a

That's good news 😃 . I'll test it likely during this week and will report back.

@DaveCTurner (Contributor)

Sounds good.

I did a bit of digging and I'm pretty sure that this behaviour is covered by the test suite, for instance this line means we'll randomly use SMILE in _bulk calls made by the YAML test suite:

private static final XContentType[] STREAMING_CONTENT_TYPES = new XContentType[] { XContentType.JSON, XContentType.SMILE };

That's not to say that we are free from test bugs ofc, just to say that this is something we genuinely support and should be covered in the manual.

@iverase (Contributor) commented Mar 6, 2023

I think it works as long as you don't have binary fields. In that case the binary blob can contain 0xff and break the format (which is a bit awkward, as it is a binary format that does not fully support binary fields).

@rockdaboot (Contributor, Author)

@iverase For SMILE, this can't happen. A binary field has a preamble (1 byte), a length (varint) and the data (any bytes).

@iverase (Contributor) commented Mar 6, 2023

@rockdaboot I am pretty sure our bulk parser does not look into the data but just scans for 0xff markers to split up the document bytes.

FYI: I tried to use SMILE to encode complex geo data using WKB, which is a binary format, and it did not work.
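The failure mode is easy to demonstrate against a naive separator scan (a sketch of the splitting logic, not the actual ES parser):

```python
SEPARATOR = b"\xff"

# Two pre-encoded docs; the second carries a raw binary value containing 0xff,
# as produced when binary values are written unescaped.
doc_a = b"doc-a"
doc_b = b"doc-b\x01\xff\x02"

body = doc_a + SEPARATOR + doc_b + SEPARATOR
parts = [p for p in body.split(SEPARATOR) if p]

print(len(parts))  # 3, not 2 -- the embedded 0xff was mistaken for a separator
```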

@iverase (Contributor) commented Mar 6, 2023

Here is the magic:

private static int findNextMarker(byte marker, int from, BytesReference data) {

I did hack a way to support it so if this is something you need I can resurrect the code.

@rockdaboot (Contributor, Author)

I did hack a way to support it so if this is something you need I can resurrect the code.

That would be awesome, as our binary data contains 0xff with the same probability as any other byte value. Do you think we can get a fix into the 8.8 release? (Just for planning.)

@iverase (Contributor) commented Mar 6, 2023

Do you think we can get a fix for the 8.8 release ? (Just for planning)

I will try to put a PR in the next few days. We will know then if it is realistic.

@iverase (Contributor) commented Mar 7, 2023

I reviewed what I did a long time ago and I'm afraid I don't have a straightforward solution here. I tried some horrible stuff, like parsing the incoming bytes to make sure we only build valid documents, but that code relies on catching exceptions and is in general not something we would want in the code base.

I remembered that the reason we don't support this is in this line:

smileFactory.configure(SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT, false);

We are explicitly disabling the escaping of the stream separator. This makes sense because we use this format for performance-sensitive actions, and the overhead of escaping the separator is quite noticeable. I think the right solution would be an XContentType that escapes the separator and can be used when building the bulk request. I don't know if this is possible.
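For illustration, 7-bit encoding in the spirit of what ENCODE_BINARY_AS_7BIT enables can be sketched as follows; this is an approximation of the idea, not the exact SMILE wire layout:

```python
def encode_7bit(data: bytes) -> bytes:
    """Pack raw bytes 7 bits per output byte, so every output byte is <= 0x7f.

    Since no output byte can be 0xff, the 0xff doc separator stays unambiguous.
    The cost is a ~14% size increase (8 input bits become 7 per output byte)
    plus the CPU work of repacking -- the overhead the comment above refers to.
    """
    out, acc, bits = bytearray(), 0, 0
    for b in data:
        acc = (acc << 8) | b
        bits += 8
        while bits >= 7:
            bits -= 7
            out.append((acc >> bits) & 0x7F)
    if bits:  # flush remaining bits, left-aligned in a final 7-bit unit
        out.append((acc << (7 - bits)) & 0x7F)
    return bytes(out)

packed = encode_7bit(b"\xff\xff\xff\xff")
assert b"\xff" not in packed and max(packed) <= 0x7F
```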

@DaveCTurner (Contributor)

Ugh sorry @rockdaboot, I wasn't aware of this issue until Ignacio's comment.

Rather than escaping separators I think it would be better to move to supporting length-prefixed inputs somehow. This'd be immune to delimiter problems and would also allow us to slice up the input without needing to scan every byte. It's unclear to me whether that'd be better done by encoding the sequence of SMILE docs as an array of binary values within a single top-level SMILE doc, vs just doing the delimiting "by hand". Possibly we'd want some notion of length-prefixed chunking as well.

Unfortunately that's going to be a bunch of development work so either we'd be looking to your team to take this on (PRs welcome!) or else we'll need to find a spot for this work in a roadmap somewhere.

@rockdaboot (Contributor, Author) commented Mar 7, 2023

That is unfortunate, but many thanks for looking into this @DaveCTurner and @iverase !

Ideally, I'd like to have an (asynchronous) streaming API where the client receives responses while still feeding requests towards ES. With HTTP/1.1, this requires a parallel connection for each request (pipelining is a no-go with HTTP/1.1). HTTP/2 is designed for parallel requests, but sadly ES doesn't support it (see #10981).

It's unclear to me whether that'd be better done by encoding the sequence of SMILE docs as an array of binary values within a single top-level SMILE doc, vs just doing the delimiting "by hand"

If I understand correctly, in both cases the receiver has to parse all the data to separate the docs. The main question for me is how to make maximal use of shared strings. Just guessing, but maybe having the docs in an array in a top-level object makes more sense with regard to shared strings!?

Unfortunately that's going to be a bunch of development work so either we'd be looking to your team to take this on (PRs welcome!) or else we'll need to find a spot for this work in a roadmap somewhere.

I wish we could pick this up, but the team is currently also a bit under water. There are lots of tasks with higher priority, I'd say.

While I'll fall back to using JSON encoding, can we leave this open in the hope that someone will pick it up?

@DaveCTurner (Contributor)

With HTTP/1.1, this requires parallel connections for each request

So yes we expect clients to use an appropriate number of parallel connections (which is kind of why we don't see a great reason to add support for HTTP/2), but ...

(pipelining is a no-go with HTTP/1.1)

... why do you say this?

The main question for me is how to make maximal use of shared strings. Just guessing, but maybe having the docs in an array in a top-level object is what makes more sense in regards to shared strings !?

Yes I think cross-doc string sharing would require the whole request to be a single SMILE entity indeed. That sounds possible, although it would involve the coordinating node doing a bunch of extra work to parse the request body and re-serialize the individual docs (i.e. de-sharing the strings) for the shard-level indexing operations. I wonder whether we'd get the same effect by compressing the whole request?

can we leave this open in the hope that someone will pick it up ?

We can, but I doubt it'll see much action any time soon without some PM-level discussions to work out how to fit it into the roadmap.
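As a rough illustration of the compression idea (hypothetical field names; JSON stands in for any self-describing encoding), field names repeated across docs compress away almost entirely:

```python
import gzip
import json

# 200 docs that all repeat the same field names.
doc = json.dumps({"Stacktrace.id": "abcd1234", "host.id": "prof-host-01"}).encode()
body = b"\n".join([doc] * 200)

compressed = gzip.compress(body)
print(len(body), len(compressed))  # the repeated keys nearly vanish after compression
```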

@rockdaboot (Contributor, Author) commented Mar 7, 2023

With HTTP/1.1, this requires parallel connections for each request

So yes we expect clients to use an appropriate number of parallel connections (which is kind of why we don't see a great reason to add support for HTTP/2), but ...

(pipelining is a no-go with HTTP/1.1)

... why do you say this?

HTTP pipelining is a feature that works on a single connection. It is very fragile and hardly supported. That's why browsers dropped support for it. A good summary can be read at https://daniel.haxx.se/blog/2019/04/06/curl-says-bye-bye-to-pipelining/ .

You say there is no problem in opening (hundreds of) parallel HTTP/1.1 connections per client. Hm, maybe I am a bit conservative, but TCP/IP ports are a limited resource. That's why HTTP/2 multiplexes parallel streams over a single connection. But it would be fine if we are the only network client doing this (probably we are).

The main question for me is how to make maximal use of shared strings. Just guessing, but maybe having the docs in an array in a top-level object is what makes more sense in regards to shared strings !?

Yes I think cross-doc string sharing would require the whole request to be a single SMILE entity indeed. That sounds possible, although it would involve the coordinating node doing a bunch of extra work to parse the request body and re-serialize the individual docs (i.e. de-sharing the strings) for the shard-level indexing operations. I wonder whether we'd get the same effect by compressing the whole request?

I think that works pretty well, especially with brotli or zstd request compression ;-)
Gzip either doesn't compress as well or eats huge amounts of CPU.

@rockdaboot (Contributor, Author)

Another option would be to define an "ndsmile" format, analogous to the "ndjson" format, where each SMILE document is prefixed by its length (varint). It could be signaled by a Content-Type: application/ndsmile HTTP header. That header isn't standardized, but neither is application/smile.
The parsing/receiving on the ES side via the _bulk API should be relatively easy to implement with low CPU overhead (likely even lower than scanning for \n in an ndjson input stream).
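A minimal sketch of such varint length-prefixed framing (hypothetical helper names; not an actual ES or Jackson API):

```python
def write_varint(n: int) -> bytes:
    """LEB128-style varint: 7 data bits per byte, high bit set on continuation."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def frame(docs):
    """Prefix each pre-encoded doc with its length; no separator byte needed."""
    return b"".join(write_varint(len(d)) + d for d in docs)

def split(body):
    """Reader jumps straight to each doc boundary without scanning the payload."""
    docs, pos = [], 0
    while pos < len(body):
        n, shift = 0, 0
        while True:  # decode one varint length prefix
            b = body[pos]
            pos += 1
            n |= (b & 0x7F) << shift
            if b < 0x80:
                break
            shift += 7
        docs.append(body[pos:pos + n])
        pos += n
    return docs

# Docs containing raw 0xff bytes round-trip fine.
docs = [b"\xff" * 300, b"doc\xff2"]
assert split(frame(docs)) == docs
```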

@DaveCTurner (Contributor)

Another option would be to define a "ndsmile" format

Yes that's pretty much what I meant with doing the length-prefixing "by hand" in my earlier comment. This would definitely be less work for the coordinating node, but it does mean we can't use SMILE's built-in string deduplication across docs.

Apologies for the delayed reply to the previous comment too:

You say there is no problem in opening (hundreds of) parallel HTTP/1.1 connections per client.

I did say "appropriate number" 😉 and per client that's probably not "hundreds". For bulk indexing traffic from a limited number of clients, I'd expect the appropriate number of connections to be hundreds (or fewer) in total for each ES node, not per client. Beyond that there would be minimal benefit to adding more connections; the important bottlenecks would all be elsewhere.

HTTP pipelining is a feature that works on a single connection. It is very fragile and hardly supported.

Fortunately we don't need to worry about browser support here. Pipelining works reasonably well with bulk traffic into ES: e.g. pipelined requests are processed in parallel. Of course the responses have to go back in order so it's best if all the requests have approximately equal processing times and/or aren't very latency-sensitive, but these conditions do apply to bulk indexing traffic in many cases.

@rockdaboot (Contributor, Author) commented Mar 10, 2023

Yes that's pretty much what I meant with doing the length-prefixing "by hand" in my earlier comment.

Ah yes, sorry, that slipped out of my head.

Pipelining works reasonably well with bulk traffic into ES: e.g. pipelined requests are processed in parallel.

TIL, that's good to know and very helpful.

When you say "for bulk indexing traffic", are you talking about the _bulk API or about the index and create APIs (or all of them)?

Of course the responses have to go back in order so it's best if all the requests have approximately equal processing times and/or aren't very latency-sensitive

Given that the responses include state (action, index name, _id, etc), the order might not even be important.

@DaveCTurner (Contributor)

When you say "For bulk indexing traffic" are you talking about the _bulk API or about the index and create API (or all of them) ?

I was specifically thinking about the _bulk API. The same sort of argument applies to single-doc indexing APIs too, although there's more of a risk of hitting the pipeline depth limit there (http.pipelining.max_events, 10k by default) and those APIs carry more overhead within ES too. In more detail, these APIs map internally into a bulk request that carries a single doc, and we require at least one fsync() during the processing of each bulk, and although fsync() calls are amortized across concurrent bulks to some extent I would still expect this to not perform optimally.

@rockdaboot (Contributor, Author)

👍 That is valuable information for designing a streaming SMILE Go bulk indexer for symbolization (prioritized quite high).

@DaveCTurner (Contributor)

Given that the responses include state (action, index name, _id, etc), the order might not even be important.

... as long as you're only sending indexing requests, and the request specifies IDs and so on, and everything succeeds. Otherwise you need the ordering to correlate errors to the original requests/docs. And it's required for compliance with HTTP/1.1 too. We could solve this by adding HTTP/2 support, and if it were a big deal in practice then this would be a strong argument for doing that. That's a big "if" tho, we haven't seen any cases where it really matters.

@rockdaboot (Contributor, Author)

That's a big "if" tho, we haven't seen any cases where it really matters.

Agreed, with HTTP pipelining working on the ES side, we don't need HTTP/2 for symbolization. The large bulks are either create or index actions to the same ES index with fixed document layouts/mappings.

@CaptainDredge

Using the SMILE format instead of JSON is less overhead in terms of CPU, allocated objects (GC pressure) and bytes output. Less bytes also means less CPU overhead when pipelining the data through a (gzip) compressor before it is sent.

@rockdaboot have you benchmarked the SMILE version specifically for your use case (excluding any docs containing 0xFF)?

@rockdaboot (Contributor, Author)

@rockdaboot have you benchmarked the SMILE version specifically for your use case (excluding any docs containing 0xFF)?

No, currently all docs contain 0xFF 🤷🏻

Projects
None yet
Development

No branches or pull requests

5 participants