Bulk UDP #2201

Closed
kimchy opened this issue Aug 23, 2012 · 9 comments

kimchy commented Aug 23, 2012

The Bulk UDP service listens over UDP for bulk-format requests. The idea is to provide a low-latency service that makes it easy to index data that is not critical.

The Bulk UDP service is disabled by default, but can be enabled by setting bulk.udp.enabled to true.

The Bulk UDP service performs internal bulk aggregation of the data and then flushes it based on several parameters:

  • bulk.udp.bulk_actions: The number of actions after which a bulk is flushed. Defaults to 1000.
  • bulk.udp.bulk_size: The size of the current bulk request above which it is flushed. Defaults to 5mb.
  • bulk.udp.flush_interval: An interval after which the current request is flushed, regardless of the above limits. Defaults to 5s.
  • bulk.udp.concurrent_requests: The maximum number of in-flight bulk requests allowed. Defaults to 4.
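
To make the interaction of the three flush triggers concrete, here is a minimal Python sketch. It is not the Elasticsearch implementation: the `BulkBuffer` class and its method names are hypothetical, the exact comparison operators are assumptions based on the wording above, and `bulk.udp.concurrent_requests` is not modeled at all.

```python
import time


class BulkBuffer:
    """Hypothetical buffer that reports when a flush is due (illustration only)."""

    def __init__(self, bulk_actions=1000, bulk_size=5 * 1024 * 1024,
                 flush_interval=5.0, clock=time.monotonic):
        self.bulk_actions = bulk_actions      # mirrors bulk.udp.bulk_actions
        self.bulk_size = bulk_size            # mirrors bulk.udp.bulk_size, in bytes
        self.flush_interval = flush_interval  # mirrors bulk.udp.flush_interval, in seconds
        self.clock = clock
        self.actions = []
        self.size = 0
        self.last_flush = clock()

    def add(self, action: bytes):
        """Buffer one action; return the flushed batch if a limit was hit, else None."""
        self.actions.append(action)
        self.size += len(action)
        # Assumption: flush once the action count is reached or the size is exceeded.
        if len(self.actions) >= self.bulk_actions or self.size > self.bulk_size:
            return self.flush()
        return None

    def tick(self):
        """Call periodically; flushes a non-empty buffer once the interval has elapsed."""
        if self.actions and self.clock() - self.last_flush >= self.flush_interval:
            return self.flush()
        return None

    def flush(self):
        """Hand back the buffered actions and reset the counters and the timer."""
        batch, self.actions, self.size = self.actions, [], 0
        self.last_flush = self.clock()
        return batch
```

The point of the sketch is that the interval acts as a backstop: a trickle of small messages that never reaches the count or size limit still gets flushed after `flush_interval`.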

The network settings allowed are:

  • bulk.udp.host: The host to bind to. Defaults to network.host, which defaults to any.
  • bulk.udp.port: The port to use, defaults to 9700-9800.
  • bulk.udp.receive_buffer_size: The receive buffer size, defaults to 10mb.
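
As a rough illustration of what these network settings correspond to at the socket level, the Python sketch below binds a UDP socket and requests a receive buffer size. The helper name `bind_udp` is hypothetical, and reading the port range as "use the first free port in the range" is an assumption, not something stated above.

```python
import socket


def bind_udp(host="0.0.0.0", ports=range(9700, 9801),
             receive_buffer_size=10 * 1024 * 1024):
    """Bind a UDP socket to the first free port in `ports` (assumed range semantics)."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Request a kernel receive buffer of this size; the OS may cap or reject large values.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, receive_buffer_size)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()  # port already in use, try the next one
    raise OSError("no free port in the given range")
```

Calling `sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)` on the returned socket shows the buffer size the operating system actually granted, which may differ from the requested value.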

Here is an example of how it can be used:

> cat bulk.txt
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }

> cat bulk.txt | nc -w 0 -u localhost 9700
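
The same payload can also be sent from a script. The following is a minimal Python sketch, assuming the service listens on localhost:9700 as in the example above; `build_bulk` and `send_bulk` are hypothetical helper names. It sends the whole payload as a single datagram and refuses payloads that cannot fit in one.

```python
import json
import socket

# Largest UDP payload over IPv4: 65535 minus the 20-byte IP and 8-byte UDP headers.
MAX_UDP_PAYLOAD = 65507


def build_bulk(docs, index="test", doc_type="type1"):
    """Return newline-delimited bulk bytes: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(payload, host="localhost", port=9700):
    """Send the whole payload as one UDP datagram."""
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("payload does not fit in a single UDP datagram")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `send_bulk(build_bulk([{"field1": "value1"}, {"field1": "value1"}]))` sends the same four lines as bulk.txt. As with any UDP traffic, there is no acknowledgement that the datagram arrived.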
@kimchy kimchy closed this as completed in 072fcaa Aug 23, 2012
kimchy added a commit that referenced this issue Aug 23, 2012
martijnvg pushed a commit to martijnvg/elasticsearch that referenced this issue Aug 27, 2012

medcl commented Sep 5, 2012

Since UDP is not reliable, does that mean data may be lost for some reason? Is that right, @kimchy?
Anyway, this feature is very suitable for indexing logging data. Cheers :)

kimchy commented Sep 6, 2012

Yes, UDP comes with its downsides, but sometimes it's acceptable.

@vbichkovsky

Hello!
I tried to import 100 documents from a text file with "cat ...|nc ..." (as mentioned in the documentation), but only a small portion of them was imported. In the log I see org.elasticsearch.ElasticSearchParseException and sometimes org.elasticsearch.common.jackson.core.JsonParseException. The JSON data is valid and separated by \n. The error occurs only when I try to import several documents at once, for example 6 of them. Sending each document (+meta) in a separate datagram works. The Elasticsearch version is 0.19.10, and I can provide an example document if necessary. Is this a well-known limitation or an issue?
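
For illustration, the "one document (+meta) per datagram" approach can be sketched in Python as below. The helper names `split_bulk` and `send_each` are hypothetical, and the sketch assumes the file contains only index actions, so every action line is followed by exactly one source line.

```python
import socket


def split_bulk(text):
    """Split bulk-format text into chunks of one action line plus its source line."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected action/source line pairs")
    return [(lines[i] + "\n" + lines[i + 1] + "\n").encode("utf-8")
            for i in range(0, len(lines), 2)]


def send_each(text, host="localhost", port=9700):
    """Send every action/source pair as its own UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for chunk in split_bulk(text):
            sock.sendto(chunk, (host, port))
```

Each datagram is then a complete bulk request on its own, so a document is never cut in half at a datagram boundary.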

kimchy commented Oct 4, 2012

@vbichkovsky it should work, can you provide an example?

@vbichkovsky

The file: http://pastie.org/4909910
Log output: http://pastie.org/4909902

Commands I typed:
curl -XDELETE http://localhost:9292/tests
cat data-file | nc -w 0 -u localhost 9797

I'm using a virtual box (debian squeeze) set up via Vagrant. Ports 9200 and 9700 are forwarded to 9292 and 9797. Elasticsearch was installed using .deb package downloaded from the site.

kimchy commented Oct 4, 2012

I can see that the message is being split into two datagrams: one of 1024 bytes, and then the rest. Parsing fails on the second chunk. As far as I can see, we set the receive buffer size properly for the UDP socket, so my first guess is that nc has a send buffer size of 1024. If not, the other option is that the receive buffer size we set is not being applied for some reason. I still need to chase down whether it's at the nc level or not. A simple way is to write a UDP client with proper settings and see if it still fails. I will try to do that in the coming days, but if you can try on your end it would help speed things up.

@vbichkovsky

You were right, it is netcat.
I wrote a small Ruby script (http://pastie.org/4913960) to check what is sent by netcat, and, indeed, the first packet is 1024 bytes, then comes the rest.
Unfortunately, there is no option in netcat to increase it, here is related thread: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/questions/2010-02/msg00747.html
Btw, if I have questions about Elasticsearch, where is the best place to ask them?
Are you going to release a book about it? :)
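
For anyone who wants to repeat the datagram-size check, here is a rough Python sketch. It is not the Ruby script linked above, whose contents are not reproduced here; the function name `record_datagram_sizes` is hypothetical.

```python
import socket


def record_datagram_sizes(sock, count, timeout=2.0):
    """Receive up to `count` datagrams on a bound UDP socket and return their sizes."""
    sock.settimeout(timeout)
    sizes = []
    try:
        for _ in range(count):
            data, _addr = sock.recvfrom(65535)  # large enough for any single datagram
            sizes.append(len(data))
    except socket.timeout:
        pass  # fewer datagrams arrived than expected
    return sizes
```

To use it, stop anything else listening on the port, bind a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` to the UDP port in question, call `record_datagram_sizes(sock, 10)`, and then pipe the file through nc. A sender that splits its input shows up as several entries in the returned list instead of one.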

kimchy commented Oct 5, 2012

@vbichkovsky lucky guess on my end regarding netcat, thanks for checking it out! Questions on ES are best asked on the ES Google group, where we lurk quite a bit, or on IRC (but sometimes you might need to ping us when we are online...).
A book is planned for sure! We are working on getting the process started.

tianchu commented Feb 27, 2014

Hi @kimchy, I'm currently using 0.90.10. To enable the Bulk UDP API, do I only need to add bulk.udp.enabled: true to the elasticsearch.yml file? Just want to double check, since I'm not seeing bulk.udp.enabled in the elasticsearch.yml file shipped with the ES installation.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015