Use out_http to send logs to in_http #2909

Closed
Gellardo opened this issue Mar 24, 2020 · 7 comments · Fixed by #2973

Comments

@Gellardo

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Is your feature request related to a problem? Please describe.

I have 2 managed k8s clusters with individual fluentd services and want to forward some logs from cluster_1 to cluster_2. I can't use the forward protocol because the ingress does not allow raw TCP forwarding.

Instead I tried to use out_http in cluster_1 and in_http in cluster_2, since HTTP-based ingress configurations are easy to set up. But I couldn't find a working configuration; every attempt failed with one of:

  • 400 Bad Request\nReceived event is not json
  • 400 Bad Request\n'json' or 'msgpack' parameter is required
  • Server: incoming event is invalid: path=/some.tag params={"json":"{\"hello\":\"world\",\"num\":18}\n{\"hello\":\"world\",\"num\":19}\n",...}

Describe the solution you'd like

There should be an easy configuration to allow using http to forward events between 2 fluentd instances.
I see 2 possible solutions:

  • add an option to out_http to optionally wrap (json) events in a (json) array
  • add an option to in_http/parse_json to be able to handle the json lines format used by out_http
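
To make the mismatch concrete, here is a minimal Python sketch of what the second option could look like on the receiving side. It is not Fluentd code: `parse_events` is a hypothetical helper, and the sample bodies are modeled on the payload shown in the error message above. It tries to read the body as one JSON document and falls back to JSON lines when that fails.

```python
import json


def parse_events(body: str) -> list:
    """Parse an HTTP body that is either one JSON document or JSON lines.

    Hypothetical helper for illustration; not part of Fluentd.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one event.
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    # A single document may be one event (object) or a batch (array).
    return parsed if isinstance(parsed, list) else [parsed]


# JSON lines, one event per line.
json_lines = '{"hello":"world","num":18}\n{"hello":"world","num":19}\n'
# The same two events wrapped in a JSON array.
json_array = '[{"hello":"world","num":18},{"hello":"world","num":19}]'

events_from_lines = parse_events(json_lines)
events_from_array = parse_events(json_array)
```

Both bodies yield the same two events, which is the behavior the two proposed options would converge on from either side.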

Describe alternatives you've considered

Declare the out_http -> in_http case to be out of scope. In that case there should be some mention of it in the docs, to save the next person time. Two plugins which communicate over HTTP but can't talk to one another is very counter-intuitive in my opinion.

Additional context
Configuration for reproducing the problem.

# sending side
<source>
  @type dummy
  dummy {"hello":"world"}
  auto_increment_key num
  tag kube.test.tag
</source>

<match **>
  @type http
  endpoint http://localhost:9881/$tag
  open_timeout 2
  #content_type application/json #results in 'invalid event' if uncommented
  <format>
    @type json
  </format>
  <buffer>
    flush_interval 2s
  </buffer>
</match>
# receiving side
<source>
  @type http
  port 9881
  bind 0.0.0.0
  <parse> # comment out for 'json parameter required'
    @type json
  </parse>
</source>
<match **>
  @type stdout
</match>
@ganmacs
Member

ganmacs commented Mar 26, 2020

You can send/receive data when you set @type msgpack in each config file.
However, I think fluentd does not need to support the out_http -> in_http case because that use case is already covered by in_forward and out_forward. Besides, I can't see a case where HTTP is okay but raw TCP isn't allowed, since HTTP is a protocol on top of TCP.

Declare the out_http -> in_http case to be out of scope. In that case there should be some mention of it in the docs, to save the next person time.

I think it's a reasonable idea. WDYT? @repeatedly, @cosmo0920

@repeatedly
Member

Looks good. Adding link to forward.

@cosmo0920
Contributor

Almost looks good.
But forward does not support proxies.
How would we support proxies in the forward plugin?

@Gellardo
Author

Gellardo commented Mar 26, 2020

I will look into using msgpack instead of json.

Besides, I can't see a case where HTTP is okay but raw TCP isn't allowed, since HTTP is a protocol on top of TCP.

I am fine with your decision, just adding my viewpoint: For direct connections, I agree. In cases with a mandatory (reverse-) proxy in between, HTTP has far better support in my experience. Raw TCP is more of an edge-case. HTTP also only needs 1 open port for multiple services, since host-based routing can be used to forward to different backends.
Example: in Kubernetes, adding HTTP ingress definitions is the default way to expose services. TCP is an edge case that requires special configuration, e.g. with nginx.

@Gellardo
Author

Gellardo commented Mar 26, 2020

Thanks for the suggestion of using msgpack! It still seems weird to me that the json format/parse combination is incompatible. But from my perspective, the issue can be closed.

This particular configuration of out_http -> in_http works:

# sending side
<source>
  @type dummy
  dummy {"hello":"world"}
  auto_increment_key num
  tag kube.test.tag
</source>
<match **>
  @type http
  endpoint http://localhost:9881/some.tag
  open_timeout 2
  <format>
    @type msgpack
  </format>
  <buffer>
    flush_interval 2s
  </buffer>
</match>
# receiving side
<source>
  @type http
  port 9881
  bind 0.0.0.0
  <parse>
    @type msgpack
  </parse>
  <format> # without changing the format, nothing gets printed to stdout
    @type json
  </format>
</source>
<match **>
  @type stdout
</match>

@ganmacs
Member

ganmacs commented Apr 1, 2020

related issue #1936

@rgeraskin
Contributor

Hello. I have a related challenge: my Fluentd needs to send logs to a third-party ELK stack. The endpoint is a Logstash http input with the json codec, but the same problem would occur if the endpoint were Fluentd's in_http.

Regarding the json format, I have these issues:

  1. The HTTP request body format differs (JSON array vs. ndjson), and Fluentd has no setting to configure it.
  2. By default, when buffered events are sent, the endpoint picks up only the first one, because only the first JSON object in the ndjson body is valid JSON on its own. This behavior is not obvious unless you check the Content-Type of the HTTP request.
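
The second issue can be reproduced outside of Fluentd and Logstash. The Python sketch below is only an illustration of how a lenient receiver might behave, not the actual parsing code of either tool. It uses the standard library's `json.JSONDecoder.raw_decode`, which decodes the first JSON value and reports where it stopped, so everything after the first line is left unread.

```python
import json

# An ndjson body like the one a buffered flush would produce (sample data).
body = '{"hello":"world","num":18}\n{"hello":"world","num":19}\n'

# A strict parser rejects the body outright: it is not one JSON document.
try:
    json.loads(body)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# A lenient parser that stops after the first value sees only one event.
decoder = json.JSONDecoder()
first_event, end = decoder.raw_decode(body)
remainder = body[end:]  # the second event is still sitting here, unparsed
```

Depending on which of the two strategies a receiver uses, the sender sees either a rejected request or a silently truncated batch.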

Possible workarounds:

  1. Disable buffering with the flush_mode immediate setting. Obviously we lose the benefits of buffering and overwhelm the endpoint with a ton of requests.
  2. Use FluentBit instead of Fluentd. I would have to rewrite all the configs and write some Lua code for the FluentBit lua filter to replace the Fluentd record_transformer filter with its handy enable_ruby setting.
  3. Implement an external output plugin with the same functionality as out_http but with JSON arrays.
  4. Implement JSON array format support in out_http.

The last one is the most convenient for me, of course. For example, in FluentBit we can choose from json, json_stream, and json_lines. Fluentd already covers the last two with the add_newline setting. Maybe we should have a JSON payload format setting with json_array/ndjson options?
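
As a rough illustration of such a setting, the Python sketch below encodes one batch of events in either format. The function name `encode_payload` and the `payload_format` option are hypothetical and do not correspond to an existing Fluentd parameter. The `application/x-ndjson` content type for the line-delimited case is an assumption as well.

```python
import json


def encode_payload(events, payload_format="ndjson"):
    """Return (content_type, body) for a batch of events.

    payload_format is a hypothetical option name used for illustration only.
    """
    if payload_format == "json_array":
        # One valid JSON document containing every event in the batch.
        return "application/json", json.dumps(events)
    if payload_format == "ndjson":
        # One JSON object per line; the body as a whole is not valid JSON.
        body = "".join(json.dumps(event) + "\n" for event in events)
        return "application/x-ndjson", body
    raise ValueError(f"unknown payload format: {payload_format}")


events = [{"hello": "world", "num": 18}, {"hello": "world", "num": 19}]
array_type, array_body = encode_payload(events, "json_array")
ndjson_type, ndjson_body = encode_payload(events, "ndjson")
```

With the array form, an endpoint that expects a single JSON document per request receives the whole batch, so buffering can stay enabled.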
