Bad td-agent-bit version installed on the Python image. #15164

Open
cristiangauma opened this issue Mar 28, 2023 · 0 comments
Labels
type: bug Something isn't working

Comments

@cristiangauma


Your Environment

  • Version: 1.8.0
  • Affected Component: Python Container (Access Gateway and Federation Gateway)
  • Affected Subcomponent: td-agent-bit on Docker Containers
  • Deployment Environment: AGW (docker), FeG (docker)

Describe the Issue

After installing several Docker-based AGWs and several FeGs, we saw that no logs were being sent to the fluentd component on the orc8r.

While debugging, we saw that the fluent-bit client was aborting the TLS handshake in the fluent-bit <-> fluentd communication.
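To separate a problem on the fluentd side from a problem in the fluent-bit client, one option is to attempt the TLS handshake from the gateway host without fluent-bit. The following is only a minimal sketch using Python's standard `ssl` module. The function name `probe_tls` and all of its parameters are hypothetical; the host, port, and certificate paths must come from your own deployment, and whether a client certificate is needed depends on how fluentd is configured.

```python
import socket
import ssl
from typing import Optional, Tuple


def probe_tls(host: str, port: int, cafile: Optional[str] = None,
              certfile: Optional[str] = None, keyfile: Optional[str] = None,
              timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake and return (succeeded, detail)."""
    # cafile=None falls back to the system trust store.
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Present a client certificate in case the server requires mutual TLS.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake before returning.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"handshake ok, protocol {tls.version()}"
    except ssl.SSLError as exc:
        return False, f"TLS error: {exc}"
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS failures.
        return False, f"connection error: {exc}"
```

If this probe completes the handshake with the same certificates that fluent-bit uses, the failure is more likely on the client side, which is consistent with the version difference described below.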

In our infrastructure, the Docker-based and system-based AGW installations are set up almost identically, but the system-based AGW servers did not show this error.

We found that the system-based and Docker-based AGW installations use different versions of fluent-bit (td-agent-bit):

  • The system-based installation uses version 1.7.8
  • The Docker-based installation (AGW and FeG) uses version 1.9.8

After forcing version 1.7.8 on the Docker containers, the AGWs and the FeG completed the TLS handshake and started sending logs correctly.
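A check like the one we did by hand can be scripted. The sketch below parses the output of `apt-cache policy td-agent-bit` and compares the installed version with the expected one. The helper names and the sample text are illustrative, not taken from a real gateway; only the two version numbers come from this report.

```python
import re
from typing import Optional


def installed_version(policy_output: str) -> Optional[str]:
    """Extract the 'Installed:' field from `apt-cache policy <pkg>` output."""
    match = re.search(r"^\s*Installed:\s*(\S+)", policy_output, re.MULTILINE)
    if match is None or match.group(1) == "(none)":
        return None
    return match.group(1)


def upstream_version(deb_version: str) -> str:
    """Strip the optional epoch and Debian revision: [epoch:]upstream[-revision]."""
    without_epoch = deb_version.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def matches_pin(version: Optional[str], pinned: str) -> bool:
    """True when the installed upstream version equals the pinned one."""
    return version is not None and upstream_version(version) == pinned


# Illustrative sample only, shaped like `apt-cache policy` output.
SAMPLE = """td-agent-bit:
  Installed: 1.9.8
  Candidate: 1.9.8
"""

if __name__ == "__main__":
    found = installed_version(SAMPLE)
    print(found, "matches 1.7.8:", matches_pin(found, "1.7.8"))
```

In a container, the input would be the real command output, for example captured with `subprocess.run(["apt-cache", "policy", "td-agent-bit"], capture_output=True, text=True)`.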

The cause is that the Python images add the fluentbit repository and install td-agent-bit from it without pinning a version:

It is also present on master, so it might happen again in the future:

To Reproduce

  1. On release 1.8.0, deploy an AGW or FeG using the https://linuxfoundation.jfrog.io/magma-docker/agw_gateway_python images.
  2. Logs are not sent to fluentd on the orc8r, because the fluentbit client closes the connection.

Expected behavior

Logs are sent correctly from the Docker-based AGW and FeG to the orc8r fluentd and inserted into the ES clusters.

Possible fixes

It depends on how the Magma maintainers want to maintain this component in the future. Two fixes are possible:

  1. If the fluentbit.io repository really needs to be added to the Python image, then pinning td-agent-bit to a working version will fix it.
  2. If only the linuxfoundation.jfrog.io repository ends up being configured, and it contains the correct td-agent-bit version, then there is no need to pin the version. The only thing to verify is that Ubuntu does not ship a newer version than the one in the Magma repository.

My thoughts: pinning the version and uploading that particular version to the Magma repository is the way to go (assuming the Magma project has not modified td-agent-bit).
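For reference, pinning with apt can be done either on the install command (`package=version`) or with an apt preferences stanza. The sketch below just renders both forms. The helper names are hypothetical, and the exact version string published in the repository may differ from plain `1.7.8` (for example, it may carry a revision suffix), so it should be checked with `apt-cache madison td-agent-bit` first. This is not the actual change made in the Magma Dockerfile.

```python
from typing import List


def apt_install_args(package: str, version: str) -> List[str]:
    """Build an apt-get command line that installs one exact package version."""
    return ["apt-get", "install", "-y", "--no-install-recommends",
            f"{package}={version}"]


def apt_pin_stanza(package: str, version: str, priority: int = 1001) -> str:
    """Render an apt preferences stanza for a file under /etc/apt/preferences.d/.

    A priority above 1000 makes apt prefer this version even when that
    means downgrading from a newer installed version.
    """
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    print(" ".join(apt_install_args("td-agent-bit", "1.7.8")))
    print(apt_pin_stanza("td-agent-bit", "1.7.8"))
```

The preferences stanza also covers the second option above, since it keeps a newer version from another configured repository from replacing the pinned one on upgrade.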

@cristiangauma cristiangauma added the type: bug Something isn't working label Mar 28, 2023
@panyogesh panyogesh assigned panyogesh and unassigned panyogesh Jun 5, 2023
akhilamoyila9 pushed a commit to akhilamoyila9/magma that referenced this issue Jun 29, 2023
Changes:
  1. Pinned the version of td-agent-bit

Testing:
  1. Deployed the docker agw and verified the version
  2. Using netstat verified the connection is established
  3. Events are getting posted to orc8r

Signed-off-by: moyilaakhila <moyilaakhila@gmail.com>