Loki fluent bit plugin fails to start on ECS with new 2.9.2 image tag #10944

Closed
kacey-lunacare opened this issue Oct 17, 2023 · 9 comments · Fixed by #11904
Labels
component/fluent-bit-plugin type/bug Something is not working as expected

Comments

@kacey-lunacare

Describe the bug

The log router container failed to start and threw:
fatal: morestack on g0

Unfortunately, I didn't spend much time troubleshooting it. I quickly rolled back from latest, which I believe resolved to the new 2.9.2 image, because the tasks in all of our clusters started failing to schedule at about the time 2.9.2 was pushed. I pinned 2.9.1 and now all of our tasks are working again.

Log config:

            "logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "DropSingleKey": "true",
                    "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
                    "Labels": "{job=\"dev-${APP_NAME}\"}",
                    "LineFormat": "key_value",
                    "Name": "grafana-loki",
                    "RemoveKeys": "container_id,ecs_task_arn",
                    "TenantID": "dev-${APP_NAME}"
                },
                "secretOptions": [
                    {
                        "name": "Url",
                        "valueFrom": "arn:aws:ssm:us-west-2:${ACCOUNT_ID}:parameter/dev/ecs/${APP_NAME_URL}"
                    }
                ]
            }
        },
        {
            "name": "log_router",
            "image": "grafana/fluent-bit-plugin-loki:latest", # assuming this was using 2.9.2
            "cpu": 0,
            "memoryReservation": 150,
            "portMappings": [],
            "essential": true,
            "environment": [],
            "mountPoints": [],
            "volumesFrom": [],
            "user": "0",
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "dev-ecs",
                    "awslogs-region": "us-west-2",
                    "awslogs-stream-prefix": "dev-service-${APP_NAME}"
                }
            },
            "firelensConfiguration": {
                "type": "fluentbit",
                "options": {
                    "enable-ecs-log-metadata": "true"
                }
            }
        }

Please feel free to delete this, but we noticed that using the 'latest' tag for the grafana/fluent-bit-plugin-loki image, which resolved to version 2.9.2 (released just yesterday), caused our ECS tasks to fail to start: the container would die immediately, throwing fatal: morestack on g0. Pinning to 2.9.1 fixed it, though!

@JStickler added component/fluent-bit-plugin type/bug Something is not working as expected labels Oct 23, 2023
@rifkiaz

rifkiaz commented Oct 29, 2023

Hello @kacey-lunacare, I have the same issue as you. Which image tag did you use to fix it?

With version 2.9.1 the service always shuts down, and with 2.9.2 the error is fatal: morestack on g0

[Screenshot attached in the original comment]

@kacey-lunacare
Author

2.9.1 works great for us!
If you post your config I can compare with what we have. I'm not seeing what the error is in that output though :(

@rifkiaz

rifkiaz commented Oct 29, 2023

This is my config. Can you compare it with yours, @kacey-lunacare?

        "logConfiguration": {
            "logDriver": "awsfirelens",
            "options": {
                "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
                "Labels": "{job=\"firelens\"}",
                "LineFormat": "key_value",
                "Name": "grafana-loki",
                "RemoveKeys": "container_id,ecs_task_arn",
                "Url": "URL_VALUE"
            },
            "secretOptions": []
        }
    },
    {
        "name": "log_router",
        "image": "grafana/fluent-bit-plugin-loki:2.9.1",
        "cpu": 0,
        "memoryReservation": 50,
        "portMappings": [],
        "essential": true,
        "environment": [
            {
                "name": "LOKI_URL",
                "value": "URL_VALUE"
            }
        ],
        "mountPoints": [],
        "volumesFrom": [],
        "user": "0",
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-create-group": "true",
                "awslogs-group": "/ecs/ecs-aws-firelens-sidecar-container",
                "awslogs-region": "ap-southeast-1",
                "awslogs-stream-prefix": "firelens"
            },
            "secretOptions": []
        },
        "firelensConfiguration": {
            "type": "fluentbit",
            "options": {
                "enable-ecs-log-metadata": "true"
            }
        }
    }
]

@kacey-lunacare
Author

Ahh here is the rest of mine:

    "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
            "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
            "Labels": "{job=\"${environment}-${app_name}\"}",
            "LineFormat": "${log_format}",
            "Name": "grafana-loki",
            "RemoveKeys": "container_id,ecs_task_arn",
            "TenantID": "${environment}-${app_name}",
            "DropSingleKey": "true"
        },
        "secretOptions": [{
          "name": "Url",
          "valueFrom": "arn:aws:ssm:${aws_region}:${account_id}:parameter/${environment}/ecs/loki-${environment}-url"
        }]
    }

If I had to guess, I would say that you are not giving it enough memory. I found that mine also died without enough memory. The secret URL has the username and password embedded in it so that we can communicate over HTTPS.

@andremichi

Hey @rifkiaz and @kacey-lunacare, pinning the version to 2.9.1 and increasing the memory also worked for me.
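
Combining the two workarounds reported in this thread, a `log_router` container definition might look like the fragment below. It is only a sketch assembled from the configs posted above, not a verified fix. The `2.9.1` tag is the one commenters reported as working, and the `memoryReservation` of 150 is copied from the original poster's config; whether that amount is enough for another workload is an assumption.

```json
{
    "name": "log_router",
    "image": "grafana/fluent-bit-plugin-loki:2.9.1",
    "memoryReservation": 150,
    "essential": true,
    "user": "0",
    "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
            "enable-ecs-log-metadata": "true"
        }
    }
}
```

Pinning an explicit tag instead of `latest` also means a newly pushed image cannot change what a task runs when it is next scheduled.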

@jcdauchy-moodys

A bit more info here: golang/go#62440

@rifkiaz

rifkiaz commented Nov 6, 2023

Already fixed with image 2.9.1, thanks all.

@crstian19

With Docker, grafana/fluent-bit-plugin-loki:2.9.1 works for me.

@mzollneritsch-nc

Hello!

We are experiencing the same issue with versions > 2.9.1 on Azure Kubernetes (AKS) 1.28.3.

chaudum pushed a commit that referenced this issue Feb 10, 2024
#11904)

**Issue**:

Since version `grafana/fluent-bit-grafana-loki` >= `2.9.2` we could not
run the client due to an error during the startup/forward process where
the container/service crashes and exits with an error message:

`fatal: morestack on g0`

refs: 
- #10944
- golang/go#62440


**What this PR does / why we need it**:
A fix was released in Go 1.22.0, and after some tests (updating the
Go version used for the fluent-bit base image), it seems to be
working fine now.

**Which issue(s) this PR fixes**:
Fixes #10944
rhnasc pushed a commit to inloco/loki that referenced this issue Apr 12, 2024
grafana#11904)

7 participants