Integrations samples on Fargate are not continuously sent #339

Closed
gsanchezgavier opened this issue Mar 9, 2021 · 6 comments · Fixed by #634
Assignees
Labels
bug Categorizes issue or PR as related to a bug.

Comments

@gsanchezgavier
Contributor

Description

When running the agent in ECS on Fargate (newrelic/infrastructure-bundle:2.1.2), the agent sends only the first sample from nri-ecs and from the other integrations that send payloads through it. nri-docker is not affected.
This affects the current ECS on Fargate integration solution.

Expected Behavior

The agent forwards all valid payloads from all integrations.

Troubleshooting or NR Diag results

More info in this slack thread.

We ran a test using Docker Compose with the infrastructure bundle plus a redis container, with the integration set up with NRIA_IS_FORWARD_ONLY, and we didn't see the problem there.

If we set NRIA_IS_SECURE_FORWARD_ONLY in the same Task Definition, the samples are sent correctly.

Steps to Reproduce

Here is a Fargate Task Definition with which you can reproduce the issue and check that nri-ecs payloads are not sent continuously.

{
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "512",
    "memory": "1024",
    "containerDefinitions": [
      {
        "name": "redis-farg-orig",
        "image": "redis:latest",
        "portMappings": [
          {
            "containerPort": 6379,
            "protocol": "tcp"
          }
        ]
      },
      {
        "environment": [
          {
            "name": "NRIA_LICENSE_KEY",
            "value": "<licence>"
          },
          {
            "name": "NRIA_STAGING",
            "value": "true"
          },
          {
            "name": "NRIA_VERBOSE",
            "value": "1"
          },
          {
            "name": "NRIA_OVERRIDE_HOST_ROOT",
            "value": ""
          },
          {
            "name": "NRIA_IS_FORWARD_ONLY",
            "value": "true"
          },
          {
            "name": "FARGATE",
            "value": "true"
          },
          {
            "name": "ENABLE_NRI_ECS",
            "value": "true"
          },
          {
            "name": "NRIA_PASSTHROUGH_ENVIRONMENT",
            "value": "ECS_CONTAINER_METADATA_URI,ENABLE_NRI_ECS,FARGATE"
          },
          {
            "name": "NRIA_CUSTOM_ATTRIBUTES",
            "value": "{\"nrDeployMethod\":\"downloadPage\"}"
          }
        ],
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
              "awslogs-group": "fargate-current",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "nr-agent"
          }
        },
        "cpu": 256,
        "memoryReservation": 512,
        "image": "newrelic/infrastructure-bundle:2.1.2",
        "name": "newrelic-infra"
      }
    ],
    "family": "FargatTest"
  }
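
The workaround reported above (using NRIA_IS_SECURE_FORWARD_ONLY in the same Task Definition) can be sketched as a small script that renames the variable in a task definition before registering it. This is only an illustration of the workaround, not the fix: the helper name `use_secure_forward_only` is hypothetical, and the dictionary below is a trimmed-down copy of the task definition.

```python
import copy
import json


def use_secure_forward_only(task_def, container_name="newrelic-infra"):
    """Return a copy of task_def where NRIA_IS_FORWARD_ONLY is renamed
    to NRIA_IS_SECURE_FORWARD_ONLY in the given container (hypothetical helper)."""
    patched = copy.deepcopy(task_def)  # leave the caller's definition untouched
    for container in patched.get("containerDefinitions", []):
        if container.get("name") != container_name:
            continue
        for env in container.get("environment", []):
            if env.get("name") == "NRIA_IS_FORWARD_ONLY":
                env["name"] = "NRIA_IS_SECURE_FORWARD_ONLY"
    return patched


# Trimmed-down version of the task definition from this issue.
task_def = {
    "containerDefinitions": [
        {"name": "redis-farg-orig", "image": "redis:latest"},
        {
            "name": "newrelic-infra",
            "image": "newrelic/infrastructure-bundle:2.1.2",
            "environment": [
                {"name": "NRIA_IS_FORWARD_ONLY", "value": "true"},
                {"name": "FARGATE", "value": "true"},
            ],
        },
    ],
    "family": "FargatTest",
}

patched = use_secure_forward_only(task_def)
print(json.dumps(patched["containerDefinitions"][1]["environment"], indent=2))
```

The function works on a deep copy so the original definition can still be used to reproduce the issue side by side with the patched one.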

Your Environment

ECS Cluster

Additional context

@gsanchezgavier gsanchezgavier added the bug Categorizes issue or PR as related to a bug. label Mar 9, 2021
@paologallinaharbur
Member

I experienced the same issue when setting NRIA_IS_FORWARD_ONLY. In k8s the agent seems to work fine, sending 1-2 samples, and then gets stuck outputting the following:

[...]
time="2021-03-17T14:23:28Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled
time="2021-03-17T14:24:02Z" level=debug msg="Failed to detect the cloud type, retrying in 60.000000 seconds"
time="2021-03-17T14:24:28Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled
time="2021-03-17T14:25:05Z" level=debug msg="Failed to detect the cloud type, retrying in 60.000000 seconds"
time="2021-03-17T14:25:28Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled
time="2021-03-17T14:26:11Z" level=debug msg="Couldn't detect any known cloud, using no cloud type."
time="2021-03-17T14:26:29Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled
time="2021-03-17T14:27:29Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled
time="2021-03-17T14:28:29Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled

Switching back to NRIA_IS_SECURE_FORWARD_ONLY seems to solve the issue.
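
To check whether an agent is stuck in the same loop, the debug output can be summarized by message. The following is a minimal sketch, assuming the log lines keep the `key="value"` layout shown in the excerpt above. The helper names `parse_logfmt` and `summarize` are made up for this example, and the sample lines are copied from that excerpt.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split a key=value log line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def summarize(lines):
    """Count how many times each msg value appears."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if "msg" in fields:
            counts[fields["msg"]] += 1
    return counts


sample = [
    'time="2021-03-17T14:23:28Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled',
    'time="2021-03-17T14:24:02Z" level=debug msg="Failed to detect the cloud type, retrying in 60.000000 seconds"',
    'time="2021-03-17T14:24:28Z" level=debug msg="Unable to enable/disable OHI feature." component=FeatureFlagHandler enable=true error="cannot find cfg file for feature" feature_flag=docker_enabled',
]

counts = summarize(sample)
for msg, n in counts.most_common():
    print(n, msg)
```

If only a couple of messages repeat once a minute and no sample-related entries show up, the agent is likely in the state described here.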

@paologallinaharbur
Member

I am experiencing the same on EKS with NRIA_IS_FORWARD_ONLY: one sample is sent, then nothing, and the same error message appears.

@varas varas added this to Backlog in CAOS: Our Daily Bread via automation Jun 4, 2021
@varas varas moved this from Backlog to Ready For Review in CAOS: Our Daily Bread Jun 4, 2021
@varas
Contributor

varas commented Jun 21, 2021

Update from infraa-platform at https://newrelic.slack.com/archives/C5A2QGLKT/p1624016738224000

@varas varas self-assigned this Jun 28, 2021
@varas varas assigned varas and unassigned varas Jul 1, 2021
@varas varas linked a pull request Jul 9, 2021 that will close this issue
@varas
Contributor

varas commented Jul 29, 2021

This seems to be caused by forwarder mode blocking some telemetry data submissions. TL;DR: when forwarder mode is used, connect is disabled. Some payloads require the host ID, which is retrieved by connect, so that telemetry is blocked from submission.
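
The dependency described above can be illustrated with a toy model. This is not the agent's code: the class, field, and payload names and the host ID value are all invented for the example. It only models the blocking of payloads that need a host ID, not why the first sample got through.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Payload:
    name: str
    needs_host_id: bool


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the agent, only to show the dependency."""
    forward_only: bool
    host_id: Optional[str] = None
    sent: List[str] = field(default_factory=list)
    blocked: List[str] = field(default_factory=list)

    def start(self):
        # Assumption: connect is the only step that retrieves the host ID,
        # and forwarder mode skips it.
        if not self.forward_only:
            self.host_id = "host-123"  # made-up value

    def submit(self, payload):
        # Payloads that need the host ID cannot go out until it is known.
        if payload.needs_host_id and self.host_id is None:
            self.blocked.append(payload.name)
            return False
        self.sent.append(payload.name)
        return True


payloads = [Payload("needs-id", True), Payload("no-id", False)]

forwarder = ToyAgent(forward_only=True)
forwarder.start()
for p in payloads:
    forwarder.submit(p)
print("forwarder sent:", forwarder.sent, "blocked:", forwarder.blocked)

regular = ToyAgent(forward_only=False)
regular.start()
for p in payloads:
    regular.submit(p)
print("regular sent:", regular.sent, "blocked:", regular.blocked)
```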

This was addressed in #659.

I tested this task definition with the latest bundle release at the moment (https://github.com/newrelic/infrastructure-bundle/releases/tag/2.6.4) and got all EcsClusterSamples. See https://staging-one.newrelic.com/-/0bEjOyJgDw6

Could you verify that the issue is solved for you, @gsanchezgavier?

@varas
Contributor

varas commented Jul 29, 2021

We also added debug log entries for whenever a sample is truncated or dropped, so if that happens you should be able to see it in the logs. See https://github.com/newrelic/infrastructure-agent/pull/634/files

@varas
Contributor

varas commented Aug 2, 2021

Issue resolution confirmed via Slack.


3 participants