
Add "aws.ecs.task.id" detection to "resourcedetection" processor #8274

Open
mkielar opened this issue Mar 7, 2022 · 19 comments · Fixed by open-telemetry/semantic-conventions#597
Labels
comp:aws (AWS components) · enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed) · never stale (Issues marked with this label will be never staled and automatically removed) · priority:p2 (Medium) · processor/resourcedetection (Resource detection processor)

Comments

@mkielar
Contributor

mkielar commented Mar 7, 2022

Is your feature request related to a problem? Please describe.
I'm using OTEL as a sidecar with ECS Services. I use it to parse and filter the StatsD metrics that AppMesh/Envoy produces, and then I use emfexporter to put the metrics into CloudWatch via a CloudWatch Log Stream. This mostly works. However, when my ECS Service scales to multiple instances, I often see the following error in my logs:

2022-03-07T10:05:33.439Z	warn	cwlogs@v0.43.0/cwlog_client.go:84	cwlog_client: Error occurs in PutLogEvents, will search the next token and retry the request	
{
    "kind": "exporter",
    "name": "awsemf/statsd/envoy_metrics",
    "error": "InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49626189108498028449043455519612405404976381845984773650\n{\n  RespMetadata: {\n    StatusCode: 400,\n    RequestID: \"a24432dd-4d17-44ae-b245-3877cfffabb7\"\n  },\n  ExpectedSequenceToken: \"49626189108498028449043455519612405404976381845984773650\",\n  Message_: \"The given sequenceToken is invalid. The next expected sequenceToken is: 49626189108498028449043455519612405404976381845984773650\"\n}"
}

This is caused by a race condition: two nodes now write to the same log stream in CloudWatch, and they corrupt each other's sequenceToken, which the AWS API requires to put logs to CloudWatch.

Describe the solution you'd like
I was hoping to additionally configure the resourcedetection processor:

  "resource":
    "attributes":
    - "action": "insert"
      "from_attribute": "aws.ecs.task.id"
      "key": "TaskId"
    - "action": "insert"
      "from_attribute": "aws.ecs.task.arn"
      "key": "TaskARN"
  "resourcedetection":
    "detectors":
    - "env"
    - "ecs"

so that I would be able to use the {TaskId} dynamic field when configuring emfexporter, like this:

"awsemf/statsd/envoy_metrics":
    "dimension_rollup_option": "NoDimensionRollup"
    "log_group_name": "/aws/ecs/dev/hello-world"
    "log_stream_name": "emf/otel/statsd/envoy_metrics/{TaskId}"
    "namespace": "dev/AppMeshEnvoy"

However, when I run my service, I can see that only the following is detected by resourcedetection:

2022-03-07T13:11:17.808Z	info	internal/resourcedetection.go:139	detected resource information	
{
    "kind": "processor",
    "name": "resourcedetection",
    "resource": {
        "aws.ecs.cluster.arn": "arn:aws:ecs:eu-west-1:506501033716:cluster/dev",
        "aws.ecs.launchtype": "fargate",
        "aws.ecs.task.arn": "arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc",
        "aws.ecs.task.family": "dev-hello-world",
        "aws.ecs.task.revision": "43",
        "cloud.account.id": "506501033716",
        "cloud.availability_zone": "eu-west-1a",
        "cloud.platform": "aws_ecs",
        "cloud.provider": "aws",
        "cloud.region": "eu-west-1"
    }
}

Describe alternatives you've considered
I tried to use TaskARN, but that just led to no log stream being created at all. Most likely, the reason is that task ARNs contain characters that are illegal in a log stream name, so the emfexporter fails silently, unable to create one.

Additional context
N/A.

@mkielar
Contributor Author

mkielar commented Mar 7, 2022

I have found a working workaround: I configure the resource processor with an extract action, like this:

  "resource":
    "attributes":
    - "action": "extract"
      "key": "aws.ecs.task.arn"
      "pattern": "^arn:aws:ecs:(?P<Region>.*):(?P<AccountId>.*):task/(?P<ClusterName>.*)/(?P<TaskId>.*)$"

Then, I configure emfexporter like this:

"awsemf/statsd/envoy_metrics":
    "dimension_rollup_option": "NoDimensionRollup"
    "log_group_name": "/aws/ecs/dev/hello-world"
    "log_stream_name": "emf/otel/statsd/envoy_metrics/{TaskId}"
    "namespace": "dev/AppMeshEnvoy"

This makes emfexporter create a separate CloudWatch log stream for each task.

Still, not having to do this would be preferable, especially since the docs for emfexporter explicitly state that both TaskId and aws.ecs.task.id are accepted for {TaskId} references in the log stream name.
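
For anyone who wants to sanity-check the pattern, here is a small standalone Go snippet (just an illustration, not collector code) that applies the same named-group regex to the sample task ARN from the log output above:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The same pattern used in the "extract" action of the resource processor above.
	pattern := regexp.MustCompile(`^arn:aws:ecs:(?P<Region>.*):(?P<AccountId>.*):task/(?P<ClusterName>.*)/(?P<TaskId>.*)$`)

	arn := "arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc"

	match := pattern.FindStringSubmatch(arn)
	if match == nil {
		fmt.Println("ARN did not match")
		return
	}

	// Print each named capture group and its value.
	for i, name := range pattern.SubexpNames() {
		if name != "" {
			fmt.Printf("%s = %s\n", name, match[i])
		}
	}
	// Prints, among others: TaskId = 1a8d528834e046b183d4913feeaa16bc
}
```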

@mx-psi mx-psi added comp:aws AWS components enhancement New feature or request labels Mar 7, 2022

@github-actions
Contributor

github-actions bot commented Nov 8, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 8, 2022
@mkielar
Contributor Author

mkielar commented Nov 16, 2022

Bad, bad bot :P
The issue is still relevant, and I'd still like to see it implemented. I believe explicitly exporting aws.ecs.task.id makes sense and improves the internal consistency of the whole software package, because it lets various components within the package (like emfexporter) cooperate without extra configuration steps.

@tnsardesai

We also have a similar setup with App Mesh + ECS Fargate and are seeing the same issue. Thanks for also including the workaround, @mkielar. I agree that aws.ecs.task.id should be explicitly exported, especially because it is listed here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/awsemfexporter#exporter-configuration.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jan 18, 2023
@mkielar
Contributor Author

mkielar commented Jan 18, 2023

Bad bot. The issue is still relevant and everything I wrote on Nov 16th still holds. Pinging @open-telemetry/collector-contrib-triagers, as suggested (hahah, it doesn't work :P).

@fatsheep9146 fatsheep9146 added processor/resourcedetection Resource detection processor and removed Stale labels Feb 1, 2023
@github-actions
Contributor

github-actions bot commented Feb 1, 2023

Pinging code owners for processor/resourcedetection: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

github-actions bot commented Apr 3, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 3, 2023
@mkielar
Contributor Author

mkielar commented Apr 3, 2023

I guess I'm just gonna keep it alive then ;)

@jpkrohling jpkrohling removed the Stale label Apr 5, 2023
@github-actions
Contributor

github-actions bot commented Jun 5, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 5, 2023
@mkielar
Contributor Author

mkielar commented Jun 5, 2023

Bad bot :P
Pinging code owners: @Aneurysm9 @dashpole

@github-actions github-actions bot removed the Stale label Jun 6, 2023
@github-actions
Contributor

github-actions bot commented Aug 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 7, 2023
@jpkrohling
Member

Code owners, are you able to provide feedback on this one?

cc @Aneurysm9 @dashpole

@jpkrohling jpkrohling removed the Stale label Aug 22, 2023
@bryan-aguilar
Contributor

I don't see any issues with this. Having aws.ecs.task.id could be useful in more than just this specific use case and would prevent users from having to do the extraction themselves as mentioned earlier.

@mkielar, would you be willing to contribute a PR for this? I would expect the implementation to extract the task ID out of the task ARN, similar to your regex.
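
As a rough illustration of that suggestion (a sketch under my own assumptions, with a hypothetical helper name and regex, not the contributed implementation), the extraction could look something like this in Go:

```go
package main

import (
	"fmt"
	"regexp"
)

// taskARNPattern is a hypothetical regex, similar in spirit to the workaround above:
// it accepts task ARNs both with and without a cluster-name segment.
var taskARNPattern = regexp.MustCompile(`^arn:aws:ecs:[^:]+:[^:]+:task/(?:[^/]+/)?(?P<TaskId>[^/]+)$`)

// taskIDFromARN is a hypothetical helper that returns the task ID portion of an ECS task ARN.
func taskIDFromARN(arn string) (string, bool) {
	m := taskARNPattern.FindStringSubmatch(arn)
	if m == nil {
		return "", false
	}
	return m[len(m)-1], true
}

func main() {
	id, ok := taskIDFromARN("arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc")
	fmt.Println(id, ok) // 1a8d528834e046b183d4913feeaa16bc true
}
```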

@bryan-aguilar bryan-aguilar added help wanted Extra attention is needed priority:p2 Medium labels Aug 22, 2023
@bryan-aguilar bryan-aguilar added the good first issue Good for newcomers label Aug 22, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • issue: Github issue template generation code needs this to generate the corresponding labels.
  • processor/resourcedetection: @Aneurysm9 @dashpole

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mkielar
Contributor Author

mkielar commented Dec 1, 2023

Hi @bryan-aguilar, I finally found time to learn Go and made an attempt at implementing this feature. The PR is here: #29602, please have a look.

Also, pinging code owners: @Aneurysm9 @dashpole

@github-actions github-actions bot removed the Stale label Dec 2, 2023
mx-psi pushed a commit that referenced this issue Dec 12, 2023
…r ECS Tasks (#8274) (#29602)

**Description:**
The `resourcedetection` processor now populates the `aws.ecs.task.id` property (in addition to the other `aws.ecs.task.*` properties). This simplifies the configuration of `awsemfexporter`, which internally looks up `aws.ecs.task.id` when the `TaskId` placeholder is used in the `loggroup` / `logstream` name template.

**Link to tracking Issue:**
#8274

**Testing:** ECS task ARNs come in two versions. In the old one, the last part of the ARN contains only `task/<task-id>`; in the new one, it contains `task/<cluster-name>/<task-id>`. Implementation and unit tests have been added to handle both cases.

**Documentation:** `README.md` now also mentions `aws.ecs.task.id` as an inferred property for ECS.
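
To make the two ARN layouts mentioned in the testing note concrete, here is a standalone Go snippet (an illustration, not the PR's code) showing that in both versions the task ID is simply the last "/"-separated segment:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	arns := []string{
		// Old format: the resource part is task/<task-id>.
		"arn:aws:ecs:eu-west-1:506501033716:task/1a8d528834e046b183d4913feeaa16bc",
		// New format: the resource part is task/<cluster-name>/<task-id>.
		"arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc",
	}
	for _, arn := range arns {
		parts := strings.Split(arn, "/")
		fmt.Println(parts[len(parts)-1]) // both print 1a8d528834e046b183d4913feeaa16bc
	}
}
```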
@github-actions
Contributor

github-actions bot commented Feb 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 1, 2024
@mx-psi mx-psi removed the Stale label Feb 1, 2024
@github-actions
Contributor

github-actions bot commented Apr 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 2, 2024
@mx-psi mx-psi added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Apr 2, 2024