
New component: Github Actions Receiver #27460

Open · 1 of 2 tasks
krzko opened this issue Oct 6, 2023 · 44 comments

Labels: Accepted Component (New component has been sponsored)

Comments

@krzko commented Oct 6, 2023

The purpose and use-cases of the new component

The GitHub Actions Receiver processes GitHub Actions webhook events to observe workflows and jobs. It handles workflow_job and workflow_run event payloads, transforming them into trace telemetry.

Each GitHub Actions workflow or job, along with its steps, is converted into trace spans, allowing observation of workflow execution times and success/failure rates.

If a secret is configured (recommended), the receiver validates the payload, ensuring data integrity before processing.
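As a rough illustration, here is a minimal sketch in Go of how such validation typically works, following GitHub's documented HMAC-SHA256 `X-Hub-Signature-256` scheme; the function name is hypothetical, not the receiver's actual API:

```go
// Minimal sketch of GitHub webhook signature validation. Assumes the
// documented X-Hub-Signature-256 header; names are illustrative only.
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// validSignature recomputes the HMAC-SHA256 of the payload with the
// configured secret and compares it to the header GitHub sent.
func validSignature(payload []byte, secret, header string) bool {
	sig, ok := strings.CutPrefix(header, "sha256=")
	if !ok {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(payload)
	expected := hex.EncodeToString(mac.Sum(nil))
	// hmac.Equal compares in constant time, avoiding timing leaks.
	return hmac.Equal([]byte(sig), []byte(expected))
}
```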

Example configuration for the component

```yaml
receivers:
  githubactions:
    endpoint: 0.0.0.0:443
    path: /ghaevents
    secret: YourSecr3t
    tls:
      certfile: /path/to/cert
      keyfile: /path/to/key
```

Telemetry data types supported

traces

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

Multi Job

(screenshot of a multi-job trace)

Matrix Strategy

(screenshot of a matrix-strategy trace)

Deterministic Step Spans

(screenshot of deterministic step spans)
krzko added the "needs triage" and "Sponsor Needed" labels on Oct 6, 2023
@bryan-aguilar (Contributor)

Hi @krzko,

Thank you for the new component proposal. If you have not already, please make sure you review the new component guidelines.

If you have not found a volunteer sponsor yet, I encourage you to come to our weekly collector SIG meetings. You can add an item to the agenda to discuss this new component proposal.

bryan-aguilar removed the "needs triage" label on Oct 13, 2023
@krzko (Author) commented Oct 15, 2023

Thanks @bryan-aguilar for the heads up.

No sponsor as of yet, so I'll have to add an agenda item for this component and try to make the EU-APAC meeting, as I'm based out of Australia.

@krzko (Author) commented Nov 1, 2023

Jumped into the Collector SIG (EU-APAC) meeting today, but nobody was around 🤷

Not looking forward to doing a 3AM call for the main SIG meeting.

@bryan-aguilar (Contributor)

What experience have you had with long-running workflows/jobs? We have some workflows that will run for 3+ hours.

Are there any examples of what a trace looks like for jobs that heavily use job matrices? Another scenario my team is in is that we have one job that will fire off 100+ jobs based on its matrix config at the time.

@krzko (Author) commented Nov 15, 2023

Matrix strategies are represented as traces quite well in all the o11y backends that I've used, so I think this use case is covered.

I use the workflow_run event as the root span; for some long-running workflows your o11y backend might report a missing root span if you view the trace before the workflow run has completed. We emit the root span when the status is completed, and then we set the span status based on the conclusion.
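As a sketch of that mapping (illustrative only; the exact conclusion-to-status mapping assumed here is not the receiver's actual code), a completed run's conclusion might translate to a span status like this:

```go
// Sketch of mapping a workflow_run conclusion onto a span status using
// the collector's pdata API; the mapping choices here are assumptions.
package main

import "go.opentelemetry.io/collector/pdata/ptrace"

func setSpanStatus(span ptrace.Span, conclusion string) {
	switch conclusion {
	case "success":
		span.Status().SetCode(ptrace.StatusCodeOk)
	case "failure", "timed_out", "startup_failure":
		span.Status().SetCode(ptrace.StatusCodeError)
		span.Status().SetMessage(conclusion)
	default: // e.g. cancelled, skipped, neutral
		span.Status().SetCode(ptrace.StatusCodeUnset)
	}
}
```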

For the 100+ job scenario, that might be a rather large waterfall view; I have been toying with potentially using span links, but so far I'm keeping it simple.

@adrielp (Contributor) commented Nov 15, 2023

Just curious here @krzko - had you tried the webhook receiver? We were playing around with event logging through GitHub Apps leveraging the webhook receiver from OTel, but from what I recall, we found two main issues:

  1. support for auth extensions
  2. flattening of data support in the transform processor

It's been a while, so this is from memory, but curious if you had tried/experimented with that.

Additionally, I'm wondering if you've tried tracing at the runner level instead of using event logs to infer traces after the fact. We had found that in order to accomplish tracing at the runner level, we'd have to use env vars as propagators across workflows (which isn't in the spec yet, but is currently being looked at as an OTEP).

One benefit of doing it at the runner level is being able to see exactly when the runner starts up, etc.

@krzko (Author) commented Nov 15, 2023

@adrielp I've not had a chance to play around with the extensions as of yet, or to apply transforms to the data that we emit. The receiver is based on the standard internal components, so there is no reason why that would not work.

The design was based on the Zipkin (HTTP) receiver, so whatever that supports, we would likewise support.

With respect to tracing from the runner, we looked into that as well and went down another route, as we didn't want the chore of updating user workflows for tracing to work.

But to complement this receiver, if you want additional telemetry within the steps: since we create deterministic IDs for traces and spans, I also wrote a run-with-telemetry action that uses the receiver's step span as the parent ID and emits the associated telemetry. It's already working quite well for us.

With that design, I'm injecting a bunch of env vars into the shell, such as TRACEPARENT amongst others, so if you use other tools like the excellent otel-cli by @tobert, it'll just work™. Have a look at this screenshot and note the otel-cli-curl service.

(screenshot: trace view showing the otel-cli-curl service)

This provides fine-grained timings within a step, which sadly GitHub doesn't offer, as they only use seconds as the unit of measure.
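To illustrate the deterministic-ID idea (a hypothetical sketch; the actual derivation scheme isn't specified in this thread), both the receiver and the action could hash the same stable GitHub identifiers and independently arrive at the same trace ID:

```go
// Hypothetical sketch: both sides hash stable identifiers (here, run ID
// and attempt) so they independently derive the same 16-byte trace ID.
package main

import (
	"crypto/sha256"
	"fmt"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

func deterministicTraceID(runID int64, attempt int) pcommon.TraceID {
	hash := sha256.Sum256([]byte(fmt.Sprintf("%d:%d", runID, attempt)))
	var id [16]byte
	copy(id[:], hash[:16]) // trace IDs are 16 bytes
	return pcommon.TraceID(id)
}
```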

@andrzej-stencel (Member)

Anything stopping us from naming it simply "GitHub Actions" receiver instead of "GitHub Actions Event" receiver?

@krzko (Author) commented Nov 16, 2023

No, there isn't; we can simplify it to that. I'll make it so.

There was some reason I used that name, but the reasoning escapes me at the moment 🤷🏻

krzko changed the title from "New component: Github Actions Event Receiver" to "New component: Github Actions Receiver" on Nov 16, 2023
@adrielp (Contributor) commented Nov 17, 2023

Thanks for the response on that @krzko! I still maintain my earlier statement from the SIG that this is a great step in the direction of CI/CD observability 😄

My one caveat here is that to do distributed tracing, the CI/CD system itself needs to inject the carriers such that all pieces in the pipeline can leverage them. The Jenkins plugin does a good job of this, even though environment variables as carriers aren't part of the spec. Once they do become part of the spec, I think we're going to see vendors enable that support more broadly, which would enable the GitHub runner to instantiate and propagate the carriers to all steps, enabling distributed tracing. GitLab has had a related issue open for years on this. Some of our folks did a quick PoC of what that could look like at the runner level when using GitLab.
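To make the carrier idea concrete, here's a minimal sketch (an assumed pattern, not the OTEP's final shape) of extracting a trace context from a TRACEPARENT environment variable using the standard W3C propagator:

```go
// Sketch of env-vars-as-carriers: wrap the TRACEPARENT env var in a
// carrier and let the W3C TraceContext propagator extract it.
package main

import (
	"context"
	"os"

	"go.opentelemetry.io/otel/propagation"
)

func contextFromEnv(ctx context.Context) context.Context {
	carrier := propagation.MapCarrier{
		"traceparent": os.Getenv("TRACEPARENT"),
	}
	return propagation.TraceContext{}.Extract(ctx, carrier)
}
```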

It's possible that some time in the (potentially "near" but still speculative) future this might become OBE (overcome by events). However, in the real world it definitely has value today!

I'd be curious to know @astencel-sumo's & others' thoughts on that.

@adrielp (Contributor) commented Nov 17, 2023

Re: the name

Personally I'd call it the githubworkflowevents receiver due to the data it's ingesting and how it lines up with the terminology on the GitHub side. But that's a SUPER long name, and hard to type 😅.

@krzko (Author) commented Nov 23, 2023

I've renamed and refactored from githubactionseventreceiver to githubactionsreceiver via krzko@8c8d9c8.

Keeping the name short and succinct as per @astencel-sumo's suggestion, as opposed to the long descriptive name.

The only thing I see outstanding is adding more test cases. I will look to add them in the next couple of days.

Built and deployed internally. Working as expected.

@adrielp (Contributor) commented Dec 7, 2023

Just a heads up @krzko - we talked about this in yesterday's SIG, and a sponsor has to be someone who is an approver or a maintainer in the OTel Collector Contrib repo. I'm neither, so I can't sponsor at this time. Apologies for the confusion on my end there.

@TylerHelmuth (Member) commented Mar 13, 2024

I very much want a component like this, if accepted, to be in alignment with the CICD Working Group. @adrielp would you be willing to be an additional code owner with @krzko if the component is accepted?

@krzko (Author) commented Mar 14, 2024

I've added a public image of this component via this repo https://github.com/krzko/otelcol-distributions, built using ocb.

Here's the direct link to the GHCR image - https://github.com/krzko/otelcol-distributions/pkgs/container/otelcol-distributions%2Fgithubactions

@adrielp (Contributor) commented Mar 14, 2024

@TylerHelmuth Yes, I'd be willing to be an additional code owner. @krzko has been very active in soliciting comments from the folks in #otel-cicd from day one and working to be part of the convention initiative, so I definitely think that as the conventions evolve, so will this.

My somewhat hot take is that components like this would ideally not be needed in the future; SCM vendors like GitHub / GitLab would just provide tracing out of the box at the runner level, emitting over OTLP, and there wouldn't be post-processing of event logs needed to build traces through a receiver.

We have both GitHub members & GitLab members in the CI/CD WG, so maybe in the future that'll happen. Even if that end state is reached, I think there's value in the industry right now for components like this, and hopefully this will help accelerate that end goal as it evolves alongside the conventions.

The expectation, if this gets accepted, is still to follow the fundamental process of multiple pull requests to break down the cognitive overhead of reviewing them, correct?

@TylerHelmuth (Member)

> The expectation, if this gets accepted, is still to follow the fundamental process of multiple pull requests to break down the cognitive overhead of reviewing them, correct?

Correct.

> My somewhat hot take is that components like this would ideally not be needed in the future; SCM vendors like GitHub / GitLab would just provide tracing out of the box at the runner level, emitting over OTLP, and there wouldn't be post-processing of event logs needed to build traces through a receiver.

Agreed, this is something that makes me hesitant to put it directly in Contrib. Luckily @krzko is hosting the component already for others if they want it.

@TylerHelmuth (Member)

@krzko does this component handle measuring queue times?

@TylerHelmuth (Member)

I will sponsor this component. I believe it could end up being used to observe our actions here in Contrib.

@krzko please move forward with the first PR outlined in CONTRIBUTING.md.

TylerHelmuth added the "Accepted Component" label and removed the "Sponsor Needed" label on Mar 15, 2024
@krzko (Author) commented Mar 15, 2024

Thanks @TylerHelmuth, much appreciated.

I'll start the process outlined over the weekend.

Yes, we will be able to derive queue times (waiting for a runner to pick up a job) based on the implemented span attributes.
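As a simple illustration (assuming the webhook payload's created_at and started_at timestamps are the inputs; the function name is made up), queue time is just the gap between job creation and runner pickup:

```go
// Sketch: queue time for a workflow_job is the gap between when GitHub
// created the job (created_at) and when a runner started it (started_at).
package main

import "time"

func queueDuration(createdAt, startedAt time.Time) time.Duration {
	return startedAt.Sub(createdAt)
}
```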

@an-mmx (Contributor) commented Apr 26, 2024

Hey there.
I'm also interested in this receiver and would like to use it in my project. Any updates regarding release dates? (no push, just curious)

@krzko (Author) commented Apr 30, 2024

> Hey there. I'm also interested in this receiver and would like to use it in my project. Any updates regarding release dates? (no push, just curious)

Some refactoring needed to be done prior to submitting. That has now been done, so I will move ahead with it.

In the meantime, you can use the component via a custom collector build:

```sh
docker pull ghcr.io/krzko/otelcol-distributions/githubactions:0.99.1
```

The PR readme has the steps for getting it configured.

@an-mmx (Contributor) commented Apr 30, 2024

Thanks for the update!

@adrielp (Contributor) commented May 30, 2024

@krzko - any update on getting the first skeleton pull request opened up? Wanted to make sure I hadn't missed an update here.

@krzko (Author) commented Jun 12, 2024

> @krzko - any update on getting the first skeleton pull request opened up? Wanted to make sure I hadn't missed an update here.

@adrielp Started on it, and then #life. I'll see if I can restart the effort again.

@adrielp (Contributor) commented Aug 8, 2024

@TylerHelmuth - are you still willing to sponsor this component? I talked to @krzko and got permission to go ahead and make the contribution myself.

@TylerHelmuth (Member)

Yes

@adrielp (Contributor) commented Aug 20, 2024

@TylerHelmuth - I'm curious about your thoughts on potentially combining components.

@TylerHelmuth (Member)

Are both components pulling data from GitHub?

@adrielp (Contributor) commented Aug 20, 2024

> Are both components pulling data from GitHub?

@TylerHelmuth To an extent.

Currently the VCS*Receiver (formerly the Git Provider Receiver, being renamed to better match the new CICD Semantic Conventions) only pulled repository-focused metrics from GitHub. Since the receiver structure is similar to that of the host metrics receiver, the originally developed version of the receiver includes a scraper for GitLab as well. There is precedent for having more than metric-scraping functionality within a component: the mongodbatlas receiver scrapes the API for metrics, receives events, and receives alerts all within the same component. This component would take care of the event receiving and trace creation. The good news is that GitHub and GitLab are somewhat similar in implementation for both scraping & event receiving (others like Bitbucket are too).

Assuming it's reasonable to have the receiver support multiple vendors' events, and given the changes in and coming to SemConv (alongside the need for more built prototypes), this could be a reasonable combination.

@TylerHelmuth (Member)

> Assuming it's reasonable to have the receiver support multiple vendors' events

There are probably risks here, since the receiver would need to maintain multiple logic flows for interacting with the different APIs/data models. In my mind it is safest if there were a receiver just for grabbing telemetry from GitHub.

I don't know this space well enough to feel strongly about that opinion. I'd be curious what @andrzej-stencel thinks.

@adrielp (Contributor) commented Aug 21, 2024

> There are probably risks here, since the receiver would need to maintain multiple logic flows for interacting with the different APIs/data models. In my mind it is safest if there were a receiver just for grabbing telemetry from GitHub.

I agree. With scraping it was easy to handle: same metadata metrics, just a uniquely named scraper.

I was thinking at a high level, because the implementations are similar, that the model could be the same, but one would configure webhooks separately, like is done with the scraper. I can bring this up in the collector SIG today to talk through it more. The reason I'm thinking about it now is that 1) I'm renaming the git provider receiver and the question got brought up there, and 2) my next task was to contribute the skeleton for the first iteration of the GitHub Actions receiver.

@TylerHelmuth (Member)

The outcome of the SIG discussion today was to split the gitprovider receiver into GitHub- and GitLab-specific receivers and then add this trace/log functionality to the GitHub-specific receiver.

@msarahan

@adrielp are you actively working on this? I want to use this feature, and I am happy to help work on the code to help move things forward. If you have work in progress that I can contribute to, please point me to it.

@adrielp (Contributor) commented Aug 30, 2024

> @adrielp are you actively working on this? I want to use this feature, and I am happy to help work on the code to help move things forward. If you have work in progress that I can contribute to, please point me to it.

@msarahan - I haven't started yet. We recently decided to combine this functionality into the gitprovider receiver, which is being renamed (and repurposed) to be the github receiver, providing a perfect landing spot for this functionality. My order of operations is currently:

  • get that rename merged in
  • promote it to be an alpha component (and do some minor additional cleanups)
  • add the skeleton of the tracing portion into the receiver
  • iterate and bring that functionality in

With all that said, I'm super happy to get help, so if you'd like to make a skeleton PR w/ the functionality into the GitHub receiver and get things rolling quickly, feel free! If not, I'll hopefully have that skeleton up and going next week, ready for iterations.

TylerHelmuth pushed a commit that referenced this issue Sep 3, 2024
**Description:**

Adding @TylerHelmuth (GitHub Actions Receiver component sponsor) as
codeowner in the GitHub Receiver after the decision to incorporate the
tracing and log capability mentioned in #27460 into the GitHub receiver
was made.