Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get logs for all pipelines, even if they're not running #9749

Merged
merged 2 commits into from Feb 20, 2024

Conversation

jrockway
Copy link
Member

@jrockway jrockway commented Feb 16, 2024

The current algorithm for determining which pods to get logs for involves looking at running pods. This is not correct for pipelines; a pipeline could be autoscaled to 0. This PR adjusts the algorithm to add in any pipelines that don't currently have pods to the lokiLogs request. For example:

$ pachctl debug template
...
    describes:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    logs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    lokiLogs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: default/montage
          pipeline:
            name: montage
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
...

Montage has autoscaling and isn't currently up; it now appears in lokiLogs, but not logs or describes.

@jrockway jrockway force-pushed the jonathan/core-2144-get-logs-for-all-pipelines branch from 298d6b5 to d70c737 Compare February 17, 2024 00:02
Copy link

codecov bot commented Feb 17, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (435af77) 59.10% compared to head (c78845c) 59.16%.

Files Patch % Lines
src/server/debug/server/server.go 96.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9749      +/-   ##
==========================================
+ Coverage   59.10%   59.16%   +0.05%     
==========================================
  Files         582      582              
  Lines       69880    69904      +24     
==========================================
+ Hits        41304    41358      +54     
+ Misses      27983    27964      -19     
+ Partials      593      582      -11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

func (s *debugServer) listApps(ctx context.Context) (_ []*debug.App, retErr error) {
// list apps returns a list of running apps, and a list of apps which may possibly exist. The
// intent is to use the first result for things like "kubectl describe" and the second for getting
// loki logs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be refactored? Might it be clearer to merge podApps (which would be the old listApps and pipelineApps (which would generate all apps for all known pipelines), rather than in GetDumpV2Template listing pipelines, passing that into listApps, then in listApps iterating over running pods, deleting a pipeline pod’s associated pod from the list of pipelines, then creating an app for each remaining pipeline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be implemented in some completely different way; running pods should only affect logs, loki should be queried to enumerate all logs that loki can get, etc. But this is a release blocker because everyone is using autoscaling now and all the debug dumps that people are generating are useless, so I'm trying to keep it as minimal as possible for the backport.

@jrockway jrockway force-pushed the jonathan/core-2144-get-logs-for-all-pipelines branch from d70c737 to c78845c Compare February 20, 2024 16:32
Copy link
Contributor

@acohen4 acohen4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jrockway jrockway merged commit 21e224b into master Feb 20, 2024
21 checks passed
jrockway added a commit that referenced this pull request Feb 20, 2024
The current algorithm for determining which pods to get logs for
involves looking at running pods. This is not correct for pipelines; a
pipeline could be autoscaled to 0. This PR adjusts the algorithm to add
in any pipelines that don't currently have pods to the lokiLogs request.
For example:

```
$ pachctl debug template
...
    describes:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    logs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    lokiLogs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: default/montage
          pipeline:
            name: montage
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
...
```
Montage has autoscaling and isn't currently up; it now appears in
lokiLogs, but not logs or describes.
acohen4 pushed a commit that referenced this pull request Feb 21, 2024
… (#9750)

The current algorithm for determining which pods to get logs for
involves looking at running pods. This is not correct for pipelines; a
pipeline could be autoscaled to 0. This PR adjusts the algorithm to add
in any pipelines that don't currently have pods to the lokiLogs request.
For example:

```
$ pachctl debug template
...
    describes:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    logs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    lokiLogs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: default/montage
          pipeline:
            name: montage
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
...
```
Montage has autoscaling and isn't currently up; it now appears in
lokiLogs, but not logs or describes.
zmajeed pushed a commit that referenced this pull request Feb 21, 2024
The current algorithm for determining which pods to get logs for
involves looking at running pods. This is not correct for pipelines; a
pipeline could be autoscaled to 0. This PR adjusts the algorithm to add
in any pipelines that don't currently have pods to the lokiLogs request.
For example:

```
$ pachctl debug template
...
    describes:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    logs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
    lokiLogs:
        - name: console
        - name: default/edges
          pipeline:
            name: edges
            project: default
        - name: default/montage
          pipeline:
            name: montage
            project: default
        - name: etcd
        - name: pachd
        - name: pachyderm-kube-event-tail
        - name: pachyderm-proxy
        - name: pg-bouncer
...
```
Montage has autoscaling and isn't currently up; it now appears in
lokiLogs, but not logs or describes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants