must-gather: Add kernel level logs #893

rajatcing · 2020-11-11T12:31:24Z

Signed-off-by: RAJAT SINGH <rajasing@redhat.com>
Fixes #840

rajatcing · 2020-11-11T12:32:17Z

@jarrpa I have a few questions that need clarification.

must-gather/collection-scripts/gather_clusterscoped_resources

rajatcing · 2020-11-12T12:05:02Z

Collection from the /var/log paths works, but the script keeps looping for one of the pods:

sent 138 bytes  received 160035 bytes  24642.00 bytes/sec
total size is 159588  speedup is 1.00
collecting finished for node ip-10-0-131-8.ec2.internal
collecting kernel logs  from node ip-10-0-244-172.ec2.internal
Starting pod/ip-10-0-244-172ec2internal-debug ...
To use host binaries, run `chroot /host`
waiting for the debug pod to be in ready state
waiting for the debug pod to be in ready state
waiting for the debug pod to be in ready state
waiting for the debug pod to be in ready state
waiting for the debug pod to be in ready state
waiting for the debug pod to be in ready state

cc @crombus
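The wait loop in the log above never gives up. A minimal sketch of a bounded wait, assuming the readiness check can be expressed as a single command; the helper name `wait_until_ready` is hypothetical and not taken from the PR:

```shell
#!/bin/sh
# Retry a readiness check a bounded number of times instead of looping forever.
# Hypothetical helper; the check command is passed as the remaining arguments.
wait_until_ready() {
    local max_tries=$1   # give up after this many attempts
    local delay=$2       # seconds to sleep between attempts
    shift 2              # "$@" is now the check command
    local try=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$try" -ge "$max_tries" ]; then
            echo "debug pod not ready after ${max_tries} attempts, giving up" >&2
            return 1
        fi
        echo "waiting for the debug pod to be in ready state (attempt ${try}/${max_tries})"
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For example, `wait_until_ready 30 2 oc wait --for=condition=Ready "pod/${debug_pod}" --timeout=5s || continue` would skip a node whose debug pod never becomes ready instead of blocking the whole run; `debug_pod` is an assumed variable name.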

rajatcing · 2020-11-12T12:07:34Z

/hold

crombus

Thanks for the PR.

must-gather/collection-scripts/gather_clusterscoped_resources

crombus

Based on the comments on the bug: create a function for kernel-level logs that collects the whole /var/log folder using rsync, plus the output of `journalctl -o verbose`. Call that function inside crash_collection() before the rsync command, and make sure it runs as a separate process. This will work fine for now.
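One way to read "runs as a separate process" is to start the new function in the background and wait for it before crash_collection() returns. A minimal sketch, assuming POSIX sh; the helpers `start_collection` and `wait_for_collections` and the `COLLECTION_PIDS` variable are hypothetical:

```shell
#!/bin/sh
# Run collection steps as separate background processes and wait for them.
# Hypothetical helpers; PIDs are kept in a space-separated list.
COLLECTION_PIDS=""

# Start "$@" in the background and remember its PID.
start_collection() {
    "$@" &
    COLLECTION_PIDS="${COLLECTION_PIDS} $!"
}

# Wait for every background step; return non-zero if any of them failed.
wait_for_collections() {
    local rc=0 pid
    for pid in $COLLECTION_PIDS; do
        wait "$pid" || rc=1
    done
    COLLECTION_PIDS=""
    return "$rc"
}
```

Inside crash_collection(), the kernel-log function would be launched with `start_collection` before the existing rsync, and `wait_for_collections` called at the end, so both transfers overlap but neither is left running when the script moves on.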

rajatcing · 2020-11-24T05:45:37Z

That makes sense, but again, I'm hitting an issue that you may need to take a look at.

rajatcing · 2020-11-24T05:54:14Z

I added some code for you to take a look at 👍

must-gather/collection-scripts/gather_ceph_resources

rajatcing · 2020-12-08T19:45:59Z

@crombus While I see what you're going for with requesting the functions be moved to the cluster-scoped file, I think it makes sense to leave them in the Ceph file. Though they operate on Nodes, which are cluster-scoped, they are being run for the purpose of collecting system-level information to troubleshoot Ceph problems specifically. And we certainly want to avoid looping through nodes more times than is strictly necessary.

Right, Jose, that is what I was thinking. It does not make sense to start debug pods twice, once for journal collection and again for Ceph-related collection. We can keep it as it is for now, and I will start making the changes I outlined above in upcoming PRs.

crombus · 2020-12-09T14:12:36Z

@rajatsing I tested this locally and hit some strange issues:

[must-gather-5mnnl] POD rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1650) [Receiver=3.1.2]
[must-gather-5mnnl] POD rsync: [Receiver] write error: Broken pipe (32)
[must-gather-5mnnl] POD error: exit status 23

rajatcing · 2020-12-09T14:25:26Z

I believe these errors occur because the source path does not exist:

[must-gather-h7hzm] POD rsync: link_stat "/host/var/log/sysstat" failed: No such file or directory (2)
[must-gather-h7hzm] POD rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1659) [Receiver=3.1.3]
[must-gather-h7hzm] POD rsync: [Receiver] write error: Broken pipe (32)
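rsync exits with code 23 ("partial transfer due to error") when a source such as /host/var/log/sysstat is missing. A minimal sketch that filters the source list before calling rsync, assuming the existence check can run where the files live; `path_exists` and `existing_paths` are hypothetical helpers:

```shell
#!/bin/sh
# Filter a list of source paths down to the ones that exist, so a single
# missing directory does not make the whole rsync fail with exit code 23.

# Existence check. This runs locally; against a debug pod it could be swapped
# for something like: oc exec "$debug_pod" -- test -e "$1"  (hypothetical variable)
path_exists() {
    test -e "$1"
}

# Print each existing path on its own line; report skipped paths on stderr.
existing_paths() {
    local p
    for p in "$@"; do
        if path_exists "$p"; then
            printf '%s\n' "$p"
        else
            echo "skipping ${p}: no such file or directory" >&2
        fi
    done
}
```

Only the printed paths would then be handed to rsync, so optional directories can be listed without breaking collection on nodes that lack them. The line-based output assumes paths without embedded newlines.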

rajatcing · 2020-12-10T10:43:12Z

I don't understand why CI fails when `make ocs-operator-ci` passes locally. Is there an issue with the CI? It also doesn't give me a proper error message to work with.

error: some steps failed:
  * could not run steps: step ocs-operator-ci failed: test "ocs-operator-ci" failed: the pod ci-op-rxlt4gb8/ocs-operator-ci was deleted without completing after 30s (failed containers: )
time="2020-12-10T10:17:07Z" level=info msg="Reporting job state 'failed' with reason 'executing_graph:step_failed:running_pod'"

crombus

Overall looks good. Just one request: I think you can merge the two functions journal_collection and kernel_collections.
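Merging the two functions also means each node needs only one debug pod, as discussed above. A minimal sketch, assuming journal_collection ran `journalctl -o verbose` and kernel_collections gathered kernel messages (shown here as `journalctl -k`); the names `node_exec`, `node_log_collection`, `collect_all_nodes` and the output file names are hypothetical:

```shell
#!/bin/sh
# One debug session per node that gathers both the verbose journal and kernel
# messages. Function and file names are hypothetical.

# Run a command on the node's host filesystem through a debug pod.
node_exec() {
    local node=$1
    shift
    oc debug "node/${node}" -- chroot /host "$@"
}

# Merged replacement for journal_collection and kernel_collections (sketch).
# stdout goes to node_logs.txt; debug-pod chatter on stderr goes to node_logs.err.
node_log_collection() {
    local node=$1 dest=$2
    mkdir -p "$dest"
    node_exec "$node" sh -c 'journalctl -o verbose --no-pager; journalctl -k --no-pager' \
        > "${dest}/node_logs.txt" 2> "${dest}/node_logs.err"
}

# Loop over the nodes once, starting a single debug pod per node.
collect_all_nodes() {
    local dest_root=$1 node
    shift
    for node in "$@"; do
        node_log_collection "$node" "${dest_root}/${node}"
    done
}
```

A caller could pass the node list from `oc get nodes -o jsonpath='{.items[*].metadata.name}'`. Batching the commands into one `sh -c` keeps the per-node cost to a single pod start, at the price of both outputs sharing one file.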

must-gather/collection-scripts/gather_ceph_resources

Signed-off-by: RAJAT SINGH <rajasing@redhat.com>

crombus

perfect.

openshift-ci-robot · 2020-12-10T11:58:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crombus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~must-gather/OWNERS~~ [crombus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-robot · 2020-12-10T13:08:42Z

@rajatsing: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/red-hat-storage-ocs-ci-e2e-aws	`69903ed`	link	`/test red-hat-storage-ocs-ci-e2e-aws`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

crombus · 2020-12-11T05:58:57Z

/hold until the decision is made on this.

openshift-bot · 2021-03-11T08:10:51Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-04-10T10:05:31Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot · 2021-04-10T10:05:38Z

@rajatsing: PR needs rebase.

openshift-bot · 2021-05-10T13:01:59Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2021-05-10T13:02:15Z

@openshift-bot: Closed this PR.

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2020

openshift-ci-robot requested review from crombus and umangachapagain November 11, 2020 12:31

rajatcing commented Nov 11, 2020

must-gather/collection-scripts/gather_clusterscoped_resources (outdated, resolved)

rajatcing commented Nov 11, 2020

must-gather/collection-scripts/gather_clusterscoped_resources (outdated, resolved)

rajatcing force-pushed the must-gather-ceph-commands branch from 873fed9 to 1c64dc0 on November 12, 2020 12:02

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 12, 2020

rajatcing marked this pull request as draft November 12, 2020 12:10

crombus reviewed Nov 16, 2020

must-gather/collection-scripts/gather_clusterscoped_resources (outdated, resolved)

nbalacha reviewed Nov 18, 2020

must-gather/collection-scripts/gather_clusterscoped_resources (outdated, resolved)

crombus suggested changes Nov 23, 2020

rajatcing force-pushed the must-gather-ceph-commands branch from 1c64dc0 to 67fc1fd on November 24, 2020 05:44

rajatcing force-pushed the must-gather-ceph-commands branch 2 times, most recently from e1fb847 to 8d29cd9 on November 24, 2020 05:49

crombus suggested changes Nov 24, 2020

must-gather/collection-scripts/gather_ceph_resources (outdated, resolved)

crombus added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 24, 2020

rajatcing force-pushed the must-gather-ceph-commands branch 3 times, most recently from 72c3cd2 to dcc1b67 on November 26, 2020 11:44

jarrpa reviewed Nov 30, 2020

must-gather/collection-scripts/gather_ceph_resources (outdated, resolved)

rajatcing force-pushed the must-gather-ceph-commands branch from dcc1b67 to e8150b8 on December 1, 2020 10:45

rajatcing commented Dec 1, 2020

must-gather/collection-scripts/gather_ceph_resources (outdated, resolved)

rajatcing requested a review from jarrpa December 1, 2020 14:02

rajatcing force-pushed the must-gather-ceph-commands branch 2 times, most recently from 2b1a0ab to 3c96f2d on December 2, 2020 08:31

rajatcing force-pushed the must-gather-ceph-commands branch 3 times, most recently from d4a9878 to a2db3a6 on December 9, 2020 11:06

rajatcing requested a review from crombus December 9, 2020 14:09

rajatcing force-pushed the must-gather-ceph-commands branch from a2db3a6 to 8f71e48 on December 10, 2020 10:13

rajatcing force-pushed the must-gather-ceph-commands branch from 8f71e48 to 82c1b84 on December 10, 2020 11:09

crombus suggested changes Dec 10, 2020

must-gather/collection-scripts/gather_ceph_resources (outdated, resolved)

must-gather: Add journal logs

69903ed

Signed-off-by: RAJAT SINGH <rajasing@redhat.com>

rajatcing force-pushed the must-gather-ceph-commands branch from 82c1b84 to 69903ed on December 10, 2020 11:54

rajatcing requested a review from crombus December 10, 2020 11:55

crombus approved these changes Dec 10, 2020

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2020

crombus added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2020

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 11, 2021

openshift-ci bot closed this May 10, 2021

8 participants