[IMPROVEMENT] Support bundle doesn't collect the snapshot yamls #4285

Closed
innobead opened this issue Jul 27, 2022 · 14 comments
Labels
area/troubleshoot Troubleshoot related backport/1.3.1 component/longhorn-manager Longhorn manager (control plane) kind/improvement Request for improvement of existing function

Comments

@innobead
Member

TODO for Longhorn team: the support bundle doesn't collect the snapshot yamls

Originally posted by @PhanLe1010 in #4278 (comment)

@innobead innobead added this to the v1.4.0 milestone Jul 27, 2022
@innobead innobead added backport/1.3.1 component/longhorn-manager Longhorn manager (control plane) labels Jul 27, 2022
@innobead innobead changed the title TODO for Longhorn team: the support bundle doesn't collect the snapshot yamls [IMPROVEMENT] Support bundle doesn't collect the snapshot yamls Jul 27, 2022
@innobead innobead added the area/troubleshoot Troubleshoot related label Jul 27, 2022
@longhorn-io-github-bot

longhorn-io-github-bot commented Jul 30, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: [IMPROVEMENT] Support bundle doesn't collect the snapshot yamls #4285 (comment)

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at Collect the snapshot yamls in support bundle longhorn-manager#1439

  • Which areas/issues this PR might have potential impacts on?
    Area Support bundle
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@PhanLe1010
Contributor

PhanLe1010 commented Jul 30, 2022

Test steps:

  1. Create and attach 150 volumes
  2. Create a new recurring snapshot job that selects all volumes, takes a snapshot every minute, and retains 100 snapshots
  3. Wait for 101 minutes
  4. Generate a support bundle
  5. Verify that snapshot yamls are in /yaml/longhorn/snapshots
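The scale of this test follows from simple arithmetic: 150 volumes each retaining 100 snapshots gives 150 × 100 = 15,000 snapshot objects, and waiting retain + 1 = 101 minutes presumably ensures that at least one snapshot beyond the retain limit has been taken, so cleanup has had to run. A minimal Go sketch, assuming exactly one snapshot per volume per minute and no other snapshots; `expectedSnapshots` and `minutesToSteadyState` are hypothetical names, not longhorn-manager functions:

```go
package main

import "fmt"

// expectedSnapshots returns the number of snapshots expected once every
// selected volume has reached the recurring job's retain limit.
// Assumption: one snapshot per volume per run, and no manual snapshots.
func expectedSnapshots(volumes, retain int) int {
	return volumes * retain
}

// minutesToSteadyState returns how long to wait so that at least one
// snapshot beyond the retain limit has been taken.
// Assumption: the job fires once per minute, starting at minute 1.
func minutesToSteadyState(retain int) int {
	return retain + 1
}

func main() {
	fmt.Println(expectedSnapshots(150, 100)) // 15000
	fmt.Println(minutesToSteadyState(100))   // 101
}
```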

@joshimoo
Contributor

It is probably a good idea to test with a large set of snapshots; the generation should still work and not time out.

@innobead
Copy link
Member Author

innobead commented Aug 1, 2022

@PhanLe1010 Please move to ready-for-testing if this and the backport are ready. I also suggest scale testing (a simulation should be fine) to see whether there is any impact on bundle generation.

@PhanLe1010
Contributor

PhanLe1010 commented Aug 1, 2022

Please move to ready-for-testing if this and the backport are ready.

Thank you. Done

also suggest having scaling testing (simulation should be fine) to see if any impact on the bundle generation.

I updated the test steps to generate 15k snapshots (150 volumes × 100 retained snapshots).

@yangchiu
Member

yangchiu commented Aug 4, 2022

Test failed on master-head (longhorn-manager bd383f9).

Test steps:
(1) Create and attach 150 volumes
(2) Create a new recurring snapshot job that selects all volumes, takes a snapshot every minute, and retains 50 snapshots
(Setting retain to 100 shows the error message: unable to create recurring job c-u6rz9o: admission webhook "validator.longhorn.io" denied the request: retain in body should be less than or equal to 50)
(3) Wait for 51 minutes
(4) Generate a support bundle => Got a 504 Timeout error (possibly from the /v1/supportbundles POST call). The UI is stuck at Generating... and the support bundle cannot be retrieved.
(5) After the error, close and reopen the UI page and try to regenerate the support bundle. It still cannot be retrieved: the UI keeps polling the API /v1/supportbundles/ip-10-0-1-180/longhorn-support-bundle_c6d6cc3d-1a42-46a6-8c9f-cf6adaf9fbe8_2022-08-04T07-56-06Z for 10 minutes until it gets a non-zero progressPercentage and starts downloading.
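The webhook rejection in step (2) amounts to an upper-bound check on the retain field; with retain capped at 50, this run creates at most 150 × 50 = 7,500 snapshots. A hypothetical Go sketch of such a check; `validateRetain` and `maxRetain` are illustrative names and the error text is modeled on the message quoted above, not taken from longhorn-manager's actual validator:

```go
package main

import "fmt"

// maxRetain mirrors the limit reported by the webhook in this test run.
// Assumption: the limit is a fixed constant of 50.
const maxRetain = 50

// validateRetain rejects recurring job specs whose retain count exceeds maxRetain.
func validateRetain(jobName string, retain int) error {
	if retain > maxRetain {
		return fmt.Errorf("unable to create recurring job %s: retain in body should be less than or equal to %d", jobName, maxRetain)
	}
	return nil
}

func main() {
	fmt.Println(validateRetain("c-u6rz9o", 100)) // rejected
	fmt.Println(validateRetain("c-u6rz9o", 50))  // <nil>
}
```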

==> The user can get the support bundle but may need to retry or wait. Is this OK?

@PhanLe1010 @innobead

@innobead
Member Author

innobead commented Aug 4, 2022

What about the content of the downloaded support bundle?

@yangchiu
Member

yangchiu commented Aug 4, 2022

How about the content of the downloaded support bundle?

The content of the downloaded support bundle is intact.
In the above test scenario, the downloaded support bundle has a compressed size of 11.6MB and an uncompressed size of 204.2MB, and snapshots.yaml is 44.3MB with 2,841,199 lines.

@innobead
Copy link
Member Author

innobead commented Aug 4, 2022

LGTM, but we need to add a notice to our docs.

@PhanLe1010 WDYT?

@PhanLe1010
Contributor

Generate a support bundle => Get 504 Timeout error (Could be from /v1/supportbundles POST call). UI stuck in Generating.... Unable to get support bundle.

This is unexpected, as the init request is async: link. Can I access your env?

After the error, close/reopen the UI page and try to regenerate the support bundle, still unable to get the support bundle, the API /v1/supportbundles/ip-10-0-1-180/longhorn-support-bundle_c6d6cc3d-1a42-46a6-8c9f-cf6adaf9fbe8_2022-08-04T07-56-06Z keeps polling for 10 minutes until it gets non-zero progressPercentage and starts the downloading

This is expected. Longhorn generates the support bundle by generating YAMLs and collecting logs. No progress is reported at the beginning, while Longhorn is generating the YAMLs. We can improve on this.

@yangchiu
Member

yangchiu commented Aug 5, 2022

Generate a support bundle => Get 504 Timeout error (Could be from /v1/supportbundles POST call). UI stuck in Generating.... Unable to get support bundle.

This is unexpected as the init request is async: link . Can I access your env?

The env has been destroyed, but I'll let you know once I can reproduce it!

@yangchiu
Member

yangchiu commented Aug 5, 2022


@PhanLe1010 One more question about step 5: Verify that snapshot yamls are in /yaml/longhorn/snapshots

It seems to expect a separate YAML file for each snapshot in the /yaml/longhorn/snapshots folder, but in the currently downloaded support bundle, the YAML contents of all snapshots are concatenated into a single snapshots.yaml file, and there is no /yaml/longhorn/snapshots folder. See the attached file:
longhorn-support-bundle_0b2b7f6e-9bd1-4ccf-a435-c45106ef7ab0_2022-08-05T03-54-40Z.zip

Is this what we expect?

@PhanLe1010
Contributor

Seems it expects separate yaml files for different snapshots in the /yaml/longhorn/snapshots folder, but currently in the downloaded support bundle, all the yaml contents of all the snapshot are concatenated into the same snapshots.yaml file, and there is no /yaml/longhorn/snapshots folder. Just like the attached file:

Sorry for the confusion. Concatenating the YAML contents of all snapshots into a single snapshots.yaml file is the correct behavior.

@yangchiu
Member

yangchiu commented Aug 6, 2022

I'm unable to reproduce the 504 timeout error on either master-head or v1.3.x-head.
The support bundle can be generated and downloaded successfully, and the content of snapshots.yaml is as expected.
The functionality of collecting snapshot YAMLs in the support bundle works, so I'm closing this ticket for now.

@yangchiu yangchiu closed this as completed Aug 6, 2022
@innobead innobead added the kind/improvement Request for improvement of existing function label Aug 24, 2022

5 participants