[BUG] Download backing image failed with HTTP 502 error if Storage Network configured #4807

Closed
WebberHuang1118 opened this issue Dec 4, 2023 · 9 comments
Labels
  • kind/bug: Issues that are defects reported by users or that we know have reached a real release
  • priority/0: Must be fixed in this release
  • reproduce/always: Reproducible 100% of the time
  • severity/1: Function broken (a critical incident with very high impact)

Comments

@WebberHuang1118
Member

Describe the bug (🐛 if you encounter this issue)

If the Storage Network is configured, downloading a backing image fails with an HTTP 502 error, and the Longhorn (LH) manager logs the error message: http: proxy error: dial tcp xxxxx:8001: i/o timeout
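
One way to confirm the symptom is to probe the address from the error message over TCP, from the same network context as the LH manager pod. The sketch below is only an illustration: `can_reach` is a hypothetical helper, not part of Longhorn or Harvester, and port 8001 is taken from the error message above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach(addr, 8001)` with the address shown in the proxy error and getting `False` would be consistent with the i/o timeout reported by the LH manager.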

To Reproduce

Steps:

  • Create a Harvester cluster (a single node is sufficient)
  • Upload an image to Harvester
  • Enable the Storage Network
  • Download the image from the Harvester UI

Result:

  • The backing image download fails with an HTTP 502 error

Expected behavior

The backing image download succeeds.

Environment

  • Longhorn version: v1.4.3
  • Harvester version: v1.2.1

Observations from the LH source code

The failure occurs because the LH manager tries to access the backing-image-manager via a URL on the Storage Network, but the LH manager pods themselves are not attached to the Storage Network. The details are as follows:

I modified the LH manager to force it to access the backing image manager via the pod IP and deployed the change to my environment. With that change, the image could be downloaded while the Storage Network was configured.
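
The observation above can be sketched as an address-selection rule. This is a minimal illustration, not Longhorn's actual code: the `Endpoint` type, its fields, and `download_target` are hypothetical names, and port 8001 is taken from the error message in the bug description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical view of a backing-image-manager pod's addresses."""
    pod_ip: str                        # address on the cluster (pod) network
    storage_ip: Optional[str] = None   # address on the Storage Network, if configured
    port: int = 8001                   # port seen in the proxy error message


def download_target(ep: Endpoint, caller_on_storage_network: bool) -> str:
    """Pick the address a caller should dial when proxying a download."""
    # Only a caller that is itself attached to the Storage Network can
    # reach the storage IP; everyone else must fall back to the pod IP.
    if caller_on_storage_network and ep.storage_ip:
        return f"{ep.storage_ip}:{ep.port}"
    return f"{ep.pod_ip}:{ep.port}"
```

Under this sketch, a caller that is not on the Storage Network (as described for the LH manager pods) always gets the pod IP, which matches the modification that made the download work.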

@WebberHuang1118 added the kind/bug, reproduce/needed, and severity/needed labels on Dec 4, 2023
@WebberHuang1118
Member Author

The solution should be tracked on LH issue #7326

@WebberHuang1118 self-assigned this on Dec 4, 2023
@irishgordo

@WebberHuang1118 thanks for reporting this. I have a 192.168.14.0/24 network configured as the storage network for a 2-node bare-metal v1.2.1 Harvester cluster and attempted to reproduce this, but I'm not able to.
I'm able to download the image from a file server that the 14.0/24 network (and others) can reach, and the backing image does not run into this issue.
I'm also able to use that image in a VM and validate that the replicas in Longhorn have the corresponding storage IPs, which I can see at the switch level on my network.

test4807.mp4

@irishgordo added the reproduce/rare label and removed the reproduce/needed label on Dec 4, 2023
@WebberHuang1118
Member Author


@irishgordo This issue is about downloading an existing image from Harvester to a local machine, rather than fetching an image into Harvester from a particular URL. The context can be a little confusing; thanks for your verification :)

@bk201 added the reproduce/always, priority/0, and severity/1 labels and removed the reproduce/rare and severity/needed labels on Dec 5, 2023
@bk201 added this to the v1.3.0 milestone on Dec 5, 2023
@bk201
Member

bk201 commented Dec 5, 2023

Tentatively targeting 1.3.0; if longhorn/longhorn#7236 can't make it, we'll move this to the next milestone.

@WebberHuang1118
Member Author

Waiting for LH v1.5.4

@WebberHuang1118
Member Author

LH PR longhorn/backing-image-manager#151 is merged.
We can re-verify once Harvester bumps LH to v1.6.0.

@harvesterhci-io-github-bot

harvesterhci-io-github-bot commented Jan 29, 2024

Pre Ready-For-Testing Checklist

  • If labeled: require/HEP Has the Harvester Enhancement Proposal PR been submitted?
    The HEP PR is at:

  • Where are the reproduction steps/test steps documented?
    The reproduction steps/test steps are at:

    • Create a Harvester cluster (a single node is sufficient)
    • Upload an image to Harvester
    • Enable the Storage Network
    • Download the image from the Harvester UI
    • The image should be downloaded successfully
  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

    • Download the image file directly from the node
  • Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)?
    The PR is at: fix(download): provide pod ip to longhorn manager when download backing image to local longhorn/backing-image-manager#151

    • Does the PR include the explanation for the fix or the feature?

    • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
      The PR for the YAML change is at:
      The PR for the chart change is at:

  • If labeled: area/ui Has the UI issue been filed or is it ready to be merged?
    The UI issue/PR is at:

  • If labeled: require/doc, require/knowledge-base Has the necessary documentation PR been submitted or merged?
    The documentation/KB PR is at:

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#1083

@lanfon72
Member

Verified this bug has been fixed.

Test Information

  • Environment: qemu/KVM 2 nodes
  • Harvester Version: v1.3.0-rc1 with Longhorn v1.6.0-rc2
  • ui-source Option: Auto

Verify Steps

  1. Install Harvester with any number of nodes
  2. Follow the steps in [BUG] Download backing image failed with HTTP 502 error if Storage Network configured #4807 (comment)
