Failing download. #10592

dgarner-cg · 2023-11-03T23:05:01Z

I have tried everything I can think of to resolve this issue for over a week.
It's getting frustrating.
I've attempted this with the newest version of everything involved (Ansible, Kubespray, etc.) on the standard OS install, and alternatively in a venv with the requirements.txt versions of everything.
I've tried to eliminate every troubleshooting option before posting, and it always comes down to the same Download section.
I'm seeing a lot of output I haven't seen before with this inventory, but I have to run to Rx before it closes and wanted to post right away: this has actually been an issue for more like 3 weeks, and for the last week I have focused on it consistently.

fyi..

• pve-cos-pri: Outside server not involved in the k8s cluster; this can be considered the primary server of the network.
• pve-k8s-...: the cluster nodes.
• Primary network subnet: 10.0.0.0/24
• "DHCP" slots for k8s: 10.0.0.82 - 10.0.0.88
• The DNS subdomain for the k8s cluster is separate and lives on the mgmt.sub.domain.tld portion.
• All machines have working DHCP and DNS resolution; curl ifconfig.me works properly on each.

I can't think of much else outside the process that it could be, or anything else to run through. Now onto the other stuff, and I'll be back later.

Thanks guys,

Environment:
Local Baremetal Proxmox,
Dual Socket Xeon Gold 6148 80 Core with 256 GB RAM.

OS:
Control Server: Debian GNU/Linux 12 (Bookworm) Linux 6.1.0-13-amd64 x86_64

7 Node K8s Cluster, all the same.

Version of Ansible (ansible --version):
2.14.11
Version of Python (python --version):
Python 3.11.2

Kubespray version (commit) (git rev-parse --short HEAD):
22f58a5

Network plugin used:
Calico

Full inventory with variables:

https://gist.github.com/dgarner-cg/c5ea336fdc78b369145cf52cd075dfee

Command used to invoke ansible:

ansible-playbook \
  -i inventory/k8-mg/hosts.yaml \
  --private-key=~/.ssh/id_rsa \
  -u root \
  --become \
  cluster.yml

Output of ansible run:

https://gist.github.com/dgarner-cg/3f57fe502a970ead3529ac7fd836b043

Anything else do we need to know:
I would look into why the following is being thrown, as it may be a separate issue, but I have to run out before Rx closes.

https://gist.github.com/dgarner-cg/d055057c89634705e8366b14208c5223


dgarner-cg · 2023-11-04T03:53:27Z

This is insanely frustrating.

I've added the following to /roles/.../download_files.yml

- name: Download_file | Download item
  block:
    - name: Download file
      get_url:
        url: "{{ valid_mirror_urls | random }}"
        dest: "{{ file_path_cached if download_force_cache else download.dest }}"
        owner: "{{ omit if download_localhost else (download.owner | default(omit)) }}"
        mode: "{{ omit if download_localhost else (download.mode | default(omit)) }}"
        checksum: "{{ 'sha256:' + download.sha256 if download.sha256 else omit }}"
        validate_certs: "{{ download_validate_certs }}"
        url_username: "{{ download.username | default(omit) }}"
        url_password: "{{ download.password | default(omit) }}"
        force_basic_auth: "{{ download.force_basic_auth | default(omit) }}"
        timeout: "{{ download.timeout | default(omit) }}"
      # Task keywords sit at task level, not inside the get_url arguments.
      delegate_to: "{{ download_delegate if download_force_cache else inventory_hostname }}"
      run_once: "{{ download_force_cache }}"
      register: get_url_result
      become: "{{ not download_localhost }}"
      environment: "{{ proxy_env }}"
      no_log: "{{ not (unsafe_show_logs | bool) }}"
      # Retry the download itself; retrying a debug task never re-runs get_url.
      until: "'OK' in get_url_result.msg or 'file already exists' in get_url_result.msg"
      retries: "{{ download_retries }}"
      delay: "{{ retry_stagger | default(5) }}"

  # A failed task in the block jumps straight to rescue, so the error is reported here.
  rescue:
    - name: Handle Download Errors
      fail:
        msg: "Download failed: {{ get_url_result.msg | default('no message returned') }}"

  always:
    - name: Print Results
      debug:
        var: get_url_result

And below is further output ...

https://gist.github.com/dgarner-cg/064541f36bbac6b3ea49590f759989b0

dgarner-cg · 2023-11-09T19:31:06Z

Bro, same ish on all Ubuntu systems.. what the f.

FaraSys · 2023-11-11T17:14:48Z

Hi.
I have the same issue on Ubuntu 22.04 LTS
Kubespray Release 2.23.1

dgarner-cg · 2023-11-11T17:16:04Z

I am .. making progress, I have literally been working on this for a week.

arusa · 2023-11-19T12:53:21Z

I experience very similar problems. DNS problems coming up all the time. Currently I'm trying to add a new node and cp-node using cluster.yml and scale.yml and it always results in servers not being able to download stuff because kubespray updated their /etc/systemd/resolved.conf to resolve using coredns, but they don't have access to coredns yet :(
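A minimal shell sketch for checking whether a node is in the state described above. It assumes the resolver list sits on a DNS= line of a systemd-resolved config file, and that cluster service addresses start with 10.233. (an assumption; pass your own prefix if your service range differs). The function names are hypothetical helpers, not part of Kubespray.

```shell
# Print the value of the DNS= line from a systemd-resolved config file.
configured_dns() {
  sed -n 's/^DNS=//p' "$1"
}

# Succeed if any configured server starts with the given prefix.
# The default prefix "10.233." is an assumed cluster service range.
dns_points_into_cluster() {
  dns_file="$1"
  dns_prefix="${2:-10.233.}"
  for server in $(configured_dns "$dns_file"); do
    case "$server" in
      "$dns_prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Usage on a node (reads the file only, changes nothing):
#   dns_points_into_cluster /etc/systemd/resolved.conf \
#     && echo "node resolves through a cluster address"
```

If the check succeeds while CoreDNS is not yet reachable from that node, downloads that need name resolution would be expected to fail in the way described.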

I'm very happy that I'm only running a test cluster.

dgarner-cg · 2023-11-19T12:59:59Z

Thanks for your feedback. I want to say that all the nodes are reachable via valid DNS, but I will check. I know my outside-of-cluster installer controller and 2 k8s-controller nodes all have valid DNS from here to Google, but I also had no idea about the CoreDNS issue either.

As I have time, I am working to ensure Cilium is used across all files and to use a local repo, but work has picked up going into the holidays; I just got off a stretch of 7 straight weeks on call 24/7. :D

I will take a look at this again in a few moments and hope to knock it out.

arusa · 2023-11-19T13:20:17Z

I finally managed to add the new node. When I saw in the ansible output that it had just updated the /etc/systemd/resolved.conf file, I quickly opened it on the new node and changed the line:

DNS=10.233.0...

to

DNS=1.1.1.1

and ran:

systemctl restart systemd-resolved.service

This way the node managed to finish all downloads executed by ansible. And in the end the resolved.conf was already changed back to use the coredns service as a resolver.
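The manual edit above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name is hypothetical, it relies on GNU sed's -i option, and it expects a single DNS= line in the file. As noted above, Kubespray rewrites the file later in the run, so the change is temporary.

```shell
# Replace the DNS= line in a systemd-resolved config file with one server,
# keeping a copy of the original next to it as <file>.bak.
set_resolved_dns() {
  conf_file="$1"
  new_dns="$2"
  cp -p "$conf_file" "$conf_file.bak" &&
    sed -i "s/^DNS=.*/DNS=${new_dns}/" "$conf_file"
}

# Usage on the new node, as root, right after the playbook has
# rewritten the file:
#   set_resolved_dns /etc/systemd/resolved.conf 1.1.1.1
#   systemctl restart systemd-resolved.service
```

Taking the file path as an argument makes it possible to try the helper on a copy before touching the real configuration.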

Btw. I also had to set enable_nodelocaldns to false yesterday, because I had a similar resolving problem while rolling out some changes using kubespray. At one point the nodes couldn't resolve anything because the nodelocaldns iptables rules probably weren't ready.

So DNS feels generally very fragile with Kubespray.

marvin0815 · 2023-11-22T10:12:31Z

No idea if it's new but GitHub now gives a 401 Forbidden for me when validating mirrors in Kubespray

See:
curl -vIL https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.24.0/crictl-v1.24.0-linux-amd64.tar.gz

marvin0815 · 2023-11-23T15:16:36Z

I get a 200 again today. Still, I modified the download role to check with GET instead of HEAD in order to deploy.

diff --git a/roles/download/tasks/download_file.yml b/roles/download/tasks/download_file.yml
index 376a15e8a..88f83c8cb 100644
--- a/roles/download/tasks/download_file.yml
+++ b/roles/download/tasks/download_file.yml
@@ -55,7 +55,7 @@
   - name: download_file | Validate mirrors
     uri:
       url: "{{ mirror }}"
-      method: HEAD
+      method: GET
       validate_certs: "{{ download_validate_certs }}"
       url_username: "{{ download.username | default(omit) }}"
       url_password: "{{ download.password | default(omit) }}"

Just in case it's a random bug in GitHub's cache system or something.
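To see whether a mirror answers HEAD and GET differently on a given day, a rough shell sketch like the following may help. The function names are hypothetical; the curl options used (--head, --location, --output, --write-out) are standard. Note that the GET request downloads the whole file and discards it.

```shell
# Print the final HTTP status code for a URL using HEAD or GET,
# following redirects and discarding headers and body.
http_status() {
  method="$1"
  url="$2"
  if [ "$method" = "HEAD" ]; then
    curl --silent --location --head --output /dev/null \
      --write-out '%{http_code}' "$url"
  else
    curl --silent --location --output /dev/null \
      --write-out '%{http_code}' "$url"
  fi
}

# Print both status codes on one line for easy comparison.
compare_head_get() {
  printf 'HEAD=%s GET=%s\n' "$(http_status HEAD "$1")" "$(http_status GET "$1")"
}

# Usage:
#   compare_head_get https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.24.0/crictl-v1.24.0-linux-amd64.tar.gz
```

If the two codes differ, switching the mirror validation to GET as in the diff above is one way to keep a deployment moving.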

mdbudnick · 2023-12-15T04:47:06Z

It looks like I am having a similar issue and this is my output when running with the block: and outputting get_url_result:

ok: [workernode-3] => {
    "get_url_result": {
        "attempts": 4,
        "changed": false,
        "checksum_dest": null,
        "checksum_src": "d11d2f438da1892c8b1bdfc638ddb6764dbd0e2c",
        "dest": "/tmp/releases/runc-v1.1.9.arm64",
        "elapsed": 0,
        "failed": true,
        "msg": "Destination /tmp/releases does not exist",
        "src": "/home/mb/.ansible/tmp/ansible-tmp-1702614795.2398012-24550-25178834028643/tmpr_hbccf9",
        "url": "https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.arm64"
    }
}

Please note: "Destination /tmp/releases does not exist" is not the issue as it fails with the same msg after adding an explicit file task before to create the directory.

Edit: There is no checksum issue.

I will try v2.22.1 and other versions and investigate the difference if I get it to work.

mdbudnick · 2023-12-16T20:37:18Z

Never mind me, TIL --check has major limitations. In my defense, this is my first Ansible playbook outside of tutorials.

VannTen · 2024-01-16T15:32:04Z

Kubespray version (commit) (git rev-parse --short HEAD):
22f58a5

I can't find this commit in the repository.
From your gist

{{ etcd_supported_versions[kube_major_version] }}: 'dict object' has no attribute 'v1.28'. 'dict object' has no attribute 'v1.28'. {{ etcd_supported_versions[kube_major_version] }}: 'dict object' has no attribute 'v1.28'. 'dict object' has no attribute 'v1.28'\n\nThe error appears to be in '/etc/ansible/usr-playbooks/cg-k8-ctrl/roles/download/tasks/download_file.yml': line 10, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Download_file | Starting download of file\n ^ here\n"}

Looks like you tried to use Kubernetes 1.28 with a Kubespray version that does not support it.
I'm going to close this; feel free to reopen if you actually still encounter a bug.
/close
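The missing 'v1.28' key in the error above can be checked for before a run. The sketch below is an assumption-heavy illustration: it presumes etcd_supported_versions is a top-level YAML map with one indented "vX.Y: ..." entry per line, and the function name and the file path you pass in are hypothetical.

```shell
# Succeed if the given version key (e.g. v1.28) appears as an entry of the
# etcd_supported_versions map in the given YAML file.
has_etcd_mapping() {
  awk -v key="$2" '
    /^etcd_supported_versions:/ { inside = 1; next }
    inside && /^[^ #]/          { inside = 0 }   # next top-level key ends the map
    inside && $1 == key ":"     { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Usage, with the path to the defaults file of your checkout:
#   has_etcd_mapping path/to/defaults.yml v1.28 \
#     || echo "this checkout has no etcd mapping for v1.28"
```

A missing entry suggests the checkout predates support for that Kubernetes version, which matches the diagnosis given here.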

k8s-ci-robot · 2024-01-16T15:32:10Z

@VannTen: Closing this issue.


dgarner-cg added the kind/bug label Nov 3, 2023

mdbudnick mentioned this issue Dec 15, 2023

The conditional check ''OK' in get_url_result.msg or 'file already exists' in get_url_result.msg or get_url_result.status_code == 304' failed. #10494

Closed

prairiewolf-by mentioned this issue Jan 3, 2024

Kubespray cannot download file: FAILED - RETRYING: Download_file | Validate mirrors #10750

Closed

k8s-ci-robot closed this as completed Jan 16, 2024
