Failing download. #10592
Comments
This is insanely frustrating. I've added the following to /roles/.../download_files.yml
And below is further output ... https://gist.github.com/dgarner-cg/064541f36bbac6b3ea49590f759989b0 |
Bro, same ish on all Ubuntu systems.. what the f. |
Hi. |
I am .. making progress, I have literally been working on this for a week. |
I experience very similar problems. DNS problems coming up all the time. Currently I'm trying to add a new node and cp-node using cluster.yml and scale.yml and it always results in servers not being able to download stuff because kubespray updated their /etc/systemd/resolved.conf to resolve using coredns, but they don't have access to coredns yet :( I'm very happy that I'm only running a test cluster. |
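The failure mode described in this comment (a node whose resolved.conf points at a resolver it cannot reach yet) can be checked by hand before re-running the playbook. Below is a minimal sketch, assuming only the standard DNS= key of systemd-resolved's resolved.conf. The helper names parse_dns_servers and is_reachable and the sample addresses are hypothetical placeholders, not part of Kubespray or of anyone's actual configuration in this thread.

```python
import socket


def parse_dns_servers(conf_text):
    """Return the raw tokens listed on DNS= lines of resolved.conf-style text."""
    servers = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out defaults such as "#DNS=".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DNS":
            # DNS= takes a space-separated list; tokens are returned unmodified.
            servers.extend(value.split())
    return servers


def is_reachable(server, port=53, timeout=2.0):
    """Best-effort check: can a TCP connection to the resolver be opened?"""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Hypothetical file content; the addresses are documentation placeholders.
SAMPLE = """\
[Resolve]
#DNS=
DNS=192.0.2.10 192.0.2.11
Domains=example.internal
"""
```

On an affected node, one could read /etc/systemd/resolved.conf, pass its text to parse_dns_servers, and call is_reachable on each entry. If every listed resolver is unreachable, downloads by hostname will likely fail for the reason described above. A TCP connect on port 53 is only a rough signal, since most DNS traffic uses UDP.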
Thanks for your feedback. I want to say that all the nodes are reachable via valid DNS, but I will check. I know my outside-of-cluster installer controller and 2 k8s-controller nodes all have valid DNS from here to Google, but I had no idea about the CoreDNS issue either. I am looking, as I have time, to ensure Cilium is used across all files and to use a local repo, but work has picked up going into the holiday; I just got off a 7-straight-week, 24/7 on-call stretch. :D I will take a look at this again in a few moments and hope to knock it out. |
I finally managed to add the new node. When I saw in the ansible output, that it just updated the /etc/systemd/resolved.conf file, I quickly opened it up on the new node and changed the line:
to
and ran:
This way the node managed to finish all downloads executed by ansible. And in the end the resolved.conf was already changed back to use the coredns service as a resolver. Btw. I also had to set So DNS feels generally very fragile with Kubespray. |
No idea if it's new but GitHub now gives a 401 Forbidden for me when validating mirrors in Kubespray See: |
I get a 200 again today. I still modified the download role to check with GET instead of HEAD in order to deploy.
Just in case it's a random bug in GitHub's cache system or something. |
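The idea of validating a mirror with GET when HEAD is rejected can be illustrated outside of Ansible. This is a minimal Python sketch of that fallback, not the actual change made to the Kubespray download role. The function names http_status and mirror_is_valid are hypothetical, and treating only a 200 as valid is an assumption.

```python
import urllib.error
import urllib.request


def http_status(url, method, timeout=10.0):
    """Return the final HTTP status for a request started with `method`.

    urllib follows redirects; the response body is never read here.
    Connection-level failures (urllib.error.URLError) are not caught.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep the status code.
        return err.code


def mirror_is_valid(url, status_for=http_status):
    """Try HEAD first; if it does not return 200, retry once with GET."""
    for method in ("HEAD", "GET"):
        if status_for(url, method) == 200:
            return True
    return False
```

The status_for parameter exists only so the fallback logic can be exercised without network access. In normal use, mirror_is_valid(url) is called with the default.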
It looks like I am having a similar issue and this is my output when running with the
Please note: "Destination /tmp/releases does not exist" is not the issue, as it fails with the same message after adding an explicit file task beforehand to create the directory. Edit: There is no checksum issue. I will try v2.22.1 and other versions and investigate the difference if I get it to work. |
Nevermind me, TIL |
I can't find this commit in the repository.
Looks like you tried to use 1.28 on unsupported versions. |
@VannTen: Closing this issue. In response to this:
I have attempted everything to resolve this issue .. for over a week or more.
It's getting frustrating.
I've attempted this with the newest version of everything involved (Ansible, Kubespray, etc..) in standard OS and alternatively I've attempted this in a venv with requirements.txt versions of everything.
I've attempted to eliminate all troubleshooting options possible before posting and it always comes down to the same Download section.
I'm seeing a lot of info I haven't seen before with this inventory, but I have to run to Rx before it closes and want to post immediately, as this has been an issue for more like 3 weeks; I have only focused on it consistently for the last week.
fyi..
• pve-cos-pri: Outside server not involved in k8s cluster, this can be considered the primary server of the network.
• pve-k8s-... obviously cluster
• Primary network subnet: 10.0.0.0/24
• "DHCP" slots for k8s: 10.0.0.82 - 88
• The DNS subdomain is separate for the k8s cluster and is on the mgmt.sub.domain.tld portion.
• All machines have good DHCP/DNS and resolve curl ifconfig.me properly.
Can't think of much else outside of the process that it could be or that I could run through. Now onto the other stuff, and I'll be back later.
Thanks guys,
Environment:
Local Baremetal Proxmox,
Dual Socket Xeon Gold 6148 80 Core with 256 GB RAM.
Control Server: Debian GNU/Linux 12 (Bookworm) Linux 6.1.0-13-amd64 x86_64
7 Node K8s Cluster, all the same.
Version of Ansible (ansible --version): 2.14.11
Version of Python (python --version): Python 3.11.2
Kubespray version (commit) (git rev-parse --short HEAD): 22f58a5
Network plugin used:
Calico
Full inventory with variables:
https://gist.github.com/dgarner-cg/c5ea336fdc78b369145cf52cd075dfee
Command used to invoke ansible:
ansible-playbook -i inventory/k8-mg/hosts.yaml --private-key=~/.ssh/id_rsa -u root --become cluster.yml
Output of ansible run:
https://gist.github.com/dgarner-cg/3f57fe502a970ead3529ac7fd836b043
Anything else do we need to know:
I would look into why this is throwing as this may be another issue, but I've got to run out before Rx closes rq..
https://gist.github.com/dgarner-cg/d055057c89634705e8366b14208c5223