Provision rke2 cluster from 3rd party node driver #34782

Closed
danbaie opened this issue Sep 17, 2021 · 33 comments

Comments

@danbaie

danbaie commented Sep 17, 2021

Rancher Server Setup

  • Rancher version: 2.6.0
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): v1.21.4+rke2r3
  • Proxy/Cert Details: lets-encrypt

Information about the Cluster

  • Kubernetes version: v1.21.4+rke2r3
  • Cloud provider: Hetzner Cloud

Describe the bug

Cannot create an RKE2 cluster from the Rancher UI because I can't create cloud credentials for the Hetzner node driver.

To Reproduce

  • Add the Hetzner node driver to Rancher via the UI
  • Go to Cluster Management -> Cluster -> Create
  • A form is shown where I can create cloud credentials (screenshot)
  • Enter the information and click Create
  • A dropdown list is shown where I can select the cloud provider to use
  • The dropdown list is empty (screenshot)

I tried to add cloud credentials from the cloud credentials menu, but there is no option for Hetzner, custom, or similar.
I tried activating some of the built-in node drivers in Rancher; some have the same problem, but some work.

Result
Not able to provision an RKE2 cluster with the Hetzner node driver because the cloud credentials cannot be created.

Expected Result

I should be able to create cloud credentials in some way. When a node driver is activated, its cloud provider should be added to the list of cloud credential types that can be created.

Screenshots

Additional context

@Negashev

Negashev commented Oct 12, 2021

The fix for the UI bug is here:

rancher/tasks#20 (comment)

but the next problem is in Rancher's node-driver management:

In the Rancher logs: cannot find a suitable driver. I think the problem is that it searches by name and not by ID.

Update:
add this annotation to your /v3/nodeDrivers/nd-xxxx (the Hetzner driver):

"annotations": {
  "privateCredentialFields": "apiToken"
}

Then deactivate and reactivate your Hetzner driver; this "rebuilds" your /v3/schemas/hetznercredentialconfig and adds apiToken to resourceFields.
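
If you prefer doing this from the command line instead of the API, here is a minimal sketch, assuming kubectl access to the local (Rancher server) cluster; nd-xxxx is a placeholder for your driver's actual ID:

# Annotate the NodeDriver object, then cycle it so Rancher rebuilds the
# hetznercredentialconfig schema (same effect as deactivate/activate in the UI).
kubectl annotate nodedrivers.management.cattle.io nd-xxxx \
  privateCredentialFields=apiToken --overwrite
kubectl patch nodedrivers.management.cattle.io nd-xxxx \
  --type merge -p '{"spec":{"active":false}}'
kubectl patch nodedrivers.management.cattle.io nd-xxxx \
  --type merge -p '{"spec":{"active":true}}'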

@Negashev

Negashev commented Oct 21, 2021

Will this be fixed by #33392 or #34914?

@Negashev

Add the docker-machine node driver with the correct name via a CRD in the local cluster (the one running the Rancher server):

apiVersion: management.cattle.io/v3
kind: NodeDriver
metadata:
  annotations:
    lifecycle.cattle.io/create.node-driver-controller: "true"
    privateCredentialFields: apiToken
  name: hetzner # <- fixes the ID that Rancher generated for me
spec:
  active: false
  addCloudCredential: false
  builtin: false
  checksum: ""
  description: ""
  displayName: hetzner
  externalId: ""
  uiUrl: https://storage.googleapis.com/hcloud-rancher-v2-ui-driver/component.js
  url: https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/download/3.2.0/docker-machine-driver-hetzner_3.2.0_linux_amd64.tar.gz
  whitelistDomains:
  - storage.googleapis.com
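
For reference, a minimal way to apply and check it (assuming the manifest above is saved as hetzner-nodedriver.yaml and kubectl points at the local cluster):

# Apply the NodeDriver manifest and confirm the object exists under the expected name.
kubectl apply -f hetzner-nodedriver.yaml
kubectl get nodedrivers.management.cattle.io hetzner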

Now I can create/delete nodes in the cloud (Hetzner and other custom node drivers).

But now I have a problem with UserData (cloud-init):
https://github.com/rancher/machine/blob/e51aa220eacad5bd89cfcd05cab620c2131338b9/commands/create.go#L233

because not all docker-machine drivers work correctly with cloud-init.

@Negashev

Hm.....

When I copy the node-driver job and start it with the --debug flag on rancher-machine: (screenshot)

the user data file is replaced with /tmp/modified-user-data557062213,

and when I SSH to my Hetzner VM, cat /var/lib/cloud/instances/15488979/user-data.txt shows only

/tmp/modified-user-data557062213

instead of content like

#cloud-config
runcmd:
- sh /usr/local/custom_script/install.sh
write_files:
- content: H4sIAAAAAAAA/wAALONG-LONG-LONG-LONG-AP//1Hx/d9u2DAAP//K+6J769cAAA=
  encoding: gzip+b64
  path: /usr/local/custom_script/install.sh
  permissions: "0644"

Something is wrong at https://github.com/rancher/machine/blob/e51aa220eacad5bd89cfcd05cab620c2131338b9/commands/create.go#L233

@phal0r

phal0r commented Nov 14, 2021

@Negashev
Can you tell when your fix will be merged? I can provision VMs with your workaround as well, but initializing Rancher gets stuck. The last log entry is Custom install script was sent via userdata, provisioning complete.... So I guess this is the same problem.

@Negashev

@phal0r
I don’t know when it will be merged, it’s out of my control

@stale

stale bot commented Jan 14, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jan 14, 2022
@Negashev

/upd )

@stale stale bot removed the status/stale label Jan 14, 2022
@stale

stale bot commented Mar 16, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Mar 16, 2022
@apo2k7

apo2k7 commented Mar 24, 2022

Is there any news on this? I ran into the exact same problem. @Negashev, did you find a workaround in case this doesn't get fixed?

@stale stale bot removed the status/stale label Mar 24, 2022
@Negashev

Negashev commented May 9, 2022

@apo2k7
I have not yet found the right solution. In the pull request they offer me a fix, but I have not tested it: rancher/machine#153

This works with Yandex:

apiVersion: management.cattle.io/v3
kind: NodeDriver
metadata:
  annotations:
    privateCredentialFields: saKeyFile
  name: yandex  # fix name
spec:
  active: true
  addCloudCredential: false
  builtin: false
  checksum: ""
  description: ""
  displayName: yandex
  externalId: ""
  uiUrl: ""
  url: https://github.com/yandex-cloud/docker-machine-driver-yandex/releases/download/v0.1.35/docker-machine-driver-yandex_0.1.35_linux_amd64.tar.gz

But there is a problem with the environment variable names: Rancher uses YANDEX_*, while the docker driver uses YC_*,

so we can't hide the token as YANDEX_TOKEN in Rancher, because the Yandex docker driver uses YC_TOKEN.

I don't use sa-key-file, so in the credential it is empty:

apiVersion: v1
data:
  yandexcredentialConfig-saKeyFile: ""
kind: Secret
metadata:
  annotations:
    field.cattle.io/creatorId: user-xxxxx
    field.cattle.io/name: My-Team
    provisioning.cattle.io/driver: yandex
  labels:
    cattle.io/creator: norman
  name: cc-xxxxx
  namespace: cattle-global-data
type: Opaque

@dcardellino

I also ran into this issue with Hetzner; is there any update?

@github-actions

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@phal0r

phal0r commented Jul 31, 2022

still relevant

@github-actions

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@hoerup

hoerup commented Sep 30, 2022

still relevant

@github-actions

github-actions bot commented Dec 1, 2022

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@hoerup

hoerup commented Dec 1, 2022

Still relevant

@Audifire

Audifire commented Dec 5, 2022

I also ran into this issue with Hetzner; is there any update?

Same for me. I also use Hetzner and ran into this problem.
How can I generate log output for this? Which parameters are delivered to docker-machine?

@Negashev

Negashev commented Dec 6, 2022

I think we need changes on the Hetzner side of the docker driver: a flag that reads userdata from a file. I'll try to make a pull request today.

@Audifire

Audifire commented Dec 6, 2022

I think we need changes on the Hetzner side of the docker driver: a flag that reads userdata from a file. I'll try to make a pull request today.

Yep, that's what I also found out. I tried to change it, but sadly I don't have much experience with Go, and I get other errors when importing the driver. 😅

@Negashev

Negashev commented Dec 7, 2022

Ohhhh.... next level problem...

2022/12/07 02:03:47 [ERROR] error syncing 'hetzner': handler node-driver-controller: DynamicSchema.management.cattle.io "hetzner_instrumentedconfig" is invalid: metadata.name: Invalid value: "hetzner_instrumentedconfig": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), requeuing

In the Hetzner driver release, an _instrumented binary is now added automatically,

so Rancher picks up docker-machine-driver-hetzner_instrumented and generates the invalid hetzner_instrumentedconfig schema name.

@Negashev

Negashev commented Dec 7, 2022

Since 3.9.2, the Hetzner docker-machine tar.gz includes 2 binary files, and it seems Rancher takes the one ending in _instrumented.

@Negashev

Negashev commented Dec 7, 2022

Okay, FIX for Hetzner after 3.10.0 (the *_instrumented binary was added in 3.9.2):

  1. Create a download proxy (an nginx deployment in the local cluster) that serves the tar.gz without the docker-machine-driver-hetzner_instrumented binary (a quick verification check is sketched after these steps).

    Deploy the following in your local cluster in the cattle-system namespace:

apiVersion: v1
kind: Service
metadata:
  name: docker-machine-driver-hetzner
  namespace: cattle-system
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    docker-machine-driver: hetzner
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    docker-machine-driver: hetzner
  name: docker-machine-driver-hetzner
  namespace: cattle-system
spec:
  replicas: 1
  selector:
    matchLabels:
      docker-machine-driver: hetzner
  template:
    metadata:
      labels:
        docker-machine-driver: hetzner
    spec:
      containers:
      - image: nginx:stable-alpine
        imagePullPolicy: Always
        name: nginx
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: storage
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - -c
        - |
          wget -O /tmp/source.tar.gz "https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/download/${VERSION}/docker-machine-driver-hetzner_${VERSION}_linux_amd64.tar.gz"
          cd /tmp/
          tar -zxvf source.tar.gz
          tar -czvf "docker-machine-driver-hetzner_${VERSION}_linux_amd64.tar.gz" docker-machine-driver-hetzner
        command:
        - /bin/sh
        env:
        - name: VERSION
          value: 3.10.0
        image: busybox
        imagePullPolicy: Always
        name: clean-tar
        volumeMounts:
        - mountPath: /tmp
          name: storage
      volumes:
      - emptyDir: {}
        name: storage
  2. Create the NodeDriver in the local cluster as a YAML deploy (check name, displayName, and url; the driver now downloads from the local proxy):
apiVersion: management.cattle.io/v3
kind: NodeDriver
metadata:
  annotations:
    lifecycle.cattle.io/create.node-driver-controller: "true"
    privateCredentialFields: apiToken
  generation: 13
  name: hetzner
spec:
  active: true
  addCloudCredential: false
  builtin: false
  checksum: ""
  description: ""
  displayName: hetzner
  externalId: ""
  uiUrl: https://storage.googleapis.com/hcloud-rancher-v2-ui-driver/component.js
  url: http://docker-machine-driver-hetzner.cattle-system.svc/docker-machine-driver-hetzner_3.10.0_linux_amd64.tar.gz
  whitelistDomains:
  - storage.googleapis.com
  3. Go to the UI and activate the driver.

  4. Add credentials for Hetzner. (screenshot)

  5. Create the Hetzner cluster, and use the "User data is file" option in all pools. (screenshot)

  6. Enjoy! (screenshot)
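
If provisioning still fails, here is a rough way to check that the proxy from step 1 really serves the repackaged tarball; a sketch, assuming you can port-forward from your workstation (the listing should show only the docker-machine-driver-hetzner binary, without the *_instrumented one):

# Forward the in-cluster proxy service locally, download the tarball, and list its contents.
kubectl -n cattle-system port-forward svc/docker-machine-driver-hetzner 8080:80 &
curl -sO http://localhost:8080/docker-machine-driver-hetzner_3.10.0_linux_amd64.tar.gz
tar -tzf docker-machine-driver-hetzner_3.10.0_linux_amd64.tar.gz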

@simonostendorf

Getting "[ERROR] error syncing 'fleet-default/test-pool1-b0030872-r5rgz': handler machine-provision: nodedrivers.management.cattle.io "hetzner" not found, requeuing" when using your instructions @Negashev .
Any idea why third party node drivers (hetzner) not working for me?

@Negashev

@simonostendorf
It's important to add the NodeDriver in the local cluster as a YAML deploy with metadata.name: hetzner, not via the Rancher node-driver UI.

@simonostendorf

@simonostendorf It's important to add the NodeDriver in the local cluster as a YAML deploy with metadata.name: hetzner, not via the Rancher node-driver UI.

Thank you very much; I thought that setting the name via the GUI would also work. This fixed my initial problem, I hope there will be no more big problems :D

@klauserber

The solution does not work for me. It fixes the driver filename successfully, and there are no related error messages in the log or on screen anymore, but I still cannot create a credential for Hetzner; the key field is not populated.

Version 3.10.1 switches the driver name back, so we should not need this workaround anymore. But that version also does not work for me for creating RKE2 clusters. Btw, the creation of RKE clusters works fine.

@Negashev

Negashev commented Feb 6, 2023

@klauserber
I create the Cloud Credential before I create the cluster (screenshots),

and the "privateCredentialFields" annotation is required:

metadata:
  annotations:
    privateCredentialFields: apiToken

@klauserber

@Negashev
OK, interesting, that solves it. The Hetzner NodeDriver CR has the privateCredentialFields annotation.

I had already tried to create a credential as a Secret like so:

apiVersion: v1
kind: Secret
metadata:
  name: cc-test
  namespace: cattle-global-data
  labels:
    cattle.io/creator: norman
  annotations:
    field.cattle.io/creatorId: user-r8flk
    field.cattle.io/name: hetzner
    provisioning.cattle.io/driver: hetzner
type: Opaque
stringData:
  apiToken: XXXX  

But this secret does not appear in the UI. After that fix, I can add Hetzner cloud credentials via the UI. The resulting secrets have a field called hetznercredentialConfig-apiToken, and the provisioning.cattle.io/driver: hetzner annotation is obviously not needed:

apiVersion: v1
kind: Secret
metadata:
  name: cc-lab2-2
  namespace: cattle-global-data
  labels:
    cattle.io/creator: norman
  annotations:
    field.cattle.io/creatorId: user-r8flk
    field.cattle.io/name: hetzner-lab2-2
type: Opaque
stringData:
  hetznercredentialConfig-apiToken: XXXX

Now I have the Hetzner credential type, I can create credentials, and they show up in the UI.

I don't understand it completely, but it works. Thank you.

@github-actions

github-actions bot commented Apr 8, 2023

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@ItsReddi

ItsReddi commented Apr 28, 2023

We already have the node driver added via the UI and are also running an RKE1 cluster with it.
Is it safe to delete it from the UI and deploy it like @Negashev did?
Or would that break the already-running cluster?

Beyond that, I do not fully understand the issue.
Where is the fix needed (the driver, Rancher, or the UI driver) so that everything works "normally" via the Rancher UI?

It's an issue with the node driver installation, correct? So what must be fixed in the node driver?

@Negashev

Negashev commented Apr 28, 2023

@ItsReddi

I don't remember exactly; it's better to test removing the driver in a test environment.

Yes, it is still important to add the driver via YAML in the local cluster.

If you want an RKE2 cluster with the Hetzner driver, you will have to do the installation manipulations through the local cluster.
