Add kubernetes type clustering option #560
Comments
I'm not sure if I understand correctly, but this sounds like it needs a separate component acting as a k8s controller for gnmic, responsible for managing the state of the cluster. A guide to deploying gnmic on k8s would be very helpful; it would fit nicely with the docs.
Argocd stores most of its configuration in Secrets (but I am sure ConfigMaps would also be fine for gnmic) and Custom Resource Definitions (which would be too much for the simple use case in gnmic), which are basically key-value stores. They don't have a specific way to set a TTL, but I am sure you could just create an entry in the specific ConfigMap with the TTL value if that is needed. For the guide, I will start working on it right away.
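Purely to illustrate that idea (the object name and keys below are hypothetical, not something gnmic or argocd actually defines), a key/value entry with an accompanying TTL field stored in a ConfigMap could look like this:

# Hypothetical sketch of a ConfigMap used as a plain key/value store.
# The name and keys are illustrative only; the TTL would have to be
# interpreted and enforced by the application itself.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gnmic-kv-store
  namespace: gnmic
data:
  cluster1-leader: gnmic-ss-0       # stored key/value pair
  cluster1-leader-ttl: "10s"        # TTL stored alongside the value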
Thanks for working on the guide and thanks for the details about argocd. Consul does a little bit more than just storage. About using k8s as a KV store for clustering, I think …
I believe this should work. Open to comments and suggestions; I might have missed something or expected a piece to work differently from its real behavior.
@melkypie if you give the 0.25.0-beta release a try, you will be able to test k8s based clustering. The deployment method is similar to what you already did with Consul, except:
clustering:
cluster-name: cluster1
targets-watch-timer: 30s
leader-wait-timer: 30s
locker:
type: k8s
namespace: gnmic # defaults to "default"
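For context, the k8s locker type relies on coordination.k8s.io Lease objects, which is why the Role below grants permissions on leases. A lease held by one gnmic replica looks roughly like the sketch below; the lease name is illustrative and only a subset of the spec fields is shown:

# Rough illustration of a Lease held by a gnmic instance (name is illustrative).
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: gnmic-cluster1-leader
  namespace: gnmic
spec:
  holderIdentity: gnmic-ss-0     # pod currently holding the lock
  leaseDurationSeconds: 10       # lock expires if not renewed within this window
  renewTime: "2022-04-26T05:39:53.000000Z"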
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: gnmic
name: svc-pod-lease-reader
rules:
- apiGroups: [""]
resources: ["pods", "services"]
verbs: ["get", "watch", "list"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: gnmic-user
namespace: gnmic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods-leases
namespace: gnmic
subjects:
- kind: ServiceAccount
name: gnmic-user
roleRef:
kind: Role
name: svc-pod-lease-reader
apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: gnmic-ss
labels:
app: gnmic
spec:
replicas: 3
selector:
matchLabels:
app: gnmic
serviceName: gnmic-svc
template:
metadata:
labels:
app: gnmic
spec:
containers:
- args:
- subscribe
- --config
- /app/config.yaml
image: gnmic:0.0.0-k
imagePullPolicy: IfNotPresent
name: gnmic
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
ports:
- containerPort: 9804
name: prom-output
protocol: TCP
- containerPort: 7890
name: gnmic-api
protocol: TCP
resources:
limits:
cpu: 100m
memory: 400Mi
requests:
cpu: 50m
memory: 200Mi
envFrom:
- secretRef:
name: gnmic-login
env:
- name: GNMIC_API
value: :7890
- name: GNMIC_CLUSTERING_INSTANCE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: GNMIC_CLUSTERING_SERVICE_ADDRESS
value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local"
- name: GNMIC_OUTPUTS_OUTPUT1_LISTEN
value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local:9804"
volumeMounts:
- mountPath: /app/config.yaml
name: config
subPath: config.yaml
serviceAccountName: gnmic-user # <-- service account name created earlier
volumes:
- configMap:
defaultMode: 420
name: gnmic-config
name: config
---
apiVersion: v1
kind: Service
metadata:
name: cluster1-gnmic-api
labels:
app: gnmic
spec:
ports:
- name: http
port: 7890
protocol: TCP
targetPort: 7890
selector:
app: gnmic
clusterIP: None

I did some tests on my side; it seems to be stable even when shrinking the StatefulSet size. There is no mechanism to redistribute the targets when growing the StatefulSet. It would be helpful if you could give it a go to see if it fits your needs.
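One piece not shown above is the gnmic-login Secret pulled in via envFrom. A minimal sketch of such a Secret, assuming the credentials are passed as GNMIC_-prefixed environment variables (the key names and values here are assumptions for illustration, not taken from this thread), could be:

# Sketch of the Secret referenced by envFrom/secretRef in the StatefulSet.
# Key names and values are assumed for illustration; adjust to however your
# gnmic deployment expects its credentials.
apiVersion: v1
kind: Secret
metadata:
  name: gnmic-login
  namespace: gnmic
type: Opaque
stringData:
  GNMIC_USERNAME: admin        # placeholder username
  GNMIC_PASSWORD: admin        # placeholder password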
Will do. I won't be able to get back to you until Tuesday, as I don't have access to the cluster where I could test out gNMI due to the Easter holidays.
I gave it a try.
StatefulSet.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: gnmic-ss
namespace: gnmic
labels:
app: gnmic
spec:
replicas: 3
selector:
matchLabels:
app: gnmic
serviceName: gnmic-svc
template:
metadata:
labels:
app: gnmic
version: 0.25.0-beta
spec:
containers:
- args:
- subscribe
- --config
- /app/config.yaml
image: ghcr.io/karimra/gnmic:0.25.0-beta-scratch
imagePullPolicy: IfNotPresent
name: gnmic
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
ports:
- containerPort: 9804
name: prom-output
protocol: TCP
- containerPort: 7890
name: gnmic-api
protocol: TCP
resources:
limits:
cpu: 100m
memory: 400Mi
requests:
cpu: 50m
memory: 200Mi
envFrom:
- secretRef:
name: gnmic-login
env:
- name: GNMIC_API
value: :7890
- name: GNMIC_CLUSTERING_INSTANCE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: GNMIC_CLUSTERING_SERVICE_ADDRESS
value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local"
- name: GNMIC_OUTPUTS_PROM_LISTEN
value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local:9804"
volumeMounts:
- mountPath: /app/config.yaml
name: config
subPath: config.yaml
serviceAccountName: gnmic-user
volumes:
- configMap:
defaultMode: 420
name: gnmic-config
name: config

RBAC.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: gnmic
name: svc-pod-lease-reader
rules:
- apiGroups: [""]
resources: ["pods", "services"]
verbs: ["get", "watch", "list"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: gnmic-user
namespace: gnmic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods-leases
namespace: gnmic
subjects:
- kind: ServiceAccount
name: gnmic-user
roleRef:
kind: Role
name: svc-pod-lease-reader
apiGroup: rbac.authorization.k8s.io

Service.yaml

apiVersion: v1
kind: Service
metadata:
name: gnmic-svc
namespace: gnmic
labels:
app: gnmic
spec:
ports:
- name: http
port: 9804
protocol: TCP
targetPort: 9804
selector:
app: gnmic
clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
name: cluster1-gnmic-api
namespace: gnmic
spec:
ports:
- name: http
port: 7890
protocol: TCP
targetPort: 7890
selector:
app: gnmic
clusterIP: None

ConfigMap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
name: gnmic-config
namespace: gnmic
data:
config.yaml: |
insecure: true
encoding: json_ietf
log: true
clustering:
cluster-name: cluster1
targets-watch-timer: 30s
leader-wait-timer: 30s
locker:
type: k8s
namespace: gnmic
targets:
device1:
address: device1:6030
subscriptions:
- general
device2:
address: device2:6030
subscriptions:
- general
device3:
address: device3:6030
subscriptions:
- general
device4:
address: device4:6030
subscriptions:
- general
subscriptions:
general:
paths:
- /interfaces/interface/state/counters
stream-mode: sample
sample-interval: 5s
outputs:
prom:
type: prometheus
strings-as-labels: true

Also attaching sanitized log files (I also noticed that gnmic seems to log plaintext passwords, which it would be great if it did not do). The logs are from trying it a second time, so you can't see where it created the device1 lease.
I'm not sure what is going wrong here. I re-tested with a single node as well as with 1 control plane and 2 worker nodes (Kubernetes 1.23.4 and 1.22.7).
The leader assigning the target to itself I understood, but yes, the most interesting part is that the lock/lease is not being recognized by the leader, even though it is there if you look at the leases.
Finally got around to testing it and I found the error! gnmic/lockers/k8s_locker/k8s_registration.go, line 106 (commit 3caa03e).
It is my fault for not providing the exact configs I used to deploy, as it might have been easier to debug then. EDIT: Also seems to be the case with targets having …
That part actually replaces … in the key name. The leader keeps a mapping of the transformed key (…). I got rid of the key mapping and added the original key as an annotation to the lease; that's how the lease looks now:

Name: gnmic-cluster-1-targets-172.20.20.2
Namespace: gnmic
Labels: app=gnmic
gnmic-cluster-1-targets-172.20.20.2=gnmic-ss-2
Annotations: original-key: gnmic/cluster-1/targets/172.20.20.2
API Version: coordination.k8s.io/v1
Kind: Lease
Metadata:
Creation Timestamp: 2022-04-26T05:31:05Z
Managed Fields:
API Version: coordination.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:original-key:
f:labels:
.:
f:app:
f:gnmic-cluster-1-targets-172.20.20.2:
f:spec:
f:acquireTime:
f:holderIdentity:
f:leaseDurationSeconds:
f:renewTime:
Manager: gnmic
Operation: Update
Time: 2022-04-26T05:31:05Z
Resource Version: 1876693
UID: ea0e4259-b39a-47f2-a62a-60dfb64cccb1
Spec:
Acquire Time: 2022-04-26T05:39:53.085031Z
Holder Identity: gnmic-ss-2
Lease Duration Seconds: 10
Renew Time: 2022-04-26T05:39:53.085031Z
Events: <none>

I will issue a release shortly with this code so you can test it (if you don't mind).
Seems to be fine. Works with both cluster names and targets having … in them. The targets not being redistributed when the StatefulSet is scaled up, which as you said does not currently work, is quite an important feature, but that is out of scope for this issue.
Thanks for testing it. I will write some docs about k8s based clustering before releasing. Concerning redistribution, I think this can be done periodically (enabled via a knob …).
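As a purely hypothetical sketch of what such a knob could look like in the clustering section (the option names below are invented for illustration and are not real gnmic settings):

# Hypothetical only: these redistribution options are invented to illustrate
# the idea of periodic target rebalancing; they are not actual gnmic keys.
clustering:
  cluster-name: cluster1
  targets-watch-timer: 30s
  redistribute-targets: true       # invented knob enabling periodic rebalancing
  redistribution-interval: 5m      # invented interval between rebalancing runs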
Currently the only KV storage we can use for clustering is Consul. A nice feature would be to add a Kubernetes type and store all of the key/value information in Kubernetes objects, similar to how argocd does it. This would allow the user to not have to maintain another KV storage solution.
I know this is quite a big ask, but I have already managed to deploy gnmic clustering on Kubernetes with Consul, and having this would allow me to not worry about maintaining another KV store. If needed, I could write a guide on how to deploy it to Kubernetes and help with the ServiceAccount/RoleBinding/Role objects and other Kubernetes-related things.