Flows do not persist pod restart #70

Closed
juldrixx opened this issue Mar 24, 2022 · 8 comments

@juldrixx
Contributor

From nifikop created by andrew-musoke: Orange-OpenSource/nifikop#201

Type of question

Are you asking about community best practices, how to implement a specific feature, or about general context and help around nifikop?
General help with Nifikop.

Question

What did you do?
I deployed NiFi with 2 pods via NiFiKop. After creating a flow in the UI, I also exported the process groups to a nifi-registry. The cluster ran for days. I then deleted the cluster pods to test resilience. This is the CR I used:

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: simplenifi
  namespace: dataops
spec:
  service:
    headlessEnabled: true
  zkAddress: "zookeeper.dataops.svc.cluster.local.:2181"
  zkPath: "/simplenifi"
  clusterImage: "apache/nifi:1.12.1"
  oneNifiNodePerNode: false
  nodeConfigGroups:
    default_group:
      isNode: true
      imagePullPolicy: IfNotPresent
      storageConfigs:
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "gp2"
            resources:
              requests:
                storage: 10Gi
      serviceAccountName: "default"
      resourcesRequirements:
        limits:
          cpu: "0.5"
          memory: 2Gi
        requests:
          cpu: "0.5"
          memory: 2Gi
  clientType: "basic"
  nodes:
    - id: 1
      nodeConfigGroup: "default_group"
    - id: 2
      nodeConfigGroup: "default_group"
  propagateLabels: true
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  listenersConfig:
    internalListeners:
      - type: "http"
        name: "http"
        containerPort: 8080
      - type: "cluster"
        name: "cluster"
        containerPort: 6007
      - type: "s2s"
        name: "s2s"
        containerPort: 10000

What did you expect to see?
I expected the cluster to run properly and survive restarts since PVs are created. I expected to see the pipelines continue running after the pods started up.

What did you see instead? Under which circumstances?
When the pods came back up and were healthy, the UI had no flows or process groups. The registry configuration had also disappeared. I have to manually re-register the nifi-registry, re-import the process groups, add the secrets and restart the pipelines.

  1. Why would this happen when Nifi has persistent volumes?
  2. How can this behaviour be stopped?
  3. How can I persist the flows, or at least automate the re-importing and restarting of pipelines from nifi-registry?

Environment

  • nifikop version:
    v0.7.5-release

  • Kubernetes version information:

 Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.11-eks-f17b81", GitCommit:"f17b810c9e5a82200d28b6210b458497ddfcf31b", GitTreeState:"clean", BuildDate:"2021-10-15T21:46:21Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  • NiFi version:

apache/nifi:1.12.1

@juldrixx
Contributor Author

I got this response from one of the alternate communication channels. But I cannot make sense of it. Could this be an issue?

It sounds like the flow.xml.gz is perhaps not saved on a persistent volume? The ideal behavior would be to have several different persistent volumes:

  • One for content repo
  • One for flowfile repo
  • One for provenance repo
  • One for logs
  • One for conf/ directory, any additional configuration resources. (this could easily be combined with the logs/ volume)
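As a rough sketch of what that advice could look like in the CR above, the storageConfigs list might be extended with one volume per repository. The volume names, sizes and mount paths below are assumptions: where each repository (and the flow.xml.gz itself) actually lives depends on the nifi.properties in use, so check the paths in a running pod before relying on them.

```yaml
# Sketch only: fragment of spec.nodeConfigGroups.default_group in the NifiCluster CR.
# All mount paths except logs are ASSUMED; verify them against nifi.properties.
storageConfigs:
  - mountPath: "/opt/nifi/nifi-current/logs"      # from the original CR
    name: logs
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/data"                   # assumed directory holding flow.xml.gz
    name: data
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/nifi-current/conf"      # assumed conf/ directory
    name: conf
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 1Gi
  - mountPath: "/opt/nifi/content_repository"     # assumed content repo path
    name: content-repository
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/flowfile_repository"    # assumed flowfile repo path
    name: flowfile-repository
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/provenance_repository"  # assumed provenance repo path
    name: provenance-repository
    pvcSpec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
```

Note that persisting the file alone would not help on a nifikop version that deletes flow.xml.gz at pod startup, as discussed further down in this thread.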

@juldrixx
Contributor Author

I recently opened a PR that adds an option to NifiClusterSpec which, when specified, prevents the flow.xml.gz file from being removed on pod startup.

In the current implementation, even though the flow.xml.gz file is persisted, it is removed every time the pod starts: https://github.com/Orange-OpenSource/nifikop/blob/master/pkg/resources/nifi/pod.go#L418

@juldrixx
Contributor Author

You should deploy a NiFiDataflow so that NiFiKOp re-deploys the versioned dataflow from NiFi Registry.

https://orange-opensource.github.io/nifikop/docs/5_references/5_nifi_dataflow

I could be wrong, but I suppose you could also make sure the flow.xml.gz is stored on a persistent volume. That isn't necessary if you deploy a NiFiDataflow, though, since nifikop will just redeploy the flow once the pod comes up.
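For illustration, a minimal sketch of that setup for the cluster in this issue might look like the following. The resource names, the registry URI, and the bucket and flow IDs are placeholders (assumptions, not values from this thread), and the field names should be checked against the NifiDataflow reference linked above for the nifikop version in use.

```yaml
# Sketch only: names, URI and IDs below are hypothetical placeholders.
apiVersion: nifi.orange.com/v1alpha1
kind: NifiRegistryClient
metadata:
  name: simplenifi-registry          # hypothetical name
  namespace: dataops
spec:
  clusterRef:
    name: simplenifi                 # the NifiCluster from this issue
    namespace: dataops
  description: "Registry holding the versioned flows"
  uri: "http://nifi-registry.dataops.svc.cluster.local:18080"   # assumed registry address
---
apiVersion: nifi.orange.com/v1alpha1
kind: NifiDataflow
metadata:
  name: my-dataflow                  # hypothetical name
  namespace: dataops
spec:
  bucketId: "<bucket-id-from-nifi-registry>"   # placeholder
  flowId: "<flow-id-from-nifi-registry>"       # placeholder
  flowVersion: 1                     # version of the flow to deploy
  syncMode: always                   # keep reconciling the deployed flow
  updateStrategy: drain
  clusterRef:
    name: simplenifi
    namespace: dataops
  registryClientRef:
    name: simplenifi-registry
    namespace: dataops
```

With something like this in place, the operator rather than the UI owns the deployed flow, which is what lets it be restored after a restart.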

@mh013370
Member

mh013370 commented Jul 15, 2022

For production clusters that you've configured nifikop to deploy flows to, this isn't really a problem. However, I do think this would be a useful feature for the following reason:

If you use a single cluster deployment as a place to create flows and version control them, then you wouldn't be configuring flows to be deployed to it. Since nifikop wipes the flow.xml.gz on each pod restart, you have to manually re-import all of the flows you are working on to be deployed to other clusters.

I personally feel that the PR previously mentioned, raised by @genehynson, would be a useful feature and should be re-opened in this repo.

@genehynson
Contributor

genehynson commented Jul 15, 2022

After upgrading to NiFi 1.16, we are no longer running into this issue. I believe this is because NiFi migrated to a new file, flow.json.gz, which is not deleted by the NiFi pod startup script provided by nifikop.

Also with NiFi 1.16 we've been able to do clean, rolling upgrades by creating a PodDisruptionBudget and only allowing 1 NiFi node to be updated by k8s at a time. NiFi 1.16 introduced a new "flow negotiation" system that allows for each node in the NiFi cluster to have slightly different versions of the flow.json.gz file (like different processor versions, for example).

So even if nifikop does start deleting the flow.json.gz file I think we'll be fine because when a NiFi pod rolls it will get the contents for the flow.json.gz from the primary NiFi node that has not rolled yet (or has already rolled).

That being said, the use case for the PR mentioned only applies if you're running a single NiFi node or an older version of NiFi.
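A PodDisruptionBudget along the lines described above could be sketched as follows. The resource name and the selector labels are assumptions (check the labels actually present on your NiFi pods), and the cluster name and namespace are borrowed from the CR at the top of this issue.

```yaml
# Sketch only: metadata.name and the selector labels are assumptions.
apiVersion: policy/v1        # use policy/v1beta1 on Kubernetes versions before 1.21
kind: PodDisruptionBudget
metadata:
  name: simplenifi-pdb       # hypothetical name
  namespace: dataops
spec:
  maxUnavailable: 1          # allow at most one NiFi pod to be evicted at a time
  selector:
    matchLabels:
      app: nifi              # assumed pod label
      nifi_cr: simplenifi    # assumed pod label carrying the cluster name
```

A PodDisruptionBudget only limits voluntary disruptions such as node drains; it does not protect against pods being deleted directly or crashing.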

@mh013370
Member

mh013370 commented Jul 18, 2022

Good to know! Thanks for the follow up. I do think that NiFi is writing both the flow.xml.gz and the flow.json.gz temporarily as they transition to the json variant. But it's good to know that with 1.16+ and the changes around flow negotiation that it's a minor issue.

Maybe we can resolve this issue then?

@genehynson
Contributor

"I do think that NiFi is writing both the flow.xml.gz and the flow.json.gz temporarily as they transition to the json variant"

Correct, but it only uses one of them: whichever you have defined in nifi.flow.configuration.file (flow.xml.gz is the default). To get the benefits of the new flow negotiation, you have to switch to the flow.json.gz file.

That being said, I'm also fine with resolving this issue.

@erdrix
Contributor

erdrix commented Aug 19, 2022

The flow.xml.gz is no longer removed at pod restart!

@erdrix erdrix closed this as completed Aug 19, 2022