dry-run sometimes misses metadata and causes failed to prune fields error during CRD conversion #227

@LittleWat

Description

Hello! We are developing a custom operator and using FluxCD.

We have a v1alpha1 custom resource that is deployed by FluxCD.
When we upgraded the custom resource operator from v1alpha1 to v1alpha2, Flux reported that the dry-run failed with the following error message.

dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert pruned object at version <foo.com>/v1alpha1: conversion webhook for <foo.com>/v1alpha2, Kind=<resource>  returned invalid metadata: invalid metadata of type <nil> in input object

Running the following dry-run command by hand also fails intermittently (roughly once in five attempts) with the same error message.

$ kubectl apply --server-side --dry-run=server -f <v1alpha1-resource.yaml> --field-manager kustomize-controller

Error from server: failed to prune fields: failed add back owned items: failed to convert pruned object at version <foo.com>/v1alpha1: conversion webhook for <foo.com>/v1alpha2, Kind=<resource>  returned invalid metadata: invalid metadata of type <nil> in input object

At first it looked like performing the actual apply (the following command) never fails, but it turns out that the actual apply also fails intermittently.

$ kubectl apply --server-side -f <v1alpha1-resource.yaml> --field-manager kustomize-controller

<foo.com>/<resource> serverside-applied

The flakiness might be a key to solving this.

Our conversion code is similar to https://github.com/IBM/operator-sample-go/blob/b79e66026a5cc5b4994222f2ef7aa962de9f7766/operator-application/api/v1alpha1/application_conversion.go#L37
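
For reference, here is a minimal sketch of that pattern, using controller-runtime's conversion.Convertible interface. The type name MyResource, the Attribute1/Attribute2 fields, and the import path are placeholders, since our actual resource is redacted:

package v1alpha1

import (
	"sigs.k8s.io/controller-runtime/pkg/conversion"

	v1alpha2 "example.com/my-operator/api/v1alpha2" // placeholder import path
)

// ConvertTo converts this v1alpha1 MyResource to the v1alpha2 Hub version.
func (src *MyResource) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1alpha2.MyResource)

	// ObjectMeta is copied over unchanged.
	dst.ObjectMeta = src.ObjectMeta

	// Placeholder fields standing in for our real spec/status attributes.
	dst.Spec.Attribute1 = src.Spec.Attribute1
	dst.Spec.Attribute2 = src.Spec.Attribute2
	dst.Status.Attribute1 = src.Status.Attribute1
	dst.Status.Attribute2 = src.Status.Attribute2

	return nil
}

// ConvertFrom converts the v1alpha2 Hub version into this v1alpha1 MyResource.
func (dst *MyResource) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1alpha2.MyResource)

	dst.ObjectMeta = src.ObjectMeta
	dst.Spec.Attribute1 = src.Spec.Attribute1
	dst.Spec.Attribute2 = src.Spec.Attribute2
	dst.Status.Attribute1 = src.Status.Attribute1
	dst.Status.Attribute2 = src.Status.Attribute2

	return nil
}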

We checked the conversion webhook log. A single dry-run command called the ConvertTo function three times and the ConvertFrom function three times. When the request fails, the last of the three calls to each of ConvertTo and ConvertFrom receives an object that is missing the metadata and spec information.
The failing call is logged like "metadata":{"creationTimestamp":null},"spec":{}
(A normal call is logged like "metadata":{"name":"<foo>","namespace":"<foo>","uid":"09b69792-56d5-4217-b23c-4d418d3f904b","resourceVersion":"1707796","generation":3,"creationTimestamp":"2022-09-16T07:28:54Z","labels":{"kustomize.toolkit.fluxcd.io/name":"<foo>","kustomize.toolkit.fluxcd.io/namespace":"flux-system"}},"spec":{"attribute1":[{...)
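
One way to capture dumps like this is to log the raw incoming object at the top of ConvertTo and ConvertFrom. A minimal sketch, assuming controller-runtime's log package (illustrative instrumentation with placeholder names, not our actual operator code):

package v1alpha1

import (
	"encoding/json"

	"sigs.k8s.io/controller-runtime/pkg/conversion"
	logf "sigs.k8s.io/controller-runtime/pkg/log"
)

// ConvertTo with debug logging of the raw incoming object; ConvertFrom can be
// instrumented the same way. The failing calls show up here with the empty
// "metadata":{"creationTimestamp":null},"spec":{} payload quoted above.
func (src *MyResource) ConvertTo(dstRaw conversion.Hub) error {
	raw, err := json.Marshal(src)
	if err != nil {
		return err
	}
	logf.Log.WithName("conversion").Info("ConvertTo called", "object", string(raw))

	// ... field conversion as in the sketch above ...
	return nil
}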

We could confirm that this strange behavior happens when managedFields contains two managers (kustomize-controller and our operator), as follows:

apiVersion: <foo.com>/v1alpha2
kind: <MyResource>
metadata:
  creationTimestamp: "2022-09-15T04:52:03Z"
  generation: 1
  labels:
    kustomize.toolkit.fluxcd.io/name: operator-sample
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: <foo.com>/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:attribute1: {}
        f:attribute2: {}
    manager: kustomize-controller
    operation: Apply
    time: "2022-09-15T04:52:03Z"
  - apiVersion: <foo.com>/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:attribute1: {}
        f:attribute2: {}
    manager: <our-operator>
    operation: Update
    time: "2022-09-15T04:52:04Z"
  name: v1alpha1-flux
  namespace: flux
  resourceVersion: "483157"
  uid: 696bed77-a12b-45d0-b240-8d685cf790e0
spec:
  ...
status:
  ...

I asked this question in the Flux repo, but I could not find out the cause:
fluxcd/flux2#3105

I have been stuck on this for more than a week, so any ideas are really appreciated.
Thanks!
