
Splunk Operator: something breaking local config files on pod restart #1212

Closed
yaroslav-nakonechnikov opened this issue Aug 9, 2023 · 13 comments

@yaroslav-nakonechnikov

Please select the type of request

Bug

Tell us more

Describe the request
From time to time we see strange behavior: config files that were pushed through default.yml are broken after a pod restart.

[splunk@splunk-prod-cluster-manager-0 splunk]$ cat /opt/splunk/etc/system/local/authentication.conf
 
[authentication]
authSettings = saml
authType = SAML
authSettings
authType
 
[saml]
entityId = splunkACSEntityId
fqdn = https://cm.fqdn.cloud
idpSSOUrl = https://idp.fqdn.com/idp/SSO.saml2
inboundDigestMethod = SHA1;SHA256;SHA384;SHA512
inboundSignatureAlgorithm = RSA-SHA1;RSA-SHA256;RSA-SHA384;RSA-SHA512
issuerId = idp:fqdn.com:saml2
lockRoleToFullDN = True
redirectAfterLogoutToUrl = https://www.splunk.com
redirectPort = 443
replicateCertificates = True
signAuthnRequest = True
signatureAlgorithm = RSA-SHA1
signedAssertion = True
sloBinding = HTTP-POST
ssoBinding = HTTP-POST
clientCert = /mnt/certs/saml_sig.pem
idpCertPath = /mnt/certs/
entityId
fqdn
idpSSOUrl
inboundDigestMethod
inboundSignatureAlgorithm
issuerId
lockRoleToFullDN
redirectAfterLogoutToUrl
redirectPort
replicateCertificates
signAuthnRequest
signatureAlgorithm
signedAssertion
sloBinding
ssoBinding
clientCert
idpCertPath
 
[roleMap_SAML]
admin = ldap-group-a
cloudgateway = ldap-group-b
dashboard = ldap-group-c
ess_admin = ldap-group-d
ess_analyst = ldap-group-e;ldap-group-f;ldap-group-g
...
splunk_soc_l1_l2 = ldap-group-y
splunk_soc_l3 = ldap-group-x
admin
cloudgateway
dashboard
ess_admin
ess_analyst
...
splunk_soc_l1_l2
splunk_soc_l3

So the list of keys was duplicated, without values.
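A quick way to spot this damage on a running pod is to look for lines inside a stanza that carry a key but no `=` value. Below is a minimal sketch, assuming Splunk's usual `[stanza]` / `key = value` / `#` comment layout; `find_valueless_keys` is a hypothetical helper, not part of the operator or splunk-ansible.

```python
def find_valueless_keys(conf_text: str) -> list[tuple[str, str]]:
    """Return (stanza, line) pairs for non-blank lines that have no '=' separator."""
    findings = []
    stanza = "default"  # lines before the first header belong to the implicit [default] stanza
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]  # entering a new stanza
        elif "=" not in line:
            findings.append((stanza, line))  # key written without a value
    return findings


broken = """[imds]
imds_version = v2
imds_version
"""
print(find_valueless_keys(broken))  # [('imds', 'imds_version')]
```

Feeding it the contents of /opt/splunk/etc/system/local/authentication.conf would flag each bare key from the listing above; an empty list means no such lines were found.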

Here is the ConfigMap:

[yn@ip-10-224-31-36 /]$ kubectl get configmap splunk-prod-indexer-defaults -o yaml
apiVersion: v1
data:
  default.yml: |-
    splunk:
      site: site1
      multisite_master: localhost
      all_sites: site1,site2,site3,site4,site5,site6
      multisite_replication_factor_origin: 1
      multisite_replication_factor_total: 3
      multisite_search_factor_origin: 1
      multisite_search_factor_total: 3
      idxc:
        # search_factor: 3
        # replication_factor: 3
        app_paths_install:
          default:
            - https://path.to.app/config-explorer_1715.tgz
        apps_location:
          - https://path.to.app/config-explorer_1715.tgz
      app_paths:
        idxc: "/opt/splunk/etc/manager-apps"
      app_paths_install:
        default:
          - https://path.to.app/config-explorer_1715.tgz
        idxc:
          - https://path.to.app/cmp_indexer_indexes.tgz
          - https://path.to.app/cmp_resmonitor.tgz
          - https://path.to.app/cmp_soar_indexes.tgz
      conf:
        - key: server
          value:
            directory: /opt/splunk/etc/system/local
            content:
              imds:
                imds_version: v2
        - key: deploymentclient
          value:
            directory: /opt/splunk/etc/system/local
            content:
              deployment-client :
                disabled : false
              target-broker:deploymentServer :
                targetUri : ds.shared.cmp-a.internal.cmpgroup.cloud:8089
        - key: web
          value:
            directory: /opt/splunk/etc/system/local
            content:
              settings:
                enableSplunkWebSSL: true
        - key: authentication
          value:
            directory: /opt/splunk/etc/system/local
            content:
              authentication:
                authSettings : saml
                authType : SAML
              saml:
                entityId : splunkACSEntityId
                fqdn : https://cm.fqdn.cloud
                idpSSOUrl : https://idp.fqdn.com/idp/SSO.saml2
                inboundDigestMethod : SHA1;SHA256;SHA384;SHA512
                inboundSignatureAlgorithm : RSA-SHA1;RSA-SHA256;RSA-SHA384;RSA-SHA512
                issuerId : idp:fqdn.com:saml2
                lockRoleToFullDN : true
                redirectAfterLogoutToUrl : https://www.splunk.com
                redirectPort : 443
                replicateCertificates : true
                signAuthnRequest : true
                signatureAlgorithm : RSA-SHA1
                signedAssertion : true
                sloBinding : HTTP-POST
                ssoBinding : HTTP-POST
                clientCert : /mnt/certs/saml_sig.pem
                idpCertPath: /mnt/certs/
              roleMap_SAML:
                admin : ldap-group-a
                cloudgateway : ldap-group-b
                dashboard : ldap-group-c
                ess_admin : ldap-group-d
                ess_analyst : ldap-group-e;ldap-group-f;ldap-group-g
                ...
                splunk_soc_l1_l2 : ldap-group-y
                splunk_soc_l3 : ldap-group-x
        - key: authorize
          value:
            directory: /opt/splunk/etc/system/local
            content:
              role_admin:
                run_script_adhocremotesearchraw : enabled
                run_script_adhocremotesearch : enabled
                run_script_environmentpoller : enabled
                run_script_sleepy : enabled
kind: ConfigMap
metadata:
  creationTimestamp: "2023-02-24T16:53:17Z"
  name: splunk-prod-indexer-defaults
  namespace: splunk-operator
  ownerReferences:
  - apiVersion: enterprise.splunk.com/v4
    controller: true
    kind: ClusterManager
    name: prod
    uid: 84aa7496-eb5a-4ffb-9549-c42f7780450e
  resourceVersion: "95698835"
  uid: 47b70fd9-0398-4aa0-ace5-20a5ac9d4842

Expected behavior
default.yml is rendered the same way on every run, without issues.
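For comparison with the expected behavior, here is a minimal sketch of a restart-safe merge. It reads the existing file, applies the `content` mapping of one `conf` entry, and writes the result back, so running it a second time produces identical text. `merge_conf` is a hypothetical name; the sketch uses Python's standard `configparser` and is not how splunk-ansible actually renders these files. It also assumes every key sits under a stanza header (configparser rejects keys before the first `[stanza]`), and it drops comments.

```python
import configparser
import io


def merge_conf(existing_text: str, content: dict[str, dict[str, object]]) -> str:
    """Apply {stanza: {key: value}} on top of an existing .conf file and return the new text.

    Hypothetical helper: every key is set, never appended, so merging the same
    content again yields byte-identical output.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case as written (e.g. enableSplunkWebSSL)
    parser.read_string(existing_text)
    for stanza, settings in content.items():
        if not parser.has_section(stanza):
            parser.add_section(stanza)
        for key, value in settings.items():
            parser.set(stanza, key, str(value))  # overwrite in place, no duplicates
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


existing = "[settings]\nmgmtHostPort = 0.0.0.0:8089\n"
content = {"settings": {"enableSplunkWebSSL": True}}
once = merge_conf(existing, content)
twice = merge_conf(once, content)  # simulates a second pod start
print(once == twice)  # True
```

A side effect of this design: if the existing file already contains a bare key such as `imds_version`, `read_string` raises `configparser.ParsingError` instead of carrying the line forward into the next render.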

Splunk setup on K8S
EKS 1.27
Splunk Operator 2.3.0
Splunk 9.1.0.2

Reproduction/Testing steps
After an unexpected pod restart, the new pod started with a broken config.

@yaroslav-nakonechnikov
Author

The same thing happened in etc/system/local/server.conf:

[splunk@splunk-prod-cluster-manager-0 splunk]$ cat etc/system/local/server.conf | grep "\[imds\]" -A 3
[imds]
imds_version = v2
imds_version

and in etc/system/local/web.conf:

[splunk@splunk-prod-cluster-manager-0 splunk]$ cat etc/system/local/web.conf | grep "\[settings\]" -A 3
[settings]
mgmtHostPort = 0.0.0.0:8089
enableSplunkWebSSL = True
enableSplunkWebSSL

So every file defined in the conf section is broken.
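To confirm that the breakage tracks the `conf` section exactly, each entry can be checked against the file it produced. This is a hedged sketch: `conf_entries` is assumed to be the `splunk.conf` list from default.yml already loaded into Python (for example with a YAML library, not shown), `check_conf_entries` and `read_stanzas` are hypothetical helpers, and the mapping from an entry's `key` and `directory` to `<directory>/<key>.conf` is inferred from the files in this issue (server, web, authentication), not taken from splunk-ansible's code.

```python
from pathlib import Path


def read_stanzas(text: str) -> dict[str, list[str]]:
    """Group non-blank, non-comment lines of a .conf file by stanza name."""
    stanzas: dict[str, list[str]] = {"default": []}
    current = "default"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas.setdefault(current, [])
        else:
            stanzas.setdefault(current, []).append(line)
    return stanzas


def check_conf_entries(conf_entries: list[dict], root: str = "/") -> list[tuple]:
    """Compare each `conf` entry from default.yml with the file rendered on disk.

    Returns (file, stanza, key, assigned_count, bare_count) for every expected key
    that is not assigned exactly once or that also appears as a bare line.
    """
    problems = []
    for entry in conf_entries:
        value = entry["value"]
        # Assumption: entry key "server" in directory D is rendered to D/server.conf.
        path = Path(root, value["directory"].lstrip("/"), f"{entry['key']}.conf")
        stanzas = read_stanzas(path.read_text())
        for stanza, settings in value["content"].items():
            lines = stanzas.get(stanza, [])
            for key in settings:
                bare = sum(1 for line in lines if line == key)
                assigned = sum(
                    1
                    for line in lines
                    if "=" in line and line.split("=", 1)[0].strip() == key
                )
                if bare or assigned != 1:
                    problems.append((path.name, stanza, key, assigned, bare))
    return problems
```

The `root` parameter lets the check run against a copy of the pod's filesystem (for example one extracted from a diag) instead of `/`.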

@yaroslav-nakonechnikov
Author

Running kubectl delete pod triggers recreation of the pod, and after that everything looks fine.
But we want to find the root cause, as this can happen anywhere!

@yaroslav-nakonechnikov
Author

Unmasked diag uploaded in case #3285863.

@yaroslav-nakonechnikov
Author

I found how to replicate the issue: kill, stop, or otherwise interrupt the splunk process in the pod. After some time the liveness probe will trigger a pod restart, and after that you'll see the broken config.

@yaroslav-nakonechnikov
Author

Reported: splunk/splunk-ansible#751

@vivekr-splunk
Collaborator

@iaroslav-nakonechnikov we are looking into this issue now and will update you with our findings.

@yaroslav-nakonechnikov
Author

The issue still exists in 9.1.1.

@vivekr-splunk
Collaborator

@yaroslav-nakonechnikov, we are working with the splunk-ansible team to fix this issue and will update you once that is done.

@yaroslav-nakonechnikov
Copy link
Author

Was it fixed?

@vivekr-splunk vivekr-splunk assigned akondur and unassigned kumarajeet and jryb Nov 9, 2023
@vivekr-splunk
Collaborator

Hi @yaroslav-nakonechnikov, this fix didn't go into 9.1.1. It's planned for 9.1.2; we will update you once the release is complete.

@akondur akondur added 9.1.2 and removed 9.1.1 labels Nov 13, 2023
@yaroslav-nakonechnikov
Author

@vivekr-splunk 9.1.2 has been released, but there's still no news here.
Is there an ETA?

@vivekr-splunk
Collaborator

Hello @yaroslav-nakonechnikov, this is fixed in the 9.1.2 build.

@yaroslav-nakonechnikov
Author

I managed to test it, and yes, it looks like this is fixed.
However, see #1260.


5 participants