Cronjobs intermittently failed due to Istio sidecar terminating with error "unable to read file: etc/istio/proxy/envoy-rev.json" #50743

Open
2 tasks done
jbilliau-rcd opened this issue Apr 29, 2024 · 1 comment


jbilliau-rcd commented Apr 29, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We have CronJob pods that are intermittently failing across multiple namespaces and multiple clusters, all for the same reason: the Istio sidecar exits with a critical error:

critical envoy main external/envoy/source/server/server.cc:134 error initializing config ' etc/istio/proxy/envoy-rev.json': unable to read file: etc/istio/proxy/envoy-rev.json thread=24
2024-04-25T02:44:03.976354499Z unable to read file: etc/istio/proxy/envoy-rev.json

When this happens, the app fails as well, since it can't egress anywhere, either within the cluster or to the internet, because the sidecar is not running.

We are on 1.19.7. I can't find any pattern or cause as to why this is happening. Can anybody shed some light on what it could possibly be or what I can check?

Full sidecar log:

2024-04-25T02:44:01.345805Z	info	FLAG: --concurrency="0"
2024-04-25T02:44:01.345831Z	info	FLAG: --domain="prod-time-tracking-service.svc.cluster.local"
2024-04-25T02:44:01.345837Z	info	FLAG: --help="false"
2024-04-25T02:44:01.345839Z	info	FLAG: --log_as_json="false"
2024-04-25T02:44:01.345842Z	info	FLAG: --log_caller=""
2024-04-25T02:44:01.345846Z	info	FLAG: --log_output_level="default:info"
2024-04-25T02:44:01.345848Z	info	FLAG: --log_rotate=""
2024-04-25T02:44:01.345851Z	info	FLAG: --log_rotate_max_age="30"
2024-04-25T02:44:01.345854Z	info	FLAG: --log_rotate_max_backups="1000"
2024-04-25T02:44:01.345857Z	info	FLAG: --log_rotate_max_size="104857600"
2024-04-25T02:44:01.345860Z	info	FLAG: --log_stacktrace_level="default:none"
2024-04-25T02:44:01.345867Z	info	FLAG: --log_target="[stdout]"
2024-04-25T02:44:01.345870Z	info	FLAG: --meshConfig="./etc/istio/config/mesh"
2024-04-25T02:44:01.345872Z	info	FLAG: --outlierLogPath=""
2024-04-25T02:44:01.345875Z	info	FLAG: --profiling="true"
2024-04-25T02:44:01.345877Z	info	FLAG: --proxyComponentLogLevel="misc:error"
2024-04-25T02:44:01.345879Z	info	FLAG: --proxyLogLevel="warning"
2024-04-25T02:44:01.345882Z	info	FLAG: --serviceCluster="istio-proxy"
2024-04-25T02:44:01.345885Z	info	FLAG: --stsPort="0"
2024-04-25T02:44:01.345887Z	info	FLAG: --templateFile=""
2024-04-25T02:44:01.345890Z	info	FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2024-04-25T02:44:01.345894Z	info	FLAG: --vklog="0"
2024-04-25T02:44:01.345897Z	info	Version 1.19.7-42b7d96f3587652551b302499e1c8761bc3a0b49-Clean
2024-04-25T02:44:01.347895Z	info	Maximum file descriptors (ulimit -n): 1048576
2024-04-25T02:44:01.348101Z	info	Proxy role	ips=[100.64.31.56] type=sidecar id=prod-time-tracking-service-export-punches-to-wd-cron-28566phc5t.prod-time-tracking-service domain=prod-time-tracking-service.svc.cluster.local
2024-04-25T02:44:01.348166Z	info	Apply proxy config from env {"discoveryAddress":"istiod-1-19-7.istio-system.svc:15012","proxyMetadata":{"ISTIO_META_DNS_AUTO_ALLOCATE":"false","ISTIO_META_DNS_CAPTURE":"false"},"extraStatTags":["jwt_client_name","destination_istio_version","source_istio_version"],"holdApplicationUntilProxyStarts":true}

2024-04-25T02:44:01.350745Z	info	cpu limit detected as 2, setting concurrency
2024-04-25T02:44:01.351553Z	info	Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod-1-19-7.istio-system.svc:15012
drainDuration: 45s
extraStatTags:
- jwt_client_name
- destination_istio_version
- source_istio_version
holdApplicationUntilProxyStarts: true
proxyAdminPort: 15000
proxyMetadata:
  ISTIO_META_DNS_AUTO_ALLOCATE: "false"
  ISTIO_META_DNS_CAPTURE: "false"
serviceCluster: istio-proxy
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2024-04-25T02:44:01.351576Z	info	JWT policy is third-party-jwt
2024-04-25T02:44:01.351581Z	info	using credential fetcher of JWT type in cluster.local trust domain
2024-04-25T02:44:01.355620Z	info	platform detected is AWS
2024-04-25T02:44:01.358770Z	info	Workload SDS socket not found. Starting Istio SDS Server
2024-04-25T02:44:01.358793Z	info	CA Endpoint istiod-1-19-7.istio-system.svc:15012, provider Citadel
2024-04-25T02:44:01.358810Z	info	Using CA istiod-1-19-7.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2024-04-25T02:44:01.359053Z	info	Opening status port 15020
2024-04-25T02:44:01.374919Z	info	ads	All caches have been synced up in 29.418493ms, marking server ready
2024-04-25T02:44:01.375146Z	info	xdsproxy	Initializing with upstream address "istiod-1-19-7.istio-system.svc:15012" and cluster "Kubernetes"
2024-04-25T02:44:01.376604Z	info	Pilot SAN: [istiod-1-19-7.istio-system.svc]
2024-04-25T02:44:01.378481Z	info	Starting proxy agent
2024-04-25T02:44:01.378509Z	info	starting
2024-04-25T02:44:01.378524Z	info	Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ	%l	envoy %n %g:%#	%v	thread=%t -l warning --component-log-level misc:error --concurrency 2]
2024-04-25T02:44:01.392879Z	info	sds	Starting SDS grpc server
2024-04-25T02:44:01.393043Z	info	starting Http service at 127.0.0.1:15004
2024-04-25T02:44:01.506105Z	info	cache	generated new workload certificate	latency=130.930072ms ttl=23h59m59.493898689s
2024-04-25T02:44:01.506142Z	info	cache	Root cert has changed, start rotating root cert
2024-04-25T02:44:01.506158Z	info	ads	XDS: Incremental Pushing ConnectedEndpoints:0 Version:
2024-04-25T02:44:01.506201Z	info	cache	returned workload trust anchor from cache	ttl=23h59m59.493799774s
2024-04-25T02:44:03.975910Z	critical	envoy main external/envoy/source/server/server.cc:134	error initializing config '  etc/istio/proxy/envoy-rev.json': unable to read file: etc/istio/proxy/envoy-rev.json	thread=24
unable to read file: etc/istio/proxy/envoy-rev.json
2024-04-25T02:44:03.978527Z	error	Envoy exited with error: exit status 1
2024-04-25T02:44:03.978728Z	info	sds	SDS server for workload certificates started, listening on "./var/run/secrets/workload-spiffe-uds/socket"
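
To see which pods hit this failure and when, the signature in the log above can be matched mechanically. Below is a minimal Python sketch, assuming the sidecar logs have already been saved as plain text. The names `find_envoy_config_failures` and `ConfigFailure` and the regular expression are illustrative assumptions based only on the single log sample above, not an Istio-defined format or API.

```python
import re
from typing import List, NamedTuple


class ConfigFailure(NamedTuple):
    """One occurrence of the Envoy config-read failure (hypothetical helper type)."""
    timestamp: str
    path: str


# Pattern derived from the one critical line in the log above; this is an
# assumption about the line shape, not a documented log format.
_FAILURE_RE = re.compile(
    r"^(?P<ts>\S+)\s+critical\s+envoy main\s+.*"
    r"unable to read file: (?P<path>\S+)"
)


def find_envoy_config_failures(log_text: str) -> List[ConfigFailure]:
    """Return every timestamped critical 'unable to read file' line in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = _FAILURE_RE.match(line)
        if match:
            failures.append(ConfigFailure(match.group("ts"), match.group("path")))
    return failures


if __name__ == "__main__":
    # Sample lines copied from the sidecar log in this report.
    sample = (
        "2024-04-25T02:44:01.378509Z\tinfo\tstarting\n"
        "2024-04-25T02:44:03.975910Z\tcritical\tenvoy main "
        "external/envoy/source/server/server.cc:134\terror initializing config "
        "'  etc/istio/proxy/envoy-rev.json': unable to read file: "
        "etc/istio/proxy/envoy-rev.json\tthread=24\n"
        "unable to read file: etc/istio/proxy/envoy-rev.json\n"
    )
    for failure in find_envoy_config_failures(sample):
        print(failure.timestamp, failure.path)
```

Only the timestamped critical line is counted. The bare "unable to read file" line that follows it is skipped so each failure is reported once.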

Version

~ istioctl version
kubectl version
client version: 1.21.2
control plane version: 1.19.7
data plane version: 1.19.7 (18 proxies)

~ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.14-eks-b9c9ed7

Additional Information

istio-namespace: istio-system
full-secrets: false
timeout (mins): 30
include: {  }
exclude: { Namespaces: kube-node-lease,kube-public,kube-system,local-path-storage }
end-time: 2024-04-29 17:17:53.422749 -0400 EDT



Cluster endpoint: https://rancher.xxxxxxx.ffffff.zone/k8s/clusters/c-jh2b9
CLI version:
version.BuildInfo{Version:"1.21.2", GitRevision:"Homebrew", GolangVersion:"go1.22.2", BuildStatus:"Homebrew", GitTag:"1.21.2"}

The following Istio control plane revisions/versions were found in the cluster:
Revision 1-19-7:
&version.MeshInfo{
    {
        Component: "istiod",
        Revision:  "1-19-7",
        Info:      version.BuildInfo{Version:"1.19.7", GitRevision:"42b7d96f3587652551b302499e1c8761bc3a0b49", GolangVersion:"", BuildStatus:"Clean", GitTag:"1.19.7"},
    },
    {
        Component: "istiod",
        Revision:  "1-19-7",
        Info:      version.BuildInfo{Version:"1.19.7", GitRevision:"42b7d96f3587652551b302499e1c8761bc3a0b49", GolangVersion:"", BuildStatus:"Clean", GitTag:"1.19.7"},
    },
    {
        Component: "istiod",
        Revision:  "1-19-7",
        Info:      version.BuildInfo{Version:"1.19.7", GitRevision:"42b7d96f3587652551b302499e1c8761bc3a0b49", GolangVersion:"", BuildStatus:"Clean", GitTag:"1.19.7"},
    },
}

The following proxy revisions/versions were found in the cluster:
Revision 1-19-7: Versions {1.19.7}
@jbilliau-rcd
Author

Stay tuned, this looks like it might be related to Dynatrace, which we use for monitoring in our cluster. Testing more to validate...
