Describe the bug
We deploy NGF with the Helm chart from this GitHub repo (currently version 2.0.2).
Our backend applications are also deployed with Helm charts, each of which mainly includes a Service, a Deployment, an HTTPRoute, and a ClientSettingsPolicy that sets the maximum HTTP body size (maxSize). To test an application, we roll out one instance of it with a specific PathPrefix that accepts our test traffic. When testing is done, we shut that instance down by uninstalling this temporary Helm chart.
During one of our deployments last week, the NGF Gateway got into a bad state in which the NGINX config could no longer be updated.
Logs are: msg: Config apply failed, rolling back config; error: failed to parse config open /etc/nginx/includes/ClientSettingsPolicy_xyz-ngf-default-gateway-max-body.conf: no such file or directory in /etc/nginx/conf.d/http.conf:143
When we opened a shell in the data plane pods, the file it was complaining about was actually still there. If the Gateway can't update the NGINX config, our traffic eventually stops being routed correctly: the cluster-internal IPs of application instances that shut down are not removed, and new instances are not added.
The "fix" was a rolling restart of the data plane deployment.
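For anyone triaging a similar failure, here is a minimal shell sketch that pulls the path nginx reported as missing out of the error line above, so it can be compared with what is on disk in the data plane pod. The function name missing_include_path is made up for illustration, and the pattern assumes the log message format shown in this report.

```shell
# Hypothetical helper: print the path from an
# "open <path>: no such file or directory" error line.
# Prints nothing if the line does not match that format.
missing_include_path() {
  printf '%s\n' "$1" |
    sed -n 's|.*open \(/[^:]*\): no such file or directory.*|\1|p'
}
```

The extracted path can then be checked inside the pod with something like kubectl -n <nginx-deployment-namespace> exec deployments/<nginx-deployment> -- ls -l <path>.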
To Reproduce
We don't have a reliable repro, as it has only happened once.
Steps to reproduce the behavior:
- Deploy a Deployment/Service/HTTPRoute/ClientSettingsPolicy with a Helm chart to a cluster running NGF
- Uninstall the Helm chart
- The status on the Gateway indicates a failure to apply the NGINX config
- See error
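The steps above can be sketched roughly as the following shell script. This is not a confirmed repro, since the failure only happened once. The release name, chart path, namespaces, and Gateway name are all placeholders, not values from our setup. Setting DRY_RUN=1 only prints the commands.

```shell
# Hypothetical names; replace with your own release, chart, and namespaces.
RELEASE="${RELEASE:-canary-app}"
CHART="${CHART:-./charts/app}"
APP_NS="${APP_NS:-default}"
GATEWAY="${GATEWAY:-my-gateway}"
GATEWAY_NS="${GATEWAY_NS:-nginx-gateway}"

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

repro() {
  # 1. Deploy the temporary chart (Service, Deployment, HTTPRoute, ClientSettingsPolicy).
  run helm install "$RELEASE" "$CHART" --namespace "$APP_NS" --wait
  # 2. Tear it down again once testing is done.
  run helm uninstall "$RELEASE" --namespace "$APP_NS"
  # 3. Inspect the Gateway status conditions for a config apply failure.
  run kubectl get gateway "$GATEWAY" --namespace "$GATEWAY_NS" -o jsonpath='{.status.conditions}'
}
```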
Expected behavior
We expect that adding or removing NGF resources does not cause issues that ultimately lead to downtime for our services. I would also like a metric that tells me when the Gateway is in a state it can't recover from (see discussion here)
Your environment
- Version of the NGINX Gateway Fabric: version=v3.0.1 commit=a45c50b
- Version of Kubernetes: 1.32.2
- Kubernetes platform: AKS
- Details on how you expose the NGINX Gateway Fabric Pod: Service of type LoadBalancer
- Logs of NGINX container:
kubectl -n <nginx-deployment-namespace> logs deployments/<nginx-deployment>
msg: Config apply failed, rolling back config; error: failed to parse config open /etc/nginx/includes/ClientSettingsPolicy_xyz_canary-ngf-default-gateway-max-body.conf: no such file or directory in /etc/nginx/conf.d/http.conf:143
Two different stacktraces:
github.com/nginx/nginx-gateway-fabric/internal/controller.(*eventHandlerImpl).waitForStatusUpdates
/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/handler.go:262
github.com/nginx/nginx-gateway-fabric/internal/controller/nginx/agent.(*commandService).logAndSendErrorStatus
/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:365
github.com/nginx/nginx-gateway-fabric/internal/controller/nginx/agent.(*commandService).setInitialConfig
/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:322
github.com/nginx/nginx-gateway-fabric/internal/controller/nginx/agent.(*commandService).Subscribe
/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:149
github.com/nginx/agent/v3/api/grpc/mpi/v1._CommandService_Subscribe_Handler
pkg/mod/github.com/nginx/agent/v3@v3.0.1/api/grpc/mpi/v1/command_grpc.pb.go:233
github.com/nginx/nginx-gateway-fabric/internal/controller/nginx/agent/grpc/interceptor.(*ContextSetter).Stream.ContextSetter.Stream.func1
/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/grpc/interceptor/interceptor.go:64
google.golang.org/grpc.(*Server).processStreamingRPC
pkg/mod/google.golang.org/grpc@v1.72.2/server.go:1702
google.golang.org/grpc.(*Server).handleStream
pkg/mod/google.golang.org/grpc@v1.72.2/server.go:1819
google.golang.org/grpc.(*Server).serveStreams.func2.1
pkg/mod/google.golang.org/grpc@v1.72.2/server.go:1035
- NGINX Configuration:
kubectl -n <nginx-deployment-namespace> exec -it deployments/<nginx-deployment> -- nginx -T
Defaulted container "nginx" out of: nginx, init (init)
2025/08/04 17:46:35 [notice] 57922#57922: js vm init njs: 000071AA71839C80
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# configuration file /etc/nginx/nginx.conf:
load_module /usr/lib/nginx/modules/ngx_http_js_module.so;
include /etc/nginx/main-includes/*.conf;
worker_processes auto;
pid /var/run/nginx/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/mime.types;
js_import /usr/lib/nginx/modules/njs/httpmatches.js;
default_type application/octet-stream;
proxy_headers_hash_bucket_size 512;
proxy_headers_hash_max_size 1024;
server_names_hash_bucket_size 256;
server_names_hash_max_size 1024;
variables_hash_bucket_size 512;
variables_hash_max_size 1024;
sendfile on;
tcp_nopush on;
server_tokens off;
server {
listen unix:/var/run/nginx/nginx-status.sock;
access_log off;
location /stub_status {
stub_status;
}
}
}
stream {
variables_hash_bucket_size 512;
variables_hash_max_size 1024;
map_hash_max_size 2048;
map_hash_bucket_size 256;
log_format stream-main '$remote_addr [$time_local] '
'$protocol $status $bytes_sent $bytes_received '
'$session_time "$ssl_preread_server_name"';
access_log /dev/stdout stream-main;
include /etc/nginx/stream-conf.d/*.conf;
}
# configuration file /etc/nginx/main-includes/main.conf:
error_log stderr info;
# configuration file /etc/nginx/conf.d/http.conf:
http2 on;
# Set $gw_api_compliant_host variable to the value of $http_host unless $http_host is empty, then set it to the value
# of $host. We prefer $http_host because it contains the original value of the host header, which is required by the
# Gateway API. However, in an HTTP/1.0 request, it's possible that $http_host can be empty. In this case, we will use
# the value of $host. See http://nginx.org/en/docs/http/ngx_http_core_module.html#var_host.
map $http_host $gw_api_compliant_host {
'' $host;
default $http_host;
}
# Set $connection_header variable to upgrade when the $http_upgrade header is set, otherwise, set it to close. This
# allows support for websocket connections. See https://nginx.org/en/docs/http/websocket.html.
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
## Returns just the path from the original request URI.
map $request_uri $request_uri_path {
"~^(?P<path>[^?]*)(\?.*)?$" $path;
}
include /etc/nginx/includes/SnippetsFilter_http_abc_ngf-default-gateway.conf;
include /etc/nginx/includes/SnippetsFilter_http_xyz_ngf-default-gateway.conf;
js_preload_object matches from /etc/nginx/conf.d/matches.json;server {
listen 443 ssl default_server;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
ssl_certificate /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
ssl_certificate_key /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
if ($ssl_server_name != $host) {
return 421;
}
server_name abc-southcentralus.not-important.net;
include /etc/nginx/includes/SnippetsFilter_http.server_abc_ngf-default-gateway.conf;
location / {
include /etc/nginx/includes/ClientSettingsPolicy_abc_ngf-default-gateway-max-body.conf;
proxy_http_version 1.1;
proxy_set_header Host "$gw_api_compliant_host";
proxy_set_header X-Forwarded-For "$proxy_add_x_forwarded_for";
proxy_set_header X-Real-IP "$remote_addr";
proxy_set_header X-Forwarded-Proto "$scheme";
proxy_set_header X-Forwarded-Host "$host";
proxy_set_header X-Forwarded-Port "$server_port";
proxy_set_header Upgrade "$http_upgrade";
proxy_set_header Connection "$connection_upgrade";
proxy_pass http://$group_abc__mcabc_client_api_rule0$request_uri;
}
location /canary/ {
rewrite ^/canary/([^?]*)? /$1?$args? break;
proxy_http_version 1.1;
proxy_set_header Host "$gw_api_compliant_host";
proxy_set_header X-Forwarded-For "$proxy_add_x_forwarded_for";
proxy_set_header X-Real-IP "$remote_addr";
proxy_set_header X-Forwarded-Proto "$scheme";
proxy_set_header X-Forwarded-Host "$host";
proxy_set_header X-Forwarded-Port "$server_port";
proxy_set_header Upgrade "$http_upgrade";
proxy_set_header Connection "$connection_upgrade";
proxy_pass http://abc_canary-mcabc-client-api_443;
}
}
server {
listen 443 ssl;
ssl_certificate /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
ssl_certificate_key /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
if ($ssl_server_name != $host) {
return 421;
}
server_name xyz-southcentralus.not-important.net;
include /etc/nginx/includes/SnippetsFilter_http.server_xyz_ngf-default-gateway.conf;
location / {
include /etc/nginx/includes/ClientSettingsPolicy_xyz_ngf-default-gateway-max-body.conf;
proxy_http_version 1.1;
proxy_set_header Host "$gw_api_compliant_host";
proxy_set_header X-Forwarded-For "$proxy_add_x_forwarded_for";
proxy_set_header X-Real-IP "$remote_addr";
proxy_set_header X-Forwarded-Proto "$scheme";
proxy_set_header X-Forwarded-Host "$host";
proxy_set_header X-Forwarded-Port "$server_port";
proxy_set_header Upgrade "$http_upgrade";
proxy_set_header Connection "$connection_upgrade";
proxy_pass http://xyz_mcsvcxyz-client-api_443$request_uri;
}
}
server {
listen 443 ssl;
ssl_certificate /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
ssl_certificate_key /etc/nginx/secrets/ssl_keypair_ngf_ngf--ssl.pem;
if ($ssl_server_name != $host) {
return 421;
}
server_name ~^;
location / {
return 404 "";
proxy_http_version 1.1;
}
}
server {
listen unix:/var/run/nginx/nginx-503-server.sock;
access_log off;
return 503;
}
server {
listen unix:/var/run/nginx/nginx-500-server.sock;
access_log off;
return 500;
}
upstream abc_canary-mcabc-client-api_443 {
random two least_conn;
zone abc_canary-mcabc-client-api_443 512k;
server 10.240.1.162:9091;
server 10.240.1.244:9091;
}
upstream abc_mcabc-client-api_443 {
random two least_conn;
zone abc_mcabc-client-api_443 512k;
server 10.240.1.162:9091;
server 10.240.1.244:9091;
}
upstream xyz_mcsvcxyz-client-api_443 {
random two least_conn;
zone xyz_mcsvcxyz-client-api_443 512k;
server 10.240.2.167:9091;
server 10.240.2.144:9091;
}
upstream invalid-backend-ref {
random two least_conn;
server unix:/var/run/nginx/nginx-500-server.sock;
}
split_clients $request_id $group_abc__mcabc_client_api_rule0 {
100.00% abc_mcabc-client-api_443;
# 0.00% abc_canary-mcabc-client-api_443;
}
# configuration file /etc/nginx/includes/SnippetsFilter_http_abc_ngf-default-gateway.conf:
log_format abc escape=json '{"uri_stem": "$uri", "uri_query": "$args", "protocol": "$server_protocol", "ip": "$remote_addr", "user_agent": "$http_user_agent", "host": "$host", "status": "$status", "request_method": "$request_method", "request_bytes": "$request_length", "response_bytes": "$body_bytes_sent", "time_taken": "$request_time", "proxy_ips": "$proxy_add_x_forwarded_for", "request_id": "$sent_http_request_id", "afd_request_id": "$http_x_azure_ref", "client_ip": "$http_x_azure_clientip"}';
access_log /dev/stdout abc;
# configuration file /etc/nginx/includes/SnippetsFilter_http_xyz_ngf-default-gateway.conf:
log_format xyz escape=json '{"uri_stem": "$uri", "uri_query": "$args", "protocol": "$server_protocol", "ip": "$remote_addr", "user_agent": "$http_user_agent", "host": "$host", "status": "$status", "request_method": "$request_method", "request_bytes": "$request_length", "response_bytes": "$body_bytes_sent", "time_taken": "$request_time", "proxy_ips": "$proxy_add_x_forwarded_for", "request_id": "$sent_http_request_id", "afd_request_id": "$http_x_azure_ref", "client_ip": "$http_x_azure_clientip"}';
access_log /dev/stdout xyz;
# configuration file /etc/nginx/includes/SnippetsFilter_http.server_abc_ngf-default-gateway.conf:
client_header_buffer_size 16k;
large_client_header_buffers 2 16k;
location /status/startup {
deny all;
return 404;
}
location /status/metrics {
deny all;
return 404;
}
location /recover {
default_type text/html;
add_header Retry-After 60 always;
return 503 "<body><h1>503 Service unavailable</h1></body>";
}
proxy_connect_timeout 60s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
# configuration file /etc/nginx/includes/ClientSettingsPolicy_abc_ngf-default-gateway-max-body.conf:
client_max_body_size 1m;
# configuration file /etc/nginx/includes/SnippetsFilter_http.server_xyz_ngf-default-gateway.conf:
client_header_buffer_size 16k;
large_client_header_buffers 2 16k;
location /status/startup {
deny all;
return 404;
}
location /status/metrics {
deny all;
return 404;
}
location /recover {
default_type text/html;
add_header Retry-After 60 always;
return 503 "<body><h1>503 Service unavailable</h1></body>";
}
proxy_connect_timeout 60s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
# configuration file /etc/nginx/includes/ClientSettingsPolicy_xyz_ngf-default-gateway-max-body.conf:
client_max_body_size 5m;
# configuration file /etc/nginx/mime.types:
types {
text/html html htm shtml;
text/css css;
text/xml xml;
image/gif gif;
image/jpeg jpeg jpg;
application/javascript js;
application/atom+xml atom;
application/rss+xml rss;
text/mathml mml;
text/plain txt;
text/vnd.sun.j2me.app-descriptor jad;
text/vnd.wap.wml wml;
text/x-component htc;
image/avif avif;
image/png png;
image/svg+xml svg svgz;
image/tiff tif tiff;
image/vnd.wap.wbmp wbmp;
image/webp webp;
image/x-icon ico;
image/x-jng jng;
image/x-ms-bmp bmp;
font/woff woff;
font/woff2 woff2;
application/java-archive jar war ear;
application/json json;
application/mac-binhex40 hqx;
application/msword doc;
application/pdf pdf;
application/postscript ps eps ai;
application/rtf rtf;
application/vnd.apple.mpegurl m3u8;
application/vnd.google-earth.kml+xml kml;
application/vnd.google-earth.kmz kmz;
application/vnd.ms-excel xls;
application/vnd.ms-fontobject eot;
application/vnd.ms-powerpoint ppt;
application/vnd.oasis.opendocument.graphics odg;
application/vnd.oasis.opendocument.presentation odp;
application/vnd.oasis.opendocument.spreadsheet ods;
application/vnd.oasis.opendocument.text odt;
application/vnd.openxmlformats-officedocument.presentationml.presentation
pptx;
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
xlsx;
application/vnd.openxmlformats-officedocument.wordprocessingml.document
docx;
application/vnd.wap.wmlc wmlc;
application/wasm wasm;
application/x-7z-compressed 7z;
application/x-cocoa cco;
application/x-java-archive-diff jardiff;
application/x-java-jnlp-file jnlp;
application/x-makeself run;
application/x-perl pl pm;
application/x-pilot prc pdb;
application/x-rar-compressed rar;
application/x-redhat-package-manager rpm;
application/x-sea sea;
application/x-shockwave-flash swf;
application/x-stuffit sit;
application/x-tcl tcl tk;
application/x-x509-ca-cert der pem crt;
application/x-xpinstall xpi;
application/xhtml+xml xhtml;
application/xspf+xml xspf;
application/zip zip;
application/octet-stream bin exe dll;
application/octet-stream deb;
application/octet-stream dmg;
application/octet-stream iso img;
application/octet-stream msi msp msm;
audio/midi mid midi kar;
audio/mpeg mp3;
audio/ogg ogg;
audio/x-m4a m4a;
audio/x-realaudio ra;
video/3gpp 3gpp 3gp;
video/mp2t ts;
video/mp4 mp4;
video/mpeg mpeg mpg;
video/quicktime mov;
video/webm webm;
video/x-flv flv;
video/x-m4v m4v;
video/x-mng mng;
video/x-ms-asf asx asf;
video/x-ms-wmv wmv;
video/x-msvideo avi;
}
# configuration file /etc/nginx/stream-conf.d/stream.conf:
server {
listen unix:/var/run/nginx/connection-closed-server.sock;
return "";
}
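Since the error is about an include that nginx could not open, one way to cross-check a dump like the one above is to list every literal include path that has no matching "# configuration file <path>:" section. The following is a minimal sketch, assuming the dump was saved to a file. The function name unresolved_includes is hypothetical, and glob includes such as /etc/nginx/conf.d/*.conf are skipped.

```shell
# Hypothetical helper: print literal include paths in a saved `nginx -T` dump
# that have no "# configuration file <path>:" section of their own.
unresolved_includes() {
  awk '
    /^# configuration file / {
      path = $4
      sub(/:$/, "", path)      # strip the trailing colon from the header
      dumped[path] = 1
      next
    }
    $1 == "include" {
      path = $2
      sub(/;$/, "", path)      # strip the trailing semicolon
      if (path !~ /\*/) wanted[path] = 1   # ignore glob includes
    }
    END {
      for (p in wanted) if (!(p in dumped)) print p
    }
  ' "$1"
}
```

For example, after saving the output of the nginx -T command above to a file such as nginx-dump.txt, unresolved_includes nginx-dump.txt prints one path per line, or nothing if every literal include is present in the dump.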