Nginx reverse proxy in front of a backend that takes a very long time to process requests #30

Open
san3Xian opened this issue May 10, 2022 · 0 comments
san3Xian commented May 10, 2022

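If the backend that nginx proxy_pass points to takes a very long time to process certain requests (more than 3 minutes, more than 5 minutes, and so on), make sure TCP keepalive is enabled.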

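  • As a general personal recommendation: if the backend needs a long time before it can return a response, design the request as an asynchronous one, i.e. the client repeatedly polls a state endpoint for the processing status instead of holding the connection open while waiting for the reply.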
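
A minimal sketch of the polling pattern described above. The function name `poll_until_done`, the state strings `"running"`, `"done"` and `"failed"`, and the injected `get_state` callable are all assumptions made for illustration; in a real client `get_state` would issue one short HTTP request to whatever state endpoint the backend exposes.

```python
import time
from typing import Callable


def poll_until_done(get_state: Callable[[], str],
                    interval_s: float = 5.0,
                    timeout_s: float = 900.0) -> str:
    """Call get_state() repeatedly until it reports a final state.

    Each call is a short request, so no single connection has to sit
    idle for minutes while the backend works.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()  # hypothetical: one quick request to the state endpoint
        if state in ("done", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        time.sleep(interval_s)


# Stand-in for the state endpoint: reports "running" twice, then "done".
_responses = iter(["running", "running", "done"])
result = poll_until_done(lambda: next(_responses), interval_s=0.01)
print(result)
```

With this design every connection through nginx is short-lived, so the idle-session problem described in this issue does not arise in the first place.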

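First, in this situation the proxy_connect_timeout, proxy_read_timeout and proxy_send_timeout values all need to be raised to fit the actual workload (proxy_read_timeout is the one that bounds how long nginx waits for the response).
Next, check whether any firewall policy sits on the nginx -> upstream path, stateful firewalls in particular.
By default nginx enables TCP keepalive for neither clients nor upstreams. TCP keepalive here means liveness probing of the TCP session (not HTTP keepalive, not connection reuse!).
Suppose a stateful firewall policy is deployed on the nginx -> upstream path and that firewall is configured with a 300 s timeout.
If the backend needs 530 seconds to process a request, nginx forwards the request to the upstream and then keeps holding the session (ESTABLISHED).
While nginx waits for the backend to finish and reply, no additional traffic flows between nginx and the upstream on that TCP session.
The stateful firewall on the path is then likely to decide that the session has expired and start dropping its packets.
When the backend finally replies, the packets never reach nginx; once proxy_read_timeout elapses, nginx returns a 504 timeout error to the client.
At that point nginx's error.log shows the error upstream timed out (110: Connection timed out) while reading response header from upstream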

In this case, enable nginx's TCP keepalive towards the upstream (i.e. the SO_KEEPALIVE socket option):

proxy_socket_keepalive on;

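Once enabled, nginx sends keepalive probe packets to the upstream on the session according to the kernel parameters net.ipv4.tcp_keepalive_intvl, net.ipv4.tcp_keepalive_probes and net.ipv4.tcp_keepalive_time. The probes check that the session is still alive and also keep a stateful firewall on the path from dropping it. (I recommend enabling this in most cases and tuning the parameter values to your environment; it also lets nginx actively close the connection when the upstream is unhealthy. Note that tcp_keepalive_time has to be shorter than the firewall's idle timeout for the probes to help, which the 7200 s default is not in the 300 s example above.)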

PS:

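  • If nginx should send TCP keepalive probes to clients, configure so_keepalive=on in the listen directive. (If the parameter is omitted, the operating system's settings apply to the socket. On Linux, TCP keepalive is not enabled by default and there is no global option to turn it on; an application that wants keepalive has to enable it individually on each TCP socket.)
  • In an nginx upstream block, directives with "keepalive" in their name are essentially all for configuring HTTP keepalive (connection reuse) and are unrelated to the TCP keepalive discussed above.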

A brief note on TCP keepalive probes (reposted):

The Linux kernel has three options that affect keepalive behavior:

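tcp_keepalive_time 7200 // how long the connection must be idle (no data exchanged) before probing starts, in seconds; default 7200 s
tcp_keepalive_intvl 75 // interval between probes once probing has started, in seconds; default 75 s
tcp_keepalive_probes 9 // number of unanswered probes after which the connection is closed; default 9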
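
As a worked example of how these three values combine, the sketch below computes when the first probe is sent on an idle connection and when an unresponsive peer is declared dead. The function names are made up for illustration, and `probes_keep_firewall_state` assumes that the firewall refreshes its session timer whenever a probe and its ACK pass through, which is typical for stateful firewalls but should be confirmed for the device in question.

```python
def keepalive_timeline(time_s: int = 7200, intvl_s: int = 75, probes: int = 9):
    """Return (first_probe_s, dead_peer_detected_s) for an idle connection.

    The first probe goes out after time_s of idleness. If the peer never
    answers, further probes follow every intvl_s, and the connection is
    closed after `probes` unanswered probes.
    """
    first_probe = time_s
    dead_peer_detected = time_s + intvl_s * probes
    return first_probe, dead_peer_detected


def probes_keep_firewall_state(time_s: int, firewall_idle_s: int) -> bool:
    """True if a healthy idle connection is probed before the firewall expires it.

    While the peer keeps answering, a probe is sent every time_s seconds,
    so time_s must be below the firewall's idle timeout (assumption: the
    firewall resets its timer on each probe/ACK exchange).
    """
    return time_s < firewall_idle_s


print(keepalive_timeline())                    # (7200, 7875)
print(probes_keep_firewall_state(7200, 300))   # False: default is far too long
print(probes_keep_firewall_state(60, 300))     # True
```

With the kernel defaults the first probe arrives after two hours, so for the 300 s firewall timeout in the scenario above net.ipv4.tcp_keepalive_time needs to be lowered to something well under 300 s.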

A TCP socket also has three options that correspond to the kernel ones; they are set per socket through the setsockopt system call:

TCP_KEEPCNT: overrides tcp_keepalive_probes
TCP_KEEPIDLE: overrides tcp_keepalive_time
TCP_KEEPINTVL: overrides tcp_keepalive_intvl
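
For illustration, here is a minimal sketch of setting these options on one socket from an application, using Python's standard `socket` module on Linux. The helper name `enable_keepalive` and the values 60/10/3 are arbitrary choices for the example, not recommendations from the original note.

```python
import socket


def enable_keepalive(sock: socket.socket,
                     idle_s: int = 60,
                     intvl_s: int = 10,
                     probes: int = 3) -> None:
    """Turn on TCP keepalive for one socket and override the kernel defaults."""
    # SO_KEEPALIVE switches keepalive on; without it the TCP_KEEP* values are unused.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides (Linux): tcp_keepalive_time, _intvl, _probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
idle_value = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
print(idle_value)
sock.close()
```

Sockets that only set SO_KEEPALIVE and leave the three TCP_KEEP* options alone fall back to the net.ipv4.tcp_keepalive_* kernel values.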
