glusterfsd 10.4 core dump in __gf_free - potentially related to cache invalidation #4255
@agronaught Thanks for sharing the stack trace of the core; it helps in analyzing the issue. The brick process is crashing due to …
A brick process may crash while it tries to send an upcall notification to a client whose disconnect is being processed.
Solution: Avoid sending upcall event notifications to a client if a disconnect is being processed for that same client.
Fixes: gluster#4255
Change-Id: I80478d7f4a038b04a10fb21a1290b4309e9fe4dd
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
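The commit message describes the fix only at a high level. A minimal C sketch of the general pattern follows. It assumes a per-client `disconnecting` flag and reference count guarded by a mutex; `client_t`, `upcall_notify`, `client_disconnect` and the other names are hypothetical and are not taken from GlusterFS or from the code of commit b98d0d7. The key point is that the flag check and the reference grab happen under the same lock, so the disconnect path cannot free the client while a notification is being sent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-client state; not GlusterFS's real structures. */
typedef struct client {
    pthread_mutex_t lock;
    bool disconnecting;     /* set once disconnect handling starts */
    int refcount;           /* the connection itself holds one reference */
    int notifications_sent; /* events queued to this client */
} client_t;

static client_t *client_new(void) {
    client_t *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    pthread_mutex_init(&c->lock, NULL);
    c->refcount = 1;
    return c;
}

static void client_ref(client_t *c) {
    pthread_mutex_lock(&c->lock);
    c->refcount++;
    pthread_mutex_unlock(&c->lock);
}

/* Drop a reference and free the client when the last one is gone. */
static void client_unref(client_t *c) {
    bool last;
    pthread_mutex_lock(&c->lock);
    assert(c->refcount > 0);
    last = (--c->refcount == 0);
    pthread_mutex_unlock(&c->lock);
    if (last) {
        pthread_mutex_destroy(&c->lock);
        free(c);
    }
}

/* Upcall path: refuse to notify a client whose disconnect is in
 * progress; otherwise pin it with a reference while the event is sent. */
static bool upcall_notify(client_t *c) {
    pthread_mutex_lock(&c->lock);
    if (c->disconnecting) {
        pthread_mutex_unlock(&c->lock);
        return false;
    }
    c->refcount++;            /* taken atomically with the flag check */
    c->notifications_sent++;
    pthread_mutex_unlock(&c->lock);

    /* The actual send (e.g. an RPC submission) would happen here,
     * outside the lock; it is omitted in this sketch. */

    client_unref(c);
    return true;
}

/* Disconnect path: mark the client first, then drop the connection's ref. */
static void client_disconnect(client_t *c) {
    pthread_mutex_lock(&c->lock);
    c->disconnecting = true;
    pthread_mutex_unlock(&c->lock);
    client_unref(c);
}
```

If a disconnect starts after `upcall_notify` has passed the check, the extra reference it holds keeps the client's memory valid until the send finishes, so the free happens only on the last `client_unref`.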
Thank you very much for this one. That confirms a suspicion and provides a solution. Cheers.
A brick process may crash while it tries to send an upcall notification to a client whose disconnect is being processed.
Solution: Avoid sending upcall event notifications to a client if a disconnect is being processed for that same client.
> Fixes: gluster#4255
> Change-Id: I80478d7f4a038b04a10fb21a1290b4309e9fe4dd
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (Cherry picked from commit b98d0d7)
> (Reviewed on upstream release gluster#4256)
Fixes: gluster#4255
Change-Id: I80478d7f4a038b04a10fb21a1290b4309e9fe4dd
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Description of problem:
We occasionally see a core dump, with a brick going offline, in a replica-3 file system. This may be load-related; if so, the trigger would be a very short, sharp load spike.
Previously this occurred weekly while bitrot checking was enabled; since bitrot checks were disabled, the crash rate has dropped to roughly monthly (anecdotally). I have not yet managed to reproduce this error outside the production environment.
This is the first crash since disabling bitrot checking, and the first time a core file has been generated.
Potentially related to: #4241
The exact command to reproduce the issue:
I have not yet managed to reproduce this error outside the production environment.
The full output of the command that failed:
Expected results:
Mandatory info:
- The output of the `gluster volume info` command:
- The output of the `gluster volume status` command:
- The output of the `gluster volume heal` command:
- Provide logs present on following locations of client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump:
  - gdb_thread_full.txt.gz
  - gdb_bt_full.txt.gz
  - core file is large (>50MB compressed)
Additional info:
We will be disabling `performance.cache-invalidation` and `features.cache-invalidation` on this system, as cache invalidation looks potentially related.
- The operating system / glusterfs version:
CentOS 9, kernel 5.14.0-283.el9.x86_64
Gluster 10.4-1
brick FS: ZFS 2.1.9
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration