fuse causes glusterd to dump core #1225

Closed
PisikeSipelgas opened this issue May 7, 2020 · 18 comments

@PisikeSipelgas

PisikeSipelgas commented May 7, 2020

Description of problem:
I created two mirrored GlusterFS volumes. On the second node, FUSE causes glusterd to crash when "pg_ctl initdb" is run against the GlusterFS mount. The first node (xt-ha1) does not seem to be affected; the crash happens only when the "initdb" command is issued on the second node (xt-ha2).
Both machines were deployed from the same VMware template, are fully updated, and have the same software versions and patch levels.

The exact command to reproduce the issue:
postgres@xt-ha2:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata

The full output of the command that failed:

postgres@xt-ha2:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgdata/pgdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Tallinn
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2020-05-07 10:51:37.616 EEST [8986] LOG: could not open file "pg_wal/000000010000000000000001": Software caused connection abort
2020-05-07 10:51:37.616 EEST [8986] FATAL: could not open file "pg_wal/000000010000000000000001": Transport endpoint is not connected
child process exited with exit code 1
initdb: removing contents of data directory "/pgdata/pgdata"
could not open directory "/pgdata/pgdata": Transport endpoint is not connected
initdb: failed to remove contents of data directory
pg_ctl: database system initialization failed

Expected result:

postgres@xt-ha1:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgdata/pgdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Tallinn
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/usr/lib/postgresql/10/bin/pg_ctl -D /pgdata/pgdata -l logfile start

postgres@xt-ha1:~$

Stack trace:
glusterd_trace.txt

Additional info:
/etc/hosts:
192.168.57.186 xt-ha1.example.com
192.168.57.187 xt-ha2.example.com

/dev/mapper/vgglupgdata-data01 on /glupgdata type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/vgglupgbackup-backup on /glupgbackup type xfs (rw,relatime,attr2,inode64,noquota)

192.168.57.187:/glu-pgdata on /pgdata type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
192.168.57.187:/glu-pgbackup on /pgbackup type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)

- The output of the gluster volume info command:

# gluster volume info

Volume Name: glu-pgbackup
Type: Replicate
Volume ID: 30d323bd-3eab-4e36-9e14-c1508b03b804
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgbackup/pgbackup
Brick2: xt-ha2.example.com:/glupgbackup/pgbackup
Options Reconfigured:
cluster.self-heal-daemon: enable
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable

Volume Name: glu-pgdata
Type: Replicate
Volume ID: 232c30d0-8c5e-4a71-9fa6-45f39d64fc6c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgdata/pgdata
Brick2: xt-ha2.example.com:/glupgdata/pgdata
Options Reconfigured:
features.barrier: disable
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.self-heal-daemon: enable

- The operating system / glusterfs version:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic

root@xt-ha2:/# dpkg -l|egrep "fuse|gluster"
ii  fuse                                  2.9.7-1ubuntu1                                  amd64        Filesystem in Userspace
ii  glusterfs-client                      7.5-ubuntu1~bionic1                             amd64        clustered file-system (client package)
ii  glusterfs-common                      7.5-ubuntu1~bionic1                             amd64        GlusterFS common libraries and translator modules
ii  glusterfs-dbg                         7.5-ubuntu1~bionic1                             amd64        GlusterFS debugging symbols
ii  glusterfs-server                      7.5-ubuntu1~bionic1                             amd64        clustered file-system (server package)
ii  libfuse2:amd64                        2.9.7-1ubuntu1                                  amd64        Filesystem in Userspace (library)
@mohit84
Contributor

mohit84 commented May 7, 2020

This is a known issue, and it is fixed by the patch (https://review.gluster.org/#/c/glusterfs/+/24231/).

@xhernandez
Contributor

I think there must be some corner case the patch doesn't fix, because this shouldn't fail in 7.5 (the patch is already included in 7.5).

@PisikeSipelgas PisikeSipelgas changed the title "fuse causes gluserd to dump core" to "fuse causes glusterd to dump core" May 7, 2020
@xhernandez
Contributor

While the cause is analyzed, you can disable open-behind to avoid the crash:

# gluster volume set <volname> open-behind off

@PisikeSipelgas
Author

root@xt-ha2:~# gluster volume list
glu-pgbackup
glu-pgdata

root@xt-ha2:~# gluster volume set glu-pgdata open-behind off
volume set: success
root@xt-ha2:~# gluster volume set glu-pgbackup open-behind off
volume set: success

root@xt-ha2:~# su - postgres

postgres@xt-ha2:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgdata/pgdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Tallinn
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/lib/postgresql/10/bin/pg_ctl -D /pgdata/pgdata -l logfile start

postgres@xt-ha2:~$ rm -rf /pgdata/pgdata/*
postgres@xt-ha2:~$ logout

root@xt-ha2:~# gluster volume set glu-pgdata open-behind on
volume set: success

root@xt-ha2:~# su - postgres
postgres@xt-ha2:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgdata/pgdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Tallinn
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2020-05-08 14:34:48.192 EEST [17478] LOG:  could not open file "pg_wal/000000010000000000000001": Software caused connection abort
2020-05-08 14:34:48.192 EEST [17478] FATAL:  could not open file "pg_wal/000000010000000000000001": Transport endpoint is not connected
child process exited with exit code 1
initdb: removing contents of data directory "/pgdata/pgdata"
could not open directory "/pgdata/pgdata": Transport endpoint is not connected
initdb: failed to remove contents of data directory
pg_ctl: database system initialization failed
postgres@xt-ha2:~$

@xhernandez
Contributor

Can you post the output of gluster volume info and share the mount log and the core dump?

@PisikeSipelgas
Author

PisikeSipelgas commented May 9, 2020

gluster volume info

Volume Name: glu-pgbackup
Type: Replicate
Volume ID: 840fbeae-7e59-4893-b3a8-30343d85c44d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgbackup/pgbackup
Brick2: xt-ha2.example.com:/glupgbackup/pgbackup
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off

Volume Name: glu-pgdata
Type: Replicate
Volume ID: 5df17c0c-3648-43f6-8dad-76620ce2ca9c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgdata/pgdata
Brick2: xt-ha2.example.com:/glupgdata/pgdata
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off

Here are the log files:
pgdata.log
glustershd.log
glusterd.log

And zipped core:
core.zip

@xhernandez
Contributor

I'm not sure why, but open-behind is still enabled (it should appear as disabled in the Options Reconfigured section of the gluster volume info output).

Looking at the pgdata log, I can also see that open-behind is present in the last configuration just before the crash, and the crash is related to open-behind.

Can you disable it again and check with gluster volume info that it's actually disabled? If possible, restart the volume and remount, just in case something else is wrong.

@PisikeSipelgas
Author

I have a similar setup on CentOS 7 and I am not able to reproduce the problem there; it happens only on Ubuntu, and only on the second node.
"open-behind" was "on" because I re-created those volumes. It works when open-behind is disabled.

root@xt-ha2:~# umount /pgdata 
root@xt-ha2:~# umount /pgbackup

root@xt-ha2:~# gluster volume set glu-pgdata open-behind off
volume set: success
root@xt-ha2:~# gluster volume set glu-pgbackup open-behind off
volume set: success

root@xt-ha2:~# mount -a
root@xt-ha2:~# gluster volume info

Volume Name: glu-pgbackup
Type: Replicate
Volume ID: 840fbeae-7e59-4893-b3a8-30343d85c44d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgbackup/pgbackup
Brick2: xt-ha2.example.com:/glupgbackup/pgbackup
Options Reconfigured:
performance.open-behind: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off

Volume Name: glu-pgdata
Type: Replicate
Volume ID: 5df17c0c-3648-43f6-8dad-76620ce2ca9c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xt-ha1.example.com:/glupgdata/pgdata
Brick2: xt-ha2.example.com:/glupgdata/pgdata
Options Reconfigured:
performance.open-behind: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off


root@xt-ha2:~# su - postgres
postgres@xt-ha2:~$ rm -rf /pgdata/pgdata/*
postgres@xt-ha2:~$ /usr/lib/postgresql/10/bin/pg_ctl initdb -D /pgdata/pgdata
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /pgdata/pgdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Tallinn
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/lib/postgresql/10/bin/pg_ctl -D /pgdata/pgdata -l logfile start

postgres@xt-ha2:~$

@gluster-ant
Collaborator

A patch https://review.gluster.org/24451 has been posted that references this issue.

open-behind: rewrite of internal logic

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens has been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
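
The reference-counting problem and the fix described in the commit message above can be illustrated with a small C model. This is only a sketch under stated assumptions, not the GlusterFS code: model_fd_t, model_close_cbk, run_scenario and the other names are hypothetical; the fd is assumed to start with one reference held by fuse plus one taken by open-behind for the background open; and the stubs created for delayed fops are not modeled. Passing with_close_cbk = false reproduces the leak (one reference is never dropped, so release never runs); with the close callback, release runs exactly once whichever of close and open completion happens first.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: these types and functions are hypothetical and
 * are NOT the GlusterFS fd_t / open-behind API. */

typedef enum {
    OPEN_NOT_SENT,  /* open answered ("faked"), real open not started */
    OPEN_IN_FLIGHT, /* real open sent in the background */
    OPEN_DONE       /* real open completed */
} open_state_t;

typedef struct {
    int refs;           /* references held on the fd */
    bool released;      /* release callback has run */
    bool closed;        /* application has closed the fd */
    open_state_t state;
} model_fd_t;

static void model_unref(model_fd_t *fd) {
    assert(fd->refs > 0);      /* catch a double unref */
    if (--fd->refs == 0)
        fd->released = true;   /* release runs only at refcount zero */
}

/* open-behind fakes the open answer and keeps an extra reference so the
 * fd cannot be destroyed while the background open may still use it. */
static void model_open_behind(model_fd_t *fd) {
    fd->refs = 2;              /* fuse's reference + open-behind's */
    fd->released = false;
    fd->closed = false;
    fd->state = OPEN_NOT_SENT;
}

static void model_open_complete(model_fd_t *fd) {
    fd->state = OPEN_DONE;
    if (fd->closed)
        model_unref(fd);       /* close arrived while the open was pending */
}

/* Hypothetical close callback, invoked when the application closes the fd. */
static void model_close_cbk(model_fd_t *fd) {
    fd->closed = true;
    if (fd->state != OPEN_IN_FLIGHT)
        model_unref(fd);       /* nothing pending: drop our reference now */
    /* otherwise model_open_complete() drops it when the open finishes */
}

/* with_close_cbk = false models the leak: without a close notification,
 * nothing ever drops open-behind's reference. */
static model_fd_t run_scenario(bool with_close_cbk, bool send_open,
                               bool open_finishes_first) {
    model_fd_t fd;
    model_open_behind(&fd);
    if (send_open)
        fd.state = OPEN_IN_FLIGHT;   /* some fop forced the real open */
    if (send_open && open_finishes_first)
        model_open_complete(&fd);
    if (with_close_cbk)
        model_close_cbk(&fd);
    model_unref(&fd);                /* fuse drops its own reference */
    if (send_open && !open_finishes_first)
        model_open_complete(&fd);    /* open finishes after the close */
    return fd;
}
```

For example, run_scenario(true, true, false) models an application that closes the file while the background open is still in flight: the fd stays alive until the open completes and is released at that point, instead of being destroyed while still in use.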

@gluster-ant
Collaborator

A patch https://review.gluster.org/24542 has been posted that references this issue.

@gluster-ant
Collaborator

A patch https://review.gluster.org/24544 has been posted that references this issue.

@xhernandez
Contributor

The patch posted should fix the issue, but it's a big change, so I recommend testing it before going to production with open-behind enabled.

gluster-ant pushed a commit that referenced this issue Jun 15, 2020
gluster-ant pushed a commit that referenced this issue Jun 29, 2020
@FrelDX

FrelDX commented Oct 15, 2020

[root@test1 /]# ll /var/lib/portsip/pgsql/data/pg_stat_tmp/
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/global.stat: Transport endpoint is not connected
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/db_0.stat: Transport endpoint is not connected
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/db_16384.stat: Transport endpoint is not connected
total 0
-????????? ? ? ? ? ? db_0.stat
-????????? ? ? ? ? ? db_16384.stat
-????????? ? ? ? ? ? global.stat
[root@test1 /]#

@FrelDX

FrelDX commented Oct 15, 2020

I also encounter this problem. PgSQL fails to start when a gluster node is unavailable.

@xhernandez
Contributor

@FrelDX How is your problem related to this issue?

Does the problem disappear if you disable open-behind?

@faciulula

faciulula commented Oct 15, 2020

@FrelDX Disabling open-behind has helped me a lot.

@jkroonza
Contributor

Whilst disabling open-behind helps our case a LOT, it does not eliminate the problem.

@pranithk
Member

@jkroonza Please open a new issue with the stack trace when it happens again. This issue tracks the open-behind problem, which is now fixed in the latest release-7 and release-8 as well as in master. Closing it.

csabahenk pushed a commit to csabahenk/glusterfs that referenced this issue Mar 7, 2023
Upstream patch:
> Upstream-patch-link: https://review.gluster.org/#/c/glusterfs/+/24451
> Change-Id: I6376a5491368e0e1c283cc452849032636261592
> Fixes: gluster#1225
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

BUG: 1830713
Change-Id: I6376a5491368e0e1c283cc452849032636261592
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/224487
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>