
[bug:1618932] dht-selfheal.c: Directory selfheal failed #957

Closed
gluster-ant opened this issue Mar 12, 2020 · 28 comments
Labels
Migrated Type:Bug wontfix Managed by stale[bot]

Comments

@gluster-ant

URL: https://bugzilla.redhat.com/1618932
Creator: frostyplanet at gmail
Time: 20180818T12:49:41

Created attachment 1476762
gfapi log

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

There are multiple applications using gfapi that concurrently create files in the same directory.
(e51fd83622674cc9) and (e21ea6832d2b13d0) are logs from different application processes.

application log

timezone is GMT+8

2018-08-18 19:35:03,703 DEBUG -31021968- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,734 DEBUG -32369552- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:03,786 DEBUG -31022448- Create new directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] on cluster [4] ((unknown file): 0)(e51fd83622674cc9)
2018-08-18 19:35:03,795 CRITICAL -31021968- Failed to open cluster [4] object [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w]: [[Errno 5] Input/output error] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,903 DEBUG -32366672- Directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] exists on cluster [4] ((unknown file): 0)(e21ea6832d2b13d0)
2018-08-18 19:35:03,945 DEBUG -32369552- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,127 DEBUG -31021968- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:04,391 INFO -32369552- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001 (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,485 INFO -31021968- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004 (app:1461)(e51fd83622674cc9)
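
For context, the workload above boils down to each application process ensuring the shared directory exists, creating a uniquely named file inside it over gfapi, writing, and then renaming the file to its final name. The following is a minimal sketch of that pattern, assuming the libgfapi-python bindings (gluster.gfapi.Volume); the host name, volume name, directory and file names are placeholders rather than values taken from this report.

import errno
import os
import uuid

from gluster import gfapi  # assumed: libgfapi-python package

def write_one(vol, dirpath):
    # Create the shared directory if another process has not created it yet.
    try:
        vol.mkdir(dirpath, 0o755)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise
    tmp = os.path.join(dirpath, "part_%s" % uuid.uuid4().hex)
    # The EIO in the report was seen at this open/create step; the
    # application retried and the second attempt succeeded.
    fd = vol.fopen(tmp, 'w')
    try:
        fd.write(b"payload")
    finally:
        fd.close()
    # Rename the temporary file to its final name, as in the log above.
    vol.rename(tmp, os.path.join(dirpath, "part"))

if __name__ == "__main__":
    vol = gfapi.Volume("gluster-server", "vol0")  # placeholder host/volume
    vol.mount()
    try:
        write_one(vol, "FS/rt/example_dir")       # placeholder path
    finally:
        vol.umount()

Running several such processes concurrently against the same new directory matches the sequence in the log excerpt: concurrent "Create new directory", opens with mode [w], then renames.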

Actual results:

An I/O error happened when creating a file; it succeeded after a retry.

A dht-selfheal failure is observed in the gfapi log, and there is an unmatched inode unlock request reported from the brick.

Expected results:

Additional info:

"gluster volume status" output is all ok,
but runing "gluster volume heal vol0 info" blocks and no output

gluster volume info

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 18e1c05d-570a-4c97-aa91-ef984881c4f2
Status: Started
Snapshot Count: 0
Number of Bricks: 36 x 3 = 108
Transport-type: tcp

Options Reconfigured:
locks.trace: false
client.event-threads: 6
cluster.self-heal-daemon: enable
performance.write-behind: True
transport.keepalive: True
cluster.rebal-throttle: lazy
server.event-threads: 4
performance.io-cache: False
nfs.disable: True
cluster.quorum-type: auto
network.ping-timeout: 120
features.cache-invalidation: False
performance.read-ahead: False
performance.client-io-threads: True
cluster.server-quorum-type: none
performance.md-cache-timeout: 0
performance.readdir-ahead: True

@gluster-ant

Time: 20180818T12:52:01
frostyplanet at gmail commented:
The production cluster was recently upgraded from 3.7.6 to gluster-3.10.12.

@gluster-ant

Time: 20180818T13:00:04
frostyplanet at gmail commented:
Created attachment 1476763
error log in brick

No lock was found when doing the inode unlock; it seems the unlock request came from dht self-heal.

@gluster-ant

Time: 20180818T13:12:37
frostyplanet at gmail commented:
Created attachment 1476765
glfsheal log

There were many "has not responded in the last 120 seconds, disconnecting." messages in it.

@gluster-ant

Time: 20180820T04:47:39
frostyplanet at gmail commented:
Input/output errors are seen on 0.01% of operations and reproduce every hour.

@gluster-ant

Time: 20180820T04:52:59
frostyplanet at gmail commented:
*** Bug 1618925 has been marked as a duplicate of this bug. ***

@gluster-ant

Time: 20180820T04:56:16
frostyplanet at gmail commented:
Host environment:
CentOS 7.2
3.10.0-327.18.2.el7.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

@gluster-ant

Time: 20180820T13:38:32
frostyplanet at gmail commented:
I have another offline storage cluster that reproduces the "gluster heal info" hang. Running "killall glusterd glusterfsd" and then restarting glusterd did not fix it.

I have triggered state dumps and will attach them later.

@gluster-ant

Time: 20180820T13:42:04
frostyplanet at gmail commented:
Created attachment 1477213
state dump of offline cluster

This is another offline cluster that was upgraded from 3.7.6 to 3.10.12; "gluster volume heal info" hangs after a stress test. These are the state dump files.

@gluster-ant

Time: 20180821T08:53:56
frostyplanet at gmail commented:
gdb on "glfsheal vol0" command, stuck at heal/src/glfs-heal.c:1401

@gluster-ant

Time: 20180821T12:15:37
frostyplanet at gmail commented:
Upgraded to 3.12.11; the "glfsheal vol0" hang still persists.

@gluster-ant

Time: 20180914T22:46:12
frostyplanet at gmail commented:
I have reverted the following two commits on the release-3.12 branch to get "gluster volume heal vol0 info" working properly.

These commits may lead to the client event-threads being blocked. The situation is described in:

https://lists.gluster.org/pipermail/gluster-users/2018-September/034871.html
https://lists.gluster.org/pipermail/gluster-users/2018-September/034876.html

commit 086f1d0
Author: Xavier Hernandez jahernan@redhat.com
Date: Thu Feb 1 16:06:32 2018 +0100

cluster/afr: remove unnecessary child_up initialization

The child_up array was initialized with all elements being -1 to
allow afr_notify() to differentiate down bricks from bricks that
haven't reported yet. With current implementation this is not needed
anymore and it was causing unexpected results when other parts of
the code considered that if child_up[i] != 0, it meant that it was up.

Backport of:
> BUG: 1541038

Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472
BUG: 1541930
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

commit 94faf8c
Author: karthik-us ksubrahm@redhat.com
Date: Wed Sep 20 18:30:21 2017 +0530

cluster/afr: Sending subvol up/down events when subvol comes up or goes down

> BUG: 1493539

(cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a)

Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec
BUG: 1492066
Signed-off-by: karthik-us <ksubrahm@redhat.com>

@gluster-ant

Time: 20180920T07:20:25
pkarampu at redhat commented:
Hi,
I went through the logs. As you pointed out, the issue seems to be the ping timeout, though it is not clear why it is happening. A ping timeout generally occurs when the brick doesn't respond within ping-timeout seconds, which appears to be configured as 120 seconds here. Do you have this problem only with the heal info command, or is the mount's I/O also hanging (from your bug description it doesn't look like it, but I want confirmation)?

If it only happens with heal info, would it be possible for you to help find the root cause (RC)? We need to check whether there is any deadlock in the heal-info process.

Pranith

@gluster-ant

Time: 20180920T12:26:27
frostyplanet at gmail commented:
It only happened with "heal info", and it is always reproducible.
"gluster volume status" shows all bricks as online.

Mounting via FUSE was instant, and writes through gfapi are OK most of the time, except for very rare cases where a gfapi write operation returned only after blocking for several hours.

It turned out that the "heal info" problem was not related to the I/O error problem. Our heals are all normal for now (with the two commits I pointed out earlier reverted).
The I/O error problem is still reproducible at the moment.

@gluster-ant

Time: 20180920T12:39:59
frostyplanet at gmail commented:

will it be possible for you to help find the RC?

Yes, if there are future patches I'm happy to test them out.

@gluster-ant

Time: 20180920T12:49:34
frostyplanet at gmail commented:
Created attachment 1485128
glfs heal trace level log with version 3.12.11

Using v3.12.11, with the log level changed to trace and some additional debug logging added.

@gluster-ant

Time: 20180920T12:50:41
frostyplanet at gmail commented:
Created attachment 1485129
glfs heal trace level log with version 3.12.1

With version 3.12.1, a successful run.

@gluster-ant

Time: 20180920T12:59:43
frostyplanet at gmail commented:
(In reply to frostyplanet from comment #13)

Except for very rare cases where a gfapi write operation returned only after
blocking for several hours.

Ignore this; I checked the log and it was caused by a bug not related to gluster.

@gluster-ant

Time: 20180920T14:00:45
frostyplanet at gmail commented:
Previously, in my debugging with v3.12.11, after setting "network.ping-timeout" to 0, the client side looped reconnecting to all the bricks and only succeeded after a long time. If client.event-threads is set to 24 (equal to the number of bricks), heal info succeeds within 30 seconds. That is what led to my suspicion.
Afterwards I used git bisect to identify commits 94faf8c and 086f1d0 as the cause.

Commit 94faf8c added two events to fire, but EVENT_AFR_SUBVOL_UP/EVENT_AFR_SUBVOLS_DOWN are not processed anywhere in the code.

Commit 086f1d0 removed some variable initialization code, which seems to be a regression.

@gluster-ant

Time: 20180926T22:48:27
frostyplanet at gmail commented:
Created attachment 1487508
gfapi log for io error in 3.12.14

A new sample of the gfapi log from version 3.12.14. An I/O error happened while creating a file.

@gluster-ant

Time: 20181023T14:55:03
srangana at redhat commented:
Release 3.12 has been EOLed and this bug was still found to be in the NEW state, hence the version is being moved to mainline so it can be triaged and appropriate action taken.

@gluster-ant

Time: 20181029T08:51:52
pkarampu at redhat commented:
Nithya,
As per the log in comment 19, dht_create() is failing because, I think, it couldn't find any subvolume to do the create on. When can this happen?

@gluster-ant

Time: 20181029T11:29:28
nbalacha at redhat commented:
If multiple clients are creating the same directory, it is possible that one client gets an incomplete layout as the mkdir is not complete. It should, however, heal it itself. If for some reason it could not heal the directory we could run into this issue.

Can you set client-log-level to DEBUG and send us the logs once you hit this? Or better yet, is there a test program I can use to try this out?

@gluster-ant

Time: 20181029T12:20:23
nbalacha at redhat commented:
(In reply to Nithya Balachandran from comment #22)

If multiple clients are creating the same directory, it is possible that one
client gets an incomplete layout as the mkdir is not complete. It should,
however, heal it itself. If for some reason it could not heal the directory
we could run into this issue.

Can you set client-log-level to DEBUG and send us the logs once you hit
this? Or better yet, is there a test program I can use to try this out?

Just noticed that trace logs are already provided. I will take a look and get back.

@gluster-ant

Time: 20181120T08:34:41
ravishankar at redhat commented:
Changing the component to DHT as a part of triaging. Please assign back to replicate if you believe there is something that needs investigation in AFR.

@gluster-ant

Time: 20191104T06:30:37
nbalacha at redhat commented:
(In reply to Nithya Balachandran from comment #23)

(In reply to Nithya Balachandran from comment #22)

If multiple clients are creating the same directory, it is possible that one
client gets an incomplete layout as the mkdir is not complete. It should,
however, heal it itself. If for some reason it could not heal the directory
we could run into this issue.

Can you set client-log-level to DEBUG and send us the logs once you hit
this? Or better yet, is there a test program I can use to try this out?

Just noticed that trace logs are already provided. I will take a look and
get back.

The logs indicate that the client version is 3.10.
[2018-08-18 11:35:03.685415] D [write-behind.c:1696:wb_process_queue] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc082b0ffab] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/write-behind.so(+0x8adb)[0x7fc073b62adb] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/write-behind.so(+0xb880)[0x7fc073b65880] (--> /lib64/libglusterfs.so.0(default_lookup+0xbd)[0x7fc082b893bd] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/quick-read.so(+0x2596)[0x7fc07374c596] ))))) 0-vol0-write-behind: processing queues

This could be because of https://bugzilla.redhat.com/show_bug.cgi?id=1455104 which was fixed in release 3.12.

Do you still see the I/O errors with 3.12 or later?

@gluster-ant

Time: 20191104T07:50:00
nbalacha at redhat commented:
(In reply to frostyplanet from comment #19)

Created attachment 1487508 [details]
gfapi log for io error in 3.12.14

A new sample of the gfapi log from version 3.12.14. An I/O error happened while
creating a file.

I do not see I/O errors in the gfapi log. Please provide debug/trace logs when the issue is seen in 3.12.

@stale

stale bot commented Oct 8, 2020

Thank you for your contributions.
Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Oct 8, 2020
@stale

stale bot commented Oct 23, 2020

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

@stale stale bot closed this as completed Oct 23, 2020