
When several PVCs are created simultaneously, linstor-satellite returns ChildProcessTimeoutException #60

Closed
tnaganawa opened this issue Jan 19, 2021 · 2 comments · Fixed by #61


@tnaganawa
Contributor

Hi,

I set up a 3-node Kubernetes cluster with piraeus-operator.

When I create PVCs one by one, everything works fine and I can create up to 10 PVCs,
but when I create 10 PVCs at the same time, some linstor-satellite instances write an ErrorReport log with 'External command timed out' for the pvdisplay command.

root@ip-172-31-128-5:/var/log/linstor-satellite# cat ErrorReport-600586FF-39674-000001.log
ERROR REPORT 600586FF-39674-000001

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.11.1
Build ID:                           fe95a94d86c66c6c9846a3cf579a1a776f95d3f4
Build time:                         2021-01-13T08:34:55+00:00
Error time:                         2021-01-18 13:54:00
Node:                               ip-172-31-128-5.ap-northeast-1.compute.internal

============================================================

Reported error:
===============

Description:
    Failed to get physical devices for volume group: drbdpool
Cause:
    External command timed out
Additional information:
    External command: pvdisplay --columns -o pv_name -S vg_name=drbdpool --noheadings --nosuffix

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'genericExecutor', Source file 'Commands.java', Line #121

Error message:                      Failed to get physical devices for volume group: drbdpool

While this was happening, the pvdisplay and pvs commands really did take a long time to return any output (I interrupted both with Ctrl-C):

root@ip-172-31-128-5:/var/log/linstor-satellite# pvdisplay --columns -o pv_name -S vg_name=drbdpool --noheadings --nosuffix
^C
root@ip-172-31-128-5:/var/log/linstor-satellite# pvs
^C
root@ip-172-31-128-5:/var/log/linstor-satellite# 
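The satellite runs LVM commands such as pvdisplay as child processes and reports "External command timed out" when one does not finish in time. As a rough illustration of that failure mode (not LINSTOR's actual Java code), here is a minimal Python sketch that runs a command with a timeout and reports the timeout instead of hanging forever. The helper name `run_lvm_command` and its default timeout are hypothetical, not LINSTOR's real values.

```python
import subprocess
from typing import Optional, Sequence


def run_lvm_command(cmd: Sequence[str], timeout_s: float = 45.0) -> Optional[str]:
    """Run an external command and return its stdout, or None if it timed out.

    Hypothetical helper: the name and the default timeout are illustrative
    assumptions, not values taken from LINSTOR. A non-zero exit status raises
    subprocess.CalledProcessError because check=True is set.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry, subprocess.run kills and reaps the child
            check=True,
        )
    except subprocess.TimeoutExpired:
        # Corresponds to the "External command timed out" case in the error report.
        return None
    return result.stdout
```

On an affected node, calling it with the same arguments as the satellite, e.g. `run_lvm_command(["pvdisplay", "--columns", "-o", "pv_name", "-S", "vg_name=drbdpool", "--noheadings", "--nosuffix"], timeout_s=10)`, would be expected to return `None` while pvdisplay is stuck scanning devices.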

The satellite node then goes OFFLINE, and its volumes end up in the Unknown state:

root@piraeus-op-cs-controller-67bccfd9bc-ppzbd:/# linstor --controller 127.0.0.1 v list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                              ┊ Resource                                 ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊        State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-0b9dbca3-e55a-4088-bf72-1675a02641cd ┊ lvm-thick   ┊     0 ┊    1004 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-202.ap-northeast-1.compute.internal ┊ pvc-0bfec7d8-0fbc-42e4-8184-95d5bb800424 ┊ lvm-thick   ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  2.00 GiB ┊ Unused ┊     UpToDate ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-29079ec4-f476-442e-980d-546bfe924e83 ┊ lvm-thick   ┊     0 ┊    1003 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-40315b55-7dda-4847-9672-61b9da6e9fd7 ┊ lvm-thick   ┊     0 ┊    1007 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-64f637ba-5d25-443b-a9f0-cbf4529a3c90 ┊ lvm-thick   ┊     0 ┊    1002 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-7978d086-62ba-4568-87f2-fae081fdb8b3 ┊ lvm-thick   ┊     0 ┊    1005 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-82b01996-ad41-4a2f-b011-bf663bf65827 ┊ lvm-thick   ┊     0 ┊    1006 ┊ None          ┊           ┊        ┊      Unknown ┊
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ pvc-c49ffc32-ae18-472c-a5e9-4a2df68fc75d ┊ lvm-thick   ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  2.00 GiB ┊ Unused ┊ Inconsistent ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

root@piraeus-op-cs-controller-67bccfd9bc-ppzbd:/# linstor --controller 127.0.0.1 n list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                              ┊ NodeType   ┊ Addresses                   ┊ State   ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ ip-172-31-128-5.ap-northeast-1.compute.internal   ┊ SATELLITE  ┊ 172.31.128.5:3366 (PLAIN)   ┊ OFFLINE ┊
┊ ip-172-31-128-90.ap-northeast-1.compute.internal  ┊ SATELLITE  ┊ 172.31.128.90:3366 (PLAIN)  ┊ Online  ┊
┊ ip-172-31-128-202.ap-northeast-1.compute.internal ┊ SATELLITE  ┊ 172.31.128.202:3366 (PLAIN) ┊ Online  ┊
┊ piraeus-op-cs-controller-67bccfd9bc-ppzbd         ┊ CONTROLLER ┊ 10.47.255.243:3366 (PLAIN)  ┊ Online  ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@piraeus-op-cs-controller-67bccfd9bc-ppzbd:/# 

Running pvs under strace shows that opening the DRBD devices that are still being created (and are therefore in the Secondary role) fails with 'EMEDIUMTYPE (Wrong medium type)':

[root@ip-172-31-128-5 ~]# strace pvs
(snip)
stat("/dev/drbd1002", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1002), ...}) = 0
open("/dev/drbd1002", O_RDONLY|O_DIRECT|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
open("/dev/drbd1002", O_RDONLY|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
stat("/dev/drbd1003", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1003), ...}) = 0
open("/dev/drbd1003", O_RDONLY|O_DIRECT|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
open("/dev/drbd1003", O_RDONLY|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
stat("/dev/drbd1004", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1004), ...}) = 0
open("/dev/drbd1004", O_RDONLY|O_DIRECT|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
open("/dev/drbd1004", O_RDONLY|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
stat("/dev/drbd1005", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1005), ...}) = 0
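With many DRBD devices in a trace, it can help to list only the devices that pvs failed to open. A minimal Python sketch, assuming the trace was saved to a text file (for example with `strace -o pvs.trace pvs`), that collects every device path whose `open()`/`openat()` call returned EMEDIUMTYPE. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Matches strace lines such as:
#   open("/dev/drbd1002", O_RDONLY|O_NOATIME) = -1 EMEDIUMTYPE (Wrong medium type)
#   openat(AT_FDCWD, "/dev/drbd1002", O_RDONLY) = -1 EMEDIUMTYPE (Wrong medium type)
_OPEN_EMEDIUMTYPE = re.compile(
    r'^open(?:at)?\((?:[^,]*,\s*)?"([^"]+)".*=\s*-1\s+EMEDIUMTYPE\b'
)


def devices_failing_with_emediumtype(strace_lines: Iterable[str]) -> List[str]:
    """Return device paths whose open failed with EMEDIUMTYPE.

    Paths are returned in first-seen order without duplicates, since pvs
    retries each device (with and without O_DIRECT).
    """
    devices: List[str] = []
    for line in strace_lines:
        match = _OPEN_EMEDIUMTYPE.match(line.strip())
        if match and match.group(1) not in devices:
            devices.append(match.group(1))
    return devices
```

For the excerpt above, this would report `/dev/drbd1002`, `/dev/drbd1003` and `/dev/drbd1004`, which matches the devices that were still being created.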

I think this is the same issue as these ones:
LINBIT/linstor-server#8
LINBIT/linstor-server#108
After configuring lvm.conf in the ns-node containers, the pvdisplay command returns quickly and creating up to 10 PVCs in parallel works fine.
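The lvm.conf change works by keeping LVM from scanning DRBD devices at all, so pvs/pvdisplay never try to open a /dev/drbd* device that is still Secondary. A reject entry of the form `global_filter = [ "r|^/dev/drbd|" ]` in the `devices` section is one way to express this (assumed here for illustration; the exact line is in the linked commits). The Python sketch below only models how such a list of accept (`a|...|`) and reject (`r|...|`) patterns is evaluated: the first matching pattern decides, and a device that matches no pattern is accepted. It is a simplified model that ignores details such as device symlink aliases, not LVM's implementation; the function names are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

Rule = Tuple[bool, "re.Pattern[str]"]


def parse_filter(patterns: Sequence[str]) -> List[Rule]:
    """Turn entries like 'r|^/dev/drbd|' into (accept?, compiled regex) pairs.

    The first character is the action ('a' = accept, 'r' = reject), the second
    is the delimiter, and the regex runs up to the closing delimiter.
    """
    rules: List[Rule] = []
    for p in patterns:
        if len(p) < 3 or p[0] not in ("a", "r") or p[-1] != p[1]:
            raise ValueError(f"unsupported filter entry: {p!r}")
        rules.append((p[0] == "a", re.compile(p[2:-1])))
    return rules


def is_scanned(device: str, rules: Sequence[Rule]) -> bool:
    """First matching rule decides; devices matching no rule are accepted."""
    for accept, regex in rules:
        if regex.search(device):  # unanchored search, hence the '^' in patterns
            return accept
    return True
```

With `parse_filter(['r|^/dev/drbd|'])`, a backing disk such as `/dev/xvdb` is still scanned, while `/dev/drbd1000` and `/dev/drbd1001` are skipped.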

tnaganawa added a commit to tnaganawa/piraeus that referenced this issue Jan 19, 2021
…ontainer to block linstor-satelite reading /dev/drbd.*, which could be secondary state during device creation.

fixes piraeusdatastore#60

Signed-off-by: Tatsuya Naganawa <tatsuyan201101@gmail.com>
@tnaganawa
Contributor Author

I also noticed that the CentOS-based container already has a similar entry:
https://github.com/LINBIT/linstor-server/blob/master/Dockerfile.satellite#L97
LINBIT/linstor-server@c31124f

@rck
Member

rck commented Jan 20, 2021

Could you please update your PR so that it does this the same way as linstor-server? I find that easier to read. Also, please put it on a new line. Basically, just copy what was done in linstor-server.

tnaganawa added a commit to tnaganawa/piraeus that referenced this issue Jan 20, 2021
configure lvm.conf in debian piraeus-server container to block linstor-satelite reading /dev/drbd.*, which could be secondary state during device creation.
fixes piraeusdatastore#60

Signed-off-by: Tatsuya Naganawa <tatsuyan201101@gmail.com>
rck closed this as completed in #61 on Jan 20, 2021
rck pushed a commit that referenced this issue Jan 20, 2021
configure lvm.conf in debian piraeus-server container to block linstor-satelite reading /dev/drbd.*, which could be secondary state during device creation.
fixes #60

Signed-off-by: Tatsuya Naganawa <tatsuyan201101@gmail.com>
Labels
None yet
Projects
None yet
Development


2 participants