
lizardfs-uraft failed to start #824

Open
jbarrigafk opened this issue May 13, 2019 · 50 comments

@jbarrigafk

Hi, I'm working with LizardFS-uraft 3.13.0 on CentOS 7.
I have 3 metadata servers, 8 chunkservers, and 2 metaloggers.
It had worked well for a while, but now, with millions of files and chunks, my metadata servers have a problem:
systemctl start lizardfs-uraft (master1-sv = working)
systemctl start lizardfs-uraft (master2-sv = error)
systemctl start lizardfs-uraft (master3-sv = error)

my metadata.mfs size:
[root@mfsmaster1-sv ~]# du -hs /data/mfs/metadata.mfs
23G /data/mfs/metadata.mfs

mfsmaster2-sv LOGS

May 10 17:07:17 mfsmaster2-sv systemd: Started LizardFS master server daemon.
May 10 17:07:17 mfsmaster2-sv mfsmaster[22996]: connected to Master
May 10 17:07:17 mfsmaster2-sv systemd: Started LizardFS uraft high availability daemon.
May 10 17:07:17 mfsmaster2-sv lizardfs-uraft: Lizardfs-uraft initialized properly
May 10 17:07:17 mfsmaster2-sv lizardfs-uraft: Node 'mfsmaster1-sv' is now a leader.
May 10 17:07:17 mfsmaster2-sv lizardfs-uraft: Metadata server is alive
May 10 17:10:01 mfsmaster2-sv systemd: Created slice User Slice of root.
May 10 17:10:01 mfsmaster2-sv systemd: Started Session 15008 of user root.
May 10 17:10:01 mfsmaster2-sv systemd: Removed slice User Slice of root.
May 10 17:12:50 mfsmaster2-sv mfsmaster[22996]: metadata downloaded 23689390616B/333.292241s (71.077 MB/s)
May 10 17:12:54 mfsmaster2-sv mfsmaster[22996]: changelog.mfs.1 downloaded 12138576B/0.183249s (66.241 MB/s)
May 10 17:12:54 mfsmaster2-sv mfsmaster[22996]: changelog.mfs.2 downloaded 60904845B/0.930901s (65.426 MB/s)
May 10 17:12:55 mfsmaster2-sv mfsmaster[22996]: sessions downloaded 2166B/0.001474s (1.469 MB/s)
May 10 17:12:55 mfsmaster2-sv mfsmaster[22996]: opened metadata file /var/lib/mfs/metadata.mfs
May 10 17:12:55 mfsmaster2-sv mfsmaster[22996]: loading objects (files,directories,etc.) from the metadata file
May 10 17:13:58 mfsmaster2-sv mfsmaster[22996]: loading names from the metadata file
May 10 17:20:01 mfsmaster2-sv systemd: Created slice User Slice of root.
May 10 17:20:01 mfsmaster2-sv systemd: Started Session 15009 of user root.
May 10 17:20:01 mfsmaster2-sv systemd: Removed slice User Slice of root.
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading deletion timestamps from the metadata file
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading extra attributes (xattr) from the metadata file
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading access control lists from the metadata file
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading quota entries from the metadata file
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading file locks from the metadata file
May 10 17:21:05 mfsmaster2-sv mfsmaster[22996]: loading chunks data from the metadata file
May 10 17:22:23 mfsmaster2-sv mfsmaster[22996]: checking filesystem consistency of the metadata file
May 10 17:22:29 mfsmaster2-sv mfsmaster[22996]: connecting files and chunks
May 10 17:27:51 mfsmaster2-sv mfsmaster[22996]: calculating checksum of the metadata
May 10 17:28:17 mfsmaster2-sv mfsmaster[22996]: metadata file /var/lib/mfs/metadata.mfs read (196985237 inodes including 30975314 directory inodes and 136297216 file inodes, 136301197 chunks)
May 10 17:28:17 mfsmaster2-sv mfsmaster[22996]: running in shadow mode - applying changelogs from /var/lib/mfs
May 10 17:28:18 mfsmaster2-sv mfsmaster[22996]: /var/lib/mfs/changelog.mfs.1: 220353 changes applied (1254452902 to 1254673254), 0 skipped
May 10 17:28:18 mfsmaster2-sv mfsmaster[22996]: /var/lib/mfs/changelog.mfs: 255 changes applied (1254673255 to 1254673509), 269484 skipped
May 10 17:29:24 mfsmaster2-sv systemd: lizardfs-ha-master.service: main process exited, code=killed, status=27/PROF
May 10 17:29:24 mfsmaster2-sv mfsmaster: 05/10/19 17:29:24.278 [warning] [39131:39131] : can't find process to terminate
May 10 17:29:24 mfsmaster2-sv mfsmaster: can't find process to terminate
May 10 17:29:24 mfsmaster2-sv systemd: Unit lizardfs-ha-master.service entered failed state.
May 10 17:29:24 mfsmaster2-sv systemd: lizardfs-ha-master.service failed.
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: Metadata server is dead
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: LizardFS uraft helper script
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: Available commands:
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: isalive
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: metadata-version
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: quick-stop
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: promote
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: demote
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: assign-ip
May 10 17:29:24 mfsmaster2-sv lizardfs-uraft: drop-ip

Only 1 of the metadata servers works so far; I cannot bring up the lizardfs-uraft service on master2-sv or mfsmaster3-sv. Both present the same error:

May 10 17:29:24 mfsmaster2-sv systemd: lizardfs-ha-master.service: main process exited, code=killed, status=27/PROF
May 10 17:29:24 mfsmaster2-sv mfsmaster: 05/10/19 17:29:24.278 [warning] [39131:39131] : can't find process to terminate
May 10 17:29:24 mfsmaster2-sv mfsmaster: can't find process to terminate

We currently handle around 130 million files and chunks; maybe that's the problem? I have played with different mfsmaster parameters, but none of them helps. I have increased the values of CHUNKS_LOOP_MAX_CPS and METADATA_CHECKSUM_RECALCULATION_SPEED without good results.

mfsmaster1 config.

 PERSONALITY = ha-cluster-managed
 ADMIN_PASSWORD = SOMEPASS
 WORKING_USER = mfs
 WORKING_GROUP = mfs
 DATA_PATH = /data/mfs
 MATOML_LISTEN_HOST = *
 MATOML_LISTEN_PORT = 9419
 MATOCS_LISTEN_HOST = *
 MATOCS_LISTEN_PORT = 9420
 MATOCL_LISTEN_HOST = *
 MATOCL_LISTEN_PORT = 9421
 MATOTS_LISTEN_HOST = *
 MATOTS_LISTEN_PORT = 9424
 NO_ATIME = 0
 MASTER_HOST = mfsmaster

mfsmaster2 config

 PERSONALITY = ha-cluster-managed
 ADMIN_PASSWORD = SOMEPASS
 WORKING_USER = mfs
 WORKING_GROUP = mfs
 DATA_PATH = /var/lib/mfs
 MATOML_LISTEN_HOST = *
 MATOML_LISTEN_PORT = 9419
 MATOCS_LISTEN_HOST = *
 MATOCS_LISTEN_PORT = 9420
 MATOCL_LISTEN_HOST = *
 MATOCL_LISTEN_PORT = 9421
 MATOTS_LISTEN_HOST = *
 MATOTS_LISTEN_PORT = 9424
 NO_ATIME = 0
 MASTER_HOST = mfsmaster

mfsmaster3 config

 PERSONALITY = ha-cluster-managed
 ADMIN_PASSWORD = SOMEPASS
 WORKING_USER = mfs
 WORKING_GROUP = mfs
 DATA_PATH = /data/mfs
 MATOML_LISTEN_HOST = *
 MATOML_LISTEN_PORT = 9419
 MATOCS_LISTEN_HOST = *
 MATOCS_LISTEN_PORT = 9420
 MATOCL_LISTEN_HOST = *
 MATOCL_LISTEN_PORT = 9421
 MATOTS_LISTEN_HOST = *
 MATOTS_LISTEN_PORT = 9424
 NO_ATIME = 0
 MASTER_HOST = mfsmaster

lizardfs-uraft.cfg (master2-sv)

 URAFT_NODE_ADDRESS = mfsmaster2-sv
 URAFT_NODE_ADDRESS = mfsmaster1-sv
 URAFT_NODE_ADDRESS = mfsmaster3-sv
 URAFT_ID = 0
 URAFT_FLOATING_IP      = 10.40.40.140
 URAFT_FLOATING_NETMASK = 255.255.255.0
 URAFT_FLOATING_IFACE   = bond0

lizardfs-uraft.cfg (master1-sv)

 URAFT_NODE_ADDRESS = mfsmaster2-sv
 URAFT_NODE_ADDRESS = mfsmaster1-sv
 URAFT_NODE_ADDRESS = mfsmaster3-sv
 URAFT_ID = 1
 URAFT_FLOATING_IP      = 10.40.40.140
 URAFT_FLOATING_NETMASK = 255.255.255.0
 URAFT_FLOATING_IFACE   = bond0

lizardfs-uraft.cfg (master3-sv)

 URAFT_NODE_ADDRESS = mfsmaster2-sv
 URAFT_NODE_ADDRESS = mfsmaster1-sv
 URAFT_NODE_ADDRESS = mfsmaster3-sv
 URAFT_ID = 2
 URAFT_FLOATING_IP      = 10.40.40.140
 URAFT_FLOATING_NETMASK = 255.255.255.0
 URAFT_FLOATING_IFACE   = bond0

Any suggestions?
Thanks

@blink69
Contributor

blink69 commented May 14, 2019 via email

@jbarrigafk
Author

jbarrigafk commented May 14, 2019

Hello, thanks for answering.
My cluster currently uses 61 GB of RAM.
mfsmaster1-sv = 94 GB RAM, SSD for DATA_PATH
mfsmaster2-sv = 256 GB RAM, SSD for DATA_PATH
mfsmaster3-sv = 94 GB RAM, SAS for DATA_PATH

@jbarrigafk
Author

Yesterday I tried these commands, which I saw in another old post:

trap -- '' SIGPROF ; trap -- '' SIGVTALRM
mfsmaster -c /etc/mfs/mfsmaster.cfg -o ha-cluster-managed -o initial-personality="shadow" start

This way the mfsmaster process never died, at least not in the first 20 minutes as it does with systemctl and the PROF signal.
However, it does not load the metadata.mfs file correctly; the logs are the following:

May 13 17:42:53 mfsmaster3-sv mfsmaster[23769]: error reading metadata (edge)
May 13 17:42:53 mfsmaster3-sv mfsmaster[23769]: can't load downloaded metadata and changelogs: error reading metadata (structure)
May 13 17:42:53 mfsmaster3-sv mfsmaster[23769]: Waiting for master to produce up-to-date metadata image
May 13 17:42:53 mfsmaster3-sv mfsmaster[23769]: the dumping process finished without producing output
May 13 17:42:54 mfsmaster3-sv mfsmaster[23769]: connecting to Master
May 13 17:42:54 mfsmaster3-sv mfsmaster[23769]: connected to Master
May 13 17:42:55 mfsmaster3-sv mfsmaster[23769]: some changes lost: [1300564723-1300883752], download metadata again
May 13 17:42:55 mfsmaster3-sv mfsmaster[23769]: Waiting for master to produce up-to-date metadata image
May 13 17:42:55 mfsmaster3-sv mfsmaster[23769]: Master temporarily refused to produce a new metadata image
May 13 17:43:03 mfsmaster3-sv mfsmaster[23769]: some changes lost: [1300564723-1300885744], download metadata again
May 13 18:46:14 mfsmaster3-sv mfsmaster[23769]: not all section has been read - file corrupted
May 13 18:46:14 mfsmaster3-sv mfsmaster[23769]: can't load downloaded metadata and changelogs: error reading metadata (structure)
May 13 18:46:14 mfsmaster3-sv mfsmaster[23769]: Waiting for master to produce up-to-date metadata image
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: error reading metadata (edge)
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: can't load downloaded metadata and changelogs: error reading metadata (structure)
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: Waiting for master to produce up-to-date metadata image
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: renaming metadata.mfs.tmp to metadata.mfs failed: No such file or directory
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: renaming metadata.mfs.tmp to metadata.mfs1557805565 failed: No such file or directory
May 13 22:46:05 mfsmaster3-sv mfsmaster[23769]: trying to create emergency metadata file in foreground
May 13 22:47:02 mfsmaster3-sv kernel: mfsmaster[23769]: segfault at 40 ip 00000000004b373f sp 00007ffc7f7d7ed0 error 4 in mfsmaster[400000+1c1000]

I clearly see that I have corrupt files. What I have in mind is to completely erase all the contents of /var/lib/mfs on my master3-sv and re-run the commands I mentioned previously.
My question here is: when executing the command

mfsmaster -c /etc/mfs/mfsmaster.cfg -o ha-cluster-managed -o initial-personality="shadow" start

will I still have HA functionality with lizardfs-uraft?
What I would like is for the mfsmaster process not to die the way it does when started normally with systemctl start lizardfs-uraft.
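
For reference, a minimal wrapper sketch of that workaround. An ignored signal disposition is inherited across exec, so mfsmaster started from a shell that ignores SIGPROF and SIGVTALRM keeps ignoring them (paths as above; adjust to your install):

 #!/bin/bash
 # start-mfsmaster-shadow.sh - hypothetical wrapper, not part of LizardFS
 # Ignore the profiling-timer signals; the child inherits the disposition.
 trap -- '' SIGPROF
 trap -- '' SIGVTALRM
 exec /usr/sbin/mfsmaster -c /etc/mfs/mfsmaster.cfg \
      -o ha-cluster-managed -o initial-personality="shadow" start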

@jbarrigafk
Author

In a testing environment, I did the following:
master1 - systemctl start lizardfs-uraft (primary)
master2 - systemctl start lizardfs-uraft (slave)
master3 - systemctl start lizardfs-uraft (slave)

[root@master2 ~]# ps aux |grep mfs
mfs       7244  2.0  3.5 205868 62620 ?        S<   12:26   0:00 /usr/sbin/mfsmaster -o ha-cluster-managed -o initial-personality=shadow start
mfs       7245  1.1  0.1  32512  2336 ?        Ss   12:26   0:00 /usr/sbin/lizardfs-uraft
root      7313  0.0  0.0 112728   940 pts/0    S+   12:26   0:00 grep --color=auto mfs
[root@master2 ~]# kill -9 7244
[root@master2 ~]# ps aux |grep mfs
mfs       7245  1.1  0.1  32512  2336 ?        Ss   12:26   0:00 /usr/sbin/lizardfs-uraft
root      7369  0.0  0.0 112728   940 pts/0    S+   12:26   0:00 grep --color=auto mfs
[root@master2 ~]# mfsmaster -o ha-cluster-managed -o initial-personality=shadow start
05/15/19 12:26:30.739 [warning] [7372:7372] : no charts data file /var/lib/mfs2/stats.mfs - initializing empty charts

[root@master2 ~]# tail -f /var/log/messages
May 15 12:26:30 master2 mfsmaster[7372]: loading quota entries from the metadata file
May 15 12:26:30 master2 mfsmaster[7372]: loading file locks from the metadata file
May 15 12:26:30 master2 mfsmaster[7372]: loading chunks data from the metadata file
May 15 12:26:30 master2 mfsmaster[7372]: checking filesystem consistency of the metadata file
May 15 12:26:30 master2 mfsmaster[7372]: connecting files and chunks
May 15 12:26:30 master2 mfsmaster[7372]: calculating checksum of the metadata
May 15 12:26:30 master2 mfsmaster[7372]: metadata file /var/lib/mfs2/metadata.mfs read (1 inodes including 1 directory inodes and 0 file inodes, 0 chunks)
May 15 12:26:30 master2 mfsmaster[7372]: running in shadow mode - applying changelogs from /var/lib/mfs2
May 15 12:26:30 master2 mfsmaster[7372]: all needed changelogs applied successfully
May 15 12:26:30 master2 mfsmaster[7372]: synced at version = 0
May 15 12:27:00 master2 mfsmaster[7372]: sessions downloaded 10B/0.000382s (0.026 MB/s)
[root@master3 ~]# ps aux |grep mfs
nobody   23044  0.2  0.2 228104 19360 ?        Ss   12:06   0:01 python2 /usr/sbin/lizardfs-cgiserver -H 0.0.0.0 -P 9425 -R /usr/share/mfscgi
mfs      25485  0.2  1.4 251612 108208 ?       S<   12:13   0:00 /usr/sbin/mfsmaster -o ha-cluster-managed -o initial-personality=shadow start
mfs      25486  0.7  0.0  32512  2336 ?        Ss   12:13   0:01 /usr/sbin/lizardfs-uraft
root     26914  0.0  0.0 112728   940 pts/0    S+   12:15   0:00 grep --color=auto mfs
[root@master3 ~]# mfsmaster -o ha-cluster-managed -o initial-personality=shadow start
May 15 12:21:57 master3 mfsmaster[27489]: metadata downloaded 296B/0.000661s (0.448 MB/s)
May 15 12:21:57 master3 mfsmaster[27489]: changelog.mfs.1 downloaded 0B/0.000001s (0.000 MB/s)
May 15 12:21:57 master3 mfsmaster[27489]: changelog.mfs.2 downloaded 0B/0.000001s (0.000 MB/s)
May 15 12:21:57 master3 mfsmaster[27489]: sessions downloaded 10B/0.000309s (0.032 MB/s)
May 15 12:21:57 master3 mfsmaster[27489]: opened metadata file /var/lib/mfs2/metadata.mfs
May 15 12:21:57 master3 mfsmaster[27489]: loading objects (files,directories,etc.) from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading names from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading deletion timestamps from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading extra attributes (xattr) from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading access control lists from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading quota entries from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading file locks from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: loading chunks data from the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: checking filesystem consistency of the metadata file
May 15 12:21:57 master3 mfsmaster[27489]: connecting files and chunks
May 15 12:21:57 master3 mfsmaster[27489]: calculating checksum of the metadata
May 15 12:21:57 master3 mfsmaster[27489]: metadata file /var/lib/mfs2/metadata.mfs read (1 inodes including 1 directory inodes and 0 file inodes, 0 chunks)
May 15 12:21:57 master3 mfsmaster[27489]: running in shadow mode - applying changelogs from /var/lib/mfs2
May 15 12:21:57 master3 mfsmaster[27489]: all needed changelogs applied successfully
May 15 12:21:57 master3 mfsmaster[27489]: synced at version = 0
May 15 12:22:00 master3 mfsmaster[27489]: sessions downloaded 10B/0.000383s (0.026 MB/s)

Everything synchronizes correctly, but when the lizardfs service is stopped on the primary master (master1), the rest fail: no shadow is promoted to primary. So this is not a solution.

@jbarrigafk
Author

jbarrigafk commented May 15, 2019

My master's CPU usage is always 100% when lizardfs-uraft starts; I think the PROF signal appears because of prolonged CPU usage. How can I handle CPU usage then? I changed the default value of CHUNKS_LOOP_MAX_CPU = 60 to 30 and nothing changed.

@jbarrigafk
Author

jbarrigafk commented May 15, 2019

yum install cpulimit
cpulimit -p PID -l 75

I limited the CPU usage of the mfsmaster process to 75%, and even so, after about 20 minutes it died with:


May 15 16:42:59 mfsmaster3-sv systemd: lizardfs-ha-master.service: main process exited, code=killed, status=27/PROF
May 15 16:42:59 mfsmaster3-sv mfsmaster: can't find process to terminate
May 15 16:42:59 mfsmaster3-sv mfsmaster: 05/15/19 16:42:59.496 [warning] [20430:20430] : can't find process to terminate
May 15 16:42:59 mfsmaster3-sv systemd: Unit lizardfs-ha-master.service entered failed state.
May 15 16:42:59 mfsmaster3-sv systemd: lizardfs-ha-master.service failed.
May 15 16:42:59 mfsmaster3-sv lizardfs-uraft: Metadata server is dead

@pbeza
Member

pbeza commented May 16, 2019

However, I do not correctly load the metadata.mfs file, the logs are the following: [...]

May 13 22:47:02 mfsmaster3-sv kernel: mfsmaster[23769]: segfault at 40 ip 00000000004b373f sp 00007ffc7f7d7ed0 error 4 in mfsmaster[400000+1c1000]

I can see segfault in your log. 👎 Can you please check whether core dump was created? If not – try to enable core dumps and reproduce segfault once again.

After you get the core dump, install the lizardfs-dbg package and run gdb mfsmaster /var/crash/_usr_sbin_mfsmaster.1.crash. bt (backtrace) will hopefully point out the problem.
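
A minimal sketch of enabling core dumps on CentOS 7 (paths and the core pattern are illustrative; systems running abrt or systemd-coredump may route dumps elsewhere):

 # allow unlimited core size for the current shell and its children
 ulimit -c unlimited
 # write cores to a known location (system-wide setting)
 sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
 # for a daemon started by systemd, raise the limit in the unit instead:
 #   [Service]
 #   LimitCORE=infinity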

@jbarrigafk
Author

However, I do not correctly load the metadata.mfs file, the logs are the following: [...]

I can see segfault in your log. Can you please check whether core dump was created? If not – try to enable core dumps and reproduce segfault once again.

After you get the core dump, install the lizardfs-dbg package and run gdb mfsmaster /var/crash/_usr_sbin_mfsmaster.1.crash. bt (backtrace) will hopefully point out the problem.

Thanks for your answer. The segfault appeared when executing LizardFS with the following command:

mfsmaster -c /etc/mfs/mfsmaster.cfg -o ha-cluster-managed -o initial-personality="shadow" start

but started this way, the HA of the LizardFS cluster stops working correctly. I need to solve the problem I have with systemctl start lizardfs-uraft, for which the logs are the following:

May 17 16:43:06 mfsmaster3-sv systemd: Starting LizardFS master server daemon...
May 17 16:43:06 mfsmaster3-sv mfsmaster: set gid to 989
May 17 16:43:06 mfsmaster3-sv mfsmaster: set uid to 994
May 17 16:43:06 mfsmaster3-sv mfsmaster: changed working directory to: /data/mfs
May 17 16:43:06 mfsmaster3-sv mfsmaster: lockfile /data/mfs/.mfsmaster.lock created and locked
May 17 16:43:06 mfsmaster3-sv mfsmaster: sessions have been loaded
May 17 16:43:06 mfsmaster3-sv mfsmaster: initialized sessions from file /data/mfs/sessions.mfs
May 17 16:43:06 mfsmaster3-sv mfsmaster: initialized exports from file /etc/mfs/mfsexports.cfg
May 17 16:43:06 mfsmaster3-sv mfsmaster: initialized topology from file /etc/mfs/mfstopology.cfg
May 17 16:43:06 mfsmaster3-sv mfsmaster: initialized goal definitions from file /etc/mfs/mfsgoals.cfg
May 17 16:43:06 mfsmaster3-sv mfsmaster: loaded charts data file from /data/mfs/stats.mfs
May 17 16:43:06 mfsmaster3-sv mfsmaster: connecting to Master
May 17 16:43:06 mfsmaster3-sv mfsmaster: master <-> metaloggers module: listen on *:9419
May 17 16:43:06 mfsmaster3-sv mfsmaster: master <-> chunkservers module: listen on *:9420
May 17 16:43:06 mfsmaster3-sv mfsmaster: master <-> tapeservers module: listen on (*:9424)
May 17 16:43:06 mfsmaster3-sv mfsmaster: main master server module: listen on *:9421
May 17 16:43:06 mfsmaster3-sv mfsmaster: open files limit: 100000
May 17 16:43:06 mfsmaster3-sv mfsmaster: mfsmaster daemon initialized properly
May 17 16:43:06 mfsmaster3-sv systemd: Started LizardFS master server daemon.
May 17 16:43:06 mfsmaster3-sv mfsmaster[25288]: connected to Master
May 17 16:43:06 mfsmaster3-sv systemd: Started LizardFS uraft high availability daemon.
May 17 16:43:06 mfsmaster3-sv lizardfs-uraft: Lizardfs-uraft initialized properly
May 17 16:43:06 mfsmaster3-sv lizardfs-uraft: Node 'mfsmaster1-sv' is now a leader.
May 17 16:43:06 mfsmaster3-sv lizardfs-uraft: Metadata server is alive
May 17 16:43:56 mfsmaster3-sv systemd-logind: New session 784 of user jbarriga.
May 17 16:43:56 mfsmaster3-sv systemd: Started Session 784 of user jbarriga.
May 17 16:44:01 mfsmaster3-sv su: (to root) jbarriga on pts/1
May 17 16:48:22 mfsmaster3-sv mfsmaster[25288]: metadata downloaded 24798735237B/316.049143s (78.465 MB/s)
May 17 16:48:22 mfsmaster3-sv mfsmaster[25288]: changelog.mfs.1 downloaded 22814654B/0.287793s (79.275 MB/s)
May 17 16:48:23 mfsmaster3-sv mfsmaster[25288]: changelog.mfs.2 downloaded 16915265B/0.269178s (62.840 MB/s)
May 17 16:48:23 mfsmaster3-sv mfsmaster[25288]: sessions downloaded 2166B/0.002605s (0.831 MB/s)
May 17 16:48:23 mfsmaster3-sv mfsmaster[25288]: opened metadata file /data/mfs/metadata.mfs
May 17 16:48:23 mfsmaster3-sv mfsmaster[25288]: loading objects (files,directories,etc.) from the metadata file
May 17 16:49:49 mfsmaster3-sv mfsmaster[25288]: loading names from the metadata file
May 17 16:50:01 mfsmaster3-sv systemd: Created slice User Slice of root.
May 17 16:50:01 mfsmaster3-sv systemd: Started Session 785 of user root.
May 17 16:50:01 mfsmaster3-sv systemd: Removed slice User Slice of root.
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading deletion timestamps from the metadata file
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading extra attributes (xattr) from the metadata file
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading access control lists from the metadata file
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading quota entries from the metadata file
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading file locks from the metadata file
May 17 16:57:50 mfsmaster3-sv mfsmaster[25288]: loading chunks data from the metadata file
May 17 16:59:36 mfsmaster3-sv mfsmaster[25288]: checking filesystem consistency of the metadata file
May 17 16:59:43 mfsmaster3-sv mfsmaster[25288]: connecting files and chunks
May 17 17:00:01 mfsmaster3-sv systemd: Created slice User Slice of root.
May 17 17:00:01 mfsmaster3-sv systemd: Started Session 786 of user root.
May 17 17:00:01 mfsmaster3-sv systemd: Removed slice User Slice of root.
May 17 17:01:01 mfsmaster3-sv systemd: Created slice User Slice of root.
May 17 17:01:01 mfsmaster3-sv systemd: Started Session 787 of user root.
May 17 17:01:01 mfsmaster3-sv systemd: Removed slice User Slice of root.
May 17 17:04:26 mfsmaster3-sv mfsmaster[25288]: calculating checksum of the metadata
May 17 17:04:54 mfsmaster3-sv mfsmaster[25288]: metadata file /data/mfs/metadata.mfs read (205494244 inodes including 32020337 directory inodes and 141293409 file inodes, 141297761 chunks)
May 17 17:04:54 mfsmaster3-sv mfsmaster[25288]: running in shadow mode - applying changelogs from /data/mfs
May 17 17:04:55 mfsmaster3-sv mfsmaster[25288]: /data/mfs/changelog.mfs.1: 378283 changes applied (1359114939 to 1359493221), 0 skipped
May 17 17:04:55 mfsmaster3-sv mfsmaster[25288]: /data/mfs/changelog.mfs: 70 changes applied (1359493222 to 1359493291), 330105 skipped
May 17 17:05:02 mfsmaster3-sv systemd: lizardfs-ha-master.service: main process exited, code=killed, status=27/PROF
May 17 17:05:02 mfsmaster3-sv mfsmaster: 05/17/19 17:05:02.618 [warning] [10426:10426] : can't find process to terminate
May 17 17:05:02 mfsmaster3-sv mfsmaster: can't find process to terminate
May 17 17:05:02 mfsmaster3-sv systemd: Unit lizardfs-ha-master.service entered failed state.
May 17 17:05:02 mfsmaster3-sv systemd: lizardfs-ha-master.service failed.
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: Metadata server is dead
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: LizardFS uraft helper script
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: Available commands:
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: isalive
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: metadata-version
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: quick-stop
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: promote
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: demote
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: assign-ip
May 17 17:05:02 mfsmaster3-sv lizardfs-uraft: drop-ip

I've tried several things, including servers with faster CPUs, and the problems are the same.
My metadata version is 1359896017 and it keeps growing very fast, but only on my master1-sv.
I do not understand why it still works on master1 while the other 2 fail...

I do not know where else to look.

@jbarrigafk
Author

We decided to remove the uraft function and work only with master/shadow. Even so, only one of the metadata servers starts correctly; the other presents the same problems.
This suggests the problem is not in the uraft HA functions themselves.

@pbeza
Member

pbeza commented May 25, 2019

It would be nice if you were able to reliably reproduce the problem in some minimal working example (MWE), so that I can reproduce it too (see this guideline).

If you are not able to create an MWE, then it would be nice if you could provide a core dump of the segfault, as I mentioned in my last post.

@jbarrigafk
Author

jbarrigafk commented Jun 11, 2019

Apologies for the delayed answer. I tried to reproduce the segfault in another test environment using the same metadata.mfs from my production masters.
No segfault appears in the test environment. My theory is that it occurs in my production environment due to the large volume of file reads and writes in LizardFS.

I enabled core dumps on my test masters, but a core dump is only generated for some signals, for example signal 11 (SIGSEGV).

In my case, the signal that ends the LizardFS process is 27 (SIGPROF), whose default action is of type "term":

  • Term: default action is to terminate the process.
  • Ign: default action is to ignore the signal.
  • Core: default action is to terminate the process and dump core (see core(5)).

etc.

Signal 27 (SIGPROF) does not generate any core dump that I can analyze.

However, I installed gdb and the LizardFS debug package for CentOS 7:
yum localinstall lizardfs-debuginfo-3.13.0-0el7.x86_64.rpm

I keep investigating how to use gdb to debug the LizardFS process.

In my testing environment, the SIGPROF signal continues to appear and the LizardFS process dies.
metadata.mfs = 24GB
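
One way to see where SIGPROF comes from is to attach gdb and stop on the signal instead of letting it terminate the process. A sketch, assuming mfsmaster is already running and the debuginfo package is installed:

 gdb -p $(pidof mfsmaster)
 (gdb) handle SIGPROF stop print nopass
 (gdb) continue
 ... wait until the signal arrives and gdb stops ...
 (gdb) bt

handle ... nopass makes gdb swallow the signal, so the process survives and bt shows the code path that was executing when the profiling timer fired.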

@kaitoan2000

Any update on this case, @jbarrigafk?

@jbarrigafk
Author

@kaitoan2000
At this time I could not give this problem attention due to my workload, but I have been reading and I think some value needs to be increased somewhere, either in systemd, in the kernel, or in some LizardFS file.
In MooseFS I found the file cpuusage.c, which manages ITIMER_PROF, but I still cannot find the corresponding C code in LizardFS to verify those parameters.
As soon as I have time I will keep looking for a way to solve this problem.

@kaitoan2000

kaitoan2000 commented Jun 21, 2019

At this time I could not give this problem attention due to my workload [...] As soon as I have time I will keep looking for a way to solve this problem.

When this option is set (equals 1) master will try to recover metadata
from changelog when it is being started after a crash; otherwise it will
refuse to start and 'mfsmetarestore' should be used to recover the metadata.
(Default: 0)
AUTO_RECOVERY = 0
Could I change this option to 1?
I have 3 master nodes: 2 of them can hold the floating IP, and the other one runs in elector mode (URAFT_ELECTOR_MODE = 1).

@Zorlin

Zorlin commented Jun 21, 2019

You should be able to change it to 1, but as I understand it, it's deliberately 0 because when a crash occurs you probably want to look at it yourself rather than letting it be handled automatically.
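
If AUTO_RECOVERY stays 0 and the master refuses to start after a crash, the manual path is roughly the following sketch (the data path is an assumption; check DATA_PATH in your mfsmaster.cfg):

 # replay the changelogs into a fresh metadata.mfs automatically
 mfsmetarestore -a -d /var/lib/mfs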

@kaitoan2000

We decided to remove the uraft function and work only with master/shadow [...]

@jbarrigafk Please give me a way to remove uraft HA; it's terrible for me now.
A lot of downtime :(
Could I reach you via mail or telegram, bro?

@jbarrigafk
Author

@kaitoan2000
If it is possible, stop all the reading and writing that your LizardFS clients are doing against the cluster.
After that, stop the uraft service on all your metadata servers. At that point the master server will log lock messages to maintain data integrity, so wait for the services to stop correctly (do not kill any process, since the information can be corrupted).

You should also make a tar.gz of your metadata from /var/lib/mfs (see the sketch below).
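
A sketch of that stop-and-backup sequence on each metadata server (the backup path and filename are illustrative):

 systemctl stop lizardfs-uraft
 # wait until mfsmaster has shut down cleanly, then back up the metadata
 tar czf /root/mfs-metadata-$(date +%F).tar.gz -C /var/lib/mfs .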

Choose your last active metadata master from uraft and change these values:

mfsmaster.cfg
Master

PERSONALITY = master
# ADMIN_PASSWORD = Somepassw
# MASTER_HOST = mfsmaster

For the shadows, use either of the two remaining metadata servers, or both:

Shadow

PERSONALITY = shadow
# ADMIN_PASSWORD = Somepassw
 MASTER_HOST = mfsmaster

chunkservers

mfschunkserver.cfg
Chunkservers
MASTER_HOST = mfsmaster

metaloggers

mfsmetalogger.cfg
MASTER_HOST = mfsmaster

Just be sure every node in your cluster can resolve mfsmaster to the metadata master's IP (/etc/hosts resolution), for example:
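
A hosts entry sketch for every node (10.40.40.140 is the floating IP from the uraft configs above; use whatever address your master actually has):

 # /etc/hosts
 10.40.40.140  mfsmaster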

Then run systemctl start lizardfs-master on the master first, and afterwards on the shadow(s).

@jbarrigafk
Author


Good, back to this topic.
I keep trying to make LizardFS master/shadow work before going back to uraft. The way I have been able to make it work in a test scenario is as follows (I cut some records):

trap -- '' SIGPROF ; trap -- '' SIGVTALRM
/usr/sbin/mfsmaster -c /etc/mfs/mfsmaster.cfg -d

07/18/19 16:26:06.617 [info] [8729:8729]  : loading chunks data from the metadata file
mfsmaster [8729]: loading chunks data from the metadata file 
07/18/19 16:26:06.618 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk
07/18/19 16:26:06.618 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk
07/18/19 16:26:06.618 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk

.
.
.
07/18/19 19:49:23.067 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk
07/18/19 19:49:23.067 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk
07/18/19 19:49:23.067 [debug] [8729:8729] : master.fs.checksum.changing_not_recalculated_chunk
07/18/19 19:49:23.077 [info] [8729:8729] : calculating checksum of the metadata
mfsmaster[8729]: calculating checksum of the metadata
07/18/19 19:49:58.160 [info] [8729:8729] : metadata file /var/lib/mfs/metadata.mfs read (208740527 inodes including 32467844 directory inodes and 143185719 file inodes, 143190092 chunks)
mfsmaster[8729]: metadata file /var/lib/mfs/metadata.mfs read (208740527 inodes including 32467844 directory inodes and 143185719 file inodes, 143190092 chunks)
07/18/19 19:49:58.160 [info] [8729:8729] : running in shadow mode - applying changelogs from /var/lib/mfs
mfsmaster[8729]: running in shadow mode - applying changelogs from /var/lib/mfs
07/18/19 19:53:12.371 [debug] [8729:8729] : master.fs.stored
07/18/19 19:53:12.371 [info] [8729:8729] : all needed changelogs applied successfully
mfsmaster[8729]: all needed changelogs applied successfully
07/18/19 19:53:12.371 [info] [8729:8729] : synced at version = 1396145581
mfsmaster[8729]: synced at version = 1396145581
07/18/19 19:53:12.371 [info] [8729:8729] : connection was reset by Master
mfsmaster[8729]: connection was reset by Master
07/18/19 19:53:13.023 [info] [8729:8729] : connecting to Master
mfsmaster[8729]: connecting to Master
07/18/19 19:53:13.024 [info] [8729:8729] : connected to Master
mfsmaster[8729]: connected to Master
07/18/19 19:54:00.071 [info] [8729:8729] : sessions downloaded 10B/0.012636s (0.001 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.012636s (0.001 MB/s)
07/18/19 19:55:00.023 [info] [8729:8729] : sessions downloaded 10B/0.004217s (0.002 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.004217s (0.002 MB/s)
07/18/19 19:56:00.028 [info] [8729:8729] : sessions downloaded 10B/0.005803s (0.002 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.005803s (0.002 MB/s)
07/18/19 19:57:00.025 [info] [8729:8729] : sessions downloaded 10B/0.006056s (0.002 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.006056s (0.002 MB/s)
07/18/19 19:58:00.055 [info] [8729:8729] : sessions downloaded 10B/0.006942s (0.001 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.006942s (0.001 MB/s)
07/18/19 19:59:00.030 [info] [8729:8729] : sessions downloaded 10B/0.011031s (0.001 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.011031s (0.001 MB/s)
07/18/19 20:00:02.953 [info] [8729:8729] : sessions downloaded 10B/0.054589s (0.000 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.054589s (0.000 MB/s)
07/18/19 20:01:00.295 [info] [8729:8729] : sessions downloaded 10B/0.010621s (0.001 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.010621s (0.001 MB/s)
07/18/19 20:02:00.303 [info] [8729:8729] : sessions downloaded 10B/0.004221s (0.002 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.004221s (0.002 MB/s)
07/18/19 20:03:00.334 [info] [8729:8729] : sessions downloaded 10B/0.011672s (0.001 MB/s)
mfsmaster[8729]: sessions downloaded 10B/0.011672s (0.001 MB/s)
07/18/19 20:03:33.552 [debug] [8729:8729] : master.fs.stored
07/18/19 20:04:00.047 [info] [8729:8729] : sessions downloaded 10B/0.010896s (0.001 MB/s)

I add here the debug records; the whole process takes about 3 hours to complete (without systemctl).

When I start the LizardFS service with systemctl, systemd kills the process after approximately 30 minutes. In the logs you can see that the master.fs.checksum.changing_not_recalculated_chunk steps take a very long time to complete.
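
Since an ignored signal disposition survives exec, the same trap trick can be applied under systemd with a drop-in override. This is only a sketch: the drop-in path and the ExecStart line are assumptions, so check them against your installed lizardfs-ha-master.service first:

 # /etc/systemd/system/lizardfs-ha-master.service.d/ignore-sigprof.conf (hypothetical drop-in)
 [Service]
 ExecStart=
 ExecStart=/bin/bash -c "trap -- '' SIGPROF SIGVTALRM; exec /usr/sbin/mfsmaster -o ha-cluster-managed -o initial-personality=shadow -d"

followed by systemctl daemon-reload and a service restart.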

@jbarrigafk
Author

jbarrigafk commented Jul 19, 2019

We decided to remove the uraft function and work only with master/shadow [...]

@jbarrigafk Please give me a way to remove uraft HA [...]

@kaitoan2000
could you change from uraft to master/shadow?

@kaitoan2000

could you change from uraft to master/shadow? [...]

Hi, I have changed our master server to a stronger one with 128 GB RAM and a faster CPU. It seems OK now.

@jbarrigafk
Author

In our production environment we still can't connect the shadow to the master. I'm still waiting; the messages that appear are the following:

Jul 22 13:05:21 mfsmaster2-sv mfsmaster[171208]: running in shadow mode - applying changelogs from /var/lib/mfs
Jul 22 13:05:41 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 328673 changes applied (2109027931 to 2109356603), 0 skipped
Jul 22 13:05:41 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 154 changes applied (2109356604 to 2109356757), 107952 skipped
Jul 22 13:08:14 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 22 13:08:14 mfsmaster2-sv mfsmaster[171208]: synced at version = 2109356757
Jul 22 13:08:15 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 22 13:08:15 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 22 13:08:15 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2109356778
--
Jul 22 15:53:10 mfsmaster2-sv mfsmaster[171208]: running in shadow mode - applying changelogs from /var/lib/mfs
Jul 22 15:53:21 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 191943 changes applied (2111000154 to 2111192096), 0 skipped
Jul 22 15:53:22 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 208 changes applied (2111192097 to 2111192304), 69941 skipped
Jul 22 15:55:58 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 22 15:55:58 mfsmaster2-sv mfsmaster[171208]: synced at version = 2111192304
Jul 22 15:55:59 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 22 15:55:59 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 22 15:55:59 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2111192318
--
Jul 22 18:43:03 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.2: 536460 changes applied (2112197119 to 2112733578), 0 skipped
Jul 22 18:43:08 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 87486 changes applied (2112733579 to 2112821064), 0 skipped
Jul 22 18:43:08 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 73 changes applied (2112821065 to 2112821137), 85280 skipped
Jul 22 18:45:43 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 22 18:45:43 mfsmaster2-sv mfsmaster[171208]: synced at version = 2112821137
Jul 22 18:45:44 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 22 18:45:44 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 22 18:45:44 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2112821149
--
Jul 22 21:30:34 mfsmaster2-sv mfsmaster[171208]: running in shadow mode - applying changelogs from /var/lib/mfs
Jul 22 21:31:02 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 485960 changes applied (2114091450 to 2114577409), 0 skipped
Jul 22 21:31:02 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 190 changes applied (2114577410 to 2114577599), 53509 skipped
Jul 22 21:33:37 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 22 21:33:37 mfsmaster2-sv mfsmaster[171208]: synced at version = 2114577599
Jul 22 21:33:38 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 22 21:33:38 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 22 21:33:38 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2114577613
--
Jul 23 00:19:39 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 197052 changes applied (2115365618 to 2115562669), 0 skipped
Jul 23 00:19:39 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 39 changes applied (2115562670 to 2115562708), 41061 skipped
Jul 23 00:20:01 mfsmaster2-sv systemd: Started Session 10639 of user root.
Jul 23 00:22:27 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 23 00:22:27 mfsmaster2-sv mfsmaster[171208]: synced at version = 2115562708
Jul 23 00:22:28 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 23 00:22:28 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 23 00:22:28 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2115562720
--
Jul 23 03:07:29 mfsmaster2-sv mfsmaster[171208]: running in shadow mode - applying changelogs from /var/lib/mfs
Jul 23 03:07:32 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 54920 changes applied (2116153362 to 2116208281), 0 skipped
Jul 23 03:10:01 mfsmaster2-sv systemd: Started Session 10660 of user root.
Jul 23 03:10:06 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 23 03:10:06 mfsmaster2-sv mfsmaster[171208]: synced at version = 2116208281
Jul 23 03:10:07 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 23 03:10:07 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 23 03:10:07 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2116208295
--
Jul 23 05:53:30 mfsmaster2-sv mfsmaster[171208]: metadata file /var/lib/mfs/metadata.mfs read (240766263 inodes including 44634083 directory inodes and 196132180 file inodes, 196136801 chunks)
Jul 23 05:53:30 mfsmaster2-sv mfsmaster[171208]: running in shadow mode - applying changelogs from /var/lib/mfs
Jul 23 05:53:32 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 29614 changes applied (2116427850 to 2116457463), 0 skipped
Jul 23 05:56:19 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 23 05:56:19 mfsmaster2-sv mfsmaster[171208]: synced at version = 2116457463
Jul 23 05:56:20 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 23 05:56:20 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 23 05:56:20 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2116457480
--
Jul 23 08:43:05 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.2: 74775 changes applied (2116586518 to 2116661292), 0 skipped
Jul 23 08:43:06 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs.1: 11277 changes applied (2116661293 to 2116672569), 0 skipped
Jul 23 08:43:06 mfsmaster2-sv mfsmaster[171208]: /var/lib/mfs/changelog.mfs: 2 changes applied (2116672570 to 2116672571), 10477 skipped
Jul 23 08:45:40 mfsmaster2-sv mfsmaster[171208]: all needed changelogs applied successfully
Jul 23 08:45:40 mfsmaster2-sv mfsmaster[171208]: synced at version = 2116672571
Jul 23 08:45:41 mfsmaster2-sv mfsmaster[171208]: connecting to Master
Jul 23 08:45:41 mfsmaster2-sv mfsmaster[171208]: connected to Master
Jul 23 08:45:41 mfsmaster2-sv mfsmaster[171208]: unloading filesystem at 2116672593

@kaitoan2000

In our production environment we still can't connect the shadow to the master. I'm still waiting; the messages that appear are the following: [...]

Hi,
could you show the CPU usage of your server?
I think it hits 100% when loading the filesystem and hangs.

@jbarrigafk
Author

jbarrigafk commented Jul 24, 2019

[Screenshot from 2019-07-24 10-42-22]

@kaitoan2000 the CPU stays between 60 and 70% during the master.fs.checksum.changing_not_recalculated_chunk phase.
I don't understand why it unloads the filesystem if all the changelogs are synchronized correctly.

Virtual Disk 0 
ID                                : 0
Status                            : Ok
Name                              : main
State                             : Ready
Hot Spare Policy violated         : Not Assigned
Encrypted                         : No
Layout                            : RAID-1
Size                              : 1,862.50 GB (1999844147200 bytes)
T10 Protection Information Status : No
Associated Fluid Cache State      : Not Applicable
Device Name                       : /dev/sda
Bus Protocol                      : SATA
Media                             : SSD
Read Policy                       : Read Ahead
Write Policy                      : Write Back
Cache Policy                      : Not Applicable
Stripe Element Size               : 64 KB
Disk Cache Policy                 : Unchanged

[root@mfsmaster2-sv ~]# lsblk
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0  1.8T  0 disk  
├─sda1                                          8:1    0    1G  0 part  /boot
└─sda2                                          8:2    0  1.8T  0 part  
  └─luks-b8dbebda-4959-4b1d-8b9c-a1765b6e9ac2 253:0    0  1.8T  0 crypt 
    ├─vg_main-lv_root                         253:1    0   20G  0 lvm   /
    ├─vg_main-lv_swap                         253:2    0  128G  0 lvm   [SWAP]
    ├─vg_main-lv_var                          253:3    0  1.7T  0 lvm   /var
    ├─vg_main-lv_tmp                          253:4    0   10G  0 lvm   /tmp
    ├─vg_main-lv_var_log                      253:5    0   10G  0 lvm   /var/log
    └─vg_main-lv_home                         253:6    0   10G  0 lvm   /home

As of today, we are still working with only 1 master and a couple of metaloggers, but it is not safe for us; we need at least 1 synchronized shadow.

@pbeza
Member

pbeza commented Aug 1, 2019

@jbarrigafk I feel quite confused – you started this issue because uraft failed to start, and now you are unable to get the shadow server connected to the master. Do you have any idea how I can reproduce it to help you?

@jbarrigafk
Author

jbarrigafk commented Aug 1, 2019

@pbeza
I initially thought it was something with uraft, but since my cluster is in a production environment I decided to remove uraft. Even without uraft, LizardFS still has the same problem loading metadata on my shadows.

Let me create a test environment again with uraft to reproduce this problem. To anticipate: the problem appeared on CentOS 7 with LizardFS 3.13.0 from lizardfs-bundle-CentOS-7.5.1804.tar when my shadow tries to load/synchronize a metadata.mfs file of 26 GB, which corresponds to around 130 million small files (only the active master works).
My cluster with these metadata uses 70 GB of RAM.

maybe I can share my metadata.mfs file

@kaitoan2000

Aug 6 15:24:32 hostname mfsmaster[5312]: running in shadow mode - applying changelogs from /var/lib/mfs
Aug 6 15:26:08 hostname mfsmaster[5312]: /var/lib/mfs/changelog.mfs.1: 3257543 changes applied (26098986946 to 26102244488), 0 skipped
Aug 6 15:26:10 hostname mfsmaster[5312]: /var/lib/mfs/changelog.mfs: 29385 changes applied (26102244489 to 26102273873), 679771 skipped
Aug 6 15:27:33 hostname mfsmaster[5312]: all needed changelogs applied successfully
Aug 6 15:27:33 hostname mfsmaster[5312]: synced at version = 26102273873
Aug 6 15:27:34 hostname mfsmaster[5312]: connecting to Master
Aug 6 15:27:34 hostname mfsmaster[5312]: connected to Master
Aug 6 15:27:34 hostname mfsmaster[5312]: unloading filesystem at 26102273875
Aug 6 15:28:30 hostname mfsmaster[5312]: connecting to Master
Aug 6 15:28:30 hostname mfsmaster[5312]: connected to Master
Aug 6 15:32:15 hostname mfsmaster[5312]: metadata downloaded 14132157732B/224.629195s (62.913 MB/s)
Aug 6 15:32:21 hostname mfsmaster[5312]: changelog.mfs.1 downloaded 381213302B/6.000840s (63.527 MB/s)
Aug 6 15:32:34 hostname mfsmaster[5312]: changelog.mfs.2 downloaded 748056673B/12.925680s (57.874 MB/s)
Aug 6 15:32:34 hostname mfsmaster[5312]: sessions downloaded 4027B/0.000784s (5.136 MB/s)
Aug 6 15:32:34 hostname mfsmaster[5312]: opened metadata file /var/lib/mfs/metadata.mfs
Aug 6 15:32:34 hostname mfsmaster[5312]: loading objects (files,directories,etc.) from the metadata file

Still unloading :(

@blink69
Contributor

blink69 commented Aug 6, 2019

Please check the value of BACK_META_KEEP_PREVIOUS in your mfsmaster.cfg file.

@kaitoan2000

kaitoan2000 commented Aug 6, 2019

BACK_META_KEEP_PREVIOUS

grep BACK_META_KEEP_PREVIOUS mfsmaster.cfg
BACK_META_KEEP_PREVIOUS = 5
and these are all the files here:
-rw-r----- 1 mfs mfs 14G Aug  6 15:44 metadata.mfs
-rw-r----- 1 mfs mfs 14G Aug  6 15:32 metadata.mfs.1
-rw-r----- 1 mfs mfs 14G Aug  6 15:27 metadata.mfs.2
-rw-r----- 1 mfs mfs 14G Aug  6 15:15 metadata.mfs.3
-rw-r----- 1 mfs mfs 14G Aug  6 15:10 metadata.mfs.4
-rw-r----- 1 mfs mfs 14G Aug  6 14:58 metadata.mfs.5

[Screenshot from 2019-08-06 15-56-59]
With the CPU at 100%.

@blink69
Contributor

blink69 commented Aug 6, 2019

Set it to 1 and your problem with uraft unloading metadata will disappear.
With uraft you can only play with the BACK_LOGS variable.
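For reference, a minimal sketch of the relevant mfsmaster.cfg lines (the BACK_LOGS value is only illustrative, as a starting point):

# /etc/mfs/mfsmaster.cfg
BACK_META_KEEP_PREVIOUS = 1   # keep only one previous metadata.mfs.N backup
BACK_LOGS = 50                # number of changelog.mfs.N files to keep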

@kaitoan2000

kaitoan2000 commented Aug 6, 2019

Set it to 1 and your problem with uraft unloading metadata will disappear.
With uraft you can only play with the BACK_LOGS variable.

I have edited BACK_META_KEEP_PREVIOUS back to the default (1), but no luck.
Aug 6 17:34:00 hostname mfsmaster[53621]: /var/lib/mfs/changelog.mfs.1: 4778156 changes applied (26124369179 to 26129147334), 0 skipped
Aug 6 17:34:02 hostname mfsmaster[53621]: /var/lib/mfs/changelog.mfs: 38301 changes applied (26129147335 to 26129185635), 838880 skipped
Aug 6 17:35:13 hostname mfsmaster[53621]: all needed changelogs applied successfully
Aug 6 17:35:13 hostname mfsmaster[53621]: synced at version = 26129185635
Aug 6 17:35:13 hostname mfsmaster[53621]: write to Master error: Connection reset by peer
Aug 6 17:35:14 hostname mfsmaster[53621]: connecting to Master
Aug 6 17:35:14 hostname mfsmaster[53621]: connected to Master
Aug 6 17:35:14 hostname mfsmaster[53621]: unloading filesystem at 26129185707

On the master:
Aug 6 17:36:00 hostname mfsmaster: connection with ML(10.30.6.34) has been closed by peer
Aug 6 18:27:17 hostname mfsmaster: connection with ML(10.30.6.34) has been closed by peer

@jbarrigafk
Author

jbarrigafk commented Aug 6, 2019

@pbeza
Master name resolution:

/etc/hosts
10.x.x.11     master1-test
10.x.x.12     master2-test
10.x.x.13     master3-test
10.x.x.10     mfsmaster

Download the LizardFS bundle on the masters, chunkservers and clients
wget https://lizardfs.com/wp-content/uploads/2018/07/lizardfs-bundle-CentOS-7.5.1804.tar
Extract and install the metadata server packages

cd lizardfs-bundle-CentOS-7.5.1804
yum localinstall lizardfs-master-3.13.0-0el7.x86_64.rpm -y
yum localinstall lizardfs-adm-3.13.0-0el7.x86_64.rpm -y
yum localinstall lizardfs-uraft-3.13.0-0el7.x86_64.rpm -y
echo 1 > /proc/sys/net/ipv4/conf/all/arp_accept
echo 1 > /proc/sys/vm/overcommit_memory
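Note: these two echoes do not survive a reboot; a hypothetical way to persist them (the file name is my own choice):

cat > /etc/sysctl.d/90-lizardfs.conf <<'EOF'
net.ipv4.conf.all.arp_accept = 1
vm.overcommit_memory = 1
EOF
sysctl --system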

Config on metadata servers

-- /etc/mfs/mfsmaster.cfg --
 PERSONALITY = ha-cluster-managed
 ADMIN_PASSWORD = mfspassword
 WORKING_USER = mfs
 WORKING_GROUP = mfs
 DATA_PATH = /var/lib/mfs
 MATOML_LISTEN_HOST = *
 MATOML_LISTEN_PORT = 9419
 MATOCS_LISTEN_HOST = *
 MATOCS_LISTEN_PORT = 9420
 MATOCL_LISTEN_HOST = *
 MATOCL_LISTEN_PORT = 9421
 MATOTS_LISTEN_HOST = *
 MATOTS_LISTEN_PORT = 9424
 NO_ATIME = 0
 MASTER_HOST = mfsmaster
-- mfsexports.cfg --
*                       /       rw,alldirs,maproot=0
*                       .       rw
-- mfsgoals.cfg --
1 test_goal : _
-- master1-test | lizardfs-uraft.cfg --
 URAFT_NODE_ADDRESS = master1-test
 URAFT_NODE_ADDRESS = master2-test
 URAFT_NODE_ADDRESS = master3-test
 URAFT_ID = 0
 URAFT_FLOATING_IP      = 10.x.x.10
 URAFT_FLOATING_NETMASK = 255.255.254.0
 URAFT_FLOATING_IFACE   = bond0
-- master2-test | lizardfs-uraft.cfg --
 URAFT_NODE_ADDRESS = master1-test
 URAFT_NODE_ADDRESS = master2-test
 URAFT_NODE_ADDRESS = master3-test
 URAFT_ID = 1
 URAFT_FLOATING_IP      = 10.x.x.10
 URAFT_FLOATING_NETMASK = 255.255.254.0
 URAFT_FLOATING_IFACE   = eth0
-- master3-test | lizardfs-uraft.cfg --
 URAFT_NODE_ADDRESS = master1-test
 URAFT_NODE_ADDRESS = master2-test
 URAFT_NODE_ADDRESS = master3-test
 URAFT_ID = 2
 URAFT_FLOATING_IP      = 10.x.x.10
 URAFT_FLOATING_NETMASK = 255.255.254.0
 URAFT_FLOATING_IFACE   = bond0
[master1-test]# systemctl start lizardfs-uraft
[master2-test]# systemctl start lizardfs-uraft
[master3-test]# systemctl start lizardfs-uraft
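At this point the cluster state can be queried from any node; this assumes the default URAFT_STATUS_PORT (9428), since the configs above do not override it:

echo | nc master1-test 9428   # dumps that node's view of the uraft cluster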

chunkserver

-- /etc/hosts --
 10.x.x.10      mfsmaster
yum localinstall lizardfs-chunkserver-3.13.0-0el7.x86_64.rpm
-- mfschunkserver.cfg --
 MASTER_HOST = mfsmaster
 PERFORM_FSYNC = 0
-- mfshdd.cfg --
 /var/hd1
mkdir /var/hd1
chown mfs:mfs /var/hd1

[mfschunkserver1]# mfschunkserver start

yum localinstall lizardfs-cgi-3.13.0-0el7.x86_64.rpm
yum localinstall lizardfs-cgiserv-3.13.0-0el7.x86_64.rpm
systemctl restart lizardfs-cgiserv

Client

yum localinstall lizardfs-client-3.13.0-0el7.x86_64.rpm
mkdir /lizardfs
mfsmount /lizardfs/
mkdir /lizardfs/files
#Set the test_goal goal (1 copy, as defined in mfsgoals.cfg)
lizardfs setgoal test_goal /lizardfs/
cd /lizardfs/files

#Create a test file and split it to get millions of small files inside LizardFS.
#Note: To speed up file creation, I repeated this process (split) in several screen sessions in different directories within /lizardfs/files (I used 8 screen sessions for this test)

dd if=/dev/zero of=masterfile bs=1 count=1300000000
- screen -S 0
cd /lizardfs/files
split -b 10 -a 10 masterfile

mkdir /lizardfs/files/files1
cp masterfile /lizardfs/files/files1/masterfile1
- screen -S 1
cd /lizardfs/files/files1
split -b 10 -a 10 masterfile1
.
mkdir /lizardfs/files/files8
cp masterfile /lizardfs/files/files8/masterfile8
- screen -S 8
cd /lizardfs/files/files8
split -b 10 -a 10 masterfile8
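#To confirm how many fs objects this produced so far (mfsdirinfo ships with the lizardfs-client package):
mfsdirinfo /lizardfs/files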

#Create directory trees to replicate my data structure; I use fdtree.bash from here: https://github.com/LLNL/fdtree
#Repeat this script several times in different folders inside screen sessions (I used 5 screen sessions for this test)

mkdir /lizardfs/files/tree1/
cd /lizardfs/files/tree1/
./fdtree.bash -d 10000 -l 100

#To further speed up the creation of files and directories, start creating snapshots of the entire LizardFS data root (I created 28 snapshots in this test)
cd /lizardfs
mfsmakesnapshot -l files files_bkp1
.
.
mfsmakesnapshot -l files files_bkp28

#Throughout this process I kept killing the lizardfs-uraft and mfsmaster services on one of my metadata servers and then running systemctl start lizardfs-uraft again (a scripted version of this cycle is sketched after the commands below)

ps aux |grep mfs
kill -9 mfsmasterID lizardfs-uraftID
systemctl restart lizardfs-uraft
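#Scripted, the same kill/restart cycle would look roughly like this (pkill by exact process name is my own substitution for looking up PIDs by hand):
pkill -9 -x mfsmaster              # hard-kill the metadata server
pkill -9 -x lizardfs-uraft         # hard-kill the uraft daemon
systemctl restart lizardfs-uraft   # systemd brings uraft (and mfsmaster) back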

#It may not be the best way to reproduce the problem, but it is how I succeeded: millions of small files and many directory trees must be created within LizardFS, until a point is reached where LizardFS can no longer load the metadata

my metadata server threw SIGPROF (signal 27) on my LizardFS cluster with:
[Screenshot from 2019-08-06 15-52-01]

chunkserver: 44 GB of space used (single chunkserver for this test)

@jbarrigafk
Author

Set it to 1 and your problem with uraft unloading metadata will disappear.
With uraft you can only play with the BACK_LOGS variable.

@blink69
I've always had that value at 1 and it still keeps unloading, even when running:
trap -- '' SIGPROF ; trap -- '' SIGVTALRM;
/usr/sbin/mfsmaster -c /etc/mfs/mfsmaster.cfg -d
on the shadow.

with systemctl start lizardfs-uraft I get SIGPROF

@jbarrigafk
Author

@pbeza
I am sharing the metadata.mfs from my testing environment, which is where LizardFS throws SIGPROF at me, in case you don't want to reproduce my problem yourself, since creating the files and directories took me a long time (4 hrs minimum):

https://drive.google.com/open?id=1OTkHOhO3h_nhxCjB_fNcJ9mZBKSkE1yK

@kaitoan2000

Hi there, it's OK: after 4 hours it finished loading the whole filesystem.
I think BACK_META_KEEP_PREVIOUS is not helpful for this case.

@jbarrigafk
Author

jbarrigafk commented Aug 7, 2019

Hi there, it's OK: after 4 hours it finished loading the whole filesystem.
I think BACK_META_KEEP_PREVIOUS is not helpful for this case.

@kaitoan2000 Yes, that is what it takes in my test environment with no writes or reads on my LizardFS storage cluster. But in my production environment, with a lot of writing and reading, it runs for about 2 or 3 days and never finishes synchronizing.

If your metadata server is now synchronized after 4 hours, I suggest you try the uraft HA: try turning off nodes and verify that the heartbeat IP moves correctly between the different nodes of your cluster.
When I test that, the HA functions stop working correctly.

@jbarrigafk
Author

@pbeza hi, did you manage to reproduce my issue? I'm still running a single master on my cluster.

@bash99

bash99 commented Sep 6, 2019

@jbarrigafk Does your metadata server also have 128 GB of memory?

It seems LizardFS requires more memory than what the manual says:
1 GB of memory per 3 M objects, so 130 M files should work within 44 GB of memory (130/3 ≈ 44), and 64 GB should be OK for this.

@pbeza
Member

pbeza commented Sep 6, 2019

Sorry for the late response @jbarrigafk, but I currently have issues with the servers that will be used to test your use case, so please stay patient. :(

@jbarrigafk
Author

@jbarrigafk Does your metadata server also have 128 GB of memory?

It seems LizardFS requires more memory than what the manual says:
1 GB of memory per 3 M objects, so 130 M files should work within 44 GB of memory (130/3 ≈ 44), and 64 GB should be OK for this.

@bash99 Hi thanks for your response

In my production environment it was:
master1 = 94 GB RAM
master2 = 256 GB RAM
master3 = 94 GB RAM

my primary was always master1, and my shadows were master2 and master3, which could never synchronize the metadata from master1.

Since I could never make this work, I had to stop the entire cluster and start master2 as the primary and single node in my cluster, because it was the node with the most available RAM (256 GB)

I never had trouble starting my nodes in the Master/Primary role; the problem really is in the shadows, they never sync to the master... :/

I'm looking for more info about memory requirements per fs object, but I can't find anything. If you can share some link to read, I would appreciate it.

@jbarrigafk
Author

Sorry for the late response @jbarrigafk, but I currently have issues with the servers that will be used to test your use case, so please stay patient. :(

@pbeza
Do not worry, I will remain patient, and hopefully we will find the origin of this problem.
I will continue researching and testing

@phamthanhnhan14

phamthanhnhan14 commented Sep 25, 2019

I got this error today with the shadow run both via lizardfs-uraft and as mfsmaster in shadow mode.
[Screenshot from 2019-09-25 20-49-39]
More log:

Sep 25 20:33:06 mfsmaster[9258]: running in shadow mode - applying changelogs from /var/lib/mfs
Sep 25 20:33:38 mfsmaster[9258]: /var/lib/mfs/changelog.mfs.1: 3999652 changes applied (38248045773 to 38252045424), 0 skipped
Sep 25 20:33:38 mfsmaster[9258]: /var/lib/mfs/changelog.mfs: 24690 changes applied (38252045425 to 38252070114), 911498 skipped
Sep 25 20:35:32 mfsmaster[9258]: all needed changelogs applied successfully
Sep 25 20:35:32 mfsmaster[9258]: synced at version = 38252070114
Sep 25 20:35:33 mfsmaster[9258]: connecting to Master
Sep 25 20:35:33 mfsmaster[9258]: connected to Master
Sep 25 20:35:33 mfsmaster[9258]: unloading filesystem at 38252070118

@phamthanhnhan14

@pbeza hi, did you reproduce my issue? im still with a single master on my cluster

Is it OK till now bro?

@jbarrigafk
Author

Hi @phamthanhnhan14 Yes. My cluster of only 1 master still works, but I had to remove it from production; we still have it running, but only for reads.
We had to create a new cluster without uraft and with a directory structure different from the previous one, to reduce the number of directory trees. In this way I hope to reduce the number of fs objects.
If it fails again I will have to forget about LizardFS :(
I hope to have time soon to resume this topic

@jbarrigafk
Author

jbarrigafk commented May 26, 2021

Greetings to all, I am back to try to solve this problem together with all of you. I have started to carry out some tests, keeping the same scheme:
Current cluster values:
Version: 3.13.0
RAM used: 99 GiB
Files: 267390258
chunks: 267370981
all chunk copies: 534751571

1x metadata master / 256 GB RAM
1x metadata shadow / 192 GB RAM
6x chunkservers
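For what it's worth, the rule of thumb quoted earlier in this thread (about 1 GB per 3 M objects) roughly matches the master's usage here, counting files alone:

awk 'BEGIN { printf "%.0f GB\n", 267390258 / 3000000 }'   # ≈ 89 GB, vs the 99 GiB observed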

I'm trying to start my shadow server with the default config, but it fails.
These are my shadow debug messages:
cpu time expired

[Screenshot from 2021-05-25 18-19-17]

This is the only thread related to this message.
#742

feel free to ask me for any information to try to solve this problem

@jbarrigafk
Author

Hello everyone.
I've been trying a lot of things, including kernel settings related to CPU limits, to avoid systemd's SIGPROF and 'cpu time expired' messages, but with no luck.
However, I decided to completely empty my shadow server's RAM with:
sync; echo 3 > /proc/sys/vm/drop_caches
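(A trivial sketch of how to see what that actually frees, before and after:)

free -h                                   # before
sync; echo 3 > /proc/sys/vm/drop_caches   # drops page cache, dentries and inodes
free -h                                   # after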

I realized that my server already had 90 GB of RAM in use without any active service, which is why systemd sent:
systemd: lizardfs-master.service: main process exited, code = dead, status = 27/PROF

My server simply did not have enough memory to complete the LizardFS process.
Well, I then emptied my server's RAM and started the LizardFS service again: mfsmaster start
To my surprise, I no longer got the message from systemd:
systemd: lizardfs-master.service: main process exited, code = dead, status = 27/PROF

I thought that now everything would be fine, but I started to monitor my shadow server's RAM consumption and it was growing at a really fast rate. The LizardFS process still continued as it should, and the logs showed progress correctly, until my server reached its maximum RAM in use and the following log appeared:

Jun  1 12:02:39 db2-sv mfsmaster[19540]: synced at version = 4470088089
Jun  1 12:02:40 db2-sv mfsmaster[19540]: connecting to Master
Jun  1 12:02:40 db2-sv mfsmaster[19540]: connected to Master
Jun  1 12:02:40 db2-sv mfsmaster[19540]: unloading filesystem at 4470088091

If I only had a little more RAM, my shadow would complete the sync process with the master. But why does it need twice the RAM of the master? My master node has 100 GiB of RAM in use.

I have another cluster where the master node has 41 GiB in use and the shadow has 23 GiB; I still cannot find the reason for the high RAM consumption needed to synchronize my shadow.

This is my shadow server's process and consumption:
[Screenshot from 2021-06-01 12-03-44]

I will continue investigating, but now focusing on memory-related issues

@jkiebzak

jkiebzak commented Jun 1, 2021 via email

@jbarrigafk
Author

jbarrigafk commented Jun 1, 2021

I don't believe the mfsmaster process releases memory. So when files are deleted, that memory space is kept. What happens to the memory footprint on your shadow server if you restart mfsmaster?

@jkiebzak hi, thanks for giving signs of life in this thread.
I did a mfsmaster restart and 2 new processes were created, in addition to the one that was already running:

[root@db2-sv jbarriga]# ps aux |grep mfs
root     19357  0.0  0.0 152088  2364 pts/1    S+   14:18   0:00 mfsmaster restart
mfs      19359  0.0  0.0 158468  1804 ?        S<   14:18   0:00 mfsmaster restart
mfs      19540 83.9 42.7 84840300 84696148 ?   R<   11:25 151:18 mfsmaster start

As a result, lock messages appeared:
Jun 1 14:32:52 db2-sv mfsmaster: about 880 seconds passed and the lock /zfs-data/mfs/.mfsmaster.lock still exists...
Once the mfsmaster service was restarted and the lock was released, RAM usage went down to 43 GB and then began to increase little by little.

During this step, RAM usage increased from 43 GB to 76 GB:
Jun 1 14:33:02 db2-sv mfsmaster[19359]: connected to Master

During this step, RAM usage increased from 76 GB to 112 GB:
Jun 1 14:42:28 db2-sv mfsmaster[19359]: loading objects (files,directories,etc.) from the metadata file

During this step, RAM usage increased from 112 GB to 141 GB:
Jun 1 14:44:00 db2-sv mfsmaster[19359]: loading names from the metadata file

During this step, RAM usage increased from 141 GB to 155 GB:

Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading deletion timestamps from the metadata file
Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading extra attributes (xattr) from the metadata file
Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading access control lists from the metadata file
Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading quota entries from the metadata file
Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading file locks from the metadata file
Jun  1 14:51:50 db2-sv mfsmaster[19359]: loading chunks data from the metadata file

During this step, RAM usage increased from 155 GB to 176 GB:

Jun  1 14:54:31 db2-sv mfsmaster[19359]: checking filesystem consistency of the metadata file
Jun  1 14:54:37 db2-sv mfsmaster[19359]: connecting files and chunks
Jun  1 15:06:09 db2-sv mfsmaster[19359]: calculating checksum of the metadata
Jun  1 15:06:51 db2-sv mfsmaster[19359]: /zfs-data/mfs/changelog.mfs: 591 changes applied (4474280052 to 4474280642), 143368 skipped

During this step, RAM usage increased from 176 GB to 185 GB:

Jun  1 15:10:28 db2-sv mfsmaster[19359]: all needed changelogs applied successfully
Jun  1 15:10:28 db2-sv mfsmaster[19359]: synced at version = 4474280642
Jun  1 15:10:29 db2-sv mfsmaster[19359]: connecting to Master
Jun  1 15:10:29 db2-sv mfsmaster[19359]: connected to Master
Jun  1 15:10:29 db2-sv mfsmaster[19359]: unloading filesystem at 4474280644

After 5 minutes, RAM usage went down from 185 GB to 111 GB and the process started again:

Jun  1 15:15:37 db2-sv mfsmaster[19359]: connecting to Master
Jun  1 15:15:37 db2-sv mfsmaster[19359]: connected to Master
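(If it helps anyone reproduce these numbers, a minimal sketch to log the master's resident memory over time instead of watching it by hand:)

while sleep 30; do ps -o rss=,etime= -C mfsmaster; done >> /tmp/mfsmaster-rss.log   # RSS in KiB plus elapsed time, every 30 s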

@jbarrigafk
Author

jbarrigafk commented Jun 1, 2021

Indeed, LizardFS does not free the memory on a restart; perhaps it stays in some kind of cache? Even so, my cluster has 100 GB of RAM in use and my shadow cannot synchronize with 188 GB. I also observe that swap is not used; the server has the default swappiness value of 60:

[root@db2-sv jbarriga]# cat /proc/sys/vm/swappiness
60
[root@db2-sv jbarriga]# cat /proc/sys/vm/overcommit_memory
1

@bash99

bash99 commented Jun 3, 2021

Indeed, LizardFS does not free the memory on a restart; perhaps it stays in some kind of cache? Even so, my cluster has 100 GB of RAM in use and my shadow cannot synchronize with 188 GB. I also observe that swap is not used; the server has the default swappiness value of 60.

Maybe it's some shm memory in use?
Try ipcs -m to find out if there is any outstanding shared memory usage.

@jbarrigafk
Author

Indeed, LizardFS does not free the memory on a restart; perhaps it stays in some kind of cache? Even so, my cluster has 100 GB of RAM in use and my shadow cannot synchronize with 188 GB. I also observe that swap is not used; the server has the default swappiness value of 60.

Maybe it's some shm memory in use?
Try ipcs -m to find out if there is any outstanding shared memory usage.


[root@db2-sv ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status


Nothing with ipcs -m

The mfsmaster service has been running for 15 days, but it keeps unloading; it never finishes synchronizing with the master.

[Screenshot from 2021-06-15 12-16-58]
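(For anyone watching a loop like this, a minimal way to follow it live, assuming the lizardfs-master unit name from the earlier systemd logs:)

journalctl -u lizardfs-master -f | grep -E 'unloading|synced at version'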
