master: hourly fork / 30 seconds down time (all clients) #323

Open
onlyjob opened this Issue Oct 31, 2015 · 7 comments

@onlyjob
Member

onlyjob commented Oct 31, 2015

At the beginning of every hour (to within a few seconds) all clients time out for ~30 seconds as follows:

Oct 31 20:00:08 debmain mfsmount[9250]: master: tcp recv error: ETIMEDOUT (Operation timed out)
Oct 31 20:00:09 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:10 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:11 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:12 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:13 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:14 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:15 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:16 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:17 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:18 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:20 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:21 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:22 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:23 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:24 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:25 debmain mfsmount[9250]: master: register error (read header: ETIMEDOUT (Operation timed out))
Oct 31 20:00:26 debmain mfsmount[9250]: registered to master (session id #2437)

Inspection of master log revealed:

Oct 31 20:00:00 debstor mfsmaster[20991]: fork failed: ENOMEM (Cannot allocate memory)
Oct 31 20:00:00 debstor mfsmaster[20991]: mfsmaster[20991]: fork failed: ENOMEM (Cannot allocate memory)
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.2) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.76) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.2) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.76) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.204) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.7) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: main master server module: (ip:192.168.0.250) write error: EPIPE (Broken pipe)
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.75) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.2) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.76) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.204) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.7) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: main master server module: (ip:192.168.0.250) write error: EPIPE (Broken pipe)
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.75) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.2) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.76) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.204) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.7) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: main master server module: (ip:192.168.0.250) write error: EPIPE (Broken pipe)
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.75) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.2) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.76) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.204) has been closed by peer
Oct 31 20:00:26 debstor mfsmaster[20991]: connection with client(ip:192.168.0.7) has been closed by peer
[...]

Fork fails because master uses about 50% of server's RAM and only about 40% is available. Master is started with LOCK_MEMORY = 1.

LizardFS is compiled with libJudy but without tcmalloc (for performance reasons).

I do not recall such a problem on 2.6.0. It would be nice to avoid hourly forking, as fork needs twice as much RAM as the master has already allocated.
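
The arithmetic behind that can be sketched as follows. This is a simplified worst-case model, not the kernel's actual admission test: it assumes the forked child could end up needing a full private copy of the master's memory. The function name `fork_shortfall` is hypothetical; the 50% and 40% figures are the ones reported above.

```python
def fork_shortfall(master_pct, available_pct):
    """Percentage points of RAM missing if the child needed a full copy.

    Returns 0 when the worst-case copy would fit in the available memory.
    """
    return max(0, master_pct - available_pct)


# Figures from the report: master holds ~50% of RAM, ~40% is available.
shortfall = fork_shortfall(50, 40)
print(f"worst-case shortfall: {shortfall} percentage points of RAM")
```

Under this model a fork only fits once the master's share drops below the available share, which matches the "twice as much RAM" rule of thumb.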

@psarna

Member

psarna commented Oct 31, 2015

This is not a 3.9.2 issue: previous versions of LizardFS, and in fact any program that calls fork, could hit the same problem.
The problem is that Linux assumes that the forked process will need as much memory as the original one, which is usually not true because of copy-on-write.
The issue can be solved by setting the overcommit_memory option in the kernel,
which lets processes fork even if they could potentially need more memory than the amount available in the system:

echo 1 > /proc/sys/vm/overcommit_memory
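
Before changing the setting it may help to check what it currently is. Below is a minimal sketch that reads the same `/proc` file and maps the value to the three modes documented for the Linux kernel. The helper names `describe_overcommit` and `read_overcommit` are hypothetical, and the descriptions are paraphrased summaries rather than kernel output.

```python
OVERCOMMIT_MODES = {
    0: "heuristic: the kernel refuses allocations it judges to be obvious overcommits",
    1: "always: the kernel never refuses an allocation on overcommit grounds",
    2: "strict: total commitments are capped at swap plus a ratio of RAM",
}


def describe_overcommit(raw):
    """Turn the text read from /proc/sys/vm/overcommit_memory into (mode, description)."""
    mode = int(raw.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read the live setting; this only works on Linux."""
    with open(path) as f:
        return describe_overcommit(f.read())
```

On a Linux host, `read_overcommit()` returns the current mode and its description. Note that a value written with `echo` does not survive a reboot unless it is also made persistent through the system's sysctl configuration.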

@psarna psarna closed this Oct 31, 2015

@Zorlin


Zorlin commented Oct 31, 2015

Alternately, you could also consider RAM requirements as being twice the
size of the metadata set. At my previous employer that was the answer -
gobs of RAM with plenty of room for expansion.

@onlyjob

Member Author

onlyjob commented Oct 31, 2015

Probably I hit this issue only now due to growth in metadata...
I'm surprised that vm.overcommit_memory=1 is required in such a situation. I'm not sure if I'm comfortable with that. Besides, the default is vm.overcommit_memory=0.

But why fork hourly in the first place? Is it really necessary? What for?

@Zorlin, amount of RAM is roughly twice the size of metadata. Co-located chunkserver takes another 10% of RAM. 60% RAM utilisation is not bad and it is certainly not nice to require twice as much RAM so master could fork.
I expect process to be able to allocate all the memory it needs.

I consider this issue to be a bug in master's memory management hence I respectfully request to reopen.

@psarna

Member

psarna commented Oct 31, 2015

Forks are used for hourly metadata dumps. The design comes from MFS1.6 and is definitely not flawless, but rewriting the mechanism without forks is not considered a priority right now.
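
To illustrate the general technique (this is not LizardFS code), here is a minimal sketch of a fork-based snapshot dump: the child inherits a copy-on-write view of the parent's memory as it was at fork time and writes it out, while the parent carries on. The `dump_in_child` helper, the toy `metadata` dict and the JSON file format are all assumptions made for the example.

```python
import json
import os
import tempfile


def dump_in_child(metadata, path):
    """Fork; the child writes a snapshot of `metadata` to `path`, the parent returns at once."""
    pid = os.fork()  # raises OSError (e.g. ENOMEM) if the kernel refuses the fork
    if pid == 0:
        # Child: sees the metadata as it was at fork time (copy-on-write pages).
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(metadata, f)
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    return pid


metadata = {"inodes": 3, "chunks": 7}
snapshot_path = os.path.join(tempfile.mkdtemp(), "metadata.dump")
child = dump_in_child(metadata, snapshot_path)
metadata["inodes"] = 4  # the parent keeps mutating its own copy
_, wait_status = os.waitpid(child, 0)
with open(snapshot_path) as f:
    snapshot = json.load(f)
```

The appeal of this design is that the dump is consistent without pausing the parent for the whole write; the cost is that the kernel has to be willing to account for a second copy of the process at fork time, which is where the ENOMEM above comes from.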

@psarna psarna reopened this Oct 31, 2015

@onlyjob

Member Author

onlyjob commented Oct 31, 2015

Thanks for reopening. I agree that it is indeed a low priority issue (especially if vm.overcommit_memory=1 helps) but it would be nice to add getting rid of hourly forks to the (long-term) TODO list.

Meanwhile here is an article explaining some pitfalls of overcommitting: http://www.etalabs.net/overcommit.html

Thanks.

@onlyjob onlyjob changed the title from "3.9.2 master: hourly fork / 30 seconds down time (all clients)" to "master: hourly fork / 30 seconds down time (all clients)" Oct 31, 2015

@blink69 blink69 added the enhancement label Oct 31, 2015

@biocyberman


biocyberman commented Aug 22, 2016

@onlyjob Could you explain a bit more of how you overcome this problem for now?

I am running 3.10, and writes from one client to one of the chunkservers time out. Shutting down processes and services to free up memory on the mfsmaster seems to help. But what is a better solution other than the obvious: buy more RAM?

@onlyjob

Member Author

onlyjob commented Aug 24, 2016

On 3.9.4 master used about 51% of RAM on my server (where buying more RAM is not an option).
For some time I've used vm.overcommit_memory=1 (see sysctl(8)) with some extra swap space.

3.10.1 optimised RAM usage so master uses a little less than 50% of RAM and does not exhibit the problem (as long as there is enough free RAM to accommodate 200% of current allocation).
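
A rough way to check that condition on a running system is to compare `MemAvailable` from `/proc/meminfo` with the master's `VmRSS` from `/proc/<pid>/status`. The sketch below is only a headroom estimate and not the kernel's real admission test; the helper names `field_kib` and `has_fork_headroom` are hypothetical.

```python
def field_kib(text, name):
    """Return the value in kB of a 'Name:   123 kB' line, or None if absent."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == name:
            return int(rest.split()[0])
    return None


def has_fork_headroom(meminfo_text, status_text):
    """True if MemAvailable is at least the process's resident set size."""
    available = field_kib(meminfo_text, "MemAvailable")
    rss = field_kib(status_text, "VmRSS")
    if available is None or rss is None:
        raise ValueError("MemAvailable or VmRSS not found")
    return available >= rss
```

On Linux the two inputs would come from `open("/proc/meminfo").read()` and `open(f"/proc/{pid}/status").read()`, where `pid` is the mfsmaster process ID.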
