Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runner: don't allow multiple instances of daemon to run #418

Merged
merged 1 commit into from May 25, 2018

Conversation

pkalever
Copy link
Contributor

until now there is no check to defend on multiple runs of daemon within
the same node. This patch takes a non blocking lock on 'tcmu.lock' file,
if it succeeds only then daemon is allowed to run, if there is already a
lock on lock-file (taken by some other instance of tcmu-runner) we will exit.

Signed-off-by: Prasanna Kumar Kalever prasanna.kalever@redhat.com

main.c Outdated
lock.l_type = F_WRLCK;
if (fcntl(fd, F_SETLK, &lock) == -1) {
tcmu_err("tcmu-runner is already running...\n");
close(fd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better add one goto for closing the fd to make the code more readable.

Or we should it close(fd) everywhere just after this line when trying to goto error tags.

Copy link
Contributor Author

@pkalever pkalever May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @lxbsz
I will fix this part.

main.c Outdated
tcmu_err("tcmu-runner is already running...\n");
close(fd);
goto destroy_log;
}

ret = load_our_module();
if (ret < 0) {
tcmu_err("couldn't load module\n");
goto destroy_log;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like here.

main.c Outdated

lock.l_type = F_WRLCK;
if (fcntl(fd, F_SETLK, &lock) == -1) {
tcmu_err("tcmu-runner is already running...\n");
Copy link
Collaborator

@lxbsz lxbsz May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that only when errno == EACCES or EAGAIN then we will be sure that the tcmu-runner main process is already runing ? Or it should be other op errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will defend on EAGAIN

@pkalever
Copy link
Contributor Author

@lxbsz updated with the suggestions. Thanks!

@lxbsz
Copy link
Collaborator

lxbsz commented May 16, 2018

It looks good to me and test it and the logs are:

/usr/bin/tcmu-runner --tcmu-log-dir=/var/log/gluster-block/ --debug
The logdir option from the tcmu.conf will be ignored
Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2018-05-16 09:30:46.462 6677 [ERROR] main:1073: tcmu-runner is already running...

Copy link

@pranithk pranithk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

main.c Outdated
@@ -56,6 +56,8 @@
#include "libtcmu_config.h"
#include "libtcmu_log.h"

# define TCMU_LOCK_FILE "/var/run/tcmu.lock"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this more normally in the lock dir like

/var/run/lock/tcmu.lock
or
/var/run/lock/tcmu/lock
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes will pick the first one.

main.c Outdated
@@ -994,6 +996,8 @@ int main(int argc, char **argv)
GIOChannel *libtcmu_gio;
guint reg_id;
bool new_path = false;
struct flock lock = {0, };
int fd;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rename to lock_fd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this.

main.c Outdated
@@ -1146,6 +1173,8 @@ int main(int argc, char **argv)
tcmulib_close(tcmulib_context);
err_free_handlers:
darray_free(handlers);
close_fd:
close(fd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why for the normal shutdown we do a F_UNLCK and a close, but for the error path we only do a close on the fd?

From the man page it sounds like the same thing happens and the lock is released due to the process exiting. Is that correct? Is it just customary to only do a close in error paths liek this?

Copy link
Contributor Author

@pkalever pkalever May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From man:

  • As well as being removed by an explicit F_UNLCK, record locks are automatically released when the process terminates.

  • If a process closes any file descriptor referring to a file, then all of the process's locks on that file are released

I just wrote more like sequence, like open, grab lock, release lock, close in the normal shutdown. And forget to add release lock at error path.

But, just closing the fd without F_UNLCK will also release the lock anyway.

@mikechristie I'm not sure what would you prefer:

  1. remove F_UNLCK completely and just close(fd) in the both normal & error paths (or)
  2. add F_UNLCK in error path, which I forget before ?

Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, I do not know.

Go for the F_UNLOCK just to make it clear I guess. We seem to do that for some other things so it will at least be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - Make-sense :-)

until now there is no check to defend on multiple runs of daemon within
the same node. This patch takes a non blocking lock on 'tcmu.lock' file,
if it succeeds only then daemon is allowed to run, if there is already a
lock on lock-file (taken by some other instance of tcmu-runner) we will exit.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
@pkalever
Copy link
Contributor Author

Updated with the suggested changes. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants