BUG: Attempting to copy non-existent file with pfcp #28

Closed
gregorygeller opened this issue Jun 20, 2016 · 11 comments

@gregorygeller

gregorygeller commented Jun 20, 2016

When attempting to copy a non-existent file using pftool (pfcp), it should generate an error.

It does, but $? is still set to 0.

-bash-4.1$ pfcp a b
"/users/gellergr/src" pfcp a b
get_base_path  Failed to stat path a
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 11519 on
node cc-fta02 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Launched /opt/campaign/pftool/installed/bin/pfcp from host cc-fta03.localdomain at: Mon Jun 20 16:22:25 MDT 2016
ERROR: /opt/campaign/pftool/installed/bin/pfcp failed
Job finished at: Mon Jun 20 16:22:27 MDT 2016
-bash-4.1$ echo $?
0
@brettkettering
Contributor

Well, at least it exits with an error message. The user won't be fooled, but it would be nice to have the exit code updated so that a script can test for error. Not necessary for Secure Campaign, I would say, but would be good to add in the near future.

@thewacokid
Contributor

This is due to the wrapper script that actually calls pftool. I believe we could have it pass the error back pretty easily.
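
As a rough illustration of "passing back the error", here is a minimal sketch of a wrapper helper that returns the child's exit status instead of dropping it. The function name `run_and_propagate` is hypothetical and not taken from the pftool sources, and the real wrapper may launch pftool differently.

```python
import subprocess
import sys


def run_and_propagate(cmd):
    """Run cmd (a list of arguments) and return its exit status.

    subprocess.call reports a negative value when the child was killed by a
    signal; that case is mapped to 1 so the caller never reports success.
    """
    status = subprocess.call(cmd)
    if status < 0:
        return 1
    return status


# Hypothetical use at the end of a wrapper, where launch_command is the
# argument list the wrapper has already built:
#     sys.exit(run_and_propagate(launch_command))
```

With something like this in place, `echo $?` after a failed run would show the child's non-zero status rather than 0.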


@garygrider

What is a non-existent file?


@brettkettering
Contributor

There's a test to use pfcp to copy a file that isn't there. Say someone mistypes a filename, for example. What does pfcp do when you tell it to copy a file that is not present?

@jti-lanl
Contributor

Come to think of it, I actually commented out that test, in a private version, because ... what if you don't have fuse mounted?

[Apologies to Greg, since I suggested he report this. I'd suggest he throw things up here, so we have a record. I didn't make the connection with the unmounted-fuse thing until just now.]

@gregorygeller
Author

So, if we don't want to set $? when attempting to copy a non-existent file, then what do you suggest I do to automate a test for proper behavior? Look for "Failed" or "MPI_ABORT" in the output?
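
Until the exit status can be trusted, a test harness could combine both signals. The sketch below is only one possible stopgap: the marker strings are the two named in the question above, and the helper name `command_failed` is made up for illustration.

```python
# Strings that appeared in the failing run's output; matching on them is a
# stopgap for wrappers that still exit with status 0 on error.
FAILURE_MARKERS = ("Failed", "MPI_ABORT")


def command_failed(returncode, output):
    """Return True if either the exit status or the output indicates failure."""
    if returncode != 0:
        return True
    return any(marker in output for marker in FAILURE_MARKERS)
```

Matching on output text is fragile (a file literally named "Failed" would trip it), so the exit-status check should become the only test once the wrapper sets `$?` correctly.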

@brettkettering
Contributor

I don't understand Jeff's comment. I think this is a bug and specifically for the reason you mention. A script needs some way to detect if a command fails.

@jti-lanl
Contributor

We should fix pfcp like Dave suggested. Users will always have marfs appear to be mounted somehow. Us squirrels can just deal.

@brettkettering
Contributor

Jeff and I discussed this over chat. The pf* Python scripts were written with the assumption that file systems would always be mounted. This is not the case on the Batch FTAs in MarFS. The pf* scripts need to be changed so that they do not assume the file system is mounted. They need to look at the output of PFTool and return an error if PFTool returns an error. In the case of an FNF (file not found), the pf* scripts need to give the user some intelligible output and set the return code to an error value that a script can catch and process.
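
One way the behavior described above might look, sketched under assumptions: the wrapper has already captured PFTool's exit status and output, and it recognizes the file-not-found case from the "Failed to stat path" line seen in the original report. The exit-code values and the name `summarize_result` are hypothetical, not what the pf* scripts actually use.

```python
import re

EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_FILE_NOT_FOUND = 2  # hypothetical value, chosen only for illustration

# Matches the diagnostic printed in the report, e.g.
# "get_base_path -- Failed to stat path a"
_STAT_FAILURE = re.compile(r"Failed to stat path (\S+)")


def summarize_result(returncode, output):
    """Map PFTool's exit status and output to (exit_code, user_message)."""
    match = _STAT_FAILURE.search(output)
    if match:
        return EXIT_FILE_NOT_FOUND, "file not found: %s" % match.group(1)
    if returncode != 0:
        return EXIT_FAILURE, "PFTool failed with exit status %d" % returncode
    return EXIT_OK, ""
```

Because the decision is based on what PFTool reports rather than on checking the path locally, this does not depend on the file system being mounted on the node running the script.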

@cadejager
Contributor

Sounds good. I will look into it this morning.

@cadejager
Contributor

This bug also exists with pfcm and pfls. I have fixed both of them with my latest commit.


6 participants