devel-branch pilot hangs while running examples (locally) in macOS #1298

itomaldonado opened this issue Mar 27, 2017 · 16 comments

@itomaldonado
Contributor

I can't run pilots locally on macOS: the examples hang at "wait for X unit(s)". See below for the steps I took to clone the stack and the details of my setup.

I left it running for 3+ hours with no luck...

[Screenshot: 2017-03-27 at 11:39:48 AM]

itoMac:~/git/radical.pilot uname -a
Darwin itoMac.local 16.4.0 Darwin Kernel Version 16.4.0: Thu Dec 22 22:53:21 PST 2016; root:xnu-3789.41.3~3/RELEASE_X86_64 x86_64

itoMac:~/git/radical.pilot ulimit -n
2048

itoMac:~/git/radical.pilot ./radical-stack-clone -v ./venv2

tag @ branch requested: @devel
ve location  requested: ./venv2

create virtualenv ./venv2
source virtualenv ./venv2

mod                   repo                                                     branch                         commit                         tag
radical.utils         https://github.com/radical-cybertools/radical.utils.git  devel                                                         
saga-python           https://github.com/radical-cybertools/saga-python.git    devel                                                         
radical.pilot         https://github.com/radical-cybertools/radical.pilot.git  devel                                                         
radical.analytics     https://github.com/radical-cybertools/radical.analytics.git  devel                                                         

installed:
python            : 2.7.12
virtualenv        : /Users/itomaldonado/git/radical.pilot/venv2
radical.utils     : v0.45-2-g82050c5@devel
saga-python       : split-11-g807713c6@devel
radical.pilot     : split-29-g66b8b081@devel
radical.analytics : v0.1-137-g05f47f3@devel

itoMac:~/git/radical.pilot source ./venv2/bin/activate
(venv2) itoMac:~/git/radical.pilot env | grep RAD
(venv2) itoMac:~/git/radical.pilot python ./examples/00_getting_started.py 
new session: [rp.session.itoMac.local.itomaldonado.017252.0003]                \
database   : [mongodb://rp:rp@ds015335.mlab.com:15335/rp]                     ok
create pilot manager                                                          ok
submit 1 pilot(s)
        .                                                                     ok
create unit manager                                                           ok
add 1 pilot(s)                                                                ok
submit 128 unit(s)
        ........................................................................
        ........................................................              ok
wait for 128 unit(s)
        /
@andre-merzky
Member

Thanks Manuel! Would you mind attaching a tarball of the pilot sandbox (if it exists) or a tarball of the session dir in your pwd (if the pilot sandbox does not exist)?

@andre-merzky andre-merzky self-assigned this Mar 27, 2017
@andre-merzky andre-merzky added this to the 0.46 milestone Mar 27, 2017
@itomaldonado
Contributor Author

here is the session:
rp.session.itoMac.local.itomaldonado.017252.0003.tar.gz

Not sure where the sandbox is...

@itomaldonado
Contributor Author

Oh, never mind, I found it at ~/radical.pilot.sandbox...

radical.pilot.sandbox.tar.gz

I had to remove old content, so I only kept the last two runs (including the one above)...

@andre-merzky
Member

Sorry, I only looked at this now. There is something seriously wrong with the bootstrapper execution I'm afraid... :/

Can you please replace #!/bin/bash -l in the first line of the bootstrapper (src/radical/pilot/agent/bootstrap_1.sh) with #!/bin/bash -l -x, redeploy, and run again? The bootstrapper's stdout/stderr in the pilot sandbox will then be longer; please attach them again.

@ibethune
Contributor

@andre-merzky to investigate locally on his Mac

@ibethune
Contributor

@itomaldonado I am going to suggest we defer this to a future release, as (AFAIK) it's only affecting your machine. If it pops up in other configurations during testing, then we'll look at it. Let me know if you disagree!

@ibethune ibethune modified the milestones: Future Release, 0.46 May 16, 2017
@ibethune
Contributor

From @andre-merzky

newer versions of the default Python deployment on MacOS don't support select.poll() anymore

Needs a more compatible fix in RU.
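
For readers unfamiliar with the problem, here is a minimal sketch of one portable way to wait on a socket. It is not the fix from the RU pull request, and `wait_readable` is a hypothetical helper name. The sketch assumes the only goal is to wait for a single socket to become readable: it uses `select.poll()` when the Python build provides it and falls back to `select.select()` otherwise.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if `sock` becomes readable within `timeout_s` seconds.

    Hypothetical helper for illustration only (not radical.utils code).
    """
    if hasattr(select, "poll"):
        poller = select.poll()
        poller.register(sock, select.POLLIN)
        # poll() takes its timeout in milliseconds
        return bool(poller.poll(int(timeout_s * 1000)))

    # Fallback for Python builds that do not expose select.poll()
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(b, 0.05))   # nothing has been sent yet
    a.sendall(b"x")
    print(wait_readable(b, 1.0))    # one byte is now pending
    a.close()
    b.close()
```

One caveat with this kind of fallback: `select.select()` is limited to file descriptors below the platform's `FD_SETSIZE`, so it may not be a drop-in replacement for processes that hold many open descriptors.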

@andre-merzky
Member

This now depends on PR radical-cybertools/radical.utils/pull/106

@ibethune
Contributor

@itomaldonado can you please test on your macOS config with RP,SG@devel and RU@feature/poll? Alternatively, @andre-merzky do we know of anyone else who has a config where this could be recreated?

@andre-merzky
Member

I am afraid this is not yet fully fixed on MacOS, or so it seems. I would still appreciate feedback, to understand if that is dependent on my OS version / install / ... - but I see spurious socket-closing and thus random termination... :(

@itomaldonado
Contributor Author

I actually didn't get a notification for this. I will test again, but I do fear what @andre-merzky says.

@andre-merzky
Member

An attempt to get RP stable again is in the RP branch fix/macos and RU branch feature/poll. It would be great if somebody could give this combo a try. RS from devel should be fine. Thanks!

@ibethune
Contributor

These branches are running stably for me (although my Python stack also worked with the previous implementation). I also set up a stack based on my system Python (normally I'm using a macports-based stack), and it was also working OK. I opened PR #1356.

@andre-merzky
Member

Thanks for testing! It also works for brew-based and system-based Python on my Mac - I'll be merging this PR then, as it at least seems to not make matters worse :P

@ibethune
Contributor

ibethune commented Jun 5, 2017

Yes, please merge; then this can be closed.

@itomaldonado
Contributor Author

I know this is late but I will write it here for future reference.

Another issue I had was that I had aliases and functions that modified the default behavior of certain built-in utilities (e.g. ls, cd, etc.). The problem specifically for me was that cd is used as part of a sub-shell command that does screen-scraping, and my custom command messed with how this screen-scraping worked. The current solution is to unset all functions and aliases of the built-in commands we need. For example: unset -f cd
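
To illustrate the effect, here is a small sketch that reproduces this kind of interference in a throwaway bash process. It is not taken from RP or SAGA: the noisy `cd` function and the `run_bash` helper are hypothetical stand-ins for a user's shell customization and for the screen-scraping sub-shell. It assumes `bash` is on the PATH. Note that `unset -f` only removes functions; an alias would need `unalias` as well.

```python
import subprocess

# Hypothetical customization: a `cd` wrapper that prints extra text.
NOISY_CD = 'cd() { builtin cd "$@" && echo "now in $PWD"; }'


def run_bash(script):
    """Run `script` in a bash without profile/rc files; return stdout lines."""
    result = subprocess.run(
        ["bash", "--noprofile", "--norc", "-c", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# With the wrapper defined, `cd / && pwd` produces an unexpected extra line,
# which is what confuses code that parses the shell's output.
polluted = run_bash(NOISY_CD + "; cd / && pwd")

# Dropping the function (and any alias) first restores the plain builtin.
clean = run_bash(
    NOISY_CD + "; unset -f cd; unalias cd 2>/dev/null; cd / && pwd"
)

print(polluted)
print(clean)
```

The second run contains only the output of `pwd`, which is what a parser expecting the builtin's behavior would rely on.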
