ParTask bug: conda activate does not work for the task.run #38

Closed
Michaelvll opened this issue Nov 18, 2021 · 4 comments

@Michaelvll
Collaborator

Michaelvll commented Nov 18, 2021

Code to reproduce:

import sky
from sky import clouds

with sky.Dag() as dag:
    # The working directory contains all code and will be synced to remote.
    workdir = '~/Downloads/tpu'

    # The setup command.  Will be run under the working directory.
    setup = 'pip install --upgrade pip && \
        conda init bash && \
        conda activate resnet || \
          (conda create -n resnet python=3.7 -y && \
           conda activate resnet && \
           pip install tensorflow==2.4.0 pyyaml && \
           cd models && pip install -e .)'

    # The command to run.  Will be run under the working directory.
    run = 'conda activate resnet'

    conda1 = sky.Task(
        'activate_1',
        workdir=workdir,
        setup=setup,
        run=run,
    )
    conda1.set_resources({
        sky.Resources(clouds.AWS(), accelerators='V100'),
    })

    # Wrap the single task in a ParTask; the failure occurs when its run command executes.
    task = sky.ParTask([conda1])
    total = sky.Resources(clouds.AWS(), accelerators={'V100': 1})
    task.set_resources(total)

dag = sky.Optimizer.optimize(dag, minimize=sky.Optimizer.COST)
# sky.execute(dag, dryrun=True)
sky.execute(dag)

Error:

(pid=20548) 
(pid=20548) CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
(pid=20548) To initialize your shell, run
(pid=20548) 
(pid=20548)     $ conda init <SHELL_NAME>
(pid=20548) 
(pid=20548) Currently supported shells are:
(pid=20548)   - bash
(pid=20548)   - fish
(pid=20548)   - tcsh
(pid=20548)   - xonsh
(pid=20548)   - zsh
(pid=20548)   - powershell
(pid=20548) 
(pid=20548) See 'conda init --help' for more information and options.
(pid=20548) 
(pid=20548) IMPORTANT: You may need to close and restart your shell after running 'conda init'.
(pid=20548) 
(pid=20548) 
Traceback (most recent call last):
  File "/tmp/sky_app_lee66joq", line 15, in <module>
    ray.get(futures)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/ray/worker.py", line 1621, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(CalledProcessError): ray::activate_1 (pid=20548, ip=172.31.23.63)
  File "/tmp/sky_app_lee66joq", line 11, in <lambda>
    shell=True, check=True)) \
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'cd /tmp/workdir && conda activate resnet' returned non-zero exit status 1.
Michaelvll changed the title from "ParTask bug: conda activate does not work for the task" to "ParTask bug: conda activate does not work for the task.run" on Nov 18, 2021
@concretevitamin
Collaborator

Good catch. Try adding /bin/bash -c in front of the user command: https://github.com/concretevitamin/sky-experiments/blob/master/prototype/sky/backends/cloud_vm_ray_backend.py#L407-L408. If it passes, gotta make sure run_smoke_tests.sh doesn't break.

@Michaelvll
Collaborator Author

> Good catch. Try adding /bin/bash -c in front of the user command: https://github.com/concretevitamin/sky-experiments/blob/master/prototype/sky/backends/cloud_vm_ray_backend.py#L407-L408. If it passes, gotta make sure run_smoke_tests.sh doesn't break.

Adding /bin/bash -c does not work. I tried to add source ~/.bashrc as well, but the problem still exists. I am looking into it.
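
One plausible explanation for why neither attempt helps, offered as an assumption rather than a confirmed diagnosis: /bin/bash -c starts a non-interactive shell, and the stock Ubuntu ~/.bashrc returns early when the shell is not interactive, so the block that conda init bash appends to the end of that file never runs. The sketch below only checks the interactivity flag of such a shell. The helper names bash_flags and is_interactive are made up for illustration and are not part of sky or Ray.

```python
import subprocess


def bash_flags(shell: str = '/bin/bash') -> str:
    """Return the option flags ($-) reported by a shell started with -c."""
    result = subprocess.run(
        [shell, '-c', 'echo $-'],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Works on Python 3.6, as used in the traceback.
        check=True,
    )
    return result.stdout.strip()


def is_interactive(flags: str) -> bool:
    """Bash includes the letter 'i' in $- only for interactive shells."""
    return 'i' in flags
```

Calling is_interactive(bash_flags()) on the remote node should return False, which would be consistent with ~/.bashrc bailing out before it reaches the conda setup block.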

@Michaelvll
Collaborator Author

Michaelvll commented Nov 19, 2021

It seems that activating conda in a Python subprocess can cause problems. I am not sure of the exact reason, but Ray does have some code for activating conda here, which adds some additional setup code before conda activate.

One workaround is to add . $(conda info --base)/etc/profile.d/conda.sh before the user command if conda exists.
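
A minimal sketch of that workaround, assuming the user command is run through /bin/bash -c from a Python subprocess. The names CONDA_HOOK, with_conda_hook, and run_user_command are hypothetical, and this is not necessarily how the backend implements the fix.

```python
import subprocess

# Shell snippet that loads conda's shell functions only when conda is on PATH.
CONDA_HOOK = (
    'if command -v conda >/dev/null 2>&1; then '
    '. "$(conda info --base)/etc/profile.d/conda.sh"; '
    'fi'
)


def with_conda_hook(user_cmd: str) -> str:
    """Prefix user_cmd so conda activate is defined in a non-interactive shell."""
    return f'{CONDA_HOOK}; {user_cmd}'


def run_user_command(user_cmd: str) -> int:
    """Run the prefixed command under bash (hypothetical runner for illustration)."""
    completed = subprocess.run(
        ['/bin/bash', '-c', with_conda_hook(user_cmd)],
        check=True,
    )
    return completed.returncode
```

For example, run_user_command('cd /tmp/workdir && conda activate resnet') sources conda.sh first, so conda activate resolves to the shell function instead of failing with CommandNotFoundError. The guard keeps the command unchanged in effect on machines without conda.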

@concretevitamin
Collaborator

Fixed by #43.
