NYU_cluster

Greene HPC cluster

You can find NYU's documentation here. Click here to request an account.

Accessing the cluster

If you are connected to the NYU network, you can SSH into the cluster directly:

ssh <NYU_NetID>@greene.hpc.nyu.edu

Otherwise, you first need to go through the gateway server:

ssh <NYU_NetID>@gw.hpc.nyu.edu
ssh <NYU_NetID>@greene.hpc.nyu.edu

You can set up an SSH tunnel by following this documentation.
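
If your local OpenSSH client supports the -J (ProxyJump) option, you can also chain the two hops in a single command, using the same host names as above:

ssh -J <NYU_NetID>@gw.hpc.nyu.edu <NYU_NetID>@greene.hpc.nyu.edu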

Set up conda

You can download and install Miniconda with:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

When the installer asks whether to add Miniconda to your .bashrc, answer yes:

Do you wish the installer to prepend the Miniconda install location
to PATH in your /home/<netid>/.bashrc ? [yes|no]

Then source your .bashrc and Conda should be available. You can check with conda list.

You can learn how to manage Conda environments here. On the Prince cluster, I noticed that conda activate myenv needs to be replaced by source activate myenv in sbatch files. I still need to check whether this is also the case on Greene.
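
As a quick sketch of environment management (the environment name myenv and the Python version are just placeholders), you could create and activate an environment like this:

conda create -n myenv python=3.8
conda activate myenv     # interactive shells
source activate myenv    # older syntax, which was needed in sbatch scripts on Prince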

Launch jobs

  • Submitting jobs with sbatch; a minimal example script is shown after the Dask example below.
  • If you only want to access a node: srun --pty /bin/bash or srun --gres gpu:1 --pty /bin/bash.
  • You can see your jobs with watch squeue -u <netid>
  • Dask can be a very useful tool if you want to run a lot of similar jobs. Here is an example:
from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import itertools
from mycode import main_exp

learning_rate = [1e-3, 1e-4]
batch_size = [10, 100, 1000]
n_layers = [1, 2, 3, 4]

# Build one argument dictionary per hyperparameter combination.
# itertools.product yields tuples in the same order as its arguments.
parallel_args = []
for (lr, bs, layer) in itertools.product(learning_rate, batch_size, n_layers):
    args = {"lr": lr, "bs": bs, "layer": layer}
    parallel_args.append(args)

# Commands run by each worker job before it starts, e.g. activating the Conda environment.
env_extra = ['source activate myenv']

if __name__ == '__main__':
    cluster = SLURMCluster(job_extra=['--cpus-per-task=1', '--ntasks-per-node=1'],
                           cores=1, processes=1,
                           memory='16GB',
                           walltime='96:00:00',
                           interface='ib0',
                           env_extra=env_extra,
                           log_directory='log_dask',
                           local_directory='log_dask')

    # Ask Slurm for 4 worker jobs and connect a client to the cluster.
    n_workers = 4
    cluster.scale(n_workers)
    client = Client(cluster)
    print(client.cluster)

    # Submit one task per hyperparameter setting and gather the results.
    results = [client.submit(main_exp, args) for args in parallel_args]
    print(results)
    print(client.gather(results))
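
For the sbatch route mentioned above, here is a minimal script sketch for a single-GPU job; the job name, resource requests, environment name, and main.py are placeholders to adapt:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_%j.out

source activate myenv
python main.py

Submit it with sbatch myjob.sbatch and monitor it with watch squeue -u <netid> as above.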

Run solo12 demo

  • Link to solo12 Demo