Getting access to BioHPC Linux nodes

You may be asked to compute on BioHPC's ECCO Linux nodes (e.g., for very long-running jobs or for jobs that require very large memory).

Request an account

Go to the BioHPC account request page, and create an account on the BioHPC cluster.

Then contact BioHPC support and request to join the ECCO group and Lars' (lv39) "lab" group (ecco_lv39).

Reserve a node

  • Go to the BioHPC Reservations page, choose "Restricted", and reserve a node:
    • cbsuecco02: up to 7 days
    • all others: up to 3 days
    • in both cases, renewable
  • Then go to 'My Reservations' and share the reservation with Lars (lv39) and others, if necessary.

Access a node

See the Getting Started Guide and Remote Access pages. SSH is the simplest option if you don't need graphical applications.

Note that, for off-campus access, you will need to use the Cornell VPN. Instructions can be found here.
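
For example, once your reservation is active you can connect over SSH (a sketch, assuming your NetID is netid and your reserved node is cbsuecco02, the hostname used elsewhere on this page):

# connect to the reserved node; replace netid and the hostname as needed
ssh netid@cbsuecco02.biohpc.cornell.edu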

Setting up the right permissions

Run this ONCE the first time you ever access BioHPC:

echo "umask 007" >> $HOME/.bash_profile

You can check if it's there by running this command:

grep umask $HOME/.bash*
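
After logging out and back in (or re-reading the profile), you can confirm the umask is active; a quick check, assuming bash:

# re-read the profile in the current shell and show the effective umask
source $HOME/.bash_profile
umask   # should print 0007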

Fixing permission issues

Sometimes, permissions get out of sync and prevent your collaborators from accessing your files. Run the following to fix it:

cd /to/the/right/location
chmod -R g+rwX problematic_directory

Notes

Shared directory

Your default home directory (/home/NETID) is not shared with other group members (just as on CISER). Use /home/ecco_lv39 instead.

Using Stata

  • To use Stata version 16, either call it via its full path or add its directory to your PATH before running stata-mp:

    /usr/local/stata16/stata-mp
    export PATH=/usr/local/stata16:$PATH

  • Ensure that Stata's temporary directory is located on the node-local BioHPC /workdir space by running the following commands before executing your program(s):

    export STATATMP=/workdir/netid/tmp
    mkdir -p $STATATMP
    
  • Don't run Stata interactively via SSH. Instead, run your do-file in batch mode: stata-mp -b do master.do (a combined sketch follows this list).
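
Putting these steps together, a minimal batch run might look like this (a sketch, assuming your NetID is netid and your do-file is master.do in the current directory):

# point Stata's temporary files at node-local scratch space
export STATATMP=/workdir/netid/tmp
mkdir -p $STATATMP

# put Stata 16 on the PATH and run the do-file in batch mode
export PATH=/usr/local/stata16:$PATH
stata-mp -b do master.do
# output is written to master.log in the current directory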

Utilize tmux

Cheatsheet: https://gist.github.com/MohamedAlaa/2961058

  1. Login via SSH
  2. Launch tmux with a session name that makes sense, e.g. tmux new -s AEAREP-xxxx
  3. Launch your MATLAB, Stata, etc. job
  4. Disconnect (detach) from tmux: Ctrl-b d. Don't press both keys at once: first press Ctrl-b, release, then press d.
  5. Log out of SSH

Next time:

  1. Login via SSH
  2. Reconnect to your tmux session: tmux a -t AEAREP-xxxx
  3. If you forgot the session name, list your sessions with tmux ls (see the example below)
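
Putting it together, a typical detach/reattach cycle might look like this (a sketch, reusing the AEAREP-xxxx session name from the steps above):

# first login: start a named session and launch the job inside it
tmux new -s AEAREP-xxxx
stata-mp -b do master.do
# detach with Ctrl-b then d, then log out of SSH

# later login: list sessions and reattach
tmux ls
tmux a -t AEAREP-xxxx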

Sharing Tmux session

See https://www.howtoforge.com/sharing-terminal-sessions-with-tmux-and-screen#sharing-between-two-different-accounts-with-tmux

This must be done when launching tmux:

tmux -S /tmp/shareds new -s AEAREP-xxxx
chgrp ecco_lv39 /tmp/shareds

The second user can now connect to the first user's tmux session by typing:

tmux -S /tmp/shareds attach -t AEAREP-xxxx

Note: When logged into the compute node, you can call ps ux to see all your running jobs.

Saving Tmux output

See https://unix.stackexchange.com/questions/26548/write-all-tmux-scrollback-to-a-file
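
One common approach (not necessarily the exact command in the linked thread) is to dump the current pane's entire scrollback to a file from inside the session:

# -p prints to stdout, -S - starts at the beginning of the history
tmux capture-pane -pS - > tmux-output.log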

Using Docker

The BioHPC docker command is docker1; see here for more details. All files that are shared via the -v option must reside on /workdir/NETID and cannot be shared across nodes. To get the files to /workdir/NETID, the following commands can be used, assuming that your files are in /home/ecco_lv39/Workspace/aearep-$AEAREP (a run sketch follows the list):

  • Sync to workdir:
AEAREP=12345
[[ -d /workdir/$(id -nu) ]] || mkdir /workdir/$(id -nu)
rsync -auv /home/ecco_lv39/Workspace/aearep-$AEAREP/ /workdir/$(id -nu)/aearep-$AEAREP/
  • Sync back to the shared drive (once computations are done, or at any time):
AEAREP=12345
rsync -auv /workdir/$(id -nu)/aearep-$AEAREP/ /home/ecco_lv39/Workspace/aearep-$AEAREP/ 
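
Once the files are on /workdir, a container can mount them via -v. A minimal sketch, assuming docker1 accepts the usual docker run syntax; some-image is a placeholder for an image you have already pulled:

AEAREP=12345
# mount the replication files into the container at /data (the host path must be under /workdir)
docker1 run -v /workdir/$(id -nu)/aearep-$AEAREP:/data some-image ls /data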

Using Conda for Python package management

See Python tips

Transfer of Data to BioHPC (possibly OBSOLETE)

  • The BioHPC instructions for using FileZilla are great for moving data from your personal workspace to BioHPC.
  • In the event that you need to transfer data from CISER to BioHPC (CISER does not have FileZilla installed), follow the steps below (a consolidated example appears after the list):
  1. First, open up a bash shell in the directory that holds the folder which you want to transfer to BioHPC.
  2. SFTP into BioHPC sftp netid@cbsuecco02.biohpc.cornell.edu. Your password is the same that you use to login to the cbsuecco02 node (or the login node).
  3. cd into the desired directory on the BioHPC node.
  4. Create the target directory on BioHPC before uploading: mkdir data.
  5. Use the put command to place the desired folder (i.e. "data") on BioHPC: put -r data/.
  6. If you run into an error along the lines of Can't find request for ID 31425, try zipping up the files and just transferring the zip file. Once transferred, you can unzip on BioHPC (if you run into issues with the "unzip" command, try using 7z: i.e. /programs/bin/util/7z x (ZIPFILE))
  7. Give Lars access to your /workdir/NETID directory with chmod -R a+rwX /workdir/netid (this command is not permanent and should be run again after any edits to the directory).
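
Put together, a CISER-to-BioHPC transfer might look like this (a sketch, assuming your NetID is netid and the local folder is named data):

sftp netid@cbsuecco02.biohpc.cornell.edu
sftp> cd /workdir/netid
sftp> mkdir data
sftp> put -r data/.
sftp> exit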