Skip to content

Latest commit

 

History

History
191 lines (142 loc) · 7.08 KB

1.Baby_steps.md

File metadata and controls

191 lines (142 loc) · 7.08 KB
layout title author
default
Take Baby Steps
Bob Freeman

Previous: README


Take baby steps: Run Your Script Code from the Terminal

Objectives

  • Learn how to run your code from the terminal, both interactively & in batch
  • Learn the scheduler job info command 'bjobs' & its options

In this section, we're going to assume It is important for you to be able to learn some basics of using the terminal, as one can run batch jobs most effectively & flexibly from the terminal.

Run Interactively from the Terminal

When starting out using a compute cluster, or often called high-performance computing (HPC) or high-throughput computing (HTC) systems, running applications interactively most effectively emulates how we work on our local laptop and desktop systems. Through GUI interfaces, we can

LSF, our scheduler, follows most resource manager conventions

  1. Open your remote desktop program (NoMachine) and connect to the HSBGrid compute cluster.

  2. After the session starts, open the Gnome terminal program (Applications > System Tools > Terminal).

  3. By default, most applications are not 'loaded' into our session (in the terminal). Since Spyder is bundled with Python, we'll 'load' in Spyder through the appropriate module load:

module load python/3.8.5
  1. With Python, Spyder, and all the data science environment tools in our $PATH, we can now run Spyder. To run interactive sessions from a terminal, we can follow the posted Interactive Job instructions, (Working with Custom Submission Scripts > Interactive) using the MATLAB example, to run Spyder:
cd $HOME/git/rcs__transitioning_to_batch

bsub -q short_int -Is -W 60 -R "rusage[mem=4000]" -M 4000 -hl spyder

If successful, after a few moments Spyder should open. Much has happened here:

This is all well and good, but we've only told Spyder to open, and not necessarily to open the code file. To do that, include the path to your code file by dragging the file icon into the terminal window after you've typed in the spyder job submission command but not yet pressed Enter. Your final result should look like:

bsub -q short_int -Is \
     -W 60 \
     -R "rusage[mem=4000]" -M 4000 -hl \
     spyder code/process_data.py
  1. Although we've loaded the code file interactively, our goal is to run it. So let's make those adjustments. Instead of using Spyder to view and run our code, let's simply run it using the Python interpreter by substituting in the python command:
bsub -q short_int -Is \
     -W 60 \
     -R "rusage[mem=4000]" -M 4000 -hl \
     python code/process_data.py

If all goes well, you should see in your terminal window the code running, albeit slowly. That's fine. Let's press Ctrl-C to abort the code.

Run Your Python Code in Batch

This is actually going to be much easier than it sounds, but there are a number of 'friction points' that will appear. Those will become apparent.

  1. In your terminal session, switch to running your code in batch by switching to a batch queue (long or short) and removing the 'interactive' flag (Is). The short queue is most appropriate, as this code will likely take minutes to run (~8.5 min on my 2020 iMac).
bsub -q short \
     -We 15 \
     -R "rusage[mem=4000]" -M 4000 -hl \
     python code/process_data.py

Since we're running now in batch, our scheduler will respond with our job ID, and we cross our fingers that it is off an running. We have every reason to expect it to run OK, as it ran OK when doing it interactively.

Get the Job Info with the bsub Command

Well, as you may have noticed, you're not quite sure what is going on. But we do have a few options:

  • Look at the directories for any file changes
  • Ask the scheduler if it's still running
  • Look at the job run details

Since we know we're writing out the running sum every 10 lines, our output file should be present and growning, which we can see:

$ ls -alt data/     # sort this reverse chronogically
total 55776
-rw-rw----. 1 rfreeman dor   231930 Aug 11 11:36 outfile1.txt
drwxrwx---. 3 rfreeman dor     4096 Aug 11 11:28 .
drwxrwx---. 6 rfreeman dor     4096 Aug 11 11:20 ..
-r--r-----. 1 rfreeman dor 11321676 Aug 11 11:20 file4.txt
-r--r-----. 1 rfreeman dor 11321569 Aug 11 11:20 file3.txt
-r--r-----. 1 rfreeman dor 11320369 Aug 11 11:20 file2.txt
-r--r-----. 1 rfreeman dor 11322055 Aug 11 11:20 file1.txt
-r--r-----. 1 rfreeman dor 11320368 Aug 11 11:20 file1.bad
dr-xr-x---. 2 rfreeman dor     4096 Aug 11 11:20 bak

Yup, it's there. But we can't see what's going on. Let's park that for a moment.

Our next option is asking the scheduler for our job status and info, which is easy to get:

$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1607942    rfreema RUN   short      rhrcscli1   rhrcsnod23  *s_data.py Aug 11 11:35

We can get a detailed report on our job by using the -l (long) option:

$ bjobs -l 1607942

Job <1607942>, User <rfreeman>, Project <default>, Status <RUN>, Queue <short>,
                     Command <python code/process_data.py>, Share group charged
                      </rfreeman>, Esub <hl mem>
Wed Aug 11 11:35:10: Submitted from host <rhrcscli1>, CWD <$HOME/git/rmf__trans
                     itioning_to_batch>, Requested Resources <rusage[mem=4000]>
                     , memory/swap limit enforced per-job/per-host;
Wed Aug 11 11:35:11: Started 1 Task(s) on Host(s) <rhrcsnod23>, Allocated 1 Slo
                     t(s) on Host(s) <rhrcsnod23>, Execution Home </export/home
                     /dor/rfreeman>, Execution CWD </export/home/dor/rfreeman/g
                     it/rmf__transitioning_to_batch>;
Wed Aug 11 11:35:46: Resource usage collected.
                     IDLE_FACTOR(cputime/runtime):   0.00
                     MEM: 5 Mbytes;  SWAP: 0 Mbytes;  NTHREAD: 5
                     PGID: 182729;  PIDs: 182729 182730 182732 183120 

 RUNLIMIT                
 4320.0 min

 MEMLIMIT
    3.9 G 
 ESTIMATED_RUNTIME:
 15.0 min based on user-provided runtime estimation.

 MEMORY USAGE:
 MAX MEM: 5 Mbytes;  AVG MEM: 4 Mbytes

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] rusage[mem=4000.00] span[hosts=
                     1] affinity[core(1)*1]
 Effective: select[type == local] order[r15s:pg] rusage[mem=4000.00] span[hosts
                     =1] affinity[core(1)*1] 

Indeed, we can see our job is running, and it's pretty active, as the IDLE_FACTOR is greater than 0.5, indicating 50% efficiency. Aside, we can see our MAX MEM is XXX; and so we can downgrade our RAM ask, in order to give RAM for others who might need it.


Next: Set Up Your Script's Inputs and Outputs