# Use Data from the OSDF to Run Jobs

TODO: 
- overview
- objectives
- come up with a short list of station ids to run (use for first module?)
- need generic lightweight python container
- test jobs

## Scenario: a List of Jobs

Suppose we wanted to run our analysis on each station. How many stations are there, again? 

In [None]:
wc -l ghcnd-stations.txt

That's a long list of tasks to run. 

Luckily, this kind of workload, a list of jobs, is a perfect fit for an HTCondor Access Point on a system 
like the Open Science Pool. To define the workload, we only need two things: a list of items and a job template. 

We could use the whole `ghcnd-stations.txt` file as our list, but for simplicity, we'll cut the full list down to about 20 stations. 

In [None]:
grep INTL ghcnd-stations.txt > station_list.txt
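
The `grep` above keeps every matching line in full, so each entry in `station_list.txt` carries whatever else is on that line. If only the station ID is wanted, a minimal sketch like the following could trim each line. It assumes the ID is the first whitespace-delimited field of `ghcnd-stations.txt`; `make_station_list` is a hypothetical helper name, not part of the tutorial's scripts.

```shell
# Hypothetical helper: print only the first whitespace-delimited field
# (assumed to be the station ID) of every line matching a pattern.
make_station_list() {
    pattern="$1"        # text to search for, e.g. INTL
    stations_file="$2"  # file to search, e.g. ghcnd-stations.txt
    grep -- "$pattern" "$stations_file" | awk '{print $1}'
}

# Example usage (uncomment to run against the real file):
# make_station_list INTL ghcnd-stations.txt > station_list.txt
# wc -l station_list.txt
```

Checking the line count afterwards confirms whether the list is as short as intended before any jobs are submitted.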

## Job Template

The following information needs to be communicated in the HTCondor submit file: 

- **Software environment** 
    - blurb
    - In our example, we will use an existing container (that also happens to be available via the OSDF).
- **What the job should run**
    - blurb
    - For our example, the executable is the `example.py` script
- **Inputs (both scripts and data)**
    - blurb
    - We need to include both the helper scripts for our code (in the `scripts` directory) and the Pelican URL to our data file. 
- **Recording information about the job**
    - As with many other schedulers, HTCondor provides options for recording the standard output and error of a running job.
    - Note below that these files are organized into their own directory. 
- **Resource needs**
    - Every HTCondor job should request cores (`request_cpus`), memory (`request_memory`), and local disk (`request_disk`) on the execution point. 
    - our example

Each of these items is reflected in the example submit file: 

In [None]:
cat example.sub

Taken together, the lines of the submit file (except the last one) form the template for one job. Wherever 
a value in this template differs from job to job, we've placed a variable as a placeholder, written as 
`$(variable_name)`. 

The last line (`queue station_id from station_list.txt`) is what transforms this template into a job list: HTCondor 
iterates through the items in our list and creates one job per item, substituting that item wherever `$(station_id)` appears. 
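
To make the template idea concrete, here is a rough sketch of how the five pieces listed earlier might fit together in a submit file. It is not the tutorial's actual `example.sub`: the container and data URLs are placeholders in angle brackets, the `logs` directory name and the resource values are assumptions, and `write_sketch_submit` is a hypothetical helper that simply writes the sketch to a file of your choosing.

```shell
# Hypothetical helper: write an illustrative submit file to the given path.
# The quoted 'EOF' keeps the shell from touching $(station_id).
write_sketch_submit() {
    cat > "$1" <<'EOF'
# Software environment (placeholder OSDF path to a container image)
container_image = osdf:///<namespace>/<path>/<container>.sif

# What the job should run
executable = example.py
arguments = $(station_id)

# Inputs: helper scripts plus a placeholder URL for the data file
transfer_input_files = scripts, osdf:///<namespace>/<path>/<data-file>

# Recording information about the job (assumed directory name)
log = logs/$(station_id).log
output = logs/$(station_id).out
error = logs/$(station_id).err

# Resource needs (illustrative values only)
request_cpus = 1
request_memory = 1GB
request_disk = 1GB

# One job per line of the list
queue station_id from station_list.txt
EOF
}

# Example usage:
# write_sketch_submit sketch.sub && cat sketch.sub
```

Comments in a submit file go on their own lines, which is why each note sits above the setting it describes rather than beside it.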

## Submitting Jobs

We can now submit our list of jobs: 

In [None]:
condor_submit example.sub

Jobs can be monitored using `condor_q`: 

In [None]:
condor_q
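
Once jobs finish, the files recording standard error are a quick place to look for problems. The following is a minimal sketch, assuming the submit file writes per-job error files ending in `.err` into a directory such as `logs` (both names are assumptions, since they depend on `example.sub`); `nonempty_err_files` is a hypothetical helper.

```shell
# Hypothetical helper: list the *.err files in a directory that are not empty,
# i.e. jobs that wrote something to standard error.
nonempty_err_files() {
    log_dir="$1"  # directory holding the per-job error files (assumed layout)
    find "$log_dir" -type f -name '*.err' -size +0 | sort
}

# Example usage:
# nonempty_err_files logs
```

An empty result means no job wrote to standard error, although a job can still fail in other ways, so the output files are worth checking as well.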