This repository has been archived by the owner on May 30, 2024. It is now read-only.

Defining the "common workflow" for our lesson #5

Open
1 of 2 tasks
ocaisa opened this issue Jul 20, 2022 · 3 comments

Comments

@ocaisa
Member

ocaisa commented Jul 20, 2022

The current example is a set of books that are downloaded. How do we define our raw data? We effectively don't have any; what we are doing is taking measurements with amdahl, which will become our raw data.

In 01-introduction.md we start off by creating a bash script describing the manual workflow. We will somehow need to replicate this. This will require:

  • Generating a set of data (which will require parsing of amdahl output, or perhaps adding a --terse option to amdahl; see Add --terse option to amdahl to make it easier to parse the output #6). Redirecting the amdahl output to a file could work, or indeed we could use the output files from SLURM itself.
  • Plotting the results (both graphically and perhaps in the terminal)
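For the first bullet, here is a minimal sketch of what the parsing might look like. It assumes a hypothetical output line of the form `Total execution time (s): <float>`; the real amdahl output (and any future --terse format) may well differ, so the regex would need adjusting.

```python
import re

# Hypothetical example of one line of amdahl's human-readable output;
# the real format may differ.
sample_output = "Total execution time (s): 1.234"

def parse_runtime(text):
    """Pull the elapsed time out of captured amdahl output.

    Assumes a line of the form 'Total execution time (s): <float>'.
    Adjust the pattern to match the actual amdahl output, or replace
    this function entirely if a --terse mode is added (see #6).
    """
    match = re.search(r"Total execution time \(s\):\s*([0-9.]+)", text)
    if match is None:
        raise ValueError("no runtime found in output")
    return float(match.group(1))

print(parse_runtime(sample_output))
```

If the --terse option from #6 emits machine-readable output, this regex scraping could be dropped in favour of parsing that output directly.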
@ocaisa
Member Author

ocaisa commented Jul 21, 2022

The "common workflow" identified in 01-introduction.md is

  1. Read a data file.
  2. Perform an analysis on this data file.
  3. Write the analysis results to a new file.
  4. Plot a graph of the analysis results.
  5. Save the graph as an image, so we can put it in a paper.
  6. Make a summary table of the analyses, which requires aggregation of all previous results.

Can we cover the same points? I think the last point is the hardest (and unnecessary for us). The order could be changed, though, to:

  1. Create data files (run a SLURM job using a job template we provide; store the output under a well-defined filename).
  2. Perform an analysis on the data files (extract our timings and convert them into speedup).
  3. Write the analysis results to a new file.
  4. Plot a graph of the analysis results (could consider doing this locally or remotely).
  5. Save the graph as an image.
  6. Pull the results (in this case an image) from the cluster and review it.
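Steps 2 and 3 above could be sketched like this, using made-up timings in place of real measurements extracted from the SLURM output files:

```python
import csv

# Hypothetical timings (cores -> seconds); real values would be parsed
# from the amdahl/SLURM output files, not hard-coded.
timings = {1: 30.0, 2: 16.5, 4: 9.75, 8: 6.4}

# Speedup on n cores is the serial runtime divided by the n-core runtime.
serial_time = timings[1]
speedup = {n: serial_time / t for n, t in timings.items()}

# Write the analysis results to a new file (step 3).
with open("speedup.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cores", "runtime_s", "speedup"])
    for n in sorted(timings):
        writer.writerow([n, timings[n], round(speedup[n], 3)])
```

The resulting CSV could then be fed to a plotting step (matplotlib on the cluster, or locally after pulling the file down) for steps 4-6.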

@reid-a
Member

reid-a commented Aug 1, 2022

This could build off what was done in the HPC Intro lesson -- call back to that lesson, show a job script, and look at the output of the job script. This could live in the first episode. It does make HPC Intro a pretty hard prerequisite for this lesson.

@ocaisa ocaisa changed the title from "What is our raw data?" to "Defining the "common workflow" for our lesson" on Aug 4, 2022
@bkmgit
Contributor

bkmgit commented Aug 10, 2022

The current format, with the job submission script at the end, seems OK. However, one may wish to enable attendees to practice using SLURM, in which case one could introduce the job submission script at the beginning. The lesson seems independent of HPC Intro, but it does allow practice using a scheduler.
