# Convert GitHub Actions `.yaml` to Parallel Works ACTIVATE `.yaml`

ACTIVATE `.yaml`-based workflows have the same "feel" as GitHub Actions, so the two formats share many similarities. However, they are not the same: GitHub Actions has a much wider feature set, and `.yaml` files that run on GitHub runners have the advantage of implicit context, including many additional environment variables.

This notebook is a first draft, interactive tool for ingesting a GitHub Actions `.yaml`, clearing out some things that Parallel Works ACTIVATE definitely does not support, adding some boilerplate "header" information that is likely to be useful for ACTIVATE users, and keeping key workflow steps that ACTIVATE does support (namely the `jobs: steps: run:` framework and any explicit `env:` variables).

This notebook is a work in progress, not a complete solution; treat it as a living document meant for experimentation. Any `.yaml` files generated here will probably require some manual adjustment before they actually run on ACTIVATE. However, the goal is that the bulk of the work of converting from GitHub `.yaml` to ACTIVATE `.yaml` can be done here.

## Parameters

Input and output files, dependencies, etc.

In [1]:
# ~ does not always work, so use full path
input_file="/home/jovyan/work/ubuntu-ci-x86_64-gnu.yaml"
output_file=input_file+".2pw.yaml"

In [2]:
# Dependencies may be already installed if running this notebook
# within a JupyterLab instance.
#!pip install pyyaml

In [3]:
import yaml
from yaml.resolver import BaseResolver

In [4]:
#==========================================================
# Additional set up so that | is included in the output.
#==========================================================
# According to flyx on 04/13/2021 at https://stackoverflow.com/questions/67080308/how-do-i-add-a-pipe-the-vertical-bar-into-a-yaml-file-from-python
# to get the |, you need to create a new "flag" string (the AsLiteral)
# which is then associated with the | by represent_literal_str.
# All blocks that you want a | in need to be passed to AsLiteral.
# Content in this cell is distributed under CC BY-SA 4.0, more
# information at: https://creativecommons.org/licenses/by-sa/4.0/

# Define a custom string subclass.
# This is a marker to tell the representer which objects to handle.
class AsLiteral(str):
    pass

# Define a custom representer function for the new class.
# It tells the dumper to represent this type of object as a literal scalar.
def represent_literal_str(dumper, data):
    return dumper.represent_scalar(BaseResolver.DEFAULT_SCALAR_TAG, data, style="|")

# Register the representer function with the default Dumper.
# This makes it available to `yaml.dump`. `yaml.safe_dump` uses SafeDumper,
# which would need its own registration (Dumper=yaml.SafeDumper).
yaml.add_representer(AsLiteral, represent_literal_str)
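
To see what the `AsLiteral` marker changes, the following minimal sketch dumps the same multi-line string twice, once as a plain `str` and once wrapped in `AsLiteral`. The `script` string is a made-up example, and the class and representer are repeated only so the snippet runs on its own.

```python
import yaml
from yaml.resolver import BaseResolver

class AsLiteral(str):
    """Marker type: strings of this type are dumped in | (literal) style."""
    pass

def represent_literal_str(dumper, data):
    return dumper.represent_scalar(BaseResolver.DEFAULT_SCALAR_TAG, data, style="|")

# Without a Dumper argument this registers with the default yaml.Dumper.
yaml.add_representer(AsLiteral, represent_literal_str)

script = "echo one\necho two\n"  # hypothetical run: content

plain = yaml.dump({"run": script})               # quoted scalar, hard to read
literal = yaml.dump({"run": AsLiteral(script)})  # block scalar introduced by |

print(plain)
print(literal)

# The literal form still loads back to the original string.
roundtrip = yaml.safe_load(literal)["run"]
```

Only values wrapped in `AsLiteral` are affected, which is why each `run:` block is wrapped individually later in this notebook.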

## Load input file

In [5]:
# Load the file you want to convert
with open(input_file) as stream:
    try:
        data = yaml.safe_load(stream)
    except yaml.YAMLError as exc:
        # Report the parse error, then stop: later cells need `data`.
        print(exc)
        raise

In [6]:
data.keys()

dict_keys(['name', True, 'concurrency', 'defaults', 'env', 'jobs'])

## Drop things ACTIVATE `.yaml` does not support and keep `env:` and `jobs:`

Some key things that ACTIVATE `.yaml` does support:
+ `env:`
+ `jobs:`
+ `ssh:` - Unique to ACTIVATE `.yaml` for selecting a resource to run on

ACTIVATE `.yaml` has an `on:` section. PyYAML follows YAML 1.1, in which an unquoted `on` is a boolean, so when it loads the GitHub `.yaml` the `on:` key becomes the `True` dictionary entry seen above. Since ACTIVATE `on:` is different from GitHub `on:`, we simply remove the original GitHub `on:` section and replace it with one relevant to ACTIVATE, below. ACTIVATE's `on:` section is used to pass input parameters to the workflow. It does not necessarily need to be at the end of the `.yaml`; it can be at the beginning or the end. Most examples built into ACTIVATE put `on:` at the end of the file so users can focus on the `steps:` of the workflow, so that convention is followed here.
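
The `True` key can be reproduced in isolation. PyYAML resolves unquoted `on` as a YAML 1.1 boolean, so the key arrives in Python as `True` rather than the string `"on"`. The sketch below uses a made-up two-key workflow, not the real input file.

```python
import yaml

# Hypothetical, minimal GitHub Actions header
github_yaml = """
name: demo
on:
  push:
    branches: [main]
"""

loaded = yaml.safe_load(github_yaml)
keys_before = list(loaded.keys())
print(keys_before)  # the second key is the boolean True, not "on"

# Removing the GitHub trigger section therefore means popping True
github_on = loaded.pop(True, None)
keys_after = list(loaded.keys())
print(keys_after)
```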

Another important automated change is that `runs-on:` is removed and replaced with `ssh:` for use with the automated detection of connected resources.
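
As a rough illustration of this swap, the sketch below applies it to a single made-up job. The helper name `runs_on_to_ssh` is hypothetical and is not part of ACTIVATE or of this notebook; the `remoteHost` value is the one used in the conversion cell.

```python
import yaml

# Hypothetical one-job workflow fragment
github_jobs = yaml.safe_load("""
build:
  runs-on: ubuntu-latest
  steps:
    - name: hello
      run: echo hello
""")

def runs_on_to_ssh(job):
    """Return a copy of job with runs-on: replaced by an ssh: block."""
    converted = {key: value for key, value in job.items() if key != "runs-on"}
    if "runs-on" in job:
        converted["ssh"] = {"remoteHost": "${{inputs.resource.ip}}"}
    return converted

activate_jobs = {name: runs_on_to_ssh(job) for name, job in github_jobs.items()}
print(yaml.dump(activate_jobs, sort_keys=False))
```

Building a new dictionary avoids changing a dictionary while iterating over it, which is the issue the conversion cell works around by copying the key lists first.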

In [7]:
# Get rid of some GitHub Actions things that ACTIVATE does not use
# pop(key, None) avoids a KeyError if the input lacks a section
data.pop('name', None)
data.pop(True, None)
data.pop('concurrency', None)
data.pop('defaults', None)

# Print remaining keys
data.keys()

dict_keys(['env', 'jobs'])

In [9]:
# Change runs-on: to ssh:
# Get list of jobs, header_items first, before iterating
# because otherwise Python throws an error when size
# of the dictionary changes during the iteration.

jobs_list = list(data['jobs'].keys())
for job in jobs_list:

    header_list = list(data['jobs'][job].keys())
    for header_item in header_list:
        
        # Deal with runs-on:
        if header_item == "runs-on":
            # Replace runs-on: with an ssh: entry and populate its
            # parameters (need to add ACTIVATE inputs, see below)
            data['jobs'][job]["ssh"] = {"remoteHost": "${{inputs.resource.ip}}"}
            # Get rid of runs-on:
            del data['jobs'][job]["runs-on"]
        
        # Each step needs to be reprocessed with |
        if header_item == "steps":
            
            steps_list = list(data['jobs'][job]["steps"])
            steps_to_delete = []
            for ss, step in enumerate(steps_list):
                if step['name'] == "checkout":
                    # Special case likely with uses: actions/checkout
                    # We could just delete it...
                    #steps_to_delete.append(ss)
                    # ...but instead of that, let's reset the content
                    # to match some workflow inputs
                    data['jobs'][job]["steps"][ss]['run'] = AsLiteral(
                        "echo Cloning repo to $PWD\ngit clone https://github.com/${{ inputs.gh_org }}/${{ inputs.gh_repo }}\ncd ${{ inputs.gh_repo }}\ngit checkout ${{ inputs.gh_branch }}\n")
                    # Drop the GitHub-only keys; pop() tolerates a missing with:
                    data['jobs'][job]['steps'][ss].pop('uses', None)
                    data['jobs'][job]['steps'][ss].pop('with', None)
                elif step['name'] == "cleanup":
                    # Sometimes a blanket clear all on the runner
                    steps_to_delete.append(ss)
                else:
                    # Flag all steps that we want to keep AsLiteral
                    # so they receive a | for human readable YAML.
                    run_content = AsLiteral(step['run'])
                    data['jobs'][job]["steps"][ss]['run'] = run_content
                    
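            # Done looping over the steps, need to delete the steps
            # marked for deletion. Delete from the highest index down
            # so earlier deletions do not shift the remaining indices.
            for ss in reversed(steps_to_delete):
                print("Deleting step index: "+str(ss))
                del data['jobs'][job]["steps"][ss]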

Deleting step index: 0
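
The step-removal pass above deletes list items by index. As a general Python note, deleting one item shifts every later item down by one, so when more than one step is flagged it is safest to delete from the highest index down. A minimal sketch with made-up step names:

```python
# Hypothetical step list with two steps to remove
steps = [
    {"name": "cleanup"},
    {"name": "build"},
    {"name": "cleanup"},
    {"name": "test"},
]

# Collect the indices first, as the conversion cell does
to_delete = [index for index, step in enumerate(steps) if step["name"] == "cleanup"]

# Delete from the end so the remaining indices stay valid
for index in reversed(to_delete):
    del steps[index]

remaining = [step["name"] for step in steps]
print(remaining)  # ['build', 'test']
```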


## Append `on:` and workflow `inputs:`

As mentioned briefly above, ACTIVATE `.yaml` uses the `on:` section to pass input parameters to a workflow. Here, we explicitly add the `on:` section and provide a template for some inputs that may be useful.

In [10]:
# Add on: and populate it with execute:
# Notes:
#------------
# 1) The section that is created is actually `'on':` (quoted on:). ACTIVATE
# default workflows have this quoted on: as well so it should
# still work.
#-----------
# 2) Python bools (title case) are automatically
# converted to YAML bools (all lower case).
#-----------
# 3) The resource input created here corresponds explicitly to
# the remoteHost: ${{inputs.resource.ip}} entry in the 
# ssh: block, above.
#-----------
# 4) Added 
data["on"] = {"execute": 
                 {"inputs": 
                     {"resource":
                         {"label": "Workflow Target",
                          "type": "compute-clusters",
                          "autoselect": True,
                          "optional": False},
                      "gh_org":
                         {"label": "GitHub organization/owner",
                          "type": "string",
                          "default": "parallelworks"},
                      "gh_repo":
                         {"label": "GitHub repository",
                          "type": "string",
                          "default": "spack-stack"},
                      "gh_branch":
                         {"label": "GitHub branch",
                          "type": "string",
                          "default": "canary"}
                     }
                 }
             }
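
Notes 1) and 2) in the cell above can be checked on a reduced dictionary. The sketch below dumps a trimmed-down, made-up version of the `on:` block and reloads it; it assumes PyYAML 5.1 or later for the `sort_keys` argument.

```python
import yaml

# Trimmed-down stand-in for the on: block built above
snippet = {
    "on": {
        "execute": {
            "inputs": {
                "resource": {"autoselect": True, "optional": False}
            }
        }
    }
}

text = yaml.dump(snippet, sort_keys=False)
print(text)

# The quoted key loads back as the string "on", not the boolean True
reloaded = yaml.safe_load(text)
```

PyYAML adds the quotes because an unquoted `on` would be read back as a boolean, so the quoting keeps the dumped file consistent with the Python dictionary.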

## Write the output file

In [11]:
# Write the final converted file
with open(output_file, 'w') as stream:
    try:
        # Write a YAML representation of data, keeping the key order
        yaml.dump(data, stream, sort_keys=False)
    except yaml.YAMLError as exc:
        print(exc)
        
print(yaml.dump(data, sort_keys=False))    # Output the document to the screen.

env:
  BUILD_CACHE_PATH: /home/ubuntu/spack-stack/build-cache-new-spack-v1
  SOURCE_CACHE_PATH: /home/ubuntu/spack-stack/source-cache
jobs:
  ubuntu-ci-c6a-x86_64-gnu-build:
    steps:
    - name: checkout
      run: |
        echo Cloning repo to $PWD
        git clone https://github.com/${{ inputs.gh_org}}/${{ inputs.gh_repo}}
        cd ${{ inputs.gh_repo }}
        git checkout ${{ inputs.gh_branch }}
    - name: prepare-directories
      run: |
        # DH* REVERT ME AFTER MERGE
        mkdir -p ${BUILD_CACHE_PATH}
        mkdir -p ${SOURCE_CACHE_PATH}
    - name: create-buildcache
      run: |
        # Get day of week to decide whether to use build caches or not
        DOW=$(date +%u)
        # Monday is 1 ... Sunday is 7
        if [[ $DOW == 7 ]]; then
          export USE_BINARY_CACHE=false
          echo "Ignore existing binary cache for creating buildcache environment"
        else
          export USE_BINARY_CACHE=true
          echo "Use existing binary cache for creati