opencube-horizon/pic-workflow
PIC workflow on Apache Airflow

A workflow for space plasma physics on Apache Airflow. The workflow is implemented as a DAG and runs on Apache Airflow on top of a Kubernetes cluster.

Description

We provide a DAG to execute Particle-in-Cell (PIC) simulations using the SputniPIC space plasma simulation software.

Figure: The PIC DAG, as presented in the Apache Airflow UI.

Quickstart

Folders

The main DAG is contained in pic.py. We also provide the following folders:

  • docker/ contains the Dockerfile, along with bash scripts which are included in the image;
  • misc/ contains various configuration files, used for testing and development;
  • plot/ contains python scripts to create plots.

Installation checklist (see Setup & Installation)

  • Kubernetes: PersistentVolume
  • Kubernetes: PersistentVolumeClaim with name pvc-pic
  • Docker: Docker image available in a public registry, e.g. gabinsc/sputnipic:latest
  • Apache Airflow: pic.py is in Apache Airflow's DAG folder.
  • DAG: in pic.py, PVC_NAME is set
  • DAG: in pic.py, IMAGE_NAME is set
  • DAG: one or more .inp simulation configuration files are stored in the root of the PersistentVolume.

Setup & Installation

Requirements

  • A Kubernetes cluster
  • A working Apache Airflow setup: in particular, Apache Airflow must be configured to be able to run tasks on the Kubernetes cluster.

1. Kubernetes PersistentVolume and PersistentVolumeClaim

The workflow relies on a specific PersistentVolumeClaim being present on the Kubernetes cluster to store files during execution. In this step, we describe how to create a PersistentVolume, and a PersistentVolumeClaim attached to this volume.

PersistentVolume. Your Kubernetes cluster administrator normally provides you with the name of the PersistentVolume you need to use. If you manage your own Kubernetes cluster, however, you need to create a PersistentVolume yourself; we provide an example in misc/pv.yaml. In this example, and in the rest of this tutorial, the PersistentVolume is named pv-local and uses local storage as a backend. Please refer to the Kubernetes documentation to learn more about PersistentVolumes. The volume can be deployed using:

kubectl create -f misc/pv.yaml

Note

Any kind of storage other than local storage can also be used as a backend for the PersistentVolume, depending on the Cloud provider.
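As a hedged sketch (the actual misc/pv.yaml in the repository may differ), a local-storage PersistentVolume named pv-local could look like this; the capacity, path, and node name below are placeholders:

```yaml
# Hypothetical sketch of misc/pv.yaml: a local-storage PersistentVolume
# named pv-local. Size, host path, and node name are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/pic-data            # directory on the node backing the volume
  nodeAffinity:                    # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1
```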

PersistentVolumeClaim. Once you know the name of the PersistentVolume (in this example, pv-local), you need to create a PersistentVolumeClaim specifying the requested storage size. An example is provided in misc/pvc.yaml; the PersistentVolumeClaim can be created using:

kubectl create -f misc/pvc.yaml -n airflow

Warning

It is crucial that the namespace used for the PVC is the same as the one under which Apache Airflow is deployed; here we use airflow.
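A hedged sketch of what misc/pvc.yaml might contain, assuming the claim is named pvc-pic and bound to the pv-local volume above (the requested size is a placeholder):

```yaml
# Hypothetical sketch of misc/pvc.yaml: a claim named pvc-pic, created in
# the same namespace as Apache Airflow (here: airflow).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-pic
  namespace: airflow
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  volumeName: pv-local             # bind explicitly to the PV from step 1
  resources:
    requests:
      storage: 10Gi
```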

2. Building the Docker image

A Dockerfile is provided in the docker/ folder, along with scripts that will be included in the image.

To build and publish the image:

cd docker/
docker build -t gabinsc/sputnipic:latest .
docker push gabinsc/sputnipic:latest

Note

The image must be published to a public Docker registry, or a registry which is accessible from the Apache Airflow setup. Please refer to Docker documentation for more details on building an image, and publishing it.

3. Deploying and adapting the DAG

In order for the DAG to be executed in your specific environment, some adjustments are required.

  1. Place the pic.py file in the DAG folder of your Apache Airflow setup.
  2. Adjust the following constants in pic.py:
    • IMAGE_NAME: name of the image that will be used for the containers.
    • PVC_NAME: name of the PersistentVolumeClaim created in step 1 (pvc-pic in this tutorial).
  3. Validate that you can see the DAG under the name pic in the Apache Airflow UI. If not, DAG import errors are reported at the top of the UI.
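As a sketch, the two constants might look like this near the top of pic.py; the values below are the ones used in this tutorial, so replace them with your own registry and claim names:

```python
# Hypothetical excerpt from pic.py: the two constants to adjust before
# deploying the DAG. Values match this tutorial and are placeholders.
IMAGE_NAME = "gabinsc/sputnipic:latest"  # image built and pushed in step 2
PVC_NAME = "pvc-pic"                     # PersistentVolumeClaim from step 1
```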

Run

Before you run the DAG, place the simulation configuration files (in .inp format) in the root of the PersistentVolume defined in the Kubernetes cluster.

Note

Input files are available in SputniPIC's repository: examples.

Click on "Trigger DAG" in the Apache Airflow UI to start the DAG with the default parameters. You can customize the DAG parameters to your needs by clicking "Trigger DAG w/ config":

  • inputlist: list of experiments; each experiment is the name of the corresponding configuration file, without the .inp extension.
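For example, a "Trigger DAG w/ config" payload running two experiments could be built as below; the experiment names GEM_2D and GEM_3D are hypothetical placeholders for the base names of your own .inp files:

```python
import json

# Hypothetical trigger configuration: two experiments whose input files
# would be GEM_2D.inp and GEM_3D.inp at the root of the PersistentVolume.
conf = {"inputlist": ["GEM_2D", "GEM_3D"]}

# JSON string to paste into the "Trigger DAG w/ config" dialog.
payload = json.dumps(conf)
```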

Results analysis (experimental)

We provide python scripts to create readable Gantt charts based on the workflow execution. Note that a Gantt chart is available for each DAG execution in the Apache Airflow UI; however, that chart offers limited interactivity and can be hard to read for complex or long-running DAGs.

Requirements:

  • python (≥ 3.9)
  • python libraries: plotly, requests, pandas

For a specific DAG run, the plot/plot_gantt.py script creates two Gantt charts in SVG format:

  • Resource view: each line in the chart represents a slot in a pool (note that multi-slot tasks are not supported)
  • Task view: each line represents a task
  • (REMOVED) Multi-execution resource view: several DAG runs can be presented on the same Gantt chart, each run with its own color.

Before running this script, some information needs to be set in plot/constants.py:

  • BASE_URL: base URL to access Airflow API
  • SESSION_COOKIE: session cookie, can typically be obtained from the Network section of your browser's DevTools when logged in on the Apache Airflow UI
  • POOL_ALIAS: alias names for the various pools, will be shown in the legend
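A hedged sketch of what plot/constants.py might contain; the variable names come from the list above, while every value below is a placeholder to replace with your own setup:

```python
# Hypothetical contents of plot/constants.py. All values are placeholders.
BASE_URL = "http://localhost:8080"      # base URL to access the Airflow API
SESSION_COOKIE = "session=REPLACE_ME"   # copied from the browser's DevTools
POOL_ALIAS = {                          # pool name -> label shown in legend
    "default_pool": "Default pool",
    "pic_pool": "PIC simulation",
}
```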

The identifier of the DAG and the identifier of the specific DAG run are given as command-line arguments. The script can also plot data from a JSON file, a sample is provided in the samples/ directory.

When running the script, figures will be written to the figures/ folder.

Relevant publications

FAQ
