add mpi examples (#168)
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Co-authored-by: vsoch <vsoch@users.noreply.github.com>
vsoch and vsoch committed May 14, 2023
1 parent 41a966f commit 101aed5
Showing 5 changed files with 231 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/tutorials/index.md
@@ -35,6 +35,12 @@ The following tutorials are provided from their respective directories (and are
- [Dask with Scikit-Learn](https://github.com/flux-framework/flux-operator/blob/main/examples/machine-learning/dask/scikit-learn)
- [Ray with Scikit-Learn](https://github.com/flux-framework/flux-operator/blob/main/examples/machine-learning/ray/scikit-learn)

### Message Passing Interface (MPI)

- [openmpi](https://github.com/flux-framework/flux-operator/blob/main/examples/mpi/ompi)
- [mpich](https://github.com/flux-framework/flux-operator/blob/main/examples/mpi/mpich)


### Services

- [Merlin Basic](https://github.com/flux-framework/flux-operator/blob/main/examples/launchers/merlin/basic)
66 changes: 66 additions & 0 deletions examples/mpi/mpich/README.md
@@ -0,0 +1,66 @@
# MPICH Example

Create a minikube cluster, create the namespace, and then install the operator:

```bash
$ minikube start
$ kubectl create namespace flux-operator
$ kubectl apply -f ../../dist/flux-operator.yaml
```

You might want to pre-pull the container image:

```bash
$ minikube ssh docker pull ghcr.io/rse-ops/mpich:tag-mamba
```

And then create the MiniCluster:

```bash
$ kubectl create -f minicluster.yaml
```

And watch the example run! (The suffix of your pod name will differ.)

```bash
$ kubectl logs -n flux-operator flux-sample-0-5gjqt -f
```
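
Because the pod suffix is generated randomly, it can be handy to look up the lead broker pod rather than copy its name by hand. The following is only a sketch: `lead_pod` is a hypothetical helper (not part of the operator), and it assumes pods are named `<minicluster-name>-<index>-<suffix>` as in the command above.

```bash
# Hypothetical helper: read pod names (as printed by `kubectl get pods -o name`)
# on stdin and print the name of the index-0 pod without the "pod/" prefix.
# $1 is the MiniCluster name and defaults to flux-sample (from minicluster.yaml).
lead_pod() {
  grep -m1 "^pod/${1:-flux-sample}-0-" | sed 's|^pod/||'
}
```

It could then be used as `kubectl logs -n flux-operator "$(kubectl get pods -n flux-operator -o name | lead_pod)" -f`.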

A successful run will show four MPI ranks, one per pod:

```console
broker.info[0]: rc1.0: running /etc/flux/rc1.d/02-cron
broker.info[0]: rc1.0: /etc/flux/rc1 Exited (rc=0) 0.5s
broker.info[0]: rc1-success: init->quorum 0.543544s
broker.info[0]: online: flux-sample-0 (ranks 0)
broker.info[0]: online: flux-sample-[0-3] (ranks 0-3)
broker.info[0]: quorum-full: quorum->run 0.369278s
Hello, world! I am 1 of 4(Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020, 87)
Hello, world! I am 0 of 4(Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020, 87)
Hello, world! I am 2 of 4(Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020, 87)
Hello, world! I am 3 of 4(Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020, 87)
broker.info[0]: rc2.0: flux submit -N 4 -n 4 --quiet --watch ./hello_cxx Exited (rc=0) 0.8s
broker.info[0]: rc2-success: run->cleanup 0.843814s
broker.info[0]: cleanup.0: flux queue stop --quiet --all --nocheckpoint Exited (rc=0) 0.1s
broker.info[0]: cleanup.1: flux cancel --user=all --quiet --states RUN Exited (rc=0) 0.1s
broker.info[0]: cleanup.2: flux queue idle --quiet Exited (rc=0) 0.1s
broker.info[0]: cleanup-success: cleanup->shutdown 0.320065s
broker.info[0]: children-complete: shutdown->finalize 61.2525ms
broker.info[0]: rc3.0: running /etc/flux/rc3.d/01-sched-fluxion
broker.info[0]: rc3.0: /etc/flux/rc3 Exited (rc=0) 0.3s
broker.info[0]: rc3-success: finalize->goodbye 0.310701s
broker.info[0]: goodbye: goodbye->exit 0.037999ms
```
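
To check a saved log without reading it line by line, you could count the greeting lines. This is a minimal sketch, assuming each rank prints exactly one line starting with `Hello, world! I am` as in the output above; `check_ranks` is a hypothetical name.

```bash
# Hypothetical helper: read a log on stdin and compare the number of
# "Hello, world!" lines with the expected number of tasks given as $1.
check_ranks() {
  expected="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
  found="$(grep -c '^Hello, world! I am ' || true)"
  if [ "$found" -eq "$expected" ]; then
    echo "ok: $found of $expected ranks reported"
  else
    echo "mismatch: $found of $expected ranks reported"
    return 1
  fi
}
```

For this example, with `tasks: 4` in `minicluster.yaml`, that would be `kubectl logs -n flux-operator <pod> | check_ranks 4`.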

When the job is done, the pods will show a `Completed` status:

```bash
$ kubectl get -n flux-operator pods
```
```console
NAME                  READY   STATUS      RESTARTS   AGE
flux-sample-0-5gjqt   0/1     Completed   0          2m40s
flux-sample-1-j4zlc   0/1     Completed   0          2m40s
flux-sample-2-wdzz7   0/1     Completed   0          2m40s
flux-sample-3-vp8rx   0/1     Completed   0          2m40s
```
32 changes: 32 additions & 0 deletions examples/mpi/mpich/minicluster.yaml
@@ -0,0 +1,32 @@
apiVersion: flux-framework.org/v1alpha1
kind: MiniCluster
metadata:
  name: flux-sample
  namespace: flux-operator
spec:
  # Number of pods to create for MiniCluster
  size: 4
  tasks: 4

  # suppress all output except for test run
  logging:
    quiet: false

  # This is a list because a pod can support multiple containers
  containers:
    # The container URI to pull (currently needs to be public)
    - image: ghcr.io/rse-ops/mpich:tag-mamba

      # Note that there are many examples here!
      # flux run -n 4 ./hello_c
      # flux run -n 4 ./hello_cxx
      # flux run -n 4 ./connectivity_c
      # flux run -n 4 ./hello_usempi
      # flux run -n 4 ./ring_c
      # flux run -n 4 ./ring_usempi
      # flux run -n 4 ./ring_mpifh
      command: ./hello_cxx
      workingDir: /opt/ompi
      environment:
        LD_LIBRARY_PATH: /opt/conda/lib
        PYTHONPATH: /opt/conda/lib/python3.10/site-packages
98 changes: 98 additions & 0 deletions examples/mpi/ompi/README.md
@@ -0,0 +1,98 @@
# OpenMPI Example

Create a minikube cluster, create the namespace, and then install the operator:

```bash
$ minikube start
$ kubectl create namespace flux-operator
$ kubectl apply -f ../../dist/flux-operator.yaml
```

You might want to pre-pull the container image:

```bash
$ minikube ssh docker pull ghcr.io/rse-ops/ompi:flux-sched-focal
```

And then create the MiniCluster:

```bash
$ kubectl create -f minicluster.yaml
```

And watch the example run! (The suffix of your pod name will differ.)

```bash
$ kubectl logs -n flux-operator flux-sample-0-5gjqt -f
```

A successful run will show four MPI ranks (the MPICH version banner printed by each rank is really verbose, huh?):

```console
broker.info[0]: rc1.0: running /etc/flux/rc1.d/02-cron
broker.info[0]: rc1.0: /etc/flux/rc1 Exited (rc=0) 0.6s
broker.info[0]: rc1-success: init->quorum 0.602697s
broker.info[0]: online: flux-sample-0 (ranks 0)
broker.info[0]: online: flux-sample-[0-3] (ranks 0-3)
broker.info[0]: quorum-full: quorum->run 0.361781s
Hello, world! I am 0 of 4(MPICH Version: 3.3a2
MPICH Release date: Sun Nov 13 09:12:11 MST 2016
MPICH Device: ch3:nemesis
MPICH configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=
MPICH CC: gcc -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX: g++ -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH F77: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
MPICH FC: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
, 1297)
Hello, world! I am 2 of 4(MPICH Version: 3.3a2
MPICH Release date: Sun Nov 13 09:12:11 MST 2016
MPICH Device: ch3:nemesis
MPICH configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=
MPICH CC: gcc -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX: g++ -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH F77: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
MPICH FC: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
, 1297)
Hello, world! I am 3 of 4(MPICH Version: 3.3a2
MPICH Release date: Sun Nov 13 09:12:11 MST 2016
MPICH Device: ch3:nemesis
MPICH configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=
MPICH CC: gcc -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX: g++ -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
Hello, world! I am 1 of 4(MPICH Version: 3.3a2
MPICH F77: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
MPICH Release date: Sun Nov 13 09:12:11 MST 2016
MPICH FC: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
MPICH Device: ch3:nemesis
, 1297)
MPICH configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=
MPICH CC: gcc -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX: g++ -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH F77: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
MPICH FC: gfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2
, 1297)
broker.info[0]: rc2.0: flux submit -N 4 -n 4 --quiet --watch ./hello_cxx Exited (rc=0) 0.4s
broker.info[0]: rc2-success: run->cleanup 0.380367s
broker.info[0]: cleanup.0: flux queue stop --quiet --all --nocheckpoint Exited (rc=0) 0.1s
broker.info[0]: cleanup.1: flux cancel --user=all --quiet --states RUN Exited (rc=0) 0.1s
broker.info[0]: cleanup.2: flux queue idle --quiet Exited (rc=0) 0.1s
broker.info[0]: cleanup-success: cleanup->shutdown 0.264937s
broker.info[0]: children-complete: shutdown->finalize 62.0603ms
broker.info[0]: rc3.0: running /etc/flux/rc3.d/01-sched-fluxion
broker.info[0]: rc3.0: /etc/flux/rc3 Exited (rc=0) 0.2s
broker.info[0]: rc3-success: finalize->goodbye 0.217901s
broker.info[0]: goodbye: goodbye->exit 0.028526ms
```
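
Since every rank prints the full library description, and output from different ranks can interleave, the log is hard to scan. As a rough sketch, a filter like the one below keeps the broker lines and trims each greeting to its rank summary. `summarize_log` is a hypothetical helper, and it assumes the greeting format shown above.

```bash
# Hypothetical helper: read a log on stdin, print broker lines unchanged,
# and cut each "Hello, world!" line down to "Hello, world! I am N of M".
# All other lines (the multi-line MPI library description) are dropped.
summarize_log() {
  sed -n \
    -e 's/^\(Hello, world! I am [0-9][0-9]* of [0-9][0-9]*\).*/\1/p' \
    -e '/^broker\./p'
}
```

For example: `kubectl logs -n flux-operator <pod> | summarize_log`.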

When the job is done, the pods will show a `Completed` status:

```bash
$ kubectl get -n flux-operator pods
```
```console
NAME                  READY   STATUS      RESTARTS   AGE
flux-sample-0-flg28   0/1     Completed   0          9m39s
flux-sample-1-fplvv   0/1     Completed   0          9m39s
flux-sample-2-7bltz   0/1     Completed   0          9m39s
flux-sample-3-p8mtj   0/1     Completed   0          9m39s
```
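
If you want to script this check, one option is to parse the pod listing. The sketch below assumes the column layout shown above (status in the third column) and input produced with `--no-headers`; `all_completed` is a hypothetical helper name.

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin.
# Succeeds only if there is at least one pod and every pod is Completed;
# otherwise it lists the pods that are in another state and fails.
all_completed() {
  awk '
    $3 != "Completed" { bad++; print "not completed: " $1 " (" $3 ")" }
    END { if (bad > 0 || NR == 0) exit 1 }
  '
}
```

For example: `kubectl get -n flux-operator pods --no-headers | all_completed && echo "job finished"`.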
29 changes: 29 additions & 0 deletions examples/mpi/ompi/minicluster.yaml
@@ -0,0 +1,29 @@
apiVersion: flux-framework.org/v1alpha1
kind: MiniCluster
metadata:
  name: flux-sample
  namespace: flux-operator
spec:
  # Number of pods to create for MiniCluster
  size: 4
  tasks: 4

  # suppress all output except for test run
  logging:
    quiet: false

  # This is a list because a pod can support multiple containers
  containers:
    # The container URI to pull (currently needs to be public)
    - image: ghcr.io/rse-ops/ompi:flux-sched-focal

      # Note that there are many examples here!
      # flux run -n 4 ./hello_c
      # flux run -n 4 ./hello_cxx
      # flux run -n 4 ./connectivity_c
      # flux run -n 4 ./hello_usempi
      # flux run -n 4 ./ring_c
      # flux run -n 4 ./ring_usempi
      # flux run -n 4 ./ring_mpifh
      workingDir: /opt/ompi
      command: ./hello_cxx
