# Parallel Computations with Dolfinx using MPI

The aim of this tutorial is to show how variational problems can be solved in Python with DOLFINx running in parallel. Parallel computations are carried out with the Message Passing Interface (MPI) standard, which we access from Python through the mpi4py package.

First, we will look at some basic examples of MPI usage.

Next, we will demonstrate how a finite element mesh is created and distributed in parallel.

Furthermore, we will cover how finite element function spaces and functions are defined on several processes.

Finally, the elements of the tutorial are combined to show how the variational problem related to a partial differential equation can be solved in parallel.

This tutorial is inspired by and based on https://newfrac.gitlab.io/newfrac-fenicsx-training/05-dolfinx-parallel/dolfinx-parallel.html and https://jsdokken.com/dolfinx_docs/meshes.html.


In [1]:
## Parallel programming imports
import ipyparallel as ipp

from mpi4py import MPI

## Setting up a cluster
ipyparallel is used to set up a local cluster of 2 MPI engines (processes). To run a Jupyter Notebook cell on all engines in parallel, we use the `%%px` cell magic; cells without it run only in the local notebook kernel. To learn more about this, see the first parts of the MPI tutorial ([Introduction to MPI](./intro-mpi/intro-mpi.ipynb)) as well as https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html#px-cell-magic.

In [2]:
cluster = ipp.Cluster(engines = "mpi", n = 2)
rc = cluster.start_and_connect_sync()

Starting 2 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>





## MPI communication in DOLFINx
When constructing a mesh in DOLFINx, an MPI communicator must be passed. The mesh is then partitioned by distributing its cells over the processes in that communicator, and each process stores only its own part of the mesh.

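The optional parameter `ghost_mode` controls which cells near the partition boundary are duplicated as ghost cells, i.e. cells owned by a neighboring process of which the local process keeps a copy. With `GhostMode.none` no ghost cells are created; with `GhostMode.shared_facet`, which we use here, each process also holds the cells of its neighbors that share a facet with one of its own cells. Vertices and degrees of freedom on the partition boundary are likewise split into entries owned by the local process and ghost entries owned by a neighbor.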
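To make the `shared_facet` rule concrete, here is a small pure-Python sketch (not DOLFINx code). It builds the 8 triangles of a 2 × 2 unit-square mesh, assigns them to two hypothetical ranks by the x-coordinate of the cell centroid, and lists for each rank the cells owned by the other rank that share a facet with one of its own cells. The diagonal direction, the cell numbering and the partitioning rule are assumptions made for illustration; DOLFINx uses a graph partitioner and its own numbering, so the indices will differ from the DOLFINx output further down.

```python
from itertools import combinations

def unit_square_triangles(nx, ny):
    """Vertices and triangles of a structured mesh of the unit square.

    Each rectangle is split along the diagonal from its lower-left to its upper-right corner
    (an assumption made for this sketch).
    """
    coords = [(i / nx, j / ny) for j in range(ny + 1) for i in range(nx + 1)]
    cells = []
    for j in range(ny):
        for i in range(nx):
            v0 = j * (nx + 1) + i   # lower-left corner
            v1 = v0 + 1             # lower-right corner
            v2 = v0 + (nx + 1)      # upper-left corner
            v3 = v2 + 1             # upper-right corner
            cells.append((v0, v1, v3))
            cells.append((v0, v2, v3))
    return coords, cells

def facets(cell):
    """The three edges (facets) of a triangle as sorted vertex pairs."""
    return {tuple(sorted(e)) for e in combinations(cell, 2)}

def shared_facet_ghosts(cells, owner, rank):
    """Cells owned by other ranks that share a facet with a cell owned by `rank`."""
    owned_facets = set()
    for c, cell in enumerate(cells):
        if owner[c] == rank:
            owned_facets |= facets(cell)
    return sorted(c for c, cell in enumerate(cells)
                  if owner[c] != rank and facets(cell) & owned_facets)

coords, cells = unit_square_triangles(2, 2)

# Hypothetical partition: cells whose centroid lies left of x = 0.5 go to rank 0
owner = []
for cell in cells:
    cx = sum(coords[v][0] for v in cell) / 3
    owner.append(0 if cx < 0.5 else 1)

for rank in (0, 1):
    owned = [c for c in range(len(cells)) if owner[c] == rank]
    print(rank, "owned:", owned, "ghosts:", shared_facet_ghosts(cells, owner, rank))
# 0 owned: [0, 1, 4, 5] ghosts: [3, 7]
# 1 owned: [2, 3, 6, 7] ghosts: [0, 4]
```

Only the two triangles touching the vertical interface x = 0.5 are copied to the neighbor; with `GhostMode.none` the ghost lists would simply be empty.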

In [10]:
%%px
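import dolfinx as dfx
import dolfinx.fem.petsc  # PETSc-based assembly routines (dfx.fem.petsc) used below
import ufl
from mpi4py import MPI  # In [1] ran only in the local kernel, so MPI must be imported on the engines as well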

comm = MPI.COMM_WORLD # MPI communicator

# Define a function used to print stuff with the processor rank number in front
def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

Nx, Ny = 2, 2 # Number of cells in the x- and y-directions

# Create a unit square mesh
mesh = dfx.mesh.create_unit_square(comm, Nx, Ny, ghost_mode = dfx.cpp.mesh.GhostMode.shared_facet)

Apart from the cell-to-vertex connectivity, the connectivities between mesh entities (cells, facets and vertices) are not computed when the mesh is created; they must be created explicitly when needed. If the problem at hand does not require, e.g., the mapping between cells and facets, leaving it out saves computation time and memory. Let us create the mapping from cells (dimension 2 for the unit square) to facets (dimension 1) and print it to see how it is distributed over the two processes. Each process lists 6 cells: the 4 cells it owns and 2 ghost cells.

In [11]:
%%px
mesh.topology.create_connectivity(2, 1)
print("Cell (dim = 2) to facet (dim = 1) connectivity:")
mpi_print(mesh.topology.connectivity(2, 1))

[stdout:1] Cell (dim = 2) to facet (dim = 1) connectivity:
Rank 1: <AdjacencyList> with 6 nodes
  0: [0 8 7 ]
  1: [2 1 0 ]
  2: [4 1 3 ]
  3: [6 5 2 ]
  4: [9 8 10 ]
  5: [12 5 11 ]



[stdout:0] Cell (dim = 2) to facet (dim = 1) connectivity:
Rank 0: <AdjacencyList> with 6 nodes
  0: [5 4 0 ]
  1: [6 1 5 ]
  2: [3 1 2 ]
  3: [7 8 6 ]
  4: [10 4 9 ]
  5: [12 8 11 ]



The ghost cells of each process are stored in the index map of the mesh topology for the cell dimension. Their indices are given in the global numbering:

In [12]:
%%px
mpi_print(f"Ghost cells (global numbering): {mesh.topology.index_map(2).ghosts}")

[stdout:0] Rank 0: Ghost cells (global numbering): [4 7]


[stdout:1] Rank 1: Ghost cells (global numbering): [0 3]


## Dolfinx function spaces
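The degrees of freedom (dofs) of a finite element function space in DOLFINx are distributed over the processes together with the mesh entities they are attached to; for first-order Lagrange elements there is one dof per mesh vertex. To illustrate, we create a function space with first-order Lagrange elements and print the global and local sizes of its dofmap, as well as its ghost dofs (in global numbering). The local size counts only the dofs owned by the process, so the local sizes add up to the global size (4 + 5 = 9).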

In [13]:
%%px
V = dfx.fem.FunctionSpace(mesh, ("Lagrange", 1))

mpi_print(f"Global dofmap size: {V.dofmap.index_map.size_global}")
mpi_print(f"Local dofmap size: {V.dofmap.index_map.size_local}")
mpi_print(f"Ghosts: {V.dofmap.index_map.ghosts}")

[stdout:1] Rank 1: Global dofmap size: 9
Rank 1: Local dofmap size: 5
Rank 1: Ghosts: [0 1 2]


[stdout:0] Rank 0: Global dofmap size: 9
Rank 0: Local dofmap size: 4
Rank 0: Ghosts: [5 8 4 6]
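The local numbering on each process puts the owned dofs first and the ghosts after them. The sketch below is a hypothetical, minimal stand-in for a distributed index map (not the DOLFINx `IndexMap` class, although it mirrors attribute names such as `local_range`, `size_local`, `num_ghosts` and `ghosts`). It assumes that each rank owns a contiguous range of global indices, [0, 4) on rank 0 and [4, 9) on rank 1, which is consistent with the sizes and ghosts printed above.

```python
class ToyIndexMap:
    """Hypothetical, minimal model of a distributed index map (not the DOLFINx class).

    Local indices 0 .. size_local-1 are owned and map to a contiguous global range;
    local indices size_local .. size_local+num_ghosts-1 are ghosts.
    """

    def __init__(self, local_range, ghosts):
        self.local_range = local_range  # (first, last) owned global indices, last exclusive
        self.ghosts = list(ghosts)      # global indices of the ghost entries

    @property
    def size_local(self):
        return self.local_range[1] - self.local_range[0]

    @property
    def num_ghosts(self):
        return len(self.ghosts)

    def local_to_global(self, local_index):
        if local_index < self.size_local:
            return self.local_range[0] + local_index
        return self.ghosts[local_index - self.size_local]

# Values taken from the output above (2 x 2 mesh, first-order Lagrange, two ranks)
rank0 = ToyIndexMap((0, 4), [5, 8, 4, 6])
rank1 = ToyIndexMap((4, 9), [0, 1, 2])

for rank, imap in enumerate((rank0, rank1)):
    n_local = imap.size_local + imap.num_ghosts
    print(rank, [imap.local_to_global(i) for i in range(n_local)])
# 0 [0, 1, 2, 3, 5, 8, 4, 6]
# 1 [4, 5, 6, 7, 8, 0, 1, 2]
```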


## Dolfinx functions
The degrees of freedom of a DOLFINx function are distributed in the same way as those of its function space, since a function inherits the dofmap of the space it lives in. We create a function in the previously defined space $V$ and print the sizes of its array:

In [14]:
%%px
u = dfx.fem.Function(V)
mpi_print(f"Local size of array: {u.x.map.size_local}")
mpi_print(f"Global size of array: {u.x.map.size_global}")

[stdout:1] Rank 1: Local size of array: 5
Rank 1: Global size of array: 9


[stdout:0] Rank 0: Local size of array: 4
Rank 0: Global size of array: 9


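Since the function is scalar-valued and uses first-order Lagrange elements, there is one value per mesh vertex, so the global size equals the number of vertices (9 for the 2 × 2 mesh). The local array `u.x.array` additionally stores the ghost entries after the owned ones, so on each process it holds `size_local + num_ghosts` entries. For a two-dimensional vector-valued function, each node would carry two values (block size 2), and the array sizes would double.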
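The relation between owned entries, ghost entries and the block size can be sketched with NumPy. The sketch assumes the layout DOLFINx uses for blocked (vector-valued) spaces: the `bs` components of a node are stored next to each other, so the local array has `(size_local + num_ghosts) * bs` entries and can be viewed as one row per node with `reshape(-1, bs)`. In DOLFINx the block size is available as `V.dofmap.index_map_bs`. The counts 4 + 4 and 5 + 3 are the owned and ghost sizes printed above; the vector-valued function is hypothetical.

```python
import numpy as np

def local_array_length(size_local, num_ghosts, bs=1):
    """Number of entries in the local array: owned plus ghost nodes, bs values per node."""
    return (size_local + num_ghosts) * bs

# Scalar P1 function from the example: 4 owned + 4 ghost nodes on rank 0, 5 + 3 on rank 1
print(local_array_length(4, 4), local_array_length(5, 3))  # 8 8

# Hypothetical vector-valued function (bs = 2) with the same node layout as rank 0
bs = 2
array = np.arange(local_array_length(4, 4, bs), dtype=float)  # stand-in for u.x.array
per_node = array.reshape(-1, bs)  # row i holds the (x, y) components of local node i
print(per_node.shape)  # (8, 2)

# Owned nodes come first, ghost nodes after them
owned, ghost = per_node[:4], per_node[4:]
```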

We can also print the ghost entries (in global numbering) and the rank of the process that owns each of them:

In [15]:
%%px
mpi_print(f"Ghosts: {u.x.map.ghosts}")
mpi_print(f"Ghost owners: {u.x.map.owners}")

[stdout:0] Rank 0: Ghosts: [5 8 4 6]
Rank 0: Ghost owners: [1 1 1 1]


[stdout:1] Rank 1: Ghosts: [0 1 2]
Rank 1: Ghost owners: [0 0 0]
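The ghost indices and their owners are all that is needed to set up the communication pattern. The plain-Python sketch below hard-codes the values printed above and inverts the ghost-to-owner map, giving for each owning rank the global indices it must send to each neighbor in a forward scatter (owner to ghost). A reverse scatter uses the same lists with the direction flipped. This is a hypothetical illustration of the idea, not DOLFINx's internal implementation.

```python
from collections import defaultdict

# Ghosts (global indices) and their owning ranks, as printed above
ghosts = {0: [5, 8, 4, 6], 1: [0, 1, 2]}
owners = {0: [1, 1, 1, 1], 1: [0, 0, 0]}

def forward_send_lists(ghosts, owners):
    """For each owning rank: {receiving rank: [global indices to send]} in a forward scatter."""
    sends = defaultdict(dict)
    for rank, ghost_list in ghosts.items():
        for g, owner in zip(ghost_list, owners[rank]):
            sends[owner].setdefault(rank, []).append(g)
    return dict(sends)

send = forward_send_lists(ghosts, owners)
print(send)  # {1: {0: [5, 8, 4, 6]}, 0: {1: [0, 1, 2]}}
```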


## Assembling scalars, vectors, matrices in parallel
To solve a continuous problem numerically, we assemble the linear system of equations that arises from its discretization. When several processes are used, each process assembles only the contributions from the cells it owns. Entries associated with dofs shared between processes therefore receive contributions on more than one process, and these must be communicated and combined. We start by creating trial and test functions $u$ and $v$ on the function space $V$.

In [16]:
%%px

# Trial and test functions
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)

Let us consider a linear form
$$L(v) = \int_{\Omega}f v dx $$
where $v$ is a test function, $\Omega$ is the domain discretized by our mesh and $f$ is a scalar-valued function; here we take $f = 1$. The test function belongs to the first-order continuous Lagrange space $V$, and the form is assembled into a vector as follows:

In [17]:
%%px

import ufl

# UFL form of right-hand side
L = ufl.inner(1.0, v) * ufl.dx
L = dfx.fem.form(L)

# Assemble UFL form into a vector
_b = dfx.fem.Function(V)
dfx.fem.petsc.assemble_vector(_b.vector, L)
_b.x.scatter_forward()
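For $f = 1$ and first-order Lagrange elements, each triangle $K$ contributes $\int_K \phi_i \, dx = |K|/3$ to each of its three vertices. On the 2 × 2 mesh every triangle has area 1/8, so each cell adds 1/24 ≈ 0.0417 to each of its vertices, and the complete entry of a vertex is 1/24 times the number of cells around it. The NumPy sketch below assembles this serially on a hypothetical, hard-coded copy of the mesh (the diagonal direction is an assumption) as a reference for what the fully communicated vector should contain.

```python
import numpy as np

# Hypothetical copy of the 2 x 2 unit-square mesh: 3 x 3 vertices numbered row by row
coords = np.array([(i / 2, j / 2) for j in range(3) for i in range(3)])
cells = np.array([(0, 1, 4), (0, 3, 4), (1, 2, 5), (1, 4, 5),
                  (3, 4, 7), (3, 6, 7), (4, 5, 8), (4, 7, 8)])

def triangle_area(p0, p1, p2):
    return 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1]))

f = 1.0
b = np.zeros(len(coords))
for cell in cells:
    area = triangle_area(*coords[cell])
    # Integral of f * phi_i over the cell is f * |K| / 3 for each P1 hat function phi_i
    b[cell] += f * area / 3

print(np.round(b * 24).astype(int).reshape(3, 3))  # number of cells around each vertex
print(b.sum())  # the area of the unit square, up to rounding
```

The printed counts are 2, 3, 1 / 3, 6, 3 / 1, 3, 2, so the entries are 1/24, 2/24, 3/24 = 0.125 or 6/24 = 0.25, and they sum to 1.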


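After assembly, each process has added only the contributions from the cells it owns. The entries for vertices on the partition boundary are therefore incomplete: the owning process holds part of the sum, and the remaining contributions sit in the ghost entries of the neighboring process. These contributions must be communicated before the vector can be used. Note that the `scatter_forward()` call at the end of the previous cell ran before this accumulation and overwrote each ghost entry with the owner's partial value, so in the reverse update of the next cell each owner adds its own partial value to itself instead of the neighbor's contribution. In a typical assembly, `scatter_reverse(dfx.la.ScatterMode.add)` is called first and `scatter_forward()` afterwards. The next cell prints the vector and then performs both updates in that order: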

In [18]:
%%px
# Print the size of the index map and the number of ghost nodes
#print(V.dofmap.index_map.size_local*V.dofmap.index_map_bs, V.dofmap.index_map.num_ghosts*V.dofmap.index_map_bs)
print("Prior to communication")
mpi_print(_b.x.array)  

# Add values from ghost regions and accumulate them on the owning process
_b.x.scatter_reverse(dfx.la.ScatterMode.add)

# PETSc equivalent: _b.vector.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)

print("After ADD/REVERSE update")
print(f"Rank: {comm.rank}: {_b.x.array}")   

# The ghost entries have not been updated yet, so they are inconsistent with the owned values.
# Copy the values from the owning process to the ghosts
_b.x.scatter_forward()
# PETSc equivalent: _b.vector.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

print("After INSERT/FORWARD update")
print(f"Rank: {comm.rank}: {_b.x.array}")   


[stdout:1] Prior to communication
Rank 1: [0.125      0.125      0.125      0.04166667 0.04166667 0.04166667
 0.125      0.125     ]
After ADD/REVERSE update
Rank: 1: [0.25       0.25       0.25       0.04166667 0.08333333 0.04166667
 0.125      0.125     ]
After INSERT/FORWARD update
Rank: 1: [0.25       0.25       0.25       0.04166667 0.08333333 0.08333333
 0.25       0.25      ]


[stdout:0] Prior to communication
Rank 0: [0.04166667 0.125      0.125      0.04166667 0.125      0.04166667
 0.125      0.125     ]
After ADD/REVERSE update
Rank: 0: [0.08333333 0.25       0.25       0.04166667 0.125      0.04166667
 0.125      0.125     ]
After INSERT/FORWARD update
Rank: 0: [0.08333333 0.25       0.25       0.04166667 0.25       0.08333333
 0.25       0.25      ]


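The reverse update must be applied only once after assembly. Running it a second time, as in the next cell, sends the ghost values to the owners again and adds them to the owned entries once more, so the shared entries grow further (compare with the output above):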

In [19]:
%%px

# Add values from ghost regions and accumulate them on the owning process
_b.x.scatter_reverse(dfx.la.ScatterMode.add)

print("After ADD/REVERSE update")
mpi_print(_b.x.array)   

[stdout:1] After ADD/REVERSE update
Rank 1: [0.5        0.5        0.5        0.04166667 0.16666667 0.08333333
 0.25       0.25      ]


[stdout:0] After ADD/REVERSE update
Rank 0: [0.16666667 0.5        0.5        0.04166667 0.25       0.08333333
 0.25       0.25      ]


The forward update then copies the owned values to the ghost entries of the neighboring process, so that both processes again hold the same values for the shared entries:

In [20]:
%%px

# Get value from owning process and update the ghosts
_b.x.scatter_forward()

print("After INSERT/FORWARD update")
mpi_print(_b.x.array)

[stdout:0] After INSERT/FORWARD update
Rank 0: [0.16666667 0.5        0.5        0.04166667 0.5        0.16666667
 0.5        0.5       ]


[stdout:1] After INSERT/FORWARD update
Rank 1: [0.5        0.5        0.5        0.04166667 0.16666667 0.16666667
 0.5        0.5       ]
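The order of the two updates matters. The pure-Python sketch below models two hypothetical ranks that store dictionaries of vertex values for the same 2 × 2 mesh, with an assumed partition (cells in the left half on rank 0) and an assumed ownership rule (interface vertices belong to rank 0). It compares a reverse add followed by a forward insert, which reproduces the serial load vector, with the opposite order, where the ghost contributions are overwritten before they reach the owner. It models the communication pattern only and is not the DOLFINx implementation.

```python
import numpy as np

# Hypothetical 2 x 2 unit-square mesh: 9 vertices numbered row by row, 8 triangles
coords = np.array([(i / 2, j / 2) for j in range(3) for i in range(3)])
cells = [(0, 1, 4), (0, 3, 4), (1, 2, 5), (1, 4, 5),
         (3, 4, 7), (3, 6, 7), (4, 5, 8), (4, 7, 8)]
cell_owner = [0, 0, 1, 1, 0, 0, 1, 1]  # assumed partition: cells in the left half on rank 0
# Assumed ownership rule: vertices with x <= 0.5, including the interface, belong to rank 0
vertex_owner = [0 if x <= 0.5 else 1 for x, _ in coords]

def area(cell):
    (x0, y0), (x1, y1), (x2, y2) = coords[list(cell)]
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def assemble_local():
    """Each rank integrates f = 1 against the P1 hat functions over its own cells only."""
    local = [{}, {}]  # per rank: global vertex index mapped to its local value (owned or ghost)
    for c, cell in enumerate(cells):
        r = cell_owner[c]
        for v in cell:
            local[r][v] = local[r].get(v, 0.0) + area(cell) / 3
    return local

def scatter_reverse_add(local):
    """Send every ghost value to the owning rank and add it to the owned entry."""
    for r, values in enumerate(local):
        for v, val in values.items():
            if vertex_owner[v] != r:
                local[vertex_owner[v]][v] += val

def scatter_forward(local):
    """Overwrite every ghost value with the current value on the owning rank."""
    for r, values in enumerate(local):
        for v in values:
            if vertex_owner[v] != r:
                values[v] = local[vertex_owner[v]][v]

# Serial reference: one process assembling over all cells
serial = np.zeros(len(coords))
for cell in cells:
    serial[list(cell)] += area(cell) / 3

correct = assemble_local()
scatter_reverse_add(correct)  # 1. accumulate the ghost contributions on the owners
scatter_forward(correct)      # 2. make the ghost copies consistent again

wrong = assemble_local()
scatter_forward(wrong)        # ghost contributions are overwritten before being sent
scatter_reverse_add(wrong)    # owners now add their own partial values to themselves

print("vertex 1, correct order:", correct[0][1], "serial:", serial[1])
print("vertex 1, wrong order:  ", wrong[0][1])
```

With the correct order every local entry, owned or ghost, equals the serial value; with the wrong order vertex 1 ends up with 2/24 instead of 3/24.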


## Putting it all together: Solving a variational problem
We will consider solving a Poisson problem on the unit square domain, denoted $\Omega$. The strong form of the problem is: determine $u$ such that
\begin{align}
    -\nabla^2 u &= f \quad \mathrm{in} \ \Omega, \\
    u &= g \quad \mathrm{on} \ \partial\Omega,
\end{align}
where $\partial\Omega$ is the boundary of the domain. The weak form of the problem is derived by multiplying the PDE with a test function $v$, integrating over the domain and applying integration by parts. This yields

$$\int_{\Omega} \nabla u \cdot \nabla v dx = \int_{\Omega}f v dx$$

where the boundary integral vanishes because the test functions are chosen to vanish on $\partial\Omega$, where the Dirichlet condition prescribes $u$. For simplicity we set $g = 0$.
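Written out, the integration-by-parts step uses Green's first identity, with $n$ denoting the outward unit normal on $\partial\Omega$:

$$-\int_{\Omega} (\nabla^2 u)\, v \, dx = \int_{\Omega} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \, ds = \int_{\Omega} f v \, dx.$$

For test functions with $v = 0$ on $\partial\Omega$ the boundary integral drops out, which leaves the weak form stated above.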

The finite element problem can now be defined as: find $u_h \in V_h$ such that
$$a(u_h, v_h) = L(v_h) \quad \forall \, v_h \in V_h,$$
where $V_h$ is the finite element space and
$$a(u, v) = \int_{\Omega} \nabla u \cdot \nabla v dx$$
and
$$L(v) = \int_{\Omega}f v dx.$$
The subscript $h$ emphasizes that the variables are defined on a discrete mesh.

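PETSc is used as the linear algebra backend to solve the linear system of equations that arises from the discretized weak form. Here a PETSc KSP object of type `preonly` with an LU preconditioner (factorization computed by MUMPS) is used, i.e. a direct solver. For more information on KSP solvers and their options, see: https://petsc.org/release/.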
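On the 2 × 2 mesh the linear system is small enough to reproduce by hand. The NumPy sketch below is a serial, dense stand-in for the PETSc pipeline and uses neither PETSc nor DOLFINx: on a hypothetical, hard-coded copy of the mesh (diagonal direction assumed) it assembles the P1 stiffness matrix and load vector, finds the boundary vertices as the endpoints of edges that belong to only one triangle, imposes $u = g = 0$ by zeroing the boundary rows and columns and putting 1 on the diagonal, and solves with a dense direct solver in place of LU/MUMPS. This is similar in spirit to what `assemble_matrix(..., bcs=...)`, `apply_lifting` and `set_bc` achieve for $g = 0$. Only the centre vertex is free, with $A = 4$ and $b = 6/24 = 0.25$, giving $u = 0.0625$, the value that appears in the solution arrays printed by the DOLFINx cell in this section.

```python
from collections import Counter
from itertools import combinations
import numpy as np

# Hypothetical copy of the 2 x 2 unit-square mesh (3 x 3 vertices numbered row by row)
coords = np.array([(i / 2, j / 2) for j in range(3) for i in range(3)])
cells = [(0, 1, 4), (0, 3, 4), (1, 2, 5), (1, 4, 5),
         (3, 4, 7), (3, 6, 7), (4, 5, 8), (4, 7, 8)]
n = len(coords)
f = 1.0

A = np.zeros((n, n))
b = np.zeros(n)
for cell in cells:
    p = coords[list(cell)]
    # Jacobian of the affine map from the reference triangle (columns p1 - p0, p2 - p0)
    J = np.array([p[1] - p[0], p[2] - p[0]]).T
    area = 0.5 * abs(np.linalg.det(J))
    # Gradients of the three P1 basis functions (one per column), constant on the cell
    grads = np.linalg.inv(J).T @ np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
    A[np.ix_(cell, cell)] += area * grads.T @ grads  # a(u, v) = int grad u . grad v dx
    b[list(cell)] += f * area / 3                    # L(v) = int f v dx

# Boundary vertices: endpoints of edges that belong to exactly one triangle
edge_count = Counter(tuple(sorted(e)) for cell in cells for e in combinations(cell, 2))
boundary = sorted({v for e, count in edge_count.items() if count == 1 for v in e})

# Impose u = g = 0: zero boundary rows and columns, unit diagonal, zero right-hand side
A[boundary, :] = 0.0
A[:, boundary] = 0.0
A[boundary, boundary] = 1.0
b[boundary] = 0.0

u = np.linalg.solve(A, b)  # dense direct solve in place of PETSc LU/MUMPS
print(boundary)            # [0, 1, 2, 3, 5, 6, 7, 8]
print(u.reshape(3, 3))
```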

To visualize the solution, we use pyvista (https://docs.pyvista.org/). For a simple introduction to defining and solving variational problems with FEniCSx, see https://jsdokken.com/dolfinx-tutorial/.

In [21]:
%%px

import ufl
from petsc4py import PETSc
import pyvista as pv

comm = MPI.COMM_WORLD # MPI communicator

Nx, Ny = 2, 2 # Number of cells in the x- and y-directions

# Create a unit square mesh
mesh = dfx.mesh.create_unit_square(comm, Nx, Ny, ghost_mode = dfx.cpp.mesh.GhostMode.none)
mesh.topology.create_entities(mesh.topology.dim - 1)
mesh.topology.create_connectivity(mesh.topology.dim - 1, mesh.topology.dim)

# Create a first-order Lagrange finite element space
V = dfx.fem.FunctionSpace(mesh, ("CG", 1))

# Trial and test functions
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)

u_h = dfx.fem.Function(V) # Solution function

f = dfx.fem.Function(V) # Source term
f.x.set(1) # Set function value to 1


# UFL form of the bilinear form
a = ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx
bilinear_form = dfx.fem.form(a)

# UFL form of right-hand side
L = f * v * ufl.dx
linear_form = dfx.fem.form(L)

# Boundary condition function
g = dfx.fem.Function(V) # Dolfinx function, default function value = 0

# Get the dofs of the boundary facets
boundary_facets = dfx.mesh.exterior_facet_indices(mesh.topology)
boundary_dofs   = dfx.fem.locate_dofs_topological(V, mesh.topology.dim - 1, boundary_facets)
bc_g = dfx.fem.dirichletbc(g, boundary_dofs)

bcs = [bc_g]

# Assemble matrix from the bilinear form
A = dfx.fem.petsc.assemble_matrix(bilinear_form, bcs = bcs)
A.assemble()

# Assemble UFL form into a vector
_b = dfx.fem.Function(V) # Dolfinx function of right-hand side
dfx.fem.petsc.assemble_vector(_b.vector, linear_form)
dfx.fem.petsc.apply_lifting(_b.vector, [bilinear_form], bcs = [bcs])
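# Accumulate the ghost contributions on the owning processes before applying the boundary conditions
_b.x.scatter_reverse(dfx.la.ScatterMode.add)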
dfx.fem.petsc.set_bc(_b.vector, bcs = bcs)

# Create a (direct) linear solver
solver = PETSc.KSP().create(mesh.comm)
solver.setOperators(A)
solver.setType("preonly")
solver.getPC().setType("lu")
solver.getPC().setFactorSolverType("mumps")

# Solve the variational problem
solver.solve(_b.vector, u_h.vector)

# Print the number of owned dofs and the number of ghost dofs on this process
print(V.dofmap.index_map.size_local*V.dofmap.index_map_bs, V.dofmap.index_map.num_ghosts*V.dofmap.index_map_bs)

print("Prior to communication")
print(f"Rank: {comm.rank}: {u_h.x.array}")   
# Get value from owning process and update the ghosts
u_h.x.scatter_forward()


print("After scatter_forward update")
print(f"Rank: {comm.rank}: {u_h.x.array}")   

# Visualize the solution
topology, cell_types, x = dfx.plot.create_vtk_mesh(V)
grid = pv.UnstructuredGrid(topology, cell_types, x)

# Set output data
grid.point_data["u"] = u_h.x.array.real
grid.set_active_scalars("u")

# Create a pyvista plotter object and plot the datagrid
pl = pv.Plotter()
pl.add_text(f"Rank: {comm.rank}", font_size = 12)
pl.add_mesh(grid)
pl.show()

[stdout:1] 5 1
Prior to communication
Rank: 1: [0.     0.0625 0.     0.     0.     0.    ]
After scatter_forward update
Rank: 1: [0.     0.0625 0.     0.     0.     0.    ]


[stdout:0] 4 2
Prior to communication
Rank: 0: [0. 0. 0. 0. 0. 0.]
After scatter_forward update
Rank: 0: [0.     0.     0.     0.     0.0625 0.    ]


[output:0] Interactive pyvista plot of the solution on the part of the mesh stored on rank 0

[output:1] Interactive pyvista plot of the solution on the part of the mesh stored on rank 1



In [1]:
# Stop the cluster
cluster.stop_cluster_sync()