# LBM


<figure>
  <img src="https://raw.githubusercontent.com/autodesk/xlb/main/assets/wind_turbine.gif" alt="Wind turbine from XLB library">
  <figcaption><strong>Figure 1: Wind turbine from XLB library</strong></figcaption>
</figure>

The Lattice Boltzmann method (LBM) is a relatively new numerical technique in computational fluid dynamics (CFD). Because each update involves only a lattice node and its nearest neighbors, it scales extremely well on large-scale HPC systems. We will explore how some of the multi-GPU programming techniques presented in the previous exercises can be applied to an LBM-based solver.

Next, we introduce some basic concepts of LBM for a 2D problem. For general information, see the Wikipedia page: [Lattice Boltzmann methods](https://en.wikipedia.org/wiki/Lattice_Boltzmann_methods).

Part of the code in these exercises has been adapted from the XLB library—a scalable, differentiable, open-source Python library for LBM developed by Autodesk. To run and profile more complex problems, check out its GitHub page: [XLB](https://github.com/Autodesk/XLB).



## Lattice Boltzmann Method 
In the Lattice Boltzmann Method (LBM), the state of the fluid at each lattice node is represented by a set of particle distribution functions, $f_i(\mathbf{x}, t)$. Each $f_i$ can be thought of as the probability (or, more precisely, the expected number) of finding a fluid “particle” at position $\mathbf{x}$ and time $t$ that is moving along the discrete velocity direction $\mathbf{e}_i$. Rather than tracking individual molecules, LBM evolves these distributions through collision and streaming steps. In the collision step, the distributions at each node relax toward a local equilibrium—the Maxwell–Boltzmann distribution projected onto the discrete velocity set—ensuring that mass and momentum are conserved. In the streaming step, these post‐collision distributions propagate to neighboring nodes, advecting the particle probabilities across the lattice.

Because each distribution $f_i$ carries a fraction of the local density and momentum, macroscopic properties are retrieved simply by summing over all directions:

$$
\rho(\mathbf{x},t) \;=\; \sum_{i} f_i(\mathbf{x},t),
\qquad
\rho(\mathbf{x},t)\,\mathbf{u}(\mathbf{x},t) \;=\; \sum_{i} f_i(\mathbf{x},t)\,\mathbf{e}_i.
$$
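As a quick illustration, the two sums above can be written with NumPy for a population array of shape `(nx, ny, 9)`. This is a hedged sketch, not the **lbm** library's implementation; the velocity ordering is copied from the D2Q9 array (`params.c_host`) printed later in this notebook.

```python
import numpy as np

# D2Q9 velocity set in the same column order as params.c_host (shape (2, 9))
c = np.array([[0, 0, 0, 1, -1, 1, -1, 1, -1],
              [0, 1, -1, 0, 1, -1, 0, 1, -1]])

def macroscopic(f):
    """Return density rho (nx, ny) and velocity u (2, nx, ny) from f of shape (nx, ny, 9)."""
    rho = f.sum(axis=-1)                        # rho = sum_i f_i
    momentum = np.einsum("xyq,dq->dxy", f, c)   # rho * u = sum_i f_i e_i
    u = momentum / rho                          # assumes rho > 0 everywhere
    return rho, u

# Example: one cell with a rest population and a population moving along e_3 = (1, 0)
f = np.zeros((1, 1, 9))
f[0, 0, 0] = 0.8
f[0, 0, 3] = 0.2
rho, u = macroscopic(f)
print(rho[0, 0], u[:, 0, 0])  # density 1.0, velocity (0.2, 0.0)
```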

This probabilistic interpretation ties LBM to kinetic theory: collisions model how particle velocities redistribute toward equilibrium through local interactions, and streaming moves those probabilities through space. The update itself is deterministic; the statistics lie in what the $f_i$ represent. It's this combination of a kinetic description with purely local collisions and nearest-neighbor streaming that gives LBM both its physical fidelity and its remarkable parallel scalability.


## Starting Our First LBM Solver

Implementing and optimizing an LBM solver on multi-GPU systems using Warp promises to be an engaging challenge. However, to focus on core computational details without getting bogged down in tedious solver setup, we will leverage a local Python library called **lbm**. This library handles setting up most of the LBM data structures and problem configuration.

We'll start by defining the size of our 2D domain and the number of iterations we want to run.


In [34]:
import lbm
import time
import warp as wp
wp.clear_kernel_cache()
exercise_name = "01_AoS_user"

# Initializing the LBM parameters
params = lbm.Parameters(num_steps=5000,
                        nx=1024,
                        ny=1024,
                        prescribed_vel=0.5,
                        Re=10000.0)
print(params)


LBM Problem Parameters(nx=1024, ny=1024, num_steps=5000, Re=10000.0, prescribed_vel=0.5)
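The `collide` kernel later takes a relaxation rate `omega`. A common way to derive it from `Re` in lattice units combines $\nu = U L / \mathrm{Re}$ with $\nu = c_s^2(\tau - 1/2)$, $c_s^2 = 1/3$, and $\omega = 1/\tau$. The sketch below assumes the characteristic velocity is `prescribed_vel` and the characteristic length is `nx`; the **lbm** library may define these differently.

```python
def relaxation_rate(re, u_lattice, length):
    """Sketch: relaxation rate omega = 1/tau from a Reynolds number, in lattice units."""
    cs2 = 1.0 / 3.0                  # squared lattice speed of sound for D2Q9
    nu = u_lattice * length / re     # kinematic viscosity from Re = U L / nu
    tau = nu / cs2 + 0.5             # nu = cs^2 (tau - 1/2)
    return 1.0 / tau

# Values from the parameters above; using nx as the characteristic length is an assumption
omega = relaxation_rate(re=10000.0, u_lattice=0.5, length=1024)
print(f"omega = {omega:.4f}")
```

For the viscosity to be positive, $\tau > 1/2$, so $\omega$ must lie in $(0, 2)$; high Reynolds numbers push $\omega$ toward 2.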


The **Parameters** class stores several LBM constants, such as the representation of the lattice. Several lattices exist for LBM; the following three are just examples:

<img src="img/lattices.jpg" width="500" height="340">

In our case, we'll use the D2Q9 lattice. Its velocity vectors are stored in the **Parameters** object via the **c_host** and **c_dev** fields. Note that the lattice includes a zero vector representing particles at rest in the center of the cell.

In [38]:
params.c_host.shape

(2, 9)

In [39]:
print(f"D2Q9 \n{params.c_host}")

D2Q9 
[[ 0  0  0  1 -1  1 -1  1 -1]
 [ 0  1 -1  0  1 -1  0  1 -1]]
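Each D2Q9 direction also has a weight, which the equilibrium uses. The standard weights depend only on $|\mathbf{e}_i|^2$: $4/9$ for the rest vector, $1/9$ for axis-aligned vectors, and $1/36$ for diagonals. The sketch below (plain NumPy, with the velocity ordering copied from the output above; the notebook does not show how **lbm** stores its weights) builds them and checks the symmetry properties that make the lattice work: weights sum to 1, the first moment vanishes, and the second moment equals $c_s^2 \mathbf{I}$ with $c_s^2 = 1/3$.

```python
import numpy as np

# D2Q9 velocities, same column order as the params.c_host output above
c = np.array([[0, 0, 0, 1, -1, 1, -1, 1, -1],
              [0, 1, -1, 0, 1, -1, 0, 1, -1]])

# Standard D2Q9 weights, chosen by squared velocity length
norm2 = (c ** 2).sum(axis=0)
w = np.select([norm2 == 0, norm2 == 1, norm2 == 2], [4 / 9, 1 / 9, 1 / 36])

total = w.sum()                                     # close to 1
first_moment = c @ w                                # sum_i w_i e_i, close to (0, 0)
second_moment = np.einsum("q,aq,bq->ab", w, c, c)   # sum_i w_i e_i e_i, close to I / 3
print(total, first_moment, second_moment, sep="\n")
```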


The **Parameters** class also provides, for each lattice direction, the index of its opposite direction (**opp_indices_host**):

In [40]:
a_target_direction = params.c_host[:,2]
its_opposite = params.c_host[:,params.opp_indices_host[2]]
print(f"The opposite of {a_target_direction} is {its_opposite}")

The opposite of [ 0 -1] is [0 1]
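To see what such an opposite-direction table contains, the sketch below rebuilds it from the velocity array alone by searching, for each direction $q$, the column equal to $-\mathbf{e}_q$. It is a NumPy illustration and not necessarily how **lbm** computes `opp_indices_host`.

```python
import numpy as np

# D2Q9 velocities, same column order as params.c_host
c = np.array([[0, 0, 0, 1, -1, 1, -1, 1, -1],
              [0, 1, -1, 0, 1, -1, 0, 1, -1]])

def opposite_indices(c):
    """For each direction q, return the index of the direction -c[:, q]."""
    Q = c.shape[1]
    opp = np.empty(Q, dtype=np.int64)
    for q in range(Q):
        # Columns whose velocity equals the negated velocity of column q
        matches = np.where((c == -c[:, [q]]).all(axis=0))[0]
        opp[q] = matches[0]
    return opp

opp = opposite_indices(c)
print(opp)  # [0 2 1 6 5 4 3 8 7]
```

Index 2, i.e. $(0,-1)$, maps to index 1, i.e. $(0,1)$, consistent with the output above. Tables like this are what bounce-back wall conditions rely on.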


## The LBM Domain

<img src="img/lattice-discretization.jpg" width="500" height="340">

In LBM, we discretize the domain with a Cartesian background grid. To represent the population fields $f_i$, we store one floating-point value per lattice direction for each cell. Therefore, we need a three-dimensional array: two dimensions span the 2D spatial domain and the third indexes the $Q$ lattice directions.

**In the following cell, set the shape of the 3D array according to an Array-of-Structures (AoS) layout, in which the $Q$ populations of each cell are stored contiguously.**

In [41]:
f_0 = wp.zeros((params.nx, params.ny, params.Q), dtype=wp.float64)
f_1 = wp.zeros((params.nx, params.ny, params.Q), dtype=wp.float64)
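Why does shape `(nx, ny, Q)` mean AoS? Assuming row-major (C-order) storage, the default for NumPy and, to our understanding, for Warp arrays, the last index varies fastest in memory. The NumPy sketch below (small illustrative sizes, hypothetical helper names `aos_offset` and `soa_offset`) compares the byte strides of the AoS shape with a Structure-of-Arrays (SoA) shape `(Q, nx, ny)`.

```python
import numpy as np

nx, ny, Q = 4, 3, 9  # small illustrative sizes (the notebook uses 1024 x 1024)

# AoS: shape (nx, ny, Q) -> the Q populations of one cell are adjacent in memory
aos = np.zeros((nx, ny, Q), dtype=np.float64)
# SoA: shape (Q, nx, ny) -> all cells of one direction are adjacent in memory
soa = np.zeros((Q, nx, ny), dtype=np.float64)

print(aos.strides)  # bytes per step along (x, y, q): (216, 72, 8)
print(soa.strides)  # bytes per step along (q, x, y): (96, 24, 8)

def aos_offset(q, x, y):
    """Linear element offset of population q at cell (x, y) in the AoS layout."""
    return (x * ny + y) * Q + q

def soa_offset(q, x, y):
    """Linear element offset of population q at cell (x, y) in the SoA layout."""
    return q * nx * ny + x * ny + y
```

With AoS, two neighboring cells reading the same direction $q$ touch addresses `Q * 8` bytes apart; with SoA they are 8 bytes apart. If neighboring GPU threads handle neighboring cells, this difference affects memory coalescing.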

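To abstract access to the population fields, we define read and write helper functions. Since the kernels below access the populations only through these helpers, the memory layout can be changed in a single place.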

In [42]:
@wp.func
def read_field(field: wp.array3d(dtype=wp.float64), card: wp.int32, xi: wp.int32, yi: wp.int32):
    return field[xi, yi, card]

@wp.func
def write_field(field: wp.array3d(dtype=wp.float64), card: wp.int32, xi: wp.int32, yi: wp.int32,
                value: wp.float64):
    field[xi, yi, card] = value

## Some Helper Functions

In the following, we’ll use the **lbm** library to allocate additional fields for the macroscopic quantities. We aren’t concerned with these fields beyond visualization purposes—indeed, the population fields are the only state variables required for LBM. We will also define several functions and kernels that will serve as black boxes in our LBM solver.

In [43]:
# Initialize the memory
mem = lbm.Memory(params,
                 f_0=f_0,
                 f_1=f_1,
                 read=read_field,
                 write=write_field)

# Initialize the kernels
functions = lbm.Functions(params)
kernels = lbm.Kernels(params, mem)

Q = params.Q
D = params.D
bc_bulk = params.bc_bulk
c_dev = params.c_dev
dim_dev = params.dim_dev

## The LBM Operators
### Streaming

In [None]:
@wp.kernel
def stream(
        f_in: wp.array3d(dtype=wp.float64),
        f_out: wp.array3d(dtype=wp.float64),
):
    # Get the global index
    ix, iy = wp.tid()
    index = wp.vec2i(ix, iy)
    f_post = wp.vec(length=Q, dtype=wp.float64)

    for q in range(params.Q):
        pull_ngh = wp.vec2i(0, 0)
        outside_domain = False

        for d in range(D):
            pull_ngh[d] = index[d] - c_dev[d, q]

            if pull_ngh[d] < 0 or pull_ngh[d] >= dim_dev[d]:
                outside_domain = True
        if not outside_domain:
            f_post[q] = read_field(field=f_in, card=q, xi=pull_ngh[0], yi=pull_ngh[1])

    # Set the output
    for q in range(params.Q):
        write_field(field=f_out, card=q, xi=index[0], yi=index[1], value=f_post[q])
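The kernel implements *pull* streaming: each cell gathers $f_q$ from the neighbor at $\mathbf{x} - \mathbf{e}_q$, and populations whose source lies outside the domain stay at zero until the boundary conditions fill them. For checking results on the CPU, a NumPy reference of the same scheme can be handy. This is a verification sketch assuming the D2Q9 ordering printed earlier; it uses slicing instead of a per-cell loop.

```python
import numpy as np

# D2Q9 velocities, same column order as params.c_host
c = np.array([[0, 0, 0, 1, -1, 1, -1, 1, -1],
              [0, 1, -1, 0, 1, -1, 0, 1, -1]])

def stream_pull(f_in):
    """Pull streaming on an AoS array of shape (nx, ny, Q).

    f_out[x, y, q] = f_in[x - c[0, q], y - c[1, q], q]; entries whose source cell
    lies outside the domain are left at zero, like the Warp kernel above.
    """
    nx, ny, Q = f_in.shape
    f_out = np.zeros_like(f_in)
    for q in range(Q):
        cx, cy = c[0, q], c[1, q]
        # Destination cells whose source (x - cx, y - cy) lies inside the domain
        xd = slice(max(cx, 0), nx + min(cx, 0))
        yd = slice(max(cy, 0), ny + min(cy, 0))
        # Corresponding source cells
        xs = slice(max(-cx, 0), nx + min(-cx, 0))
        ys = slice(max(-cy, 0), ny + min(-cy, 0))
        f_out[xd, yd, q] = f_in[xs, ys, q]
    return f_out

# A population moving along e_3 = (1, 0) from cell (1, 1) ends up at cell (2, 1)
f = np.zeros((4, 4, 9))
f[1, 1, 3] = 1.0
g = stream_pull(f)
```

For example, one could compare `stream_pull(f_0.numpy())` with `f_1.numpy()` after `wp.launch(stream, dim=(params.nx, params.ny), inputs=[f_0, f_1])`.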

### Managing boundary conditions

In [None]:
compute_boundaries = functions.get_apply_boundary_conditions()

In [None]:
@wp.kernel
def apply_boundary_conditions(
        bc_type_field: wp.array2d(dtype=wp.uint8),
        f_out: wp.array3d(dtype=wp.float64),
):
    # Get the global index
    ix, iy = wp.tid()

    bc_type = bc_type_field[ix, iy]
    if bc_type == bc_bulk:
        return

    f = compute_boundaries(bc_type)

    for q in range(params.Q):
        write_field(field=f_out, card=q, xi=ix, yi=iy, value=f[q])
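The **lbm** library hides the actual boundary treatment inside `compute_boundaries`. As an illustration of what such a treatment can look like, the sketch below implements the textbook full-way bounce-back rule for no-slip walls: at a wall cell, every population is replaced by the one travelling in the opposite direction. The opposite-index table is hard-coded for the D2Q9 ordering printed above; this is a standalone NumPy sketch, not necessarily what `functions.get_apply_boundary_conditions()` does.

```python
import numpy as np

# Opposite-direction table for the D2Q9 ordering used in this notebook
opp = np.array([0, 2, 1, 6, 5, 4, 3, 8, 7])

def full_way_bounce_back(f, wall_mask):
    """Reverse every population at wall cells: f[x, y, q] <- f[x, y, opp[q]].

    f has AoS shape (nx, ny, 9); wall_mask is a boolean (nx, ny) array.
    Returns a new array; bulk cells are left unchanged.
    """
    f_new = f.copy()
    f_new[wall_mask] = f[wall_mask][:, opp]
    return f_new

# A population heading into a bottom wall along (0, -1) is sent back along (0, 1)
f = np.zeros((3, 3, 9))
f[1, 0, 2] = 1.0
wall = np.zeros((3, 3), dtype=bool)
wall[:, 0] = True
g = full_way_bounce_back(f, wall)
```

Reversing all populations keeps the cell's density and flips the sign of its momentum, which is how bounce-back enforces a zero velocity at the wall.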

### Collision

In [None]:
compute_macroscopic = functions.get_macroscopic()
compute_equilibrium = functions.get_equilibrium()
compute_collision = functions.get_kbc()

In [None]:
@wp.kernel
def collide(
        f: wp.array3d(dtype=wp.float64),
        omega: wp.float64,
):
    # Get the global index
    ix, iy = wp.tid()
    # Load this cell's post-streaming populations
    f_post_stream = wp.vec(length=Q, dtype=wp.float64)
    for q in range(params.Q):
        f_post_stream[q] = read_field(field=f, card=q, xi=ix, yi=iy)

    # Compute the macroscopic quantities from the populations
    mcrpc = compute_macroscopic(f_post_stream)

    # Compute the equilibrium
    f_eq = compute_equilibrium(mcrpc)

    # Apply the collision operator
    f_post_collision = compute_collision(f_post_stream, f_eq, mcrpc, omega)

    # Set the output
    for q in range(params.Q):
        write_field(field=f, card=q, xi=ix, yi=iy, value=f_post_collision[q])
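The `collide` kernel uses the KBC collision operator provided by **lbm** (`functions.get_kbc()`), whose details are hidden here. For comparison, the sketch below implements the simpler single-relaxation-time BGK operator, $f_i \leftarrow f_i - \omega\,(f_i - f_i^{eq})$, with the standard second-order D2Q9 equilibrium $f_i^{eq} = w_i \rho \left(1 + 3\,\mathbf{e}_i\cdot\mathbf{u} + \tfrac{9}{2}(\mathbf{e}_i\cdot\mathbf{u})^2 - \tfrac{3}{2}|\mathbf{u}|^2\right)$. BGK is swapped in purely for illustration as a NumPy reference; it is not the operator the notebook uses.

```python
import numpy as np

# D2Q9 velocities (params.c_host ordering) and standard weights
c = np.array([[0, 0, 0, 1, -1, 1, -1, 1, -1],
              [0, 1, -1, 0, 1, -1, 0, 1, -1]])
norm2 = (c ** 2).sum(axis=0)
w = np.select([norm2 == 0, norm2 == 1, norm2 == 2], [4 / 9, 1 / 9, 1 / 36])

def equilibrium(rho, u):
    """Second-order D2Q9 equilibrium; rho: (nx, ny), u: (2, nx, ny) -> (nx, ny, 9)."""
    cu = np.einsum("dq,dxy->xyq", c, u)       # e_i . u for every cell and direction
    usq = (u ** 2).sum(axis=0)[..., None]     # |u|^2, broadcast over directions
    return w * rho[..., None] * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)

def collide_bgk(f, omega):
    """BGK collision: relax f toward equilibrium at rate omega (AoS shape (nx, ny, 9))."""
    rho = f.sum(axis=-1)
    u = np.einsum("xyq,dq->dxy", f, c) / rho
    f_eq = equilibrium(rho, u)
    return f - omega * (f - f_eq)
```

Both BGK and KBC conserve mass and momentum during collision; KBC additionally adapts how higher-order moments relax, which improves stability at high Reynolds numbers such as the `Re=10000.0` used here.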