Skip to content


Folders and files

Last commit message
Last commit date

Latest commit



69 Commits

Repository files navigation

An Implicitly Parallel Meshfree Solver in Regent


This project implements a meshfree solver for inviscid, compressible fluid flows using q-LSKUM in the implicitly parallel, high-level programming language Regent. This is a first step towards our aim of developing a hybrid, production-level CFD code that is able to utilize the full power of today's heterogeneous supercomputers by simultaneously executing on both CPUs and GPGPUs.


R. Soi, N. R. Mamidi, E. Slaughter, K. Prasun, A. Nemili and S. M. Deshpande, "An Implicitly Parallel Meshfree Solver in Regent," 2020 IEEE/ACM 3rd Annual Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM), Atlanta, GA, USA, 2020


Python 3.x


For PAW-ATM 2020

All tests were done with the Control Replication branch of Legion at 0baf4f44b2e744af26565e2b36dba7fea386d857 installed with OpenMP support.


Input File Generation

Most input files can't be provided here due to GitHub's file size contraints. The scripts directory contains various pre-processing scripts to transform files output from the partitioner to the correct format, but the entire process is somewhat clunky:

  • Run on the partitioner output to generate a grid file
  • Run on the partitioner output to generate a bitmask file
  • Run on the output of the above two steps to sort the grid according to the bitmask (this is done to compact memory instances). This generates the final input file.

Sample Input File

The finest grid file (40 M points) partGrid40M_16_b can be downloaded here. It has been partitioned into 16 subgrids by METIS. The code has already been configured for this particular file, so you just need to put it inside the grids sub-directory.

Code Setup

Most parameters, including number of partitions, input file path, physical constants etc, can be configured inside src/config.rg. Please ensure that the correct file path and number of partitions is specified.


To run, do python3 ../legion/language/ src/meshfree_solver.rg followed by the required flags. We document some useful flags below. It is generally recommended to launch a single Regent (Legion) process per node.

SNo. Flag Effect
1 -fopenmp 0 Suppress OpenMP code generation
2 -fcuda 0 Suppress CUDA code generation
3 -ll:cpu N Use N CPU cores per node
4 -ll:csize m Use m MB of RAM per node
5 -ll:gpu N Use N GPUs per node
6 -ll:fsize m Use m MB of GPU framebuffer memory per GPU
7 -level runtime=5 Suppress most runtime warnings

Further, the number of iterations and inner iterations can be set by appending --iter X --inner-iter Y to the flags. We generally set X = 1000, Y = 3.

For CPU runs (without OpenMP), the number of partitions should be equal to the number of CPU cores for optimal performance.

For Regent + OpenMP, we generally do -ll:ocpu 1 -ll:onuma 1 -ll:othr 30 which gives us 30 OpenMP cores per socket. You should experiment with this depending on the configuration available to you. Additionally, we have noticed optimal performance with number of partitions equal to the number of sockets (generally two per node).

Note that Legion requires a few CPU cores for runtime dependence analysis, so never assign all cores using -ll:cpu etc.


Please contact the author for any help in running the code or obtaining other input files.


Rupanshu Soi, Department of Computer Science, Birla Institute of Technology and Science, Pilani at Hyderabad, India.

Email: or