# Hands-on: High Performance Computing applied to Industry

## ☆ Final Exercise: Accelerate and Optimize an N-Body Simulator

An [n-body](https://en.wikipedia.org/wiki/N-body_problem) simulator predicts the individual motions of a group of objects interacting with each other gravitationally. [mini-nbody-sequential.cu] contains a simple, though working, n-body simulator for bodies moving through 3 dimensional space.

In its current CPU-only form, this application takes about 5 seconds to run on 4096 particles, and **20 minutes** to run on 65536 particles. Your task is to accelerate the program, retaining the correctness of the simulation.

### Considerations to Guide Your Work

Here are some things to consider before beginning your work:

- Especially for your first refactors, the logic of the application, the `bodyForce` function in particular, can and should remain largely unchanged: focus on accelerating it as easily as possible.
- The code base contains a for-loop inside `main` for integrating the interbody forces calculated by `bodyForce` into the positions of the bodies in the system. This integration both needs to occur after `bodyForce` runs, and, needs to complete before the next call to `bodyForce`. Keep this in mind when choosing how and where to parallelize.
- You are not required to add error handling to your code, but you might find it helpful, as you are responsible for your code working correctly.

**Have Fun!**

### A Single CPU Implementation 

In [1]:
%%writefile mini-nbody-sequential.cu
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define BLOCK_SIZE 256
#define SOFTENING 1e-9f

typedef struct { float x, y, z, vx, vy, vz; } Body;

void randomizeBodies(float *data, int n) 
{
  for (int i = 0; i < n; i++)
    data[i] = 2.0f * (rand() / (float)RAND_MAX) - 1.0f;

}

void bodyForce(Body *p, float dt, int n) 
{
  float Fx;
    
  for (int i = 0; i < n; ++i) {
    Fx = 0.0f; float Fy = 0.0f; float Fz = 0.0f;

    for (int j = 0; j < n; j++) {
      float dx = p[j].x - p[i].x;
      float dy = p[j].y - p[i].y;
      float dz = p[j].z - p[i].z;
      float distSqr = dx*dx + dy*dy + dz*dz + SOFTENING;
      float invDist = rsqrtf(distSqr);
      float invDist3 = invDist * invDist * invDist;

      Fx += dx * invDist3; Fy += dy * invDist3; Fz += dz * invDist3;
    }
    p[i].vx += dt*Fx; p[i].vy += dt*Fy; p[i].vz += dt*Fz;
  }
    
}

int main(const int argc, const char** argv) 
{
  int nBodies = 30000; //size of the problem (bodies)
    
  if (argc > 1) 
    nBodies = atoi(argv[1]);

  const float dt   = 0.01f; // time step
  const int nIters = 10;    // simulation iterations
  int bytes  = nBodies * sizeof(Body);
  float *buf = (float*) malloc (bytes);
  Body *p    = (Body*) buf;

  randomizeBodies(buf, 6*nBodies); // Init pos/vel data

  const double t1 = omp_get_wtime();

  for (int iter = 1; iter <= nIters; iter++) 
  {
    bodyForce(p, dt, nBodies); // compute interbody forces
  
    for (int i = 0 ; i < nBodies; i++) { // integrate position
      p[i].x += p[i].vx*dt;
      p[i].y += p[i].vy*dt;
      p[i].z += p[i].vz*dt;
    }

  }
    
  const double t2 = omp_get_wtime();

  double avgTime = (t2-t1) / (double)(nIters-1); 
  
  float billionsOfOpsPerSecond = 1e-9 * nBodies * nBodies / avgTime;
  printf("\nSize (Bodies) = %d\n", nBodies);
  printf("%0.3f Billion Interactions/second\n", billionsOfOpsPerSecond);
  printf("%0.3f second\n", avgTime);
  
  free(buf);

  return 0;
}

Writing mini-nbody-sequential.cu


In [2]:
!nvcc mini-nbody-sequential.cu -o mini-nbody-sequential -Xcompiler -fopenmp -O3

In [3]:
!./mini-nbody-sequential


Size (Bodies) = 30000
0.160 Billion Interactions/second
5.614 second


## Clear the Temporary Files

Before moving on, please execute the following cell to clear up the directory. This is required to move on to the next notebook.

In [4]:
!rm -rf r00* gmon.out heat* mm* wave* mini-* report1* test_* vector-* ../Documents ../intel