<a href="https://colab.research.google.com/github/pranavkantgaur/training_materials/blob/master/ACT_demos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Common Imports
import numpy as np
import matplotlib.pyplot as plt
import time
import random

## Lecture 1: Naive Monte Carlo Simulation

Objective: Implement a serial neutron absorption simulator.

In [None]:
def simulate_serial(N=10000, p_abs=0.01, max_steps=100):
    absorbed = 0
    random.seed(42)
    for _ in range(N):
        for _ in range(max_steps):
            if random.random() < p_abs:
                absorbed += 1
                break
    return absorbed

# Runtime and accuracy test
start = time.time()
result = simulate_serial(10000)  # Smaller N for quick demo
print(f"Absorbed: {result} | Time: {time.time() - start:.2f}s")

Absorbed: 6346 | Time: 0.04s


Explanation:

    Each neutron is simulated independently with a loop-in-loop structure.

    p_abs=0.01 means a 1% chance of absorption per step.
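
As a sanity check on the simulated count, the expected result can be worked out in closed form. A neutron escapes absorption only if it survives all `max_steps` independent steps, so the absorption probability per neutron is `1 - (1 - p_abs)**max_steps`. The sketch below assumes exactly the model used in `simulate_serial`; the helper name `expected_absorbed` is hypothetical.

```python
import math

def expected_absorbed(N=10_000, p_abs=0.01, max_steps=100):
    """Return (mean, std) of the absorbed count under the binomial model."""
    # Probability that one neutron is absorbed within max_steps steps
    q = 1.0 - (1.0 - p_abs) ** max_steps
    mean = N * q
    # Each neutron is an independent Bernoulli(q) trial
    std = math.sqrt(N * q * (1.0 - q))
    return mean, std

mean, std = expected_absorbed()
print(f"Expected absorbed: {mean:.1f} +/- {std:.1f}")
```

With the defaults this gives roughly 6340 absorbed neutrons with a spread of about 48, so the simulated counts reported in this notebook are in the expected range.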



## C++ version

In [None]:
%%shell
cat << 'EOF' > neutron.cpp
#include <iostream>
#include <random>
#include <chrono>

int simulate_cpp(int N=10000, double p_abs=0.01, int max_steps=100) {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(0.0, 1.0);

    int absorbed = 0;
    for (int i = 0; i < N; ++i) {
        for (int step = 0; step < max_steps; ++step) {
            if (dis(gen) < p_abs) {
                absorbed++;
                break;
            }
        }
    }
    return absorbed;
}

int main() {
    auto start = std::chrono::high_resolution_clock::now();
    int result = simulate_cpp();
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "C++ Result: " << result
              << " | Time: " << elapsed.count() << "s\n";
    return 0;
}
EOF



In [None]:
%%shell
# Compile C++ code with optimizations
g++ -O3 -o neutron_simulation neutron.cpp
./neutron_simulation

C++ Result: 6395 | Time: 0.011817s




## Time profiling the serial implementation

In [None]:
import cProfile

def simulate_serial(N=10000, p_abs=0.01, max_steps=100):
    absorbed = 0
    random.seed(42)
    for _ in range(N):
        for _ in range(max_steps):
            if random.random() < p_abs:
                absorbed += 1
                break
    return absorbed

# Profile function
cProfile.run('simulate_serial(10000)', sort='cumulative')

         635455 function calls in 0.196 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.196    0.196 {built-in method builtins.exec}
        1    0.000    0.000    0.196    0.196 <string>:1(<module>)
        1    0.147    0.147    0.196    0.196 <ipython-input-42-ecb417cdbb4f>:3(simulate_serial)
   635447    0.048    0.000    0.048    0.000 {method 'random' of '_random.Random' objects}
        1    0.000    0.000    0.000    0.000 random.py:128(seed)
        1    0.000    0.000    0.000    0.000 {function Random.seed at 0x7b4b5bb90180}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




### Line-by-line profiling

In [3]:
!pip install line_profiler

Collecting line_profiler
  Downloading line_profiler-4.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (34 kB)
Downloading line_profiler-4.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (750 kB)
Installing collected packages: line_profiler
Successfully installed line_profiler-4.2.0


In [6]:
def simulate_serial(N=10_000, p_abs=0.01, max_steps=100):
    absorbed = 0
    for _ in range(N):
        for _ in range(max_steps):
            if random.random() < p_abs:
                absorbed += 1
                break
    return absorbed

# Explicitly add the decorator (Colab workaround)
from line_profiler import LineProfiler
lp = LineProfiler()
lp_wrapper = lp(simulate_serial)

In [5]:
# Execute with profiling
result = lp_wrapper()
lp.print_stats()

Timer unit: 1e-09 s

Total time: 1.33926 s
File: <ipython-input-4-47fe190b5943>
Function: simulate_serial at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
     4                                           def simulate_serial(N=10_000, p_abs=0.01, max_steps=100):
     5         1       1384.0   1384.0      0.0      absorbed = 0
     6     10001   12482288.0   1248.1      0.9      for _ in range(N):
     7    639367  454249092.0    710.5     33.9          for _ in range(max_steps):
     8    635750  865666756.0   1361.6     64.6              if random.random() < p_abs:
     9      6383    3841695.0    601.9      0.3                  absorbed += 1
    10      6383    3016845.0    472.6      0.2                  break
    11         1        833.0    833.0      0.0      return absorbed



Assignment:
1. How can the runtime be improved without resorting to parallelization or vectorization?

### How does the line_profiler wrapper above work? (Decorators)

In [15]:
class SimpleDecorator:
    def __init__(self, func):
        self.func = func  # Store the original function

    def __call__(self, *args, **kwargs):
        """This gets called when the decorated function is invoked"""
        print(f"Before calling {self.func.__name__}")
        result = self.func(*args, **kwargs)  # Execute original function
        print(f"After calling {self.func.__name__}")
        return result

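# Usage: wrap a plain function explicitly, the same way lp(simulate_serial) does above.
# Writing @SimpleDecorator above the def is equivalent to hello = SimpleDecorator(hello).
def hello(name):
    print(f"Hello, {name}!")

sd = SimpleDecorator(hello)
sd("Alice")

Before calling hello
Hello, Alice!
After calling hello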


### Why does line_profiler not offer an interface like cProfile's?

1. To profile individual lines (not just function calls), line_profiler must instrument the function line-by-line. This requires wrapping the function explicitly.
2. cProfile operates at the function-call level, which is simpler to implement without modifying the target code.
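
To make the idea of line-level instrumentation concrete, here is a minimal sketch that counts how often each line of one function executes, using the standard `sys.settrace` hook. It is only a rough analogue: line_profiler uses its own compiled tracing machinery and also records timings. The helper name `count_line_hits` is hypothetical.

```python
import sys
from collections import Counter

def count_line_hits(func, *args, **kwargs):
    """Run func and count executions per line (offset from the def line)."""
    hits = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that execute the target function's code object
        if frame.f_code is not code:
            return None
        if event == "line":
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, hits

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, hits = count_line_hits(demo, 5)
for offset in sorted(hits):
    print(f"line +{offset}: {hits[offset]} hits")
```

The tracer has to be told which function to watch, which mirrors why line_profiler asks for the function to be wrapped or registered explicitly.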


## Before vectorization, can Cython help here?

In [16]:
# STEP 1: Install required packages
!pip install line_profiler cython



In [17]:
# STEP 2: Original Python implementation with line profiling
import random
from line_profiler import LineProfiler

def simulate_serial(N=10_000, p_abs=0.01, max_steps=100):
    absorbed = 0
    for _ in range(N):
        for _ in range(max_steps):
            if random.random() < p_abs:
                absorbed += 1
                break
    return absorbed

# Profile the Python version
lp_py = LineProfiler()
lp_py_wrapper = lp_py(simulate_serial)
%time lp_py_wrapper()
lp_py.print_stats()

CPU times: user 1.35 s, sys: 0 ns, total: 1.35 s
Wall time: 1.38 s
Timer unit: 1e-09 s

Total time: 0.672218 s
File: <ipython-input-17-2ab58afc3253>
Function: simulate_serial at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
     5                                           def simulate_serial(N=10_000, p_abs=0.01, max_steps=100):
     6         1       2000.0   2000.0      0.0      absorbed = 0
     7     10001    4193472.0    419.3      0.6      for _ in range(N):
     8    638466  245690483.0    384.8     36.5          for _ in range(max_steps):
     9    634804  416341239.0    655.9     61.9              if random.random() < p_abs:
    10      6338    3079014.0    485.8      0.5                  absorbed += 1
    11      6338    2911334.0    459.3      0.4                  break
    12         1        865.0    865.0      0.0      return absorbed



In [21]:
%%file simulate_cython.pyx
# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1

import cython
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

@cython.binding(True)
def simulate_cython(int N=10_000, double p_abs=0.01, int max_steps=100):
    cdef:
        int absorbed = 0
        int i, j
        double rand_val

    # Seed the C random number generator from the clock
    cdef unsigned int seed = <unsigned int>time(NULL)
    srand(seed)

    for i in range(N):
        for j in range(max_steps):
            rand_val = <double>rand() / (RAND_MAX + 1.0)
            if rand_val < p_abs:
                absorbed += 1
                break
    return absorbed

Writing simulate_cython.pyx


In [22]:
from distutils.core import setup
from Cython.Build import cythonize
import sys, os

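# Workaround for Colab's temporary filesystem
os.chdir('/content')
# Note: inside a notebook, setup() parses the kernel's own command line (sys.argv),
# which is why the call below fails with "option -f not recognized". Passing
# script_args=['build_ext', '--inplace'] avoids that; the next cell uses pyximport instead.
setup(ext_modules=cythonize('simulate_cython.pyx', compiler_directives={'linetrace': True}))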

Compiling simulate_cython.pyx because it changed.
[1/1] Cythonizing simulate_cython.pyx


  tree = Parsing.p_module(s, pxd, full_module_name)
ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/setuptools/_distutils/fancy_getopt.py", line 245, in getopt
    opts, args = getopt.getopt(args, short_opts, self.long_opts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/getopt.py", line 95, in getopt
    opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/getopt.py", line 195, in do_shorts
    if short_has_arg(opt, shortopts):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/getopt.py", line 211, in short_has_arg
    raise GetoptError(_('option -%s not recognized') % opt, opt)
getopt.GetoptError: option -f not recognized

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/setuptools/_distutils/core.py", line 170, in setup
    ok = dist.p

TypeError: object of type 'NoneType' has no len()

In [23]:
import pyximport
pyximport.install()

import simulate_cython

In [24]:
from line_profiler import LineProfiler

lp_cy = LineProfiler()
lp_wrapper_cy = lp_cy(simulate_cython.simulate_cython)
%time result_cy = lp_wrapper_cy()
print(f"Cython result: {result_cy}")
lp_cy.print_stats()

CPU times: user 14.2 ms, sys: 0 ns, total: 14.2 ms
Wall time: 14.1 ms
Cython result: 6345
Timer unit: 1e-09 s



In [27]:
import timeit

n = 10_000
p = 0.01
steps = 100

# Python timing
py_time = timeit.timeit(lambda: simulate_serial(n, p, steps), number=10)
print(f"Python average time: {py_time/10:.4f}s")

# Cython timing
cy_time = timeit.timeit(lambda: simulate_cython.simulate_cython(n, p, steps), number=10)
print(f"Cython average time: {cy_time/10:.4f}s")
print(f"Speedup factor: {py_time/cy_time:.1f}x")

Python average time: 0.0375s
Cython average time: 0.0128s
Speedup factor: 2.9x


## Vectorized Implementation

Objective: Optimize using array operations.

### With Numpy

In [None]:
import numpy as np
from line_profiler import LineProfiler
from memory_profiler import profile

# Vectorized simulation
def simulate_numpy(N=10_000, p_abs=0.01, max_steps=100):
    # Generate all random numbers at once
    steps = np.random.rand(N, max_steps)  # Shape: (N, max_steps)

    # Check absorption per neutron (any step < p_abs)
    absorbed = np.any(steps < p_abs, axis=1).sum()

    return absorbed

# Line-level runtime profiling
lp = LineProfiler()
lp.add_function(simulate_numpy) # Add the function to be profiled
lp_wrapper = lp(simulate_numpy) # Wrap the function using lp

# Execute with profiling, this will call the wrapped function
result = lp_wrapper(N=10_000, p_abs=0.01, max_steps=100)

# Print line profiling stats
lp.print_stats()

# Memory is profiled separately in the "Memory profiling" section using %mprun

Timer unit: 1e-09 s

Total time: 0.0124622 s
File: <ipython-input-23-639abaf06017>
Function: simulate_numpy at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
     6                                           def simulate_numpy(N=10_000, p_abs=0.01, max_steps=100):
     7                                               # Generate all random numbers at once
     8         1   10791134.0    1e+07     86.6      steps = np.random.rand(N, max_steps)  # Shape: (N, max_steps)
     9                                               
    10                                               # Check absorption per neutron (any step < p_abs)
    11         1    1670348.0    2e+06     13.4      absorbed = np.any(steps < p_abs, axis=1).sum()
    12                                               
    13         1        748.0    748.0      0.0      return absorbed



In [None]:
%%writefile my_script.py
import numpy as np

def simulate_numpy(N=10_000, p_abs=0.01, max_steps=100):
    # Generate all random numbers at once
    steps = np.random.rand(N, max_steps)  # Shape: (N, max_steps)

    # Check absorption per neutron (any step < p_abs)
    absorbed = np.any(steps < p_abs, axis=1).sum()

    return absorbed

Writing my_script.py


## C++ version

In [None]:
%%shell
cat << 'EOF' > neutron_vectorized.cpp
#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

int simulate_cpp_vectorized(int N=10000, double p_abs=0.01, int max_steps=100) {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(0.0, 1.0);

    std::vector<bool> absorbed(N, false);  // Per-neutron absorption flags (a container, not SIMD vectorization)

    for (int i = 0; i < N; ++i) {
        for (int step = 0; step < max_steps; ++step) {
            if (dis(gen) < p_abs) {
                absorbed[i] = true;
                break;
            }
        }
    }

    return std::count(absorbed.begin(), absorbed.end(), true);
}

int main() {
    auto start = std::chrono::high_resolution_clock::now();
    int result = simulate_cpp_vectorized();
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;

    std::cout << "C++ Vectorized Result: " << result
              << " | Time: " << elapsed.count() << "s\n";
    return 0;
}
EOF

g++ -O3 -o neutron_vectorized neutron_vectorized.cpp && ./neutron_vectorized

C++ Vectorized Result: 6274 | Time: 0.0123285s




## Memory profiling

In [None]:
!pip install memory_profiler
%load_ext memory_profiler

from my_script import simulate_numpy

# %mprun needs the function to live in a file (my_script.py), not in a notebook cell
%mprun -f simulate_numpy simulate_numpy(100000)
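
As a rough cross-check on what the memory profiler should report, the size of the random array can be estimated directly from its shape. The sketch assumes float64 elements, which is what `np.random.rand` returns; the helper name `steps_array_megabytes` is hypothetical.

```python
import numpy as np

def steps_array_megabytes(N=10_000, max_steps=100, dtype=np.float64):
    """Size in MB (10**6 bytes) of an (N, max_steps) array of the given dtype."""
    return N * max_steps * np.dtype(dtype).itemsize / 1e6

# Array of random draws used by simulate_numpy(100000)
print(f"steps array: {steps_array_megabytes(100_000, 100):.0f} MB")
# Temporary boolean mask produced by `steps < p_abs`
print(f"boolean mask: {steps_array_megabytes(100_000, 100, np.bool_):.0f} MB")
```

For `N=100000` and `max_steps=100` the float64 array alone is 80 MB, while the final per-neutron result is only a single number.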




Assignment:

1. How can the memory usage of the vectorized version be reduced?
2. Why does `absorbed` not show any memory increment?

## Lecture 3: Multiprocessing Parallelization

Objective: Distribute work across CPU cores.

In [None]:
from multiprocessing import Pool

def simulate_neutron(args):
    p_abs, max_steps = args
    for _ in range(max_steps):
        if random.random() < p_abs:
            return 1
    return 0

def simulate_parallel(N=10_000, p_abs=0.01, max_steps=100):
    with Pool(2) as pool:  # Use 2 worker processes
        results = pool.map(simulate_neutron, [(p_abs, max_steps)] * N)
    return sum(results)
start = time.time()
print("Parallel Result:", simulate_parallel())
print(f"Parallel Time: {time.time() - start:.4f}s")

Parallel Result: 6328
Parallel Time: 0.0978s


Explanation:

    Pool.map splits N neutrons across workers.

    Each neutron simulation is independent (embarrassingly parallel).
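
The parallel run above (0.0978s) is slower than the serial one (0.04s), most likely because each neutron is shipped to a worker as its own tiny task. One possible remedy, sketched below, is to hand each worker a whole chunk of neutrons. The names `simulate_chunk` and `simulate_parallel_chunked`, and the per-chunk seeds, are assumptions made for illustration.

```python
import random
from multiprocessing import Pool

def simulate_chunk(args):
    """Simulate n neutrons in one task and return the number absorbed."""
    n, p_abs, max_steps, seed = args
    rng = random.Random(seed)  # separately seeded generator per chunk
    absorbed = 0
    for _ in range(n):
        for _ in range(max_steps):
            if rng.random() < p_abs:
                absorbed += 1
                break
    return absorbed

def simulate_parallel_chunked(N=10_000, p_abs=0.01, max_steps=100, workers=2):
    # Split N as evenly as possible; the first `extra` chunks get one more neutron
    base, extra = divmod(N, workers)
    tasks = [(base + (1 if i < extra else 0), p_abs, max_steps, 1000 + i)
             for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(simulate_chunk, tasks))

if __name__ == "__main__":
    print("Chunked parallel result:", simulate_parallel_chunked())
```

With one task per worker, the inter-process communication cost is paid only `workers` times instead of `N` times.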

Assignment:

    Benchmark performance for Pool(2) vs. Pool(8) on Google Colab.

## Lecture 4: GPU Acceleration with CuPy

Objective: Offload computation to GPU.

In [None]:
!apt-get update
!apt-get install -y --no-install-recommends cuda-drivers

# Check driver version to confirm update
!nvidia-smi

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:6 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,381 kB]
Hit:9 https://ppa.launchpadcontent.n

In [None]:
!pip install cupy-cuda11x  # Install CuPy for Colab GPU

import cupy as cp

def simulate_gpu(N=10_000, p_abs=0.01, max_steps=100):
    steps = cp.random.rand(N, max_steps)
    return (cp.any(steps < p_abs, axis=1)).sum()

start = time.time()
result = simulate_gpu().get()  # Move data from GPU to CPU
print(f"GPU Result: {result} | Time: {time.time() - start:.4f}s")



CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version

Explanation:

    cp.random.rand generates random numbers on the GPU.

    Operations like cp.any execute in parallel on GPU threads.

Assignment:

    Test with max_steps=1000 and compare GPU/CPU runtimes.

## Lecture 5: Variance Reduction Techniques

Objective: Implement Russian Roulette for faster convergence.

In [None]:
def simulate_russian_roulette(N=10_000, p_abs=0.01, survival_prob=0.5):
    absorbed = 0
    for _ in range(N):
        weight = 1.0
        while True:
            if random.random() < p_abs:
                absorbed += weight
                break
            # Russian roulette: kill the neutron with probability 1 - survival_prob
            if random.random() > survival_prob:
                break
            weight /= survival_prob  # Survivors carry extra weight to keep the estimate unbiased
    return absorbed

print("With Variance Reduction:", simulate_russian_roulette())

Explanation:

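    Russian roulette probabilistically terminates neutron histories to save computation. Here every neutron plays roulette at each step; production codes usually apply it only once a neutron's weight drops below a cutoff.

    survival_prob trades computation against variance; dividing survivors' weights by survival_prob keeps the estimate unbiased.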
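
The effect of the weight correction can be checked without running any random simulation. In the model of `simulate_russian_roulette`, a neutron reaches step k with probability `((1 - p_abs) * survival_prob)**k` and then carries weight `survival_prob**(-k)`, so the first two moments of one neutron's tally are simple series. The sketch below sums them numerically; this derivation and the helper name `roulette_moments` are additions for illustration, not part of the original lecture code.

```python
def roulette_moments(p_abs=0.01, survival_prob=0.5, n_terms=500):
    """Partial sums of E[X] and E[X^2] for one neutron's weighted tally X."""
    mean = 0.0
    second = 0.0
    ratio = (1.0 - p_abs) / survival_prob
    for k in range(n_terms):
        # E[X] term: P(reach step k) * p_abs * weight = p_abs * (1 - p_abs)**k
        mean += p_abs * (1.0 - p_abs) ** k
        # E[X^2] term: P(reach step k) * p_abs * weight**2 = p_abs * ratio**k
        second += p_abs * ratio ** k
    return mean, second

mean, second = roulette_moments()
print(f"partial E[X]   = {mean:.4f}")
print(f"partial E[X^2] = {second:.3e}")
```

The mean tends to 1 per neutron, which confirms the estimate is unbiased. The second moment, however, only converges when `survival_prob > 1 - p_abs`; with the defaults above it grows without bound, so this particular setup would be expected to increase the variance rather than reduce it.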

Assignment:

    Compare convergence rates with/without Russian Roulette.

## Lecture 6: OpenMC Reactor Simulation

Objective: Simulate a 3D fuel rod using OpenMC.

In [None]:
!pip install --pre openmc
!python -m openmc.install

import openmc

# Define materials
fuel = openmc.Material()
fuel.add_element('U', 1.0, enrichment=4.25)
fuel.set_density('g/cm3', 10.0)

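# Define geometry (a bare fuel sphere; the vacuum boundary lets neutrons leak out instead of being lost)
sphere = openmc.Sphere(r=100.0, boundary_type='vacuum')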
cell = openmc.Cell(fill=fuel, region=-sphere)
geometry = openmc.Geometry([cell])

# Settings
settings = openmc.Settings()
settings.particles = 1000
settings.batches = 10

# Run simulation
model = openmc.Model(geometry=geometry, materials=openmc.Materials([fuel]), settings=settings)
model.run()

Explanation:

    OpenMC uses real nuclear data libraries (e.g., ENDF/B-VIII).

    Tallies track absorption, fission, etc., in 3D geometry.

Assignment:

    Add a water moderator around the fuel and compare absorption rates.

## Lecture 7: Error Analysis

Objective: Compute statistical uncertainty.

In [None]:
def simulate_with_error(N=10_000, p_abs=0.01, n_batches=10):
    results = []
    for _ in range(n_batches):
        absorbed = simulate_numpy(N // n_batches, p_abs)
        results.append(absorbed)
    mean = np.mean(results)
    std = np.std(results, ddof=1) / np.sqrt(n_batches)  # Standard error of the batch mean (sample std)
    return mean, std

mean, std = simulate_with_error()
print(f"Absorption: {mean:.1f} ± {2*std:.1f} (95% CI)")

Explanation:

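    Each batch gives an independent estimate; the spread across batches measures the statistical uncertainty.

    For a fixed batch size, the standard error of the mean decreases as 1/sqrt(n_batches).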
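
The batch estimate can be compared against the value predicted by the binomial model, since each batch count is a sum of independent absorbed/not-absorbed outcomes. The sketch below assumes the same `max_steps=100` default as `simulate_numpy`; the helper name `batch_standard_error` is hypothetical.

```python
import math

def batch_standard_error(N=10_000, p_abs=0.01, max_steps=100, n_batches=10):
    """Return (expected batch mean, expected standard error of that mean)."""
    q = 1.0 - (1.0 - p_abs) ** max_steps      # per-neutron absorption probability
    batch_size = N // n_batches
    # Standard deviation of the absorbed count in a single batch
    sigma_batch = math.sqrt(batch_size * q * (1.0 - q))
    # Averaging n_batches independent batches shrinks it by sqrt(n_batches)
    return batch_size * q, sigma_batch / math.sqrt(n_batches)

mean, se = batch_standard_error()
print(f"Predicted batch mean: {mean:.1f}, standard error: {se:.2f}")
```

If the measured standard error from `simulate_with_error` is far from this prediction, either the batch count is too small for a stable estimate or the samples are not as independent as assumed.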

Assignment:

    Plot confidence intervals vs. number of batches.

## Lecture 8: MPI for HPC

Objective: Scale simulations across nodes.

In [None]:
!pip install mpi4py

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

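def simulate_mpi(N=10_000, p_abs=0.01):
    # Spread N across ranks; the first N % size ranks take one extra neutron
    chunk = N // size + (1 if rank < N % size else 0)
    local_absorbed = simulate_numpy(chunk, p_abs)
    # Sum the per-rank counts; only rank 0 (the root) receives the total
    return comm.reduce(local_absorbed, op=MPI.SUM, root=0)

total = simulate_mpi()
if rank == 0:
    print("MPI Result:", total)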

Explanation:

    Each MPI rank simulates its own share of the N neutrons.

    comm.reduce aggregates results to rank 0.

Assignment:

    Run on 4 MPI processes and measure weak scaling efficiency.

## Lecture 9: ML for Variance Reduction

Objective: Use a neural network to guide neutron paths.

In [None]:
import tensorflow as tf

# Build a surrogate model to predict absorption probability (training is left to the assignment)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(3,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='mse')

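# Hybrid simulation (sketch: the model above is untrained, so predictions are arbitrary)
def simulate_ml(N=1000):
    positions = np.random.rand(N, 3)  # N random 3D positions
    # One batched predict call instead of N single-sample calls
    p_abs_pred = model.predict(positions, verbose=0).ravel()
    # A neutron is absorbed when its uniform draw falls below the predicted probability
    return int(np.sum(np.random.rand(N) < p_abs_pred))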

Explanation:

    The neural network predicts location-dependent absorption probabilities.

    A full scheme would use these predictions to focus sampling on high-probability regions; the sketch above still samples positions uniformly.

Assignment:

    Train the model on OpenMC data and compare convergence rates.

## Lecture 10: Final Project

Objective: Optimize a 2D reactor simulation.
Guidelines:

    Combine GPU acceleration (CuPy), variance reduction, and MPI.

    Compare runtime/accuracy trade-offs.

    Visualize neutron flux distribution.

In [None]:
# 2D reactor core with materials
def simulate_2d_reactor(size=100, N=10_000):
    flux = np.zeros((size, size))
    for _ in range(N):
        x, y = np.random.randint(0, size, 2)
        # Track neutron path in 2D grid
        ...
    return flux