# Lesson 5b: GPU programming in Python

## Random numbers

<br>

Not all problems are data in → data out. We like Monte Carlo, which generates its own data from random numbers.

<br>

Computers (including GPUs) are deterministic, so "random numbers" really means "arbitrary numbers": a sequence, fully determined by a starting seed number, that would be unsurprising if we were expecting a uniformly distributed, statistically independent set.

<br>

In [None]:
import numpy as np

# np.random.seed(12345)

np.random.uniform(0, 1, 25)
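
To make the role of the seed concrete, here is a minimal sketch (the variable names `first`, `second`, and `third` are only for illustration): setting the same seed twice replays the same "arbitrary" numbers, while continuing without reseeding moves on to new ones.

```python
import numpy as np

np.random.seed(12345)
first = np.random.uniform(0, 1, 5)

# resetting the seed rewinds the sequence to its starting point
np.random.seed(12345)
second = np.random.uniform(0, 1, 5)

# continuing without reseeding draws the next values in the sequence
third = np.random.uniform(0, 1, 5)

print(np.array_equal(first, second))
print(np.array_equal(second, third))
```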

In [None]:
from hist import Hist

np.random.seed(12345)
Hist.new.Reg(1000, 0, 1).Double().fill(np.random.uniform(0, 1, 1000000)).plot();
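
The flatness of the histogram can also be checked numerically. The following is a rough sketch, not a rigorous test of randomness: it compares the sample mean and variance with the values expected for a uniform distribution on [0, 1), counts entries per bin with `np.histogram`, and looks at the correlation between neighboring samples as a simple stand-in for "statistically independent."

```python
import numpy as np

np.random.seed(12345)
samples = np.random.uniform(0, 1, 1000000)

# a uniform distribution on [0, 1) has mean 1/2 and variance 1/12
mean = samples.mean()
variance = samples.var()

# with 1000 bins, each bin should hold about 1000 samples,
# fluctuating by roughly sqrt(1000), which is about 32
counts, edges = np.histogram(samples, bins=1000, range=(0, 1))

# neighboring samples should be nearly uncorrelated if they are independent
lag1_correlation = np.corrcoef(samples[:-1], samples[1:])[0, 1]

print(mean, variance)
print(counts.min(), counts.max())
print(lag1_correlation)
```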

**Note:** if you want "truly" random numbers (ones that differ unpredictably from run to run), you can set the seed to an arbitrary value taken from the operating system. The `os.urandom` function returns random bytes from the operating system's source of randomness (such as `/dev/urandom`), which you can reinterpret as an integer.

<br>

In [None]:
import os

os.urandom(4)

<br>

In [None]:
np.array(os.urandom(4)).view(np.uint32)

<br>

In [None]:
np.random.seed(np.array(os.urandom(4)).view(np.uint32))
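
The same cast can be written without the array view. This is a sketch of one alternative, using Python's built-in `int.from_bytes`; the `"little"` byte order is an arbitrary choice here, since any interpretation of random bytes is equally random.

```python
import os

import numpy as np

raw = os.urandom(4)                    # 4 bytes = 32 bits from the operating system
seed = int.from_bytes(raw, "little")   # interpret them as an unsigned integer

# np.random.seed accepts integers that fit in 32 bits
np.random.seed(seed)

print(raw, seed)
```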

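Random numbers get particularly [interesting in parallel processing](https://kaushikghose.wordpress.com/2013/11/22/random-numbers-in-a-parallel-world/). Worker processes that are forked from the same parent inherit a copy of its generator state, so unless each one reseeds, they start out producing the same numbers.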

<br>

In [None]:
import multiprocessing

def child(n):
    # np.random.seed(np.array(os.urandom(4)).view(np.uint32))
    return np.random.normal(0, 1, 6)

pool = multiprocessing.Pool()
for line in pool.map(child, range(15)):
    print(line)
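
Reseeding each worker from `os.urandom` fixes the duplication but gives up reproducibility. As a sketch of another option (it uses NumPy's newer `Generator` interface rather than the `np.random.seed` function used above), `np.random.SeedSequence.spawn` derives independent child seeds from a single parent seed. The example calls `child` in a plain loop to stay self-contained; the assumption is that the same list of child seeds could be handed to `pool.map` instead.

```python
import numpy as np

def child(seed_sequence):
    # each worker builds its own generator from its own child seed
    rng = np.random.default_rng(seed_sequence)
    return rng.normal(0, 1, 6)

# one parent seed, spawned into 15 independent child seeds
child_seeds = np.random.SeedSequence(12345).spawn(15)

# a plain loop stands in for pool.map(child, child_seeds)
results = [child(s) for s in child_seeds]
for line in results:
    print(line)
```

Because everything derives from the parent seed `12345`, rerunning the script reproduces the same 15 lines, yet no two lines are alike.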

For proper seeding, CUDA (and Numba) come with specialized functions to send appropriate seeds to each thread.

In Numba, these are `numba.cuda.random.*xoroshiro128p*` ([docs](https://numba.readthedocs.io/en/stable/cuda/random.html)).

In [None]:
import cupy as cp
import numba as nb
import numba.cuda
from numba.cuda.random import create_xoroshiro128p_states
from numba.cuda.random import xoroshiro128p_uniform_float32

@nb.cuda.jit
def generate_uniform(rng_states, out):
    thread_idx = nb.cuda.grid(1)
    # the grid is rounded up to whole blocks, so skip threads beyond the last row
    if thread_idx < out.shape[0]:
        for j in range(out.shape[1]):
            out[thread_idx, j] = xoroshiro128p_uniform_float32(rng_states, thread_idx)

out = cp.empty((10000, 1000), dtype=np.float32)

num_threads = 1024
num_blocks = int(np.ceil(len(out) / num_threads))

# one RNG state per launched thread
rng_states = create_xoroshiro128p_states(num_threads * num_blocks, seed=12345)

generate_uniform[num_blocks, num_threads](rng_states, out)
out

In [None]:
Hist.new.Reg(10000, 0, 1).Double().fill(out.get().flatten()).plot();

I want to leave extra time for you to work on the last project and for your feedback.

<br><br><br><br><br>

It's a fun one!