---
# **LAB 1 - Intro to Numba**
---

# ▶️ Google Colaboratory (Colab)

[Colaboratory](https://research.google.com/colaboratory/faq.html) (or Colab) is a **free research tool** from *Google* for machine learning education and research, built on top of [Jupyter Notebook](https://jupyter.org/). It requires no setup and runs entirely in the **cloud**: you only need a **Google Account**. In Colab you can write, execute, save and share your Jupyter notebooks, and access computing resources such as **GPUs** and **TPUs** for free through your browser. Major Python libraries such as **TensorFlow**, **scikit-learn**, **PyTorch** and **pandas** are pre-installed. Your notebooks are stored in your **Google Drive**, or can be loaded from **GitHub**. Colab notebooks can be shared just as you would share Google Docs or Sheets: click the Share button at the top right of any Colab notebook, or follow the Google Drive file sharing instructions.




### Notebook rules

Some basic notebook rules:


1.   Click inside a code cell and press SHIFT+ENTER (or click the "PLAY" button) to execute it.
2.   Re-executing a cell resets it (any input is lost).
3.   Execute cells TOP TO BOTTOM.
4.   Notebooks are saved to your Google Drive.
5.   Mount your Google Drive to access the files stored there directly from a notebook (this includes Team Drives).
6.   If you use only Colab's virtual storage, all uploaded or stored files are deleted when the runtime is recycled.

### Shell commands

The `uname` command displays information about the system.

* **-a option:** prints all the system information in the following order: kernel name, network node hostname, kernel release, kernel version, machine hardware name, processor type, hardware platform, operating system.

In [5]:
!uname -a && cat /etc/*release

Linux 588a60014de5 6.6.105+ #1 SMP Thu Oct  2 10:42:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy


In [6]:
!pwd

/content


In [7]:
!ls -la

total 16
drwxr-xr-x 1 root root 4096 Dec  9 14:41 .
drwxr-xr-x 1 root root 4096 Jan 16 13:04 ..
drwxr-xr-x 4 root root 4096 Dec  9 14:41 .config
drwxr-xr-x 1 root root 4096 Dec  9 14:42 sample_data


### Set up Google Drive...

The following snippet is used in **Google Colab** to mount your **Google Drive**. It:

- imports the **Colab utility** that gives access to Google Drive
- mounts your Drive at `/content/drive`, so that your files appear under the **path**:
`/content/drive/MyDrive/`

In [9]:
from google.colab import drive
drive.mount('/content/drive')

KeyboardInterrupt: 

# ▶️ CUDA tools...

**NVIDIA System Management Interface (nvidia-smi)**

The NVIDIA System Management Interface (**`nvidia-smi`**) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the **management** and **monitoring** of NVIDIA GPU devices.

This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It targets the Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.

For more details, please refer to the **`nvidia-smi`** documentation ([doc](http://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf))

For information on **Tesla T4** see:

In [10]:
!nvidia-smi

Fri Jan 16 13:09:02 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   54C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

**Numba code used as a CUDA sanity check**:
-   Imports Numba, a JIT compiler that accelerates Python code.
-   `numba.cuda` provides GPU (CUDA) support on NVIDIA GPUs.
    - ✔ Confirms NumPy and Numba are installed
    - ✔ Confirms CUDA drivers are visible
    - ✔ Confirms GPU compute capability
    - ✔ Helps debug environment issues before running GPU kernels

`cuda.detect()` probes the system for available CUDA-capable GPUs and prints:
-   Number of GPUs
-   GPU names
-   Compute capability
-   Driver/runtime status


In [2]:
import numpy as np
import numba
from numba import cuda
import warnings
warnings.filterwarnings("ignore")

print(np.__version__)
print(numba.__version__)

cuda.detect()



2.3.5
0.63.1
Found 1 CUDA devices
id 0    b'NVIDIA GeForce MX330'                              [SUPPORTED]
                      Compute Capability: 6.1
                           PCI Device ID: 0
                              PCI Bus ID: 59
                                    UUID: GPU-01144e3b-2add-f806-624d-45fa97ef273a
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
	1/1 devices are supported


True

# ✅ Hello World!

**My first CUDA program: HelloFromGPU!**

CUDA kernel for Hello world...

In [12]:
from numba import cuda

@cuda.jit
def hello_kernel():
    print("Hello from GPU!")
    
# launch with 1 block of 10 threads
hello_kernel[1, 10]()    # type: ignore
cuda.synchronize()

# ✅ Parallel vector sum

In [13]:
import numpy as np
from numba import cuda

@cuda.jit
def add_arrays_cuda(a, b, c):
    i = cuda.grid(1)
    if i < c.size:
        c[i] = a[i] + b[i]

# Example
n = 1000    # size of arrays = number of threads
a = np.ones(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)

d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_c = cuda.device_array_like(a)

add_arrays_cuda[1, n](d_a, d_b, d_c)
cuda.synchronize()

c = d_c.copy_to_host()
print(c[0:10], c[-10:])

[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.] [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
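
The launch above uses a single block of `n` threads, which only works while `n` stays within the per-block thread limit (1024 on current NVIDIA GPUs). The sketch below shows one way a launch configuration could be computed for larger arrays. It is plain Python that runs without a GPU; the helper name `launch_config` and the default of 256 threads per block are illustrative assumptions, not part of the lab.

```python
def launch_config(n, threads_per_block=256):
    """Return (blocks_per_grid, threads_per_block) covering n elements.

    Hypothetical helper: ceiling division guarantees that
    blocks_per_grid * threads_per_block >= n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    return blocks_per_grid, threads_per_block


for n in (10, 1000, 1_000_000):
    blocks, threads = launch_config(n)
    print(f"n={n}: {blocks} blocks x {threads} threads = {blocks * threads} threads")

# With the kernel above, the launch would then read (requires a GPU):
#   blocks, threads = launch_config(n)
#   add_arrays_cuda[blocks, threads](d_a, d_b, d_c)
```

Because the grid is rounded up, some threads can fall past the end of the array, which is why the `if i < c.size` check in the kernel matters.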


## ↘️ TODO...

**Check Fibonacci Membership** 

🔹 **Exercise Goal**

-   Given a vector $x$ of integers, build a CUDA kernel (Numba) that produces a vector $y$ such that:

$$
y_i =
\begin{cases}
1 & \text{if } x_i \text{ is a Fibonacci number} \\
0 & \text{otherwise}
\end{cases}
$$


🔹 **Input**

- `x`: 1D array of integers (e.g., `np.int32`)
- `n = x.size`

🔹 **Output**

- `y`: 1D array of `np.uint8` or `np.int32`
- Same length as `x`


🔹 **Constraints**

- One GPU thread handles one element: `i = cuda.grid(1)`
- Must include a bounds check: `i < n`
- Avoid Python objects and dynamic allocation inside the kernel
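
To see why the bounds check is needed, the sketch below emulates on the CPU the index that `cuda.grid(1)` yields in a 1D launch, namely `blockIdx.x * blockDim.x + threadIdx.x`. It is plain Python, not GPU code; the function name `emulate_grid_1d` and the launch sizes are illustrative assumptions.

```python
def emulate_grid_1d(blocks_per_grid, threads_per_block):
    """Yield the global index each thread of a 1D launch would receive."""
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            # same formula as cuda.grid(1) in a 1D launch
            yield block_idx * threads_per_block + thread_idx


n = 1000                                      # number of array elements
blocks_per_grid, threads_per_block = 4, 256   # assumed launch: 1024 threads

indices = list(emulate_grid_1d(blocks_per_grid, threads_per_block))
in_bounds = [i for i in indices if i < n]        # threads that may touch the arrays
out_of_bounds = [i for i in indices if i >= n]   # threads that must do nothing

print(len(indices), len(in_bounds), len(out_of_bounds))
```

With these sizes the launch creates more threads than there are elements, and only the `i < n` test keeps the surplus threads from reading or writing outside the arrays.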


🔹 **A classic property:**

> An integer $v \ge 0$ is a Fibonacci number **iff** at least one of these is a perfect square:
$$
5v^2 + 4 \quad \text{or} \quad 5v^2 - 4
$$
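
The property can be tried out on the CPU before writing any GPU code. The sketch below is plain Python using the exact integer square root `math.isqrt`; the names `is_perfect_square_cpu` and `is_fib_cpu` are hypothetical, and the code is a reference for experimentation rather than the kernel itself.

```python
import math


def is_perfect_square_cpu(m):
    """True if the integer m is a perfect square (exact integer arithmetic)."""
    if m < 0:
        return False
    s = math.isqrt(m)
    return s * s == m


def is_fib_cpu(v):
    """True if v >= 0 and 5*v^2 + 4 or 5*v^2 - 4 is a perfect square."""
    if v < 0:
        return False
    t = 5 * v * v
    return is_perfect_square_cpu(t + 4) or is_perfect_square_cpu(t - 4)


# Fibonacci numbers among the first 1000 non-negative integers
fib_below_1000 = [v for v in range(1000) if is_fib_cpu(v)]
print(fib_below_1000)
```

For example, $v = 8$ gives $5 \cdot 64 + 4 = 324 = 18^2$, so 8 passes the test, while $v = 4$ gives 84 and 76, neither of which is a perfect square.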


🔹 **Your Tasks**

1. Write a device function:
   - `is_perfect_square(m) -> bool`

2. Write a device function:
   - `is_fib(v) -> bool`

3. Write a kernel:
   - `fib_numbers(x, y)` that sets `y[i] = 1` if `x[i]` is a Fibonacci number, else `0`

4. Write host code to:
   - allocate/copy arrays to GPU
   - launch the kernel
   - copy results back and validate

🔹 **Expected result**
-    Indices where `y` is nonzero for the first $n = 1000$ non-negative integers ($x_i = i$): [0,   1,   2,   3,   5,   8,  13,  21,  34,  55,  89, 144, 233, 377, 610, 987]

🔹 **Skeleton Code (Fill the TODOs)**

```{python}
#| echo: true
import numpy as np
from numba import cuda
import math

@cuda.jit
def fib_numbers(x, y):
    # TODO
    pass

# input data
# TODO

# GPU memory allocation
# TODO

# Kernel launch
# TODO

# Copy back results & print indexes of Fibonacci numbers
# TODO
```

In [20]:
import math
@cuda.jit(device=True, inline=True)
def is_perfect_square(m) -> bool:
    if m < 0:
        return False
    s = int(math.sqrt(m))
    return s*s == m

@cuda.jit(device=True, inline=True)
def is_fib(v) -> bool:
    vv = 5*v**2
    return is_perfect_square(vv - 4) or is_perfect_square(vv + 4)

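# kernel: one thread per element
@cuda.jit
def fib_numbers(x, y):
    i = cuda.grid(1)

    # Bounds check, then write both outcomes: y comes from
    # device_array_like, which does not zero the memory.
    if i < y.size:
        y[i] = 1 if is_fib(x[i]) else 0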


In [23]:
n = 1024
x = np.arange(n, dtype=np.int32)
d_x = cuda.to_device(x)
d_y = cuda.device_array_like(d_x)

fib_numbers[1, n](d_x, d_y)
cuda.synchronize()
y = d_y.copy_to_host()

print(y.nonzero())

(array([  0,   1,   2,   3,   5,   8,  13,  21,  34,  55,  89, 144, 233,
       377, 610, 987]),)
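
One possible way to carry out the validation step is to compare the nonzero indices with Fibonacci numbers generated by the recurrence $F_{k+1} = F_k + F_{k-1}$ on the CPU. The sketch below is plain Python; `fib_indices_reference` and `validate_mask` are hypothetical helpers, and the hand-built list `y` only stands in for the array copied back from the GPU.

```python
def fib_indices_reference(n):
    """Return the sorted, distinct Fibonacci numbers smaller than n."""
    found = set()
    a, b = 0, 1
    while a < n:
        found.add(a)
        a, b = b, a + b
    return sorted(found)


def validate_mask(y):
    """Check that y[i] is nonzero exactly when i is a Fibonacci number.

    Assumes the kernel input was x = arange(len(y)), as in the cell above.
    """
    got = [i for i, flag in enumerate(y) if flag]
    return got == fib_indices_reference(len(y))


# Stand-in for y = d_y.copy_to_host(), filled with the indices printed above
n = 1024
y = [0] * n
for i in (0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987):
    y[i] = 1

print(validate_mask(y))
```

Since `validate_mask` only iterates over its argument, it should accept the NumPy array returned by `copy_to_host()` as well as a list.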
