In [2]:
import os
from IPython.display import Code
# Define the desired working directory
new_directory = "../src"
# Change the current working directory
os.chdir(new_directory)

def displaycode(path, number=True, lang='c', start=None, end=None):
    """
    Display a snippet of the file at `path` in a Jupyter notebook.

    Args:
      path   – path to the source file
      number – if True, prefix each line with its line number
      lang   – language for syntax‐highlighting
      start  – 1‐based first line to include (inclusive). Defaults to 1.
      end    – 1‐based last line to include (inclusive). Defaults to EOF.
    """
    # Read all lines
    with open(path, 'r') as f:
        all_lines = f.readlines()
    total = len(all_lines)

    # Normalize start/end to valid 1-based indices
    if start is None or start < 1:
        start = 1
    if end is None or end > total:
        end = total
    if start > end:
        return Code("", language=lang)  # empty if bad range

    # Slice the desired block (python slices are 0-based and end-exclusive)
    snippet = all_lines[start-1:end]

    if number:
        # Determine width from the max line number
        width = len(str(end))
        # Enumerate, but start numbering at `start`
        numbered = (
            f"{i:0{width}d}: {line}"
            for i, line in enumerate(snippet, start=start)
        )
        payload = "".join(numbered)
    else:
        payload = "".join(snippet)

    return Code(payload, language=lang)

# Hello World

Getting started with a simple CUDA 'C' Hello world program.

## Take 1: Host vs Device functions

Let's start with a 

## Final Version

In [3]:
displaycode("../src/hello.cu")

### Compile and Link

In [3]:
!bash -v -c "nvcc -o hello hello.cu"

nvcc -o hello hello.cu


### Run

In [4]:
!bash -v -c "./hello"

./hello
Hello from CPU


#### What's Wrong?

Wait a second, maybe we'd better check to see that we have a working GPU and CUDA runtime!

### `nvidia-smi`

A standard way to check the state of CUDA is to run the `nvidia-smi` tool, which, with no arguments, reports the available GPU devices and other helpful information.

In [5]:
!bash -v -c "nvidia-smi"

nvidia-smi
bash: line 1: nvidia-smi: command not found


Well, that does not bode well.  But before we discuss how to get our code running on a GPU, let's use our newfound CUDA programming skills to write code that checks the state of the GPU devices and CUDA for ourselves. 

## gpuinfo.cu



We will write a little utility that retrieves and displays basic facts about the CUDA environment on the system where we run the binary, as well as information about the GPU devices. Hopefully, this code will help us develop a mental model of the CUDA API and give us a better sense of how to use the documentation.  

In the end, the code is straightforward to write once you get a sense for the basic design of the CUDA API. 

### cuda_runtime.h

CUDA includes many API calls that allow you to query (probe) the hardware and software configuration at runtime so that you can make your code flexible, robust, and self-adapting!  

Given that our installed CUDA version is 12.6.3, we can consult the 12.6.3 documentation available at https://docs.nvidia.com/cuda/archive/12.6.3/.  

#### `cudaError_t`
[documentation link](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gf599e5b8b829ce7db0f5216928f6ecb6)

Most CUDA calls return a value of type `cudaError_t`. As with any good code, and low-level code in particular, you should always check the return value of a CUDA call, confirm that the call succeeded, and print an appropriate error message if it did not. CUDA makes this easy. First, the values of `cudaError_t` are documented under [`enum cudaError`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038). Second, testing for success is a single comparison against `cudaSuccess`, and `cudaGetErrorString` translates an error number into a readable message.

See also `cudaSuccess` and `cudaGetErrorString` in the documentation.
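The checking pattern described above can be sketched as follows. This is a minimal illustration, not the contents of `gpuinfo.cu`: the `CHECK_CUDA` macro is a hypothetical helper introduced here (CUDA does not provide it), while `cudaSuccess`, `cudaGetErrorString`, and `cudaGetDeviceCount` are the runtime API names referenced in this section.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): evaluate a runtime API call and,
// if it did not return cudaSuccess, print a readable message and exit.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s failed: error %d (%s)\n",   \
                    __FILE__, __LINE__, #call, (int)err_,          \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main(void) {
    int count = 0;

    // Any call that returns cudaError_t can be wrapped the same way.
    CHECK_CUDA(cudaGetDeviceCount(&count));

    printf("Device count: %d\n", count);
    return 0;
}
```

Assuming the sketch is saved as `check.cu` (a hypothetical file name), it should build with `nvcc -o check check.cu`. Exiting on the first failure is a design choice that suits small utilities; a longer-running program might prefer to return the error to its caller instead.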

#### Driver, runtime, and device count

The three basic CUDA environment facts we will query (using CUDA runtime API calls) are: 

1. the version of the CUDA OS device driver that is installed ([`cudaDriverGetVersion`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART____VERSION.html#group__CUDART____VERSION_1g8a06ee14a0551606b7c780084d5564ab))
2. the version of the CUDA Runtime software tools and libraries ([`cudaRuntimeGetVersion`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART____VERSION.html#group__CUDART____VERSION_1g0e3952c7802fd730432180f1f4a6cdc6))
3. the number of GPU devices recognized by CUDA on the system ([`cudaGetDeviceCount`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g18808e54893cfcaafefeab31a73cc55f))
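A minimal sketch of the three queries listed above might look like the following. It is an illustration under stated assumptions, not the actual `gpuinfo.cu`: the output format and variable names are made up for this example. The one detail worth knowing is that both version calls return a single integer encoded as `1000 * major + 10 * minor`, so it has to be decoded before printing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int driver = 0, runtime = 0, count = 0;
    cudaError_t err;

    // 1. Latest CUDA version supported by the installed OS driver.
    //    The documentation states that 0 is returned if no driver is installed.
    err = cudaDriverGetVersion(&driver);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));

    // 2. Version of the CUDA runtime this binary was built against.
    err = cudaRuntimeGetVersion(&runtime);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));

    // Versions are encoded as 1000 * major + 10 * minor (e.g. 12060 is 12.6).
    printf("CUDA Driver Version: %d.%d\n", driver / 1000, (driver % 1000) / 10);
    printf("CUDA Runtime Version: %d.%d\n", runtime / 1000, (runtime % 1000) / 10);

    // 3. Number of GPU devices visible to CUDA.
    err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: error %d (%s)\n",
                (int)err, cudaGetErrorString(err));
        return 1;
    }
    printf("Device count: %d\n", count);
    return 0;
}
```

The sketch keeps going after a failed version query so that as much information as possible is reported; only a failed device count, which makes everything that follows meaningless, ends the program.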

#### Device properties

We will end our code with a loop that queries each device and prints some basic properties of each GPU on the system. We will use the CUDA type [`cudaDeviceProp`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp) and the [`cudaGetDeviceProperties`](https://docs.nvidia.com/cuda/archive/12.6.3/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g1bf9d625a931d657e08db2b4391170f0) API call.
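The device loop described above can be sketched as follows. Again, this is an assumption-laden illustration rather than the real `gpuinfo.cu`: the choice of which `cudaDeviceProp` fields to print (`name`, `major`, `minor`, `totalGlobalMem`, `multiProcessorCount`, `maxThreadsPerBlock`) and the output layout are picked here only as examples.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Device ordinals run from 0 to count - 1.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n",
                    dev, cudaGetErrorString(err));
            continue;  // report the failure and try the next device
        }

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:         %.1f GiB\n",
               (double)prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```

`cudaDeviceProp` has many more fields than the handful shown; the structure's documentation page lists all of them.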

In [6]:
displaycode("../src/gpuinfo.cu")

In [4]:
%%bash 
touch gpuinfo.cu
make gpuinfo

nvcc  gpuinfo.cu -o gpuinfo


In [8]:
!./gpuinfo

CUDA Driver Version: 0.0
CUDA Runtime Version: 12.6
cudaGetDeviceCount Error:35 and Count:0 
Error: CUDA driver version is insufficient for CUDA runtime version


Well, there you have it: our code can't find an installed OS driver and hence can't find any devices.

## Running CUDA Binaries on the NERC MOC

As it turns out, GPU devices, especially data center GPUs, can be expensive and highly sought-after resources. To allow us to work with such GPUs, we will use a traditional batch computing model. Specifically, we will develop our code and build our binaries in a container that has the CUDA runtime (software tools and libraries). However, to run our code, we will use a script called `run` that submits the specified "command" as a batch job to a queue of jobs for a specific type of GPU. By default, `run` submits your job to a queue serviced by a node that has an [NVIDIA V100](https://www.nvidia.com/en-sg/data-center/v100/). There are queues for other GPU types, such as [A100](https://www.nvidia.com/en-us/data-center/a100/) and [H100](https://www.nvidia.com/en-us/data-center/h100/) GPUs, but you will need permission from your instructor to use them.

### `run`

As part of the CUDASUDAWUDA (csw) resources, we have written a simple script that uses the [OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshif) command line interface (CLI) `oc` to generate the `YAML` required to create and submit a batch job via the [Kubernetes Kueue System](https://kueue.sigs.k8s.io/). By default, `run` waits until the job has executed on the GPU before it returns. Upon completion, it prints the output of the command and then cleans up any resources associated with the batch job that was created. Most of the behavior of `run` can be customized and overridden using environment variables. Below we provide some simple examples of using `run`. If you find a bug or add a feature, please be sure to contribute it back to the CUDASUDAWUDA repositories.

#### Run the `gpuinfo` and `hello` binaries