# python

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/paga-hb/C1PD2C_2025/blob/main/notebooks/python.ipynb)


This notebook shows how to use a standard Python notebook with shell commands to create, compile, and run C/C++/CUDA/OpenCL programs.

Requirements for the notebook to work:

- NVIDIA GPU + Drivers (runtime)
  - You need a supported NVIDIA GPU and the appropriate driver version installed to actually run the CUDA code.
- CUDA Toolkit
  - Includes nvcc (the NVIDIA CUDA Compiler).
  - Provides headers and libraries needed for compiling GPU code.
  - Must be compatible with your system's GPU drivers.
- OpenCL Toolkit
  - Provides the OpenCL headers and the OpenCL library that host code is compiled and linked against.
  - Needs an OpenCL runtime for the target device; the NVIDIA GPU driver includes one.
- C/C++ Compiler
  - On Linux: usually gcc/g++ or clang/clang++.
  - On Windows: MSVC (Microsoft Visual C++) or WSL with Linux tools.
  - On macOS: clang/clang++ (Xcode Command Line Tools) builds the C and C++ examples, but recent CUDA Toolkit releases do not support macOS, so the CUDA examples are trickier there.
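
Before running the cells, it can help to check which of the required tools are actually on the `PATH`. The following is a minimal sketch using only the Python standard library; the list of executable names and the helper names `find_tools` and `report` are assumptions made for illustration, so adjust them to your installation.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ["gcc", "clang", "g++", "clang++", "nvcc"]


def find_tools(names=TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Format one human-readable line per tool."""
    return [f"{name}: {path or 'not found'}" for name, path in found.items()]


for line in report(find_tools()):
    print(line)
```

A tool reported as `not found` means the matching compile cell will fail. Finding `nvcc` only shows that the CUDA Toolkit is installed, not that a usable GPU and driver are present.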

The notebook executes shell commands with `!` (a whole cell can be run in the shell with `%%system`) and writes source files to disk with `%%file`.

In [37]:
# Create folders for code files (src), header files (include), and executable files (build) 
!mkdir -p ../src
!mkdir -p ../include
!mkdir -p ../build
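
`mkdir -p` depends on a Unix-like shell. As a portable alternative, the same folders can be created from Python itself. This is a sketch only: the helper name `make_project_dirs` and its `base` parameter are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def make_project_dirs(base=".."):
    """Create src, include, and build under `base`; existing folders are left alone."""
    created = []
    for name in ("src", "include", "build"):
        folder = Path(base) / name
        # parents=True and exist_ok=True together mirror the behaviour of `mkdir -p`
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_project_dirs()` with the default `base` would create the same `../src`, `../include`, and `../build` folders as the cell above.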

---
## C

In [20]:
%%file ../src/c_hello.c
// c_hello.c
#include <stdio.h>

int main(void)
{
    printf("%s", "Hello from C!");
    return 0;
}

Writing ../src/c_hello.c


In [21]:
!gcc -g -I ../include ../src/c_hello.c -o ../build/c_hello

In [22]:
!clang -g -I ../include ../src/c_hello.c -o ../build/c_hello

In [23]:
!../build/c_hello

Hello from C!
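
The compile cells above pass the same flags to `gcc` and `clang`: `-g` adds debug information, `-I ../include` adds the header search path, and `-o` names the output binary. The sketch below wraps that pattern in Python with `subprocess`, which can be handy when a build failure should raise an exception instead of being printed and ignored. The helper names `build_command` and `compile_and_run` are hypothetical.

```python
import subprocess


def build_command(compiler, source, output, include_dir="../include"):
    """Return the argument list for: <compiler> -g -I <include_dir> <source> -o <output>."""
    return [compiler, "-g", "-I", include_dir, source, "-o", output]


def compile_and_run(compiler, source, output):
    """Compile `source`, run the resulting binary, and return its standard output."""
    # check=True raises CalledProcessError if the compiler or the program fails
    subprocess.run(build_command(compiler, source, output), check=True)
    result = subprocess.run([output], check=True, capture_output=True, text=True)
    return result.stdout
```

Assuming `gcc` is installed and `../src/c_hello.c` exists, `compile_and_run("gcc", "../src/c_hello.c", "../build/c_hello")` should return the text the program prints.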

---
## C++

In [24]:
%%file ../src/cpp_hello.cpp
// cpp_hello.cpp
#include <iostream>

// using namespace std;

int main(void)
{
    std::cout << "Hello from C++!" << std::endl;
    // cout << "Hello World!" << endl;
    return 0;
}

Writing ../src/cpp_hello.cpp


In [25]:
!g++ -g -I ../include ../src/cpp_hello.cpp -o ../build/cpp_hello

In [26]:
!clang++ -g -I ../include ../src/cpp_hello.cpp -o ../build/cpp_hello

In [27]:
!../build/cpp_hello

Hello from C++!


---
## CUDA C

In [28]:
%%file ../src/c_cuda_hello.cu
#include <stdio.h>

__global__ void hello()
{
    // CUDA supports the printf() function on the device
    printf("Hello from device -> block: %u, thread: %u\n", blockIdx.x, threadIdx.x);
}

int main()
{
    printf("Hello from host\n");
    hello<<<2, 2>>>();
    cudaDeviceSynchronize();
    return 0;
}

Writing ../src/c_cuda_hello.cu


In [29]:
!nvcc -g -G -Wno-deprecated-gpu-targets -I ../include ../src/c_cuda_hello.cu -o ../build/c_cuda_hello

In [30]:
!../build/c_cuda_hello

Hello from host
Hello from device -> block: 1, thread: 0
Hello from device -> block: 1, thread: 1
Hello from device -> block: 0, thread: 0
Hello from device -> block: 0, thread: 1
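
The launch `hello<<<2, 2>>>()` starts 2 blocks of 2 threads each, so the kernel body runs four times. CUDA does not guarantee the order in which blocks execute, which is why block 1 is printed before block 0 in the output above. As a rough illustration in plain Python (not CUDA), the sketch below enumerates the index pairs of such a launch and compares them, ignoring order, with the pairs from the recorded output; `launch_indices` is a hypothetical helper.

```python
from itertools import product


def launch_indices(num_blocks, threads_per_block):
    """List every (blockIdx.x, threadIdx.x) pair of a one-dimensional launch."""
    return list(product(range(num_blocks), range(threads_per_block)))


pairs = launch_indices(2, 2)

# Pairs copied from the recorded output above, in the order they were printed
observed = [(1, 0), (1, 1), (0, 0), (0, 1)]

# The same four pairs appear; only the order differs
print(sorted(observed) == pairs)
```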


---
## CUDA C++

In [31]:
%%file ../src/cpp_cuda_hello.cu
#include <cstdio>
#include <iostream>

using namespace std;

__global__ void hello()
{
    // Note! CUDA does NOT support cout on the device (so use printf() for debugging)
    printf("Hello from device block: %u, thread: %u\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // It's fine to use cout on the host
    cout << "Hello from host" << endl;
    hello<<<2, 2>>>();
    cudaDeviceSynchronize();
    return 0;
}

Writing ../src/cpp_cuda_hello.cu


In [32]:
!nvcc -g -G -Wno-deprecated-gpu-targets -I ../include ../src/cpp_cuda_hello.cu -o ../build/cpp_cuda_hello

In [33]:
!../build/cpp_cuda_hello

Hello from host
Hello from device block: 1, thread: 0
Hello from device block: 1, thread: 1
Hello from device block: 0, thread: 0
Hello from device block: 0, thread: 1


In [34]:
%%file ../src/largest_element.cu
#include <cmath>    // ceil
#include <cstdio>
#include <cstdlib>  // rand
#include <iostream>

using namespace std;

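// Each block scans one chunk of up to 256 elements of a and writes the
// chunk maximum to b[blockIdx.x]. Assumes non-negative input values,
// because the running maximum starts at 0.
__global__ void maxi(int* a, int* b, int n)
{
    int block = 256 * blockIdx.x;  // index of the first element of this chunk
    int max = 0;

    for (int i = block; i < min(256 + block, n); i++) {
        if (max < a[i]) {
            max = a[i];
        }
    }
    b[blockIdx.x] = max;
}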

int main()
{
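    const int N = 3 << 2;  // 12 elements
    int n = N;             // number of values still to be reduced
    int a[N];              // fixed-size array instead of a non-standard VLA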

    cout << "Elements: ";
    for (int i = 0; i < n; i++) {
        a[i] = rand() % n;
        cout << a[i] << "\t";
    }

    cudaEvent_t start, end;
    int *ad, *bd;
    int size = n * sizeof(int);
    cudaMalloc(&ad, size);
    cudaMemcpy(ad, a, size, cudaMemcpyHostToDevice);
    int grids = ceil(n * 1.0f / 256.0f);
    cudaMalloc(&bd, grids * sizeof(int));

    dim3 grid(grids, 1);
    dim3 block(1, 1);

    cudaEventCreate(&start);
    cudaEventCreate(&end);
    cudaEventRecord(start);

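    while (n > 1) {
        // One block per chunk of up to 256 of the remaining values
        grids = (n + 255) / 256;
        maxi<<<grids, block>>>(ad, bd, n);
        n = grids;  // one partial maximum per block remains
        cudaMemcpy(ad, bd, n * sizeof(int), cudaMemcpyDeviceToDevice);
    }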

    cudaEventRecord(end);
    cudaEventSynchronize(end);

    float time = 0;
    cudaEventElapsedTime(&time, start, end);  // elapsed time in milliseconds

    int ans = 0;
    cudaMemcpy(&ans, ad, sizeof(int), cudaMemcpyDeviceToHost);

    cout << "\nThe largest element is: " << ans << endl;

    cout << "The time required: " << time << " ms" << endl;

    // Release device memory and timing events
    cudaFree(ad);
    cudaFree(bd);
    cudaEventDestroy(start);
    cudaEventDestroy(end);
    return 0;
}

Writing ../src/largest_element.cu


In [35]:
!nvcc -g -G -Wno-deprecated-gpu-targets -I ../include ../src/largest_element.cu -o ../build/largest_element

In [36]:
!../build/largest_element

Elements: 7	10	9	7	5	7	10	0	9	1	2	7	
The largest element is: 10
The time required: 0.18944 ms
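
To sanity-check the GPU result, the same chunked reduction can be sketched on the CPU: split the values into chunks of 256, keep the maximum of each chunk, and repeat until one value is left. This is an illustrative reference only, and `blockwise_max` is a hypothetical helper. Unlike the kernel, which starts its running maximum at 0, it also handles negative values.

```python
def blockwise_max(values, chunk=256):
    """Replace each chunk of `chunk` values by its maximum until one value is left."""
    values = list(values)
    while len(values) > 1:
        values = [max(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return values[0]


# Elements copied from the program output above
elements = [7, 10, 9, 7, 5, 7, 10, 0, 9, 1, 2, 7]
print(blockwise_max(elements))  # prints 10
```

With only 12 elements a single chunk covers the whole input, so one pass is enough; inputs longer than 256 values need several passes.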


In [39]:
# Remove folders with code files, header files, and executable files
!rm -rf ../src
!rm -rf ../include
!rm -rf ../build