### GPU Programming Homework for Urban Analysis

Frank Chen

#### Step 1. Computer Set Up

In this section, I will show how to set up a PC for GPU programming, specifically with PyCUDA.

First, you need an NVIDIA GPU and the CUDA toolkit. You can download the [CUDA toolkit from the NVIDIA website](https://developer.nvidia.com/cuda-downloads) for your operating system. As an example, I'm running Windows 10 with an RTX 3070 graphics card, so I downloaded and installed the CUDA toolkit for Windows 10.

Next, you need to install Visual Studio, because on Windows the CUDA toolkit uses its C++ compiler (`cl.exe`) as the host compiler. If you don't want the full Visual Studio, you can install the [Visual Studio Build Tools](https://visualstudio.microsoft.com/zh-hans/downloads/?q=build+tools) instead; they are more lightweight and include the components needed for CUDA programming. Select the "Desktop development with C++" workload during installation, and add the folder containing `cl.exe` to the `PATH` system environment variable. For the exact folder, see [this post on Stack Overflow](https://stackoverflow.com/questions/8125826/error-compiling-cuda-from-command-prompt).

Finally, install PyCUDA, a Python wrapper for CUDA. It is available through conda or pip. I use mamba as my package manager, so I installed it with: `mamba install -c conda-forge pycuda`.

To check that the CUDA toolkit is installed, run `nvcc --version` in a terminal; it prints the toolkit version. This does not test PyCUDA itself. The first code cell in Step 2 does that by importing `pycuda.autoinit` and listing the detected GPUs.
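The same checks can also be run from Python before opening the notebook. The sketch below is a hypothetical helper (`check_cuda_environment` is not part of PyCUDA): it looks for `nvcc` and `cl` on `PATH` with `shutil.which` and asks the import system whether the `pycuda` package can be found. It deliberately does not import PyCUDA, so it never touches the GPU or creates a CUDA context.

```python
import importlib.util
import shutil
import subprocess


def check_cuda_environment():
    """Hypothetical helper: report which CUDA build tools and packages this process can see.

    Only PATH and the Python import system are inspected; no CUDA context is created.
    """
    report = {
        "nvcc": shutil.which("nvcc") is not None,   # CUDA compiler from the toolkit
        "cl": shutil.which("cl") is not None,       # MSVC host compiler (needed on Windows only)
        "pycuda": importlib.util.find_spec("pycuda") is not None,
    }
    if report["nvcc"]:
        # Equivalent to running `nvcc --version` in a terminal
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
        print(result.stdout.strip())
    return report


if __name__ == "__main__":
    for tool, found in check_cuda_environment().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

On Linux or macOS, `cl` is expected to show as missing; it only matters on Windows.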

#### Step 2. Process the Data

In this section, I will show how to import PyCUDA, load the data, and compute shadows on the GPU.

First, we import the required modules. Then we load the digital surface model (DSM) with `rasterio` and copy its metadata for the output rasters. Next, we define the CUDA kernel that computes the shadows. The sun position for each hour (azimuth and elevation) was taken from the [NOAA solar position calculator](https://gml.noaa.gov/grad/solcalc/azel.html). Finally, a `for` loop computes the shadow map for each hour and exports it to a GeoTIFF file.
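The shadow test performed by the kernel can be summarized as follows. This is my reading of the kernel code, not a formula from the original notebook, and it assumes the azimuth $A$ is measured clockwise from north (the NOAA convention), raster columns $x$ increase eastward and rows $y$ increase southward, the cell size $c$ and the DSM heights $z$ share the same unit, and $p = (x_p, y_p)$ is the pixel being tested. The ray is marched toward the sun in steps of one cell:

$$
(\Delta x, \Delta y) = (\sin A,\; -\cos A), \qquad \Delta z = \tan E
$$

$$
x_t = x_p + \operatorname{trunc}\!\left(\frac{t \sin A}{c}\right), \qquad
y_t = y_p + \operatorname{trunc}\!\left(\frac{-t \cos A}{c}\right), \qquad
t = c, 2c, \dots,\; t < 2000
$$

$$
p \text{ is in shadow} \iff \exists\, t:\; z(x_t, y_t) > z(x_p, y_p) + t \tan E
$$

As a worked check (assuming meters), on flat ground an obstacle of height $h$ casts a shadow of length $L = h / \tan E$. With the 12:00 elevation of 67.82° used in the code, $\tan E \approx 2.45$, so a 10 m building shades about 4.1 m, roughly 2 cells of 2 m. At 18:00, $E = 12.65^\circ$ and $\tan E \approx 0.224$, so the same building shades about 44.6 m, roughly 22 cells.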

```python
import numpy as np
import rasterio
import pycuda.autoinit                  # initializes CUDA and creates a context on device 0
import pycuda.driver as cuda            # device queries, memory allocation, host/device copies
from pycuda.compiler import SourceModule
import matplotlib.pyplot as plt

# matplotlib setup
%matplotlib inline
plt.rcParams['figure.figsize'] = (8, 8)  # default size of plots

print("pycuda installed and ready to use.")

print("%d device(s) found." % cuda.Device.count())
for ordinal in range(cuda.Device.count()):
    dev = cuda.Device(ordinal)
    print("Device #%d: %s" % (ordinal, dev.name()))
print(cuda)  # shows which pycuda.driver module was loaded

dsm_file = "data/row11-col17.tif"

with rasterio.open(dsm_file) as dsm_dataset:
    # Read bounds and image data; the kernel expects a contiguous float32 buffer
    dsm_bounds = dsm_dataset.bounds
    dsm_img = np.ascontiguousarray(dsm_dataset.read(1), dtype=np.float32)
    print('The DSM bounds are:', dsm_bounds)

    # Get the affine transformation parameters
    transform = dsm_dataset.transform
    pixel_width = transform[0]       # x-direction scale
    pixel_height = -transform[4]     # y-direction scale (stored as negative for north-up rasters)
    print('Cell size (width, height):', (pixel_width, pixel_height))

    # Copy metadata for later export
    metadata = dsm_dataset.meta.copy()

# CUDA kernel: one thread per pixel marches toward the sun and checks for blocking surfaces
kernel_shadow = """
#define PI 3.14159265f

__global__ void calculate_shadow(const float *dsm, bool *shadow, int width, int height,
                                 float cell_size, float sun_azimuth, float sun_elevation) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // Skip threads that fall outside the image
    if (x >= width || y >= height)
        return;

    int idx = y * width + x;

    // Convert sun azimuth and elevation from degrees to radians
    float azimuth_rad = sun_azimuth * PI / 180.0f;
    float elevation_rad = sun_elevation * PI / 180.0f;

    // Horizontal step toward the sun (azimuth clockwise from north; rows grow southward)
    float dx = sinf(azimuth_rad);
    float dy = -cosf(azimuth_rad);
    // Rise of the sun ray per unit of horizontal distance
    float dz = tanf(elevation_rad);

    float initial_height = dsm[idx];
    bool in_shadow = false;

    // March toward the sun one cell at a time, up to 2000 map units
    for (float t = cell_size; t < 2000.0f; t += cell_size) {
        int x_offset = x + int(dx * t / cell_size);
        int y_offset = y + int(dy * t / cell_size);

        if (x_offset < 0 || x_offset >= width || y_offset < 0 || y_offset >= height)
            break;  // ray left the tile

        int offset_idx = y_offset * width + x_offset;

        // Height of the sun ray above this point
        float height_along_ray = initial_height + dz * t;

        // A higher surface along the path blocks the sun
        if (dsm[offset_idx] > height_along_ray) {
            in_shadow = true;
            break;
        }
    }

    // Store shadow result: true for shadow, false for sunlit
    shadow[idx] = in_shadow;
}
"""

# Compile the kernel and get a handle to it
mod = SourceModule(kernel_shadow)
calculate_shadows = mod.get_function("calculate_shadow")

height, width = dsm_img.shape

# Allocate device memory for the DSM and copy it to the GPU once
d_dsm = cuda.mem_alloc(dsm_img.nbytes)
cuda.memcpy_htod(d_dsm, dsm_img)

# Host buffer for the shadow map (one byte per pixel, matching the kernel's bool)
shadow_result = np.zeros_like(dsm_img, dtype=bool)
# Device buffer for the shadow map, reused for every hour
d_shadow = cuda.mem_alloc(shadow_result.nbytes)

# 16x16 threads per block and enough blocks to cover the whole raster
block_size = (16, 16, 1)
grid_size = (int(np.ceil(width / block_size[0])), int(np.ceil(height / block_size[1])))
# The kernel assumes square cells, so only the x pixel size is passed
cell_size = np.float32(pixel_width)

# (hour, sun azimuth in degrees, sun elevation in degrees) from the NOAA calculator
time_azi_ele = [
    (8, 93.7, 32.87),
    (9, 104.85, 44.2),
    (10, 119.64, 54.83),
    (11, 141.98, 63.58),
    (12, 175.62, 67.82),
    (13, 211.21, 65.07),
    (14, 236, 57.06),
    (15, 252.08, 46.7),
    (16, 263.83, 35.47),
    (17, 273.63, 23.99),
    (18, 282.74, 12.65)
]

# Output profile: one uint8 band (1 = shadow, 0 = no shadow); drop nodata to avoid value range issues
metadata.update(dtype=rasterio.uint8, count=1)
metadata.pop('nodata', None)

# Generate a shadow map for each hour and save it as a GeoTIFF
for time_val, sun_azimuth, sun_elevation in time_azi_ele:
    print(f"Processing time: {time_val}h, Azimuth: {sun_azimuth}, Elevation: {sun_elevation}")

    # Launch the kernel with the current sun position
    calculate_shadows(
        d_dsm, d_shadow, np.int32(width), np.int32(height),
        cell_size, np.float32(sun_azimuth), np.float32(sun_elevation),
        block=block_size, grid=grid_size
    )

    # Copy the shadow map back to the host (this waits for the kernel to finish)
    cuda.memcpy_dtoh(shadow_result, d_shadow)

    # Save as a GeoTIFF named after the hour, e.g. "data/shadow_8.tif"
    output_filename = f"data/shadow_{time_val}.tif"
    with rasterio.open(output_filename, 'w', **metadata) as dst:
        dst.write(shadow_result.astype(np.uint8), 1)

    print(f"Saved shadow file for time {time_val}h to {output_filename}")

# Free GPU memory after all hours are done
d_shadow.free()
d_dsm.free()
```


```text
pycuda installed and ready to use.
1 device(s) found.
Device #0: NVIDIA GeForce RTX 3070
<module 'pycuda.driver' from 'e:\\miniforge3\\envs\\spatial\\lib\\site-packages\\pycuda\\driver.py'>
The DSM bounds are: BoundingBox(left=2712748.0, bottom=259294.0, right=2717948.0, top=264494.0)
Cell size (width, height): (2.0, 2.0)
Processing time: 8h, Azimuth: 93.7, Elevation: 32.87
Saved shadow file for time 8h to data/shadow_8.tif
Processing time: 9h, Azimuth: 104.85, Elevation: 44.2
Saved shadow file for time 9h to data/shadow_9.tif
Processing time: 10h, Azimuth: 119.64, Elevation: 54.83
Saved shadow file for time 10h to data/shadow_10.tif
Processing time: 11h, Azimuth: 141.98, Elevation: 63.58
Saved shadow file for time 11h to data/shadow_11.tif
Processing time: 12h, Azimuth: 175.62, Elevation: 67.82
Saved shadow file for time 12h to data/shadow_12.tif
Processing time: 13h, Azimuth: 211.21, Elevation: 65.07
Saved shadow file for time 13h to data/shadow_13.tif
Processing time: 14h, Azimuth: 
```
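To sanity-check the GPU output, it can help to compare it with a plain Python/NumPy version of the same ray march on a small crop. The function `shadow_reference` below is a hypothetical helper, not part of the original notebook. It mirrors the kernel's stepping rule (truncation toward zero, stopping at the tile edge or after 2000 map units) but computes in 64-bit floats, so a few pixels where the ray passes exactly through a cell boundary may differ from the float32 GPU result. It is far too slow for the full raster.

```python
import math

import numpy as np


def shadow_reference(dsm, cell_size, sun_azimuth, sun_elevation, max_dist=2000.0):
    """Hypothetical CPU re-implementation of the calculate_shadow kernel (small crops only).

    dsm: 2-D array of surface heights; cell_size: pixel size in the DSM's horizontal units;
    sun_azimuth, sun_elevation: degrees, azimuth clockwise from north.
    Returns a boolean array that is True where the pixel is in shadow.
    """
    height, width = dsm.shape
    az = math.radians(sun_azimuth)
    el = math.radians(sun_elevation)
    dx, dy, dz = math.sin(az), -math.cos(az), math.tan(el)  # same direction as the kernel
    shadow = np.zeros((height, width), dtype=bool)
    for y in range(height):
        for x in range(width):
            z0 = float(dsm[y, x])
            t = cell_size
            while t < max_dist:
                # int() truncates toward zero, like the C cast in the kernel
                xo = x + int(dx * t / cell_size)
                yo = y + int(dy * t / cell_size)
                if xo < 0 or xo >= width or yo < 0 or yo >= height:
                    break  # ray left the tile
                if dsm[yo, xo] > z0 + dz * t:
                    shadow[y, x] = True  # a surface along the path is above the sun ray
                    break
                t += cell_size
    return shadow


# Tiny synthetic check: a 2-unit-high wall along the east edge, sun due east at 30 degrees
demo_dsm = np.zeros((5, 5), dtype=np.float32)
demo_dsm[:, 4] = 2.0
demo_shadow = shadow_reference(demo_dsm, 1.0, 90.0, 30.0)
print(demo_shadow.astype(np.uint8))
```

In the demo, the three columns next to the wall are in shadow, while the westernmost column stays sunlit because the ray is already at 4 · tan 30° ≈ 2.31 when it reaches the wall. For a real comparison, run the kernel and `shadow_reference` on the same crop (for example `dsm_img[:200, :200]`) and count the pixels where they disagree.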

#### Step 3. Visualize the Result

In this section, I will show how to visualize the shadow distribution. We use `matplotlib` to plot the shadow map for each hour and save each plot as a PNG file. Then we use `imageio` to combine the PNG files into an animated GIF, showing each frame for 1000 ms.
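Because the shadow maps contain only 0 and 1, the two colors can be pinned to those values instead of letting `imshow` stretch viridis over each frame's data range. With the default scaling, a frame that is entirely in shadow would be drawn in the no-shadow color. A minimal sketch using matplotlib's `ListedColormap` and `BoundaryNorm` follows; `plot_binary_shadow` and the demo file path are hypothetical names for illustration.

```python
import os
import tempfile

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

# One fixed color per class: the viridis endpoints used in the notebook
SHADOW_CMAP = ListedColormap(["#440154", "#FDE725"])  # index 0: no shadow, index 1: shadow
# Bin edges that send 0 to the first color and 1 to the second, whatever the frame contains
SHADOW_NORM = BoundaryNorm([-0.5, 0.5, 1.5], SHADOW_CMAP.N)


def plot_binary_shadow(shadow, title, png_path):
    """Hypothetical helper: draw a 0/1 shadow map with fixed colors and a legend, then save a PNG."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(shadow, cmap=SHADOW_CMAP, norm=SHADOW_NORM, interpolation="nearest")
    ax.set_title(title, pad=20, fontsize=18)
    ax.axis("off")
    # Legend colors come straight from the colormap, so they always match the image
    handles = [
        mpatches.Patch(facecolor=SHADOW_CMAP(0), label="No Shadow (0)"),
        mpatches.Patch(facecolor=SHADOW_CMAP(1), label="Shadow (1)"),
    ]
    ax.legend(handles=handles, loc="lower right", framealpha=0.7)
    fig.savefig(png_path, bbox_inches="tight", pad_inches=0.2)
    plt.close(fig)


# Demo: a frame that is entirely in shadow is still drawn in the shadow color
demo_path = os.path.join(tempfile.gettempdir(), "shadow_demo.png")
plot_binary_shadow(np.ones((4, 4), dtype=np.uint8), "Demo frame", demo_path)
```

Passing `vmin=0, vmax=1` to `imshow` achieves the same pinning with viridis; the `ListedColormap` variant additionally ties the legend patches to the colormap itself.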

```python
import rasterio
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import imageio.v3 as iio

# Hours for which shadow files were generated in Step 2
times = list(range(8, 19))

# Colors of the viridis colormap at its two endpoints
color_no_shadow = "#440154"  # value 0
color_shadow = "#FDE725"     # value 1

# Plot each shadow map and save it as a PNG
for time_val in times:
    # Input TIFF and output PNG paths
    tif_path = f"data/shadow_{time_val}.tif"
    png_path = f"data/shadow_{time_val}.png"

    # Read the first (only) band of the shadow map
    with rasterio.open(tif_path) as src:
        shadow = src.read(1)

    plt.figure(figsize=(8, 8))

    # Fix the color range to [0, 1] so 0 and 1 always get the legend colors,
    # even in frames that contain only one of the two values
    plt.imshow(shadow, cmap='viridis', vmin=0, vmax=1)
    plt.title(f"Shadow Distribution at {time_val}:00", pad=20, fontsize=18)
    plt.axis('off')  # hide axes

    # Legend patches using the viridis endpoint colors
    shadow_patch = mpatches.Patch(facecolor=color_shadow, label='Shadow (1)')
    no_shadow_patch = mpatches.Patch(facecolor=color_no_shadow, label='No Shadow (0)')
    plt.legend(handles=[no_shadow_patch, shadow_patch], loc='lower right', framealpha=0.7)

    # Save the figure and close it to free memory
    plt.savefig(png_path, bbox_inches='tight', pad_inches=0.2)
    plt.close()

    print(f"Saved PNG file: {png_path}")

# Read the PNG frames in time order
png_files = [f"data/shadow_{time_val}.png" for time_val in times]
images = [iio.imread(file) for file in png_files]

# Write the animated GIF; with imageio.v3 (Pillow backend) duration is in milliseconds per frame
gif_path = "data/shadow_animation.gif"
iio.imwrite(gif_path, images, duration=1000)
print(f"Saved GIF: {gif_path}")
```

```text
Saved PNG file: data/shadow_8.png
Saved PNG file: data/shadow_9.png
Saved PNG file: data/shadow_10.png
Saved PNG file: data/shadow_11.png
Saved PNG file: data/shadow_12.png
Saved PNG file: data/shadow_13.png
Saved PNG file: data/shadow_14.png
Saved PNG file: data/shadow_15.png
Saved PNG file: data/shadow_16.png
Saved PNG file: data/shadow_17.png
Saved PNG file: data/shadow_18.png
Saved GIF: data/shadow_animation.gif
```
