pmudry
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 14 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/explanations/CUDA_RENDER_EXPLANATION.md b/explanations/CUDA_RENDER_EXPLANATION.md
Lines changed: 91 additions & 0 deletions
diff --git a/explanations/REALTIME_DISPLAY_README.md b/explanations/REALTIME_DISPLAY_README.md
Lines changed: 102 additions & 0 deletions
diff --git a/ROUGH_MIRROR_README.md b/explanations/ROUGH_MIRROR_README.md
ROUGH_MIRROR_README.md renamed to explanations/ROUGH_MIRROR_README.md
diff --git a/SPHERICAL_CHECKERBOARD_FIX.md b/explanations/SPHERICAL_CHECKERBOARD_FIX.md
SPHERICAL_CHECKERBOARD_FIX.md renamed to explanations/SPHERICAL_CHECKERBOARD_FIX.md
diff --git a/TINTED_ROUGH_MIRROR_README.md b/explanations/TINTED_ROUGH_MIRROR_README.md
TINTED_ROUGH_MIRROR_README.md renamed to explanations/TINTED_ROUGH_MIRROR_README.md
diff --git a/src/v0_single_threaded/camera.h b/src/v0_single_threaded/camera.h
Lines changed: 122 additions & 3 deletions
@@ -99,6 +99,20 @@ endif()
 # Executables
 add_executable(v0_single_threaded ${EXTERNAL} ${SOURCE_V0_SINGLE_THREADED})
 
+# Find and link SDL2 for real-time display (optional)
+find_package(SDL2 QUIET)
+if(SDL2_FOUND)
+    include_directories(${SDL2_INCLUDE_DIRS})
+    target_link_libraries(v0_single_threaded ${SDL2_LIBRARIES})
+    add_definitions(-DSDL2_FOUND)
+    message(STATUS "SDL2 found and linked for real-time display")
+else()
+    message(WARNING "SDL2 not found. Real-time display will be disabled. Install SDL2 to enable real-time rendering.")
+    message(WARNING "On Ubuntu/Debian: sudo apt-get install libsdl2-dev")
+    message(WARNING "On macOS: brew install sdl2")
+    message(WARNING "On Windows: Download from https://www.libsdl.org/download-2.0.php")
+endif()
+
 # Link CUDA libraries if CUDA is found
 # Create res directory in output directory
 file(MAKE_DIRECTORY ${CMAKE_BINARY_DIR}/res)
 
@@ -13,15 +13,15 @@ It uses [single-file public domain (or MIT licensed) libraries for C/C++](https:
 ## Using CMAKE
 
 ```bash
-cmake .
+cmake 
 make -j 
 ./v0_single_threaded
 ```
 
 Or, all at once:
 
 ```bash
-cmake . && make -j 24 && ./v0_single_threaded
+cmake . && make -j && ./v0_single_threaded
 ```
 
 ## Within VSCode
 
@@ -0,0 +1,91 @@
+# Explanation of `renderPixelsCUDA` and `renderPixelsKernel`
+
+This document provides a detailed explanation of the CUDA implementation for rendering pixels in the ray tracer, focusing on the `renderPixelsCUDA` host function and the `renderPixelsKernel` device kernel found in `src/v0_single_threaded/camera_cuda.cu`.
+
+## Overview
+
+The rendering process is parallelized using CUDA by assigning the computation of each pixel's color to a separate GPU thread. Because pixels are computed independently of one another, this can yield a large speedup over a single-threaded CPU implementation. The process involves three main stages:
+
+1.  **Setup on the Host (CPU)**: The C++ code prepares the rendering parameters and allocates memory on the GPU.
+2.  **Execution on the Device (GPU)**: A CUDA kernel is launched, where thousands of threads concurrently trace rays to compute pixel colors.
+3.  **Data Transfer back to Host (CPU)**: The final rendered image is copied from GPU memory back to CPU memory to be saved.
+
+---
+
+## Host Function: `renderPixelsCUDA`
+
+This function is the C++ entry point that orchestrates the entire CUDA rendering pipeline. It runs on the CPU.
+
+```cpp
+extern "C" unsigned long long renderPixelsCUDA(unsigned char* image, int width, int height,
+                                               double cam_center_x, double cam_center_y, double cam_center_z,
+                                               double pixel00_x, double pixel00_y, double pixel00_z,
+                                               double delta_u_x, double delta_u_y, double delta_u_z,
+                                               double delta_v_x, double delta_v_y, double delta_v_z,
+                                               int samples_per_pixel, int max_depth)
+```
+
+### Key Steps and CUDA Specifics:
+
+1.  **GPU Memory Allocation (`cudaMalloc`)**:
+    *   `cudaMalloc(&d_image, image_size)`: Allocates a buffer on the GPU's VRAM to store the final image data. `d_image` is a pointer to this device memory.
+    *   `cudaMalloc(&d_ray_count, sizeof(unsigned long long))`: Allocates memory for a single 64-bit integer on the device. This will be used as an atomic counter to track the total number of rays traced by all threads.
+
+2.  **GPU Memory Initialization (`cudaMemset`)**:
+    *   `cudaMemset(d_ray_count, 0, ...)`: Initializes the ray counter on the device to zero.
+    *   `cudaMemset(d_image, 0, ...)`: Clears the device image buffer.
+
+3.  **Execution Configuration (Grid and Block Dimensions)**:
+    *   `dim3 block_size(32, 4);`: Defines the dimensions of a **thread block**. Here, each block contains `32 * 4 = 128` threads arranged in a 2D layout. A block width of 32 matches the warp size, so each warp covers 32 horizontally adjacent pixels in a single row, which tends to keep the memory accesses of a warp contiguous. The best block shape depends on the workload and is usually found by measurement.
+    *   `dim3 grid_size(...)`: Defines the dimensions of the **grid of blocks**. The calculation `(width + block_size.x - 1) / block_size.x` is a standard CUDA idiom to ensure enough blocks are launched to cover every pixel of the image, even if the image dimensions are not perfect multiples of the block dimensions.
+
+4.  **Kernel Launch (`<<<...>>>`)**:
+    *   `renderPixelsKernel<<<grid_size, block_size>>>(...);`: This is the most critical part. It launches the `renderPixelsKernel` function on the GPU.
+    *   The `<<<grid_size, block_size>>>` syntax tells the CUDA runtime how many threads to launch and how to group them. In this case, it launches a 2D grid of thread blocks.
+    *   All parameters (camera data, image dimensions, device pointers) are passed from the host to the kernel. Note that `double` precision values from the host are cast to `float`, as the kernel is optimized to use single-precision arithmetic, which is much faster on most consumer GPUs.
+
+5.  **Synchronization and Error Checking**:
+    *   `cudaGetLastError()`: Since kernel launches are asynchronous (the CPU code continues immediately without waiting for the GPU to finish), this function is called to check for any errors that might have occurred when launching the kernel.
+    *   `cudaDeviceSynchronize()`: This is a blocking call that pauses the CPU thread until all previously issued commands on the GPU have completed. This is essential to ensure the rendering is finished before we try to copy the results back.
+
+6.  **Data Transfer from Device to Host (`cudaMemcpy`)**:
+    *   `cudaMemcpy(image, d_image, ..., cudaMemcpyDeviceToHost)`: Copies the rendered pixel data from the GPU's memory (`d_image`) back to the host's main memory (`image`).
+    *   `cudaMemcpy(&host_ray_count, d_ray_count, ...)`: Copies the final ray count from the GPU back to a host variable.
+
+7.  **Cleanup (`cudaFree`)**:
+    *   `cudaFree(d_image)` and `cudaFree(d_ray_count)`: Releases the memory that was allocated on the GPU, preventing memory leaks in VRAM.
+
+---
+
+## Device Kernel: `renderPixelsKernel`
+
+This function runs on the GPU. A separate instance of this kernel (a thread) is executed for each pixel in the output image.
+
+```cpp
+__global__ void renderPixelsKernel(unsigned char* image, int width, int height, ..., unsigned long long* ray_count)
+```
+
+### Key Steps and CUDA Specifics:
+
+1.  **`__global__` Specifier**: This keyword declares the function as a "kernel" that can be called from the host (CPU) and is executed on the device (GPU).
+
+2.  **Global Thread-to-Pixel Mapping**:
+    *   `int x = blockIdx.x * blockDim.x + threadIdx.x;`
+    *   `int y = blockIdx.y * blockDim.y + threadIdx.y;`
+    *   This is the standard CUDA pattern for computing a unique global ID for each thread. `blockIdx` gives the ID of the current block in the grid, `blockDim` gives the size of the block, and `threadIdx` gives the ID of the current thread within its block. This calculation maps each thread to a unique `(x, y)` pixel coordinate.
+
+3.  **Random State Initialization (`curand_init`)**:
+    *   Each thread must have its own independent random number generator state to avoid visual artifacts.
+    *   `curand_init(...)`: Initializes the cuRAND library's state for the current thread. The seed is made unique for each pixel by combining its coordinates and the system clock, ensuring that each pixel's anti-aliasing and material scattering calculations are statistically independent.
+
+4.  **Ray Tracing Loop**:
+    *   For each sample per pixel, the thread calculates a unique ray direction with a random offset for anti-aliasing.
+    *   It calls the `ray_color` device function, which recursively traces the ray through the scene.
+
+5.  **Atomic Operations (`atomicAdd`)**:
+    *   Inside `ray_color`, the global ray counter is incremented using `atomicAdd(ray_count, 1)`.
+    *   An atomic operation is crucial here because thousands of threads are trying to increment the same memory location (`d_ray_count`) simultaneously. `atomicAdd` ensures that these operations are serialized, preventing race conditions and guaranteeing a correct final count.
+
+6.  **Writing Output**:
+    *   After accumulating the color from all samples, the thread performs gamma correction and converts the final floating-point color value to an 8-bit RGB triplet.
+    *   It then writes these three bytes directly to the correct location in the global image buffer (`d_image`). Since each thread is responsible for a unique pixel, there are no write conflicts between threads at this stage.
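Review note: the ceiling-division grid sizing and the thread-to-pixel mapping described in this document can be checked on the CPU. The following is a minimal sketch, not the code in `camera_cuda.cu`: `ceil_div` and `count_pixel_visits` are hypothetical helpers, and the nested loops stand in for `blockIdx` and `threadIdx`. It assumes the kernel skips out-of-range threads with a guard such as `if (x >= width || y >= height) return;`, which the explanation above does not show.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: ceiling division, i.e. the (n + d - 1) / d idiom
// used to size the grid.
int ceil_div(int n, int d) { return (n + d - 1) / d; }

// Hypothetical helper: emulate a 2D kernel launch on the CPU and count how
// many emulated threads are mapped to each pixel.
std::vector<int> count_pixel_visits(int width, int height, int block_w, int block_h) {
    const int grid_w = ceil_div(width, block_w);
    const int grid_h = ceil_div(height, block_h);
    std::vector<int> visits(static_cast<std::size_t>(width) * height, 0);

    for (int by = 0; by < grid_h; ++by)                 // blockIdx.y
        for (int bx = 0; bx < grid_w; ++bx)             // blockIdx.x
            for (int ty = 0; ty < block_h; ++ty)        // threadIdx.y
                for (int tx = 0; tx < block_w; ++tx) {  // threadIdx.x
                    const int x = bx * block_w + tx;
                    const int y = by * block_h + ty;
                    // Assumed guard: surplus threads in edge blocks do nothing.
                    if (x >= width || y >= height) continue;
                    ++visits[static_cast<std::size_t>(y) * width + x];
                }
    return visits;
}
```

For a 100x50 image with 32x4 blocks, this corresponds to a 4x13 grid, i.e. 6,656 emulated threads for 5,000 pixels. Every pixel is visited exactly once, and the surplus threads in the edge blocks are the ones the guard skips.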
@@ -0,0 +1,102 @@
+# Real-time CUDA Ray Tracer Display
+
+This document explains how to use the real-time display feature of the CUDA ray tracer.
+
+## Prerequisites
+
+To enable real-time display, you need to install SDL2:
+
+### Ubuntu/Debian
+```bash
+sudo apt-get update
+sudo apt-get install libsdl2-dev
+```
+
+### macOS
+```bash
+brew install sdl2
+```
+
+### Windows
+Download the development libraries from: https://www.libsdl.org/download-2.0.php
+
+## How It Works
+
+The real-time display feature renders the image in small tiles (128x128 pixels by default) and updates the display after each tile is completed. This lets you watch the rendering progress instead of waiting for the entire image to finish.
+
+### Key Features:
+- **Progressive Rendering**: Watch the image build up tile by tile
+- **Interactive Window**: Close the window anytime with the X button
+- **Same Quality**: Uses the same CUDA rendering engine as the offline version
+- **Incremental**: Only one tile is rendered on the GPU at a time (GPU memory use is currently the same as for a full-frame render)
+
+## Usage
+
+1. **Build the project** (after installing SDL2):
+   ```bash
+   mkdir build
+   cd build
+   cmake ..
+   make
+   ```
+
+2. **Run the program**:
+   ```bash
+   ./v0_single_threaded
+   ```
+
+3. **Choose rendering mode**:
+   - Option 1: CPU Parallel (original)
+   - Option 2: CUDA GPU (original)
+   - Option 3: **CUDA GPU with Real-time Display** (new)
+
+4. **Watch the rendering**:
+   - A window will open showing the progressive rendering
+   - Each tile appears as it's completed
+   - The window remains open after rendering is complete
+   - Close the window to exit
+
+## Technical Details
+
+### Tile-Based Rendering
+- **Tile Size**: 128x128 pixels (`tile_size`, configurable in `camera.h`)
+- **Rendering Order**: Left-to-right, top-to-bottom
+- **Memory Management**: Full image buffer maintained on both CPU and GPU
+- **Synchronization**: Each tile is synchronized before display update
+
+### Performance Considerations
+- **GPU Memory**: Uses same amount as full rendering (not tile-optimized yet)
+- **Display Updates**: Small delay (1 ms) between tiles for visibility
+- **Event Handling**: Window remains responsive during rendering
+
+### Code Structure
+- `renderPixelsCUDART()`: Main real-time rendering method in `camera.h`
+- `renderPixelsCUDATile()`: Host function for tile rendering in `camera_cuda.cu`
+- `renderPixelsTileKernel()`: CUDA kernel for tile-based rendering
+
+## Troubleshooting
+
+### SDL2 Not Found
+If you see warnings about SDL2 not being found:
+1. Install SDL2 using the commands above
+2. Re-run `cmake ..` and `make`
+3. The real-time option will be automatically enabled
+
+### Window Doesn't Appear
+- Check that your display is properly configured
+- Try running from a graphical terminal
+- On Linux, ensure X11 forwarding is working if using SSH
+
+### Performance Issues
+- Reduce tile size in `camera.h` for more frequent updates
+- Increase samples per pixel for better quality, at the cost of a longer render time per tile
+- The real-time version uses the same CUDA optimization as the offline version
+
+## Future Improvements
+
+Potential enhancements for the real-time display:
+- Adaptive tile sizing based on GPU memory
+- Parallel tile rendering for faster updates
+- Preview mode with lower quality tiles
+- Progress bar and timing estimates
+- Save intermediate results
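Review note: the tile order described under "Tile-Based Rendering" (left-to-right, top-to-bottom, with edge tiles clamped to the image) can be sketched as follows. This is an illustration under assumptions, not the code in `camera.h`: the `Tile` struct and `make_tiles` helper are hypothetical names, and tile rectangles are taken to be half-open.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical type: half-open pixel rectangle [x0, x1) x [y0, y1).
struct Tile { int x0, y0, x1, y1; };

// Hypothetical helper: split an image into square tiles, enumerated
// left-to-right within a row of tiles, rows top-to-bottom.
std::vector<Tile> make_tiles(int width, int height, int tile_size) {
    // Ceiling division so partial tiles at the right/bottom edges are included.
    const int tiles_x = (width + tile_size - 1) / tile_size;
    const int tiles_y = (height + tile_size - 1) / tile_size;

    std::vector<Tile> tiles;
    tiles.reserve(static_cast<std::size_t>(tiles_x) * tiles_y);
    for (int ty = 0; ty < tiles_y; ++ty) {
        for (int tx = 0; tx < tiles_x; ++tx) {
            const int x0 = tx * tile_size;
            const int y0 = ty * tile_size;
            // Clamp the far corner so edge tiles never extend past the image.
            tiles.push_back({x0, y0,
                             std::min(x0 + tile_size, width),
                             std::min(y0 + tile_size, height)});
        }
    }
    return tiles;
}
```

With a 300x200 image and 128-pixel tiles this gives a 3x2 layout of six tiles; the last one spans x in [256, 300) and y in [128, 200). Each tile could then be handed to the tile renderer and the window refreshed before moving on to the next.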
@@ -7,6 +7,10 @@
 #include <atomic>
 #include <chrono>
 
+#ifdef SDL2_FOUND
+#include <SDL2/SDL.h>
+#endif
+
 #pragma once
 
 inline double degrees_to_radians(double degrees)
@@ -23,12 +27,12 @@ class Camera
     const int image_width = static_cast<int>(aspect_ratio * image_height);
 
     double vfov = 35.0;                // vertical field of view in degrees
-    point3 lookfrom = point3(2, 2.5, 3); // Point camera is looking from
-    point3 lookat = point3(-1, 0, -1);  // Point camera is looking at
+    point3 lookfrom = point3(-2, 2, 5); // Point camera is looking from
+    point3 lookat = point3(-2, -0.5, -1);  // Point camera is looking at
     vec3 vup = vec3(0, 1, 0);          // Camera-relative "up" direction
 
     int samples_per_pixel = 1;
-    const int max_depth = 16; // Maximum ray bounce depth
+    const int max_depth = 24; // Maximum ray bounce depth
 
     std::atomic<long long> n_rays{0}; // Number of rays traced so far with this cam (thread-safe)
 
@@ -105,6 +109,121 @@ class Camera
 
         cout << "CUDA rendering completed in " << duration.count() << " milliseconds" << endl;
     }
+
+#ifdef SDL2_FOUND
+    void renderPixelsCUDART(vector<unsigned char> &image)
+    {
+        // Initialize SDL
+        if (SDL_Init(SDL_INIT_VIDEO) < 0) {
+            std::cerr << "SDL could not initialize! SDL_Error: " << SDL_GetError() << std::endl;
+            return;
+        }
+
+        // Create window
+        SDL_Window* window = SDL_CreateWindow("CUDA Ray Tracer - Real-time",
+                                              SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
+                                              image_width, image_height, SDL_WINDOW_SHOWN);
+        if (window == nullptr) {
+            std::cerr << "Window could not be created! SDL_Error: " << SDL_GetError() << std::endl;
+            SDL_Quit();
+            return;
+        }
+
+        // Create renderer
+        SDL_Renderer* renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
+        if (renderer == nullptr) {
+            std::cerr << "Renderer could not be created! SDL_Error: " << SDL_GetError() << std::endl;
+            SDL_DestroyWindow(window);
+            SDL_Quit();
+            return;
+        }
+
+        // Create texture for the image
+        SDL_Texture* texture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_RGB24,
+                                                 SDL_TEXTUREACCESS_STREAMING,
+                                                 image_width, image_height);
+        if (texture == nullptr) {
+            std::cerr << "Texture could not be created! SDL_Error: " << SDL_GetError() << std::endl;
+            SDL_DestroyRenderer(renderer);
+            SDL_DestroyWindow(window);
+            SDL_Quit();
+            return;
+        }
+
+        auto start_time = std::chrono::high_resolution_clock::now();
+
+        // Render in tiles for real-time display
+        const int tile_size = 128; // Size of each tile
+        const int tiles_x = (image_width + tile_size - 1) / tile_size;
+        const int tiles_y = (image_height + tile_size - 1) / tile_size;
+
+        std::cout << "Rendering in " << tiles_x << "x" << tiles_y << " tiles..." << std::endl;
+
+        for (int tile_y = 0; tile_y < tiles_y; ++tile_y) {
+            for (int tile_x = 0; tile_x < tiles_x; ++tile_x) {
+                int start_x = tile_x * tile_size;
+                int start_y = tile_y * tile_size;
+                int end_x = std::min(start_x + tile_size, image_width);
+                int end_y = std::min(start_y + tile_size, image_height);
+
+                // Render this tile
+                unsigned long long cuda_ray_count = ::renderPixelsCUDATile(
+                    image.data(), image_width, image_height,
+                    camera_center.x(), camera_center.y(), camera_center.z(),
+                    pixel00_loc.x(), pixel00_loc.y(), pixel00_loc.z(),
+                    pixel_delta_u.x(), pixel_delta_u.y(), pixel_delta_u.z(),
+                    pixel_delta_v.x(), pixel_delta_v.y(), pixel_delta_v.z(),
+                    samples_per_pixel, max_depth, start_x, start_y, end_x, end_y);
+
+                n_rays.fetch_add(cuda_ray_count, std::memory_order_relaxed);
+
+                // Update texture with the new tile
+                SDL_UpdateTexture(texture, nullptr, image.data(), image_width * 3);
+
+                // Clear and render
+                SDL_RenderClear(renderer);
+                SDL_RenderCopy(renderer, texture, nullptr, nullptr);
+                SDL_RenderPresent(renderer);
+
+                // Handle events to keep window responsive
+                SDL_Event event;
+                while (SDL_PollEvent(&event)) {
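+                    if (event.type == SDL_QUIT) {
+                        // Re-queue the quit event so the wait loop after
+                        // `cleanup` sees it and exits right away.
+                        SDL_PushEvent(&event);
+                        goto cleanup;
+                    }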
+                }
+
+                // Small delay to make the progressive rendering visible
+                SDL_Delay(1);
+            }
+        }
+
+        cleanup:
+        auto end_time = std::chrono::high_resolution_clock::now();
+        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
+
+        std::cout << "Real-time CUDA rendering completed in " << duration.count() << " milliseconds" << std::endl;
+        std::cout << "Close the window to exit..." << std::endl;
+
+        // Wait for user to close window
+        bool quit = false;
+        while (!quit) {
+            SDL_Event event;
+            while (SDL_PollEvent(&event)) {
+                if (event.type == SDL_QUIT) {
+                    quit = true;
+                }
+            }
+            SDL_Delay(1);
+        }
+
+        // Cleanup
+        SDL_DestroyTexture(texture);
+        SDL_DestroyRenderer(renderer);
+        SDL_DestroyWindow(window);
+        SDL_Quit();
+    }
+#endif
 
     void renderPixelsParallelWithTiming(const hittable &scene, vector<unsigned char> &image)
     {