# 00 atomicAdd

**FreeCodeCamp CUDA Course - Module 5**

Original Course: [https://www.youtube.com/watch?v=86FAWCzIe_4](https://www.youtube.com/watch?v=86FAWCzIe_4)

Source File: `00_atomicAdd.cu`

---

## Overview

Use atomic operations to handle race conditions that occur when many threads update the same memory location.

---

## Learning Objectives

By the end of this notebook, you will:

1. Understand CUDA kernel syntax and execution
2. Learn GPU memory allocation and data transfer
3. Use atomic operations for thread-safe updates

---

## Setup

Make sure you've completed the setup from the first notebook (GPU enabled, nvcc4jupyter installed).

---

## Key Concepts

- **Kernel Function**: Uses `__global__` qualifier for GPU execution
- **Device Memory**: Allocated using `cudaMalloc`
- **Data Transfer**: Uses `cudaMemcpy` between host and device
- **Kernel Launch**: Syntax `kernel<<<blocks, threads>>>(...)`

---

## CUDA Implementation

```cuda
%%cu
#include <cuda_runtime.h>
#include <stdio.h>

#define NUM_THREADS 1000
#define NUM_BLOCKS 1000

// Kernel without atomics (incorrect)
__global__ void incrementCounterNonAtomic(int* counter) {
    // not locked: other threads can read the same old value here
    int old = *counter;
    int new_value = old + 1;
    // not unlocked: this write can overwrite updates made by other threads
    *counter = new_value;
}

// Kernel with atomics (correct)
__global__ void incrementCounterAtomic(int* counter) {
    // atomicAdd performs the read-modify-write as one indivisible operation
    // and returns the old value (not needed here)
    atomicAdd(counter, 1);
}

int main() {
    int h_counterNonAtomic = 0;
    int h_counterAtomic = 0;
    int *d_counterNonAtomic, *d_counterAtomic;

    // Allocate device memory
    cudaMalloc((void**)&d_counterNonAtomic, sizeof(int));
    cudaMalloc((void**)&d_counterAtomic, sizeof(int));

    // Copy initial counter values to device
    cudaMemcpy(d_counterNonAtomic, &h_counterNonAtomic, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_counterAtomic, &h_counterAtomic, sizeof(int), cudaMemcpyHostToDevice);

    // Launch kernels
    incrementCounterNonAtomic<<<NUM_BLOCKS, NUM_THREADS>>>(d_counterNonAtomic);
    incrementCounterAtomic<<<NUM_BLOCKS, NUM_THREADS>>>(d_counterAtomic);

    // Copy results back to host (cudaMemcpy waits for the kernels to finish)
    cudaMemcpy(&h_counterNonAtomic, d_counterNonAtomic, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_counterAtomic, d_counterAtomic, sizeof(int), cudaMemcpyDeviceToHost);

    // Print results
    printf("Non-atomic counter value: %d\n", h_counterNonAtomic);
    printf("Atomic counter value: %d\n", h_counterAtomic);

    // Free device memory
    cudaFree(d_counterNonAtomic);
    cudaFree(d_counterAtomic);

    return 0;
}
```
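To see what "atomic" buys over the plain read, add, write sequence, it can help to write the same increment as a compare-and-swap retry loop. The sketch below is only an illustration of the idea, not how `atomicAdd` is implemented in hardware. The names `casAdd`, `incrementCounterCas`, and `countWithCas` are hypothetical and do not come from the course code, and the launch sizes in `main` are arbitrary.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: add `val` to *addr with a compare-and-swap retry loop.
// Returns the value that was stored before this thread's update.
__device__ int casAdd(int* addr, int val) {
    int old = *addr;
    int assumed;
    do {
        assumed = old;
        // atomicCAS stores assumed + val only if *addr still equals assumed,
        // and returns the value it actually found at *addr.
        old = atomicCAS(addr, assumed, assumed + val);
    } while (old != assumed);  // another thread updated *addr first: retry
    return old;
}

// Hypothetical kernel: every thread increments the shared counter once.
__global__ void incrementCounterCas(int* counter) {
    casAdd(counter, 1);
}

// Hypothetical host helper: runs the kernel and returns the final count.
int countWithCas(int blocks, int threads) {
    int h_counter = 0;
    int* d_counter = NULL;
    cudaMalloc((void**)&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
    incrementCounterCas<<<blocks, threads>>>(d_counter);
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main() {
    // With no lost updates, the result should equal blocks * threads.
    printf("CAS-based counter value: %d\n", countWithCas(100, 256));
    return 0;
}
```

In the notebook, this cell would need the same `%%cu` magic as the cell above. The non-atomic kernel loses updates because nothing checks whether `*counter` changed between the read and the write; the retry loop makes that check explicit.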

## Exercises

Explore atomic operations:

1. **Other Atomic Functions**: Try `atomicMin`, `atomicMax`, `atomicExch`
2. **Performance Comparison**: Measure execution time with and without atomics
3. **Histogram**: Use atomics to build a histogram of random values
4. **Race Condition Demo**: Create a kernel showing race conditions without atomics
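For the performance comparison exercise, one possible starting point is to time a kernel with CUDA events. The sketch below assumes a single kernel on the default stream; `timeAtomicKernelMs` is a hypothetical helper name, and the launch sizes are arbitrary. The same harness could be pointed at the non-atomic kernel to compare the two.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void incrementCounterAtomic(int* counter) {
    atomicAdd(counter, 1);
}

// Hypothetical helper: returns the elapsed kernel time in milliseconds,
// or -1.0f if the device allocation fails.
float timeAtomicKernelMs(int blocks, int threads) {
    int* d_counter = NULL;
    if (cudaMalloc((void**)&d_counter, sizeof(int)) != cudaSuccess) {
        return -1.0f;
    }
    cudaMemset(d_counter, 0, sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    incrementCounterAtomic<<<blocks, threads>>>(d_counter);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    incrementCounterAtomic<<<blocks, threads>>>(d_counter);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and `stop` have completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_counter);
    return ms;
}

int main() {
    printf("Atomic kernel time: %f ms\n", timeAtomicKernelMs(1000, 1000));
    return 0;
}
```

A single timed launch is noisy; averaging over several launches would give a steadier number.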

---

## Key Takeaways

- CUDA enables massive parallelism for compute-intensive tasks
- Proper memory management is crucial for performance
- Understanding the thread hierarchy helps write efficient kernels
- Always synchronize when needed to ensure correctness

---

## Next Steps

Continue to the next notebook in Module 5 to learn more CUDA concepts!

---

## Notes

*Use this space for your learning notes:*