# GPU Optimization with SYCL

Below is the list of topics covered in the __GPU Optimization using SYCL__ modules:
- [__Introduction to GPU Optimization__](01_Introduction_to_GPU_Optimization/01_Introduction.ipynb)
  - Phases in the Optimization Workflow
  - Locality Matters
  - Parallelization
  - GPU Execution Model Overview
- [__Thread Mapping and Occupancy__](02_Thread_Mapping_and_Occupancy/02_Thread_Mapping_and_Occupancy.ipynb)
  - nd_range Kernel
  - Thread Synchronization
  - Mapping Work-groups to Xe-cores for Maximum Occupancy
  - Intel® GPU Occupancy Calculator
- [__Memory Optimizations__](03_Memory_Optimization/03_Memory_Optimization.ipynb)
  - [Memory Optimization - Buffers](03_Memory_Optimization/031_Memory_Optimization_Buffers.ipynb)
    - Buffer Accessor Modes
    - Optimizing Memory Movement Between Host and Device
    - Avoid Declaring Buffers in a Loop
    - Avoid Moving Data Back and Forth Between Host and Device
  - [Memory Optimization - USM](03_Memory_Optimization/032_Memory_Optimization_USM.ipynb)
    - Overlapping Data Transfer from Host to Device
    - Avoid Copying Unnecessary Block of Data
    - Copying Memory from Host to USM Device Allocation
- [__Kernel Submission__](04_Kernel_Submission/04_Kernel_Submission.ipynb)
  - Kernel Launch
  - Executing Multiple Kernels
  - Submitting Kernels to Multiple Queues
  - Avoid Redundant Queue Construction
- [__Kernel Programming__](05_Kernel_Programming/05_Kernel_Programming.ipynb)
  - Considerations for Selecting Work-group Size
  - Removing Conditional Checks
  - Avoiding Register Spills
- [__Shared Local Memory__](06_Shared_Local_Memory/06_Shared_Local_Memory.ipynb)
  - SLM Size and Work-group Size
  - Bank Conflicts
  - Using SLM as Cache
  - Data Sharing and Work-group Barriers
- [__Sub-Groups__](07_Sub_Groups/07_Sub_Groups.ipynb)
  - Sub-group Sizes
  - Sub-group Size vs. Maximum Sub-group Size
  - Vectorization and Memory Access
  - Data Sharing
- [__Atomic Operations__](08_Atomic_Operations/08_Atomic_Operations.ipynb)
  - Data Types for Atomic Operations
  - Atomic Operations in Global vs. Local Space
- [__Kernel Reduction__](09_Kernel_Reduction/09_Kernel_Reduction.ipynb)
  - Reduction Using Atomic Operation
  - Reduction Using Shared Local Memory
  - Reduction Using Sub-Groups
  - Reduction Using SYCL Reduction Kernel

# Reference
Original Source: [oneAPI GPU Optimization Guide](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide/top.html)
