In [1]:
import os
os.environ["PATH"] = "/usr/local/cuda/bin:" + os.environ["PATH"]


**Table of contents**<a id='toc0_'></a>    
- [Pointers Vs Iterators](#toc1_)    
- [Summary of Capture Behaviors in CUDA](#toc2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Pointers Vs Iterators](#toc0_)
**Rules**
- Prefer iterators over raw pointers when calling STL/Thrust algorithms.
- Don't pass iterators to raw memory-management APIs (such as `cudaMalloc` or `cudaMemcpy`); those take raw pointers.

The example below sketches several fancy iterators (counting, transform, zip, and discard) as minimal structs that only implement `operator[]`:

In [11]:
%%writefile Sources/iterators.cpp
#include <cstdio>
#include <array>
#include <tuple>

// Counting iterator that returns the index as the value
struct counting_iterator{
    int operator[](int i){
        return i;
    }
};

// Transform iterator that applies a transformation (doubling the value) to the input array
template<typename T>
struct transform_iterator{
    T *a;
    T operator[](int i){
        return a[i]*2;
    }

};
// Zip iterator that combines two arrays into a tuple of their elements
struct zip_iterator
{
    int *a;
    int *b;

    std::tuple<int, int> operator[](int i)
    {
        return {this->a[i], this->b[i]};
    }
};

struct wrapper{
    void operator=(int value){
        // do nothing
    }
};

struct discard_iterator{
    wrapper operator[](int i){
        return {};
    }
};






int main(){
    counting_iterator it_count;
    std::printf("Output the counting_iterator values:\n");
    std::printf("Value at index 5 at it_count[5]: %d\n", it_count[5]);
    std::printf("Value at index 10 at it_count[10]: %d\n", it_count[10]);

    std::array<int, 3> x{0,1,2};
    transform_iterator<decltype(x)::value_type> it_transform{x.data()};
    std::printf("Output the transform_iterator values:\n");
    std::printf("Array values: %d, %d, %d\n", x[0], x[1], x[2]);

    std::printf("Value at index 0 at it_transform[0]: %d\n", it_transform[0]);
    std::printf("Value at index 1 at it_transform[1]: %d\n", it_transform[1]);

    std::array<int, 3> a{ 0, 1, 2 };
    std::array<int, 3> b{ 5, 4, 2 };

    zip_iterator it{a.data(), b.data()};

    std::printf("it[0]: (%d, %d)\n", std::get<0>(it[0]), std::get<1>(it[0])); // prints (0, 5)
    std::printf("it[1]: (%d, %d)\n", std::get<0>(it[1]), std::get<1>(it[1])); // prints (1, 4)

    discard_iterator it_dis{};

    // Assignments through a discard_iterator are accepted and thrown away
    it_dis[0] = 10;
    it_dis[1] = 20;
}

Overwriting Sources/iterators.cpp


In [12]:
# !nvcc --extended-lambda -o /tmp/a.out Sources/iterators.cpp -x cu -arch=native # build executable
!g++ -o /tmp/a.out Sources/iterators.cpp # build executable
!/tmp/a.out # run executable

Output the counting_iterator values:
Value at index 5 at it_count[5]: 5
Value at index 10 at it_count[10]: 10
Output the transform_iterator values:
Array values: 0, 1, 2
Value at index 0 at it_transform[0]: 0
Value at index 1 at it_transform[1]: 2
it[0]: (0, 5)
it[1]: (1, 4)


In [31]:
%%writefile Sources/transform-zip.cpp
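#include <array>
#include <cstdio>
#include <cstdlib>
#include <tuple>

// Zip iterator that combines two arrays into a tuple of their elements
struct zip_iterator{
    int *a;
    int *b;

    std::tuple<int, int> operator[](int i)
    {
        return {a[i], b[i]};
    }
};

// Transform iterator layered on top of the zip iterator:
// returns the absolute difference of each zipped pair
struct transform_iterator
{
    zip_iterator zip;

    int operator[](int i)
    {
        auto [a, b] = zip[i]; // structured bindings require C++17
        return std::abs(a - b);
    }
};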

int main()
{
    std::array<int, 3> a{ 0, 1, 2 };
    std::array<int, 3> b{ 5, 4, 2 };

    zip_iterator zip{a.data(), b.data()};
    transform_iterator it{zip};

    std::printf("it[0]: %d\n", it[0]); // prints 5
    std::printf("it[1]: %d\n", it[1]); // prints 3
}

Overwriting Sources/transform-zip.cpp


In [30]:
!nvcc --extended-lambda -o /tmp/a.out Sources/transform-zip.cpp -x cu -arch=native # build executable
!/tmp/a.out # run executable

it[0]: 5
it[1]: 3


In [33]:
%%writefile Sources/transform-output.cpp
#include <cstdio>
#include <tuple>
#include <array>

struct wrapper
{
    int *ptr;

    void operator=(int value) {
        *ptr = value / 2;
    }
};

struct transform_output_iterator
{
    int *a;
    // `return {a + i}` constructs a wrapper whose ptr points at element i;
    // assigning to that wrapper then writes value / 2 into a[i].
    wrapper operator[](int i)
    {
        return {a + i};
    }
};

int main()
{
    std::array<int, 3> a{ 0, 1, 2 };
    transform_output_iterator it{a.data()};

    it[0] = 10;
    it[1] = 20;

    std::printf("a[0]: %d\n", a[0]); // prints 5
    std::printf("a[1]: %d\n", a[1]); // prints 10
}

Overwriting Sources/transform-output.cpp


In [34]:
!nvcc --extended-lambda -o /tmp/a.out Sources/transform-output.cpp -x cu -arch=native # build executable
!/tmp/a.out # run executable

a[0]: 5
a[1]: 10




# <a id='toc2_'></a>[Summary of Capture Behaviors in CUDA](#toc0_)
| Capture method | Result in `__device__` code | Explanation |
|----------------|-----------------------------|-------------|
| By value `[=]` (vector object) | Fail | Tries to copy the entire vector object, including its host-side logic, to the GPU. |
| By reference `[&]` | Crash | The closure refers to a CPU stack address, which the GPU cannot access. |
| By value `[=]` (pointer) | Success | The closure holds only a memory address, which the GPU can read provided it points to device-accessible memory. |