
# PyCuda Tutorial

## Installation


In [2]:
!pip install pycuda

Collecting pycuda
  Downloading https://files.pythonhosted.org/packages/5a/56/4682a5118a234d15aa1c8768a528aac4858c7b04d2674e18d586d3dfda04/pycuda-2021.1.tar.gz (1.7MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting mako
  Downloading https://files.pythonhosted.org/packages/f3/54/dbc07fbb20865d3b78fdb7cf7fa713e2cba4f87f71100074ef2dc9f9d1f7/Mako-1.1.4-py2.py3-none-any.whl (75kB)
Collecting pytools>=2011.2
  Downloading https://files.pythonhosted.org/packages/49/5b/136e5688da9bbd915ee8190bfd6a007fc0b19d71f26d5a2ab4b737b2eeb4/pytools-2021.2.6.tar.gz (63kB)
Building wheels for collected packages: pycuda
  Building wheel for pycuda (PEP 517) ... done

Before you can use PyCuda, you have to import and initialize it:

In [3]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

In PyCuda, you will mostly transfer data from numpy arrays on the host. Let’s make a 4x4 array of random numbers:

In [4]:
import numpy
a = numpy.random.randn(4,4)

a consists of double-precision numbers, but the kernel below works on single-precision floats, which are also typically much faster than doubles on NVIDIA devices, so we convert it:

In [5]:
a = a.astype(numpy.float32)
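
The effect of this conversion can be checked on the host without a GPU. The following is a minimal sketch using only NumPy; the names a64, a32 and max_error are introduced here for illustration and do not appear in the rest of the tutorial:

```python
import numpy

a64 = numpy.random.randn(4, 4)       # randn returns float64 by default
a32 = a64.astype(numpy.float32)      # same values, rounded to single precision

# 16 elements at 8 bytes each versus 16 elements at 4 bytes each
print(a64.dtype, a64.nbytes)         # float64 128
print(a32.dtype, a32.nbytes)         # float32 64

# The rounding error introduced by the conversion is tiny but not zero
max_error = numpy.abs(a64 - a32).max()
print(max_error)
```

The nbytes value of the converted array is the size that the device allocation needs to hold.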

Next, allocate memory on the device for the array:

In [6]:
a_gpu = cuda.mem_alloc(a.nbytes)

Then transfer the data to the GPU:

In [7]:
cuda.memcpy_htod(a_gpu, a)

## Executing a Kernel

For this tutorial, we’ll stick to something simple: We will write code to double each entry in a_gpu. To this end, we write the corresponding CUDA C code, and feed it into the constructor of a pycuda.compiler.SourceModule:

In [8]:
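mod = SourceModule("""
  __global__ void doublify(float *a)
  {
    // Flatten the 2D thread index into an offset into the row-major 4x4 array.
    int idx = threadIdx.x + threadIdx.y*4;
    // Each thread doubles exactly one element.
    a[idx] *= 2;
  }
  """)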

If there aren’t any errors, the code is now compiled and loaded onto the device. We find a reference to our pycuda.driver.Function and call it, specifying a_gpu as the argument, and a block size of 4x4:

In [9]:
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
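
To see why a 4x4 block covers the whole array, the index arithmetic from the kernel can be replayed on the host. This is a plain-Python sketch, not GPU code: the loops run sequentially where the GPU would run the threads in parallel, and flat_index, data and indices are hypothetical names introduced for illustration.

```python
BLOCK = (4, 4, 1)  # same block shape as passed to func above


def flat_index(tx, ty, width=4):
    """Mirror of the kernel's `threadIdx.x + threadIdx.y*4`."""
    return tx + ty * width


# Stand-in for the 16 floats stored in device memory
data = [float(i) for i in range(16)]

# One loop iteration per (threadIdx.x, threadIdx.y) pair in the block
for ty in range(BLOCK[1]):
    for tx in range(BLOCK[0]):
        data[flat_index(tx, ty)] *= 2

# Every element is visited exactly once
indices = sorted(
    flat_index(tx, ty) for ty in range(BLOCK[1]) for tx in range(BLOCK[0])
)
print(indices == list(range(16)))  # True
print(data[:4])                    # [0.0, 2.0, 4.0, 6.0]
```

Because the 16 thread coordinates map to 16 distinct offsets, no two threads write to the same element.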

Finally, we fetch the data back from the GPU and display it, together with the original a:

In [11]:
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

[[ 2.985099   -2.987172    4.665507   -2.0345824 ]
 [-0.5008996  -5.4453635   2.6517894  -1.3422637 ]
 [ 0.13100177  1.7423439  -2.8372016  -3.248441  ]
 [-2.0961182   2.1598656  -0.7870664   2.3708656 ]]
[[ 1.4925495  -1.493586    2.3327534  -1.0172912 ]
 [-0.2504498  -2.7226818   1.3258947  -0.67113185]
 [ 0.06550089  0.87117195 -1.4186008  -1.6242205 ]
 [-1.0480591   1.0799328  -0.3935332   1.1854328 ]]
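
The printed arrays can be checked rather than compared by eye. The sketch below copies the first row of each array from the output above and tests that every entry was doubled. The names a_row, a_doubled_row and all_doubled are introduced for illustration, and the tolerance is an assumption chosen to absorb the rounding of the printed float32 values:

```python
import math

# First row of each array, copied from the printed output above
a_row = [1.4925495, -1.493586, 2.3327534, -1.0172912]
a_doubled_row = [2.985099, -2.987172, 4.665507, -2.0345824]

# Printed float32 values carry about 7 to 8 significant digits,
# so compare with a relative tolerance instead of exact equality
all_doubled = all(
    math.isclose(d, 2 * x, rel_tol=1e-6)
    for x, d in zip(a_row, a_doubled_row)
)
print(all_doubled)  # True
```

In the live session, numpy.allclose(a_doubled, 2 * a) would perform the same check on the full arrays.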
