# Environment Setup

This section sets up the compilers and the backend needed to offload computation to the GPU.

At the end, a simple test program reports whether the accelerator device (GPU) was found.

In [None]:
%%shell
ln -sfnv /usr/local/cuda-11/ /usr/local/cuda
wget https://openmp-course.s3.amazonaws.com/llvm.tar.gz
tar -xzvf llvm.tar.gz >/dev/null 2>&1

'/usr/local/cuda' -> '/usr/local/cuda-11/'
--2023-09-27 02:27:30--  https://openmp-course.s3.amazonaws.com/llvm.tar.gz
Resolving openmp-course.s3.amazonaws.com (openmp-course.s3.amazonaws.com)... 52.217.48.92, 52.217.169.185, 16.182.74.73, ...
Connecting to openmp-course.s3.amazonaws.com (openmp-course.s3.amazonaws.com)|52.217.48.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 810538565 (773M) [application/x-gzip]
Saving to: ‘llvm.tar.gz’


2023-09-27 02:27:47 (45.8 MB/s) - ‘llvm.tar.gz’ saved [810538565/810538565]





In [None]:
import os

os.environ['LLVM_PATH'] = '/content/llvm'
os.environ['PATH'] = os.environ['LLVM_PATH'] + '/bin:' + os.environ['PATH']
# LD_LIBRARY_PATH may be unset; fall back to an empty string to avoid a KeyError.
os.environ['LD_LIBRARY_PATH'] = os.environ['LLVM_PATH'] + '/lib:' + os.environ.get('LD_LIBRARY_PATH', '')
os.environ['TSAN_OPTIONS'] = 'ignore_noninstrumented_modules=1'

In [None]:
%%writefile test.c

#include <omp.h>
#include <stdio.h>

int main() {
  int num_devices = omp_get_num_devices();
  printf("Temos %d dispositivo(s) alocado(s)\n", num_devices);
}

Writing test.c


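This is the main compiler command line; use these options whenever you offload to the GPU. `-fopenmp` enables OpenMP, `-fopenmp-targets=nvptx64-nvidia-cuda` selects NVIDIA GPUs as the offload target, and `-Xopenmp-target -march=sm_75` passes the GPU architecture (compute capability 7.5) to the device compilation.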



In [None]:
%%shell

clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_75 test.c -o teste

./teste

Temos 1 dispositivo(s) alocado(s)




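# Experiments with GPU Offloading

Test program for execution on the GPU. The `target` pragma above the vector-add loop is commented out, so as written the loop runs on the host.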

In [None]:
%%writefile vadd.c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define N 500000000
#define TOL  0.0000001
//
//  This is a simple program to add two vectors
//  and verify the results.
//
//  History: Written by Tim Mattson, November 2017
//
int main()
{

    float *a, *b, *c, *res;
    int err=0;

    a = (float *)malloc(sizeof(float)*N);
    if (a==NULL) {printf("could not allocate memory\n"); exit(-1);}
    b = (float *)malloc(sizeof(float)*N);
    if (b==NULL) {printf("could not allocate memory\n"); exit(-1);}
    c = (float *)malloc(sizeof(float)*N);
    if (c==NULL) {printf("could not allocate memory\n"); exit(-1);}
    res = (float *)malloc(sizeof(float)*N);
    if (res==NULL) {printf("could not allocate memory\n"); exit(-1);}

    double init_time, compute_time, test_time;
    init_time    = -omp_get_wtime();

   // fill the arrays
   for (long i=0; i<N; i++){
      a[i] = (float)i;
      b[i] = 2.0*(float)i;
      c[i] = 0.0;
      res[i] = (float)i + 2.0*(float)i;
   }

   init_time    +=  omp_get_wtime();
   compute_time  = -omp_get_wtime();

   // add two vectors
   //#pragma omp target teams distribute parallel for simd map(to:a[0:N], b[0:N]) map(tofrom:c[0:N])
   for (long i=0; i<N; i++){
      c[i] = a[i] + b[i];
   }


   compute_time +=  omp_get_wtime();
   test_time     = -omp_get_wtime();

   // test results
   for (long i=0;i<N;i++){
      float val = c[i] - res[i];
      val = val*val;
      if (val>TOL) err++;
   }

   test_time    +=  omp_get_wtime();

   printf(" vectors added with %d errors\n",err);
   printf("Init time:    %.3fs\n", init_time);
   printf("Compute time: %.3fs\n", compute_time);
   printf("Test time:    %.3fs\n", test_time);
   printf("Total time:   %.3fs\n", init_time + compute_time + test_time);

   free(a);
   free(b);
   free(c);
   free(res);
   return 0;
}

Writing vadd.c


In [None]:
!clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_75 vadd.c -o vadd.x



In [None]:
!./vadd.x

 vectors added with 0 errors
Init time:    7.681s
Compute time: 1.412s
Test time:    1.698s
Total time:   10.791s
