# Environment Setup

This section sets up the compilers and backend required to offload computation to the GPU.

At the end, a simple test program reports whether the accelerator device (GPU) was found.

In [34]:
%%shell
ln -sfnv /usr/local/cuda-11/ /usr/local/cuda
wget https://openmp-course.s3.amazonaws.com/llvm.tar.gz
tar -xzvf llvm.tar.gz >/dev/null 2>&1

'/usr/local/cuda' -> '/usr/local/cuda-11/'
--2023-09-27 19:02:09--  https://openmp-course.s3.amazonaws.com/llvm.tar.gz
Resolving openmp-course.s3.amazonaws.com (openmp-course.s3.amazonaws.com)... 52.216.35.121, 16.182.106.97, 52.217.86.196, ...
Connecting to openmp-course.s3.amazonaws.com (openmp-course.s3.amazonaws.com)|52.216.35.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 810538565 (773M) [application/x-gzip]
Saving to: ‘llvm.tar.gz.1’


2023-09-27 19:02:22 (59.1 MB/s) - ‘llvm.tar.gz.1’ saved [810538565/810538565]





In [35]:
import os

os.environ['LLVM_PATH'] = '/content/llvm'
os.environ['PATH'] = os.environ['LLVM_PATH'] + '/bin:' + os.environ['PATH']
# .get() avoids a KeyError when LD_LIBRARY_PATH is not already set
os.environ['LD_LIBRARY_PATH'] = os.environ['LLVM_PATH'] + '/lib:' + os.environ.get('LD_LIBRARY_PATH', '')
os.environ['TSAN_OPTIONS'] = 'ignore_noninstrumented_modules=1'

In [36]:
%%writefile test.c

#include <omp.h>
#include <stdio.h>

int main() {
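  // Number of non-host devices available for offloading (0 means no GPU was found)
  int num_devices = omp_get_num_devices();
  // Prints "We have N device(s) allocated" (message kept in Portuguese to match the output below)
  printf("Temos %d dispositivo(s) alocado(s)\n", num_devices);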
}

Overwriting test.c


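This is the main compiler command line. Use these options whenever you offload to the GPU: `-fopenmp` enables OpenMP, `-fopenmp-targets=nvptx64-nvidia-cuda` selects NVIDIA GPUs as the offload target, and `-Xopenmp-target -march=sm_75` sets the GPU architecture (compute capability 7.5, for example a Tesla T4; adjust it for other GPUs).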



In [37]:
%%shell

clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_75 test.c -o teste

./teste

Temos 1 dispositivo(s) alocado(s)




# GPU Offloading Experiments

A matrix multiplication test program to run on the GPU.

In [65]:
%%writefile mmult.c
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <sys/time.h>

// command-line argument: <size>
int main(int argc, char *argv[])
{
  struct timeval start, end; // filled in by gettimeofday
  double t;
  int i, j, k, tam;

  if (argc < 2) {
    fprintf(stderr, "The matrix size must be given as an argument\n");
    exit(-1);
  }

  srand(0);

  tam = atoi(argv[1]); // matrix dimension (tam x tam)

  // Allocate the matrices dynamically. mfim is zero-initialized with
  // calloc because the kernel accumulates into it with +=.
  size_t n = (size_t) tam * tam; // size_t avoids int overflow for large tam
  double *ma = (double *) malloc(n * sizeof(double));
  double *mb = (double *) malloc(n * sizeof(double));
  double *mfim = (double *) calloc(n, sizeof(double));

  if (ma == NULL || mb == NULL || mfim == NULL) {
    fprintf(stderr, "Out of memory\n");
    exit(-1);
  }

  // Fill the input matrices with pseudo-random values in [0, 50.111)
  for (i = 0; i < tam; i++)
    for (j = 0; j < tam; j++) {
      ma[i*tam+j] = fmod(rand(), 50.111);
      mb[i*tam+j] = fmod(rand(), 50.111);
    }


  // The multiplication. The measured time includes the host-to-device
  // and device-to-host transfers performed by the map clauses.
  gettimeofday(&start, NULL);

#pragma omp target teams distribute parallel for private(j, k) map(to:ma[0:tam*tam], mb[0:tam*tam]) map(tofrom:mfim[0:tam*tam])
  for (i = 0; i < tam; i++)
    for (j = 0; j < tam; j++)
      for (k = 0; k < tam; k++)
        mfim[i*tam+j] += ma[i*tam+k] * mb[k*tam+j];

  gettimeofday(&end, NULL);


  // Elapsed time in seconds
  t = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1000000.0;

  printf("Tempo gasto: %f\n", t); // "Tempo gasto" means "elapsed time"

  /* print the result matrix to stderr */
  for (i = 0; i < tam; i++)
    for (j = 0; j < tam; j++)
      fprintf(stderr, "%g ", mfim[i*tam+j]);


  free(ma);
  free(mb);
  free(mfim);

  return 0;
}


Overwriting mmult.c


In [66]:
!clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_75 mmult.c -o mmult.x -lm



In [67]:
!./mmult.x 1000 2>saida2

Tempo gasto: 2.602798
