### NumPy code

In [None]:
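# INPUT DATA PASSED VIA SBATCH
import sys

if len(sys.argv) > 1:
    n = int(sys.argv[1])  # matrix size n (matrices are n x n)
else:
    # Stop here: without n the cells below would fail with a NameError
    sys.exit("You must provide an integer value")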

In [2]:
import numpy as np

# Example: Large matrices (adjust size as needed)
#n = 7000  # For very large matrices, ensure you have enough RAM
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)

C = np.dot(A, B)  # warm-up run; the result is also used for the prints below

%timeit -r 2 -o np.dot(A, B)

print(f"Result shape: {C.shape}")
print(f"Result type: {C.dtype}")


1.06 s ± 5.51 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result shape: (7000, 7000)
Result type: float32
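
The timing above can be turned into an approximate throughput figure. The sketch below assumes the usual operation count of about 2n³ floating-point operations for a dense n × n product. `matmul_gflops` is a hypothetical helper name, and the values n = 7000 and 1.06 s are taken from the output above.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """Approximate throughput of an n x n dense matrix product, in GFLOP/s."""
    flops = 2 * n**3  # about n^3 multiplications plus n^3 additions
    return flops / seconds / 1e9


# Values taken from the output above: n = 7000, 1.06 s per call.
print(f"{matmul_gflops(7000, 1.06):.1f} GFLOP/s")
```

Expressing results in GFLOP/s rather than seconds makes runs with different values of `n` directly comparable.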


### PyTorch

In [1]:
import time
import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")

# Example: Large matrices (adjust size as needed)
# n = 7000  # Set the matrix size
# Create the matrices directly on the GPU
A_torch = torch.rand((n, n), dtype=torch.float32, device=device)
B_torch = torch.rand((n, n), dtype=torch.float32, device=device)

# Run one matrix multiplication as a warm-up so one-time initialization is not timed
C_torch = torch.matmul(A_torch, B_torch)
if device.type == "cuda":
    torch.cuda.synchronize()

def matmul_torch(A, B):
    C = torch.matmul(A, B)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the GPU to finish before the timer stops
    return C

# Measure the execution time

%timeit -r 3 -n 3 matmul_torch(A_torch, B_torch)

C_result = matmul_torch(A_torch, B_torch)
print(f"Result shape: {C_result.shape}")
print(f"Result type: {C_result.dtype}")
#print(C_torch)

Device: cuda


    Found GPU0 NVIDIA GeForce GTX 1080 which is of cuda capability 6.1.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (7.0) - (12.0)
    
    Please install PyTorch with a following CUDA
    configurations:  12.6 following instructions at
    https://pytorch.org/get-started/locally/
    
NVIDIA GeForce GTX 1080 with CUDA capability sm_61 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120.
If you want to use the NVIDIA GeForce GTX 1080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/



AcceleratorError: CUDA error: no kernel image is available for execution on the device
Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


### RESULTS OBTAINED (n = 7000)
#### NUMPY

530 ms ± 589 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)

Result shape: (1000, 1000)

Result type: float32


#### PYTORCH
557 ms ± 6.12 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)

Result shape: torch.Size([1000, 1000])

Result type: torch.float32


Comparing the two, PyTorch turns out to be even slower than NumPy (557 ms versus 530 ms per loop). This is because the matrix is only moderately sized, so the data transfer time can even exceed the time of the computation itself on the GPU.

With larger matrices, PyTorch might achieve lower times than NumPy.
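
The reasoning above can be illustrated with a rough estimate, for the case where the operands start in host memory and must be copied to the GPU. The sketch below is only an approximation. The helper names are hypothetical, and the bandwidth of 12 GB/s is an assumed placeholder, not a measured value. It shows that the data to move grows as n², while the work grows as n³, so the ratio of computation to transfer improves as n increases.

```python
def matrix_bytes(n: int, itemsize: int = 4) -> int:
    """Size in bytes of one n x n matrix (itemsize=4 for float32)."""
    return n * n * itemsize


def transfer_seconds(n: int, bandwidth_gb_s: float, n_matrices: int = 3) -> float:
    """Estimated time to move n_matrices matrices (A and B in, C out)."""
    return n_matrices * matrix_bytes(n) / (bandwidth_gb_s * 1e9)


def flops_per_byte(n: int, itemsize: int = 4) -> float:
    """About 2*n^3 operations divided by the bytes of A, B and C."""
    return 2 * n**3 / (3 * matrix_bytes(n, itemsize))


# Assumption: hypothetical host-to-device bandwidth, not measured on this machine.
ASSUMED_BANDWIDTH_GB_S = 12.0

for n in (1000, 7000, 20000):
    size_mb = matrix_bytes(n) / 1e6
    transfer_ms = transfer_seconds(n, ASSUMED_BANDWIDTH_GB_S) * 1e3
    print(f"n={n:>6}: {size_mb:8.1f} MB per matrix, "
          f"transfer ~{transfer_ms:7.1f} ms, "
          f"{flops_per_byte(n):7.1f} FLOP/byte")
```

Under these assumptions the operations per byte moved equal n/6, so going from n = 1000 to n = 7000 multiplies the work available per byte transferred by seven.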