# HPC Exercises
# **Please start this notebook with GPU support, as you will need it later in the exercise. You can change this by going to `Runtime -> Change runtime type -> T4 GPU`**
## In this series of exercises, you'll get to compare a few of the accelerated computing techniques that were shown in the lecture. We start by downloading the simulated sales data from the webshop dataset:

In [None]:
# Download sales data
!gdown 1xWAK9ruxl9C9SHNFV09oEasnEWhLcvWZ

# Install numba
!pip install numba

Downloading...
From: https://drive.google.com/uc?id=1xWAK9ruxl9C9SHNFV09oEasnEWhLcvWZ
To: /content/sales_data.parquet
100% 79.4M/79.4M [00:01<00:00, 75.4MB/s]


Let's now extract the item prices and the order totals as two separate NumPy arrays:

In [None]:
import pandas as pd
import numpy as np

# Read the parquet file once and pull both columns from the same DataFrame
sales = pd.read_parquet('sales_data.parquet')
item_price = np.array(sales['item_price'])
order_total = np.array(sales['order_total'])

We want to compare the execution times of a few methods shown in the lecture. The quantity we want to calculate as fast as possible is the _item price percentage_, i.e. the percentage of the order total that each item's price makes up. A simple for-loop that calculates this quantity is shown below:

In [None]:
def stupid_python_loop(item_price, order_total):
  "Calculate the percentage that each item fills in the order total"
  result = np.zeros(item_price.shape)
  for row in range(len(item_price)):
    result[row] = (item_price[row]/order_total[row])*100
  return result

percentages = stupid_python_loop(item_price, order_total)


In [None]:
percentages


array([ 3.11111013, 34.81997645, 13.97132363, ..., 11.84896251,
       10.54651084, 23.53581838])
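
As a small sanity check of the definition, the sketch below applies the same formula to three made-up order lines. The values and the names `toy_price` and `toy_total` are illustrative assumptions, not rows from `sales_data.parquet`:

```python
import numpy as np

# Hypothetical toy data: three items and the total of the order each belongs to
toy_price = np.array([20.0, 30.0, 50.0])
toy_total = np.array([100.0, 100.0, 200.0])

# Item price percentage = item price / order total * 100
toy_percentages = (toy_price / toy_total) * 100

# The items make up 20 %, 30 % and 25 % of their respective order totals
print(toy_percentages)
```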

# **Exercise 1.1**:
## Use `%timeit -n 10` to profile the execution time of `stupid_python_loop`.


In [None]:
# SOLUTION BY RASMUS
%timeit -n 2 stupid_python_loop(item_price, order_total)

1.2 s ± 140 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
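
Outside a notebook the `%timeit` magic is not available, but the standard-library `timeit` module gives a comparable measurement. The sketch below is one possible way to mirror "7 runs, 2 loops each"; `toy_loop` and the synthetic arrays are hypothetical stand-ins so that the block runs on its own:

```python
import timeit

import numpy as np


def toy_loop(item_price, order_total):
    """Same element-by-element loop as above, re-declared for a standalone run."""
    result = np.zeros(item_price.shape)
    for row in range(len(item_price)):
        result[row] = (item_price[row] / order_total[row]) * 100
    return result


# Synthetic data (an assumption), much smaller than the real dataset
rng = np.random.default_rng(0)
toy_price = rng.uniform(1.0, 100.0, size=10_000)
toy_total = toy_price + rng.uniform(1.0, 100.0, size=10_000)

# repeat=7 runs with number=2 calls per run
run_times = timeit.repeat(lambda: toy_loop(toy_price, toy_total), repeat=7, number=2)

# Each entry is the total time for 2 calls, so divide to get seconds per call
per_loop = np.array(run_times) / 2
print(f"{per_loop.mean() * 1e3:.2f} ms ± {per_loop.std() * 1e3:.2f} ms per loop")
```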


# **Exercise 1.2**:
## We now want to compare the execution time above with a version of the for-loop that has been altered such that it is JIT-compiled using Numba.

## Write a version of `stupid_python_loop` that uses the `numba.jit` decorator, and use `%timeit -n 20` to profile the execution time. Name this function `python_loop_jit`. How many times faster is it than the original version?

In [None]:
# SOLUTION BY RASMUS
import numba

# Request nopython mode explicitly; older Numba versions warn when it is omitted
@numba.jit(nopython=True)
def python_loop_jit(item_price, order_total):
  "Calculate the percentage that each item fills in the order total"
  result = np.zeros(item_price.shape)
  for row in range(len(item_price)):
    result[row] = (item_price[row]/order_total[row])*100
  return result

%timeit -n 20 python_loop_jit(item_price, order_total)


The slowest run took 4.29 times longer than the fastest. This could mean that an intermediate result is being cached.
10.7 ms ± 7 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)


In [None]:
# Using the timings printed above: 1.2 s = 1200 ms -> 1200 ms / 10.7 ms ≈ 112 times faster
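
The speed-up is the ratio of the two mean timings once both are expressed in the same unit. A minimal sketch, using the mean values printed by the two `%timeit` cells above (the helper name `speedup` is hypothetical, and the timings will differ from run to run):

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized version is than the baseline."""
    return baseline_seconds / optimized_seconds


loop_time = 1.2     # seconds per call, plain Python loop
jit_time = 10.7e-3  # seconds per call, Numba version (10.7 ms)

factor = speedup(loop_time, jit_time)
print(f"{factor:.0f} times faster")
```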

# **Exercise 1.3**:
## We now want to compare the execution times from 1.1 and 1.2 with NumPy's built-in vectorization.

## Write a version of `stupid_python_loop` that uses NumPy vectorization instead of loops to calculate the item price percentage. Call this function `python_vectorized` and profile the execution time using `%timeit -n 20`.


In [None]:
# SOLUTION BY RASMUS

def python_vectorized(item_price, order_total):
  return (item_price/order_total) * 100

%timeit -n 20 python_vectorized(item_price, order_total)


5.7 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
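
Before comparing timings, it is worth confirming that the vectorized version returns the same numbers as the loop. A minimal sketch on a small synthetic sample; the functions are re-declared under the hypothetical names `loop_version` and `vectorized_version`, and the data is made up so that the block runs on its own:

```python
import numpy as np


def loop_version(item_price, order_total):
    """Element-by-element reference implementation."""
    result = np.zeros(item_price.shape)
    for row in range(len(item_price)):
        result[row] = (item_price[row] / order_total[row]) * 100
    return result


def vectorized_version(item_price, order_total):
    """Whole-array implementation using NumPy broadcasting."""
    return (item_price / order_total) * 100


# Synthetic sample (an assumption); order totals are at least the item price
rng = np.random.default_rng(42)
sample_price = rng.uniform(1.0, 100.0, size=1_000)
sample_total = sample_price + rng.uniform(0.0, 500.0, size=1_000)

# Compare the two implementations up to floating-point tolerance
results_match = np.allclose(
    loop_version(sample_price, sample_total),
    vectorized_version(sample_price, sample_total),
)
print(results_match)
```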


# **Exercise 1.4**:
## We now want to compare the execution times from 1.1, 1.2 and 1.3 with a version of the code that runs on a GPU.

## Write a version of `stupid_python_loop` that uses CuPy arrays so that the calculation runs on the GPU. Call this function `python_gpu` and profile the execution time using `%timeit -n 20`.

## Does it matter for the execution time whether the factor `100` is on the GPU or the CPU?

In [None]:
# SOLUTION BY RASMUS

import cupy as cp

def python_gpu(item_price, order_total):
  # Multiply by 100 so the result is a percentage, as in the CPU versions
  return (item_price/order_total) * 100

item_price_gpu = cp.array(item_price)
order_total_gpu = cp.array(order_total)


%timeit -n 20 python_gpu(item_price_gpu, order_total_gpu)

The slowest run took 2341.94 times longer than the fastest. This could mean that an intermediate result is being cached.
5.2 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)
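
CuPy launches GPU kernels asynchronously, so a plain `%timeit` can stop the clock before the GPU has finished. The first call also pays a one-off kernel compilation cost, which is a likely reason for the large spread reported above. The sketch below is one way to time with a warm-up call and explicit synchronization. `percentage` and `timed_call` are hypothetical helper names, the toy arrays are made up, and the NumPy fallback is only there so that the block also runs on a machine without a GPU:

```python
import time

import numpy as np

try:
    import cupy as cp

    # Raises (or reports zero devices) when no usable CUDA device is present
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
    xp, on_gpu = cp, True
except Exception:
    # Assumption: fall back to NumPy so the sketch still runs without a GPU
    xp, on_gpu = np, False


def percentage(item_price, order_total):
    """Item price percentage; works for both NumPy and CuPy arrays."""
    return (item_price / order_total) * 100


def timed_call(func, *args, repeats=20):
    """Return (mean seconds per call, last result), waiting for the GPU to finish."""
    out = func(*args)  # warm-up call, so compilation is not part of the timing
    if on_gpu:
        cp.cuda.Stream.null.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        out = func(*args)
    if on_gpu:
        cp.cuda.Stream.null.synchronize()  # wait for all queued kernels
    return (time.perf_counter() - start) / repeats, out


# Hypothetical toy data, created directly on the selected device
demo_price = xp.asarray([20.0, 30.0, 50.0])
demo_total = xp.asarray([100.0, 100.0, 200.0])

seconds, demo_out = timed_call(percentage, demo_price, demo_total)
print(f"{seconds * 1e6:.1f} µs per call on {'GPU' if on_gpu else 'CPU'}")
```

CuPy also provides `cupyx.profiler.benchmark`, which handles the synchronization for you and reports CPU and GPU times separately.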
