# Crossfit

### Why do frameworks like TensorFlow/PyTorch/JAX exist?
NumPy is great, but it lacks two important pieces:
- Hardware acceleration (e.g. running on a GPU)
- Automatic differentiation

Among the DL frameworks, this notebook focuses on TensorFlow in particular.
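
To make the second point concrete: with NumPy alone, a gradient has to be derived by hand or approximated numerically. The following is a minimal sketch of the numerical route using central differences. The function names `loss` and `numerical_grad` are illustrative and not part of any library.

```python
import numpy as np

def loss(w):
    # Simple scalar function of a vector; its exact gradient is 2 * w.
    return np.sum(w ** 2)

def numerical_grad(f, w, eps=1e-6):
    """Approximate df/dw with central differences, one coordinate at a time."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 3.0])
approx = numerical_grad(loss, w)  # needs two function evaluations per coordinate
exact = 2 * w                     # derived by hand
print(approx, exact)
```

This costs two evaluations of `loss` per parameter and is only approximate, which is why frameworks with automatic differentiation are preferred for models with many parameters.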

In [2]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import cupy as cp
import tensorflow as tf
import crossfit as cf

### Let's check the impact of hardware acceleration

In [3]:
x_np = np.random.random((5000, 5000))
x_tf = cf.convert_array(x_np, tf.Tensor)
x_cp = cf.convert_array(x_np, cp.ndarray)

In [3]:
%timeit -n 5 -r 5 np.dot(x_np, x_np)

544 ms ± 30.5 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [4]:
%timeit -n 5 -r 5 tf.matmul(x_tf, x_tf)

The slowest run took 2549.23 times longer than the fastest. This could mean that an intermediate result is being cached.
20.6 ms ± 41 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [5]:
%timeit -n 5 -r 5 cp.dot(x_cp, x_cp)

The slowest run took 23.01 times longer than the fastest. This could mean that an intermediate result is being cached.
65.1 µs ± 106 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


TensorFlow is definitely faster than NumPy (since it leverages the GPU). *Note that the API is slightly different: `tf.matmul` vs `np.dot`.*
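
The "slowest run took N times longer" warnings above suggest that the first call pays one-time setup costs. A common way to separate those from steady-state speed is to run one untimed warm-up call first. The helper below is a sketch under that assumption. `time_after_warmup` is a hypothetical name, and the example only times NumPy on the CPU. On a GPU backend, work is typically queued asynchronously, so you would also need to synchronise the device before stopping the clock (for CuPy, for example, with `cp.cuda.Stream.null.synchronize()`).

```python
import time
import numpy as np

def time_after_warmup(fn, *args, repeats=5):
    """Call fn once untimed, then return per-call wall-clock times in seconds."""
    fn(*args)  # warm-up: absorbs one-time costs such as lazy initialisation
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return times

x = np.random.random((200, 200))  # small matrix so the sketch runs quickly
timings = time_after_warmup(np.dot, x, x)
median_ms = sorted(timings)[len(timings) // 2] * 1e3
print(f"median: {median_ms:.3f} ms")
```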

This is where crossfit comes in! You can write your code using NumPy, and crossfit takes care of running it on the supported backends, including PyTorch, JAX and TensorFlow.

### Consistent API using numpy

Crossfit lets you write your code using NumPy and run it on a variety of backends.

`crossarray` can be used as a decorator or as a context manager.
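
To give a feel for the general idea of wrapping a NumPy function so that it dispatches on the type of its inputs, here is a toy sketch. It is not how crossfit is implemented. `ToyTensor`, `toy_matmul`, `BACKENDS` and `toy_crossarray` are all hypothetical names invented for this illustration.

```python
import numpy as np

class ToyTensor:
    """Stand-in for a framework tensor (hypothetical, not a real backend)."""
    def __init__(self, data):
        self.data = np.asarray(data)

def toy_matmul(a, b):
    # "Backend" implementation that works on ToyTensor inputs.
    return ToyTensor(a.data @ b.data)

# Registry: NumPy function -> {array type -> backend implementation}
BACKENDS = {np.dot: {ToyTensor: toy_matmul}}

def toy_crossarray(np_func):
    """Wrap a NumPy function so it dispatches on the type of its first argument."""
    def wrapper(*args, **kwargs):
        # Fall back to the original NumPy function for unregistered types.
        impl = BACKENDS.get(np_func, {}).get(type(args[0]), np_func)
        return impl(*args, **kwargs)
    return wrapper

dot = toy_crossarray(np.dot)
a = np.arange(4.0).reshape(2, 2)
out_np = dot(a, a)                        # runs np.dot
out_toy = dot(ToyTensor(a), ToyTensor(a)) # runs toy_matmul
print(type(out_np).__name__, type(out_toy).__name__)
```

The caller writes `dot(...)` once and the wrapper picks the implementation, which is the same shape of usage as the `cf.crossarray(np.dot)` call in the next cell.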

In [3]:
dot = cf.crossarray(np.dot)

In [7]:
%timeit -n 5 -r 5 dot(x_np, x_np)

512 ms ± 4.14 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [8]:
%timeit -n 5 -r 5 dot(x_tf, x_tf)

The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
823 ms ± 702 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [9]:
%timeit -n 5 -r 5 dot(x_cp, x_cp)

The slowest run took 7.46 times longer than the fastest. This could mean that an intermediate result is being cached.
82.6 µs ± 91.8 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


🤔 This works on TensorFlow tensors, but it is nowhere near as fast as `tf.matmul` (823 ms vs 20.6 ms)...

In [4]:
dot_jitted = tf.function(dot, jit_compile=True)

In [5]:
%timeit -n 5 -r 5 dot_jitted(x_tf, x_tf)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method


The slowest run took 547.72 times longer than the fastest. This could mean that an intermediate result is being cached.
22.4 ms ± 44.5 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


#### So we need JIT compilation in TF to get speed

In [6]:
dot = cf.crossarray(np.dot, jit=True, overwrite=True)

In [8]:
%timeit -n 5 -r 5 dot(x_tf, x_tf)

The slowest run took 21.72 times longer than the fastest. This could mean that an intermediate result is being cached.
1.1 ms ± 1.77 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


### A cross-framework NumPy API means we can use other tools like sklearn

In [4]:
from sklearn.metrics import accuracy_score

size = int(1e6)
y_true = np.random.randint(2, size=size)
y_pred = np.random.rand(size)
y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)

cross_accuracy_score = cf.crossarray(accuracy_score)

In [5]:
%timeit -n 5 -r 5 accuracy_score(y_true, y_pred > 0.5)

59.4 ms ± 174 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [6]:
%timeit -n 5 -r 5 cross_accuracy_score(y_true_tf, y_pred_tf > 0.5)

4.79 ms ± 971 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [11]:
with cf.crossarray:
    accuracy = accuracy_score(y_true, y_pred > 0.5)
    
accuracy

0.500176
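
For binary labels, accuracy is the fraction of positions where the two arrays agree, which is a plain array operation that any NumPy-like backend can run. The NumPy-only sketch below (not sklearn's actual implementation; the seed and variable names are illustrative) also shows why the score above sits near 0.5: the labels and the predictions are independent random draws.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen here only for repeatability
size = 1_000_000
y_true = rng.integers(0, 2, size=size)  # random binary labels
y_score = rng.random(size)              # random scores in [0, 1)

# Accuracy on hard labels: the fraction of positions where both arrays agree.
y_pred = y_score > 0.5
accuracy = np.mean(y_true == y_pred)
print(accuracy)
```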