# Crossfit

### Why do frameworks like TensorFlow/PyTorch/JAX exist?
NumPy is great, but it lacks a few important pieces:
- Hardware acceleration
- Automatic differentiation
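To make the second point concrete, here is a minimal sketch of what "no automatic differentiation" means in practice: with plain NumPy the gradient has to be derived by hand or approximated numerically. The function `f` and the helpers `grad_f_manual` and `grad_f_numeric` are illustrative names, not part of any library.

```python
import numpy as np


def f(x):
    # Scalar-valued function whose gradient we want.
    return np.sum(x ** 2)


def grad_f_manual(x):
    # Derived by hand: d/dx sum(x^2) = 2x. NumPy will not derive this for us.
    return 2 * x


def grad_f_numeric(x, eps=1e-6):
    # Central finite differences, one coordinate at a time (slow and approximate).
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


x = np.array([1.0, -2.0, 3.0])
print(grad_f_manual(x))
print(grad_f_numeric(x))
```

Deep-learning frameworks compute such gradients automatically from the forward code (in TensorFlow, for example, via `tf.GradientTape`), so neither helper is needed there.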

Of the deep-learning frameworks, this notebook focuses on TensorFlow.

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import numpy as np
import tensorflow as tf
from crossfit.array import crossnp

### Let's check the impact of hardware acceleration

In [2]:
x_np = np.random.random((5000, 5000))
x_tf = tf.convert_to_tensor(x_np)

In [3]:
%timeit -n 5 -r 5 np.dot(x_np, x_np)

519 ms ± 719 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [4]:
%timeit -n 5 -r 5 tf.matmul(x_tf, x_tf)

The slowest run took 2525.42 times longer than the fastest. This could mean that an intermediate result is being cached.
21.9 ms ± 43.7 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


TensorFlow is definitely faster, since it leverages the GPU. *Note that the API is slightly different: `tf.matmul` vs. `np.dot`.*
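The naming difference also hides a small semantic one. As a NumPy-only sketch (no TensorFlow required), the snippet below compares `np.dot` with `np.matmul`, which uses the same batched convention as `tf.matmul`: the two agree for 2-D matrices like the ones timed here, but not for stacked inputs.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)

# For 2-D inputs both functions compute the ordinary matrix product.
same_2d = np.allclose(np.dot(a, b), np.matmul(a, b))

# For stacked (3-D) inputs the result shapes differ:
# np.dot pairs every matrix in a3 with every matrix in b3,
# np.matmul multiplies them batch-wise.
a3 = np.ones((2, 3, 4))
b3 = np.ones((2, 4, 5))
dot_shape = np.dot(a3, b3).shape
matmul_shape = np.matmul(a3, b3).shape

print(same_2d, dot_shape, matmul_shape)
```

For the 2-D case used in this notebook, swapping one call for the other does not change the result.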

This is where crossfit comes in! You can write your code using NumPy, and crossfit takes care of running it on the supported backends, including PyTorch, JAX and TensorFlow.

### Consistent API using numpy

Crossfit lets you write your code using NumPy and run it on a variety of backends.

`crossnp` can be used as a decorator or as a context manager.
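For readers unfamiliar with objects that work both ways, here is a toy sketch of the general Python pattern: a class that defines `__call__` (decorator form) as well as `__enter__`/`__exit__` (context-manager form). `ToyCross` is a hypothetical illustration and is not how crossfit implements `crossnp`; it only records calls instead of dispatching to a backend.

```python
import numpy as np


class ToyCross:
    """Toy object usable as a decorator and as a context manager.

    Hypothetical illustration only; NOT crossfit's implementation.
    """

    def __init__(self):
        self.depth = 0  # number of currently active `with` blocks
        self.log = []   # (function name, type of first argument) per call

    def __call__(self, func):
        # Decorator form: wrap `func` and record the array type it received.
        # A real dispatcher would choose a backend implementation here.
        def wrapper(*args, **kwargs):
            name = getattr(func, "__name__", repr(func))
            self.log.append((name, type(args[0]).__name__))
            return func(*args, **kwargs)

        return wrapper

    def __enter__(self):
        # Context-manager form: mark the block as active.
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.depth -= 1
        return False  # let exceptions propagate


toy = ToyCross()

toy_dot = toy(np.dot)  # used as a decorator/wrapper
result = toy_dot(np.eye(2), np.eye(2))

with toy:  # used as a context manager
    depth_inside = toy.depth

print(result, toy.log, depth_inside, toy.depth)
```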

In [5]:
dot = crossnp(np.dot)

In [6]:
%timeit -n 5 -r 5 dot(x_np, x_np)

517 ms ± 2.93 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [7]:
%timeit -n 5 -r 5 dot(x_tf, x_tf)

455 ms ± 37.6 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


🤔 Faster but not as fast as `tf.matmul`...

In [8]:
dot_jitted = tf.function(dot, jit_compile=True)

In [9]:
%timeit -n 5 -r 5 dot_jitted(x_tf, x_tf)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method
The slowest run took 15.14 times longer than the fastest. This could mean that an intermediate result is being cached.
1.59 ms ± 2.35 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
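The "slowest run took ... times longer" message typically appears when the first call pays a one-time cost; with `tf.function`, tracing and compilation happen on the first call. A minimal sketch of timing that first call separately from the later ones follows. `time_with_warmup` is a hypothetical helper, and it is demonstrated on NumPy so that it runs without a GPU.

```python
import time

import numpy as np


def time_with_warmup(fn, *args, repeats=5):
    """Hypothetical helper: time the first call separately from later calls."""
    start = time.perf_counter()
    fn(*args)  # first call: may include one-time setup such as tracing
    first = time.perf_counter() - start

    steady = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        steady.append(time.perf_counter() - start)

    return {"first_call_s": first, "steady_mean_s": sum(steady) / len(steady)}


x = np.random.random((200, 200))
timings = time_with_warmup(np.dot, x, x)
print(timings)
```

Passing a jitted function such as `dot_jitted` in place of `np.dot` should report the compilation cost under `first_call_s` rather than folding it into the mean.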


#### So we need jit-compilation in TF to get speed

In [12]:
from crossfit.array.backend import tf_backend

tf_backend.jit_compile = True

In [14]:
%timeit -n 5 -r 5 dot(x_tf, x_tf)

The slowest run took 19.80 times longer than the fastest. This could mean that an intermediate result is being cached.
1.01 ms ± 1.6 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


### Cross-framework numpy-API means: we can use other tools like sklearn

In [14]:
from sklearn.metrics import accuracy_score

size = int(1e6)
y_true = np.random.randint(2, size=size)
y_pred = np.random.rand(size)
y_true_tf = tf.convert_to_tensor(y_true)
y_pred_tf = tf.convert_to_tensor(y_pred)

cross_accuracy_score = crossnp(accuracy_score)

In [15]:
%timeit -n 5 -r 5 accuracy_score(y_true, y_pred > 0.5)

59.7 ms ± 2.4 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [16]:
%timeit -n 5 -r 5 cross_accuracy_score(y_true_tf, y_pred_tf > 0.5)

6.23 ms ± 110 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)


In [19]:
with crossnp:
    accuracy = accuracy_score(y_true, y_pred > 0.5)
    
accuracy

0.499821
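The value of roughly 0.5 is expected, because both the labels and the scores are random. As a sanity check, the sketch below computes accuracy for binary labels directly with NumPy as the fraction of positions where the thresholded prediction matches the label. The seed and the smaller `size` are choices made here for illustration, not taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 100_000

y_true = rng.integers(0, 2, size=size)  # random binary labels
y_score = rng.random(size)              # random scores in [0, 1)
y_label = y_score > 0.5                 # threshold into boolean predictions

# Accuracy: fraction of positions where prediction and label agree.
accuracy = np.mean(y_true == y_label)
print(accuracy)
```

Because the computation uses only element-wise comparison and a mean, it is the kind of NumPy-style code that can be mapped onto other array backends.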