# Creating WASM ufuncs for `DataFrames` with `witxcraft` 

Import `witxcraft`. Its `fromwasmmod` function, from the `witxcraft.ufunc` module, reads a WASM module and wraps the functions in it for use with the `DataFrame.apply` and `Series.apply` methods.

In [1]:
import witxcraft as wc

The `fromwasmmod` function takes a filename or URL of a WASM module, or the raw contents of a module. Either compiled WASM or WATX may be used.
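
The sketch below shows one way a loader could accept the three kinds of input described above and tell compiled Wasm apart from text. It is not `witxcraft`'s implementation, and the helper name `read_wasm_source` is hypothetical. It relies only on the fact that every compiled WebAssembly binary starts with the 4-byte magic number `\0asm`.

```python
import os
import urllib.request

# Every compiled WebAssembly binary begins with this 4-byte magic number.
WASM_MAGIC = b"\x00asm"


def read_wasm_source(source):
    """Hypothetical helper: return (raw_bytes, is_binary) for a Wasm source.

    `source` may be raw module contents (bytes or str), a URL, or a
    filename, mirroring the inputs `fromwasmmod` is described as accepting.
    """
    if isinstance(source, (bytes, bytearray)):
        data = bytes(source)
    elif isinstance(source, str) and source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            data = resp.read()
    elif isinstance(source, str) and os.path.isfile(source):
        with open(source, "rb") as f:
            data = f.read()
    elif isinstance(source, str):
        # Any other string is treated as inline text-format source.
        data = source.encode("utf-8")
    else:
        raise TypeError(f"unsupported source type: {type(source).__name__}")
    # Compiled modules are recognised by their magic number; anything else is text.
    return data, data.startswith(WASM_MAGIC)
```

Checking the magic number rather than a file extension means raw bytes, downloaded content, and files on disk are all classified the same way.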

In [2]:
funcs = wc.fromwasmmod('df.wasm')

Using `dir`, we can list the WASM functions that are available.

In [3]:
[x for x in dir(funcs) if not x.startswith('_')]

['mult', 'mult_vec', 'square', 'square_vec']

## Using WASM functions with pandas

To demonstrate WASM functions on `pandas` objects, we'll first create a `DataFrame` of sample data.

In [4]:
import pandas as pd
import numpy as np

Generate some numeric data to work with.

In [5]:
data_len = 100000
df = pd.DataFrame(dict(X=np.random.randint(0, 50, size=data_len),
                       Y=np.random.randint(0, 50, size=data_len),
                       Z=np.random.randint(0, 50, size=data_len)))
df

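|       | X   | Y   | Z   |
|-------|-----|-----|-----|
| 0     | 12  | 43  | 43  |
| 1     | 29  | 44  | 42  |
| 2     | 30  | 30  | 6   |
| 3     | 31  | 3   | 2   |
| 4     | 17  | 21  | 2   |
| ...   | ... | ... | ... |
| 99995 | 12  | 44  | 29  |
| 99996 | 9   | 13  | 38  |
| 99997 | 31  | 33  | 37  |
| 99998 | 27  | 37  | 31  |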
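
Because `np.random.randint` is unseeded here, the values will differ from run to run. If repeatable benchmarks are wanted, a seeded variant using NumPy's `Generator` API could look like the following; the seed value `42` is an arbitrary choice, not something the original notebook uses.

```python
import numpy as np
import pandas as pd

data_len = 100000
rng = np.random.default_rng(42)  # arbitrary seed for repeatable data

# One (data_len, 3) draw of integers in [0, 50), split into columns X, Y, Z.
df = pd.DataFrame(rng.integers(0, 50, size=(data_len, 3)), columns=["X", "Y", "Z"])
```

Drawing all three columns in one call keeps the generator state, and therefore the data, identical for a given seed.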


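### Calling the WASM functions directly

First, call each function provided by the WASM module with small inputs. The scalar functions return plain values, and the vector functions return a `pandas.Series`.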

In [51]:
print("Square {}\n".format(funcs.square(10)))

print("Mult {}\n".format(funcs.mult(10, 5)))

sqv = funcs.square_vec([5, 10])
print("SquareVect\n{}\ntype{}\n".format(sqv, type(sqv)))

mv = funcs.mult_vec([5, 10], [2, 3])
print("MultVect\n{}\ntype{}\n".format(mv, type(mv)))

Square 100

Mult 50

SquareVect
0     25
1    100
dtype: int64
type<class 'pandas.core.series.Series'>

MultVect
0    10
1    30
dtype: int64
type<class 'pandas.core.series.Series'>



### The same functions in native Python

In [53]:
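import pandas as pd


class PyFuncs:
    """Pure-Python equivalents of the WASM functions above."""

    def mult(self, a, b):
        return a * b

    def square(self, a):
        return a ** 2

    def square_vec(self, arr):
        # Loop in Python, then match the int64 dtype of the WASM version.
        return pd.Series([x ** 2 for x in arr]).astype('int64')

    def mult_vec(self, arr1, arr2):
        # Multiply pairwise by position, then match the int64 dtype.
        return pd.Series([a * b for a, b in zip(arr1, arr2)]).astype('int64')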

In [54]:
# Printing these to make sure they match the above.
pyfuncs = PyFuncs()  

print("Square {}\n".format(pyfuncs.square(10)))

print("Mult {}\n".format(pyfuncs.mult(10, 5)))

sqv = pyfuncs.square_vec([5, 10])
print("SquareVect\n{}\ntype{}\n".format(sqv, type(sqv)))

mv = pyfuncs.mult_vec([5, 10], [2, 3])
print("MultVect\n{}\ntype{}\n".format(mv, type(mv)))

Square 100

Mult 50

SquareVect
0     25
1    100
dtype: int64
type<class 'pandas.core.series.Series'>

MultVect
0    10
1    30
dtype: int64
type<class 'pandas.core.series.Series'>
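
`PyFuncs.square_vec` and `PyFuncs.mult_vec` loop over the values in Python. As an extra baseline, NumPy can do the same arithmetic in one vectorized operation; the class name `PyVecFuncs` is hypothetical and not part of the original notebook. `PyFuncs.mult` and `PyFuncs.square` already behave this way when handed a whole `Series`, because pandas vectorizes `*` and `**`.

```python
import numpy as np
import pandas as pd


class PyVecFuncs:
    """Hypothetical vectorized baseline: no per-element Python loop."""

    def square_vec(self, arr):
        # NumPy squares every element in a single call.
        return pd.Series(np.asarray(arr) ** 2).astype("int64")

    def mult_vec(self, arr1, arr2):
        # Multiply by position, like zip() in PyFuncs.mult_vec.
        # Both inputs are assumed to have the same length.
        return pd.Series(np.asarray(arr1) * np.asarray(arr2)).astype("int64")


v = PyVecFuncs()
print(v.square_vec([5, 10]).tolist())         # [25, 100]
print(v.mult_vec([5, 10], [2, 3]).tolist())   # [10, 30]
```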



### Using the WASM functions with pandas data

The scalar WASM functions can be applied to a `pandas.Series` with its `apply` method; extra arguments for the function are passed through `args`.
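
`Series.apply(func, args=...)` calls `func` once per element, passing the element first and the entries of `args` after it. The sketch below counts those calls with a plain Python function; `counted_mult` is an illustrative stand-in for `funcs.mult`. This per-element calling pattern is what makes the `apply` route pay for one WASM call per value.

```python
import pandas as pd

calls = 0


def counted_mult(a, b):
    """Illustrative stand-in for funcs.mult that records each call."""
    global calls
    calls += 1
    return a * b


s = pd.Series([12, 29, 30, 31, 17], name="X")
result = s.apply(counted_mult, args=(10,))  # calls counted_mult(x, 10) for each x
print(result.tolist(), calls)  # [120, 290, 300, 310, 170] 5
```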

#### <font color='green'>First with C->Wasm </font> 

In [9]:
%time df.X.apply(funcs.mult, args=[10])

CPU times: user 4.77 s, sys: 0 ns, total: 4.77 s
Wall time: 4.77 s


0        120
1        290
2        300
3        310
4        170
        ... 
99995    120
99996     90
99997    310
99998    270
99999     70
Name: X, Length: 100000, dtype: int64

#### <font color='magenta'>Now with Native Python </font> 

In [10]:
%time df.X.apply(pyfuncs.mult, args=[10])

CPU times: user 51.4 ms, sys: 1.27 ms, total: 52.7 ms
Wall time: 51.5 ms


0        120
1        290
2        300
3        310
4        170
        ... 
99995    120
99996     90
99997    310
99998    270
99999     70
Name: X, Length: 100000, dtype: int64

#### Make sure this isn't nonsense

In [58]:
assert (df.X.apply(pyfuncs.mult, args=[10]) == df.X.apply(funcs.mult, args=[10])).all()
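
A check of the form `a.all() == b.all()` only compares two booleans, each saying whether a series has no zero (falsy) entries, so it passes even when the series differ. A sketch of element-wise checks that would catch a mismatch, using standard pandas calls on small made-up series:

```python
import pandas as pd

a = pd.Series([120, 290, 300], name="X")
b = pd.Series([999, 1, 2], name="X")  # deliberately different values

# Misleading: both sides are True because neither series contains a zero.
print(a.all() == b.all())  # True

# Element-wise comparison: True only if every pair of values matches.
print((a == b).all())  # False

# Stricter check that reports which values differ; check_dtype=False skips
# the dtype comparison, useful when one side comes back with dtype object.
try:
    pd.testing.assert_series_equal(a, b, check_dtype=False)
except AssertionError:
    print("series differ")
```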

Since the functions support the `ufunc` API, you can also call them directly with a `pandas.Series` as the argument.
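
As a rough analogy, and not necessarily how `witxcraft` builds its wrappers, NumPy's `np.frompyfunc` turns any Python callable into a ufunc. pandas dispatches ufuncs on a `Series` through `__array_ufunc__`, so the result comes back as a `Series` that keeps its index and name. Because `frompyfunc` ufuncs always produce object arrays, the result has dtype `object`, much like the `dtype: object` shown for `funcs.mult(df.X, 10)` below; `astype('int64')` restores an integer dtype. The names `py_mult` and `mult_ufunc` are illustrative.

```python
import numpy as np
import pandas as pd


def py_mult(a, b):
    return a * b


# Wrap a plain Python function as a ufunc with 2 inputs and 1 output.
mult_ufunc = np.frompyfunc(py_mult, 2, 1)

x = pd.Series([12, 29, 30], name="X")
result = mult_ufunc(x, 10)  # the scalar 10 is broadcast against the Series
print(result.dtype)                      # object
print(result.astype("int64").tolist())   # [120, 290, 300]
```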

#### <font color='green'>First with C->Wasm </font> 

In [11]:
%time funcs.mult(df.X, 10)

CPU times: user 4.69 s, sys: 0 ns, total: 4.69 s
Wall time: 4.69 s


0        120
1        290
2        300
3        310
4        170
        ... 
99995    120
99996     90
99997    310
99998    270
99999     70
Name: X, Length: 100000, dtype: object

#### <font color='magenta'>Now with Native Python </font> 

In [12]:
%time pyfuncs.mult(df.X, 10)

CPU times: user 1.39 ms, sys: 0 ns, total: 1.39 ms
Wall time: 757 µs


0        120
1        290
2        300
3        310
4        170
        ... 
99995    120
99996     90
99997    310
99998    270
99999     70
Name: X, Length: 100000, dtype: int64

#### Make sure this isn't nonsense

In [60]:
assert (pyfuncs.mult(df.X, 10) == funcs.mult(df.X, 10)).all()

### Using a vectorized WASM function with pandas data

The vector version of `mult`, `mult_vec`, takes two `Series` objects. Rather than letting pandas iterate over the elements and make a new WASM call for each pair, it copies the data from both objects into WASM memory and does all of the work inside WASM. This eliminates most of the per-call overhead.
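
A rough pure-Python model of the difference, under the assumption that each call across the Python/WASM boundary has a fixed cost. The hypothetical `Boundary` class counts crossings: element-wise application crosses once per pair of values, while the vector path copies both inputs into one contiguous buffer (standing in for WASM linear memory) and crosses once. The sample values are the first five rows of `X` and `Y` from the table above.

```python
import numpy as np
import pandas as pd


class Boundary:
    """Hypothetical stand-in for the Python/WASM boundary; counts crossings."""

    def __init__(self):
        self.crossings = 0

    def mult(self, a, b):
        # One crossing per scalar call, like funcs.mult.
        self.crossings += 1
        return a * b

    def mult_vec(self, s1, s2):
        # Copy both (equal-length) inputs into a single contiguous int64
        # buffer, loosely modelling "push the data into WASM memory".
        buf = np.concatenate([np.asarray(s1, dtype=np.int64),
                              np.asarray(s2, dtype=np.int64)])
        self.crossings += 1  # one crossing for the whole batch
        n = len(s1)
        return pd.Series(buf[:n] * buf[n:])


x = pd.Series([12, 29, 30, 31, 17])
y = pd.Series([43, 44, 30, 3, 21])

scalar = Boundary()
elementwise = pd.Series([scalar.mult(a, b) for a, b in zip(x, y)])

vector = Boundary()
batched = vector.mult_vec(x, y)

print(scalar.crossings, vector.crossings)  # 5 1
print(batched.tolist())  # [516, 1276, 900, 93, 357]
```

With 100,000 rows the element-wise path would cross 100,000 times while the batched path still crosses once, which is the overhead the vector functions are designed to avoid.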

#### <font color='green'>First with C->Wasm </font> 

In [61]:
%time funcs.mult_vec(df.X, df.Y)

CPU times: user 56.2 ms, sys: 0 ns, total: 56.2 ms
Wall time: 55 ms


0         516
1        1276
2         900
3          93
4         357
         ... 
99995     528
99996     117
99997    1023
99998     999
99999     154
Length: 100000, dtype: int64

#### <font color='magenta'>Now with Native Python </font> 

In [62]:
%time pyfuncs.mult_vec(df.X, df.Y)

CPU times: user 40.1 ms, sys: 0 ns, total: 40.1 ms
Wall time: 38.8 ms


0         516
1        1276
2         900
3          93
4         357
         ... 
99995     528
99996     117
99997    1023
99998     999
99999     154
Length: 100000, dtype: int64

#### Test

In [64]:
assert (pyfuncs.mult_vec(df.X, df.Y) == funcs.mult_vec(df.X, df.Y)).all()

### Standard Python functions compiled to Wasm

#### <font color='red'>TODO</font> 

### The same example without Wasm (C code called from Python via some witchcraft, TBD)

#### <font color='red'>TODO</font> 