# Creating WASM ufuncs for `DataFrames` with `witxcraft` 

The `fromwasmmod` function comes from the `witxcraft.ufunc` module and is called below as `wc.fromwasmmod`. It reads a WASM module and wraps the functions in it for use with the `DataFrame.apply` and `Series.apply` methods.

In [25]:
import witxcraft as wc

The `fromwasmmod` function takes a filename or URL pointing to a WASM module, or the raw contents of a WASM module. Either compiled WASM or WATX may be used.

In [26]:
funcs = wc.fromwasmmod('df.wasm')
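To illustrate how a loader might tell the three accepted input kinds apart, here is a minimal sketch. The `classify_source` helper is hypothetical and is not part of `witxcraft`; the only external fact it relies on is that every binary WebAssembly module starts with the four magic bytes `\0asm`.

```python
from urllib.parse import urlparse

WASM_MAGIC = b"\x00asm"  # first four bytes of every binary WebAssembly module


def classify_source(source):
    """Guess what kind of module source was passed (hypothetical helper)."""
    if isinstance(source, (bytes, bytearray)):
        # Raw contents: binary modules start with the magic bytes,
        # anything else is treated as text-format source.
        return "binary" if bytes(source[:4]) == WASM_MAGIC else "text"
    if urlparse(str(source)).scheme in ("http", "https"):
        return "url"
    # Everything else is assumed to be a local file path.
    return "path"


print(classify_source("df.wasm"))
print(classify_source("https://example.com/df.wasm"))
print(classify_source(b"\x00asm\x01\x00\x00\x00"))
```

A real loader would go on to fetch or read the bytes and hand them to a WASM runtime; the sketch only shows the dispatch step.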

Using `dir` we can see the WASM functions that are available.

In [27]:
[x for x in dir(funcs) if not x.startswith('_')]

['mult', 'mult_vec', 'square', 'square_vec']
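The same listing can be narrowed to callables only, which may help if the wrapper object also carries non-function attributes. The sketch below uses a stand-in object (`fake_funcs`, an assumption for illustration) because the real `funcs` object requires `witxcraft` and `df.wasm`.

```python
from types import SimpleNamespace

# Stand-in for the wrapped module; the real `funcs` object comes from witxcraft.
fake_funcs = SimpleNamespace(
    mult=lambda a, b: a * b,
    mult_vec=lambda xs, ys: [a * b for a, b in zip(xs, ys)],
    square=lambda a: a * a,
    square_vec=lambda xs: [a * a for a in xs],
)

# Keep public names that are actually callable.
public_callables = [
    name for name in dir(fake_funcs)
    if not name.startswith("_") and callable(getattr(fake_funcs, name))
]
print(public_callables)
```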

## Using WASM functions with pandas

To demonstrate WASM functions on `pandas` objects, we'll first load some data into a `DataFrame`.

In [28]:
import pandas as pd
import numpy as np

Generate some numeric data to work with.

In [29]:
data_len = 100000
df = pd.DataFrame(dict(X=np.random.randint(0, 50, size=data_len),
                       Y=np.random.randint(0, 50, size=data_len),
                       Z=np.random.randint(0, 50, size=data_len)))
df

        X   Y   Z
0       9   6  26
1      24  48  33
2      19  35  27
3      27  15  20
4      36   3  46
...    ..  ..  ..
99995  49  42  17
99996  10   8  15
99997   5   1  38
99998   2   6   8


### Simple example of a WASM function with scalar input and output

The `square` function squares the given value, and `mult` multiplies two values. The `_vec` variants accept sequences and return a `pandas.Series`.

In [30]:
funcs.square(10)

100

In [31]:
funcs.mult(10, 5)

50

In [32]:
funcs.square_vec([5, 10])

0     25
1    100
dtype: int64

In [33]:
funcs.mult_vec([5, 10], [2, 3])

0    10
1    30
dtype: int64

### Using the WASM functions with pandas data

The scalar `mult` function can be applied to a `pandas.Series` using the `apply` method, with the second operand passed through `args`.

In [45]:
%time df.X.apply(funcs.mult, args=[10])

CPU times: user 4.61 s, sys: 0 ns, total: 4.61 s
Wall time: 4.61 s


0         90
1        240
2        190
3        270
4        360
        ... 
99995    490
99996    100
99997     50
99998     20
99999      0
Name: X, Length: 100000, dtype: int64

Since the functions support the `ufunc` API, you can also apply them as a function call with the `pandas.Series` as the argument.

In [46]:
%time funcs.mult(df.X, 10)

CPU times: user 4.44 s, sys: 0 ns, total: 4.44 s
Wall time: 4.44 s


0         90
1        240
2        190
3        270
4        360
        ... 
99995    490
99996    100
99997     50
99998     20
99999      0
Name: X, Length: 100000, dtype: object
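The `dtype: object` in the output above is what NumPy produces when a scalar Python callable is lifted into a ufunc with `np.frompyfunc`. Whether `witxcraft` builds its ufuncs this way is an assumption; the sketch below only shows that the mechanism yields the same dtype, using a plain Python stand-in for the WASM `mult`.

```python
import numpy as np


def mult(a, b):
    """Scalar stand-in for the WASM `mult` function."""
    return a * b


# Lift the two-argument scalar function into a ufunc with one output.
mult_ufunc = np.frompyfunc(mult, 2, 1)

x = np.array([9, 24, 19])
result = mult_ufunc(x, 10)       # the scalar 10 is broadcast against x
print(result.dtype)              # object
print(result.astype(np.int64))   # converts back to a numeric dtype
```

If a numeric dtype is needed downstream, an explicit `astype` on the result is one way to recover it.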

### Using a vectorized WASM function with pandas data

Using the vector version of `mult`, we can pass in two `Series` objects. Rather than iterating over the objects and making a new WASM function call for each pair of values, this version copies the data from both objects into WASM memory and does all of the work in WASM. This eliminates much of the WASM call overhead.

In [47]:
%time funcs.mult_vec(df.X, df.Y)

CPU times: user 63.6 ms, sys: 0 ns, total: 63.6 ms
Wall time: 61.9 ms


0          54
1        1152
2         665
3         405
4         108
         ... 
99995    2058
99996      80
99997       5
99998      12
99999       0
Length: 100000, dtype: int64
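In the timings above, 4.44 s for 100,000 scalar calls works out to roughly 44 µs per call, against about 62 ms for the single vector call. The sketch below makes the difference in boundary crossings concrete. `CountingBoundary` is a hypothetical stand-in that counts calls and does not invoke any WASM runtime.

```python
class CountingBoundary:
    """Stand-in for a host-to-WASM boundary that counts every crossing."""

    def __init__(self):
        self.calls = 0

    def mult(self, a, b):
        # One crossing per pair of values.
        self.calls += 1
        return a * b

    def mult_vec(self, xs, ys):
        # One crossing for the whole batch.
        self.calls += 1
        return [a * b for a, b in zip(xs, ys)]


xs = list(range(1000))
ys = list(range(1000, 2000))

scalar = CountingBoundary()
per_element = [scalar.mult(a, b) for a, b in zip(xs, ys)]

vector = CountingBoundary()
batched = vector.mult_vec(xs, ys)

print(scalar.calls, vector.calls)
```

Both paths compute the same values; only the number of crossings differs, and that count is where a fixed per-call cost accumulates.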

### Reimplemented as standard Python functions

In [37]:
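class PyFuncs:
    """Pure-Python equivalents of the WASM functions."""

    def mult(self, a, b):
        return a * b

    def square(self, a):
        return a ** 2

    def square_vec(self, arr):
        # Element-wise square; returns a plain list.
        return [x ** 2 for x in arr]

    def mult_vec(self, arr1, arr2):
        # Element-wise product of two equal-length sequences.
        return [a * b for a, b in zip(arr1, arr2)]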

In [38]:
import numpy as np

# Printing these to make sure they match the above.
pyfuncs = PyFuncs()  
print(pyfuncs.square(10))
print(pyfuncs.mult(10, 5))
print(pyfuncs.square_vec([5, 10]))
print(pyfuncs.mult_vec([5, 10], [2, 3]))

100
50
[25, 100]
[10, 30]


In [48]:
%time df.X.apply(pyfuncs.mult, args=[10])

CPU times: user 54.1 ms, sys: 0 ns, total: 54.1 ms
Wall time: 52.7 ms


0         90
1        240
2        190
3        270
4        360
        ... 
99995    490
99996    100
99997     50
99998     20
99999      0
Name: X, Length: 100000, dtype: int64

In [54]:
%time pyfuncs.mult(df.X, 10)

CPU times: user 570 µs, sys: 684 µs, total: 1.25 ms
Wall time: 677 µs


0         90
1        240
2        190
3        270
4        360
        ... 
99995    490
99996    100
99997     50
99998     20
99999      0
Name: X, Length: 100000, dtype: int64

In [53]:
%time pyfuncs.mult_vec(df.X, df.Y)

CPU times: user 19.1 ms, sys: 0 ns, total: 19.1 ms
Wall time: 18.5 ms


[54,
 1152,
 665,
 405,
 108,
 ...]
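The pure-Python `mult_vec` returns a plain Python list rather than a `Series`, unlike the WASM version. A minimal sketch of a wrapper that restores the `Series` form follows; `mult_vec_series` is a hypothetical helper, and the sample values are the first five rows of `X` and `Y` shown earlier.

```python
import pandas as pd


def mult_vec_series(x, y):
    """Pure-Python element-wise product returned as a Series (hypothetical helper)."""
    values = [a * b for a, b in zip(x, y)]
    # Reuse the index of the first input so the result lines up with it.
    return pd.Series(values, index=x.index)


x = pd.Series([9, 24, 19, 27, 36])
y = pd.Series([6, 48, 35, 15, 3])
product = mult_vec_series(x, y)
print(product)
```

Building the `Series` adds some cost on top of the list comprehension, so timings for this wrapper would not be directly comparable to the 18.5 ms measured above.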

### Standard Python functions compiled to WASM

#### <font color='red'>TODO</font> 

### Same example as above without WASM (C code called from Python via some witchcraft TBD)

#### <font color='red'>TODO</font> 