# Torcharrow batching

## Basics

Torcharrow supports batching through `df.batch(n)`, which returns an iterator that successively yields `df[0:n]`, then `df[n:2*n]`, and so on; the last batch may hold fewer than `n` rows. The static method `collate` is its inverse: it takes an iterator of columns or dataframes and reassembles them into a whole.

In [1]:
import torcharrow as T

c = T.Column([1,2,3,4,5,6,7])
print("Original:", str(c))
print()

for b in c.batch(2):
    print("Batch:", b)

print()
print("Reassembled original:", str(T.IColumn.collate(c.batch(2))))

Original: Column([1, 2, 3, 4, 5, 6, 7], id = c0)

Batch: Column([1, 2], id = c1)
Batch: Column([3, 4], id = c2)
Batch: Column([5, 6], id = c3)
Batch: Column([7], id = c4)

Reassembled original: Column([1, 2, 3, 4, 5, 6, 7], id = c10)
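
To make the slicing semantics concrete without depending on Torcharrow itself, here is a minimal sketch using plain Python lists. The helpers `batch` and `collate` below are hypothetical stand-ins written for illustration; they are not Torcharrow's implementation and only mimic the behavior described above.

```python
from itertools import chain
from typing import Iterable, Iterator, List


def batch(values: List[int], n: int) -> Iterator[List[int]]:
    """Yield values[0:n], values[n:2*n], ... (illustrative stand-in, not Torcharrow code)."""
    if n <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(values), n):
        yield values[start:start + n]


def collate(batches: Iterable[List[int]]) -> List[int]:
    """Concatenate the batches back into a single list (illustrative stand-in)."""
    return list(chain.from_iterable(batches))


data = [1, 2, 3, 4, 5, 6, 7]
batches = list(batch(data, 2))
print(batches)           # [[1, 2], [3, 4], [5, 6], [7]]
print(collate(batches))  # [1, 2, 3, 4, 5, 6, 7]
```

Because `batch` is a generator, each slice is produced only when the consumer asks for it, which is what lets batches be chained with other iterator tools.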


## Leveraging Python's functools
Batching works well with Python iterators. For instance, you can apply functional tools such as `map`, `filter`, and `functools.reduce` to the batches. Below, `enumerate` supplies the batch index, which we add to every element of the corresponding batch of the column `c`.


In [2]:

list(str(d + i) for i, d in enumerate(c.batch(2)))

['Column([1, 2], id = c12)',
 'Column([4, 5], id = c14)',
 'Column([7, 8], id = c16)',
 'Column([10], id = c18)']
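
The same pattern extends to `map`, `filter`, and `functools.reduce`. The sketch below uses plain lists and a hypothetical `batch` helper in place of Torcharrow columns, so it only illustrates how the iterator composes with these tools, not how Torcharrow columns behave in every detail.

```python
from functools import reduce


def batch(values, n):
    """Yield consecutive slices of length n (illustrative stand-in for a column's batch method)."""
    for start in range(0, len(values), n):
        yield values[start:start + n]


data = [1, 2, 3, 4, 5, 6, 7]

# map: compute one sum per batch.
batch_sums = list(map(sum, batch(data, 2)))

# filter: keep only the batches that are completely filled.
full_batches = list(filter(lambda b: len(b) == 2, batch(data, 2)))

# reduce: fold all batches into a single grand total.
total = reduce(lambda acc, b: acc + sum(b), batch(data, 2), 0)

print(batch_sums)    # [3, 7, 11, 7]
print(full_batches)  # [[1, 2], [3, 4], [5, 6]]
print(total)         # 28
```

Each call to `batch` creates a fresh iterator, so it is called once per pipeline; a single iterator would be exhausted after its first pass.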

## Leveraging Python's itertools

Similarly, all iteration tools work. Below we use `takewhile` to collect the leading batches whose sum is smaller than 8; iteration stops at the first batch that fails the test.

In [5]:
import itertools

list(str(b) for b in itertools.takewhile(lambda b: b.sum() < 8, c.batch(2)))

['Column([1, 2], id = c21)', 'Column([3, 4], id = c22)']
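
Note that the final batch, which sums to 7, is not returned even though it passes the test, because `takewhile` stops at the first failing batch. The following sketch shows this next to `dropwhile` and `islice`, again with plain lists and a hypothetical `batch` helper rather than Torcharrow columns.

```python
import itertools


def batch(values, n):
    """Yield consecutive slices of length n (illustrative stand-in for a column's batch method)."""
    for start in range(0, len(values), n):
        yield values[start:start + n]


data = [1, 2, 3, 4, 5, 6, 7]


def small(b):
    """Predicate used in the text: the batch sum is smaller than 8."""
    return sum(b) < 8


# takewhile: leading batches that satisfy the predicate; stops at [5, 6].
prefix = list(itertools.takewhile(small, batch(data, 2)))

# dropwhile: everything from the first failing batch onward, including [7].
rest = list(itertools.dropwhile(small, batch(data, 2)))

# islice: the first three batches, regardless of their contents.
first_three = list(itertools.islice(batch(data, 2), 3))

print(prefix)       # [[1, 2], [3, 4]]
print(rest)         # [[5, 6], [7]]
print(first_three)  # [[1, 2], [3, 4], [5, 6]]
```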