# Passing data around

The heart of your machine learning pipeline will often be a component that iteratively presents batches of training examples to your model.

This is frequently referred to as the training loop.

In [1]:
import torch
import numpy as np

xs = np.array([[1.7], [0.3]])
ys = np.array([[2], [4]])
train_set = zip(xs, ys)

class Model(torch.nn.Module):
    def forward(self, x, y):
        print(f'Learning from x={x} and y={y}')
    
m = Model()
for x, y in train_set:
    m(x, y)

Learning from x=[ 1.7] and y=[2]
Learning from x=[ 0.3] and y=[4]


But what if we would like the same training loop to work with models of greater complexity?

For instance, we might have a model that takes in continuous values (floats) and embedding indices (integers). These cannot coexist in a single NumPy ndarray or PyTorch Tensor without giving up one of the two dtypes, so we need to pass them in separately.

In [2]:
cont_values = np.array([[1.7], [0.3]])
embedding_idxs = np.array([[1], [3]])
ys = np.array([[2], [4]])
train_set = zip(cont_values, embedding_idxs, ys)

class EmbeddingModel(torch.nn.Module):
    def forward(self, cont, idx, y):
        print(f'Learning from cont={cont}, idx={idx} and y={y}')
    
em = EmbeddingModel()
for x, y in train_set:
    em(x, y)

ValueError: too many values to unpack (expected 2)

Extended Iterable Unpacking ([PEP 3132](https://www.python.org/dev/peps/pep-3132/)) to the rescue!

In [3]:
train_set = zip(cont_values, embedding_idxs, ys)

for *x, y in train_set: # <- this is where the magic happens
    em(*x, y)

Learning from cont=[ 1.7], idx=[1] and y=[2]
Learning from cont=[ 0.3], idx=[3] and y=[4]
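
The same idea can be wrapped in a loop that does not care how many inputs a model takes. What follows is only a minimal sketch in plain Python: `run_epoch`, `one_input_model` and `two_input_model` are hypothetical names made up for illustration, and ordinary functions stand in for the PyTorch modules above.

```python
def run_epoch(model, batches):
    """Feed every batch to `model`, treating the last element as the target."""
    outputs = []
    for *inputs, target in batches:
        # `inputs` is always a list, however many inputs the batch holds
        outputs.append(model(*inputs, target))
    return outputs


# Hypothetical stand-ins for Model and EmbeddingModel
def one_input_model(x, y):
    return f'x={x}, y={y}'


def two_input_model(cont, idx, y):
    return f'cont={cont}, idx={idx}, y={y}'


single = run_epoch(one_input_model, zip([1.7, 0.3], [2, 4]))
double = run_epoch(two_input_model, zip([1.7, 0.3], [1, 3], [2, 4]))
print(single)
print(double)
```

The loop body is identical for both models; only the shape of the training set changes.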


In essence, this is what is going on:

In [4]:
lst = [[0, 1, 2, 3, 'last']]
for *all_but_last, last in lst:
    pass

all_but_last

[0, 1, 2, 3]

But the utility of this does not stop here! You can also do things such as the
following:

In [5]:
frst, *mid, last = ['first', 0, 1, 2, 'last']
mid

[0, 1, 2]

As the name (Extended Iterable Unpacking) suggests, this works with any
iterable, such as a generator expression or a range:


In [6]:
frst, *mid, last = (item for item in ['first', 0, 1, 2, 'last'])

mid

[0, 1, 2]
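
To see why this matters, it may help to compare with slicing, which only works on sequences. The sketch below uses a hypothetical helper, `make_items`, that returns a fresh generator each time it is called. Slicing the generator fails, so the slicing route has to build a list first, while the starred target consumes the generator directly.

```python
def make_items():
    # A generator can be iterated only once and does not support indexing
    return (item for item in ['first', 0, 1, 2, 'last'])


# Slicing a generator raises a TypeError
try:
    make_items()[1:-1]
    slicing_failed = False
except TypeError:
    slicing_failed = True

# Starred unpacking works on the generator as it is
frst, *mid, last = make_items()

# The slicing equivalent needs an explicit list and three separate expressions
items = list(make_items())
frst_s, mid_s, last_s = items[0], items[1:-1], items[-1]

print(slicing_failed, mid, mid_s)
```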

In [7]:
*everything, = range(5)
everything

[0, 1, 2, 3, 4]
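
A few details of starred targets are easy to trip over, so here is a short sketch of the edge cases. The variable names are made up for illustration; `compile()` is used only so that the syntax error can be observed without stopping the script.

```python
# The starred name always receives a list, even when the source is a tuple
*everything, = (0, 1, 2)
starred_type = type(everything).__name__

# The starred name may end up empty...
head, *rest = ['only']

# ...but every non-starred name still needs a value
try:
    a, *b, c = ['only']
    error = None
except ValueError as exc:
    error = str(exc)

# A bare starred target (no trailing comma) is rejected at compile time
try:
    compile('*everything = range(5)', '<example>', 'exec')
    bare_star_ok = True
except SyntaxError:
    bare_star_ok = False

print(starred_type, rest, error, bare_star_ok)
```

This is why the trailing comma in `*everything, = range(5)` is needed: it turns the target into a tuple, which is what the syntax requires.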

Presented through a contrived example, this might seem gimmicky, more of a
party trick than something genuinely useful. In the wild, however, this
little asterisk in front of your variable name can be very handy: it can
make your code more flexible and reusable in ways that would otherwise be
hard to attain and might take quite a few additional lines of code.