# Pax Workshop
## Inputs in Pax - training

This colab demonstrates how inputs in Pax work.


In [None]:
from praxis import base_input
from praxis import base_hyperparams
from paxml import seqio_input
import numpy as np

Let's start with a SeqIO input that reads the SuperGLUE WSC (Winograd Schema Challenge) training data.

In [None]:
# Imported for its side effect: registers the T5 tasks (including the WSC task below) with SeqIO.
import t5.data.tasks
p = seqio_input.SeqIOInput.HParams(
    mixture_name='super_glue_wsc_v102_simple_train',
    split_name='train',
    task_feature_lengths={'targets': 1280},
    feature_converter=seqio_input.LanguageModelFeatures(pack=True),
    is_training=True,
    use_cached=False,
    input_random_seed=123,
    batch_size=4)
inp = base_hyperparams.instantiate(p)

In [None]:
# Get a batch, inspect the spec of the data
batch = inp.get_next()
for k, v in batch.FlattenItems():
  print(k, v.shape, v.dtype)

In [None]:
# The data is packed: a max segment id above 1 means several examples share that row
for _ in range(4):
  batch = inp.get_next()
  print('segments: ', np.max(batch.segment_ids, axis=1))
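
To make the `segment_ids` printed above more concrete, here is a toy greedy packer in plain NumPy. It is only an illustrative sketch: `pack_examples` is a hypothetical helper, not SeqIO's actual packing algorithm, and it assumes the common convention that segment ids start at 1 within each row and that 0 marks padding.

```python
import numpy as np

def pack_examples(examples, row_length):
  """Greedily packs variable-length token lists into fixed-length rows.

  Hypothetical helper for illustration only. Returns (ids, segment_ids);
  segment ids number the examples within a row from 1, and 0 marks padding.
  """
  rows, segs = [], []
  cur_ids, cur_seg = [], []
  for ex in examples:
    if len(ex) > row_length:
      raise ValueError('example is longer than row_length')
    # Start a new row when the next example does not fit in the current one.
    if len(cur_ids) + len(ex) > row_length:
      rows.append(cur_ids)
      segs.append(cur_seg)
      cur_ids, cur_seg = [], []
    next_segment = cur_seg[-1] + 1 if cur_seg else 1
    cur_ids = cur_ids + list(ex)
    cur_seg = cur_seg + [next_segment] * len(ex)
  if cur_ids:
    rows.append(cur_ids)
    segs.append(cur_seg)

  # Pad every row with zeros up to row_length.
  ids = np.zeros((len(rows), row_length), dtype=np.int32)
  segment_ids = np.zeros_like(ids)
  for i, (r, s) in enumerate(zip(rows, segs)):
    ids[i, :len(r)] = r
    segment_ids[i, :len(s)] = s
  return ids, segment_ids

examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13], [14], [15, 16]]
ids, segment_ids = pack_examples(examples, row_length=6)
print(segment_ids)
# Number of examples packed into each row.
print(np.max(segment_ids, axis=1))
```

Under this convention, `np.max(segment_ids, axis=1)` counts the examples in each row, which is the quantity the cell above prints for real Pax batches.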


We set `input_random_seed=123` in the input hparams. What happens when we call `inp.reset()`: does it reproduce the same data?

What if we re-instantiate the input object instead?
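
One way to answer these questions is to record a batch, reset or re-instantiate the input, and compare the new batch field by field. The sketch below shows such a comparison on toy data. `batches_equal` and `toy_first_batch` are hypothetical helpers, and the seeded NumPy shuffle only stands in for an input pipeline; it says nothing about how Pax itself behaves.

```python
import numpy as np

def batches_equal(batch_a, batch_b):
  """Returns True if two batches (dicts of arrays) have equal keys and values."""
  if batch_a.keys() != batch_b.keys():
    return False
  return all(np.array_equal(batch_a[k], batch_b[k]) for k in batch_a)

def toy_first_batch(seed, batch_size=4, num_examples=100):
  """Toy stand-in for an input: the first batch of a seeded shuffle."""
  order = np.random.default_rng(seed).permutation(num_examples)
  return {'ids': order[:batch_size]}

first = toy_first_batch(seed=123)
again = toy_first_batch(seed=123)  # "re-instantiated" with the same seed
shifted = {'ids': first['ids'] + 1}  # deliberately different data

print(batches_equal(first, again))
print(batches_equal(first, shifted))
```

In the notebook, assuming `FlattenItems()` yields `(key, array)` pairs as in the loop above, `dict(batch.FlattenItems())` should give a dict that can be passed to `batches_equal`.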

In [None]:
# Tweak some fields: shard the data across 2 infeed hosts and disable shuffling
p2 = p.clone().set(infeed_host_index=0, num_infeed_hosts=2, shuffle=False)
# Disable packing
p2.feature_converter = seqio_input.LanguageModelFeatures(pack=False)
inp2 = base_hyperparams.instantiate(p2)

p2_complement = p2.clone().set(infeed_host_index=1)
inp2_complement = base_hyperparams.instantiate(p2_complement)

batches = [inp2.get_next(), inp2_complement.get_next()]

In [None]:
# Decode the token ids of host 0's batch back to strings
inp.ids_to_strings(batches[0].ids, [1280] * 4)

Now inspect the data from `inp2_complement`. Verify that it does not overlap with the data from `inp2`. Does this hold if we run more batches?
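
As a rough mental model of what `infeed_host_index` and `num_infeed_hosts` do, the sketch below shards example indices with a simple modulo rule and checks that the two shards are disjoint. The modulo rule is an assumption made for illustration (it mirrors how `tf.data.Dataset.shard` splits a dataset); the sharding Pax actually applies may differ. `shard_indices` and `rows_overlap` are hypothetical helpers.

```python
import numpy as np

def shard_indices(num_examples, num_infeed_hosts, infeed_host_index):
  """Example indices one host would read under modulo sharding (an assumption)."""
  all_indices = np.arange(num_examples)
  return all_indices[all_indices % num_infeed_hosts == infeed_host_index]

def rows_overlap(ids_a, ids_b):
  """Returns True if any row of ids_a also appears as a row of ids_b."""
  rows_a = {tuple(row) for row in np.asarray(ids_a).tolist()}
  rows_b = {tuple(row) for row in np.asarray(ids_b).tolist()}
  return bool(rows_a & rows_b)

host0 = shard_indices(10, num_infeed_hosts=2, infeed_host_index=0)
host1 = shard_indices(10, num_infeed_hosts=2, infeed_host_index=1)
print(host0, host1)
# The two shards share no example and together cover the whole dataset.
print(np.intersect1d(host0, host1).size == 0)
```

On the real batches, a call such as `rows_overlap(batches[0].ids, batches[1].ids)` is one way to run the check, and it can be repeated over further `get_next()` calls to see whether the property keeps holding.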

In [None]:
# The data is also no longer packed
np.max(batches[0].segment_ids, axis=1)