# Keras Preprocessing Layers and Usage

* [Working with preprocessing layers](https://www.tensorflow.org/guide/keras/preprocessing_layers)

In [1]:
import tensorflow as tf

# Adapt

Stateful preprocessing layers (such as `TextVectorization`) must be fit before use, either by initializing them from a precomputed constant or by "adapting" them on data.
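The two options can be compared side by side. The snippet below is a minimal sketch, assuming the `Normalization` layer (not used elsewhere in this notebook) and a toy dataset whose mean (2.0) and variance (2/3) were computed by hand; both layers should then produce the same normalized values.

```python
import numpy as np
import tensorflow as tf

# Toy data (an assumption for illustration): mean = 2.0, variance = 2/3.
data = np.array([[1.0], [2.0], [3.0]], dtype="float32")

# Option 1: initialize the layer state from precomputed constants.
constant_layer = tf.keras.layers.Normalization(mean=2.0, variance=2.0 / 3.0)

# Option 2: let the layer compute its state from the data via adapt().
adapted_layer = tf.keras.layers.Normalization(axis=None)
adapted_layer.adapt(data)

from_constant = np.asarray(constant_layer(data))
from_adapt = np.asarray(adapted_layer(data))

# Both layers should normalize the data identically (up to float precision).
print(np.allclose(from_constant, from_adapt, atol=1e-5))
```

Precomputed constants are convenient when the statistics are already known or must be fixed across runs; `adapt` is simpler when a representative sample of the data is at hand.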

## Example

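Adapt a `TextVectorization` layer on a small text dataset. The variable is named `word2vec` in the code below, but the layer is configured to emit TF-IDF vectors, not learned word2vec embeddings.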

* [tf.keras.layers.TextVectorization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization): turns raw strings into an encoded representation that can be read by an Embedding layer or Dense layer.


In [17]:
text_dataset = tf.data.Dataset.from_tensor_slices(["foo", "bar", "baz"])
max_features = 5000  # Maximum vocab size.

word2vec = tf.keras.layers.TextVectorization(
    max_tokens=max_features,
    standardize="lower",
    output_mode='tf_idf',
    sparse=True,
)
word2vec.adapt(text_dataset.batch(64))
print(f"layer adapted {word2vec.is_adapted}")
print(f"vocabulary {word2vec.get_vocabulary()}")
print(f"vocabulary size {word2vec.vocabulary_size()}")

layer adapted True
vocabulary ['[UNK]', 'foo', 'baz', 'bar']
vocabulary size 4


In [26]:
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(word2vec)

In [27]:
input_data = tf.constant(["foo", "bar", "baz"])
result = model.predict(input_data)

tf.sparse.to_dense(result)



<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0.        , 0.91629076, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.91629076],
       [0.        , 0.        , 0.91629076, 0.        ]], dtype=float32)>
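The value 0.91629076 in each row can be checked by hand. The sketch below assumes the inverse document frequency is computed as `log(1 + N / (1 + df))`, where `N` is the number of adapted documents and `df` is the number of documents containing the token; this assumption matches the printed output, since each of the three tokens appears once in exactly one of the three documents.

```python
import math

num_documents = 3        # "foo", "bar", "baz" were each adapted as one document
document_frequency = 1   # each token occurs in exactly one document
term_frequency = 1       # each input string contains its token once

# Assumed IDF formula: log(1 + N / (1 + df)) = log(2.5)
idf = math.log(1 + num_documents / (1 + document_frequency))
tf_idf = term_frequency * idf

print(round(tf_idf, 6))  # 0.916291
```

The `[UNK]` column stays at zero because every input token is in the vocabulary.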