# AddMetadata

In [1]:
import numpy as np
import cudf
import nvtabular as nvt
from merlin.schema.tags import Tags

In [2]:
purchases = cudf.DataFrame(
    data={'user_id': [0, 1, 2, 2],
          'price': [125.04, 23.07, 101.2, 2.34],
          'color': ['blue', 'blue', 'red', 'yellow'],
          'model': ['deluxe', 'compact', 'regular', 'regular']
})
purchases

|   | user_id | price | color | model |
|---|---|---|---|---|
| 0 | 0 | 125.04 | blue | deluxe |
| 1 | 1 | 23.07 | blue | compact |
| 2 | 2 | 101.2 | red | regular |
| 3 | 2 | 2.34 | yellow | regular |


There are several ways to add metadata to columns. Let's explore them, starting with the most general one, the `AddMetadata` op.

In [3]:
# we can add `tags` using the `AddMetadata` op
out = ['price'] >> nvt.ops.AddMetadata(tags=[Tags.TARGET])

# there are also shorthands we can use
out += ['price'] >> nvt.ops.AddTags(tags=[Tags.CONTINUOUS])
out += ['user_id'] >> nvt.ops.TagAsUserID()
out += ['color', 'model'] >> nvt.ops.TagAsItemFeatures()
out += ['color', 'model'] >> nvt.ops.AddTags(tags=[Tags.CATEGORICAL])

ds = nvt.Dataset(purchases)
wf = nvt.Workflow(out)

ds_out = wf.fit_transform(ds)
ds_out.schema

|   | name | tags | dtype | is_list | is_ragged |
|---|---|---|---|---|---|
| 0 | price | (Tags.CONTINUOUS, Tags.TARGET) | float64 | False | False |
| 1 | user_id | (Tags.USER, Tags.USER_ID) | int64 | False | False |
| 2 | color | (Tags.CATEGORICAL, Tags.ITEM) | object | False | False |
| 3 | model | (Tags.CATEGORICAL, Tags.ITEM) | object | False | False |
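
The schema above shows that tags added by separate ops accumulate per column: `price`, for example, ends up with both `CONTINUOUS` and `TARGET`. The sketch below mimics that accumulation with plain Python sets. It is an illustration only, not how NVTabular implements it; `merge_tags` and the lowercase string tag names are hypothetical stand-ins for the real ops and the `Tags` enum.

```python
# Conceptual sketch only: plain Python, not the NVTabular or merlin API.
from collections import defaultdict


def merge_tags(assignments):
    """Union the tags assigned to each column across several steps."""
    schema = defaultdict(set)
    for columns, tags in assignments:
        for column in columns:
            schema[column].update(tags)
    return dict(schema)


# Each entry mirrors one line of the workflow above: (columns, tags to add).
# The tag strings are hypothetical stand-ins for members of the Tags enum.
assignments = [
    (["price"], {"target"}),
    (["price"], {"continuous"}),
    (["user_id"], {"user", "user_id"}),
    (["color", "model"], {"item"}),
    (["color", "model"], {"categorical"}),
]

schema = merge_tags(assignments)
for name, tags in schema.items():
    print(name, sorted(tags))
```

Because each column maps to a set, tagging the same column twice is harmless, and the order in which the tagging steps run does not change the result.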


We can now use these tags to selectively apply preprocessing steps.

In [4]:
cats = nvt.ColumnSelector(tags=[Tags.CATEGORICAL]) >> nvt.ops.Categorify()

wf = nvt.Workflow(cats)

ds_final = wf.fit_transform(ds_out)
ds_final.compute()



|   | color | model |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 1 | 2 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
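
To show what happens in this step conceptually, here is a minimal sketch in plain Python: it picks the columns carrying a given tag, then maps each distinct value to an integer id. The names `select_by_tag` and `encode` are hypothetical, and the ordering rule (most frequent value first, ties broken alphabetically) is an assumption that happens to reproduce the ids shown above. The rule `Categorify` actually applies may differ between versions.

```python
# Conceptual sketch in plain Python; not NVTabular's implementation.
from collections import Counter

# Hypothetical stand-in for the schema: column name -> set of tag names.
schema = {
    "price": {"continuous", "target"},
    "user_id": {"user", "user_id"},
    "color": {"categorical", "item"},
    "model": {"categorical", "item"},
}

# The same rows as the `purchases` frame, stored as plain lists.
data = {
    "user_id": [0, 1, 2, 2],
    "price": [125.04, 23.07, 101.2, 2.34],
    "color": ["blue", "blue", "red", "yellow"],
    "model": ["deluxe", "compact", "regular", "regular"],
}


def select_by_tag(schema, tag):
    """Return the names of all columns that carry the given tag."""
    return [name for name, tags in schema.items() if tag in tags]


def encode(values):
    """Map each distinct value to an integer id starting at 1.

    Assumed ordering: most frequent value first, ties broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    ids = {value: index for index, value in enumerate(ordered, start=1)}
    return [ids[value] for value in values]


categorical = select_by_tag(schema, "categorical")
encoded = {name: encode(data[name]) for name in categorical}
print(encoded)  # {'color': [1, 1, 2, 3], 'model': [3, 2, 1, 1]}
```

Selecting by tag rather than by name means the encoding step never has to list `color` and `model` explicitly; any column tagged as categorical later on would be picked up automatically.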


By semantically tagging your data, you make your code easier to read and more concise.

Additionally, the Merlin framework picks up this information and reuses it in later stages of working on your model.

This translates to faster iteration and a smaller chance of introducing bugs.

See [this XGBoost training example](https://github.com/NVIDIA-Merlin/models/blob/main/examples/07-Train-an-xgboost-model-using-the-Merlin-Models-API.ipynb) for how the information you provide lends itself to constructing and training a model, and [this Merlin Systems serving example](https://github.com/NVIDIA-Merlin/systems/blob/main/examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb) for how tagging can streamline model serving.