# Numerical

The way a feature is treated depends on its [semantic](utilities/#Semantic), such as numerical, categorical, boolean, or text. If the semantic is not specified, it is inferred automatically. For example, float and integer features are detected as numerical, while strings are detected as categorical.

A numerical feature represents a quantity or a count, for example the age of a person or the number of items in a bag. Missing numerical values are represented by `math.nan`.
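Because `math.nan` does not compare equal to anything, including itself, missing values have to be detected with `math.isnan` rather than `==`. Here is a minimal sketch in plain Python; the `values` list and the `count_missing` helper are illustrative and not part of YDF.

```python
import math

# Illustrative feature column with two missing values.
values = [0.1, 0.8, math.nan, math.nan]


def count_missing(column):
    """Counts the NaN entries in a list of floats."""
    return sum(1 for v in column if math.isnan(v))


# NaN never compares equal, even to itself, so `==` cannot find it.
print(math.nan == math.nan)   # False
print(count_missing(values))  # 2
```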

Let's train a model on a dataset with an integer feature and a floating-point feature.

In [2]:
import ydf
import pandas as pd

In [3]:
dataset = pd.DataFrame({
    "label": [True, False, True, False],
    "feature_1": [1, 2, 2, 1],
    "feature_2": [0.1, 0.8, 0.9, 0.1],
})

model = ydf.RandomForestLearner(label="label").train(dataset)

Train model on 4 examples
Model trained in 0:00:00.009728


In the **Dataspec** tab of the model description, we can see that both features are detected as numerical.

In [3]:
model.describe()

Sometimes, you might want to force a feature's semantic to be numerical.

In the next example, "feature_1" and "feature_2" both contain boolean values. However, we want "feature_1" to be treated as numerical.

In the model description, notice that "feature_1" is numerical, while "feature_2" is boolean.

In [4]:
dataset = pd.DataFrame({
    "label": [True, False, True, False],
    "feature_1": [True, True, False, False],
    "feature_2": [True, False, True, False],
})

# Note: include_all_columns=True allows the model to use all the
# columns as features, not just the ones listed in "features".
model = ydf.RandomForestLearner(
    label="label",
    features=[ydf.Feature("feature_1", ydf.Semantic.NUMERICAL)],
    include_all_columns=True,
).train(dataset)

model.describe()

Train model on 4 examples
Model trained in 0:00:00.004133


Let's create some missing values.

In the **Dataspec** tab, notice `num-nas:2 (50%)` for "feature_2". It means that "feature_2" contains two missing values (i.e., 50% of its values are missing).

In [6]:
import math

dataset = pd.DataFrame({
    "label": [True, False, True, False],
    "feature_1": [1, 2, 2, 1],
    "feature_2": [0.1, 0.8, math.nan, math.nan],
})

model = ydf.RandomForestLearner(label="label").train(dataset)
model.describe()

Train model on 4 examples
Model trained in 0:00:00.005587
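The `num-nas` figure can be cross-checked directly on the DataFrame. The sketch below uses pandas only and rebuilds the same dataset so that it runs on its own. The variable names `num_missing` and `fraction_missing` are illustrative.

```python
import math

import pandas as pd

dataset = pd.DataFrame({
    "label": [True, False, True, False],
    "feature_1": [1, 2, 2, 1],
    "feature_2": [0.1, 0.8, math.nan, math.nan],
})

# isna() flags each missing value; sum() counts them, mean() gives the ratio.
num_missing = int(dataset["feature_2"].isna().sum())
fraction_missing = float(dataset["feature_2"].isna().mean())

print(f"num-nas:{num_missing} ({fraction_missing:.0%})")  # num-nas:2 (50%)
```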
