# Multi-dimensional features

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/yggdrasil-decision-forests/blob/main/documentation/public/docs/tutorial/multidimensional_feature.ipynb)

## Setup


In [None]:
%pip install ydf -U

In [None]:
import ydf
import numpy as np

## What are multi-dimensional features?

**Multi-dimensional** features are model inputs that hold several values per example. For instance, the values of a time series at several timestamps, or the values of the pixels in an image, form a multi-dimensional feature. In contrast, a single-dimensional feature holds one value per example. Each dimension of a multi-dimensional feature is treated as an individual single-dimensional feature.
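To make "each dimension is treated as an individual single-dimensional feature" concrete, here is a minimal NumPy-only sketch that splits a 2-D array into one 1-D column per dimension. The helper `unroll_feature` and the `name.i` column names are hypothetical, chosen for this illustration; they are not YDF's API or its internal naming.

```python
import numpy as np


def unroll_feature(name, values):
  """Splits a (num_examples, dim) array into `dim` 1-D columns.

  The "name.i" column names are hypothetical and used for illustration only.
  """
  values = np.asarray(values)
  if values.ndim == 1:
    # Already single-dimensional: keep it as is.
    return {name: values}
  if values.ndim != 2:
    raise ValueError(f"Expected a 1-D or 2-D array, got {values.ndim}-D")
  return {f"{name}.{i}": values[:, i] for i in range(values.shape[1])}


# Three examples, each with a 4-dimensional feature.
f1 = np.arange(12, dtype=float).reshape(3, 4)
columns = unroll_feature("f1", f1)
print(list(columns))  # ['f1.0', 'f1.1', 'f1.2', 'f1.3']
print(columns["f1.2"].tolist())  # [2.0, 6.0, 10.0]
```

Each resulting column has one value per example, which is the shape of an ordinary single-dimensional feature.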

Multi-dimensional features are fed as multi-dimensional arrays, such as NumPy arrays or TensorFlow tensors. The next section feeds a toy multi-dimensional feature to a model.

## Create a multi-dimensional dataset

The simplest way to create a multi-dimensional dataset is to use a dictionary of multi-dimensional NumPy arrays.

In [None]:
def create_dataset(num_examples):
  # Generates random feature values.
  dataset = {
      # f1 is a multi-dimensional feature with 4 dimensions.
      "f1": np.random.uniform(size=(num_examples, 4)),
      # f2 is a single-dimensional feature.
      "f2": np.random.uniform(size=num_examples),
  }

  # Add a synthetic label
  noise = np.random.uniform(size=num_examples)
  dataset["label"] = (
      np.sum(dataset["f1"], axis=1) + dataset["f2"] * 0.2 + noise
  ) >= 2.0
  return dataset


print("A dataset with 5 examples:")
create_dataset(num_examples=5)

A dataset with 5 examples:


{'f1': array([[0.5373759 , 0.18098291, 0.74489824, 0.27706572],
        [0.4517745 , 0.37578001, 0.45156836, 0.05413219],
        [0.77036813, 0.1640734 , 0.47994649, 0.06315383],
        [0.44115416, 0.95749836, 0.80662146, 0.78114808],
        [0.40393628, 0.22786682, 0.32477702, 0.18309577]]),
 'f2': array([0.02058218, 0.94332705, 0.25678716, 0.02122367, 0.04498769]),
 'label': array([False,  True, False,  True, False])}
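Before training, it can help to confirm how many dimensions each column of such a dictionary has and that all columns contain the same number of examples. The sketch below uses only NumPy; `summarize_features` is a hypothetical helper written for this illustration, not part of YDF, and it assumes every column is a 1-D or 2-D array.

```python
import numpy as np


def summarize_features(dataset):
  """Returns {name: number of dimensions} for a dict of NumPy arrays.

  Hypothetical helper for illustration; assumes 1-D or 2-D columns.
  """
  num_examples = {len(values) for values in dataset.values()}
  if len(num_examples) != 1:
    raise ValueError(f"Inconsistent number of examples: {sorted(num_examples)}")
  summary = {}
  for name, values in dataset.items():
    values = np.asarray(values)
    if values.ndim not in (1, 2):
      raise ValueError(f"{name}: expected a 1-D or 2-D array")
    # A 1-D array holds one value per example; a 2-D array of shape
    # (num_examples, d) holds d values per example.
    summary[name] = 1 if values.ndim == 1 else values.shape[1]
  return summary


toy = {
    "f1": np.random.uniform(size=(5, 4)),
    "f2": np.random.uniform(size=5),
    "label": np.array([False, True, False, True, False]),
}
print(summarize_features(toy))  # {'f1': 4, 'f2': 1, 'label': 1}
```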

## Train model

Training a model on multi-dimensional features is similar to training a model on single-dimensional features.

In [None]:
train_ds = create_dataset(num_examples=10000)
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)

Train model on 10000 examples
Model trained in 0:00:02.789326


## Model understanding

When interpreting the model, each dimension of a multi-dimensional feature is treated independently. For example, describing the model shows each dimension individually.


In [None]:
model.describe()

Analyzing the model and predictions also shows each dimension individually.

In [None]:
test_ds = create_dataset(num_examples=10000)
model.analyze(test_ds)