In [1]:
#r "nuget: Microsoft.ML"
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Collections.Generic; // List<T>, used for the sample data below
using System.Linq;
using static Microsoft.ML.Transforms.NormalizingTransformer;

This example comes from the ML.NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.normalizationcatalog.normalizesupervisedbinning?view=ml-dotnet

In [2]:
class DataPoint
{
    [VectorType(4)]
    public float[] Features { get; set; }

    public string Bin { get; set; }
}

In [3]:
var mlContext = new MLContext();

In [4]:
var samples = new List<DataPoint>()
{
    new DataPoint { Features = new float[4] { 8, 1, 3, 0 }, Bin = "Bin1" },
    new DataPoint { Features = new float[4] { 6, 2, 2, 1 }, Bin = "Bin2" },
    new DataPoint { Features = new float[4] { 5, 3, 0, 2 }, Bin = "Bin2" },
    new DataPoint { Features = new float[4] { 4, -8, 1, 3 }, Bin = "Bin3" },
    new DataPoint { Features = new float[4] { 2, -5, -1, 4 }, Bin = "Bin3" }
};

In [5]:
var data = mlContext.Data.LoadFromEnumerable(samples);

In [6]:
// Convert the "Bin" label column from string to key type.
data = mlContext.Transforms.Conversion.MapValueToKey("Bin").Fit(data).Transform(data);

NormalizeSupervisedBinning normalizes the data by constructing bins based on correlation with the label column and produces output based on which bin the original value belongs to.

In [7]:
var normalize = mlContext.Transforms.NormalizeSupervisedBinning("Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1, fixZero: false);

With fixZero: true, NormalizeSupervisedBinning constructs the bins in the same way but makes sure that zero values remain zero after normalization. This helps preserve sparsity.

In [8]:
var normalizeFixZero = mlContext.Transforms.NormalizeSupervisedBinning("Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1, fixZero: true);

In [9]:
var normalizeTransform = normalize.Fit(data);
var transformedData = normalizeTransform.Transform(data);
var normalizeFixZeroTransform = normalizeFixZero.Fit(data);
var fixZeroData = normalizeFixZeroTransform.Transform(data);

In [10]:
transformedData.GetColumn<float[]>("Features")

index,value
0,"[ 1, 0.5, 1, 0 ]"
1,"[ 0.5, 1, 0, 0.5 ]"
2,"[ 0.5, 1, 0, 0.5 ]"
3,"[ 0, 0, 0, 1 ]"
4,"[ 0, 0, 0, 1 ]"


In [11]:
fixZeroData.GetColumn<float[]>("Features")

index,value
0,"[ 1, 0, 1, 0 ]"
1,"[ 0.5, 0.5, 0, 0.5 ]"
2,"[ 0.5, 0.5, 0, 0.5 ]"
3,"[ 0, -0.5, 0, 1 ]"
4,"[ 0, -0.5, 0, 1 ]"


Let's get the transformation parameters. Since we transform only one column, we pass 0 to GetNormalizerModelParameters. If the estimator transforms multiple columns, pass the index of the corresponding InputOutputColumnPair instead.

In [12]:
normalizeTransform.GetNormalizerModelParameters(0)

UpperBounds,Density,Offset
"[ [ 4.5, 7, Infinity ], [ -2, 1.5, Infinity ], [ 2.5, Infinity ], [ 0.5, 2.5, Infinity ] ]","[ 2, 2, 1, 2 ]",[ ]

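The parameters above are enough to reproduce the normalized values by hand: each value is assigned the index of its bin, and that index is divided by the column's Density. The following is a minimal sketch in plain C#, not ML.NET's own implementation. The `BinIndex` and `Normalize` helpers are hypothetical names introduced here, and the handling of a value that sits exactly on a bin boundary is an assumption (none of the sample values do).

```csharp
using System;
using System.Linq;

// Upper bounds and densities copied from the GetNormalizerModelParameters(0) output above.
float[][] upperBounds =
{
    new[] { 4.5f, 7f, float.PositiveInfinity },
    new[] { -2f, 1.5f, float.PositiveInfinity },
    new[] { 2.5f, float.PositiveInfinity },
    new[] { 0.5f, 2.5f, float.PositiveInfinity },
};
float[] density = { 2f, 2f, 1f, 2f };

// Hypothetical helper: index of the first bin whose upper bound exceeds the value.
// Assumption: a value equal to a bound falls into the next bin; the sample data never hits this case.
int BinIndex(float value, float[] bounds) => Array.FindIndex(bounds, b => value < b);

// Normalized output for each slot: bin index divided by that slot's density.
float[] Normalize(float[] row) =>
    row.Select((v, i) => BinIndex(v, upperBounds[i]) / density[i]).ToArray();

var firstRow = Normalize(new float[] { 8, 1, 3, 0 });   // 1, 0.5, 1, 0
var fourthRow = Normalize(new float[] { 4, -8, 1, 3 }); // 0, 0, 0, 1
```

Both rows agree with rows 0 and 3 of the transformedData output shown earlier.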

In [13]:
normalizeFixZeroTransform.GetNormalizerModelParameters(0)

UpperBounds,Density,Offset
"[ [ 4.5, 7, Infinity ], [ -2, 1.5, Infinity ], [ 2.5, Infinity ], [ 0.5, 2.5, Infinity ] ]","[ 2, 2, 1, 2 ]","[ 0, 0.5, 0, 0 ]"

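The only difference from the previous parameters is Offset. The values are consistent with each slot's offset being the normalized value that an input of zero would otherwise receive, which is then subtracted from every output. A worked check under that reading follows, writing $b(x)$ for the bin index of $x$, $d$ for the density and $o$ for the offset (notation introduced here for illustration, not ML.NET's):

$$
y = \frac{b(x)}{d} - o, \qquad o = \frac{b(0)}{d}
$$

For the second feature (index 1) the upper bounds are $[-2,\ 1.5,\ \infty]$ and $d = 2$. Zero falls in bin 1, so $o = 1/2$, and:

$$
x = 1:\ \frac{1}{2} - \frac{1}{2} = 0, \qquad
x = 2:\ \frac{2}{2} - \frac{1}{2} = 0.5, \qquad
x = -8:\ \frac{0}{2} - \frac{1}{2} = -0.5
$$

These match the second column of the fixZeroData output. For the other three features zero already falls in bin 0, so $o = 0$ and their outputs are unchanged.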