In [1]:
#r "nuget: Microsoft.ML"
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Linq;
using static Microsoft.ML.Transforms.NormalizingTransformer;

This example comes from the ML.NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.normalizationcatalog.normalizemeanvariance?view=ml-dotnet

In [4]:
class DataPoint
{
    [VectorType(4)]
    public float[] Features { get; set; }
}

In [3]:
var mlContext = new MLContext();

In [5]:
var samples = new List<DataPoint>()
{
    new DataPoint(){ Features = new float[4] { 1, 1, 3, 0} },
    new DataPoint(){ Features = new float[4] { 2, 2, 2, 0} },
    new DataPoint(){ Features = new float[4] { 0, 0, 1, 0} },
    new DataPoint(){ Features = new float[4] {-1,-1,-1, 1} }
};

In [6]:
var data = mlContext.Data.LoadFromEnumerable(samples);

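NormalizeMeanVariance normalizes the data using the mean and variance computed from it. With useCdf: true, each output is the value of the normal cumulative distribution function (CDF) with that mean and standard deviation, evaluated at the input, so results lie between 0 and 1.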

In [7]:
var normalize = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: true);

With useCdf: false, NormalizeMeanVariance rescales the data using the statistics computed from it instead of applying the CDF, so the output is not confined to the range 0 to 1.

In [8]:
var normalizeNoCdf = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: false);

In [14]:
// Fit the CDF normalizer on the data, then apply it.
var normalizeTransform = normalize.Fit(data);
var transformedData = normalizeTransform.Transform(data);
// Do the same for the normalizer without the CDF.
var normalizeNoCdfTransform = normalizeNoCdf.Fit(data);
var noCdfData = normalizeNoCdfTransform.Transform(data);

In [15]:
transformedData.GetColumn<float[]>("Features")

index,value
0,"[ 0.67262894, 0.67262894, 0.8816018, 0.28187096 ]"
1,"[ 0.9101218, 0.9101218, 0.6939406, 0.28187096 ]"
2,"[ 0.32737106, 0.32737106, 0.4328869, 0.28187096 ]"
3,"[ 0.08987821, 0.08987821, 0.06409359, 0.95839834 ]"
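
The first value of each row above can be checked by hand. The following is a minimal sketch in plain C#, not the ML.NET implementation: it assumes the population mean and standard deviation (dividing by N) and a normal CDF written in terms of the error function. Since System.Math has no erf, the helper `Erf` below is a hypothetical stand-in based on the Abramowitz-Stegun approximation 7.1.26, which is accurate to roughly 1.5e-7.

```csharp
using System;
using System.Linq;

// First feature slot of the four samples.
double[] column = { 1, 2, 0, -1 };

// Population mean and standard deviation (divide by N, not N - 1).
double mean = column.Average();
double std = Math.Sqrt(column.Select(v => (v - mean) * (v - mean)).Average());

// Abramowitz-Stegun approximation 7.1.26 of erf (illustrative helper, not an ML.NET API).
double Erf(double x)
{
    double sign = x < 0 ? -1.0 : 1.0;
    x = Math.Abs(x);
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                    - 0.284496736) * t + 0.254829592) * t;
    return sign * (1.0 - poly * Math.Exp(-x * x));
}

// Normal CDF with the computed mean and standard deviation.
double[] cdf = column
    .Select(v => 0.5 * (1.0 + Erf((v - mean) / (std * Math.Sqrt(2)))))
    .ToArray();

Console.WriteLine($"mean = {mean}, std = {std:F6}");
Console.WriteLine(string.Join(", ", cdf.Select(v => v.ToString("F4"))));
```

Under these assumptions the results agree with the first column of the output above (0.6726, 0.9101, 0.3274, 0.0899) to about four decimal places.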


In [13]:
noCdfData.GetColumn<float[]>("Features")

index,value
0,"[ 0.81649655, 0.81649655, 1.5491934, 0 ]"
1,"[ 1.6329931, 1.6329931, 1.0327955, 0 ]"
2,"[ 0, 0, 0.5163978, 0 ]"
3,"[ -0.81649655, -0.81649655, -0.5163978, 2 ]"
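
The values above are consistent with multiplying each slot by 1 / sqrt(mean of squares) and subtracting no offset, so zeros stay zero. This reading is inferred from the numbers in this notebook rather than taken from the library source, so treat the sketch below as an illustration in plain C# under that assumption.

```csharp
using System;
using System.Linq;

// The four samples, one row per data point.
double[][] rows =
{
    new double[] {  1,  1,  3, 0 },
    new double[] {  2,  2,  2, 0 },
    new double[] {  0,  0,  1, 0 },
    new double[] { -1, -1, -1, 1 },
};

// Assumed per-slot scale: 1 / sqrt(mean of squared values).
double[] scale = Enumerable.Range(0, 4)
    .Select(j => 1.0 / Math.Sqrt(rows.Average(r => r[j] * r[j])))
    .ToArray();

// No offset is subtracted, so an input of 0 maps to 0.
double[][] scaled = rows
    .Select(r => r.Select((v, j) => v * scale[j]).ToArray())
    .ToArray();

Console.WriteLine("scale: " + string.Join(", ", scale.Select(s => s.ToString("F6"))));
foreach (var r in scaled)
    Console.WriteLine(string.Join(", ", r.Select(v => v.ToString("F6"))));
```

The computed scale values can be compared with the Scale entry that GetNormalizerModelParameters reports for this transform.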


Let's retrieve the transformation parameters. Because each estimator here was fitted on a single column, we pass 0 to GetNormalizerModelParameters. For an estimator that transforms multiple columns, pass the index of the corresponding InputOutputColumnPair.

In [16]:
normalizeTransform.GetNormalizerModelParameters(0)

Mean,StandardDeviation,UseLog
"[ 0.5, 0.5, 1.25, 0.25 ]","[ 1.118034, 1.118034, 1.47902, 0.4330127 ]",False


In [17]:
normalizeNoCdfTransform.GetNormalizerModelParameters(0)

Scale,Offset
"[ 0.81649655, 0.81649655, 0.5163978, 2 ]",[ ]
