In [1]:
#r "nuget: Microsoft.ML"
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Linq;
using static Microsoft.ML.Transforms.NormalizingTransformer;

This example comes from the ML.NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.normalizationcatalog.normalizelogmeanvariance?view=ml-dotnet

In [2]:
class DataPoint
{
    [VectorType(5)]
    public float[] Features { get; set; }
}

In [3]:
var mlContext = new MLContext();

In [4]:
var samples = new List<DataPoint>()
{
    new DataPoint(){ Features = new float[5] { 1, 1, 3, 0, float.MaxValue } },
    new DataPoint(){ Features = new float[5] { 2, 2, 2, 0, float.MinValue } },
    new DataPoint(){ Features = new float[5] { 0, 0, 1, 0, 0 } },
    new DataPoint(){ Features = new float[5] {-1,-1,-1, 1, 1} }
};

In [5]:
var data = mlContext.Data.LoadFromEnumerable(samples);

NormalizeLogMeanVariance normalizes the data based on the computed mean and variance of the logarithm of the data. With useCdf: true, the output is the value of the cumulative distribution function (CDF) at each input, so every result lies between 0 and 1.

In [6]:
var normalize = mlContext.Transforms.NormalizeLogMeanVariance("Features", useCdf: true);

The same estimator with useCdf: false skips the CDF step and instead rescales the values using the computed mean and variance of the logarithm of the data.

In [8]:
var normalizeNoCdf = mlContext.Transforms.NormalizeLogMeanVariance("Features", useCdf: false);

In [11]:
var normalizeTransform = normalize.Fit(data);
var transformedData = normalizeTransform.Transform(data);
var normalizeNoCdfTransform = normalizeNoCdf.Fit(data);
var noCdfData = normalizeNoCdfTransform.Transform(data);

In [12]:
transformedData.GetColumn<float[]>("Features")

index,value
0,"[ 0.15869737, 0.15869737, 0.8654407, 0, 0.84130263 ]"
1,"[ 0.84130263, 0.84130263, 0.58371305, 0, 0 ]"
2,"[ 0, 0, 0.09399668, 0, 0 ]"
3,"[ 0, 0, 0, 0, 0.15869737 ]"


In [10]:
noCdfData.GetColumn<float[]>("Features")

index,value
0,"[ 1.88539, 1.88539, 5.2970223, 0, 7.670682E+36 ]"
1,"[ 4.77078, 4.77078, 3.0924528, 0, -7.670682E+36 ]"
2,"[ -1, -1, 0.88788337, 0, -1 ]"
3,"[ -3.88539, -3.88539, -3.5212553, 0, -0.9774579 ]"


Next, retrieve the fitted transformation parameters. Because only one column is transformed, pass 0 to GetNormalizerModelParameters. When several columns are transformed, pass the index of the corresponding InputOutputColumnPair.

In [13]:
normalizeTransform.GetNormalizerModelParameters(0)

Mean,StandardDeviation,UseLog
"[ 0.3465736, 0.3465736, 0.59725314, 0, 44.36142 ]","[ 0.3465736, 0.3465736, 0.45360336, 0, 44.36142 ]",True

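To use the fitted values in code rather than only display them, the returned object can be cast to its concrete parameter type. The sketch below assumes the earlier cells have already run (so `normalizeTransform` exists) and that the `using static Microsoft.ML.Transforms.NormalizingTransformer;` line from the first cell is in scope; the variable name `cdfParams` is only illustrative.

```csharp
using System;
using System.Collections.Immutable;

// Assumption: with useCdf: true on a vector column, the parameters are
// CdfNormalizerModelParameters over ImmutableArray<float>, one entry per slot.
// The cast yields null if the actual type differs.
var cdfParams = normalizeTransform.GetNormalizerModelParameters(0)
    as CdfNormalizerModelParameters<ImmutableArray<float>>;

// Mean and standard deviation of log(x) for the slot at index 1.
Console.WriteLine($"Mean[1] = {cdfParams.Mean[1]}");
Console.WriteLine($"StandardDeviation[1] = {cdfParams.StandardDeviation[1]}");
```

If the cast returns null, the transformer was most likely fitted with different options (for example useCdf: false), and a different parameter type applies.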

ERF is the error function: https://en.wikipedia.org/wiki/Error_function.

Expected output:
- The value at index 1 of each resulting array is produced from the input value x at the same index by:
 - y = 0.5 * (1 + ERF((Math.Log(x) - 0.3465736) / (0.3465736 * sqrt(2))))
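The formula can be checked by hand against the transformed values. The cell below is a minimal sketch in plain C# that does not call ML.NET: since `System.Math` has no error function, it uses the Abramowitz-Stegun approximation 7.1.26, and the helper names `Erf` and `LogNormalCdf` are hypothetical, not part of any library.

```csharp
using System;

// Abramowitz-Stegun approximation 7.1.26 of the error function
// (absolute error of roughly 1.5e-7).
double Erf(double x)
{
    double sign = x < 0 ? -1.0 : 1.0;
    x = Math.Abs(x);
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                    - 0.284496736) * t + 0.254829592) * t;
    return sign * (1.0 - poly * Math.Exp(-x * x));
}

// y = 0.5 * (1 + ERF((log(x) - m) / (s * sqrt(2)))), valid for x > 0.
double LogNormalCdf(double x, double m, double s) =>
    0.5 * (1.0 + Erf((Math.Log(x) - m) / (s * Math.Sqrt(2.0))));

// Mean and standard deviation for index 1, copied from the parameter output above.
const double logMean = 0.3465736;
const double logStdDev = 0.3465736;

Console.WriteLine(LogNormalCdf(1.0, logMean, logStdDev)); // input 1 (first row, index 1)
Console.WriteLine(LogNormalCdf(2.0, logMean, logStdDev)); // input 2 (second row, index 1)
```

The two results should be close to the 0.1587 and 0.8413 shown at index 1 of the first two rows. Small differences in the trailing digits are to be expected, because ML.NET works in single precision and may approximate the error function differently.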

In [14]:
normalizeNoCdfTransform.GetNormalizerModelParameters(0)

Scale,Offset
"[ 2.88539, 2.88539, 2.2045693, 0, 0.02254211 ]","[ 0.3465736, 0.3465736, 0.59725314, 0, 44.36142 ]"

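The notebook does not spell out how Scale and Offset are applied. Comparing these parameters with the noCdfData output suggests an affine map, y = (x - Offset) * Scale, applied to x itself rather than to log(x), with Scale approximately equal to 1 / StandardDeviation. This is an inference from the numbers shown here, so treat the cell below as a sanity check under that assumption; the helper name `Affine` is hypothetical.

```csharp
using System;

// Assumed affine rescaling: subtract the offset, then multiply by the scale.
float Affine(float x, float scale, float offset) => (x - offset) * scale;

// Scale and Offset for index 1, copied from the parameter output above.
const float scale1 = 2.88539f;
const float offset1 = 0.3465736f;

// Inputs at index 1 of the four sample rows.
float[] inputs = { 1f, 2f, 0f, -1f };
foreach (var x in inputs)
{
    Console.WriteLine($"{x} -> {Affine(x, scale1, offset1)}");
}
```

Up to rounding, the printed values should match index 1 of the noCdfData rows: about 1.88539, 4.77078, -1 and -3.88539.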