In [7]:
#r "nuget: Microsoft.ML"
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

This example comes from the ML.NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.conversionsextensionscatalog.mapkeytovalue?view=ml-dotnet

In [2]:
class DataPoint
{
    public string Category { get; set; }
    public uint Age { get; set; }
}

class TransformedDataPoint : DataPoint
{
    public uint CategoryHashed { get; set; }
    public uint AgeHashed { get; set; }
}

In [3]:
var mlContext = new MLContext(seed: 1);

In [4]:
var rawData = new[] {
    new DataPoint() { Category = "MLB" , Age = 18 },
    new DataPoint() { Category = "NFL" , Age = 14 },
    new DataPoint() { Category = "NFL" , Age = 15 },
    new DataPoint() { Category = "MLB" , Age = 18 },
    new DataPoint() { Category = "MLS" , Age = 14 },
};

In [5]:
var data = mlContext.Data.LoadFromEnumerable(rawData);

Construct a pipeline that hashes the two columns and stores the results in new columns. The first transform hashes the string column and the second transform hashes the integer column.

Hashing is not a reversible operation, so the original value cannot be recovered from the hashed value alone. For debugging or model explainability, however, users sometimes need to know which values in the original columns generated the values in the hashed columns, since downstream algorithms mostly work with the hashed values. The Hash method can preserve the mapping from the original values to the hashed values in the Annotations of the newly created column (the column populated with the hashed values).

Setting the maximumNumberOfInverts parameter to -1 preserves the full map. If that parameter is left at its default value of 0, the mapping is not preserved.

In [8]:
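var pipeline = mlContext.Transforms.Conversion.Hash(
    new[]
    {
        // Hash the string column into 16 bits and keep the full
        // original-to-hashed value map in the column annotations.
        new HashingEstimator.ColumnOptions(
            "CategoryHashed",
            "Category",
            numberOfBits: 16,
            useOrderedHashing: false,
            maximumNumberOfInverts: -1),

        // Hash the integer column into 8 bits; maximumNumberOfInverts
        // keeps its default of 0, so no map is preserved for this column.
        new HashingEstimator.ColumnOptions(
            "AgeHashed",
            "Age",
            numberOfBits: 8,
            useOrderedHashing: false)
    }
);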

In [9]:
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);

In [10]:
mlContext.Data.CreateEnumerable<TransformedDataPoint>(transformedData, reuseRowObject: true)

index,CategoryHashed,AgeHashed,Category,Age
0,16967,204,MLB,18
1,24263,31,NFL,14
2,24263,72,NFL,15
3,16967,204,MLB,18
4,58334,31,MLS,14
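
The preserved mapping for CategoryHashed can be read back from the column annotations. The cell below is a minimal sketch that assumes `transformedData` from the earlier cells is still in scope and that the map is stored under the "KeyValues" annotation kind. It also assumes that slot indices are zero-based while key values start at 1, so slot i corresponds to hashed key i + 1. AgeHashed was built without maximumNumberOfInverts, so the same lookup on that column is expected to fail because no map was stored for it.

```csharp
using System;
using Microsoft.ML.Data;

// Assumes `transformedData` from the cells above is still in scope.
// "KeyValues" is the annotation kind assumed to hold the original values.
var slotNames = new VBuffer<ReadOnlyMemory<char>>();
transformedData.Schema["CategoryHashed"].Annotations.GetValue("KeyValues", ref slotNames);

// Items() yields (slot index, value) pairs; for a sparse buffer it
// returns only the slots that were explicitly stored.
foreach (var item in slotNames.Items())
{
    // Skip slots that carry no recorded original value.
    if (item.Value.IsEmpty) continue;

    // Slot indices are zero-based; key values start at 1 (assumption).
    Console.WriteLine($"Hashed key {item.Key + 1} came from \"{item.Value}\"");
}
```

Items() is used here instead of the span-returning GetIndices() and GetValues() because span-typed variables declared at the top level of a notebook cell may not compile in C# scripting.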
