In [1]:
#r "nuget: Microsoft.ML"
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

This example comes from the ML.NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transformextensionscatalog.dropcolumns?view=ml-dotnet

In [2]:
class InputData
{
    public int Age { get; set; }
    public string Gender { get; set; }
    public string Education { get; set; }
    public float ExtraColumn { get; set; }
}

class TransformedData
{
    public int Age { get; set; }
    public string Gender { get; set; }
    public string Education { get; set; }
}

In [3]:
var mlContext = new MLContext();

In [4]:
var samples = new List<InputData>()
{
    new InputData(){ Age = 21, Gender = "Male", Education = "BS", ExtraColumn = 1 },
    new InputData(){ Age = 23, Gender = "Female", Education = "MBA", ExtraColumn = 2 },
    new InputData(){ Age = 28, Gender = "Male", Education = "PhD", ExtraColumn = 3 },
    new InputData(){ Age = 22, Gender = "Male", Education = "BS", ExtraColumn = 4 },
    new InputData(){ Age = 23, Gender = "Female", Education = "MS", ExtraColumn = 5 },
    new InputData(){ Age = 27, Gender = "Female", Education = "PhD", ExtraColumn = 6 },
};

In [5]:
var dataview = mlContext.Data.LoadFromEnumerable(samples);

In [6]:
var pipeline = mlContext.Transforms.DropColumns("ExtraColumn");

In [7]:
var transformedData = pipeline.Fit(dataview).Transform(dataview);

Now let's look at what the DropColumns operation did. We can try to extract the transformed data as an IEnumerable of InputData, the class defined above. Because InputData still declares Age, Gender, Education and ExtraColumn, ML.NET throws an exception when it cannot find ExtraColumn in the transformed schema. Extracting the rows as TransformedData, which omits ExtraColumn, succeeds.

In [8]:
try
{
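    // InputData declares ExtraColumn, which the transformed schema no longer
    // contains, so this call throws instead of returning rows.
    var failingRowEnumerable = mlContext.Data.CreateEnumerable<InputData>(transformedData, reuseRowObject: false);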
}
catch (ArgumentOutOfRangeException exception)
{
    Console.WriteLine($"ExtraColumn is not available, so an exception is thrown: {exception.Message}.");
}

ExtraColumn is not available, so an exception is thrown: Could not find  column 'ExtraColumn' (Parameter 'Schema').


In [9]:
mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false)

index,Age,Gender,Education
0,21,Male,BS
1,23,Female,MBA
2,28,Male,PhD
3,22,Male,BS
4,23,Female,MS
5,27,Female,PhD
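
The effect of DropColumns can also be checked without materializing any rows, by comparing the column names in the schema before and after the transform. The cell below is a minimal sketch rather than part of the original documentation sample. The names SchemaDemoRow, demoContext, demoView and demoDropped are hypothetical and exist only for this illustration; it uses a smaller two-column dataset so that it can run on its own.

```csharp
#r "nuget: Microsoft.ML"
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical two-column row type used only for this sketch.
class SchemaDemoRow
{
    public int Age { get; set; }
    public float ExtraColumn { get; set; }
}

var demoContext = new MLContext();
var demoView = demoContext.Data.LoadFromEnumerable(new[]
{
    new SchemaDemoRow { Age = 21, ExtraColumn = 1 },
    new SchemaDemoRow { Age = 23, ExtraColumn = 2 },
});

// Same DropColumns call as in the notebook, applied to the demo data.
var demoDropped = demoContext.Transforms.DropColumns("ExtraColumn")
    .Fit(demoView)
    .Transform(demoView);

// The schema is a read-only list of columns, so LINQ can project the names.
var namesBefore = demoView.Schema.Select(column => column.Name).ToArray();
var namesAfter = demoDropped.Schema.Select(column => column.Name).ToArray();

Console.WriteLine($"Before: {string.Join(", ", namesBefore)}");
Console.WriteLine($"After:  {string.Join(", ", namesAfter)}");

// GetColumnOrNull returns null when no column with that name exists.
bool extraColumnPresent = demoDropped.Schema.GetColumnOrNull("ExtraColumn").HasValue;
Console.WriteLine($"ExtraColumn present after DropColumns: {extraColumnPresent}");
```

A schema check like this avoids using an exception as the signal that a column is gone, which can be convenient when validating a pipeline before any row type is written.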
