# Mapping

Using the [DwC-A_dotnet.Mapping](https://www.nuget.org/packages/DwC-A_dotnet.Mapping/) library, we can map data from a [DwC-A_dotnet](https://www.nuget.org/packages/DwC-A_dotnet/) IRow to a strongly typed class. There are two approaches to mapping data:

1. Use the dwca-codegen magic command to generate a class and mapping method from the archive metadata. This requires the least code and is more interactive, but it is more likely to run into issues.
1. Manually create a class definition and mapping method. This gives the most control over mapping but requires more effort.

In [None]:
#r "nuget:DwC-A_dotnet.Interactive,0.1.9-Pre"
#r "nuget:DwC-A_dotnet.Mapping,0.6.3"
#r "nuget:Microsoft.ML"

## Using The dwca-codegen Magic Command

First, we will map using the dwca-codegen magic command.

### Create Configuration

This step is optional but gives more control over how classes are generated and mapped. If this step is omitted, all properties are mapped as type string.

Use the GeneratorConfigurationBuilder to create a configuration to influence how the dwca-codegen command generates the class to be mapped for each file in the archive.

Use the AddProperty method to define the properties that will be added to the generated classes and how they will be mapped to specific terms. Use the wildcard term __*__ to control whether properties are created and mapped for all other terms that are not defined explicitly.

The WithMapMethod method adds a static method called MapRow to each generated type. MapRow is used to map an IRow to an instance of the generated class.

In [None]:
using DwC_A.Interactive.Mapping;
using DwC_A.Terms;

var config = new GeneratorConfigurationBuilder()
    .AddProperty("*", "string", true)
    .AddProperty(Terms.decimalLatitude, "double", true, "Latitude")
    .AddProperty(Terms.decimalLongitude, "double", true, "Longitude")
    .AddProperty(Terms.dateIdentified, "DateTime")
    .WithMapMethod(true)
    .Build();

config

### dwca-codegen

Use the dwca-codegen command to examine the archive and generate the classes that data will be mapped into. The --configName option (passed as -c in the cell below) names the variable that holds the configuration we created earlier.

In [None]:
#!dwca-codegen -h
#!dwca-codegen -c config ./data/Papilionidae.zip

## Create Mapper

Now that we have class definitions and a mapping method, we can define a mapper as follows.

In [None]:
using DwC_A.Mapping;

var mapper = MapperFactory.CreateMapper<Occurrence>(Occurrence.MapRow);
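
To make the idea of a mapper concrete, here is a library-independent sketch of the pattern: an action that fills a strongly typed object from a row, wrapped so it can be applied to many rows. It is not the DwC-A_dotnet.Mapping implementation. The `Sighting` class, the `SimpleMapper<T>` helper, and the sample values are all hypothetical, and a plain dictionary stands in for IRow.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical target class, for illustration only.
public class Sighting
{
    public string ScientificName { get; set; }
    public double Latitude { get; set; }
}

// Hypothetical helper: wraps a "fill this object from a row" action
// and applies it to a freshly created instance of T.
public class SimpleMapper<T> where T : new()
{
    private readonly Action<T, IReadOnlyDictionary<string, string>> map;

    public SimpleMapper(Action<T, IReadOnlyDictionary<string, string>> map) => this.map = map;

    public T MapRow(IReadOnlyDictionary<string, string> row)
    {
        var item = new T();
        map(item, row);
        return item;
    }
}

// The mapping action copies string fields and parses numeric ones.
var sightingMapper = new SimpleMapper<Sighting>((s, row) =>
{
    s.ScientificName = row["scientificName"];
    s.Latitude = double.Parse(row["decimalLatitude"], CultureInfo.InvariantCulture);
});

// Made-up sample rows; a dictionary stands in for IRow here.
var sampleRows = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["scientificName"] = "Papilio machaon", ["decimalLatitude"] = "51.5" },
    new Dictionary<string, string> { ["scientificName"] = "Papilio glaucus", ["decimalLatitude"] = "40.25" },
};

var sightings = sampleRows.Select(r => sightingMapper.MapRow(r)).ToList();
```

Keeping the mapping action separate from the class means the same class can be filled from differently shaped rows by supplying a different action.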


## Map Archive

Finally, we can open the archive, query its rows, and map them using the mapper.

**Hint:** Use the Greedy RowStrategy for better performance when mapping the entire class.

There are three different Map extensions for the IFileReader and IRow interfaces returned by the ArchiveReader.

In [None]:
using DwC_A;
using DwC_A.Factories;
using DwC_A.Config;

var factory = new DefaultFactory((cfg) => {
    cfg.Add<RowFactoryConfiguration>(c => c.Strategy = RowStrategy.Greedy);
});

var archive = new ArchiveReader("./data/Papilionidae.zip", factory);

var occurrences = archive.CoreFile
    .DataRows
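    // Skip rows whose latitude or identification date is missing (null or empty).
    .Where(row => !string.IsNullOrEmpty(row[Terms.decimalLatitude]))
    .Where(row => !string.IsNullOrEmpty(row[Terms.dateIdentified]))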
    .Map<Occurrence>(mapper);

occurrences.Select(o => new {
    o.ScientificName,
    o.Latitude,
    o.Longitude,
    o.DateIdentified
})

## Manual Mapping

Use this approach if you already have a class definition or want to write one by hand.

Classes can be defined in two ways:

1. Defined directly in a cell.
1. Loaded from a file on disk using the #load magic command.

In this instance we'll load the class definition from a file.  After that we can define a mapper and map method.

In [None]:
#load "./Code/Multimedia.cs"

using System;

var multimediaMapper = MapperFactory.CreateMapper<Multimedia>((m, row) => {
    // Convert<T> parses the field into the target type; the indexer returns the raw string.
    m.GbifID = row.Convert<long>("http://rs.gbif.org/terms/1.0/gbifID");
    m.Type = row["http://purl.org/dc/terms/type"];
    m.Identifier = row["http://purl.org/dc/terms/identifier"];
    m.Created = row.Convert<DateTime>("http://purl.org/dc/terms/created");
});

archive.Extensions
    .GetFileReadersByRowType("http://rs.gbif.org/terms/1.0/Multimedia")
    .FirstOrDefault()?
    .Map<Multimedia>(multimediaMapper)
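
The contents of ./Code/Multimedia.cs are not shown in this notebook. For the mapper above to compile, the class needs settable GbifID, Type, Identifier, and Created properties. A minimal sketch of such a class might look like the following. The property types are inferred from the mapper lambda and are an assumption about the actual file.

```csharp
using System;

// Hypothetical stand-in for ./Code/Multimedia.cs. Property names and types are
// inferred from the mapper lambda above, not copied from the real file.
public class Multimedia
{
    public long GbifID { get; set; }
    public string Type { get; set; }
    public string Identifier { get; set; }
    public DateTime Created { get; set; }
}
```

The same definition could be placed directly in a cell instead of being loaded with #load.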

## Using With Microsoft.ML

Now that we have mapped the rows to an IEnumerable of Occurrence objects, we can load the data into an IDataView or DataFrame using MLContext from [Microsoft.ML](https://www.nuget.org/packages/Microsoft.ML/).

In [None]:
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

var data = mlContext.Data.LoadFromEnumerable<Occurrence>(occurrences);

data.Schema
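
To show what the resulting IDataView exposes without depending on the archive, here is a small sketch using a hypothetical `GeoPoint` class and made-up coordinates. It assumes the Microsoft.ML package reference from the first cell is already loaded. The names `geoContext`, `geoView`, `columnNames`, and `latitudes` are illustrative only.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical class for illustration; each public property becomes a column.
public class GeoPoint
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

var geoContext = new MLContext();

// Made-up sample values.
var geoPoints = new[]
{
    new GeoPoint { Latitude = 51.5, Longitude = -0.1 },
    new GeoPoint { Latitude = 40.25, Longitude = -74.0 },
};

var geoView = geoContext.Data.LoadFromEnumerable(geoPoints);

// The schema lists one column per public property.
var columnNames = geoView.Schema.Select(column => column.Name).ToList();

// GetColumn<T> reads a single column back as a typed sequence.
var latitudes = geoView.GetColumn<double>("Latitude").ToList();
```

Reading a column back this way is a quick check that the values survived the conversion into the IDataView.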