# Mapping

Using the [DwC-A_dotnet.Mapping](https://www.nuget.org/packages/DwC-A_dotnet.Mapping/) library, we can map data from a [DwC-A_dotnet](https://www.nuget.org/packages/DwC-A_dotnet/) IRow to a strongly typed class. There are two approaches to mapping data:

1. Use the dwca-codegen magic command to generate a class and mapping method from archive metadata. This requires the least code and is more interactive, but it is more likely to run into issues.
1. Manually create a class definition and mapping method. This gives the most control over mapping but requires more effort.

In [1]:
#r "nuget:DwC-A_dotnet.Interactive,0.1.12-Pre"
#r "nuget:DwC-A_dotnet.Mapping,0.6.3"
#r "nuget:Microsoft.ML"

Loading extensions from `DwC-A_dotnet.Interactive.dll`

## Using The dwca-codegen Magic Command

First, we will map using the dwca-codegen magic command.

### Create Configuration

This step is optional but gives more control over how classes are generated and mapped. If this step is omitted, all properties are mapped as strings.

Use the GeneratorConfigurationBuilder to create a configuration to influence how the dwca-codegen command generates the class to be mapped for each file in the archive.

Use the AddProperty method to define the properties that will be added to the generated classes and how they are mapped to specific terms. Use the wildcard term __*__ to control whether properties are also created and mapped for all terms that are not explicitly defined.

The WithMapMethod method creates a static method on the type called MapRow that is used to map an IRow to an instance of the generated class.
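
As a rough illustration of the result, the sketch below hand-writes a class in the shape described above. It is not the actual generator output: the name OccurrenceSketch is hypothetical, only four properties are shown, and the `MapRow(obj, row)` signature is an assumption based on how MapRow is passed to MapperFactory.CreateMapper in the Create Mapper section.

```csharp
using System;
using DwC_A;
using DwC_A.Extensions;
using DwC_A.Terms;

// Hypothetical stand-in for a generated class; the real output of
// dwca-codegen may differ in naming, attributes and null handling.
public class OccurrenceSketch
{
    // Terms matched by the "*" wildcard become string properties.
    public string ScientificName { get; set; }

    // Terms with an explicit AddProperty entry use the configured type and name.
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public DateTime DateIdentified { get; set; }

    // Assumed shape of the method emitted by WithMapMethod(true):
    // copies values from an IRow into an existing instance.
    public static void MapRow(OccurrenceSketch obj, IRow row)
    {
        obj.ScientificName = row[Terms.scientificName];
        obj.Latitude = row.Convert<double>(Terms.decimalLatitude);
        obj.Longitude = row.Convert<double>(Terms.decimalLongitude);
        obj.DateIdentified = row.Convert<DateTime>(Terms.dateIdentified);
    }
}
```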

In [2]:
using DwC_A.Interactive.Mapping;
using DwC_A.Terms;

var config = new GeneratorConfigurationBuilder()
.AddProperty("*", "string", true)
.AddProperty(Terms.decimalLatitude, "double", true, "Latitude")
.AddProperty(Terms.decimalLongitude, "double", true, "Longitude")
.AddProperty(Terms.dateIdentified, "DateTime")
.WithMapMethod(true)
.Build();

config

Option,Value
Namespace,
MapMethod,True
Output,
PascalCase,True
TermAttribute,none
Usings,System; DwC_A; DwC_A.Extensions

Name,Type,Include,Term
<null>,string,True,*
Latitude,double,True,http://rs.tdwg.org/dwc/terms/decimalLatitude
Longitude,double,True,http://rs.tdwg.org/dwc/terms/decimalLongitude
<null>,DateTime,True,http://rs.tdwg.org/dwc/terms/dateIdentified


### dwca-codegen

Use the dwca-codegen command to examine the archive and generate classes to map data into. The --configName (-c) option specifies the name of the variable that holds the configuration we created earlier.

In [3]:
#!dwca-codegen -h
#!dwca-codegen -c config ./data/Papilionidae.zip

Description:
  Generate strongly typed class files for Darwin Core Archive

Usage:
  #!dwca-codegen <archivePath> [options]

Arguments:
  <archivePath>  Path to archive folder or zip file

Options:
  -c, --configName <configName>  Name of configuration variable []
  -?, -h, --help                 Show help and usage information





Option,Value
Namespace,
MapMethod,True
Output,
PascalCase,True
TermAttribute,none
Usings,System; DwC_A; DwC_A.Extensions

Name,Type,Include,Term
<null>,string,True,*
Latitude,double,True,http://rs.tdwg.org/dwc/terms/decimalLatitude
Longitude,double,True,http://rs.tdwg.org/dwc/terms/decimalLongitude
<null>,DateTime,True,http://rs.tdwg.org/dwc/terms/dateIdentified


## Create Mapper

Now that we have class definitions and a mapping method, we can create a mapper by passing the generated MapRow method to MapperFactory.CreateMapper.

In [4]:
using DwC_A.Mapping;

var mapper = MapperFactory.CreateMapper<Occurrence>(Occurrence.MapRow);


## Map Archive

Finally, we can open the archive and query rows that we can map using the mapper.

**Hint:** Use the Greedy RowStrategy for better performance when mapping most or all of the fields in each row.

There are three different Map extensions for the IFileReader and IRow interfaces returned by the ArchiveReader.

In [5]:
using DwC_A;
using DwC_A.Factories;
using DwC_A.Config;

var factory = new DefaultFactory((cfg) => {
    cfg.Add<RowFactoryConfiguration>(c => c.Strategy = RowStrategy.Greedy);
});

var archive = new ArchiveReader("./data/Papilionidae.zip", factory);

var occurrences = archive.CoreFile
    .DataRows
    .Where(row => row[Terms.decimalLatitude] != null)
    .Where(row => row[Terms.dateIdentified] != null)
    .Map<Occurrence>(mapper);

occurrences.Select(o => new {
    o.ScientificName,
    o.Latitude,
    o.Longitude,
    o.DateIdentified
})

index,ScientificName,Latitude,Longitude,DateIdentified
0,"Battus philenor (Linnaues, 1771)",32.996571,-97.148685,2021-02-23 02:11:11Z
1,"Eurytides marcellus (Cramer, 1777)",32.754545,-94.483826,2020-09-25 21:31:00Z
2,"Battus philenor (Linnaues, 1771)",29.824338,-104.307482,2021-02-19 04:44:44Z
3,"Battus philenor (Linnaues, 1771)",31.590366,-98.927223,2021-02-23 18:21:20Z
4,"Papilio polibetes Stoll, 1781",33.737175,-96.576508,2021-02-21 03:35:16Z
5,"Papilio glaucus Linnaeus, 1758",30.693989,-97.822322,2021-02-25 02:22:19Z
6,"Papilio palamedes Drury, 1773",28.244524,-96.856559,2021-02-25 03:41:59Z
7,"Battus philenor (Linnaues, 1771)",29.386342,-95.011422,2021-02-25 18:42:07Z
8,"Papilio cresphontes Cramer, 1777",30.924697,-94.003875,2020-08-25 21:01:29Z
9,"Papilio multicaudata Kirby, 1884",29.552946,-98.233367,2021-02-12 15:13:22Z


## Manual Mapping

If you already have a class definition, or want to write one by hand, use this method.

Classes may be defined in two ways:

1. Defined directly in a cell.
1. Loaded from a file on disk using the #load magic command.
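
For the first option, a class defined directly in a cell might look like the sketch below. The contents of ./Code/Multimedia.cs are not shown in this notebook, so the property names and types here are inferred from the mapper and the output in this section and should be treated as assumptions.

```csharp
using System;

// Sketch of a hand-written target class. Property names and types are
// inferred from the mapping code and output in this section, not copied
// from ./Code/Multimedia.cs.
public class Multimedia
{
    public long GbifID { get; set; }
    public string Type { get; set; }
    public string Identifier { get; set; }
    public DateTime Created { get; set; }

    // Appears in the output but is never set by the mapper, so it stays null.
    public string Creator { get; set; }
}
```

With a definition like this in a cell, the #load line is not needed; use one or the other.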

In this instance we'll load the class definition from a file.  After that we can define a mapper and map method.

In [6]:
#load "./Code/Multimedia.cs"

using System;

var multimediaMapper = MapperFactory.CreateMapper<Multimedia>((m, row) => {
    m.GbifID = row.Convert<long>("http://rs.gbif.org/terms/1.0/gbifID");
    m.Type = row["http://purl.org/dc/terms/type"];
    m.Identifier = row["http://purl.org/dc/terms/identifier"];
    m.Created = row.Convert<DateTime>("http://purl.org/dc/terms/created");
});

archive.Extensions
    .GetFileReadersByRowType("http://rs.gbif.org/terms/1.0/Multimedia")
    .FirstOrDefault()?
    .Map<Multimedia>(multimediaMapper)

index,GbifID,Type,Identifier,Created,Creator
0,3044911996,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113743750/original.jpg?1614044752,2020-09-14 12:35:15Z,<null>
1,3044896703,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113394978/original.jpg?1613765341,2020-04-06 19:36:03Z,<null>
2,3044892607,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113344109/original.jpeg?1613707791,2006-04-30 11:47:01Z,<null>
3,3044876974,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113796898/original.jpg?1614103220,2003-06-25 11:55:00Z,<null>
4,3044875694,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113535495/original.jpg?1613878613,2020-10-09 06:45:42Z,<null>
5,3044875210,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113937636/original.jpg?1614218182,2015-05-09 12:53:30Z,<null>
6,3044864241,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113945923/original.jpg?1614224485,2016-04-09 12:37:20Z,<null>
7,3044857233,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113993610/original.jpg?1614278545,2021-02-25 08:55:07Z,<null>
8,3044857233,StillImage,https://inaturalist-open-data.s3.amazonaws.com/photos/113993597/original.jpg?1614278538,2021-02-25 08:55:07Z,<null>
9,3044837653,StillImage,https://static.inaturalist.org/photos/91859858/original.jpg?1598389359,2020-08-13 07:22:56Z,<null>


## Using With Microsoft.ML

Now that we have mapped the archive to an IEnumerable of Occurrence objects, we can load the data into an IDataView or DataFrame using MLContext from [Microsoft.ML](https://www.nuget.org/packages/Microsoft.ML/).

In [7]:
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

var data = mlContext.Data.LoadFromEnumerable<Occurrence>(occurrences);

data.Schema

index,Name,Index,IsHidden,Type,Annotations
0,GbifID,0,False,System.ReadOnlyMemory<System.Char>,[ ]
1,Abstract,1,False,System.ReadOnlyMemory<System.Char>,[ ]
... (remaining columns not shown)

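Once the data is in an IDataView, individual columns can be read back as a quick sanity check. The following is a minimal sketch, assuming the `data` variable from the cell above and that the generated Occurrence class exposes Latitude and Longitude as double properties, as configured earlier. GetColumn is an extension method from Microsoft.ML.Data; the column names are assumed to match the property names.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Read two columns back from the IDataView as typed sequences.
// Column names are assumed to match the Occurrence property names.
var latitudes = data.GetColumn<double>("Latitude");
var longitudes = data.GetColumn<double>("Longitude");

// Pair the values up and display the first few coordinates.
latitudes.Zip(longitudes, (lat, lon) => new { lat, lon }).Take(5)
```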