# Features

This notebook covers some of the features of the DwC-A_dotnet library. For more documentation, see the [DwC-A_dotnet Wiki](https://github.com/pjoiner/DwC-A_dotnet/wiki).

First, we download and reference the library packages from NuGet.

In [None]:
#r "nuget: DwC-A_dotnet, 0.7.0"
#r "nuget: DwC-A_dotnet.Interactive, 0.1.10-Pre"

## Configuration

Several options for tuning the behavior and performance of the DwC-A_dotnet library can be set through configuration. The configuration is injected into the ArchiveReader class through the DefaultFactory, as the code cell in this section shows. For more information, [see the Configuration documentation](https://github.com/pjoiner/DwC-A_dotnet/wiki/Configuration).

Here we extract the data/Papilionidae.zip archive into the Papilionidae folder, where the rest of the notebook works with it.

The Papilionidae.zip archive is a dataset derived from a GBIF query for butterflies of the family *Papilionidae*. It contains occurrence and multimedia data, including image links.

For more information on this dataset see [https://doi.org/10.15468/dl.zwsssf](https://doi.org/10.15468/dl.zwsssf).

In [None]:
using DwC_A;
using DwC_A.Factories;
using DwC_A.Config;

var archivePath = "./data/Papilionidae.zip";
var factory = new DefaultFactory(cfg => {
    cfg.Add<ArchiveFolderConfiguration>(folderConfig => {
        folderConfig.OutputPath = "./Papilionidae";
    });
});
var archive = new ArchiveReader(archivePath, factory);
$"Archive extracted to path {archive.OutputPath}".Display();
archive.MetaData.Display();
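
Before choosing an output folder, it can help to see which files an archive holds. The following is a minimal sketch that uses only the .NET base class library, not DwC-A_dotnet. The `ListEntries` helper is hypothetical, and the in-memory archive with its two entry names is an illustrative stand-in for a real file such as data/Papilionidae.zip.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper (not part of DwC-A_dotnet): list the entry names in a zip stream.
string[] ListEntries(Stream zipStream)
{
    // leaveOpen keeps the caller's stream usable after the ZipArchive is disposed.
    using var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
    return zip.Entries.Select(e => e.FullName).ToArray();
}

// Build a small in-memory archive as a stand-in for a real Darwin Core Archive.
// The entry names are illustrative assumptions.
var buffer = new MemoryStream();
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
{
    zip.CreateEntry("meta.xml");
    zip.CreateEntry("occurrence.txt");
}
buffer.Position = 0;

foreach (var name in ListEntries(buffer))
{
    Console.WriteLine(name);
}
```

To inspect a real archive, pass `File.OpenRead("./data/Papilionidae.zip")` to the helper instead of the in-memory stream.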

## Data Conversion

Several extension methods convert field values from string to any type that has a TypeConverter. Here we convert latitude and longitude to double values and the event date to a DateTime for further analysis. For delimited fields, the GetListOf or TryGetListOf extensions return the values in the field as a list of strings. For more information on data conversion, [see the wiki](https://github.com/pjoiner/DwC-A_dotnet/wiki/Type-Conversion).

In [None]:
using DwC_A.Terms;
using DwC_A.Extensions;

var occurrence = archive.CoreFile;
var gbifIDTerm = "http://rs.gbif.org/terms/1.0/gbifID";
var gbifIssue = "http://rs.gbif.org/terms/1.0/issue";

occurrence.DataRows.Take(10)
    .Where(o => o[gbifIssue] is not null)
    .Select(o => new {
        gbifId = o[gbifIDTerm],
        species = o[Terms.scientificName],
        latitude = o.Convert<double>(Terms.decimalLatitude),
        longitude = o.Convert<double>(Terms.decimalLongitude),
        date = o.Convert<DateTime>(Terms.eventDate),
        issues = o.GetListOf(gbifIssue)
    })
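
To illustrate what conversion through a TypeConverter involves, here is a minimal sketch built only on `System.ComponentModel`. It is not the library's implementation. `ConvertText` and `SplitField` are hypothetical helpers, and both the invariant culture and the `;` delimiter are assumptions made for this example.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of DwC-A_dotnet): convert text using the
// TypeConverter registered for the target type. Invariant culture is assumed.
T ConvertText<T>(string text)
{
    var converter = TypeDescriptor.GetConverter(typeof(T));
    return (T)converter.ConvertFromString(null, CultureInfo.InvariantCulture, text);
}

// Hypothetical helper: split a delimited field into trimmed, non-empty values.
// The ';' default delimiter is an assumption for this sketch.
string[] SplitField(string text, char delimiter = ';') =>
    (text ?? string.Empty)
        .Split(delimiter, StringSplitOptions.RemoveEmptyEntries)
        .Select(s => s.Trim())
        .ToArray();

double latitude = ConvertText<double>("12.5");
DateTime date = ConvertText<DateTime>("2019-07-14");
string[] issues = SplitField("COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84");

Console.WriteLine($"{latitude} {date:yyyy-MM-dd} {issues.Length}");
```

Any type with a registered TypeConverter can be used as the type argument. In this sketch, a value the converter cannot parse causes an exception.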

## Async FileReaders

Files can be read asynchronously using the async FileReaders returned by the GetAsyncCoreFile and Extensions.GetAsyncFileReaderXXX methods of the ArchiveReader. Use ToArrayAsync() as shown below to materialize the results of a LINQ query. For more information on the asynchronous FileReaders, [see the wiki](https://github.com/pjoiner/DwC-A_dotnet/wiki/Using-the-AsyncFileReaders).

In [None]:
var occurrenceAsync = archive.GetAsyncCoreFile();
await occurrenceAsync.GetDataRowsAsync()
    .Take(10)
    .Where(o => o[gbifIssue] is not null)
    .Select(o => new {
        gbifId = o[gbifIDTerm],
        species = o[Terms.scientificName],
        latitude = o.Convert<double>(Terms.decimalLatitude),
        longitude = o.Convert<double>(Terms.decimalLongitude),
        date = o.Convert<DateTime>(Terms.eventDate),
        issues = o.GetListOf(gbifIssue)
    }).ToArrayAsync()
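
An asynchronous sequence can also be consumed with a plain `await foreach` loop. The sketch below assumes the rows arrive as an `IAsyncEnumerable<T>` and uses only the .NET base class library. `ReadRowsAsync` is a hypothetical stand-in for a row source, and `TakeNonEmptyAsync` is a hypothetical helper. Unlike the cell above, which takes ten rows and then filters them, this helper filters first and stops once it has collected the requested number of rows.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an asynchronous row source.
async IAsyncEnumerable<string> ReadRowsAsync()
{
    foreach (var row in new[] { "row1", "", "row3", "row4" })
    {
        await Task.Yield(); // simulate asynchronous I/O
        yield return row;
    }
}

// Hypothetical helper: collect up to 'limit' non-empty rows, then stop reading.
async Task<List<string>> TakeNonEmptyAsync(IAsyncEnumerable<string> rows, int limit)
{
    var result = new List<string>();
    await foreach (var row in rows)
    {
        if (result.Count >= limit) break;       // enough rows collected
        if (string.IsNullOrEmpty(row)) continue; // skip empty rows
        result.Add(row);
    }
    return result;
}

var collected = await TakeNonEmptyAsync(ReadRowsAsync(), 2);
Console.WriteLine(string.Join(", ", collected)); // prints "row1, row3"
```

Stopping at the limit avoids materializing the whole file when only a few matching rows are needed.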