# Create Archive

This notebook demonstrates creating an archive from existing data.  The source data consists of sample observations from iNaturalist that are stored in the JSON file [./data/observations.json](./data/observations.json).

We will create an occurrence.txt core file and a multimedia.txt extension file with links to images and sound recordings.

To start, import the required libraries.

In [None]:
#r "nuget:DwC-A_dotnet, 0.8.0"
#r "nuget:DwC-A_dotnet.Interactive, 0.1.11-Pre"

### Define Field MetaData

First we'll define the field metadata for the core file, occurrence.txt, and add terms for each of the columns in the file.  Use the `#!terms` magic command to view a list of the default [Darwin Core terms](https://dwc.tdwg.org/terms/).

* Use `AutomaticallyIndex` to automatically number the terms in the order added.
* Define non-indexed terms, such as those that only carry a default value, before the call to `AutomaticallyIndex`.
* The last term `otherCatalogNumbers` is a list of additional catalog numbers delimited with the `|` character.
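
As an illustration of what a `|`-delimited field value looks like, the following sketch joins and splits catalog numbers using only the base class library. `JoinCatalogNumbers` and `SplitCatalogNumbers` are hypothetical helpers, not part of DwC-A_dotnet, and the sample values are invented.

```csharp
using System;
using System.Linq;

// Hypothetical helper: joins catalog numbers into one field value using the
// same "|" delimiter declared for otherCatalogNumbers. Blank values are skipped.
static string JoinCatalogNumbers(params string[] numbers) =>
    string.Join("|", numbers.Where(n => !string.IsNullOrWhiteSpace(n)));

// Hypothetical helper: splits a delimited field value back into its parts.
static string[] SplitCatalogNumbers(string value) =>
    value.Split('|', StringSplitOptions.RemoveEmptyEntries);

// Sample values for illustration only; they do not come from the data set.
var joined = JoinCatalogNumbers("INAT-1", "", "HERB-42");
Console.WriteLine(joined);   // prints: INAT-1|HERB-42

foreach (var number in SplitCatalogNumbers(joined))
{
    Console.WriteLine(number);
}
```

A consumer of the archive can use the delimiter recorded in the field metadata to recover the individual values in the same way.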

In [None]:
using DwC_A;
using DwC_A.Builders;
using DwC_A.Meta;
using DwC_A.Terms;

var fieldMetaDataBuilder = FieldsMetaDataBuilder.Fields()
    .AddField(f => f.Term(Terms.geodeticDatum).Default("WGS84"))
    .AutomaticallyIndex()
    .AddField(f => f.Term("id"))
    .AddField(f => f.Term(Terms.dateIdentified))
    .AddField(f => f.Term(Terms.recordedBy))
    .AddField(f => f.Term(Terms.decimalLatitude))
    .AddField(f => f.Term(Terms.decimalLongitude))
    .AddField(f => f.Term(Terms.license))
    .AddField(f => f.Term(Terms.kingdom))
    .AddField(f => f.Term(Terms.phylum))
    .AddField(f => f.Term(Terms.@class))
    .AddField(f => f.Term(Terms.order))
    .AddField(f => f.Term(Terms.genus))
    .AddField(f => f.Term(Terms.specificEpithet))
    .AddField(f => f.Term(Terms.scientificName))
    .AddField(f => f.Term(Terms.otherCatalogNumbers).Delimiter("|"));

Terms that are not included in the [Darwin Core quick reference](https://dwc.tdwg.org/terms/), such as the identifier below, can be added manually.

In [None]:
var identifier = "http://purl.org/dc/terms/identifier";

var multiMediaMetaDataBuilder = FieldsMetaDataBuilder.Fields()
    .AutomaticallyIndex()
    .AddField(f => f.Term("id"))
    .AddField(f => f.Term(Terms.type))
    .AddField(f => f.Term(identifier))
    .AddField(f => f.Term(Terms.references));

### Define File MetaData

The next step is to define the file metadata, which includes the file name, delimiters and fields.  Unless overridden, as is done for occurrence.txt below, the defaults are as follows.

* FieldsTerminatedBy (delimiter) - comma.
* FieldsEnclosedBy (quote character) - `"`.
* LinesTerminatedBy (newlines) - `\n`.
* IgnoreHeaderLines - 0.
* Encoding - UTF8.
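
To make these defaults concrete, here is a minimal sketch of how a single row might be serialized with a comma delimiter and double-quote enclosure. `FormatRow` is a hypothetical helper and not how DwC-A_dotnet writes rows; it also assumes embedded quotes are doubled, as is common CSV practice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of DwC-A_dotnet: formats one row using the
// default conventions listed above (comma delimiter, double-quote enclosure).
// Assumption: embedded quote characters are escaped by doubling them.
static string FormatRow(IEnumerable<string> fields, string delimiter = ",", char quote = '"')
{
    var q = quote.ToString();
    var formatted = fields.Select(field =>
    {
        var value = field ?? string.Empty;
        // Only enclose values that would otherwise break the row layout.
        var needsQuotes = value.Contains(delimiter) || value.Contains(q) || value.Contains('\n');
        return needsQuotes ? q + value.Replace(q, q + q) + q : value;
    });
    return string.Join(delimiter, formatted);
}

Console.WriteLine(FormatRow(new[] { "1", "Smith, J.", "WGS84" }));
// prints: 1,"Smith, J.",WGS84
```

Passing `"\t"` as the delimiter gives the tab-separated layout that occurrence.txt uses in this notebook.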


In [None]:
using System.Text; // Encoding lives here; harmless if the kernel already imports it

var fileMetaData = CoreFileMetaDataBuilder.File("occurrence.txt")
    .FieldsEnclosedBy("\"")
    .FieldsTerminatedBy("\\t")
    .LinesTerminatedBy("\\n")
    .IgnoreHeaderLines(1)
    .Encoding(Encoding.UTF8)
    .Index(0)
    .RowType(RowTypes.Occurrence)
    .AddFields(fieldMetaDataBuilder);

var multiMediaFileMetaData = ExtensionFileMetaDataBuilder.File("multimedia.txt")
    .CoreIndex(0)
    .RowType(RowTypes.Identification)
    .AddFields(multiMediaMetaDataBuilder);

For this example the data is read from the observations.json file, but it could equally come from a database connection or another data source.

In [None]:
using System.Text.Json;
using System.IO;
var json = "./data/observations.json";
var doc = JsonDocument.Parse(File.ReadAllText(json));
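
The row builders later in this notebook rely on how `JsonElement` converts values to strings. The sketch below uses a hypothetical two-record sample (the real file has many more properties) to show that `ToString()` yields an empty string for JSON `null`, and that `TryGetProperty` can be used where a property may be absent. `DescribeRecords` is an illustrative helper, not part of the notebook's pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative helper: summarizes each record in a JSON array.
static List<string> DescribeRecords(string json)
{
    var lines = new List<string>();
    using (var parsed = JsonDocument.Parse(json))
    {
        foreach (var node in parsed.RootElement.EnumerateArray())
        {
            // ToString() returns the raw text for numbers, the string value
            // for strings, and an empty string for JSON null.
            var id = node.GetProperty("id").ToString();
            var image = node.GetProperty("image_url").ToString();

            // GetProperty throws KeyNotFoundException for a missing property;
            // TryGetProperty reports its absence instead.
            var hasUrl = node.TryGetProperty("url", out _);

            lines.Add($"{id}: image='{image}', has url={hasUrl}");
        }
    }
    return lines;
}

// Hypothetical sample data for illustration only.
var sample = @"[
  { ""id"": 1, ""image_url"": ""https://example.org/a.jpg"" },
  { ""id"": 2, ""image_url"": null }
]";

foreach (var line in DescribeRecords(sample))
{
    Console.WriteLine(line);
}
// prints:
// 1: image='https://example.org/a.jpg', has url=False
// 2: image='', has url=False
```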

### Write Data Files

The next step is to write the data files that will be included in the archive.  In this example we create a BuilderContext so that the completed files are written to a known directory and can be inspected afterwards.  If no BuilderContext is specified, a temp directory is used.

To build the rows for a file, create a delegate that accepts a RowBuilder and returns the built rows.  Fields must be added in the same order in which they were defined in the field metadata.

In [None]:
//Add a builder context so files are written under a subdirectory here
var context = new BuilderContext("./MyObservations", false);

var fileBuilder = FileBuilder.MetaData(fileMetaData)
    .Context(context)
    .BuildRows(rowBuilder => BuildCoreRows(rowBuilder));

IEnumerable<string> BuildCoreRows(RowBuilder rowBuilder)
{
    foreach(var node in doc.RootElement.EnumerateArray())
    {
        yield return rowBuilder.AddField(node.GetProperty("id"))
            .AddField(node.GetProperty("time_observed_at"))
            .AddField(node.GetProperty("user_name"))
            .AddField(node.GetProperty("latitude"))
            .AddField(node.GetProperty("longitude"))
            .AddField(node.GetProperty("license"))
            .AddField(node.GetProperty("taxon_kingdom_name"))
            .AddField(node.GetProperty("taxon_phylum_name"))
            .AddField(node.GetProperty("taxon_class_name"))
            .AddField(node.GetProperty("taxon_order_name"))
            .AddField(node.GetProperty("taxon_genus_name"))
            .AddField(node.GetProperty("taxon_species_name"))
            .AddField(node.GetProperty("scientific_name"))
            .AddField(node.GetProperty("catalogNumber"))
            .Build();
    }
}

The multimedia.txt file is built the same way.  Each row links an observation to its image or, when no image is available, to its sound recording.

In [None]:
var multiMediaFileBuilder = FileBuilder.MetaData(multiMediaFileMetaData)
    .Context(context)
    .BuildRows(rowBuilder => BuildMultiMediaRows(rowBuilder));

public static string NullIfEmpty(this string s)
{
    return string.IsNullOrEmpty(s) ? null : s;
}

IEnumerable<string> BuildMultiMediaRows(RowBuilder rowBuilder)
{
    var mediaRows = doc.RootElement
        .EnumerateArray()
        .Select(n => new {
            id = n.GetProperty("id").ToString(),
            image = n.GetProperty("image_url").ToString(),
            sound = n.GetProperty("sound_url").ToString(),
            url = n.GetProperty("url").ToString()
        });

    foreach(var row in mediaRows)
    {
        var mediaUrl = row.image.NullIfEmpty() ?? row.sound;
        var type = row.image == "" ? "SoundRecording" : "StillImage";
        yield return rowBuilder.AddField(row.id)
            .AddField(type)
            .AddField(mediaUrl)
            .AddField(row.url)
            .Build();
    }
}
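
The media selection in the cell above can be isolated into a small function for clarity. This is a sketch of the same rule (prefer the image, fall back to the sound recording); `SelectMedia` is a hypothetical name and the URL is a placeholder.

```csharp
using System;

// Hypothetical helper mirroring the selection rule used for multimedia rows:
// an empty image URL means the record is treated as a sound recording.
static (string Type, string Url) SelectMedia(string imageUrl, string soundUrl)
{
    return string.IsNullOrEmpty(imageUrl)
        ? ("SoundRecording", soundUrl)
        : ("StillImage", imageUrl);
}

// Placeholder URL for illustration only.
var (mediaType, mediaLink) = SelectMedia("", "https://example.org/call.mp3");
Console.WriteLine($"{mediaType}: {mediaLink}");
// prints: SoundRecording: https://example.org/call.mp3
```

Note that a record with neither URL would still produce a `SoundRecording` row with an empty identifier under this rule; filtering such records out before building rows may be worth considering.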

### Write Archive

Finally, we write the archive using the ArchiveWriter.  Additional files, such as a license or eml.xml, may be added here using the `AddExtraFile` method.

In [None]:
using DwC_A.Writers;

ArchiveWriter.CoreFile(fileBuilder, fileMetaData)
    .AddExtensionFile(multiMediaFileBuilder, multiMediaFileMetaData)
    .AddExtraFile("./data/notes.txt")
    .Context(context)
    .Build("MyObservations.zip");

The above steps should have created an archive named MyObservations.zip.  To check our work, we can open the archive and view the data and metadata.

In [None]:
// Open the archive and display its metadata, core file metadata and core rows
var archive = new ArchiveReader("./MyObservations.zip");
archive.Display();
archive.CoreFile.Display();
archive.CoreFile.DataRows.Display();

// Look up the multimedia extension once, then display its metadata and rows
var multimedia = archive.Extensions.GetFileReaderByFileName("multimedia.txt");
multimedia.Display();
multimedia.DataRows.Display();
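
Independently of the library, the contents of the zip file can also be listed with `System.IO.Compression`. The sketch below builds a tiny in-memory zip so that it runs without the real archive; `ListEntries` is a hypothetical helper and the entry names are examples only.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: returns the entry names found in a zip stream.
static string[] ListEntries(Stream zipStream)
{
    using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
    {
        return zip.Entries.Select(e => e.FullName).ToArray();
    }
}

// Build a small in-memory zip with example entries so the sketch is self-contained.
var buffer = new MemoryStream();
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
{
    zip.CreateEntry("meta.xml");
    zip.CreateEntry("occurrence.txt");
}
buffer.Position = 0;

foreach (var name in ListEntries(buffer))
{
    Console.WriteLine(name);
}
// prints:
// meta.xml
// occurrence.txt
```

To inspect the archive written by this notebook, pass `File.OpenRead("./MyObservations.zip")` to `ListEntries` instead of the in-memory stream.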
