# Using DwC-A_dotnet.Interactive

This notebook describes how to use DwC-A_dotnet and DwC-A_dotnet.Interactive to work with Darwin Core Archive files.

Information on the .NET libraries used here may be found at the following links:

|Library|Link|
|---|---|
|DwC-A_dotnet|https://github.com/pjoiner/DwC-A_dotnet|
|DwC-A_dotnet.Interactive|https://github.com/pjoiner/DwC-A_dotnet.Interactive|

Information on Darwin Core Archives may be found [here](https://dwc.tdwg.org/).

## Installation

Use the `#r` magic command to install the libraries from NuGet.

In [1]:
#r "nuget:DwC-A_dotnet,0.6.0"
#r "nuget:DwC-A_dotnet.Interactive,0.1.10-Pre"

Loading extensions from `DwC-A_dotnet.Interactive.dll`

## Open An Archive
Open an archive by passing its path to the `ArchiveReader` constructor. It is best to unzip the archive to a directory first: if you pass the zip file instead, `ArchiveReader` has to unzip it into a temporary working directory, which adds overhead. If you do use the zip file, remember to remove that temporary working directory at the end of your session by calling `archive.Dispose();`

The test data we are using comes from the ["Insects from light trap (1992–2009), rooftop Zoological Museum, Copenhagen"](https://www.gbif.org/dataset/f506be53-9221-4b44-a41d-5aa0905ec216) dataset available for download from [gbif.org](https://www.gbif.org/).

In [2]:
using DwC_A;
using System.IO.Compression;
using System.IO;

// Unzip the archive into a fresh directory, removing any copy left over from a previous run
var outputPath = "./data/dwca-rooftop-v1.4";
if(Directory.Exists(outputPath)) 
    Directory.Delete(outputPath, true);
ZipFile.ExtractToDirectory("./data/dwca-rooftop-v1.4.zip", outputPath);

// Open the unzipped archive directory
var archive = new ArchiveReader(outputPath);

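If you would rather skip the manual unzip step, `ArchiveReader` can be given the zip file itself, as described above. The sketch below assumes the same data path as the previous cell and wraps the work in `try`/`finally` so that `Dispose()` runs, and the temporary working directory is removed, even if something in between throws. The names `zipArchive` and `coreRowCount` are illustrative only.

```csharp
using DwC_A;
using System;
using System.Linq;

// Open the zip file directly; ArchiveReader unzips it to a temporary working directory
var zipArchive = new ArchiveReader("./data/dwca-rooftop-v1.4.zip");
int coreRowCount;
try
{
    // Use the archive the same way as the unzipped directory
    coreRowCount = zipArchive.CoreFile.DataRows.Count();
    Console.WriteLine($"Core file rows: {coreRowCount}");
}
finally
{
    // Remove the temporary working directory created for the unzipped files
    zipArchive.Dispose();
}
```
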
## Archive MetaData
The interactive extensions library (`DwC-A_dotnet.Interactive`) registers kernel extensions that format archive metadata for display. To show an object, pass it to the `display()` command or enter it as the last line of a cell without a trailing semicolon. For example, to view the metadata for an archive, enter `<archiveName>.MetaData` as shown below. The same works for an `IFileReader` instance, which lists the term metadata for its file.

In [3]:
archive.MetaData

|File Type|File Name|Row Type|
|---|---|---|
|CoreFile|event.txt|http://rs.tdwg.org/dwc/terms/Event|
|Extension:|occurrence.txt|http://rs.tdwg.org/dwc/terms/Occurrence|
|Metadata:|eml.xml||


In [4]:
archive.CoreFile

|Index|Name|Term|Vocabulary|Default|Delimiter|
|---|---|---|---|---|---|
|0|id|id|`<null>`|`<null>`|`<null>`|
|1|type|http://purl.org/dc/terms/type|`<null>`|`<null>`|`<null>`|
|2|license|http://purl.org/dc/terms/license|`<null>`|`<null>`|`<null>`|
|3|rightsHolder|http://purl.org/dc/terms/rightsHolder|`<null>`|`<null>`|`<null>`|
|4|ownerInstitutionCode|http://rs.tdwg.org/dwc/terms/ownerInstitutionCode|`<null>`|`<null>`|`<null>`|
|5|eventID|http://rs.tdwg.org/dwc/terms/eventID|`<null>`|`<null>`|`<null>`|
|6|samplingProtocol|http://rs.tdwg.org/dwc/terms/samplingProtocol|`<null>`|`<null>`|`<null>`|
|7|sampleSizeValue|http://rs.tdwg.org/dwc/terms/sampleSizeValue|`<null>`|`<null>`|`<null>`|
|8|sampleSizeUnit|http://rs.tdwg.org/dwc/terms/sampleSizeUnit|`<null>`|`<null>`|`<null>`|
|9|samplingEffort|http://rs.tdwg.org/dwc/terms/samplingEffort|`<null>`|`<null>`|`<null>`|


In [5]:
archive.Extensions.GetFileReaderByFileName("occurrence.txt")

|Index|Name|Term|Vocabulary|Default|Delimiter|
|---|---|---|---|---|---|
|0|id|id|`<null>`|`<null>`|`<null>`|
|1|type|http://purl.org/dc/terms/type|`<null>`|`<null>`|`<null>`|
|2|license|http://purl.org/dc/terms/license|`<null>`|`<null>`|`<null>`|
|3|rightsHolder|http://purl.org/dc/terms/rightsHolder|`<null>`|`<null>`|`<null>`|
|4|institutionCode|http://rs.tdwg.org/dwc/terms/institutionCode|`<null>`|`<null>`|`<null>`|
|5|ownerInstitutionCode|http://rs.tdwg.org/dwc/terms/ownerInstitutionCode|`<null>`|`<null>`|`<null>`|
|6|basisOfRecord|http://rs.tdwg.org/dwc/terms/basisOfRecord|`<null>`|`<null>`|`<null>`|
|7|occurrenceID|http://rs.tdwg.org/dwc/terms/occurrenceID|`<null>`|`<null>`|`<null>`|
|8|recordedBy|http://rs.tdwg.org/dwc/terms/recordedBy|`<null>`|`<null>`|`<null>`|
|9|individualCount|http://rs.tdwg.org/dwc/terms/individualCount|`<null>`|`<null>`|`<null>`|

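The same formatters are used when an object is passed to `display()` explicitly, which is handy for rendering several objects from one cell. A minimal sketch, reusing the `archive` variable opened earlier:

```csharp
using System;

// Each display() call renders its argument with the formatters registered by DwC-A_dotnet.Interactive
display(archive.MetaData);
display(archive.CoreFile);
display(archive.Extensions.GetFileReaderByFileName("occurrence.txt"));
```
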

## Displaying Data

Data from a file can be displayed using the `DataRows` property of an `IFileReader`.  For example, the first 10 rows of the Core event file from the sample archive can be displayed as follows.

In [6]:
archive.CoreFile.DataRows.Take(10)

|id|type|license|rightsHolder|ownerInstitutionCode|eventID|samplingProtocol|sampleSizeValue|sampleSizeUnit|samplingEffort|eventDate|year|month|day|eventRemarks|country|countryCode|locality|decimalLatitude|decimalLongitude|geodeticDatum|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|urn:zmuc:2006-07-14/2006-07-20|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:2006-07-14/2006-07-20|modified Robinson light trap|6|day|6 trap day(s)|2006-07-14/2006-07-20|2006|7|14|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1993-05-24/1993-05-31|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1993-05-24/1993-05-31|modified Robinson light trap|7|day|7 trap day(s)|1993-05-24/1993-05-31|1993|5|24|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1997-07-21/1997-07-21|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1997-07-21/1997-07-21|modified Robinson light trap|0|day|0 trap day(s)|1997-07-21/1997-07-21|1997|7|21|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1998-05-27/1998-06-01|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1998-05-27/1998-06-01|modified Robinson light trap|5|day|5 trap day(s)|1998-05-27/1998-06-01|1998|5|27|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1998-06-19/1998-06-21|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1998-06-19/1998-06-21|modified Robinson light trap|2|day|2 trap day(s)|1998-06-19/1998-06-21|1998|6|19|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1999-06-14/1999-06-16|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1999-06-14/1999-06-16|modified Robinson light trap|2|day|2 trap day(s)|1999-06-14/1999-06-16|1999|6|14|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1999-06-17/1999-06-20|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1999-06-17/1999-06-20|modified Robinson light trap|3|day|3 trap day(s)|1999-06-17/1999-06-20|1999|6|17|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:1999-06-21/1999-06-27|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:1999-06-21/1999-06-27|modified Robinson light trap|6|day|6 trap day(s)|1999-06-21/1999-06-27|1999|6|21|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:2001-06-20/2001-06-24|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:2001-06-20/2001-06-24|modified Robinson light trap|4|day|4 trap day(s)|2001-06-20/2001-06-24|2001|6|20|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|
|urn:zmuc:2002-06-28/2002-07-03|Event|http://creativecommons.org/licenses/by/4.0/legalcode|Zoological Museum, Natural History Museum of Denmark (ZMUC)|ZMUC|urn:zmuc:2002-06-28/2002-07-03|modified Robinson light trap|5|day|5 trap day(s)|2002-06-28/2002-07-03|2002|6|28|The material sample was collected, and either preserved or destructively processed.|Denmark|DK|Light trap on rooftop of Zoological Museum, Natural History Museum of Denmark (ZMUC)|55.702512|12.558956|WGS84|

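Because `DataRows` is an ordinary enumerable, `Skip` and `Take` can be combined to page through a file. A minimal sketch that shows rows 11 to 20 of the core file; `pageSize` and `pageNumber` are illustrative names, and the `archive` variable comes from the earlier cell:

```csharp
using System;
using System.Linq;

// Zero-based page index: page 1 holds rows 11 to 20
var pageSize = 10;
var pageNumber = 1;
var page = archive.CoreFile.DataRows
    .Skip(pageNumber * pageSize)
    .Take(pageSize);

display(page);
```
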

## Accessing Individual Fields

The `DataRows` property of an `IFileReader` can be enumerated with a `foreach` loop or with LINQ queries. Individual fields in each row can be accessed either by column index or by the name of the term associated with the column.

Use the `Terms` class in the `DwC_A.Terms` namespace as a shortcut instead of typing the fully qualified term name.

In [None]:
using DwC_A.Terms;

foreach(var row in archive.CoreFile.DataRows.Take(1))
{
    Console.Write($"type: {row[1]}\t"); //Use the index value to get the type column
    Console.Write($"EventID: {row["http://rs.tdwg.org/dwc/terms/eventID"]}\t"); //Use the fully qualified name of the term
    Console.WriteLine($"Event Date: {row[Terms.eventDate]}"); //Use the Terms class
}

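Fields can also be pulled out with a LINQ `Select`. This sketch projects three columns of each event row into an anonymous type, mixing `Terms` members with a fully qualified term name taken from the core file metadata shown earlier; `firstEvents` is an illustrative name.

```csharp
using DwC_A.Terms;
using System;
using System.Linq;

var firstEvents = archive.CoreFile.DataRows
    .Select(row => new
    {
        EventID = row[Terms.eventID],     // look up by Terms member
        EventDate = row[Terms.eventDate],
        // look up by fully qualified term name
        SamplingEffort = row["http://rs.tdwg.org/dwc/terms/samplingEffort"]
    })
    .Take(5);

display(firstEvents);
```
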
## The Terms Command

Use the `#!terms` magic command to list the available terms and a brief explanation of their use.

In [7]:
#!terms

|Name|Term|Description|
|---|---|---|
|acceptedNameUsage|http://rs.tdwg.org/dwc/terms/acceptedNameUsage|The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.|
|acceptedNameUsageID|http://rs.tdwg.org/dwc/terms/acceptedNameUsageID|An identifier for the name usage (documented meaning of the name according to a source) of the currently valid (zoological) or accepted (botanical) taxon.|
|accessRights|http://purl.org/dc/terms/accessRights|Information about who can access the resource or an indication of its security status.|
|associatedMedia|http://rs.tdwg.org/dwc/terms/associatedMedia|A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of media associated with the Occurrence.|
|associatedOccurrences|http://rs.tdwg.org/dwc/terms/associatedOccurrences|A list (concatenated and separated) of identifiers of other Occurrence records and their associations to this Occurrence.|
|associatedOrganisms|http://rs.tdwg.org/dwc/terms/associatedOrganisms|A list (concatenated and separated) of identifiers of other Organisms and the associations of this Organism to each of them.|
|associatedReferences|http://rs.tdwg.org/dwc/terms/associatedReferences|A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the Occurrence.|
|associatedSequences|http://rs.tdwg.org/dwc/terms/associatedSequences|A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of genetic sequence information associated with the Occurrence.|
|associatedTaxa|http://rs.tdwg.org/dwc/terms/associatedTaxa|A list (concatenated and separated) of identifiers or names of taxa and the associations of this Occurrence to each of them.|
|basisOfRecord|http://rs.tdwg.org/dwc/terms/basisOfRecord|The specific nature of the data record.|

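The Term column in this table lines up with the members of the `Terms` class used earlier. Assuming those members are plain strings holding the fully qualified term names (the string indexer in the field-access cell suggests they are), the two lookups below read the same field of the first event row; `firstEvent` and `sameValue` are illustrative names.

```csharp
using DwC_A.Terms;
using System;
using System.Linq;

var firstEvent = archive.CoreFile.DataRows.First();

// Print the fully qualified name behind the Terms member (assumed to be a string)
Console.WriteLine(Terms.eventDate);

// Both indexer calls should return the same eventDate value
var sameValue = firstEvent[Terms.eventDate] == firstEvent["http://rs.tdwg.org/dwc/terms/eventDate"];
Console.WriteLine(sameValue);
```
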

## Query Data Using LINQ

The following cell uses LINQ to total the individual counts for each genus within a single sampling event. Change the number in the `.Skip(5)` line to see the totals for other events.

In [8]:
using DwC_A.Terms;

//Retrieve the eventID from one row of the event (core) data file
var eventID = archive.CoreFile.DataRows
    .Skip(5)  //Change this number and run the cell again to see the data for a different eventID
    .First()[Terms.eventID];

//Get an IFileReader for the occurrence data file
var occurrences = archive.Extensions.GetFileReaderByFileName("occurrence.txt");

//Keep the occurrences recorded during that event, group them by genus,
//and add up the individualCount values within each group
var data = occurrences.DataRows
    .Where(n => n[Terms.eventID] == eventID)
    .GroupBy(n => n[Terms.genus])
    .Select(g => new{
        Genus = g.Key,
        Count = g.Sum(c => int.Parse(c[Terms.individualCount])) 
    }); 

data

|index|Genus|Count|
|---|---|---|
|0|Abrostola|1|
|1|Acronicta|1|
|2|Agrotis|28|
|3|Apamea|3|
|4|Aphomia|1|
|5|Aplocera|1|
|6|Argyresthia|7|
|7|Biston|2|
|8|Blastodacna|1|
|9|Borkhausenia|2|

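`int.Parse` throws if an occurrence row has an empty or non-numeric `individualCount`. A more defensive variant of the same query, sketched under the assumption that such rows should simply count as zero, uses `int.TryParse` and also sorts the genera by total. It reuses `eventID` from the cell above; `occurrenceFile` and `genusTotals` are illustrative names.

```csharp
using DwC_A.Terms;
using System;
using System.Linq;

var occurrenceFile = archive.Extensions.GetFileReaderByFileName("occurrence.txt");

var genusTotals = occurrenceFile.DataRows
    .Where(n => n[Terms.eventID] == eventID)
    .GroupBy(n => n[Terms.genus])
    .Select(g => new
    {
        Genus = g.Key,
        // A missing or non-numeric individualCount counts as 0 instead of throwing
        Count = g.Sum(c => int.TryParse(c[Terms.individualCount], out var count) ? count : 0)
    })
    // Most abundant genera first
    .OrderByDescending(x => x.Count)
    .ToList();

display(genusTotals);
```

Sorting after the `Select` keeps the grouping step identical to the original query, so the two cells can be compared genus by genus.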