# NIDM: A Summary in many parts

## Part 1: Introduction to NI-DM

[Latest version][notebook] | [Latest slideshow][slideshow]

[notebook]: http://nbviewer.ipython.org/urls/raw.github.com/ni-/notebooks/master/NIDMIntro.ipynb
[slideshow]: http://slideviewer.herokuapp.com/url/raw.github.com/ni-/notebooks/master/NIDMIntro.ipynb

(If viewing as slides, hit Esc to get an overview)

## Summary

This document provides an overview of the neuroimaging data model (NI-DM). It describes what led to the development of NI-DM, how it was derived from the W3C Provenance Data Model, and how it relates to other efforts to create metadata and other standards in brain imaging. In particular, we note NI-DM's utility in the context of other Semantic Web technologies. This document is part of a collection of notebooks that highlight the current work related to NI-DM and provide a tutorial introduction to creating and using NI-DM applications.

## Outline

- Terms used in this document
- What are [PROV-DM](http://www.w3.org/TR/prov-dm/) and [NI-DM](http://nidm.nidash.org)?
- Why NI-DM?
- What are object models in NI-DM?
- NI-DM, XCEDE, LONI, and related efforts
    - Mapping [XCEDE](http://www.xcede.org/) "experiment/research" primitives to NI-DM
    - LONI data and pipeline provenance
    - Dataset schema
- Should there be a minimal metadata standard?
- Recommended reading

## Terms used in this document

### data model

A data model is an abstract conceptual formulation of information that explicitly determines the structure of data and allows software and people to communicate and interpret data precisely. [source](http://en.wikipedia.org/wiki/Data_model)

### provenance

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

## What is PROV-DM?

[PROV-DM](http://www.w3.org/TR/prov-dm/) is the conceptual data model that forms a basis for the W3C provenance (PROV) [family of specifications](http://www.w3.org/2011/prov/wiki/WorkingDrafts).

PROV-DM is organized in six components, respectively dealing with: (1) [entities and activities](http://www.w3.org/TR/prov-dm/#section-entity-activity), and the time at which they were created, used, or ended; (2) [derivations](http://www.w3.org/TR/prov-dm/#section-derivation) of entities from entities; (3) [agents](http://www.w3.org/TR/prov-dm/#section-agents-attribution-association-delegation) bearing responsibility for entities that were generated and activities that happened; (4) a notion of [bundle](http://www.w3.org/TR/prov-dm/#section-provenance-of-provnance), a mechanism to support provenance of provenance; (5) [properties to link entities](http://www.w3.org/TR/prov-dm/#section-prov-extended-mechanisms) that refer to the same thing; and, (6) [collections](http://www.w3.org/TR/prov-dm/#section-collections) forming a logical structure for its members [Source: PROV-DM](http://www.w3.org/TR/prov-dm/).
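To make the first three components concrete, here is a minimal sketch in plain Python. The classes `Entity`, `Activity`, `Agent` and `ProvRecord` are hypothetical stand-ins written for this illustration, not the PROV API, and the `ex:` identifiers (a FreeSurfer-style `recon-all` step producing `aseg.mgz` from a T1 image) and timestamps are made up. Times are stored only on the activity here; PROV-DM also allows timestamps on individual usage and generation events.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple, Union

# Hypothetical stand-ins for three PROV-DM core types (not the PROV API).
@dataclass(frozen=True)
class Entity:
    id: str

@dataclass(frozen=True)
class Activity:
    id: str
    started: Optional[datetime] = None  # component 1: when the activity began
    ended: Optional[datetime] = None    # component 1: when the activity ended

@dataclass(frozen=True)
class Agent:
    id: str

Node = Union[Entity, Activity, Agent]

@dataclass
class ProvRecord:
    """A flat list of (relation, subject, object) statements."""
    statements: List[Tuple[str, Node, Node]] = field(default_factory=list)

    def add(self, relation: str, subject: Node, obj: Node) -> None:
        self.statements.append((relation, subject, obj))

# Hypothetical identifiers for a FreeSurfer-style segmentation step.
t1 = Entity("ex:T1.nii")
seg = Entity("ex:aseg.mgz")
recon = Activity("ex:recon-all", datetime(2013, 3, 1, 9, 0), datetime(2013, 3, 1, 17, 0))
analyst = Agent("ex:analyst")

rec = ProvRecord()
rec.add("used", recon, t1)                    # component 1: activity consumed an entity
rec.add("wasGeneratedBy", seg, recon)         # component 1: activity produced an entity
rec.add("wasDerivedFrom", seg, t1)            # component 2: entity-to-entity derivation
rec.add("wasAssociatedWith", recon, analyst)  # component 3: agent responsible for activity
rec.add("wasAttributedTo", seg, analyst)      # component 3: agent responsible for entity

for relation, subject, obj in rec.statements:
    print(f"{relation}({subject.id}, {obj.id})")
```

The argument order in each statement follows PROV-DM: `used(activity, entity)`, `wasGeneratedBy(entity, activity)`, `wasDerivedFrom(generated, used)`.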

## What is NI-DM?

NI-DM is formulated as a domain-specific extension of PROV-DM to specifically and simultaneously address data and provenance in neuroimaging. Currently, NI-DM maps identically onto PROV-DM, and domain-specific extensions are captured as object models on top of PROV-DM.

PROV-DM (and therefore NI-DM) provides a generic basis that captures relationships associated with the creation and modification of entities by activities and agents.
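Because each relation links a thing to what it depended on or who was responsible for it, a lineage question such as "what went into this file?" becomes a graph traversal. The sketch below is a breadth-first walk over plain `(relation, subject, object)` tuples; it is an illustration, not a feature of any NI-DM library, and the identifiers (`mri_segstats` producing `aseg.stats` from `aseg.mgz`) are hypothetical.

```python
from collections import deque

# Hypothetical (relation, subject, object) statements in PROV-DM argument order.
statements = [
    ("used", "ex:recon-all", "ex:T1.nii"),
    ("wasGeneratedBy", "ex:aseg.mgz", "ex:recon-all"),
    ("wasAssociatedWith", "ex:recon-all", "ex:analyst"),
    ("used", "ex:mri_segstats", "ex:aseg.mgz"),
    ("wasGeneratedBy", "ex:aseg.stats", "ex:mri_segstats"),
]

# Relations whose object is something the subject depends on or is attributed to.
UPSTREAM = {"used", "wasGeneratedBy", "wasDerivedFrom",
            "wasAssociatedWith", "wasAttributedTo"}

def upstream(node, statements):
    """Return every entity, activity, and agent that `node` transitively depends on."""
    deps = {}
    for relation, subject, obj in statements:
        if relation in UPSTREAM:
            deps.setdefault(subject, []).append(obj)
    seen, queue = set(), deque([node])
    while queue:  # breadth-first walk; `seen` guards against cycles
        for dep in deps.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(upstream("ex:aseg.stats", statements)))
# ['ex:T1.nii', 'ex:analyst', 'ex:aseg.mgz', 'ex:mri_segstats', 'ex:recon-all']
```

Keeping data (entities), processing (activities) and responsibility (agents) in one graph is what lets a single traversal answer both "which inputs?" and "who ran it?".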

### Basic provenance data model

<img src="http://www.w3.org/TR/prov-o/diagrams/starting-points.svg" />

### Extended provenance model

<img src="http://www.w3.org/TR/2013/PR-prov-o-20130312/diagrams/expanded.svg" />


## Why NI-DM?

NI-DM builds on work by the neuroimaging and neuroscience communities on standards ([NIFTI1][nifti], [BIRN][birn] and [LONI][loni] data and provenance, [MINC2][minc2] metadata, [Neurolex][nif], [Neurolog][neurolog]), databases ([HID][hid], [XNAT][xnat]), and [Semantic Web and Linked Data][semweb] technologies. NI-DM is not a single isolated technology or effort.

[nifti]: http://nifti.nimh.nih.gov/nifti-1
[birn]: http://www.birncommunity.org/working-with-birn/working-groups/
[loni]: http://www.loni.ucla.edu
[minc2]: http://en.wikibooks.org/wiki/MINC/Reference/MINC2.0_File_Format_Reference
[nif]: http://neurolex.org/wiki/Main_Page
[neurolog]: http://neurolog.i3s.unice.fr/neurolog
[hid]: http://www.birncommunity.org/tools-catalog/human-imaging-database-hid/
[xnat]: http://xnat.org
[semweb]: http://www.cambridgesemantics.com/semantic-university

At this point, NI-DM is a community effort led by the [INCF standards for data sharing task force][taskforce].

[taskforce]: http://www.incf.org/programs/datasharing/neuroimaging-task-force

NI-DM features:

- Technology-agnostic representation of information (can be translated to RDF, XML, or JSON)
- Machine-accessible structured representation of data
- Federated queries using [SPARQL][sparql] when represented as RDF
- Trust via provenance
- Provides a vocabulary for encoding brain imaging data

[sparql]: http://www.w3.org/TR/rdf-sparql-query/
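To illustrate the JSON case, the sketch below lays out a single generation record with only the standard library. The layout loosely follows the PROV-JSON member submission (one top-level key per record type, blank-node style identifiers for relations), but this is an assumption-laden sketch, not output from a PROV serializer, and it has not been validated against that format. The function `to_prov_json_like`, the `ex:` prefix and its URI are hypothetical.

```python
import json

def to_prov_json_like(prefixes, entities, activities, generations):
    """Build a dict laid out like PROV-JSON: one top-level key per record type."""
    doc = {
        "prefix": dict(prefixes),
        "entity": {eid: dict(attrs) for eid, attrs in entities.items()},
        "activity": {aid: dict(attrs) for aid, attrs in activities.items()},
        "wasGeneratedBy": {},
    }
    # Relations have no natural name, so each gets a blank-node style identifier.
    for i, (entity, activity) in enumerate(generations, start=1):
        doc["wasGeneratedBy"][f"_:wGB{i}"] = {
            "prov:entity": entity,
            "prov:activity": activity,
        }
    return doc

doc = to_prov_json_like(
    prefixes={"ex": "http://example.org/"},
    entities={"ex:aseg.stats": {"prov:label": "FreeSurfer segmentation statistics"}},
    activities={"ex:mri_segstats": {}},
    generations=[("ex:aseg.stats", "ex:mri_segstats")],
)
text = json.dumps(doc, indent=2, sort_keys=True)
print(text)
# The structure round-trips through JSON unchanged.
assert json.loads(text) == doc
```

The same records could equally be emitted as RDF triples; the point is that the model, not the file format, carries the meaning.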

### What are the advantages of using NI-DM?

- Provenance is not an afterthought
- Rooted in formal terminology
- Derived from PROV-DM, a stable W3C recommendation
- Captures data and metadata (about entities, **activities** and agents) within the same context
- Simplifies app development
- Can be represented as RDF allowing query federation
- Many different types of databases can be used to store the data
- Doesn't require specialized tools when publishing on the web

### What are the disadvantages of using NI-DM?

- A new conceptual framework for brain imagers and database developers
- Requires extensive terminology
- Current databases are not built for provenance
- Current tools are in their infancy
- The transition will require resources, commitment and time

### Capturing entity, activity, agent relations via NI-DM

<img src="https://raw.github.com/satra/nidm-notebooks/77315de422b78f60dcf005f47307361b45ec63c0/figures/nidmexample.png" />

## What is an object model?

An object model is a collection of objects or classes "through which a program can examine and manipulate some specific parts of its world."

## What are object models in NI-DM?

In the context of NI-DM, object models capture specific relationships between [entities][entity] via [collections][collection] that reflect organized information derived from:

- binary imaging files (e.g., DICOM, NIfTI, MINC)
- directory structures (e.g., FreeSurfer, OpenFMRI)
- phenotypic data (e.g., neuropsych assessments, csv files)
- binary or text files (e.g., SPM.mat, Feat.fsf, aseg.stats)

The models are associated with appropriate provenance.

[entity]: http://www.w3.org/TR/prov-o/#Entity
[collection]: http://www.w3.org/TR/prov-o/#Collections
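As a sketch of the directory-structure case, the function below maps a list of relative paths onto nested collections: each directory becomes a `prov:Collection`, each file a `prov:Entity`, and containment becomes a `hadMember(collection, member)` edge. The function name, the `ex:subj01` root and the FreeSurfer-like file listing are hypothetical; it works on path strings only and does not touch the filesystem.

```python
from pathlib import PurePosixPath

def directory_to_collection(root, paths):
    """Map relative file paths to PROV-style types and hadMember edges.

    Returns (types, members): `types` maps each id to prov:Collection or
    prov:Entity; `members` holds (collection, member) pairs.
    """
    types = {root: "prov:Collection"}
    members = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        parent = root
        for i in range(len(parts)):
            node = root + "/" + "/".join(parts[: i + 1])
            is_file = i == len(parts) - 1
            # Intermediate directories become nested collections; leaves are entities.
            types.setdefault(node, "prov:Entity" if is_file else "prov:Collection")
            members.add((parent, node))
            parent = node
    return types, members

# Hypothetical FreeSurfer-like subject directory listing.
types, members = directory_to_collection(
    "ex:subj01",
    ["mri/aseg.mgz", "mri/T1.mgz", "stats/aseg.stats", "surf/lh.white"],
)
for collection, member in sorted(members):
    print(f"hadMember({collection}, {member})")
```

Using a set for `members` means a directory shared by several files (here `mri`) yields a single membership edge.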


## Using the PROV API to construct object models

We will demonstrate in separate notebooks how to encapsulate FreeSurfer directory structures, statistics files, and CSV files using the PROV API. Updated notebooks will be posted here.
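Until those notebooks are available, here is a minimal standard-library sketch of the CSV case. It does not use the PROV API; it simply treats the file as a collection whose members are one entity per row, with column headers turned into attribute names. The function `csv_to_entities`, the `ex:` prefix and the sample phenotype table are hypothetical.

```python
import csv
import io

def csv_to_entities(file_id, text):
    """One entity per row; the file itself is a collection of those rows."""
    entities = {file_id: {"prov:type": "prov:Collection"}}
    members = []
    for n, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        row_id = f"{file_id}#row{n}"
        # Column headers become attribute names under a hypothetical ex: prefix.
        entities[row_id] = {f"ex:{key}": value for key, value in row.items()}
        members.append((file_id, row_id))  # hadMember(collection, row entity)
    return entities, members

# Hypothetical phenotypic table.
sample = "subject_id,age,handedness\nsub01,34,R\nsub02,29,L\n"
entities, members = csv_to_entities("ex:phenotype.csv", sample)
print(entities["ex:phenotype.csv#row2"])
# {'ex:subject_id': 'sub02', 'ex:age': '29', 'ex:handedness': 'L'}
```

Values stay as strings here; a real object model would also need to say which terminology each column maps to.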

## NI-DM, XCEDE, LONI and related efforts

Many prior efforts have targeted the representation of structured information. Here we highlight a few of them.

- [XCEDE][xcede] is an XML schema for capturing rich descriptions of data organized in an experiment hierarchy
- [LONI data][lonidata] and [pipeline provenance][lonipipe] schemas capture information about data and workflow provenance
- Dataset schemas (e.g., [Dataset][dataset], [VoID][void]) are generic schemas that provide basic information about arbitrary datasets on the web

[xcede]: http://www.xcede.org/
[lonidata]: http://www.loni.ucla.edu/~pipeline/provenance.xsd
[lonipipe]: http://www.loni.ucla.edu/~pipeline/pipeline_xsd.xsd
[dataset]: http://schema.org/Dataset
[void]: http://www.w3.org/TR/void/

Since a key aspect of NI-DM is to build on existing technologies, we will extract relevant terminology and descriptors from the prior neuroimaging data representation and provenance efforts.


### Mapping XCEDE primitives to NI-DM

The XCEDE XML schema allows for storing information in the context of a flexible and extensible experiment hierarchy. 

It accommodates arbitrary configurations centered around Project, Subject, Visit, Study, Episode, and Acquisition objects, effectively defining a hierarchy of relationships, and it holds limited information about data provenance.

It is ill-suited for modeling and querying across complex derived data created by many of today's workflow systems, including queries incorporating provenance.

### Mappings

These are prototype mappings from XCEDE categories to NI-DM classes:

- xcede:Project     -> prov:Activity (prov:type = nidm:Project)
- xcede:Study       -> prov:Activity (prov:type = nidm:Study, dcterms:subpartOf = some_project)
- xcede:Visit       -> prov:Activity (prov:type = nidm:Visit, dcterms:subpartOf = some_study)
- xcede:Episode     -> prov:Activity (prov:type = nidm:Episode, dcterms:subpartOf = some_visit)
- xcede:Acquisition -> prov:Activity (prov:type = nidm:Acquisition, dcterms:subpartOf = some_episode)
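The table can be applied mechanically. The sketch below encodes it as a Python dict, follows the table's parent chain (Project, Study, Visit, Episode, Acquisition), and rejects objects whose parent has the wrong class. The property names (`prov:type`, `dcterms:subpartOf`) are copied from the table as written rather than checked against the PROV or Dublin Core vocabularies; the `ex:` identifiers and the function `map_xcede` are hypothetical, and each returned dict stands in for a `prov:Activity`.

```python
# The prototype table above, as data: XCEDE class -> (NI-DM prov:type, parent XCEDE class).
XCEDE_TO_NIDM = {
    "xcede:Project": ("nidm:Project", None),
    "xcede:Study": ("nidm:Study", "xcede:Project"),
    "xcede:Visit": ("nidm:Visit", "xcede:Study"),
    "xcede:Episode": ("nidm:Episode", "xcede:Visit"),
    "xcede:Acquisition": ("nidm:Acquisition", "xcede:Episode"),
}

def map_xcede(objects):
    """Turn (id, xcede_class, parent_id) triples into prov:Activity attribute dicts."""
    classes = {oid: cls for oid, cls, _ in objects}
    activities = {}
    for oid, cls, parent in objects:
        nidm_type, parent_cls = XCEDE_TO_NIDM[cls]
        attrs = {"prov:type": nidm_type}
        if parent_cls is not None:
            # Enforce the table's hierarchy: a Visit must sit inside a Study, and so on.
            if classes.get(parent) != parent_cls:
                raise ValueError(f"{oid}: a {cls} must be part of a {parent_cls}")
            attrs["dcterms:subpartOf"] = parent
        activities[oid] = attrs
    return activities

# Hypothetical identifiers for one acquisition and its enclosing hierarchy.
objects = [
    ("ex:proj1", "xcede:Project", None),
    ("ex:study1", "xcede:Study", "ex:proj1"),
    ("ex:visit1", "xcede:Visit", "ex:study1"),
    ("ex:episode1", "xcede:Episode", "ex:visit1"),
    ("ex:acq1", "xcede:Acquisition", "ex:episode1"),
]
for oid, attrs in map_xcede(objects).items():
    print(oid, "prov:Activity", attrs)
```

Validating the parent class at mapping time catches hierarchy errors before they become inconsistent provenance graphs.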

## Should there be a minimal metadata standard?

### Yes
- Guides developers and users to know exactly what should be encoded
- Very useful when most information is informal
- Potentially useful when information is sensitive

### No
- Most neuroimaging data are already digital (all info should be encoded)
- Sensitive information can be encrypted
- Most workflow tools are software and therefore should encode all steps

### What should be captured?

What is really required is not a minimal standard but a set of electronic data capture tools. In brain imaging, electronic data capture can perform all the necessary encoding, and software and libraries can enable users and developers to quickly encode all the necessary information. The necessary components are:

- Raw data:
    - DICOM, or NIfTI plus a DICOM header dump
    - Imaging protocols (if not encoded in DICOM)
    - Stimuli
    - Behavioral responses
    - Neuropsych or other assessments
- Analyses and software
    - Stimulus presentation software
    - Analysis software (processing, statistical analysis, figure generation)
    - Provenance
- Derived data
    - Statistical results
    - Tables
    - Figures
- Research info
    - Prior literature
    - Hypotheses
    - Related work
    - Publications
    - Funding
    - Contributions

Whether packaged as [Research objects][RO] or [provenance bundles][bundles], these components constitute the information that can and should be captured.

[RO]: http://www.wf4ever-project.org/research-object-model
[bundles]: http://www.w3.org/TR/2013/REC-prov-o-20130430/#Bundle
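In PROV-DM a bundle is a named set of provenance descriptions and is itself an entity, which is what allows provenance of provenance. The sketch below models that idea in plain Python: statements about an analysis live inside a bundle, and a separate top-level statement records who asserted the bundle. The `Bundle` class, the `who_asserted` helper, and all `ex:` identifiers (a GLM producing a z-statistic map) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (relation, subject, object)

@dataclass
class Bundle:
    """A named set of provenance statements; its id is itself an entity."""
    id: str
    statements: List[Statement] = field(default_factory=list)

# Hypothetical analysis bundle: a GLM activity used a BOLD series and produced a z-map.
analysis = Bundle("ex:bundle-analysis", [
    ("used", "ex:glm", "ex:bold.nii"),
    ("wasGeneratedBy", "ex:zstat1.nii", "ex:glm"),
])

# Statements *about* the bundle sit outside it: provenance of provenance.
top_level: List[Statement] = [
    ("wasAttributedTo", analysis.id, "ex:lab-pipeline"),
]

def who_asserted(entity_id: str, bundles: List[Bundle],
                 top_level: List[Statement]) -> Dict[str, List[str]]:
    """For each bundle that mentions entity_id, list the agents it is attributed to."""
    result = {}
    for bundle in bundles:
        if any(entity_id in (s, o) for _, s, o in bundle.statements):
            result[bundle.id] = [o for rel, s, o in top_level
                                 if rel == "wasAttributedTo" and s == bundle.id]
    return result

print(who_asserted("ex:zstat1.nii", [analysis], top_level))
# {'ex:bundle-analysis': ['ex:lab-pipeline']}
```

Keeping the attribution outside the bundle separates "what happened" from "who says so", which matters when several labs describe the same data.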

## Recommended reading

- [Intro to semantic web concepts](http://www.cambridgesemantics.com/semantic-university)
- [PROV Overview](http://www.w3.org/TR/prov-overview/)
- [Research objects](http://www.wf4ever-project.org/research-object-model)
- [KARMA](http://www.isi.edu/integration/papers/gil11-iswc.pdf)
