 # FAIR Cookbook Recipe: Serializing Assets with DATS

 This recipe uses KidsFirst as a case study for serializing asset metadata as DATS.

 **Authors**: Daniel J. B. Clarke

 **Maintainers**: Daniel J. B. Clarke

 **Version**: 1.0

 **License**: GPLv2+


 ## Motivations

 The [Data Tag Suite (DATS)](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) metadata model, described in [this paper](https://doi.org/10.1093/gigascience/giz165) and fully codified in [this repository](https://github.com/datatagsuite/schema), strives to model datasets irrespective of their domains. [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) embodies several key elements that make it extraordinarily useful for [FAIRification](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#FAIR):

 - Machine Readability: Datasets described with a consistent [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) format permit machines to resolve [FAIR](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#FAIR) metadata such as identifiers, authorship, funding, citation, license, consent, access, provenance, and ultimately topic as well.
 - [RDF](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#RDF) Interoperability: Serialized in strict [JSON-LD](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#JSON-LD), the [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) format is renderable as an [RDF](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#RDF) graph permitting interoperability with ontological vocabularies and existing dataset description formats including [schema.org](https://schema.org/) and the [Open Biological and Biomedical Ontology (OBO)](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#OBO).
 - Findability: Using these consistent formats lets existing and future services identify aspects of the dataset for indexing and searching. One such service is [Google Dataset Search](https://datasetsearch.research.google.com/), which [utilizes schema.org metadata](https://developers.google.com/search/docs/data-types/dataset).
 - [CFDE](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#CFDE) Compatibility: Tooling has been created to convert [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) to the [C2M2](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#C2M2) and for automatically evaluating the [FAIRness](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#FAIRness) of Datasets through the [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) metadata model.


 ## Ingredients

 1. Access to a manifest or [API](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#API) for serving your existing datasets.


 ## Objectives

 1. Convert your existing [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) into the [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) schema
 2. Check the validity of your [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) metadata against the schema


 ## Preparation

 We need the manifest, or access to an [API](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#API) serving the existing
 [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata). In our case study, [KidsFirst](https://commonfund.nih.gov/kidsfirst),
 the assets are browsable in the [file repository](https://portal.kidsfirstdrc.org/search/file).
 After enabling all "Columns", click the "Export TSV" button and save that file to
 `../input/file-table.tsv`.

In [1]:
# Python tool for data table processing
import pandas as pd
# Jupyter Notebook display helper
from IPython.display import display


In [2]:
df = pd.read_csv('../input/file-table.tsv', sep='\t', low_memory=False)
display(df.head())
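
 Before converting anything, it can help to confirm that the export actually contains the columns the rest of this recipe relies on. The sketch below uses only the standard library and a tiny inline sample standing in for `file-table.tsv`; the helper name `missing_columns` and the sample values are illustrative assumptions, not part of the KidsFirst export.

```python
import csv
import io

# Columns used by the first conversion step in this recipe
REQUIRED_COLUMNS = ['File ID', 'Participants ID', 'Study ID', 'Biospecimen ID', 'Latest DID']

def missing_columns(tsv_file, required=REQUIRED_COLUMNS):
  """Return the required column names that are absent from the TSV header."""
  reader = csv.reader(tsv_file, delimiter='\t')
  header = next(reader, [])  # an empty file yields an empty header
  return [column for column in required if column not in header]

# Hypothetical two-column sample standing in for the real export
sample = io.StringIO('File ID\tStudy ID\nGF_EXAMPLE\tSD_EXAMPLE\n')
print(missing_columns(sample))
```

 An empty list means every expected column is present; anything else usually means not all "Columns" were enabled before exporting.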



 ## [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) Conversion
 The full [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) schema is available [here](https://github.com/datatagsuite/schema).
  It includes a [JSON Schema](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#JSON-Schema) definition as well as a visualization of how things fit together.
 ![DATS Schema Definition](https://raw.githubusercontent.com/datatagsuite/docs/master/source/_static/DATS-revised-overview.jpg)
  The root schema for datasets is: <https://github.com/datatagsuite/schema/blob/master/dataset_schema.json>.

 [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) uses a strict [JSON-LD](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#JSON-LD) serialization.
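
 As a rough illustration of what this means in practice: a JSON-LD document is ordinary JSON in which reserved keys such as `@context` and `@type` carry the semantic information. The sketch below builds a minimal object and serializes it with the standard library. The title is a placeholder, the context URL is the one used later in this recipe, and the object is not claimed to be a complete or valid DATS record.

```python
import json

# Minimal JSON-LD-shaped object; only the reserved keys matter for this illustration
minimal = {
  '@context': 'http://w3id.org/dats/context/sdo/dataset_sdo_context.jsonld',
  '@type': 'Dataset',
  'title': 'Placeholder title',
}

# JSON-LD is serialized like any other JSON document
serialized = json.dumps(minimal, indent=2)
print(serialized)
```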

 There are several ways to get a sense of what the [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) model entails. In some cases starting
  with an example is easier, but everything is easier with autocompletion and type hints. Several
  code editors support [JSON Schema](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#JSON-Schema) for autocompletion (see [this](http://schemastore.org/json/)).

 With [Visual Studio Code](https://code.visualstudio.com/), you can set this up by linking to the
  schema with a `$schema` field.

In [3]:
%%sh
# Create a file DATS-Validation.json with the following contents
#  this is a simple json-schema which references the public DATS schema validator
cat > DATS-Validation.json << EOF
{
  "\$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "properties": {
    "dats": {
      "\$ref": "https://raw.githubusercontent.com/datatagsuite/schema/master/dataset_schema.json"
    }
  }
}
EOF

# Create a file to edit; a supporting editor will validate it against DATS-Validation.json
cat > my-dats.json << EOF
{
  "\$schema": "./DATS-Validation.json",
  "dats": {
    "title": "My First Json-Schema Validated DATS Object"
  }
}
EOF


 Modifying the created `my-dats.json`, you should be able to explore the fields
  through autocompletion with an editor that supports it.

 ![Validation hints of missing properties](../images/recipes/assets-to-dats/ss1.png)
 ![Auto completion for property fields](../images/recipes/assets-to-dats/ss2.png)

 Another (or complementary) way is to learn by example. Several *other* DCCs' assets were
  processed and converted to [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) [here](https://github.com/nih-cfde/FAIR/); example files
  and scripts can be found in the `DCC_name/output` and `DCC_name/scripts` directories
  respectively.

 It's now time to convert what we can into [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS), striving to capture as much as possible
  from the original table.

 ### Challenge 1: What do you mean by Dataset?
 Even in this case, the definition of a Dataset becomes problematic and unclear. Remember
  that things we codify are often **models** and as such are not always perfect. Rather than
  thinking about Dataset with your interpretation of what it is, think of it in terms of
  how it will end up being used.

 This is what a 'dataset' looks like on Google Dataset Search; the same fields, and more,
  will be used for your own assets.

 ![](../images/recipes/assets-to-dats/ss3.png)

 In other words, irrespective of what your definition of a 'dataset' is, you should
  consider using something that is identifiable enough to have its own unique [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata),
  including a dedicated landing page, unique identifier, citation, license, and more. File
  assets *associated* with that dataset will be listed under the dataset.

 Importantly, Datasets should ideally be associated with a single biosample when possible,
  so in some cases it may make sense to treat each individual file as its own dataset if
  each file was produced for a single biosample.

 Do note that [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) also supports Dataset in Dataset relationships if that becomes necessary.
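
 One way such nesting could look is sketched below: file-level Datasets are grouped under one parent Dataset per study. This assumes the `hasPart` property of the DATS dataset schema is the right place for child Datasets (check the schema before relying on it). The helper `nest_by_study` and the sample records are hypothetical, and the objects are deliberately incomplete.

```python
from collections import defaultdict

def nest_by_study(records):
  """Group file-level records into one parent Dataset per study."""
  by_study = defaultdict(list)
  for record in records:
    by_study[record['Study ID']].append(record)
  return [
    {
      '@type': 'Dataset',
      'title': study_id,
      # Assumption: `hasPart` holds the child Datasets
      'hasPart': [
        {'@type': 'Dataset', 'title': record['File ID']}
        for record in study_records
      ],
    }
    for study_id, study_records in by_study.items()
  ]

# Hypothetical records; real rows would come from the exported table
records = [
  {'Study ID': 'SD_A', 'File ID': 'GF_1'},
  {'Study ID': 'SD_A', 'File ID': 'GF_2'},
  {'Study ID': 'SD_B', 'File ID': 'GF_3'},
]
nested = nest_by_study(records)
print(len(nested), 'parent datasets')
```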


In [4]:
# The KidsFirst table has 4 primary entity types in this file (file, participant,
#  study, biospecimen) plus a unique identifier (Latest DID)
display(df[['File ID', 'Participants ID', 'Study ID', 'Biospecimen ID', 'Latest DID']].head())

# Using JSON-LD and keeping in mind the arbitrary DATS structure,
#  things should end up looking like so:
def dats_from_record(record):
  return {
    '@type': 'Dataset',
    'identifier': {
      '@type': 'Identifier',
      'identifier': record['Latest DID'],
    },
    'producedBy': {
      # The dataset in question was produced as part of a study
      '@type': 'Study',
      'identifier': {
        '@type': 'Identifier',
        'identifier': record['Study ID'],
      },
    },
    'isAbout': [
      {
        # The dataset in question has a biospecimen
        '@type': 'BiologicalEntity',
        'identifier': {
          '@type': 'Identifier',
          'identifier': record['Biospecimen ID'],
        },
      },
      {
        # The dataset in question is about this participant
        '@type': 'StudyGroup',
        'identifier': {
          '@type': 'Identifier',
          'identifier': record['Participants ID'],
        },
      },
    ],
    'distributions': [
      {
        # The dataset in question has this file
        '@type': 'DatasetDistribution',
        'identifier': {
          '@type': 'Identifier',
          'identifier': record['File ID'],
        },
      }
    ],
  }

# Converting each element to DATS
dats = {
  # schema.org context, gives RDF meaning to `@type` and `predicates` as defined by schema.org
  '@context': 'http://w3id.org/dats/context/sdo/dataset_sdo_context.jsonld',
  '@graph': [
    dats_from_record(record)
    for _, record in df.head().iterrows()
  ]
}
display(dats)



 There are several improvements we can make to the above:
 1. Give context to our identifiers, which only make sense in the context of KidsFirst
 2. Provide more [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) as available in our table
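
 To make the first point concrete: an identifier gains context when it is paired with an `identifierSource`, ideally so that the two joined together resolve to a landing page. A minimal sketch of that joining rule follows; the helper name `landing_page` and the example ID are assumptions for illustration, and whether the resulting URL actually resolves depends on the portal.

```python
def landing_page(identifier):
  """Join identifierSource and identifier into a candidate landing page URL."""
  source = identifier.get('identifierSource')
  if not source:
    return None  # without a source the identifier has no global context
  return source.rstrip('/') + '/' + str(identifier['identifier'])

example = {
  '@type': 'Identifier',
  'identifier': 'GF_EXAMPLE',  # hypothetical file ID
  'identifierSource': 'https://portal.kidsfirstdrc.org/file/',
}
print(landing_page(example))
```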

In [5]:
def dats_from_record(record):
  return {
    '@type': 'Dataset',
    'identifier': {
      '@type': 'Identifier',
      'identifier': record['Latest DID'],
      'identifierSource': 'https://portal.kidsfirstdrc.org/',
    },
    'storedIn': {
      '@type': 'DataRepository',
      'name': record['Repository'],
    },
    'producedBy': {
      '@type': 'Study',
      'identifier': {
        '@type': 'Identifier',
        'identifier': record['Study ID'],
        'identifierSource': 'https://portal.kidsfirstdrc.org/',
      },
      'name': record['Study Name'],
    },
    'isAbout': [
      {
        '@type': 'BiologicalEntity',
        'identifier': {
          '@type': 'Identifier',
          'identifier': record['Biospecimen ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/',
        },
        'alternateIdentifiers': [
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Sample External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Aliquot External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
        ],
      },
      {
        '@type': 'StudyGroup',
        'identifier': {
          # NOTE: Ideally, `${identifierSource}${identifier}` resolves to a landing page for this entity
          '@type': 'Identifier',
          'identifier': record['Participants ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/participant/',
        },
        'alternateIdentifiers': [
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Participant External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
        ],
      },
    ],
    'distributions': [
      {
        'identifier': {
          # NOTE: Ideally, `${identifierSource}${identifier}` resolves to a landing page for this entity
          '@type': 'Identifier',
          'identifier': record['File ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/file/',
        },
        '@type': 'DatasetDistribution',
        'formats': [
          record['File Format'],
        ],
        'size': record['File Size'],
        'unit': {
          '@type': 'Annotation',
          'value': 'bytes',
          # NOTE: Preferred valueIRI with globally unique semantic URI
        },
        'access': {
          '@type': 'Access',
          'identifier': {
            '@type': 'Identifier',
            'identifier': record['File Name'],
          },
          'alternateIdentifiers': [
            {
              '@type': 'AlternateIdentifier',
              'identifier': record['File External ID'],
              # NOTE: Preferred identifierSource with globally unique semantic URI
            },
          ],
          'landingPage': 'https://portal.kidsfirstdrc.org/file/' + record['File ID'],
          # NOTE: Ideally accessURL would be specified
        }
      }
    ],
    'types': [
      {
        '@type': 'DataType',
        'information': {
          '@type': 'Annotation',
          'value': record['Data Type'],
        },
      },
    ],
    'extraProperties': [*filter(None, [
      # Metadata that doesn't fit anywhere else in DATS but may be relevant
      {
        '@type': 'CategoryValuesPair',
        'category': 'tissue',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Tissue Type (Source Text)'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          }
        ]
      } if record['Tissue Type (Source Text)'] != '--' else None,
      {
        '@type': 'CategoryValuesPair',
        'category': 'diagnosis',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Diagnosis (Source Text)'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          }
        ]
      } if record['Diagnosis (Source Text)'] != '--' else None, # Don't create entries for junk
      {
        '@type': 'CategoryValuesPair',
        'category': 'proband',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Proband'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          },
        ],
      } if record['Proband'] != '--' else None, # Don't create entries for junk,
    ])],
  }

# Converting each element to DATS
dats = {
  # schema.org context, gives RDF meaning to `@type` and `predicates` as defined by schema.org
  '@context': 'http://w3id.org/dats/context/sdo/dataset_sdo_context.jsonld',
  '@graph': [
    dats_from_record(record)
    for _, record in df.head().iterrows()
  ]
}
display(dats)



 With some mapping effort, we were able to get all of the [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata)
  from the file manifest table into [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS). It's important to note that there are
  **lots** of fields missing, including license, authorship information, and more, which
  need to be found in other places to further complete and improve this model.

 With our newly created object, let's check to make sure we didn't make any mistakes!
  Just as we can use json-schema for autocompletion help in our editor,
  we can also use it for programmatic validation of our `dats` object.

In [6]:
from jsonschema import Draft4Validator

# Get the first record
record = dats['@graph'][0]

# Validate it against DATS dataset schema
validator = Draft4Validator({'$ref': 'https://raw.githubusercontent.com/datatagsuite/schema/master/dataset_schema.json'})
for error in validator.iter_errors(record):
  display(error.message)



 Uh-oh; we've got some errors.

 Let's fix them and try again.

 For readability, the changes we made to `dats_from_record` are summarized in this diff:
 ```diff
 @@ -1,6 +1,7 @@
  def dats_from_record(record):
    return {
      '@type': 'Dataset',
 +    'title': record['Study Name'],
      'identifier': {
        '@type': 'Identifier',
        'identifier': record['Latest DID'],
 @@ -10,6 +11,12 @@
        '@type': 'DataRepository',
        'name': record['Repository'],
      },
 +    'creators': [
 +      {
 +        "@type": "Organization",
 +        "name": "KidsFirst",
 +      }
 +    ],
      'producedBy': {
        '@type': 'Study',
        'identifier': {
 @@ -22,6 +29,8 @@
      'isAbout': [
        {
          '@type': 'BiologicalEntity',
 +        # NOTE: name is a required field
 +        'name': record['Biospecimen ID'],
          'identifier': {
            '@type': 'Identifier',
            'identifier': record['Biospecimen ID'],
 @@ -42,6 +51,8 @@
        },
        {
          '@type': 'StudyGroup',
 +        # NOTE: name is a required field
 +        'name': record['Participants ID'],
          'identifier': {
            # NOTE: Ideally, `${identifierSource}${identifier}` resolves to a landing page for this entity
            '@type': 'Identifier',
 @@ -69,7 +80,7 @@
          'formats': [
            record['File Format'],
          ],
 -        'size': record['File Size'],
 +        'size': int(record['File Size']),
          'unit': {
            '@type': 'Annotation',
            'value': 'bytes',
 ```

 You can see that we put in an invalid type and were missing some fields.
 In some cases, we need to add [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) that wasn't in the original
 table such as [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) about our own organization!

 This is relevant in a catalog of many datasets but often isn't present
  in your own data. It's better for you to determine how your data
  links back to your organization than for us to try to figure it out! That's why
  [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) requires that [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata).
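
 The `size` fix in the diff above casts the value with `int(...)`. A slightly more defensive variant is sketched below, under the assumption that the column may hold strings, floats, or the `--` placeholder this table uses elsewhere for missing values. The helper name `to_size` is hypothetical.

```python
PLACEHOLDER = '--'  # marker the exported table uses for missing values

def to_size(value):
  """Coerce a raw table value to a plain int, or None when it is missing."""
  if value is None or value == PLACEHOLDER:
    return None
  try:
    return int(value)
  except ValueError:
    # Strings such as '1024.0' are not accepted by int() directly
    return int(float(value))

print(to_size('1024'), to_size(2048.0), to_size('--'))
```

 Returning `None` rather than raising leaves the caller free to omit the `size` field when the table has no usable value.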

In [7]:
def dats_from_record(record):
  return {
    '@type': 'Dataset',
    'title': record['Study Name'],
    'identifier': {
      '@type': 'Identifier',
      'identifier': record['Latest DID'],
      'identifierSource': 'https://portal.kidsfirstdrc.org/',
    },
    'storedIn': {
      '@type': 'DataRepository',
      'name': record['Repository'],
    },
    'creators': [
      {
        "@type": "Organization",
        "name": "KidsFirst",
      }
    ],
    'producedBy': {
      '@type': 'Study',
      'identifier': {
        '@type': 'Identifier',
        'identifier': record['Study ID'],
        'identifierSource': 'https://portal.kidsfirstdrc.org/',
      },
      'name': record['Study Name'],
    },
    'isAbout': [
      {
        '@type': 'BiologicalEntity',
        # NOTE: name is a required field
        'name': record['Biospecimen ID'],
        'identifier': {
          '@type': 'Identifier',
          'identifier': record['Biospecimen ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/',
        },
        'alternateIdentifiers': [
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Sample External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Aliquot External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
        ],
      },
      {
        '@type': 'StudyGroup',
        # NOTE: name is a required field
        'name': record['Participants ID'],
        'identifier': {
          # NOTE: Ideally, `${identifierSource}${identifier}` resolves to a landing page for this entity
          '@type': 'Identifier',
          'identifier': record['Participants ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/participant/',
        },
        'alternateIdentifiers': [
          {
            '@type': 'AlternateIdentifier',
            'identifier': record['Participant External ID'],
            # NOTE: Preferred identifierSource with globally unique semantic URI
          },
        ],
      },
    ],
    'distributions': [
      {
        'identifier': {
          # NOTE: Ideally, `${identifierSource}${identifier}` resolves to a landing page for this entity
          '@type': 'Identifier',
          'identifier': record['File ID'],
          'identifierSource': 'https://portal.kidsfirstdrc.org/file/',
        },
        '@type': 'DatasetDistribution',
        'formats': [
          record['File Format'],
        ],
        'size': int(record['File Size']),
        'unit': {
          '@type': 'Annotation',
          'value': 'bytes',
          # NOTE: Preferred valueIRI with globally unique semantic URI
        },
        'access': {
          '@type': 'Access',
          'identifier': {
            '@type': 'Identifier',
            'identifier': record['File Name'],
          },
          'alternateIdentifiers': [
            {
              '@type': 'AlternateIdentifier',
              'identifier': record['File External ID'],
              # NOTE: Preferred identifierSource with globally unique semantic URI
            },
          ],
          'landingPage': 'https://portal.kidsfirstdrc.org/file/' + record['File ID'],
          # NOTE: Ideally accessURL would be specified
        }
      }
    ],
    'types': [
      {
        '@type': 'DataType',
        'information': {
          '@type': 'Annotation',
          'value': record['Data Type'],
        },
      },
    ],
    'extraProperties': [*filter(None, [
      # Metadata that doesn't fit anywhere else in DATS but may be relevant
      {
        '@type': 'CategoryValuesPair',
        'category': 'tissue',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Tissue Type (Source Text)'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          }
        ]
      } if record['Tissue Type (Source Text)'] != '--' else None,
      {
        '@type': 'CategoryValuesPair',
        'category': 'diagnosis',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Diagnosis (Source Text)'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          }
        ]
      } if record['Diagnosis (Source Text)'] != '--' else None, # Don't create entries for junk
      {
        '@type': 'CategoryValuesPair',
        'category': 'proband',
        # NOTE: Preferred categoryIRI with globally unique semantic URI
        'values': [
          {
            '@type': 'Annotation',
            'value': record['Proband'],
            # NOTE: Preferred valueIRI with globally unique semantic URI
          },
        ],
      } if record['Proband'] != '--' else None, # Don't create entries for junk,
    ])],
  }

# Converting each element to DATS
dats = {
  # schema.org context, gives RDF meaning to `@type` and `predicates` as defined by schema.org
  '@context': 'http://w3id.org/dats/context/sdo/dataset_sdo_context.jsonld',
  '@graph': [
    dats_from_record(record)
    for _, record in df.head().iterrows()
  ]
}



In [8]:

# Let's validate *all records* against the DATS dataset schema
validator = Draft4Validator({
  '$ref': 'https://raw.githubusercontent.com/datatagsuite/schema/master/dataset_schema.json'
})

for record in dats['@graph']:
  for error in validator.iter_errors(record):
    display({ 'title': record['title'], 'error': error.message })



 ## Conclusion
 As hoped, everything validates and we've successfully produced [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS).
  Though we now know our [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) is "valid", we're still not done. As with everything
  there are levels; the more fields we fill out in the [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS) the better off
  we will be. This is where a [FAIR](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#FAIR) assessment comes in -- we can write metrics
  that *also speak [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS)* but are looking for presence of certain fields,
  or checking that our `identifier` can actually be verified against the given `identifierSource`
  [metadata](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#metadata) attributes.
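
 As a rough sketch of what such a presence-based metric could look like (not the assessment tooling itself), the function below reports the fraction of chosen fields that are present and non-empty in a DATS object. The function name, the field list, and the sample object are illustrative assumptions.

```python
def field_coverage(dataset, fields):
  """Return the fraction of `fields` that are present and non-empty."""
  empty_values = (None, '', [], {})
  present = [field for field in fields if dataset.get(field) not in empty_values]
  return len(present) / len(fields)

# Hypothetical DATS-like object with two of four fields filled in
sample_dataset = {
  '@type': 'Dataset',
  'title': 'Placeholder title',
  'identifier': {'@type': 'Identifier', 'identifier': 'EXAMPLE'},
  'creators': [],
}
coverage = field_coverage(sample_dataset, ['title', 'identifier', 'creators', 'licenses'])
print(coverage)
```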

 Nonetheless, we've taken a step in the right direction. Future recipes will discuss
  performing [FAIR](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#FAIR) assessments on this [DATS](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#DATS), converting it to CFDE's [C2M2](https://github.com/nih-cfde/specifications-and-documentation/blob/master/draft-CFDE_glossary/glossary.md#C2M2) Frictionless Metadata model
  and more.