IOD Dataset Spec

This spec is adapted from Data Packages in TOML with some modifications. This is a draft specification. If you have comments, suggestions, or modifications, visit the issue tracker or submit a pull request.

Specification

A dataset package is encoded in a descriptor file named datapackage.toml.

A dataset consists of:

Dataset metadata
Resource metadata

This specification extends Data Packages to introduce multilingual fields and modifies the field requirements specifically for the Iran Open Data project.

Dataset metadata

name

REQUIRED

type: alphanumeric

A unique identifier made up of letters and digits only, with no symbols.

name = "writersblock"
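
A check for this rule could look like the minimal sketch below. The function name is_valid_name is hypothetical, and reading "alphanumeric" as ASCII letters and digits only is an assumption; the spec does not say whether other scripts are allowed.

```python
import re

# Assumption: "alphanumeric, no symbols" means ASCII letters and digits only.
_NAME_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_name(name):
    """Return True if name is a non-empty string of ASCII letters and digits."""
    return isinstance(name, str) and _NAME_RE.fullmatch(name) is not None
```

With this reading, "writersblock" passes while "writers-block" and the empty string do not.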

license

REQUIRED

type: string

License of the dataset

license = "CC-BY-NC"

keywords

REQUIRED

type: Array of keyword object
keyword.lang: string
keyword.wordlist: Array of string

An array of keyword objects. A keyword object contains the language and a wordlist for that language describing the themes of this dataset.

[[keywords]]
lang = "en" 
wordlist = ["death", "Iran", "diseases"]

[[keywords]]
lang = "fa" 
wordlist = ["مرگ" , "ایران" , "مریضی"]

created_at

OPTIONAL

type: date string

Release date of the dataset from the author

created_at = "2016-09-19"

updated_at

AUTOMATIC

type: date string

Date the dataset was last updated. It should be equal to or later than indexed_at, the date the dataset was added to IOD. This is added automatically by the API.

updated_at = "2016-09-19"

indexed_at

AUTOMATIC

type: date string

Date the dataset was added to the Iran Open Data catalog. This is added automatically by the API.

indexed_at = "2016-09-19"
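
The relationship between updated_at and indexed_at can be checked mechanically. The sketch below assumes both values are ISO dates of the form shown in the examples ("2016-09-19") and that the metadata has already been parsed into a plain dict; the function name dates_are_consistent is hypothetical.

```python
from datetime import date


def dates_are_consistent(meta):
    """Check that updated_at is not earlier than indexed_at.

    meta is a dict of dataset metadata with ISO date strings.
    If either field is absent there is nothing to compare, so return True.
    """
    if "updated_at" not in meta or "indexed_at" not in meta:
        return True
    updated = date.fromisoformat(meta["updated_at"])
    indexed = date.fromisoformat(meta["indexed_at"])
    return updated >= indexed
```

A malformed date string raises ValueError from date.fromisoformat, which a caller may want to report as a validation error.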

period

OPTIONAL

type: Array of two numbers

The period of time this dataset covers, given as a start year and an end year. Years should follow the Iranian calendar. If the dataset covers a single year, repeat that year (e.g. [1392, 1392]).

period = [1358, 1392]
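
A structural check for period might look like the following sketch. The function name is_valid_period is hypothetical, and requiring the start year to be no later than the end year is an assumption that the spec implies but does not state.

```python
def is_valid_period(period):
    """Return True for a two-element list of integer years with start <= end."""
    if not isinstance(period, list) or len(period) != 2:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(isinstance(y, bool) or not isinstance(y, int) for y in period):
        return False
    start, end = period
    # Assumption: the first year must not come after the second.
    return start <= end
```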

frequency

OPTIONAL

type: string

How often this dataset is updated. Allowed values are daily, weekly, monthly, quarterly, and yearly.

frequency = "monthly"
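
Because frequency is restricted to a fixed set of values, a validator can be a simple membership test. This is a sketch only; the names ALLOWED_FREQUENCIES and is_valid_frequency are hypothetical, and treating the values as case-sensitive is an assumption.

```python
# The five values listed in the spec. Assumption: matching is case-sensitive.
ALLOWED_FREQUENCIES = ("daily", "weekly", "monthly", "quarterly", "yearly")


def is_valid_frequency(value):
    """Return True if value is one of the frequency strings listed above."""
    return isinstance(value, str) and value in ALLOWED_FREQUENCIES
```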

maintainer

REQUIRED

type: Array of maintainer object
maintainer.lang: string
maintainer.text: string

The organization that maintains this dataset, given as an array of entries that each specify a language and the translated text. The lang can be "en" or "fa".

[[maintainer]]
lang = "en" 
text = "Small Media Foundation"

[[maintainer]]
lang = "fa" 
text = "بنیاد رسانه خرد"

author

REQUIRED

type: object
author.name: string
author.web: URL string

Author of the data. Name should be a string. Web should be a URL pointing to the author's homepage.

[author]
name = "Iran Book House" 
web = "http://ketab.ir"

title

REQUIRED

type: Array of title object
title.lang: string
title.text: string

An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".

[[title]]
lang = "en" 
text = "Dataset of books from Iran Book House" 

[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"

description

REQUIRED

type: Array of description object
description.lang: string
description.text: string

An array of descriptions. The language and translated text should be specified. The lang can be "en" or "fa".


[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
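
The maintainer, title, and description fields share one shape: an array of entries, each with a lang and a text. A consumer could pick the right translation with a small helper such as the sketch below. The function name localized_text and the fallback to "en" when the requested language is missing are assumptions, not part of the spec.

```python
def localized_text(entries, lang, fallback="en"):
    """Return the text for lang from a list of {"lang", "text"} dicts.

    Falls back to the fallback language, and to None if neither is present.
    """
    by_lang = {entry["lang"]: entry["text"] for entry in entries}
    if lang in by_lang:
        return by_lang[lang]
    return by_lang.get(fallback)
```

For example, calling it on the parsed maintainer array with lang "en" would return "Small Media Foundation".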

Resource metadata

Resources are an array of resource objects. Each resource object has the following keys:

url

REQUIRED

type: Array of URL object
url.lang: string
url.link: URI string

An array of URLs pointing to clean resource data for each language (such as a processed CSV). The language must be specified, but only one of "fa" or "en" is required to pass validation. This covers the case where the resource file cannot be translated.

[[resources.url]]
lang = "fa"
link = "http://example.com/csv"
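
The validation rule for url (at least one entry, each tagged "en" or "fa") could be expressed as the sketch below. It assumes the resource has been parsed into plain Python lists and dicts; the function name has_valid_urls is hypothetical, and the check does not verify that the link is reachable or well-formed.

```python
SUPPORTED_LANGS = ("en", "fa")


def has_valid_urls(urls):
    """Return True if urls is a non-empty list of {"lang", "link"} dicts
    whose lang is "en" or "fa" and whose link is a non-empty string."""
    if not isinstance(urls, list) or not urls:
        return False
    for entry in urls:
        if not isinstance(entry, dict):
            return False
        if entry.get("lang") not in SUPPORTED_LANGS:
            return False
        link = entry.get("link")
        if not isinstance(link, str) or not link:
            return False
    return True
```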

code

OPTIONAL

type: URL string

URL pointing to code that processed the raw data into clean data to produce this resource.

code = "http://github.com/examplerepo"

title

REQUIRED

type: Array of title object
title.lang: string
title.text: string

An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".

[[resources.title]]
lang = "en"  
text = "Database"

[[resources.title]]
lang = "fa"
text = "Database"

description

OPTIONAL

type: Array of description object
description.lang: string
description.text: string

An array of descriptions for further description of the resource or the process. The language and translated text should be specified. The lang can be "en" or "fa".

[[resources.description]]
lang = "fa"
text = "CSV extracted from the Iran Book House search and index."

[[resources.description]]
lang = "en"
text = "CSV extracted from the Iran Book House search and index."

sources

REQUIRED

type: Array of source object
source.name: string
source.web: URL string

The sources field is an array of objects, each source having a name and website. The web key points to the original raw data source.

[[resources.sources]]
name = "Iran Book House Search" 
web = "http://ketab.ir/modules.php?name=News&op=infobooksearch"

schema

REQUIRED

type: object
schema.format: string
schema.fields: Array of field objects
schema.fields.name: string
schema.fields.type: string

The schema is an object with a format key. If the format is "csv", it should also have a fields key: an array of field objects, each with a name and a type.

[resources.schema]
format = "csv"

    [[resources.schema.fields]]
    name = "Author"
    type = "string"

    [[resources.schema.fields]]
    name = "Publisher"
    type = "string"

    [[resources.schema.fields]]
    name = "Title"
    type = "string"

    [[resources.schema.fields]]
    name = "Year"
    type = "date"
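
One use of the schema is to confirm that a CSV resource has the columns it declares. The sketch below compares the first CSV row with the declared field names. The function name header_matches_schema is hypothetical, and requiring the columns to appear in the same order as the fields is an assumption; the spec does not say whether order matters.

```python
import csv
import io


def header_matches_schema(csv_text, schema):
    """Compare the first CSV row with the field names declared in the schema.

    schema is the parsed schema object, e.g. {"format": "csv", "fields": [...]}.
    Assumption: column order must match the order of the declared fields.
    """
    expected = [field["name"] for field in schema.get("fields", [])]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header == expected
```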

Example

name = "writersblock"
license = "CC-BY-NC"
period = [1358, 1392]
frequency = "monthly"
category = "education"

[[maintainer]]
lang = "en" 
text = "Small Media Foundation"

[[maintainer]]
lang = "fa" 
text = "بنیاد رسانه خرد"

[[keywords]]
lang = "en" 
wordlist = ["death", "Iran", "diseases"]

[[keywords]]
lang = "fa" 
wordlist = ["مرگ" , "ایران" , "مریضی"]

[author]
name = "Iran Book House"
web = "http://ketab.ir"

[[title]]
lang = "en" 
text = "Dataset of books from Iran Book House" 

[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"

[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[resources]]

  [[resources.url]]
  lang = "fa"
  link = "http://example.com/csv"

  [[resources.title]]
  lang = "en" 
  text = "Database"

  [[resources.title]]
  lang = "fa"
  text = "Database"

  [[resources.sources]]
  name = "Iran Book House Search" 
  web = "http://ketab.ir/modules.php?name=News&op=infobooksearch" 

  [resources.schema]
  format = "csv"

      [[resources.schema.fields]]
      name = "Author"
      type = "string"

      [[resources.schema.fields]]
      name = "Publisher"
      type = "string"

      [[resources.schema.fields]]
      name = "Title"
      type = "string"

      [[resources.schema.fields]]
      name = "Year"
      type = "date"
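
A descriptor like the one above can be checked for the fields this spec marks REQUIRED. The sketch below works on the parsed descriptor as a plain dict (on Python 3.11 and later, the standard tomllib module can produce one from datapackage.toml). The function name missing_required and the shape of the returned labels are hypothetical; the key lists are taken from the REQUIRED markers in this document.

```python
# Keys marked REQUIRED in the dataset and resource sections of this spec.
REQUIRED_DATASET_KEYS = (
    "name", "license", "keywords", "maintainer", "author", "title", "description",
)
REQUIRED_RESOURCE_KEYS = ("url", "title", "sources", "schema")


def missing_required(package):
    """Return a list of required keys that are absent from the descriptor.

    Resource-level keys are reported as "resources[i].key".
    """
    missing = [key for key in REQUIRED_DATASET_KEYS if key not in package]
    for i, resource in enumerate(package.get("resources", [])):
        for key in REQUIRED_RESOURCE_KEYS:
            if key not in resource:
                missing.append(f"resources[{i}].{key}")
    return missing
```

An empty list means every required key is present; it says nothing about whether the values themselves are well-formed.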
