IOD Dataset Spec

This is a draft specification, adapted from Data Packages in TOML with some modifications. If you have comments, suggestions, or modifications, visit the issue tracker or submit a pull request.

Specification

A dataset package is encoded in a descriptor file named datapackage.toml.

A dataset consists of:

  • Dataset metadata
  • Resource metadata

This specification extends Data Packages to introduce multilingual fields and modifies the field requirements specifically for the Iran Open Data project.

Dataset metadata

name

REQUIRED

  • type: alphanumeric

A unique identifier for the dataset. It must be alphanumeric, with no symbols.

name = "writersblock"
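The name rule can be checked mechanically. The sketch below assumes that "alphanumeric" means ASCII letters and digits only, which the spec does not state explicitly; the function name `is_valid_name` is hypothetical.

```python
import re

# Assumption: "alphanumeric, no symbols" means ASCII letters and digits only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_name(name) -> bool:
    """Return True if name is a non-empty ASCII alphanumeric string."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None
```

Under this reading, "writersblock" passes while "writers-block" and the empty string do not.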

license

REQUIRED

  • type: string

License of the dataset

license = "CC-BY-NC" 

keywords

REQUIRED

  • type: Array of keyword object
  • keyword.lang: string
  • keyword.wordlist: Array of string

An array of keyword objects. A keyword object contains the language and a wordlist for that language describing the themes of this dataset.

[[keywords]]
lang = "en" 
wordlist = ["death", "Iran", "diseases"]

[[keywords]]
lang = "fa" 
wordlist = ["مرگ", "ایران", "مریضی"]

created_at

OPTIONAL

  • type: date string

Release date of the dataset from the author

created_at = "2016-09-19" 

updated_at

AUTOMATIC

  • type: date string

Date the dataset was last updated. It should be on or after indexed_at, the date the dataset was added to IOD. This is added automatically by the API.

updated_at = "2016-09-19"

indexed_at

AUTOMATIC

  • type: date string

Date the dataset was added to the Iran Open Data catalog. This is added automatically by the API.

indexed_at = "2016-09-19"
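The ordering rule between updated_at and indexed_at could be checked as in the sketch below. It assumes both values are ISO dates in YYYY-MM-DD form, as the examples suggest; the function name `dates_are_consistent` is hypothetical.

```python
from datetime import date


def dates_are_consistent(indexed_at: str, updated_at: str) -> bool:
    """Return True if updated_at is on or after indexed_at.

    Assumption: both arguments are ISO date strings such as "2016-09-19".
    """
    return date.fromisoformat(updated_at) >= date.fromisoformat(indexed_at)
```

Because the API sets both fields automatically, a check like this is mainly useful as a sanity test on stored descriptors.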

period

OPTIONAL

  • type: Array of two numbers

The period of time this dataset covers, given as a start year and an end year in the Iranian calendar. If the dataset covers a single year, repeat that year (e.g., [1392, 1392]).

period = [1358, 1392]
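A minimal sketch of a period check follows. The spec only says "two numbers"; treating them as integer years and requiring the start year not to exceed the end year are assumptions made here, and `is_valid_period` is a hypothetical name.

```python
def is_valid_period(period) -> bool:
    """Return True for a two-element list of integer years in ascending order."""
    if not isinstance(period, list) or len(period) != 2:
        return False
    # Assumption: years are whole numbers (bool is excluded explicitly,
    # since bool is a subclass of int in Python).
    if not all(isinstance(y, int) and not isinstance(y, bool) for y in period):
        return False
    start, end = period
    # Assumption: the start year must not come after the end year.
    return start <= end
```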

frequency

OPTIONAL

  • type: string

How often this dataset is updated. Allowed values are daily, weekly, monthly, quarterly, and yearly.

frequency = "monthly"

category

REQUIRED

  • type: Array of string

An array of categories this dataset belongs to. The strings can be one of these values:

  • population
  • energy
  • employment
  • women
  • economics
  • banking
  • budget
  • housing
  • transport
  • trade
  • health
  • education
  • crime
  • environment
  • communications
  • elections

category = ["education"]
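Since category is restricted to a fixed vocabulary, a validator can report unknown values directly. The sketch below is illustrative; `check_category` and its message wording are hypothetical, while the allowed values are copied from the list above.

```python
ALLOWED_CATEGORIES = {
    "population", "energy", "employment", "women", "economics", "banking",
    "budget", "housing", "transport", "trade", "health", "education",
    "crime", "environment", "communications", "elections",
}


def check_category(value) -> list:
    """Return a list of problems found in a category value (empty if valid)."""
    # A bare string is a common mistake; the spec requires an array.
    if not isinstance(value, list):
        return ["category must be an array of strings"]
    return [
        f"unknown category: {c!r}"
        for c in value
        if not isinstance(c, str) or c not in ALLOWED_CATEGORIES
    ]
```

Returning a list of problems rather than a boolean lets a caller show every issue at once.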

maintainer

REQUIRED

  • type: Array of maintainer object
  • maintainer.lang: string
  • maintainer.text: string

The organization that maintains this dataset, given as an array of objects that each specify a language and the organization's name in that language. The lang can be "en" or "fa".

[[maintainer]]
lang = "en" 
text = "Small Media Foundation"

[[maintainer]]
lang = "fa" 
text = "بنیاد رسانه خرد"

author

REQUIRED

  • type: object
  • author.name: string
  • author.web: URL string

Author of the data. Name should be a string. Web should be a URL pointing to the author's homepage.

[author]
name = "Iran Book House" 
web = "http://ketab.ir" 

title

REQUIRED

  • type: Array of title object

  • title.lang: string

  • title.text: string

An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".

[[title]]
lang = "en" 
text = "Dataset of books from Iran Book House" 

[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"

description

REQUIRED

  • type: Array of description object

  • description.lang: string

  • description.text: string

An array of descriptions. The language and translated text should be specified. The lang can be "en" or "fa".


[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
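Multilingual fields such as title, description, and maintainer share the same lang/text shape, so a consumer can look up text by language with a single helper. The sketch below is illustrative only; `localized_text` and its fall-back-to-English behavior are assumptions, not part of the spec.

```python
def localized_text(entries, lang, fallback="en"):
    """Return the text for lang, else for fallback, else None.

    entries is a list of {"lang": ..., "text": ...} dictionaries, as produced
    by parsing a [[title]] or [[description]] array of tables.
    """
    by_lang = {entry["lang"]: entry["text"] for entry in entries}
    return by_lang.get(lang, by_lang.get(fallback))


titles = [
    {"lang": "en", "text": "Dataset of books from Iran Book House"},
]
# No "fa" entry exists here, so the English text is returned.
print(localized_text(titles, "fa"))
```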

Resource metadata

Resources are given as an array of resource objects under the resources key. Each resource object has the following keys:

url

REQUIRED

  • type: Array of URL object
  • url.lang: string
  • url.link: URI string

An array of URLs pointing to the clean resource data (such as a processed CSV), one per language. Each entry must specify its language. Only one of the languages, "fa" or "en", is required to pass validation, which covers the case where the resource file cannot be translated.

[[resources.url]]
lang = "fa"
link = "http://example.com/csv" 
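The "at least one language" rule for url could be expressed as in the sketch below. It works on the parsed array of url objects; `has_required_url` is a hypothetical name, and treating an empty link as missing is an assumption.

```python
SUPPORTED_LANGS = {"en", "fa"}


def has_required_url(urls) -> bool:
    """Return True if at least one entry has a supported lang and a link."""
    return any(
        u.get("lang") in SUPPORTED_LANGS and bool(u.get("link"))
        for u in urls
    )
```

A resource with only a "fa" link therefore passes, which matches the case where no translation of the file exists.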

code

OPTIONAL

  • type: URL string

URL pointing to code that processed the raw data into clean data to produce this resource.

code = "http://github.com/examplerepo"

title

REQUIRED

  • type: Array of title object

  • title.lang: string

  • title.text: string

An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".

[[resources.title]]
lang = "en"  
text = "Database"

[[resources.title]]
lang = "fa"
text = "Database"

description

OPTIONAL

  • type: Array of description object

  • description.lang: string

  • description.text: string

An array of descriptions for further description of the resource or the process. The language and translated text should be specified. The lang can be "en" or "fa".

[[resources.description]]
lang = "fa"
text = "CSV extracted from the Iran Book House search and index."

[[resources.description]]
lang = "en"
text = "CSV extracted from the Iran Book House search and index."

sources

REQUIRED

  • type: Array of source object
  • source.name: string
  • source.web: URL string

The sources field is an array of objects, each source having a name and website. The web key points to the original raw data source.

[[resources.sources]]
name = "Iran Book House Search" 
web = "http://ketab.ir/modules.php?name=News&op=infobooksearch" 

schema

REQUIRED

  • type: object
  • schema.format: string
  • schema.fields: Array of field objects
  • schema.fields.name: string
  • schema.fields.type: string

The schema is an object with a format key. If the format is CSV, the schema should also have a fields key: an array of field objects, each with a name and a type.

[resources.schema]
format = "csv"

    [[resources.schema.fields]]
    name = "Author"
    type = "string"

    [[resources.schema.fields]]
    name = "Publisher"
    type = "string"

    [[resources.schema.fields]]
    name = "Title"
    type = "string"

    [[resources.schema.fields]]
    name = "Year"
    type = "date"
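One use of the schema is to check a CSV resource against its declared fields. The sketch below assumes the CSV has a header row whose column names appear in the same order as schema.fields; the spec does not state either requirement, and `header_matches_schema` is a hypothetical name.

```python
import csv
import io


def header_matches_schema(csv_text: str, fields: list) -> bool:
    """Return True if the first CSV row equals the schema field names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return header == [field["name"] for field in fields]


fields = [
    {"name": "Author", "type": "string"},
    {"name": "Publisher", "type": "string"},
    {"name": "Title", "type": "string"},
    {"name": "Year", "type": "date"},
]
```

Only the names are compared here; checking each column against its declared type would need rules for type names that the spec does not define.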

Example

name = "writersblock"
license = "CC-BY-NC"
period = [1358, 1392]
frequency = "monthly"
category = ["education"]

[[maintainer]]
lang = "en" 
text = "Small Media Foundation"

[[maintainer]]
lang = "fa" 
text = "بنیاد رسانه خرد"

[[keywords]]
lang = "en" 
wordlist = ["death", "Iran", "diseases"]

[[keywords]]
lang = "fa" 
wordlist = ["مرگ", "ایران", "مریضی"]

[author]
name = "Iran Book House"
web = "http://ketab.ir"

[[title]]
lang = "en" 
text = "Dataset of books from Iran Book House" 

[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"

[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."

[[resources]]

  [[resources.url]]
  lang = "fa"
  link = "http://example.com/csv"

  [[resources.title]]
  lang = "en" 
  text = "Database"

  [[resources.title]]
  lang = "fa"
  text = "Database"

  [[resources.sources]]
  name = "Iran Book House Search" 
  web = "http://ketab.ir/modules.php?name=News&op=infobooksearch" 

  [resources.schema]
  format = "csv"

      [[resources.schema.fields]]
      name = "Author"
      type = "string"

      [[resources.schema.fields]]
      name = "Publisher"
      type = "string"

      [[resources.schema.fields]]
      name = "Title"
      type = "string"

      [[resources.schema.fields]]
      name = "Year"
      type = "date"