This spec is adapted from Data Packages in TOML with some modifications. This is a draft specification. If you have comments, suggestions, or modifications, visit the issue tracker or submit a pull request.
A dataset package is encoded in a descriptor file named datapackage.toml.
A dataset consists of:
- Dataset metadata
- Resource metadata
This specification extends Data Packages to introduce multilingual fields and modifies the field requirements specifically for the Iran Open Data project.
REQUIRED
type: alphanumeric
A unique alphanumeric identifier containing no symbols.
name = "writersblock"
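As an illustration of the constraint above, a name check could be sketched as follows. The function name `is_valid_name` is hypothetical, and the restriction to ASCII letters and digits is an assumption; the specification only says "alphanumeric, no symbols".

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits only.
_NAME_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_name(name):
    """Return True if name is a non-empty string of letters and digits."""
    return isinstance(name, str) and _NAME_RE.fullmatch(name) is not None
```

With this reading, "writersblock" is accepted while "writers-block" is rejected because of the hyphen.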
REQUIRED
type: string
License of the dataset
license = "CC-BY-NC"
REQUIRED
type: Array of keyword object
keyword.lang: string
keyword.wordlist: Array of string
An array of keyword objects. A keyword object contains the language and a wordlist for that language describing the themes of this dataset.
[[keywords]]
lang = "en"
wordlist = ["death", "Iran", "diseases"]
[[keywords]]
lang = "fa"
wordlist = ["مرگ", "ایران", "مریضی"]
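The shape of a keywords array can be checked on the parsed descriptor. The sketch below works on plain Python lists and dicts (what a TOML parser would return); `keyword_errors` is a hypothetical helper, and limiting lang to "en" and "fa" is an assumption carried over from the other multilingual fields.

```python
# Assumption: keywords use the same two languages as title and description.
ALLOWED_LANGS = {"en", "fa"}


def keyword_errors(keywords):
    """Return a list of problems found in a parsed `keywords` array."""
    if not isinstance(keywords, list) or not keywords:
        return ["keywords must be a non-empty array"]
    errors = []
    for i, entry in enumerate(keywords):
        if not isinstance(entry, dict):
            errors.append(f"keywords[{i}] must be a table")
            continue
        if entry.get("lang") not in ALLOWED_LANGS:
            errors.append(f"keywords[{i}].lang must be 'en' or 'fa'")
        words = entry.get("wordlist")
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            errors.append(f"keywords[{i}].wordlist must be an array of strings")
    return errors
```

An empty return value means the array matches the structure described above.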
OPTIONAL
type: date string
Release date of the dataset from the author
created_at = "2016-09-19"
AUTOMATIC
type: date string
Date the dataset was last updated. It should be equal to or later than indexed_at, the date the dataset was added to IOD. This is added automatically by the API.
updated_at = "2016-09-19"
AUTOMATIC
type: date string
Date the dataset was added to the Iran Open Data catalog. This is added automatically by the API.
indexed_at = "2016-09-19"
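The ordering rule between updated_at and indexed_at can be expressed as a short check. This sketch assumes both values are ISO 8601 date strings (YYYY-MM-DD), as in the examples; `dates_consistent` is a hypothetical name.

```python
from datetime import date


def dates_consistent(updated_at, indexed_at):
    """Return True if updated_at falls on or after indexed_at."""
    # date.fromisoformat raises ValueError for strings that are not YYYY-MM-DD.
    return date.fromisoformat(updated_at) >= date.fromisoformat(indexed_at)
```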
OPTIONAL
type: Array of two numbers
The period of time this dataset covers, given as a start year and an end year in the Iranian calendar. If the dataset covers a single year, repeat the year twice (e.g. [1392, 1392]).
period = [1358, 1392]
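A period value can be validated and, for display purposes, mapped to approximate Gregorian years. The sketch below is illustrative: both function names are hypothetical, and the conversion relies on the fact that an Iranian (Solar Hijri) year Y begins around 21 March of Gregorian year Y + 621.

```python
def is_valid_period(period):
    """Return True for a [start, end] pair of integer years with start <= end."""
    if not isinstance(period, list) or len(period) != 2:
        return False
    start, end = period
    for year in (start, end):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(year, int) or isinstance(year, bool):
            return False
    return start <= end


def gregorian_start_year(iranian_year):
    """Gregorian year in which the given Iranian year begins."""
    return iranian_year + 621
```

For example, the Iranian year 1392 begins in Gregorian 2013 and ends in March 2014.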
OPTIONAL
type: string
How often this dataset is updated. Values can be daily, weekly, monthly, quarterly, or yearly.
frequency = "monthly"
REQUIRED
type: Array of string
An array of categories this dataset belongs to. The strings can be one of these values:
- population
- energy
- employment
- women
- economics
- banking
- budget
- housing
- transport
- trade
- health
- education
- crime
- environment
- communications
- elections
category = ["education"]
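Because category is restricted to the fixed list above, a validator can simply report any value outside it. This is a minimal sketch; `unknown_categories` is a hypothetical helper operating on the parsed array.

```python
# The sixteen category values listed in this specification.
CATEGORIES = frozenset({
    "population", "energy", "employment", "women", "economics", "banking",
    "budget", "housing", "transport", "trade", "health", "education",
    "crime", "environment", "communications", "elections",
})


def unknown_categories(category):
    """Return the entries of a parsed `category` array that are not allowed."""
    return [c for c in category if c not in CATEGORIES]
```

An empty result means every entry is one of the allowed values.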
REQUIRED
type: Array of maintainer object
maintainer.lang: string
maintainer.text: string
Organization that is updating this dataset. An array of organizations where the language and translated text are specified. The lang can be "en" or "fa".
[[maintainer]]
lang = "en"
text = "Small Media Foundation"
[[maintainer]]
lang = "fa"
text = "بنیاد رسانه خرد"
REQUIRED
type: object
author.name: string
author.web: URL string
Author of the data. Name should be a string. Web should be a URL pointing to the author's homepage.
[author]
name = "Iran Book House"
web = "http://ketab.ir"
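The web key is described as a URL, which can be checked with the standard library. In this sketch, `is_http_url` is a hypothetical name, and accepting only http and https is an assumption; the specification does not restrict the scheme.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    # Assumption: only http and https are accepted for author.web.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```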
REQUIRED
type: Array of title object
title.lang: string
title.text: string
An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".
[[title]]
lang = "en"
text = "Dataset of books from Iran Book House"
[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"
REQUIRED
type: Array of description object
description.lang: string
description.text: string
An array of descriptions. The language and translated text should be specified. The lang can be "en" or "fa".
[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
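The multilingual fields (maintainer, title, description) all share the same lang/text structure, so a consumer can read them with one lookup helper. The sketch below is illustrative: `localized` is a hypothetical function, and falling back to English when the requested language is missing is an assumption, not part of this specification.

```python
def localized(entries, lang, fallback="en"):
    """Return the text for lang from an array of {lang, text} tables.

    Falls back to the `fallback` language, and returns None if neither exists.
    """
    by_lang = {entry["lang"]: entry["text"] for entry in entries}
    if lang in by_lang:
        return by_lang[lang]
    return by_lang.get(fallback)


maintainer = [
    {"lang": "en", "text": "Small Media Foundation"},
    {"lang": "fa", "text": "بنیاد رسانه خرد"},
]
farsi_name = localized(maintainer, "fa")
```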
Resources are an array of resource objects. Each resource object has the following keys:
REQUIRED
type: Array of URL object
url.lang: string
url.link: URI string
An array of URLs pointing to clean resource data for each language (such as a processed CSV). The language must be specified; however, only one of the languages "fa" or "en" is required to pass validation. This covers cases where the resource file cannot be translated.
[[resources.url]]
lang = "fa"
link = "http://example.com/csv"
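The rule that at least one of "fa" or "en" must be present can be sketched as a check on the parsed url array. `resource_url_ok` is a hypothetical name, and treating an entry with an empty link as absent is an assumption.

```python
def resource_url_ok(urls):
    """Return True if at least one entry has lang 'en' or 'fa' and a link."""
    langs_with_link = {
        entry.get("lang")
        for entry in urls
        if isinstance(entry, dict) and entry.get("link")
    }
    return bool(langs_with_link & {"en", "fa"})
```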
OPTIONAL
type: URL string
URL pointing to code that processed the raw data into clean data to produce this resource.
code = "http://github.com/examplerepo"
REQUIRED
type: Array of title object
title.lang: string
title.text: string
An array of titles. The language and translated text should be specified. The lang can be "en" or "fa".
[[resources.title]]
lang = "en"
text = "Database"
[[resources.title]]
lang = "fa"
text = "Database"
OPTIONAL
type: Array of description object
description.lang: string
description.text: string
An array of descriptions for further description of the resource or the process. The language and translated text should be specified. The lang can be "en" or "fa".
[[resources.description]]
lang = "fa"
text = "CSV extracted from the Iran Book House search and index."
[[resources.description]]
lang = "en"
text = "CSV extracted from the Iran Book House search and index."
REQUIRED
type: Array of source object
source.name: string
source.web: URL string
The sources field is an array of objects, each source having a name and website. The web key points to the original raw data source.
[[resources.sources]]
name = "Iran Book House Search"
web = "http://ketab.ir/modules.php?name=News&op=infobooksearch"
REQUIRED
type: object
schema.format: string
schema.fields: Array of field objects
schema.fields.name: string
schema.fields.type: string
The schema is an object that has a format. If the format is CSV, it should have a key named fields.
fields is an array of field objects, each having a name and type.
[resources.schema]
format = "csv"
[[resources.schema.fields]]
name = "Author"
type = "string"
[[resources.schema.fields]]
name = "Publisher"
type = "string"
[[resources.schema.fields]]
name = "Title"
type = "string"
[[resources.schema.fields]]
name = "Year"
type = "date"
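One way a consumer might use the schema is to compare the declared field names with the header row of the CSV resource. The sketch below makes that comparison; `header_matches_schema` is a hypothetical helper, and requiring the columns to appear in the same order as the fields is an assumption that the specification does not state.

```python
import csv
import io


def header_matches_schema(csv_text, schema):
    """Return True if the CSV header row equals the schema's field names."""
    expected = [field["name"] for field in schema["fields"]]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list when the CSV has no rows
    return header == expected


schema = {
    "format": "csv",
    "fields": [
        {"name": "Author", "type": "string"},
        {"name": "Publisher", "type": "string"},
        {"name": "Title", "type": "string"},
        {"name": "Year", "type": "date"},
    ],
}
```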
A complete example of a datapackage.toml descriptor:
name = "writersblock"
license = "CC-BY-NC"
period = [1358, 1392]
frequency = "monthly"
category = ["education"]
[[maintainer]]
lang = "en"
text = "Small Media Foundation"
[[maintainer]]
lang = "fa"
text = "بنیاد رسانه خرد"
[[keywords]]
lang = "en"
wordlist = ["death", "Iran", "diseases"]
[[keywords]]
lang = "fa"
wordlist = ["مرگ", "ایران", "مریضی"]
[author]
name = "Iran Book House"
web = "http://ketab.ir"
[[title]]
lang = "en"
text = "Dataset of books from Iran Book House"
[[title]]
lang = "fa"
text = "Dataset of books from Iran Book House"
[[description]]
lang = "fa"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
[[description]]
lang = "en"
text = "Over 880,000 books extracted from the Iran Book House website. The Iran Book House maintains a database of all the published books that get sent to the National Library."
[[resources]]
[[resources.url]]
lang = "fa"
link = "http://example.com/csv"
[[resources.title]]
lang = "en"
text = "Database"
[[resources.title]]
lang = "fa"
text = "Database"
[[resources.sources]]
name = "Iran Book House Search"
web = "http://ketab.ir/modules.php?name=News&op=infobooksearch"
[resources.schema]
format = "csv"
[[resources.schema.fields]]
name = "Author"
type = "string"
[[resources.schema.fields]]
name = "Publisher"
type = "string"
[[resources.schema.fields]]
name = "Title"
type = "string"
[[resources.schema.fields]]
name = "Year"
type = "date"