Import and Export CSV Data With a YAML Metadata Header
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github
R
docs
inst
man
tests
.Rbuildignore
.gitignore
.travis.yml
DESCRIPTION
Makefile
NAMESPACE
NEWS.md
README.Rmd
README.md
appveyor.yml

README.md

Import and Export CSV Data With a YAML Metadata Header

CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The CSVY file specification is simple: place a YAML header on top of a regular CSV. The yaml header is formatted according to the Table Schema of a Tabular Data Package.

A CSVY file looks like this:

#---
#profile: tabular-data-resource
#name: my-dataset
#path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy
#title: Example file of csvy 
#description: Show a csvy sample file.
#format: csvy
#mediatype: text/vnd.yaml
#encoding: utf-8
#schema:
#  fields:
#  - name: var1
#    type: string
#  - name: var2
#    type: integer
#  - name: var3
#    type: number
#dialect:
#  csvddfVersion: 1.0
#  delimiter: ","
#  doubleQuote: false
#  lineTerminator: "\r\n"
#  quoteChar: "\""
#  skipInitialSpace: true
#  header: true
#sources:
#- title: The csvy specifications
#  path: http://csvy.org/
#  email: ''
#licenses:
#- name: CC-BY-4.0
#  title: Creative Commons Attribution 4.0
#  path: https://creativecommons.org/licenses/by/4.0/
#---
var1,var2,var3
A,1,2.0
B,3,4.3

Which we can read into R like this:

library("csvy")
str(read_csvy(system.file("examples", "example1.csvy", package = "csvy")))
## 'data.frame':	2 obs. of  3 variables:
##  $ var1: chr  "A" "B"
##  $ var2: int  1 3
##  $ var3: num  2 4.3
##  - attr(*, "profile")= chr "tabular-data-resource"
##  - attr(*, "title")= chr "Example file of csvy"
##  - attr(*, "description")= chr "Show a csvy sample file."
##  - attr(*, "name")= chr "my-dataset"
##  - attr(*, "format")= chr "csvy"
##  - attr(*, "sources")=List of 1
##   ..$ :List of 3
##   .. ..$ name : chr "CC-BY-4.0"
##   .. ..$ title: chr "Creative Commons Attribution 4.0"
##   .. ..$ path : chr "https://creativecommons.org/licenses/by/4.0/"

Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does so that you (and other users) can continue to rely on utils::read.csv() or readr::read_csv() as usual. The import() function in rio supports CSVY natively.

Export

To create a CSVY file from R, just do:

library("csvy")
library("datasets")
write_csvy(iris, "iris.csvy")

It is also possible to export the metadata to separate YAML or JSON file (and then also possible to import from those separate files) by specifying the metadata field in write_csvy() and read_csvy().

Import

To read a CSVY into R, just do:

d1 <- read_csvy("iris.csvy")
str(d1)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
##   ..- attr(*, "levels")= chr  "setosa" "versicolor" "virginica"
##  - attr(*, "profile")= chr "tabular-data-package"
##  - attr(*, "name")= chr "iris"

or use any other appropriate data import function to ignore the YAML metadata:

d2 <- utils::read.table("iris.csvy", sep = ",", header = TRUE)
str(d2)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Package Installation

The package is available on CRAN and can be installed directly in R using:

install.packages("csvy")

The latest development version on GitHub can be installed using devtools:

if(!require("remotes")){
    install.packages("remotes")
}
remotes::install_github("leeper/csvy")

CRAN Version Downloads Travis-CI Build Status Appveyor Build status codecov.io