Package API #1
Comments
The next thing to consider is the structure of the I think
The data dictionary will use a

Restriction on types

I think there needs to be a restriction on the types possible for the columns. This limitation is imposed by the flat-files themselves, given that we will want to write, then read, without any loss.
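To make the write-then-read requirement concrete, here is a minimal sketch in base R. The whitelist `allowed_types` and the helper `column_types()` are hypothetical names that do not come from this thread, and the choice of types is an assumption for illustration only. It checks the round trip for one small data frame; it does not check exactness for arbitrary doubles.

```r
# Hypothetical whitelist of column types assumed to survive a CSV round trip.
allowed_types <- c("logical", "integer", "numeric", "character", "Date")

# Return the first class of every column, failing on unsupported types.
column_types <- function(x) {
  types <- vapply(x, function(col) class(col)[1], character(1))
  bad <- setdiff(types, allowed_types)
  if (length(bad) > 0) {
    stop("unsupported column type(s): ", paste(bad, collapse = ", "))
  }
  types
}

df <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  day = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03")),
  value = c(1.5, 2.25, 3),
  stringsAsFactors = FALSE
)

# Record the types, write the flat-file, then read it back using the
# recorded types as the column specification.
types <- column_types(df)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
back <- read.csv(path, colClasses = unname(types), stringsAsFactors = FALSE)
```

The point of the design is that the recorded types, not guessing by the reader, decide how each column is parsed, which is the role the yaml file would play alongside the csv file.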
We can also think about the structure of the In practice, a It will have a:
Incidentally, a few years back I implemented a naive solution to that which, in either a DCF file (remember these?) or the data frame's attributes, documents the data's origin, owner, field names, dimensions, and a copy of the processing code that transformed the original data to its current state. I believe I can dig that out and take a look, as my 2c. I wonder what would be an appropriate metadata schema to implement here for datasets? Schema.org quickly comes to mind. A quick search also gives me DataCite, which appears to be a newer initiative. There is also one from the Federal CIO Council.
Hi @iqis Thanks for your interest! Looking forward to seeing what you have done already. I had a quick look at the links you provided - I saw ways to describe entire datasets, but I did not see (maybe because I looked too quickly) ways to describe variables within a dataset.
@ijlyttle, apparently it is very hard to get people to agree on things. I just think it is better to adopt a prevailing schema, use the necessary attributes, and extend it, rather than inventing something entirely new. What are your thoughts on this?
I think this package will revolve around an S3 class, `tbl_stw`, that will extend tibble's `tbl_df`.

There are a few things I have in mind for a `tbl_stw` to do:

Reading/writing flat-files:

- `stw_write(x, name, path)` writes out a csv file and a yaml file
- `stw_read(name, path)` builds a `tbl_stw` from a csv file and a yaml file
- `stw_colspec(file)` given a yaml-file, generate a `readr::read_csv()` column-specification
- `stw_get_function_read(file)` given a yaml-file, return a function (based on `readr::read_csv()`) that will read a csv file and return a `tbl_df`

Perhaps the metadata "wants" its own (perhaps internal) class: `stw_meta`...

Reading/writing package-data:

- `stw_use_data(x)` wraps `usethis::use_data()` to publish data to a package, also writes out the data-documentation
- `stw_read_data(data, package)` parses the documentation for a package-dataset into a `tbl_stw`

Accessing attributes:

- `stw_get_dict(x)` returns a tibble for the data-dictionary, containing variables: `name`, `type`, `description`
- `stw_get_title(x)` returns the title
- `stw_get_synopsis(x)` returns the synopsis
- `stw_get_source(x)` returns the source

Helpers:

- `stw_format_gt()` helper function that returns a gt format (?)

Building a `tbl_stw`:

- `stw(.data)` constructor
- `stw_title(x, title)` used to provide a title for a dataset
- `stw_source(x, source)` used to note the source of the dataset
- `stw_describe(x, ...)` used to provide a longer description of a dataset, or to build a data-dictionary. Has syntax like `dplyr::mutate()`, but used to provide a character description of a column or columns. Maybe named arguments apply to the variables in the data frame, unnamed apply to the dataset itself.
- `stw_validate()` used to make sure that all the columns are described.
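As a rough illustration of how the building and accessing functions above might fit together, here is a minimal sketch in base R. It is not the package's implementation: it assumes the metadata lives in a single attribute (named `stw_meta` here after the class idea above), it builds on a plain data frame rather than `tbl_df` to stay dependency-free, it returns the dictionary as a data frame rather than a tibble, and `stw_describe()` handles only named arguments.

```r
# Constructor: tag a data frame with the proposed tbl_stw class and attach
# an empty metadata list. Storing metadata in one attribute is an assumption.
stw <- function(.data) {
  stopifnot(is.data.frame(.data))
  structure(
    .data,
    class = unique(c("tbl_stw", class(.data))),
    stw_meta = list(title = NULL, source = NULL, dict = list())
  )
}

# Set the dataset title.
stw_title <- function(x, title) {
  meta <- attr(x, "stw_meta")
  meta$title <- title
  attr(x, "stw_meta") <- meta
  x
}

# Named arguments describe columns, e.g. stw_describe(x, speed = "Speed").
stw_describe <- function(x, ...) {
  descriptions <- list(...)
  stopifnot(
    length(descriptions) > 0,
    !is.null(names(descriptions)),
    all(names(descriptions) != "")
  )
  unknown <- setdiff(names(descriptions), names(x))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  meta <- attr(x, "stw_meta")
  meta$dict[names(descriptions)] <- descriptions
  attr(x, "stw_meta") <- meta
  x
}

# Data dictionary with one row per column: name, type, description.
stw_get_dict <- function(x) {
  dict <- attr(x, "stw_meta")$dict
  described <- vapply(
    names(x),
    function(nm) if (is.null(dict[[nm]])) NA_character_ else dict[[nm]],
    character(1)
  )
  data.frame(
    name = names(x),
    type = vapply(x, function(col) class(col)[1], character(1)),
    description = described,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Fail if any column lacks a description; otherwise return x invisibly.
stw_validate <- function(x) {
  dict <- stw_get_dict(x)
  missing <- dict$name[is.na(dict$description)]
  if (length(missing) > 0) {
    stop("undescribed column(s): ", paste(missing, collapse = ", "))
  }
  invisible(x)
}

# Usage: one column described, one still missing a description.
cars_stw <- stw(data.frame(speed = c(4, 7), dist = c(2L, 10L)))
cars_stw <- stw_title(cars_stw, "Stopping distances")
cars_stw <- stw_describe(cars_stw, speed = "Speed (mph)")
```

Because every function returns the modified object, the calls can be chained in a pipeline, which matches the `dplyr::mutate()`-like feel described above.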