A package for higher-level R metadata extraction #41
Comments
\o/ |
Should we wrap Some example usage:
Is
Just some ideas... |
Should we have hooks that validate certain metadata? |
A place to get started: https://github.com/vsbuffalo/pathfindr (fork, branch, code, and be merry!) EDIT: the name is terrible, but it's a placeholder. |
Rather topic-specific, but in the Aroma Framework we use such "meta" data encoded in file and directory names, cf. http://aroma-project.org/docs/HowDataFilesAndDataSetsAreLocated/. It's been in place since 2006. It also supports on-the-fly reparsing, e.g. filtering out character sequences that carry no information, reordering, etc. The essence of it is in the R.filesets package. It's been a long-term wish to extend this to a generic framework based on regular expressions, but there's never been an urgent need for it. |
How about we use a different data format for genomic data that encodes the metadata within the file. :) I can't really see it happening, but I'm not really joking either. HDF5? |
We were talking about this with @jarrodmillman but I personally don't see that happening unless Illumina (and/or another big player) pushes for it. It would be cool, though... |
This reminds me of two other thoughts I've had in the past, related to what sort of larger package this functionality might fit in. [1] A package that implements some standard unix commands but gives the result in the most sensible R native format and anticipates piping them together with [2] It also feels like R needs a package for operating on files and paths. Google and the hotel wifi are keeping me from pointing to a great example from another language but I know such exist. Something that goes beyond |
E.g. https://docs.python.org/2/library/os.path.html in Python. node.js has a bunch of them, they are pretty basic. I guess this is what you need most of the time. E.g. https://www.npmjs.com/package/fs-extra Edit: also, http://cran.r-project.org/web/packages/pathological/index.html |
I started a direct port of os.path and will push that up to gh - it's mostly just going to be grunt work to get all the bits filled in, so if other people have a need we'd get a naive R version done pretty quickly. |
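For context on how much grunt work such a port involves: several `os.path` functions already have close counterparts in base R and the bundled `tools` package. A rough, non-exhaustive mapping is sketched below. The example path is made up, and the pairing with `os.path` names is only an approximate correspondence, not a description of the port mentioned above.

```r
# Made-up example path; file.path() joins components with "/" on all platforms
p <- file.path("data", "run1", "sampleA.fastq.gz")   # ~ os.path.join

base_name <- basename(p)                     # ~ os.path.basename -> "sampleA.fastq.gz"
dir_name  <- dirname(p)                      # ~ os.path.dirname  -> "data/run1"
ext       <- tools::file_ext(p)              # last extension only -> "gz"
stem      <- tools::file_path_sans_ext(p)    # ~ os.path.splitext(p)[0] -> "data/run1/sampleA.fastq"
exists    <- file.exists(p)                  # ~ os.path.exists
```

Note that `tools::file_ext()` only sees the last extension, so compound extensions such as `.fastq.gz` still need extra handling.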
@richfitz That sounds great. Happy to help fill in bits. |
Basically there should be a higher-level way to extract metadata from directory and filenames. For example:
There's a ton of metadata in this file that we should be able to quickly extract and work with (e.g. in a dataframe). The (initial) goals of this package are:
list.files()
that's way more powerful.
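To make the idea concrete, here is a minimal sketch of pulling metadata out of file names into a data frame using only base R regular expressions. Everything in it is an assumption for illustration: `extract_metadata()` is a hypothetical helper (not pathfindr's API), and the file names and the sample/lane/read naming scheme are invented.

```r
# Hypothetical helper: match each file name against a regular expression with
# capture groups and return one data frame row per file.
extract_metadata <- function(paths, pattern, fields) {
  files <- basename(paths)
  matches <- regmatches(files, regexec(pattern, files))
  rows <- lapply(matches, function(x) {
    # regexec() yields character(0) for a non-match; otherwise the first
    # element is the full match and the rest are the capture groups
    if (length(x) == 0) rep(NA_character_, length(fields)) else x[-1]
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- fields
  cbind(path = paths, out, stringsAsFactors = FALSE)
}

# Invented file names following an assumed <sample>_lane<N>_R<1|2>.fastq.gz scheme
files <- c("data/sampleA_lane1_R1.fastq.gz",
           "data/sampleA_lane1_R2.fastq.gz",
           "data/notes.txt")

meta <- extract_metadata(
  files,
  pattern = "^([^_]+)_lane([0-9]+)_R([12])\\.fastq\\.gz$",
  fields  = c("sample", "lane", "read")
)
```

In practice `files` would come from something like `list.files(dir, recursive = TRUE, full.names = TRUE)`. Files that do not match the pattern are kept with `NA` fields rather than dropped, so mismatches stay visible.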