Extract appropriate R objects from EML dataTable #6

Closed
cboettig opened this issue Jun 26, 2013 · 3 comments

@cboettig
Member

Given an EML file that defines a CSV and its metadata types, extract the R object information. This should allow the user to reconstruct the following R objects from the EML generated by #2:

 dat = data.frame(river = c("SAC", "SAC", "AM"),
                  spp = c("king", "king", "ccho"),
                  stg = c("smolt", "parr", "smolt"),
                  ct = c(293L, 410L, 210L))

with the following accompanying metadata:

 col_metadata = c(river = "http://dbpedia.org/ontology/River",
                  spp = "http://dbpedia.org/ontology/Species", 
                  stg = "Life history stage",
                  ct = "count")
 unit_metadata = 
  list(river = c(SAC = "The Sacramento River", AM = "The American River"),
       spp = c(king = "King Salmon", ccho = "Coho Salmon"),
       stg = c(parr = "third life stage", smolt = "fourth life stage"),
       ct = "number")

Ensure that all objects have the correct object type: e.g. (ordered) factors should be (ordered) factors, etc.
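
One way to check this is to coerce the columns after reading and then inspect their classes. The following is only a sketch in base R: `restore_types()` is a hypothetical helper, not a function in the package, and it assumes that a named character vector in `unit_metadata` marks a factor column. Treating `stg` as ordered is likewise an assumption made for illustration.

```r
# Hypothetical helper, not part of the package: coerce columns according to
# the unit metadata. Assumption: a named character vector of level
# definitions marks a factor; anything else is left as read.
restore_types <- function(df, unit_metadata, ordered = character(0)) {
  for (col in names(df)) {
    defs <- unit_metadata[[col]]
    if (!is.null(names(defs))) {
      # Level order follows the order of the definitions in the metadata
      df[[col]] <- factor(df[[col]], levels = names(defs),
                          ordered = col %in% ordered)
    }
  }
  df
}

# Data as it might come back from a plain CSV read: strings and integers only
raw <- data.frame(river = c("SAC", "SAC", "AM"),
                  spp = c("king", "king", "ccho"),
                  stg = c("smolt", "parr", "smolt"),
                  ct = c(293L, 410L, 210L),
                  stringsAsFactors = FALSE)
unit_metadata <- list(
  river = c(SAC = "The Sacramento River", AM = "The American River"),
  spp = c(king = "King Salmon", ccho = "Coho Salmon"),
  stg = c(parr = "third life stage", smolt = "fourth life stage"),
  ct = "number")

# Assumption for illustration: life history stage is an ordered factor
dat <- restore_types(raw, unit_metadata, ordered = "stg")
sapply(dat, function(x) class(x)[1])
```

Passing the ordered columns explicitly keeps the decision out of the helper; in practice that information would have to come from the EML itself.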

@cboettig
Member Author

eml_read should be able to handle local files, not just online files. The same applies to eml_write (#2).
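
A possible way to support both kinds of source is to decide on the connection type from the path. This is a sketch only: `is_url()` and `read_eml_text()` are hypothetical names, not functions in the package, and the set of URL schemes checked is an assumption.

```r
# Hypothetical helpers, not part of the package.
# Treat anything starting with http://, https:// or ftp:// as remote.
is_url <- function(path) {
  grepl("^(https?|ftp)://", path, ignore.case = TRUE)
}

# Return the EML text as one string, whether the source is a URL or a file
read_eml_text <- function(path) {
  con <- if (is_url(path)) url(path) else file(path)
  on.exit(close(con))
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Usage with a local file
tmp <- tempfile(fileext = ".xml")
writeLines("<eml/>", tmp)
read_eml_text(tmp)
```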

@cboettig
Member Author

Bug in dateTime parsing:

out <- eml_read("http://harvardforest.fas.harvard.edu/data/eml/hf205.xml")
head(out$dataframe)
out$col_metadata
out$unit_metadata

In other reading checks, note that this EML file also arbitrarily declares that the data has 9999 rows, whereas the data frame that is read in actually has

dim(out$dataframe)[1]
[1] 279224

Issues and next steps

Of course eml_read is still a work in progress. In particular:

  • How do we deal with dateTime when defined across multiple columns?

(Also note that eml_read still needs a dateTime parser on the unit_metadata.)
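
For the multi-column case, one option is to paste the pieces together and parse them once. The sketch below uses made-up example data and a hypothetical `combine_datetime()` helper; the fixed format string and the UTC time zone are assumptions, whereas a real implementation would have to take the format from the EML metadata.

```r
# Hypothetical example data: the date and time of each observation are
# stored in separate columns.
df <- data.frame(date = c("2013-06-26", "2013-07-01"),
                 time = c("14:30:00", "09:05:00"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: join the two pieces and parse them in one step.
# The format string and time zone are assumptions for this example.
combine_datetime <- function(date, time, tz = "UTC") {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M:%S", tz = tz)
}

df$datetime <- combine_datetime(df$date, df$time)
class(df$datetime)
```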

@cboettig
Member Author

  • dateTime bug fixed in 8b7ffee by coercing to character. Remaining dateTime issues are now tracked in "Dealing with dateTimes: when dates are defined over multiple columns" (#17).
  • The Harvard Forest hf205.xml example still does some weird things, e.g. the run id numbers (1-6) are encoded as character strings, where an ordered factor might have been more appropriate. Guess we should just obey the metadata we are given...
  • The basic example proposed at the top of this issue does not demonstrate different encodings well, e.g. it uses only a trivial ratio unit (count) and no ordered factors or characters...
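
To make the run id point concrete, here is a small sketch of the two encodings in base R. The values are invented for illustration and are not taken from hf205.xml.

```r
# Hypothetical run ids, as they arrive when the metadata declares them
# to be character strings
run <- c("3", "1", "6", "2")

# The ordered-factor encoding that might have been more appropriate:
# all six levels are recorded, along with their order
run_ord <- factor(run, levels = as.character(1:6), ordered = TRUE)

class(run)
class(run_ord)
levels(run_ord)
```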

@ghost ghost assigned cboettig Jul 1, 2013
cboettig added a commit that referenced this issue Feb 26, 2016