Extract & generate metadata from data objects (e.g. spatial sp, raster, etc) #144
Comments
Sorry about that, quick answer first:
The trick here is that EML permits multiple geographicCoverage elements, so We need a better way to document & let the user discover when an element is a list, but for now, try this. If you use RStudio or have tab-completion enabled in your editor, you can start typing the I know that's not elegant, we'll try and think of a better way (maybe with a higher-level Re Sorry this is confusing, your questions are great. You are just a bit ahead of me still, I'm hoping to get some finished examples and real documentation up soon, and then work out some more Thanks again! |
And just a quick note that many sites and people use the repeating geographic, temporal, and taxonomic coverage elements in EML, so it's great to have this support in the R package now. It's common for people to give bounding boxes for all of their sampling sites, as well as specific temporal ranges for when they did sampling, rather than overall boxes, especially when they only sampled a small part of the larger region. |
ok thank you, @cboettig for your patience in answering these questions. the .Data part when using tab complete was throwing me off. So this means the information is stored / accessed via a list. Re: attributeList - i was confused. It makes perfect sense that it defines data table units. I thought for some reason it was a means to index eml elements. My mistake! |
+1 on @mbjones's comment above as well! NEON is VERY likely to have those use cases. |
ok - to clarify - to grab the actual coordinate value, I need:
I'm attempting to demonstrate how EML can be ingested and used in an automated workflow, so I need the coordinate values. |
@lwasser yup that'll work. If you don't like eml_HARV@dataset@coverage@geographicCoverage[[1]]@boundingCoordinates@westBoundingCoordinate[[1]] It might be nice for us to hear/think more about the big picture workflow you have in mind too. While a user can always subset in this way, there may be a role for a helper function that could, say, extract all the bounding boxes out of a coverage node like @mbjones describes and return the information in a concise data.frame or |
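The helper idea above (pull every bounding box out of a coverage node and return a concise data.frame) could look roughly like the sketch below. This is only an illustration: the two S4 classes are hypothetical stand-ins that mirror the slot names used in this thread plus the standard EML element names, not the EML package's actual class definitions, and `bounding_boxes` is a made-up function name. The coordinates are invented.

```r
library(methods)

# Hypothetical stand-in classes; the real EML package defines its own,
# richer S4 classes. Only the slots discussed in this thread are mirrored.
setClass("boundingCoordinates", representation(
  westBoundingCoordinate  = "character",
  eastBoundingCoordinate  = "character",
  northBoundingCoordinate = "character",
  southBoundingCoordinate = "character"
))
setClass("geographicCoverage", representation(
  geographicDescription = "character",
  boundingCoordinates   = "boundingCoordinates"
))

# Hypothetical helper: one row per geographicCoverage element,
# with coordinates converted from character to numeric.
bounding_boxes <- function(geo_list) {
  rows <- lapply(geo_list, function(g) {
    b <- g@boundingCoordinates
    data.frame(
      description = g@geographicDescription,
      west  = as.numeric(b@westBoundingCoordinate),
      east  = as.numeric(b@eastBoundingCoordinate),
      north = as.numeric(b@northBoundingCoordinate),
      south = as.numeric(b@southBoundingCoordinate),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up coordinates, for illustration only.
site <- new("geographicCoverage",
  geographicDescription = "example site",
  boundingCoordinates = new("boundingCoordinates",
    westBoundingCoordinate  = "-72.5",
    eastBoundingCoordinate  = "-72.0",
    northBoundingCoordinate = "42.6",
    southBoundingCoordinate = "42.4"))

boxes <- bounding_boxes(list(site, site))
```

With a real document, the list passed in would presumably be the `geographicCoverage` slot of the coverage node, since EML allows that element to repeat.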
@cboettig Great - Thank you! I'm still sorting that out but I have some ideas! The first use case that I came up with was wanting to quickly create a base map of the site location, pulling from an EML file. This would be a part of early site exploration, where you collect a bunch of base files and need to look at things spatially. In this case, I wanted to plot the extent of the site. So if there are several coverage elements, it would be good to be able to extract either the x,y point or the x,y extent box, convert it to a number with as.numeric, and then plot it or use this information for something else. Here is an example in this lesson of that use case where I created a map. http://neoninc.github.io/NEON-Data-Skills-Development/R/why-metadata-are-important/ If the data were spatial, and in .asc or some other format like H5 where the extent may or may not come in automatically, I might use that numeric information to spatially "place" the grid itself! Coverage seems like an important element (of course I am biased, being a spatial science type :) ) |
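As a rough sketch of the "convert with as.numeric, then plot the extent" step described above: the function name `extent_ring` is hypothetical, the coordinates are made up, and the code assumes the box does not cross the antimeridian (so west is not greater than east).

```r
# Hypothetical helper: turn character bounding coordinates into a closed
# ring of corner points (the first corner is repeated at the end).
# Assumes west <= east, i.e. the box does not cross the antimeridian.
extent_ring <- function(west, east, south, north) {
  w <- as.numeric(west)
  e <- as.numeric(east)
  s <- as.numeric(south)
  n <- as.numeric(north)
  stopifnot(!anyNA(c(w, e, s, n)), w <= e, s <= n)
  data.frame(x = c(w, e, e, w, w), y = c(s, s, n, n, s))
}

# Made-up coordinates, supplied as character strings the way they
# would come out of a metadata document.
ring <- extent_ring("-72.5", "-72.0", "42.4", "42.6")

# Draw the outline only in an interactive session.
if (interactive()) {
  plot(ring$x, ring$y, type = "l", asp = 1,
       xlab = "Longitude", ylab = "Latitude")
}
```

The same numeric corners could be handed to a spatial package to build a proper extent object; base graphics are used here only to keep the sketch dependency-free.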
@cboettig @lwasser It strikes me that, once the core part of this package is done, it might be really useful to have a little mini-hackathon where EML/R users could propose use cases, and we could review and code solutions to make those use cases for metadata creation straightforward. This would probably really rapidly advance the cause. Towards that end, maybe we should start a use case markdown document, or maybe collect issues that are labeled as 'use_case'. |
I think that's a great idea! It would also allow you to ID the more common use cases of interest to the community! :) |
Definitely! Using the issues tracker for this sounds good to me. This also seems like an area where having some good examples of what can be done and how could be essential (a la the Henry Ford quote, "ask people what they want and they say 'faster horses'"). @mbjones I really like the idea of a hackathon as a way to bridge that gap in connecting what is possible to what needs doing. @lwasser Thanks for the description; that's definitely helpful. Like you say, this highlights an interesting line between what is "data" (read: files described by but external to EML) and what is "metadata" (read: inside an EML doc itself). For instance, the use case: "I have this data file and I want to visualize its extent" is probably a common one, but it may not be obvious what role EML plays in this picture. Best to leverage a standard spatial data format and a standard visualization tool for that data than re-invent the wheel. If we can just automate the map between the EML representation of this information (i.e. in Still, to me the use cases for EML that really shine happen only once we start considering more than one EML file at a time; particularly EML files documenting very different kinds of data. Consider the example @mbjones & team have built with the KNB data repository, where a user can search for a data term and see where on the map the available data comes from; or search for all data files falling within a particular region. I think this kind of use case really highlights the advantage of having, say, spatial coverage described in EML rather than only available in specialized data files, even if those files are generally more standard and compatible with existing visualization & other tools. So I am interested in developing use cases that show this kind of application over a whole repository of data, even when the underlying information described in each of the data files may be very different. Anyway, just brainstorming use cases here. |
This is cool. Let me know how I can help in the brainstorming!
I will be largely offline / fairly busy for the next two weeks (teaching our spatial lessons in Norway next week!) but then I'll be back online and able to provide input. To me, the ability to pull out things like "scale factor", spatial extent, etc. as usable values is a key use case! And then, of course, over multiple files / datasets.
How that is implemented - I leave that up to the pros - i.e. You guys :)
|
I like where this is going. I want to voice support for using R to generate EML from the data, leveraging from I strongly feel that unlike in Another often found case was when there were errors in what they provided for the geographicCoverage, and the end result was EML spatial references on the portal that were nowhere near the data. I feel the need to strongly emphasize automatically extracting metadata from the data.
morpho_bounding_box <- function(x){
  ## TODO Check if spatial obj and proj4string is valid first
  bb <- x@bbox
  # TODO the following is only for southern hemisphere (Oz)
  loc <- data.frame(
    rbind(
      c(NA, round(abs(bb[2,2]), 5), NA),
      c(round(bb[1,1], 5), NA, round(bb[1,2], 5)),
      c(NA, round(abs(bb[2,1]), 5), NA)
    )
  )
  # make something to print in the shape Morpho wants it
  loc$X2[c(1,3)] <- sprintf("%s S", abs(loc$X2[c(1,3)]))
  loc[2,c(1,3)] <- sprintf("%s E", abs(loc[2,c(1,3)]))
  return(loc)
}
HTH |
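The TODO in the function above notes that the "S" and "E" suffixes are hard-coded for Australia. One possible way to lift that restriction is sketched below; `hemisphere_label` is a hypothetical helper, not part of any package, and the bounding-box values are made up.

```r
# Hypothetical helper: label a signed decimal-degree value with its
# hemisphere instead of hard-coding "S" and "E".
hemisphere_label <- function(value, axis = c("lat", "lon"), digits = 5) {
  axis <- match.arg(axis)
  suffix <- if (axis == "lat") {
    ifelse(value < 0, "S", "N")
  } else {
    ifelse(value < 0, "W", "E")
  }
  sprintf("%s %s", round(abs(value), digits), suffix)
}

# A 2 x 2 matrix laid out like the bbox slot used above:
# rows are x and y, columns are min and max. Values are made up.
bb <- matrix(c(150.1, -34.2, 151.3, -33.5), nrow = 2,
             dimnames = list(c("x", "y"), c("min", "max")))

north <- hemisphere_label(bb["y", "max"], "lat")
west  <- hemisphere_label(bb["x", "min"], "lon")
```

The labels could then be dropped into the same three-row layout that `morpho_bounding_box` builds.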
@mbjones re #144 (comment) |
The Aussie ropensci unconference is in two weeks and led to this idea re attributes for functions/data that would be "retrievable as first-class objects via some method, or printable" ropensci/auunconf#18 (comment). I wonder if you have thoughts on that? I am not sure if this is compatible with the EML package approach. I won't be able to attend Brisbane in person but instead will try to engage remotely and set aside the two days to work on implementing EML functions into the public health observatory at my university. |
merging this into issue #150 |
closing in favor of 150 |
Hey @cboettig - I'm trying to wrap up a lesson and think you probably know the quick fix to this. I am still confused about accessing slots. I have created a new, smaller EML file
previously i could access the x,y values using
You removed the resource group component but now this still isn't working.
I have tried get attributes however i'm a bit confused as to how to index that properly to grab an x,y location.
What am I doing wrong?
Thank you!