Use Case: Research Dataset #11
Comments
Is "Organized by Hierarchy" an implementation detail? Can you provide the motivation for organizing the files into this structure? Is this structure a standard for all research datasets? Can you elaborate on "X/Y/Z axis"? An example project structure would be enlightening.
Organized by hierarchy is an implementation detail, but one that needs support from the model. I don't think these conventions are common enough to warrant modeling -- I'm happy to have generic Object/Component/File classes and use descriptive metadata to describe the hierarchy. Here's an example component hierarchy:
Each of the lowest-level components would have a file attached.
@escowles are you saying "X/Y/Z" in the sense that you have a 3D hierarchy with 1:N relationships to each so as to encompass all possible implementation needs? Did that question even make sense?
@awead Yes, the dataset in question has spatial data and visualizations of it in 3 axes -- we model that in the hierarchy above. But I want to note that I'm not suggesting that we create Ruby classes to model spatial data. We use generic Component classes with titles like "Visualizations of X-axis", "Visualizations of Y-axis", etc. to label the containers of files.
@escowles So if I follow, each axis is an instance of the abstract Component class? I'm using the term "instance" loosely here.
@awead Yes, each axis would be an instance of the Component class, and contain one or more files that were visualizations in that axis. So "Visualizations of X-axis" would be a Component containing the Components "X-axis file 1" and "X-axis file 2". The Component "X-axis file 1" would contain Files of source image (e.g., high-res TIFF), a thumbnail JPEG, etc.
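A minimal plain-Ruby sketch of the hierarchy described above, assuming generic classes like the ones proposed here. `Component`, `FileRef`, `add_child`, `attach`, and `each_file_with_path` are hypothetical names for illustration, not the Hydra::Works or ActiveFedora API.

```ruby
# Hypothetical plain-Ruby stand-ins for generic Component/File classes.
# Not the Hydra::Works or ActiveFedora API.
FileRef = Struct.new(:name, :mime_type)

class Component
  attr_reader :title, :children, :files

  def initialize(title)
    @title = title
    @children = [] # child Components
    @files = []    # Files attached directly to this Component
  end

  def add_child(component)
    @children << component
    component
  end

  def attach(file)
    @files << file
    file
  end

  # Depth-first walk yielding [path, file] for every attached file.
  def each_file_with_path(prefix = [], &block)
    path = prefix + [title]
    files.each { |f| block.call(path.join(" / "), f) }
    children.each { |c| c.each_file_with_path(path, &block) }
  end
end

# The "Visualizations of X-axis" branch of the example hierarchy.
x_axis = Component.new("Visualizations of X-axis")
["X-axis file 1", "X-axis file 2"].each do |label|
  leaf = x_axis.add_child(Component.new(label))
  leaf.attach(FileRef.new("#{label}.tif", "image/tiff"))       # high-res source
  leaf.attach(FileRef.new("#{label}-thumb.jpg", "image/jpeg")) # thumbnail
end

x_axis.each_file_with_path { |path, f| puts "#{path}: #{f.name}" }
```

Files are attached only to the lowest-level Components here, as in the example, but the class does not enforce that; whether it should is the Node/Leaf question raised in the following comments.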
+1 to this use case, and +1 to the overall model (give or take a 0.1 detail here and there) 👍
@awead if it's helpful, here's an example of what @escowles has described in our current system: http://library.ucsd.edu/dc/object/bb2322141x
Can we create a sub-component off of the lowest-level component that has a file? In other words, are we saying that we can have files attached at any level of the graph? Or are we stating that we have a Node and Leaf construct, where once something is a Leaf it can't become a Node?
@jeremyf I think the model should support nodes with both child nodes and files. At UCSD we don't typically do that, but it seems like a good idea to support it. If we were modelling a filesystem, for example, that would allow both subdirectories and files.
+1 to any level being able to have associated bitstreams
👍 @escowles excellent, it's different than what we've been working from, but that is an implementation detail (it's easier to hide the ability to add a bitstream to a Curate::Work than it would be to add that functionality)
Can someone help translate @escowles's issue into a pull request? Someone with an ICLA?
I'm seeing some enthusiasm about @escowles's model. And he said:
Does this mean we're good with the Work (holds descMD) -> GenericFile (holds bitstream, and optionally holds its own file-specific descMD) model, since I believe @escowles has said that maps well to his model? If so, that seems like it'd bring together Sufia, Worthwhile, Curate, and UCSD, and possibly a bunch more of us without introducing a bunch of new concerns and concepts.
I think this is existing functionality, but want to confirm: Sufia & Worthwhile GenericFiles can link to each other, right? That's the one thing we'd need to encode a hierarchy with a flat set of GenericFiles.
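One way a hierarchy could be encoded in a flat set of GenericFiles, assuming each file can carry a link to its parent. `GenericFileRecord`, `parent_id`, and `build_tree` are hypothetical; whether existing GenericFiles support such links is the open question in this comment.

```ruby
# Hypothetical flat records: each carries an optional link to its parent.
GenericFileRecord = Struct.new(:id, :title, :parent_id)

# Rebuild a nested tree from the flat list by following parent links.
# Records with a nil parent_id are treated as roots.
def build_tree(records, parent_id = nil, by_parent = records.group_by(&:parent_id))
  (by_parent[parent_id] || []).map do |rec|
    { id: rec.id, title: rec.title, children: build_tree(records, rec.id, by_parent) }
  end
end

records = [
  GenericFileRecord.new("v", "Visualization images", nil),
  GenericFileRecord.new("x", "Visualizations of X-axis", "v"),
  GenericFileRecord.new("x1", "X-axis file 1", "x"),
  GenericFileRecord.new("x2", "X-axis file 2", "x"),
  GenericFileRecord.new("r", "Raw data files", nil)
]

tree = build_tree(records)
```

The hierarchy lives entirely in the links, so any code that needs the tree has to reassemble it; a real repository would also need to guard against cycles, which this sketch does not.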
Sorry, I'm catching up, but what does
mean?
Are you saying that a Work couldn't contain a Work? If so, I'd be 😞 if after all this we find ourselves back at requiring METS/MODS/FOXML/EAD/whatever (including an RDF rendition thereof) to impose order rather than letting the model itself reflect relationships that are idiomatic to the constituent parts (streams, constituent models) that make up the object. Other/alternative orders or hierarchies, sure, to me that's what descriptive metadata is for, but if there are relationships that are integral to the parts that comprise the whole, I think they should be reflected in the model itself. Apologies if I'm misreading.
@escowles I'm not sure we've built out the capability to make those links -- unless this is what the recently excised Worthwhile LinkedResources do -- but IMO that is "a small ask" and a reasonable addition to the functionality we already have if that is the cost of making our current model UCSD-compliant!
@jpstroop I'm not saying that it's inconceivable that a Work contain another Work -- I'm just not sure how many of our IR-like use cases require this functionality currently. As far as the first phase of Hydra::Works goes, I'm in favor of restricting the scope to IR-like use cases and solving for the 80%. So if we have those use cases, and they seem like commonly needed use cases, let's include Works containing Works in the initial model. If not, I might suggest we defer to the next phase, once we've got a common model for the Sufia/Worthwhile/Curate apps out there.
(Alternatively, I think it may also be OK to allow this (Works containing Works) in the model we develop if we also provide some guidance on how implementations like ScholarSphere might avoid/ignore/hide/disallow this complexity.)
If the intent is only to solve the very basic case of a single list of files associated in an unordered set, please let's rename it far, far away from Work or any of the other terms that imply there's a data model behind it. I suggest Hydra::BasicGroupOfFiles
Well...I have a PR in for one use case (or four, depending on how you look at it), all of which we've made a dog's dinner of w/ METS (valid XML != good modeling; I could show you but it would burn your eyes. 🔥 😎).
Absolutely! @jcoyne said the same thing here. Maybe there's a Hydra::IRWork model that extends Hydra::Work to include validations (or whatever the best approach is) to keep it from ever including a Work.
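A plain-Ruby sketch of the IRWork idea: a general Work that may contain other Works, and a subclass whose validity check rejects nested Works. `Work`, `IRWork`, `add_member`, and `errors` here are hypothetical illustrations, not Hydra::Works classes or ActiveModel validations.

```ruby
# Hypothetical base class: a Work may contain files and, in general, other Works.
class Work
  attr_reader :title, :members

  def initialize(title)
    @title = title
    @members = []
  end

  def add_member(member)
    @members << member
    self
  end

  def valid?
    errors.empty?
  end

  # The general Work places no restriction on its members.
  def errors
    []
  end
end

# Hypothetical IR-style subclass that forbids nested Works (including IRWorks).
class IRWork < Work
  def errors
    nested = members.select { |m| m.is_a?(Work) }
    return [] if nested.empty?

    ["an IRWork cannot contain another Work (#{nested.map(&:title).join(', ')})"]
  end
end

FileStub = Struct.new(:name)

article = IRWork.new("Article").add_member(FileStub.new("article.pdf"))
book = Work.new("Book").add_member(Work.new("Chapter 1"))
thesis = IRWork.new("Thesis").add_member(Work.new("Appendix"))

puts article.valid? # => true
puts book.valid?    # => true
puts thesis.valid?  # => false
```

Putting the restriction in a subclass keeps the general model able to express Works containing Works, while IR-style apps like ScholarSphere can opt into the narrower rule.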
I don't think DigitalObjectSlashWorkSlashWhatever -> GenericFile -> bitstream is just a single unordered set of files.
IMHO, this is not just simpler than having infinite recursion of GenericFiles/Components; it's also more flexible, since it can express relationships other than containment.
@azaroth42 I was thinking the intent, based on what I was hearing at the Sufia Futures discussion on Friday, was to come up with a model that can underlie Hydrus-based, Sufia-based, Worthwhile-based, and Curate-based apps. Whether we call it a Work or a BasicGroupOfFiles or an IRWork, how good a fit do you judge this to be for Hydrus's needs?
What @escowles said was more articulate and more succinct than what I was saying.
Would the DigitalObjectSlashWorkSlashWhatever -> GenericFile -> bitstream approach mean you couldn't use the AF API to manage those 'more flexible' relationships?
I would need to defer to other Stanford folk on the appropriateness for Hydrus. Once there's a proposal, I'm happy to take it back and discuss with them :)
@azaroth42 Fair enough! @jpstroop I would think those relationships would be manageable via AF but I defer to folks whose heads are in the code more frequently than mine. @escowles @jcoyne etc.
@escowles, 👍 to ordered list ontology. I'm also assuming that these aspects would be baked into the model but easy to ignore if you weren't worried about order. Also, might sort fields be implementation-specific? @mjgiarlo yeah, this shouldn't be hard to map, although we'll mint a bunch more pids to create the additional "works" for each existing GenericFile.
@awead If we decide to make use of the Batch objects we already have in the system such that every Batch of GenericFiles is a Work -- not saying we should, but it's one migration decision we could make -- we may also need to create Components to hold descriptive metadata about Files. (Since in the @escowles model, a File object cannot hold descMD.) Still pretty easy to map.
@awead I'm not sure about the mechanics, but I've heard use cases in #17 and #18 for both Sets (unordered, non-duplicated) and Arrays (ordered, duplication allowed). So maybe the default collection is one of those, and there's a subclass that overrides the members to use the other. So if you don't care about ordering, you'd just need to use the right class and then the members would be an unordered set.
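A sketch of the default-plus-override idea, using Ruby's stdlib `Set` for unordered, duplicate-free membership and `Array` for ordered membership with duplicates. `GenericCollectionSketch`, `OrderedCollectionSketch`, and `new_member_store` are hypothetical names; which of the two should be the default is left open in the thread.

```ruby
require "set"

# Hypothetical default collection: members form a Set (no duplicates).
class GenericCollectionSketch
  def initialize
    @members = new_member_store
  end

  def add(member)
    @members << member
    self
  end

  def members
    @members.to_a
  end

  private

  # Subclasses override only this hook to change membership semantics.
  def new_member_store
    Set.new
  end
end

# Hypothetical subclass: ordered members, duplicates allowed.
class OrderedCollectionSketch < GenericCollectionSketch
  private

  def new_member_store
    []
  end
end

set_like = GenericCollectionSketch.new.add("page1").add("page2").add("page1")
list_like = OrderedCollectionSketch.new.add("page1").add("page2").add("page1")

p set_like.members.size # => 2
p list_like.members     # => ["page1", "page2", "page1"]
```

Because only the storage hook differs, code that just iterates `members` works with either class; only callers that care about order or duplicates need to pick the subclass.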
I completely agree with this; however, we often have administrative documents that we stash with collections, and I'd like to be able to know which collection those go with--like a canonical collection that a Wortem has_one of. Maybe this suggests that we need a Projgroup (@escowles could probably think of a better name 😄) model that can hold those. Is this what Stanford folks (and maybe others) call an APO? This may be out of scope for this discussion and something we'd just need to refine locally, but I thought I'd mention it.
We'd like to move away from the current conflation between APO (as a permissions holding thing) and Collection (as a structural thing).
+1 on @awead's assertions. I am still interested in this notion of a LinkedResource and whether it is descriptive metadata or an object inside of a Work.
Updated diagram per agreement on nomenclature in #8:
@escowles GenericWork -> GenericComponent and GenericComponent -> GenericComponent are 0:m, no?
I interpreted 1:m to mean that GenericComponents have one and only one GenericWork, and GenericWorks have many (zero, one, or more) GenericComponents. @jpstroop @escowles I was thinking of Rails has_many here: you include a has_many assertion in your model, but that doesn't mean your object doesn't validate if you don't have one, right?
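A plain-Ruby sketch of the cardinalities as interpreted in this comment: a GenericWork may have zero or more GenericComponents, while each GenericComponent needs exactly one GenericWork. The classes are hypothetical stand-ins, not Rails associations, though the "many" side mirrors how a Rails `has_many` declaration imposes no minimum count.

```ruby
# Hypothetical "one" side: any number of components, including none.
class GenericWorkSketch
  attr_reader :components

  def initialize
    @components = [] # an empty list is still valid
  end

  def valid?
    true # no minimum count on the "many" side
  end
end

# Hypothetical "many" side: must point at exactly one work.
class GenericComponentSketch
  attr_reader :work

  def initialize(work)
    @work = work
    work.components << self if work # register with the parent, if any
  end

  def valid?
    !work.nil?
  end
end

work_without_components = GenericWorkSketch.new
work = GenericWorkSketch.new
component = GenericComponentSketch.new(work)
orphan = GenericComponentSketch.new(nil)
```

Under this reading the relationship is 1:0..m: an empty Work validates, an orphaned Component does not.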
Me too, I think:
Also:
Yup.
For the use case of a collection having a Thumbnail, which may not be a derivative of any particular page but instead a filmstrip of multiple pages ... would that require GenericCollection to have at least one GenericFile?
My instinct is to leave that up to extending GenericCollection...things start to look too similar otherwise, and assuming the image is somewhere else in your GenericCollection graph, you might store a pointer instead anyway, right?
👍 to @jpstroop
Could the descriptive metadata of the collection include an assertion that covers this? E.g., maybe I create a new GenericComponent with a GenericFile (xyz123) that is the filmstrip derivative, and then assert collection_id :hasRepresentativeImage xyz123. (That was very rough and simplistic but I think you catch my 💨 )
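A rough sketch of the assertion above written out as a single N-Triples statement. The base URIs and the `ex:` namespace for `hasRepresentativeImage` are placeholders, and `Triple` and `representative_image` are hypothetical helpers; `collection_id` and `xyz123` are the example identifiers from the comment.

```ruby
# Placeholder base URIs; a real repository would use its own.
BASE = "http://example.org/objects/"
EX = "http://example.org/ns#"

Triple = Struct.new(:subject, :predicate, :object) do
  # Serialize as one N-Triples line (all three terms are IRIs here).
  def to_ntriples
    "<#{subject}> <#{predicate}> <#{object}> ."
  end
end

# The collection's descriptive metadata points at the GenericFile
# that holds the filmstrip derivative.
def representative_image(collection_id, file_id)
  Triple.new(BASE + collection_id, EX + "hasRepresentativeImage", BASE + file_id)
end

puts representative_image("collection_id", "xyz123").to_ntriples
# <http://example.org/objects/collection_id> <http://example.org/ns#hasRepresentativeImage> <http://example.org/objects/xyz123> .
```

The collection itself holds no file here; it only carries a typed pointer, which is the trade-off debated in the next comment.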
Then Collections would need to contain Components as well as Works? The filmstrip isn't a Work /within/ the collection, it's a derivative created from the member Works. So long as it's not prevented, then fine, it can be a NotSoGenericCollection, but just throwing it out there.
I'm glad you threw it out there, @azaroth42. So, for @escowles et al., the diagram does not connect GenericCollection with a GenericFile. Should we interpret that as Hydra::Works asserting that a GenericCollection can not contain a GenericFile, or that Hydra::Works remains silent on everything about GenericCollections except that GenericWorks may have a many-to-many relationship with them? Or something else?
Without wanting to get too specific about implementation, it seems like having a GenericFile is a concern that could be mixed in. GCollection wouldn't do it, but RobCollection < GCollection might, and then at least the code for the concern would be reusable and behave in an expected way.
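A sketch of the mixin idea in plain Ruby. `GCollection` and `RobCollection` come from the comment; `HasGenericFiles`, `generic_files`, and `attach_file` are hypothetical. In a Rails app this would more likely be written as an `ActiveSupport::Concern`; a plain module with `include` keeps the example stdlib-only.

```ruby
# Hypothetical reusable mixin: gives any class a list of attached files.
module HasGenericFiles
  def generic_files
    @generic_files ||= []
  end

  def attach_file(file)
    generic_files << file
    self
  end
end

class GCollection
  attr_reader :title

  def initialize(title)
    @title = title
  end
end

# Only the subclass opts in; the base collection stays file-free.
class RobCollection < GCollection
  include HasGenericFiles
end

plain = GCollection.new("Plain")
rob = RobCollection.new("With thumbnail").attach_file("filmstrip.jpg")
```

The base class says nothing about files, and any other class that needs the same behavior can include the same module.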
🚧 it! (Sorry, didn't find a shovel emoji. ;) )
I definitely agree with @jpstroop that you could extend GenericCollection to add whatever links to GenericFiles or whatever you wanted. But having a preview image seems like a common-enough use case that we should at least try to come up with a standard way of doing it. In our discussions at UCSD, we had planned on making a subproperty of dc:relation called something like ucsd:thumbnail that would link to the thumbnail image URL (which could be a repository URL, or could be an image on a generic webserver, depending on the collection). Either way, this seems related to @scherztc's RelatedResource to me -- basically a typed link to a URL. We had a similar data structure in our old data model, but decided to simplify that to about 10 predicates, since we found that all of our related resources boiled down to those.
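A sketch of typed links where `ucsd:thumbnail` is declared a subproperty of `dc:relation`, so a lookup for `dc:relation` also returns thumbnail links. In RDF this is what `rdfs:subPropertyOf` expresses; here the inference is done by hand over a hypothetical lookup table. `Link`, `SUBPROPERTY_OF`, `links_for`, and the URLs are placeholders.

```ruby
# A typed link from an object to a URL; predicates are CURIE strings.
Link = Struct.new(:predicate, :url)

# Hypothetical vocabulary: child predicate => parent predicate.
SUBPROPERTY_OF = {
  "ucsd:thumbnail" => "dc:relation"
}.freeze

# True if predicate equals target or is (transitively) a subproperty of it.
# Assumes the table has no cycles.
def subproperty?(predicate, target)
  current = predicate
  while current
    return true if current == target

    current = SUBPROPERTY_OF[current]
  end
  false
end

def links_for(links, target)
  links.select { |l| subproperty?(l.predicate, target) }
end

links = [
  Link.new("ucsd:thumbnail", "http://example.org/images/collection-thumb.jpg"),
  Link.new("dc:relation", "http://example.org/related/finding-aid"),
  Link.new("dc:source", "http://example.org/source")
]

p links_for(links, "dc:relation").map(&:url)
```

Generic consumers can ask for any `dc:relation`, while a display layer that knows about `ucsd:thumbnail` can ask for just the preview image.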
+1 on these relationships: A GenericWork has 0..n GenericFiles. This would cover the RelatedResource option for a GenericWork. Here is our code that we used in Curate to cover the preview image on a GenericCollection:
Getting back to @jpstroop's comment about special collection-type objects where a Work can be related to only one of them: We have this at UCSD too. In our old data model, we had a special class for this, and used it for the top-level browse, driving access control groups, etc. Our plan was to get rid of them in our new data model and just use GenericCollection for representing them. Maybe we could create a GenericCollection subclass called AdminCollection where each Work belongs_to one AdminCollection? Does anybody else have this kind of relationship?
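A plain-Ruby sketch of the AdminCollection idea: a Work can appear in any number of ordinary collections but belongs to exactly one AdminCollection. `AdminCollection` is the name floated in this comment; `Collection`, `ManagedWork`, and `add_to` are hypothetical, and in a Rails app the single-owner side would more naturally be a `belongs_to`.

```ruby
# Hypothetical ordinary collection: many-to-many with Works.
class Collection
  attr_reader :title, :works

  def initialize(title)
    @title = title
    @works = []
  end
end

# Special collection type; each Work has exactly one of these.
class AdminCollection < Collection; end

class ManagedWork
  attr_reader :title, :admin_collection, :collections

  def initialize(title, admin_collection)
    unless admin_collection.is_a?(AdminCollection)
      raise ArgumentError, "admin_collection must be an AdminCollection"
    end

    @title = title
    @admin_collection = admin_collection
    @collections = []
    admin_collection.works << self
  end

  # Ordinary collections: no limit on how many a Work joins.
  def add_to(collection)
    if collection.is_a?(AdminCollection)
      raise ArgumentError, "the AdminCollection is set once, in the constructor"
    end

    collections << collection
    collection.works << self
    self
  end
end

special = AdminCollection.new("Special Collections")
maps = Collection.new("Maps")
exhibit = Collection.new("Exhibit")
work = ManagedWork.new("Survey map", special).add_to(maps).add_to(exhibit)
```

The single-owner relationship is what would drive things like the top-level browse and access control groups, while ordinary membership stays unconstrained.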
Your AdminCollection idea sounds pretty similar to the notion of Administrative Sets described here: https://wiki.duraspace.org/display/hydra/Collections%2C+Admin+Sets%2C+Display+Sets Also to the notion of an APO which is written about here: https://wiki.duraspace.org/pages/viewpage.action?pageId=64325483
Did the lessons & ideas from this thread make it into some other documentation or specs? Can we close this ticket?
Yes, I think the info in this thread informed the discussions in Portland and the subsequent documentation, so this ticket can be closed.
A research dataset containing a set of files organized into top-level categories of preparatory materials, raw data files, statistics, and visualization images, with multiple files in each category. The visualization images are further organized in a hierarchy by type, and then by X/Y/Z axis.