Start planning on how to structure the CSV to account for complex objects #56

Closed
mjordan opened this issue Aug 19, 2019 · 7 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@mjordan
Owner

mjordan commented Aug 19, 2019

The paged content sprint for Islandora 8 is in the first half of September. Workbench should be ready to ingest paged and compound objects soon after the sprint is done. The sprint will determine how the relationship between the parent and its children (and the ordering of children) will be instantiated, but given the discussion so far, it will likely be via one or more node fields on each child pointing to the parent.

Assuming that the parent<->child relationship will be expressed in node fields, Workbench can handle populating those fields if it can translate that relationship from the structure of the metadata CSV file. For example, a structure like this for non-complex images:

id,file,title,description
001,image1.jpg,Title 1,First image
002,image2.jpg,Title 2,Second image

could be expanded to something like this for compound objects, where the parent_id field contains the id of the object's parent (and is empty for the parents themselves):

id,parent_id,file,title,description
001,,,Postcard 1,The first postcard
003,001,front.jpg,Front of postcard 1,The first postcard's front
004,001,back.jpg,Back of postcard 1,The first postcard's back
002,,,Postcard 2,The second postcard
006,002,front2.jpg,Front of postcard 2,The second postcard's front
007,002,back2.jpg,Back of postcard 2,The second postcard's back

An important ability will be to include parent and child-level metadata in the same CSV file, so they can be ingested during the same task.
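As a rough illustration of how that single-file structure could be read, here is a minimal Python sketch that splits the rows into parents and children using the parent_id column. The function name `group_by_parent` is hypothetical and is not part of Workbench; the sketch only assumes that an empty parent_id marks a parent row.

```python
import csv
import io
from collections import defaultdict

# The compound-object CSV from the example above.
CSV_TEXT = """id,parent_id,file,title,description
001,,,Postcard 1,The first postcard
003,001,front.jpg,Front of postcard 1,The first postcard's front
004,001,back.jpg,Back of postcard 1,The first postcard's back
002,,,Postcard 2,The second postcard
006,002,front2.jpg,Front of postcard 2,The second postcard's front
007,002,back2.jpg,Back of postcard 2,The second postcard's back
"""


def group_by_parent(csv_text):
    """Hypothetical helper: split rows into parents and children.

    Returns (parents, children), where parents maps id -> row and
    children maps a parent id -> list of child rows in CSV order.
    """
    parents = {}
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["parent_id"]:
            # A non-empty parent_id means this row is a child.
            children[row["parent_id"]].append(row)
        else:
            parents[row["id"]] = row
    return parents, dict(children)


parents, children = group_by_parent(CSV_TEXT)
for parent_id, parent in parents.items():
    print(parent["title"])
    for child in children.get(parent_id, []):
        print("   ", child["id"], child["file"])
```

Grouping first means the parent nodes can be created before their children, so each child has something to point to, regardless of how the rows are interleaved in the file.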

@mjordan mjordan added enhancement New feature or request question Further information is requested labels Aug 19, 2019
@mjordan
Owner Author

mjordan commented Aug 19, 2019

It would be useful not to need a row for each page in a newspaper issue, for example. Instead, all page files could be grouped into a directory, with their order expressed in their filenames. However, we would then need some way to define a minimal set of field values such as title and identifier; in other words, if these values are not in the CSV file, how do we derive them when populating the child nodes?
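One possible answer, sketched below in Python, is to derive the child values from the parent's metadata and the file's position in sort order. Everything here is an assumption for illustration: the function name `derive_child_fields`, the "weight" field, and the title pattern are hypothetical, and none of this is decided by the sprint or implemented in Workbench.

```python
import re
from pathlib import PurePath


def natural_key(name):
    """Sort key that compares digit runs numerically, so page10 follows page2."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def derive_child_fields(parent_id, parent_title, filenames):
    """Hypothetical helper: build minimal child rows from page filenames.

    The identifier is the filename without its extension, the title is
    the parent's title plus a page number, and "weight" (an assumed
    field name) records the page order.
    """
    rows = []
    ordered = sorted(filenames, key=natural_key)
    for weight, name in enumerate(ordered, start=1):
        rows.append({
            "id": PurePath(name).stem,
            "parent_id": parent_id,
            "file": name,
            "title": f"{parent_title}, page {weight}",
            "weight": weight,
        })
    return rows


rows = derive_child_fields(
    "issue1", "Daily News", ["page10.tif", "page2.tif", "page1.tif"]
)
for row in rows:
    print(row["weight"], row["id"], row["title"])
```

The natural sort matters if filenames are not zero-padded; with a plain string sort, page10.tif would come before page2.tif.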

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Aug 19, 2019

The method you first describe is what I had in mind (and what I was thinking of when I posted issue #18).

As to the "use a directory" method: that is one of the ways CONTENTdm deals with compound objects. Each compound object has a directory named with the parent record's identifier, and each file in that directory corresponds to a child object whose only metadata is the file name (sans extension). E.g.

root/
  | - 001/ (Parent record has the identifier "001")
  |     | - 001_001.tif (Child record of "001" with identifier "001_001")
  |     | - 001_002.tif (Child record of "001" with identifier "001_002")  
  |     | - 001_003.tif (Child record of "001" with identifier "001_003")  
  |
  | - 002/ (Parent record has the identifier "002")
  |     | - 002_recto.tif (Child record of "002" with identifier "002_recto")
  |     | - 002_verso.tif (Child record of "002" with identifier "002_verso")
  |
  | - 003.tif (Simple object with the identifier "003")

@mjordan
Owner Author

mjordan commented Aug 19, 2019

@seth-shaw-unlv yes, I think we can support both "with metadata" and "directory" methods.

mjordan added a commit that referenced this issue Mar 29, 2020
mjordan added a commit that referenced this issue Mar 29, 2020
mjordan added a commit that referenced this issue Mar 29, 2020
@mjordan
Owner Author

mjordan commented Mar 29, 2020

@seth-shaw-unlv I've got a working implementation of the first method described above in the "issue-56" branch. How it works is described in the "Creating paged content" section of that branch's README. Can you take a look to see if it's clear?

@seth-shaw-unlv
Contributor

@mjordan, your description looks good to me, although I may not be the best test subject.

@mjordan
Owner Author

mjordan commented Mar 30, 2020

Thanks for looking. The required fields in the CSV are pretty standard, so they shouldn't pose any problems for the spreadsheet editor.

No sweat on the testing; I included an integration test that passes. I think I'll merge so that branch doesn't get stale.

@mjordan
Owner Author

mjordan commented Apr 5, 2020

As of 2e94036, Workbench supports both of the methods for creating paged content described above. Closing for now; we can reopen this one or open new issues as needed.

@mjordan mjordan closed this as completed Apr 5, 2020