implement simple metadata format #7

Closed
n8fr8 opened this issue Oct 25, 2018 · 16 comments
Assignees
Labels
For Testing feature for testing and review

Comments

@n8fr8
Member

n8fr8 commented Oct 25, 2018

When OA uploads a file over WebDAV, it also needs to upload a second file that contains the metadata.

For OA Android, we've started by just exporting a JSON file from the fields of the Media model class we use to persist data in the app. These fields map to the user interface form that a user can edit when they add a new media file into OA.

Here is an example:

{"author":"Nathan F.","createDate":"Oct 17, 2018 4:30:01 PM","description":"aren\u0027t they so orange","licenseUrl":"https://creativecommons.org/licenses/by/4.0/","location":"pumpkin patch","mediaHash":[],"mimeType":"image/jpeg","originalFilePath":"content://com.android.providers.media.documents/document/image%3A66971","serverUrl":"https://cloud.guardianproject.info/remote.php/dav/files/n8fr8/OpenArchive/hcd5-kids+love+pumpkins.jpg/jqnr-kids+love+pumpkins.jpg","status":3,"tags":"autumn;;Halloween","title":"kids love pumpkins","id":15}

This clearly has some Android-specific fields, but the key ones are:

author, title, createDate (based on file system on the device), description, license (CC etc), location, tags

@foundscapes should chime in here, as well, about what we think are the essential fields, if there are more.

As you can see, my metadata JSON is about as simple as it can get; it is really meant as a simple way to capture data from OA in an intermediate format.
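A minimal sketch of such an export, in Python for illustration only (the app itself is Android). The `Media` class and `to_metadata_json` function are hypothetical; only the field names are taken from the example JSON above, restricted to the key fields listed.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Media:
    # Hypothetical model class; field names follow the example JSON above.
    author: str
    title: str
    createDate: str
    description: str
    licenseUrl: str
    location: str
    tags: str  # ";;"-separated, as in the example


def to_metadata_json(media: Media) -> str:
    """Serialize the key fields to a JSON string for the sidecar metadata file."""
    return json.dumps(asdict(media), ensure_ascii=False, sort_keys=True)


media = Media(
    author="Nathan F.",
    title="kids love pumpkins",
    createDate="Oct 17, 2018 4:30:01 PM",
    description="aren't they so orange",
    licenseUrl="https://creativecommons.org/licenses/by/4.0/",
    location="pumpkin patch",
    tags="autumn;;Halloween",
)
print(to_metadata_json(media))
```

Dropping the Android-specific fields (`originalFilePath`, `status`, `id`) is a choice made for this sketch, not something decided in this thread.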

@n8fr8
Member Author

n8fr8 commented Nov 14, 2018

Now looking into PBCore http://pbcore.org/

@n8fr8 n8fr8 added the help wanted Extra attention is needed label Nov 14, 2018
@foundscapes
Contributor

foundscapes commented Nov 14, 2018

@n8fr8 these fields (author, title, createDate (based on file system on the device), description, license (CC etc), location, tags) look good, but we need to add: Country (researcher is located in), Subject (interviewee), researcher (first, last names/pseudonym) - HRW needs the filename to be formatted like this: 2018-10-29_Country_Subject_ResearcherFirstName_LastName

@n8fr8
Member Author

n8fr8 commented Nov 14, 2018

2018-10-18_USA_kids+love+pumpkins_Nathan_F
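A rough sketch of how that name could be assembled, assuming the pattern `DATE_Country_Subject_FirstName_LastName` from the comment above. The function name `build_filename` is hypothetical, and replacing spaces with `+` and underscores with `-` inside a field is an assumption based only on the example.

```python
from datetime import date


def build_filename(day: date, country: str, subject: str,
                   first_name: str, last_name: str) -> str:
    """Join the fields with "_" in the order the naming convention asks for."""

    def clean(part: str) -> str:
        # Assumption: "_" is reserved as the field separator, so it is
        # replaced inside a field; whitespace runs become a single "+".
        return "+".join(part.replace("_", "-").split())

    fields = [country, subject, first_name, last_name]
    return "_".join([day.isoformat()] + [clean(f) for f in fields])


print(build_filename(date(2018, 10, 18), "USA", "kids love pumpkins", "Nathan", "F"))
# prints 2018-10-18_USA_kids+love+pumpkins_Nathan_F
```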

@n8fr8
Member Author

n8fr8 commented Nov 21, 2018

@n8fr8
Member Author

n8fr8 commented Nov 21, 2018

We should (perhaps) mirror the bucket format from Internet Archive... maybe even the metadata XML!

https://archive.org/download/CanadaEh

CanadaEh_archive.torrent 18-Mar-2017 16:23 1.6K
CanadaEh_files.xml 18-Mar-2017 16:23 1.4K
CanadaEh_meta.xml 01-Jan-2017 08:40 808.0B
canadaehlogo.png 18-Mar-2017 16:22 34.9K
canadaehlogo_thumb.jpg 18-Mar-2017 16:22 5.7K

@tladesignz
Contributor

I wouldn't go so far as to create the thumbnails, too. :-)

Regarding the metadata: I think it's totally fine to use our own (JSON?) format. We should just keep an eye on which metadata items Internet Archive collects and how they match what we have and want.

@n8fr8
Member Author

n8fr8 commented Nov 21, 2018

There is still the question of whether to rename the files themselves using our new naming convention, or just the folder they are stored in. In the case of Archive.org, the bucket is named using the title "slug" ("CanadaEh" in the example), while the uploaded file(s) retain their original names ("canadaehlogo.png").

@tladesignz
Contributor

tladesignz commented Nov 29, 2018

Ok, I looked at the doc and sent you a pull request for markdown improvements.

Here are my comments/questions:

  • LOCATION: For our purposes, Country is the most logical, but any location-related data provided
    that can be encoded appropriately (not GPS coordinates) would be useful here.

Why not GPS coordinates? Or only not in the filename? But if we have GPS coordinates in the metadata (which seems very viable to me, e.g. for locations far out in a desert or something), what to put in the filename? Just nothing?

.
+-- Collection-DATE_LOCATION_TITLE_SUBMITTER_FLAG/
    +-- Collection-DATE_LOCATION_TITLE_SUBMITTER_WARNING.meta.json
    +-- Entry-DATE_LOCATION_TITLE_SUBMITTER_FLAG/
        +-- DATE_LOCATION_TITLE_SUBMITTER_FLAG.jpg
        +-- DATE_LOCATION_TITLE_SUBMITTER_FLAG.jpg.meta.json

Why a subdirectory for every entry? That seems very uneconomic, since there are only ever going to be 2 files in it, which, when name-sorted, will always stick together. Also, this won't help in matching Internet Archive's buckets.

  3. BTW - Internet Archive: How to match their metadata format? Our "collection" maps to their "item", and an item has metadata. Most of the metadata we have maps to theirs, and what doesn't can still be added as custom metadata. (See https://internetarchive.readthedocs.io/en/latest/metadata.html)

Shouldn't we use that instead of our own additional file when uploading to IA?

  4. File naming: Wouldn't it be enough to have all the metadata stuff in the collection directory's name? Do you expect that metadata to change that much over the files in a collection?

  5. Wouldn't it be possible to have only 1 metadata file per collection? I would argue that if multiple files in a collection have a different submitter, title, location and date, they should maybe not be in the same collection. That way, we could also show only 1 scene per collection for metadata which is shared across all the files in the collection, which would automatically nudge the user to create different collections for such disparate content. And the issue of having to repeat the same metadata over and over would be gone.
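To make the single-metadata-file idea concrete, here is a small sketch that separates the fields every file in a collection shares from the ones that differ. The function name `split_metadata` and the sample values are hypothetical; this only illustrates the proposal and is not something the app is known to do.

```python
from __future__ import annotations


def split_metadata(entries: list[dict]) -> tuple[dict, list[dict]]:
    """Return (shared, per_file): fields identical in every entry vs. the rest."""
    if not entries:
        return {}, []
    first = entries[0]
    # A field is shared only if every entry has it with the same value.
    shared = {
        key: value
        for key, value in first.items()
        if all(key in e and e[key] == value for e in entries)
    }
    per_file = [{k: v for k, v in e.items() if k not in shared} for e in entries]
    return shared, per_file


entries = [
    {"author": "Nathan F.", "location": "pumpkin patch", "title": "kids love pumpkins"},
    {"author": "Nathan F.", "location": "pumpkin patch", "title": "hay ride"},
]
shared, per_file = split_metadata(entries)
print(shared)    # fields that could live in one collection-level file
print(per_file)  # what would still have to be stored per file
```

If `per_file` ends up holding most of the fields, that may be a hint that the files belong in separate collections, which is the nudge described above.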

@n8fr8
Member Author

n8fr8 commented Jan 9, 2019

  1. Sure, if GPS coordinates are desired, they can be used. I think I meant that this should ideally be a human-readable value, as opposed to raw data.

  2. Each item in a collection is equivalent to a bucket: a folder = a bucket.

3, 4, 5. I will think about this more!

@foundscapes
Contributor

Waiting on you @n8fr8

@n8fr8
Member Author

n8fr8 commented Jul 9, 2019

If a media item is flagged in the client UI, then the media file (jpg, mp4, etc.) and the associated metadata (json) should be uploaded into a subfolder called "flagged" within the current upload batch folder.
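As a sketch of that rule, assuming the metadata file is named after the media file with a `.meta.json` suffix (as in the directory layout quoted earlier). The function `upload_paths` and the batch folder name are hypothetical, and the subfolder name is a parameter because it is a naming decision.

```python
from __future__ import annotations

import posixpath


def upload_paths(batch_folder: str, filename: str, flagged: bool,
                 flagged_folder: str = "flagged") -> tuple[str, str]:
    """Return (media_path, metadata_path) on the WebDAV server."""
    # Flagged items go one level deeper; everything else stays in the batch folder.
    folder = posixpath.join(batch_folder, flagged_folder) if flagged else batch_folder
    media_path = posixpath.join(folder, filename)
    return media_path, media_path + ".meta.json"


print(upload_paths("OpenArchive/batch-1", "photo.jpg", flagged=True))
# prints ('OpenArchive/batch-1/flagged/photo.jpg', 'OpenArchive/batch-1/flagged/photo.jpg.meta.json')
```

Keeping the media file and its metadata in the same folder means the pair moves together when an item is flagged.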

@n8fr8
Member Author

n8fr8 commented Jul 9, 2019

Otherwise, what you have implemented now is fine, as long as the .meta.json contains all the fields entered into the app for each item.

I will update the spec document.

@n8fr8
Member Author

n8fr8 commented Jul 9, 2019

@tladesignz
Contributor

Implemented as specified.

Please note:

The flag is named "SIGNIFICANT CONTENT" inside the app and is used as a tag. That tag shows up in the .meta.json file.

That is different than "FLAGGED".

@n8fr8 n8fr8 added For Testing feature for testing and review and removed help wanted Extra attention is needed labels Jul 10, 2019
@foundscapes
Contributor

Change subfolder on NextCloud to read "Significant Content"

tladesignz added a commit that referenced this issue Jul 24, 2019
@tladesignz tladesignz removed their assignment Jul 24, 2019

3 participants