implement simple metadata format #7

n8fr8 · 2018-10-25T20:55:31Z

When OA uploads a file over WebDAV, it also needs to include a second file that contains metadata.

For OA Android, we've started by just exporting a JSON file from the fields of the Media model class we use to persist data in the app. These fields map to the user interface form that a user can edit when they add a new media file into OA.

Here is an example:

{"author":"Nathan F.","createDate":"Oct 17, 2018 4:30:01 PM","description":"aren\u0027t they so orange","licenseUrl":"https://creativecommons.org/licenses/by/4.0/","location":"pumpkin patch","mediaHash":[],"mimeType":"image/jpeg","originalFilePath":"content://com.android.providers.media.documents/document/image%3A66971","serverUrl":"https://cloud.guardianproject.info/remote.php/dav/files/n8fr8/OpenArchive/hcd5-kids+love+pumpkins.jpg/jqnr-kids+love+pumpkins.jpg","status":3,"tags":"autumn;;Halloween","title":"kids love pumpkins","id":15}

This clearly has some Android-specific fields, but the key ones are:

author, title, createDate (based on file system on the device), description, license (CC etc), location, tags

@foundscapes should chime in here, as well, about what we think are the essential fields, if there are more.

As you can see, my metadata JSON is about as simple as it can get, and is really just meant as a simple way to capture data from OA in an intermediate format.
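To make the "key fields" idea concrete, here is a minimal sketch of reducing the exported record to a portable `.meta.json` payload. The field names come from the example above; the function name `to_portable_metadata` and the choice to split the tag string on `;;` are assumptions for illustration, not what the app actually does.

```python
import json

# Key fields named in this comment, spelled as in the example JSON.
KEY_FIELDS = ("author", "title", "createDate", "description",
              "licenseUrl", "location", "tags")


def to_portable_metadata(media: dict) -> dict:
    """Keep only the key fields and drop Android-specific ones.

    Assumption: tags are stored as one ';;'-separated string, as in the
    example record, and are easier to consume as a list.
    """
    meta = {key: media[key] for key in KEY_FIELDS if key in media}
    if isinstance(meta.get("tags"), str):
        meta["tags"] = [tag for tag in meta["tags"].split(";;") if tag]
    return meta


# Trimmed copy of the example record from this comment.
example = {
    "author": "Nathan F.",
    "createDate": "Oct 17, 2018 4:30:01 PM",
    "description": "aren't they so orange",
    "licenseUrl": "https://creativecommons.org/licenses/by/4.0/",
    "location": "pumpkin patch",
    "mimeType": "image/jpeg",
    "status": 3,
    "tags": "autumn;;Halloween",
    "title": "kids love pumpkins",
    "id": 15,
}

print(json.dumps(to_portable_metadata(example), indent=2))
```

Fields such as `status`, `id` and `originalFilePath` only make sense inside the app, so the sketch leaves them out of the sidecar file.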

n8fr8 · 2018-11-14T15:31:38Z

Now looking into PBCore http://pbcore.org/

n8fr8 · 2018-11-14T15:37:35Z

Some samples here:
https://github.com/WGBH/PBCore_2.1/blob/master/example_records/simple_description_document.xml
https://github.com/WGBH/PBCore_2.1/blob/master/example_records/pbcore_digital_preservation.xml
https://github.com/WGBH/PBCore_2.1/blob/master/example_records/pbcore_asset_management.xml

Also a nice format for a collection:
https://github.com/WGBH/PBCore_2.1/blob/master/example_records/pbcore_collection.xml

foundscapes · 2018-11-14T17:06:46Z

@n8fr8 these fields (author, title, createDate (based on file system on the device), description, license (CC etc), location, tags) look good, but we need to add: Country (researcher is located in), Subject (interviewee), researcher (first, last names/pseudonym) - HRW needs the filename to be formatted like this: 2018-10-29_Country_Subject_ResearcherFirstName_LastName

n8fr8 · 2018-11-14T17:28:39Z

2018-10-18_USA_kids+love+pumpkins_Nathan_F
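A rough sketch of how a name like the one above could be assembled from the requested pattern (date, country, subject, researcher first name, last name). The function name `build_basename` and the rule of joining words with `+` are assumptions inferred from this single example, which also uses the title in the subject slot.

```python
from datetime import date


def build_basename(day: date, country: str, subject: str,
                   first_name: str, last_name: str) -> str:
    """Join the parts with '_' and use '+' for spaces inside a part."""

    def clean(part: str) -> str:
        # '_' is reserved as the field separator, so treat it like a space.
        return "+".join(part.replace("_", " ").split())

    parts = [day.isoformat(), clean(country), clean(subject),
             clean(first_name), clean(last_name)]
    return "_".join(parts)


print(build_basename(date(2018, 10, 18), "USA", "kids love pumpkins",
                     "Nathan", "F"))
```

The file extension would be appended separately, so the same base name can be reused for the media file and its metadata file.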

n8fr8 · 2018-11-21T19:43:06Z

This is now being documented here: https://github.com/OpenArchive/openarchive-android/blob/master/docs/OpenArchiveSpaceCapsuleSpec.md

n8fr8 · 2018-11-21T20:15:39Z

We should (perhaps) mirror the bucket format from Internet Archive... maybe even the metadata XML!

https://archive.org/download/CanadaEh

CanadaEh_archive.torrent	18-Mar-2017 16:23	1.6K
CanadaEh_files.xml	18-Mar-2017 16:23	1.4K
CanadaEh_meta.xml	01-Jan-2017 08:40	808.0B
canadaehlogo.png	18-Mar-2017 16:22	34.9K
canadaehlogo_thumb.jpg	18-Mar-2017 16:22	5.7K

tladesignz · 2018-11-21T20:24:48Z

I wouldn't go so far as to create the thumbnails, too. :-)

Regarding the metadata: I think it's totally fine to use our own (JSON?) format. We should just keep an eye on which metadata items Internet Archive collects and how they match what we have and want.

n8fr8 · 2018-11-21T20:46:31Z

There is still the question of whether to rename the files themselves using our new naming convention, or just the folder they are stored in. In the case of Archive.org, the bucket is named using the title "slug" ("CanadaEh" in this example), while the uploaded file(s) retain their original name, "canadaehlogo.png".

tladesignz · 2018-11-29T14:53:11Z

Ok, I looked at the doc and sent you a pull request for markdown improvements.

Here are my comments/questions:

LOCATION: For our purposes, Country is the most logical, but any location related data provide
that can be encoded appropriately (not GPS coordinates), would be useful here.

Why not GPS coordinates? Or only not in the filename? But if we have GPS coordinates in the metadata (which seems very viable to me, e.g. for locations far out in a desert or something), what to put in the filename? Just nothing?

.
+-- Collection-DATE_LOCATION_TITLE_SUBMITTER_FLAG/
    +-- Collection-DATE_LOCATION_TITLE_SUBMITTER_WARNING.meta.json
    +-- Entry-DATE_LOCATION_TITLE_SUBMITTER_FLAG/
        +-- DATE_LOCATION_TITLE_SUBMITTER_FLAG.jpg
        +-- DATE_LOCATION_TITLE_SUBMITTER_FLAG.jpg.meta.json

Why a subdirectory for every entry? That seems very uneconomic, since there are only ever going to be 2 files in it, which, when name-sorted, will always stick together. Also, this won't help in matching Internet Archive's buckets.

BTW - Internet Archive: How do we match their metadata format? Our "collection" maps to their "item", and an item has metadata. Most of the metadata we have maps to theirs, and whatever doesn't can still be added as custom metadata. (See https://internetarchive.readthedocs.io/en/latest/metadata.html)

Shouldn't we use that instead of our own additional file when uploading to IA?
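As a sketch of what such a mapping could look like: `title`, `creator`, `description`, `date`, `subject` and `licenseurl` are standard Internet Archive metadata fields, while passing `location` through as a custom key, the function name `to_ia_metadata`, and the date format string are assumptions based on the example record earlier in this thread.

```python
from datetime import datetime


def to_ia_metadata(media: dict) -> dict:
    """Map fields of our JSON record onto Internet Archive item metadata."""
    ia = {
        "title": media.get("title"),
        "creator": media.get("author"),
        "description": media.get("description"),
        "licenseurl": media.get("licenseUrl"),
        # Assumption: not a standard IA field, sent as custom metadata.
        "location": media.get("location"),
    }
    if media.get("tags"):
        # Assumption: tags are ';;'-separated, as in the example record.
        ia["subject"] = [t for t in media["tags"].split(";;") if t]
    if media.get("createDate"):
        # Assumption: dates look like "Oct 17, 2018 4:30:01 PM"
        # (parsing month names this way depends on an English locale).
        parsed = datetime.strptime(media["createDate"],
                                   "%b %d, %Y %I:%M:%S %p")
        ia["date"] = parsed.date().isoformat()
    # Drop keys with no value so we do not upload empty metadata.
    return {key: value for key, value in ia.items() if value}


example = {
    "author": "Nathan F.",
    "createDate": "Oct 17, 2018 4:30:01 PM",
    "description": "aren't they so orange",
    "licenseUrl": "https://creativecommons.org/licenses/by/4.0/",
    "location": "pumpkin patch",
    "tags": "autumn;;Halloween",
    "title": "kids love pumpkins",
}
print(to_ia_metadata(example))
```

A dict like this could then be handed to whatever upload call is used for IA as the item metadata, instead of uploading a separate metadata file.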

File naming: Wouldn't it be enough to have all the metadata stuff in the collection directory's name? Do you expect that metadata to change that much over the files in a collection?
Wouldn't it be possible to have only 1 metadata file per collection? I would argue that if multiple files in a collection have a different submitter, title, location and date, they should maybe not be in the same collection. That way, we could also show only 1 scene per collection for metadata which is shared across all the files in the collection, which would automatically nudge the user to create different collections for such disparate content. And the issue of having to repeat the same metadata over and over would be gone.

n8fr8 · 2019-01-09T20:50:45Z

1. Sure, if GPS coordinates are desired, they can be used. I think I meant to say that this is ideally more of a human-readable value, as opposed to data.
2. Each item in a collection is equivalent to a bucket. A folder = bucket.

3, 4, 5. I will think about this more!

foundscapes · 2019-07-02T19:39:15Z

Waiting on you @n8fr8

n8fr8 · 2019-07-09T18:51:22Z

If a media item is flagged in the client UI, then the media file (jpg, mp4, etc.) and the associated metadata (json) should be uploaded into a subfolder called "flagged" within the current upload batch folder.
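A minimal sketch of that routing rule, computing the remote paths for a media file and its `.meta.json` sidecar. The function name `upload_targets` and the example folder names are hypothetical; only the "flagged" subfolder and the `.meta.json` suffix come from this thread.

```python
from pathlib import PurePosixPath

FLAGGED_DIR = "flagged"  # subfolder name as specified in this comment


def upload_targets(batch_folder: str, filename: str,
                   flagged: bool) -> tuple[str, str]:
    """Return (media_path, metadata_path) relative to the WebDAV root."""
    base = PurePosixPath(batch_folder)
    if flagged:
        # Flagged items go one level deeper, together with their metadata.
        base = base / FLAGGED_DIR
    media_path = base / filename
    meta_path = base / (filename + ".meta.json")
    return str(media_path), str(meta_path)


print(upload_targets("OpenArchive/batch-1", "pumpkins.jpg", flagged=True))
print(upload_targets("OpenArchive/batch-1", "pumpkins.jpg", flagged=False))
```

Keeping the subfolder name in one constant makes it a one-line change if the folder is renamed later.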

n8fr8 · 2019-07-09T18:52:41Z

Otherwise, what you have implemented now is fine, as long as the .meta.json contains all the fields entered into the app for each item.

I will update the spec document.

n8fr8 · 2019-07-09T19:05:45Z

I have updated the spec document here: https://github.com/OpenArchive/openarchive-android/blob/master/docs/OpenArchiveSpaceCapsuleSpec.md

tladesignz · 2019-07-10T16:24:39Z

Implemented as specified.

Please note:

The flag is named "SIGNIFICANT CONTENT" inside the app and is used as a tag. That tag shows up in the .meta.json file.

That is different than "FLAGGED".

foundscapes · 2019-07-24T19:30:49Z

Change subfolder on NextCloud to read "Significant Content"

n8fr8 assigned tladesignz and foundscapes Nov 14, 2018

n8fr8 added the help wanted Extra attention is needed label Nov 14, 2018

n8fr8 added this to the UX Implementation: Sprint 1 milestone Dec 19, 2018

n8fr8 mentioned this issue Jan 9, 2019

File overwrite when uploading two files w/ same name from multiple devices #28

Closed

tladesignz added a commit that referenced this issue Feb 11, 2019

Towards #7: Store metadata alongside original file on WebDav spaces.

7c080da

n8fr8 modified the milestones: Dev Sprint 1, Dev Sprint 2, Dev Sprint 3 Mar 6, 2019

tladesignz mentioned this issue Apr 10, 2019

Flagged items not routing into subfolder #71

Closed

n8fr8 modified the milestones: Dev Sprint 3, Dev Sprint 4 Apr 24, 2019

foundscapes modified the milestones: Dev Sprint 4, Dev Sprint 5 May 15, 2019

tladesignz added a commit that referenced this issue Jul 10, 2019

Issue #7: Store flagged files in subfolder "FLAGGED" of collection folder.

f11daac

n8fr8 added For Testing feature for testing and review and removed help wanted Extra attention is needed labels Jul 10, 2019

tladesignz added a commit that referenced this issue Jul 24, 2019

Issue #7: Streamlined "Significant Content" aka. "flagged" usage in tags and folder name.

1f1f467

tladesignz closed this as completed Jul 24, 2019

tladesignz removed their assignment Jul 24, 2019
