Normalize dates in extracted metadata from binaries #71

ryangrimm opened this Issue · 3 comments


Many binary formats include a creation date or a modification date, but the field names differ from format to format. To support queries and range indexes on this metadata, these dates need to be normalized into xs:dateTime values.

To do so, the current plan is to attempt to parse the value of any piece of metadata that has "date" or "time" in its name. The parsing can be done with the date parser that's already in use, and new formats can easily be added if need be.
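A rough sketch of that plan, in Python rather than the project's own code, just to illustrate the idea. The format list, the function names, and the sample metadata are all assumptions made for this example. It is not the existing date parser, and the timezone suffixes that PDF dates can carry are not handled.

```python
from datetime import datetime

# Hypothetical format list for illustration; the project's date parser
# has its own set of supported formats.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",   # already close to xs:dateTime
    "%Y-%m-%d",            # date only; time defaults to midnight
    "D:%Y%m%d%H%M%S",      # PDF-style date without a timezone suffix
]

def parse_to_datetime(value):
    """Return an xs:dateTime-style string, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).isoformat()
        except ValueError:
            continue
    return None

def looks_like_date_field(name):
    """True if the metadata name contains 'date' or 'time' (case-insensitive)."""
    lowered = name.lower()
    return "date" in lowered or "time" in lowered

def normalize_dates(metadata):
    """Map each date-like field that parses to its normalized value."""
    result = {}
    for name, value in metadata.items():
        if looks_like_date_field(name):
            parsed = parse_to_datetime(value)
            if parsed is not None:
                result[name] = parsed
    return result
```

Fields whose names match but whose values don't parse are simply skipped, so a name like "runtime" containing "time" does no harm.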


Will want to normalize the element names as well. Content extracted from PDF ends up with corona:modDate while Word ends up with corona:lastSavedDate (which I believe are conceptually the same thing). I did a quick inventory of a half dozen other formats and that's the main one I saw.


Now normalizing last modification metadata to a corona:modDate element. Also running any piece of metadata that has "date" in the name through the date parser. If a date is extracted, it's stored in a normalized-date attribute.
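For anyone following along, here is a minimal Python sketch of what that behavior looks like on an extracted metadata document. The namespace URI is a placeholder, not the project's real one. The alias set contains only lastSavedDate because that is the one name mentioned in this thread, and try_parse is a single-format stand-in for the real date parser.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Placeholder namespace URI for illustration, not the project's real one.
CORONA_NS = "http://example.com/corona"
# Element names treated as "last modification" dates (assumed from this thread).
MOD_DATE_ALIASES = {"lastSavedDate"}

def try_parse(value):
    """Stand-in for the project's date parser; handles one ISO-like format."""
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%S").isoformat()
    except ValueError:
        return None

def normalize_metadata(root):
    """Rename modification-date aliases and tag parseable dates in place."""
    prefix = "{" + CORONA_NS + "}"
    for elem in root.iter():
        if not isinstance(elem.tag, str) or not elem.tag.startswith(prefix):
            continue
        local = elem.tag[len(prefix):]
        if local in MOD_DATE_ALIASES:
            local = "modDate"
            elem.tag = prefix + local
        if "date" in local.lower():
            parsed = try_parse(elem.text or "")
            if parsed is not None:
                elem.set("normalized-date", parsed)
    return root
```

The original text content is left untouched and the parsed value goes only into the attribute, so nothing is lost when the parser fails.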

@ryangrimm ryangrimm closed this