Add new Document and DocumentPermission types #1048

Closed
vholland opened this Issue Mar 21, 2016 · 22 comments

6 participants

@vholland
Contributor
vholland commented Mar 21, 2016 edited

Documents often get shared over email, including access rights (read, write, comment) for the document.

I propose adding a new DigitalDocument (originally Document) type with the following subtypes:

  • SpreadsheetDigitalDocument
  • PresentationDigitalDocument
  • TextDigitalDocument
  • NoteDigitalDocument

To describe the permissions, a new DigitalDocumentPermission type would have the following properties:

  • grantee
  • permissionType

See pull request #1049 for an example.

Note: this description updated by @danbri to reflect naming changes agreed below (SpreadsheetDigitalDocument was SpreadsheetDocument, etc.).
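As a rough illustration of the shape described above (not taken from the pull request), the following Python sketch builds JSON-LD for a shared spreadsheet. The property name `hasDigitalDocumentPermission` and the `WritePermission` value are assumptions that follow the naming used in this description; the document name and email address are made up.

```python
import json


def make_permission(permission_type, grantee_email):
    """Build one DigitalDocumentPermission node (type and property names assumed)."""
    return {
        "@type": "DigitalDocumentPermission",
        "permissionType": "http://schema.org/" + permission_type,
        "grantee": {"@type": "Person", "email": grantee_email},
    }


def make_shared_document(doc_type, name, permissions):
    """Wrap the permissions in a document node of the given subtype."""
    return {
        "@context": "http://schema.org",
        "@type": doc_type,
        "name": name,
        # Assumed property name linking a document to its permissions.
        "hasDigitalDocumentPermission": permissions,
    }


doc = make_shared_document(
    "SpreadsheetDigitalDocument",
    "Q2 budget (hypothetical)",
    [make_permission("WritePermission", "alice@example.com")],
)
print(json.dumps(doc, indent=2))
```

Each entry in the permissions list pairs one grantee with one permission type, so read, write and comment rights for different people become separate entries.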

@vholland vholland self-assigned this Mar 21, 2016
@Dataliberate
Contributor

Document has a much broader generally accepted meaning than the [file] document that this proposal seems to be focused on.

Libraries, archives, museums, etc. have shelving full of physical Documents.

Potentially there are many more document types than Presentation, Text, Spreadsheet and Note. Would not a documentType property be more flexible than individual subtypes? Types such as Government Report, Pamphlet, Medical Document, Legal Document, etc. could then be accommodated in this basic structure.

As a minimum physical Documents should be recognised in an example.

@danbri
Contributor
danbri commented Mar 22, 2016

Is /Users/danbri/Desktop/my_spreadsheet1.xls a SpreadsheetDocument? Could it have DocumentPermission properties? Would those describe filesystem access to the bytes, or file-format-related access mechanisms, e.g. who has the password (see http://askubuntu.com/questions/223153/decrypting-a-password-protected-libreoffice-calc-ods-file-forgotten-password etc.; I haven't found the relevant list of standards yet)? Or might they just describe who 'should' have access, or who has access via some Web service or Cloud API?

I share @RichardWallis's concern about the scope of "Document". Currently people come to schema.org looking for document, poke around then realise that the closest match is really CreativeWork. If we are to subset CreativeWork and use up such a major term ("Document") we will need to be pretty clear what is included and what is excluded. Especially if we are touching on access control issues.

Are the Document types proposed here essentially cloud-hosted online things? Or is the intent to cover free-floating standalone office-style files too?

@jasondouglas

We can find better names, but I don't see what's controversial about this proposal. File may be better than Document because in this day and age I think it implies digital more than document does. DigitalFile seems silly and PresentationDigitalFile even sillier.

@Dataliberate
Contributor

@jasondouglas If the scope is constrained, by appropriate naming, to spreadsheets, wp documents, etc. I also see that in itself this proposal is not controversial.

However as you imply, better naming is key - 'Document' as well as 'File' have far broader meaning than this currently aims to satisfy.

Yes PresentationDigitalFile is very silly. How about something like ApplicationFile?

@jasondouglas

Would the subclass still be SpreadsheetFile vs. SpreadsheetApplicationFile?


@danbri
Contributor
danbri commented Mar 22, 2016

OK, so we clarified that we are talking about classes of digital file (rather than e.g. a physical signed paper document in a cabinet somewhere). Are we covering equally the case of "the .xls file is attached" and the case of "go to this Web or app UI url to edit the doc"? The permissions piece seems more relevant to the latter.

@vholland
Contributor

It could be used for an attached .xls file, but the permissions part only makes sense in a cloud environment where I need access rights.

@danbri
Contributor
danbri commented Mar 23, 2016

@vholland perhaps we could make that limited scope clearer?

Meanwhile on the access control front, I don't think "grantee" as proposed says quite what you want it to. Perhaps directly using email accounts rather than indirecting through "person" will come closer.

Consider a scenario in which I (underspecified here but let's say via my work email, danbri@google.com), share a doc to you:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Document",
  "name": "2016 Plans and Secrets (rough draft)",
  "author": "Dan Brickley",
  "hasDocumentPermission": [
    {
      "@type": "DocumentPermission",
      "permissionType": "http://schema.org/ReadPermission",
      "grantee": {
        "@type": "Person",
        "email": "vtardif@google.com"
      }
    }
  ]
}
</script>

On the basis of the proposal for 'grantee' ('The person, organization, or audience that has been granted this permission.') there's no distinction between it having been shared to your work (email-identified) account, versus your personal one. This isn't obvious since the Person is described only by their work email address, but the definition of the property implies that this solely serves to identify the person.

Since in practice sharing is most often account-based rather than person-based, maybe grantee should also allow a way of making that distinction?

Since our current property for 'email' takes simple strings, I suppose we'd need to use http://schema.org/ContactPoint - so my suggestion is to allow ContactPoint as values for 'grantee' to allow clearer description of the manner in which the permission has been granted.
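To make the distinction concrete, here is a minimal Python sketch of the two ways a grantee could be expressed. The helper names and email addresses are hypothetical, and `DocumentPermission` is assumed as the permission type name, following the issue title.

```python
import json


def person_grantee(email):
    # Describes a person; the email only helps identify who they are.
    return {"@type": "Person", "email": email}


def account_grantee(email):
    # Describes the specific account (contact point) the document was shared to.
    return {"@type": "ContactPoint", "email": email}


def read_permission(grantee):
    # Assumed type name for the permission node.
    return {
        "@type": "DocumentPermission",
        "permissionType": "http://schema.org/ReadPermission",
        "grantee": grantee,
    }


person_share = read_permission(person_grantee("work-account@example.com"))
account_share = read_permission(account_grantee("work-account@example.com"))
print(json.dumps(account_share, indent=2))
```

The two structures differ only in the grantee's `@type`, but the ContactPoint form says the permission is tied to that particular account rather than to the person behind it.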

@vholland
Contributor

Adding ContactPoint sounds good to me.

@danbri
Contributor
danbri commented Mar 23, 2016

ping @chaals w.r.t. https://disk.yandex.com/download/#pc - can you review?

@vholland
Contributor

@jasondouglas suggested File as an alternative to Document. @RichardWallis can you live with that? DigitalFile feels oddly verbose.

@RichardWallis
Contributor

I agree it does feel verbose, but unfortunately File is a very generic term that has different meanings to several communities. To archivists and legal folks it can be a physical container of documents or an action in a legal process.

So I think that some qualification of the type of file is needed in the name.

@vholland
Contributor
vholland commented Apr 1, 2016

I agree it is generic, but in the realm of schema.org and its authors, we experience electronic files of this type far more frequently than the archival and legal use cases.

@RichardWallis
Contributor

It may be true that in the [current] realm of schema.org and its authors, we experience electronic files of this type far more frequently than the archival and legal use cases.

However, there are whole domains (including libraries, archives, museums) that have physical files as one component of the resources they share with the world. OK, some of them may be a little behind the curve in drinking the Schema.org Kool-Aid as to how to share their metadata about them, but it is likely they will join us in the end.

If the proposal ends up defining:

  • SpreadsheetFile
  • PresentationFile
  • TextFile
  • NoteFile

I don't see much of a problem in having DigitalFile as their superclass.

That would leave things free for future DigitalFile subtypes, and the ability to create an equivalent set of types for physical files.

Equally for the legal action use case, if it arose, FileAction would satisfy that need.

@danbri
Contributor
danbri commented Apr 5, 2016

I find "File" awkward since cloud-hosted documents (with web platform UI for editing etc.) seem in scope. It's ugly but "digital document" comes to mind.

@RichardWallis
Contributor

I get the feeling that whatever we come up with will 'feel awkward' to a significant minority. ;-)

Coming from the background I do, I would expect DigitalDocument to encompass any mostly text based file (as in computer file stored locally or viewed in cloud apps). This would include, in addition to the four examples above, PDFs - including images, eBook data files, source code files in your favourite programming language, web server logfiles, etc.

If that is the case, and the description of such a CreativeWork-based super-type lays out such generality, that will be fine. I get the feeling that if we try to restrict the coverage, other than by having subtypes for spreadsheet etc., it will not be long before we are revisiting this.

@danbri
Contributor
danbri commented Apr 5, 2016

how about "office"? in sense of https://en.m.wikipedia.org/wiki/List_of_office_suites

  • OfficeDocument (file or remote)
    • SpreadsheetOfficeDocument
    • PresentationOfficeDocument
    • TextOfficeDocument
    • NoteOfficeDocument
    • OfficeDocumentPermission

Plus maybe rename grantee, permissionType.

Or is this too narrow?

@RichardWallis
Contributor

Too narrow and too ugly.

I think that we're nearly there. Taking a step back:

  • CreativeWork
    • DigitalDocument - A digital document or file containing text, or data for an application, on a computer, device, or in a cloud service.
      • SpreadsheetDocument
      • PresentationDocument
      • TextDocument
      • NoteDocument

The description of DigitalDocument would need a bit of crafting.
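For readers who prefer to see the hierarchy as data, the tree above can be sketched in Python as a child-to-parent mapping with a simple ancestry check. The mapping and the function name are illustrative only and are not part of any schema.org tooling.

```python
# Child -> parent links for the hierarchy suggested in this comment.
PARENT = {
    "DigitalDocument": "CreativeWork",
    "SpreadsheetDocument": "DigitalDocument",
    "PresentationDocument": "DigitalDocument",
    "TextDocument": "DigitalDocument",
    "NoteDocument": "DigitalDocument",
}


def is_subtype_of(type_name, ancestor):
    """Walk up the parent links and report whether ancestor is reached."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


print(is_subtype_of("NoteDocument", "CreativeWork"))
print(is_subtype_of("CreativeWork", "DigitalDocument"))
```

Under this layout every subtype inherits whatever is defined on DigitalDocument, which is why the wording of its description matters.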

@philbarker
Contributor

"The description of DigitalDocument would need a bit of crafting" Yup. Many other subtypes of CreativeWork would fall under that definition of DigitalDocument: as well as http://schema.org/EmailMessage and http://schema.org/WebPage, which are [text] documents, the "or file ... containing data for an application" wording captures things like an mp3 file.

Does there need to be a super-type?

@vholland
Contributor
vholland commented Apr 5, 2016

DigitalDocument works for me.

@vholland
Contributor

I think I have captured the comments in pull request #1103.

@danbri danbri pushed a commit that referenced this issue Apr 15, 2016
Dan Brickley Added ContactPoint as another rangeIncludes on 'grantee' per #1048 discussion.
c7af157
@danbri danbri added this to the sdo-deimos release milestone Apr 19, 2016
@danbri danbri closed this Apr 28, 2016