
MIME type standardization #2096

Open
michael-conway opened this Issue May 21, 2014 · 5 comments

@michael-conway
Member

michael-conway commented May 21, 2014

per [iROD-Chat:11797] Re: MIME madness

I think both approaches would work well (using a column or an AVU), but from a meta-programming point of view, using AVUs seems like a good idea, because we can treat all AVUs uniformly (for example, with respect to ACLs) under one code path. Besides, iPlant is already using AVUs, so this would require minimal change to their already working system. AVUs also allow external indexing to be done in an extensible way (for example, if we add other properties of data in the future).

On Tue, May 20, 2014 at 5:26 PM, Conway, Mike michael_conway@unc.edu wrote:
Hi Jason, I think there is a microservice that allows this client update, and I'm going to see if I can make that work. That field is indeed used for MIME type right now, but its values are not expressed as standard MIME type names. Might that present a problem for your proposed use?

MIME types may be set outside of iRODS, by file format recognition services such as DROID or Tika, as well as internally, and whether a client can use that field is indeed something to dig into. It is just as likely that there are server-side policies that would key on the MIME type of a file.

The other option would be to pick a standard AVU that denotes MIME type, and orient all of our client libraries to honor the same one.

MC

On May 20, 2014, at 5:04 PM, Jason Coposky wrote:

I have been considering internally mapping that column to our concept of a First Class Object in the new object model, in order to simplify understanding which flavor of data object we are dealing with internally. From a client perspective, I do not recall a code path that allows users to modify columns in r_data_main without admin privileges, so I think we would need to add an API interface, unless MIME type detection is done automatically on the agent side and the column is populated appropriately. Was that your thinking?

J

On May 20, 2014, at 4:31 PM, "Tony Edgin" wrote:

I like Mike's proposal. We could definitely take advantage of data type information based on MIME types. The built-in ones don't cover many of the types we support, so we store the data type as an AVU, as you stated.

On Tue, May 20, 2014 at 1:15 PM, Wayne Schroeder wrote:
Hi Mike,

The name of the column you're interested in is COL_DATA_TYPE, not R_PDATA_TYPE. The page you used, https://wiki.irods.org/index.php/icat_schema_notes, is out of date and has a link (which I added years ago) to a more current page: https://wiki.irods.org/index.php/icatAttributes
Although the code base is the most accurate source of information, and, at least for 3.3.1, for this that would be iRODS/server/icat/src/icatSysTables.sql.pp, which is used in the installation. For the most part, 4.0 has the same columns, and I believe this one is unchanged. COL_DATA_TYPE is the name used in general queries. Internally, it's the data_type_name column in the R_DATA_MAIN table.

There's a set of data types that is installed initially and can be extended (or reduced) by the admin. The command 'iadmin lt data_type' lists them, and the 'iadmin at data_type' (add token) command can be used to add to the set. The set that was chosen seemed fairly standard to us. I think it was mostly Mike Wan and Raja who picked the names we used, and it was a set that various people found useful.

Like a lot of things, how this is used, if at all, is a site choice. For the client API you're developing, I would think you'd want to display certain icons for well-known types. You might also include a script to add the available data types from the IANA site, for iRODS sites that want to use those. We'd probably want to include those for the DFC instance.
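The script Wayne suggests could start as small as the Java sketch below. It only formats `iadmin at data_type <name>` lines (the add-token form described above) for an admin to review and run; it does not call iadmin or download anything from IANA. The three MIME types are a hand-picked sample rather than the IANA registry, and `DataTypeTokenScript`/`addTokenCommands` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: formats "iadmin at data_type <name>" lines for an admin
// to review and run. It does not call iadmin or fetch anything from IANA.
final class DataTypeTokenScript {

    static List<String> addTokenCommands(List<String> mimeTypes) {
        return mimeTypes.stream()
                .map(String::trim)                 // tolerate stray whitespace
                .filter(t -> !t.isEmpty())         // skip blank entries
                .distinct()                        // one token per MIME type
                .map(t -> "iadmin at data_type '" + t + "'")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A hand-picked sample, not the IANA registry.
        List<String> sample = List.of("text/plain", "application/pdf", "image/jpeg");
        addTokenCommands(sample).forEach(System.out::println);
    }
}
```

A real version would read the registry's published lists and probably skip names already returned by `iadmin lt data_type`.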

  • Wayne -

On Tuesday, May 20, 2014 11:05:45 AM UTC-7, Mike Conway wrote:
I'm working on client API and interface code, and I'm realizing that we need to settle on standard file type mappings for many reasons, the simplest being responsive interfaces that can display data based on file type.

What I'm proposing is that we consider a standard use of the iRODS Data Object field R_PDATA_TYPE (see https://wiki.irods.org/index.php/icat_schema_notes)

And in this field, we use an actual MIME type (see http://www.iana.org/assignments/media-types/media-types.xhtml)

And we ensure that client libraries and any interfaces we build honor and use a mime type stored there. I think we can provide backward compatibility for the non-standard file type notations that have been used in iRODS itself.
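A minimal sketch of what "honor a MIME type stored there, with backward compatibility" might mean inside a client library: accept values that are syntactically MIME types (using the RFC 6838 `type/subtype` name rules, ignoring parameters such as `; charset=`) and map legacy iRODS data-type names through a lookup table. The legacy names in the table are illustrative assumptions, not the actual installed set, and `MimeTypes`/`normalize` are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class MimeTypes {
    // RFC 6838 restricted-name: a letter or digit, then up to 126 more letters,
    // digits, or any of ! # $ & - ^ _ . +
    private static final String NAME = "[A-Za-z0-9][A-Za-z0-9!#$&\\-^_.+]{0,126}";
    private static final Pattern MIME = Pattern.compile(NAME + "/" + NAME);

    // Hypothetical backward-compatibility table. The legacy names on the left are
    // assumptions for illustration; check them against 'iadmin lt data_type'.
    private static final Map<String, String> LEGACY = Map.of(
            "generic", "application/octet-stream",
            "jpeg image", "image/jpeg",
            "html", "text/html");

    static boolean isMimeType(String s) {
        return s != null && MIME.matcher(s).matches();
    }

    // Pass real MIME types through (lower-cased, since type and subtype names are
    // case-insensitive), map known legacy names, and report anything else as unknown.
    static Optional<String> normalize(String stored) {
        if (stored == null) {
            return Optional.empty();
        }
        String s = stored.trim();
        if (isMimeType(s)) {
            return Optional.of(s.toLowerCase(Locale.ROOT));
        }
        return Optional.ofNullable(LEGACY.get(s.toLowerCase(Locale.ROOT)));
    }
}
```

Returning `Optional.empty()` for unmapped legacy values lets an interface fall back to a generic icon instead of guessing.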

So that's a small deal and a big deal at the same time: if you use custom AVUs to note file types, you might not get certain default automatic behavior in apps built using the standard client libraries.

That's a suggestion. I'm immediately confronted with the problem of resolving file types to get custom behaviors, though, and I wonder what everyone else is doing.

Cheers
MC

@michael-conway


Member

michael-conway commented May 21, 2014

Hi Mike, Hao and Raja,
For iPlant we have tackled this at 2 levels:

  1. We use Apache Tika http://tika.apache.org/ for basic MIME type detection.
  2. We have a custom grammar that is used for detecting "info type", a special content type that is relevant to our users. For example, knowing that a file is tab-delimited or has a MIME type of text is not very helpful, but identifying an info type of "VCF", "GFF3", etc. gives us richer context for further operations.
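The two-level scheme described above (a generic MIME type first, then a domain-specific "info type") can be sketched as an ordered list of first-line rules applied only to text content. The grammar iPlant actually uses is not shown in the thread; this stand-in relies only on the fact that VCF files begin with a `##fileformat=VCF...` line and GFF3 files with `##gff-version 3`. `InfoTypeSniffer` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for an info-type grammar: ordered first-line rules,
// where the first matching rule wins.
final class InfoTypeSniffer {
    private static final Map<String, Predicate<String>> RULES = new LinkedHashMap<>();

    static {
        RULES.put("VCF", line -> line.startsWith("##fileformat=VCF"));
        RULES.put("GFF3", line -> line.startsWith("##gff-version 3"));
    }

    // Returns a domain-specific info type for text content, or empty if the MIME
    // type is not textual or no rule matches the first line.
    static Optional<String> infoType(String mimeType, String content) {
        if (mimeType == null || !mimeType.startsWith("text/") || content == null) {
            return Optional.empty();
        }
        Optional<String> firstLine = content.lines().findFirst().map(String::trim);
        if (firstLine.isEmpty()) {
            return Optional.empty();
        }
        for (Map.Entry<String, Predicate<String>> rule : RULES.entrySet()) {
            if (rule.getValue().test(firstLine.get())) {
                return Optional.of(rule.getKey());
            }
        }
        return Optional.empty();
    }
}
```

Keeping the rules in an ordered map makes it easy to add site-specific formats without touching the MIME detection step.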

Most of our file formats do not exist in http://www.iana.org/assignments/media-types/media-types.xhtml

We store all of this in user space with two AVUs (MIME type and info type), but would prefer to keep it in system space/metadata. We do this in user space to allow users to choose their own overriding MIME/info types. Having ACLs on AVUs would be a dream come true!

Regards,
Nirav

On Tue, May 20, 2014 at 4:10 PM, Arcot Rajasekar wrote:
Another point in favor of using AVUs is that files can have more than one data type, and these can be inserted as multiple AVUs. What I mean by multiple data types is that some of them might be MIME types, but others might be more nuanced.

For example, .xml maps to a MIME type, but that does not capture the internal schema type.

There might also be nuanced data types, such as .jpg and .jpg2.

thanks

raja

From: irod-chat@googlegroups.com [irod-chat@googlegroups.com] on behalf of Hao Xu
Sent: Tuesday, May 20, 2014 7:05 PM
To: irod-chat@googlegroups.com
Subject: Re: [iROD-Chat:11797] Re: MIME madness

I think both approaches would work well (using a column or an AVU), but from a meta-programming point of view, using AVUs seems like a good idea, because we can treat all AVUs uniformly (for example, with respect to ACLs) under one code path. Besides, iPlant is already using AVUs, so this would require minimal change to their already working system. AVUs also allow external indexing to be done in an extensible way (for example, if we add other properties of data in the future).

@michael-conway


Member

michael-conway commented May 21, 2014

The upshot seems to be that we prefer an AVU for MIME type, that we should find a standard AVU scheme for MIME type, and perhaps a way of registering standard AVUs.

In Jargon, to start, I'll be adding MIME type code to store and retrieve MIME types from a standard MIME AVU, supporting multiple values. I can also provide some flexibility to designate the MIME type 'source', so that a custom AVU scheme would work in any interfaces or libraries we build.
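One way to read "a standard MIME AVU, supporting multiple" together with a configurable "source": make the attribute name a constructor parameter, so the same code can read either the standard scheme or a site's custom one. This is an in-memory sketch, not Jargon's API; `Avu`, `MimeTypeAvus`, and the attribute names used are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// One attribute-value-unit triple as attached to a data object (unit may be empty).
record Avu(String attribute, String value, String unit) {}

// Hypothetical helper: reads and writes MIME types as AVUs under a configurable
// attribute name (the "source"), allowing several MIME types per object.
final class MimeTypeAvus {
    private final String attribute;

    MimeTypeAvus(String attribute) {
        this.attribute = Objects.requireNonNull(attribute, "attribute");
    }

    // Adds a MIME type AVU unless an identical one is already present.
    void add(List<Avu> metadata, String mimeType) {
        Avu avu = new Avu(attribute, mimeType, "");
        if (!metadata.contains(avu)) {
            metadata.add(avu);
        }
    }

    // Returns every MIME type stored under this attribute, in insertion order;
    // AVUs with other attribute names (e.g. an info type) are ignored.
    List<String> read(List<Avu> metadata) {
        return metadata.stream()
                .filter(a -> a.attribute().equals(attribute))
                .map(Avu::value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Avu> metadata = new ArrayList<>();                // stands in for one object's AVUs
        MimeTypeAvus standard = new MimeTypeAvus("mime_type"); // hypothetical attribute name
        standard.add(metadata, "application/xml");
        metadata.add(new Avu("info_type", "VCF", ""));         // unrelated AVU
        standard.add(metadata, "text/plain");
        System.out.println(standard.read(metadata));           // [application/xml, text/plain]
    }
}
```

A site with its own scheme would only construct `MimeTypeAvus` with its own attribute name; the rest of the code path stays the same.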

@lstillwe


Contributor

lstillwe commented May 21, 2014

I am interested in keeping up with this.
For DataONE member node support, I need to return a data type for DataONE registered data.
I had been thinking about using an AVU for this too.

Lisa


@michael-conway


Member

michael-conway commented May 23, 2014


OK, we can add this to jargon-data-utils. I'd suggest we have a standard registry (on GitHub) of reserved AVU names, such as those that already exist for starred folders, shared folders, tags, user profiles, etc. Set/get data type methods can be put into a file format package.
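A registry of reserved AVU names could be mirrored in client code as a single enum, so libraries can check whether a user-supplied attribute collides with a reserved one. Every attribute string below is a placeholder (the real list would come from the proposed GitHub registry), and `ReservedAvu` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical registry of reserved AVU attribute names. Every string here is a
// placeholder; the real names would come from the published registry.
enum ReservedAvu {
    MIME_TYPE("std.mime_type"),
    STARRED_FOLDER("std.starred"),
    SHARED_FOLDER("std.shared"),
    TAG("std.tag"),
    USER_PROFILE("std.user_profile");

    private final String attribute;

    ReservedAvu(String attribute) {
        this.attribute = attribute;
    }

    String attribute() {
        return attribute;
    }

    // Finds the reserved entry for an attribute name, if any.
    static Optional<ReservedAvu> lookup(String attribute) {
        return Arrays.stream(values())
                .filter(r -> r.attribute.equals(attribute))
                .findFirst();
    }

    // True if the name appears in the reserved registry.
    static boolean isReserved(String attribute) {
        return lookup(attribute).isPresent();
    }
}
```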

This is related to standardizing on .irods folders, as I'm also working on virtual collections, metadata templates, and user-submitted rule templates, which will also have AVU markup.

MC


@michael-conway


Member

michael-conway commented Jul 14, 2014

revisiting....

Thinking about listings, it really seems necessary to carry the MIME type in the data object table; otherwise, generating a listing is going to require an outer join against the MIME type AVU. Maybe we should reconsider this: keep the MIME type in the column and add any secondary MIME types as AVUs?
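The difference can be seen in a toy in-memory version of the two listing shapes. With the column, the MIME type comes along with each data object row. With AVUs, the listing needs a left outer join so that objects without a MIME AVU still appear. This sketch only mirrors the query shape with Java collections (in the catalog it would be an SQL outer join against the metadata tables), and all type and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for catalog rows.
record DataObject(long id, String name, String dataType) {}
record ObjectAvu(long objectId, String attribute, String value) {}
record ListingRow(String name, String mimeType) {}

final class Listing {
    // Column approach: the MIME type is read straight off each data object row.
    static List<ListingRow> fromColumn(List<DataObject> objects) {
        return objects.stream()
                .map(o -> new ListingRow(o.name(), o.dataType()))
                .collect(Collectors.toList());
    }

    // AVU approach: a left outer join of objects to their MIME AVUs. Objects without
    // one still appear (mimeType == null); if an object has several MIME AVUs, only
    // the first one seen is kept, so secondary types are dropped from the listing.
    static List<ListingRow> fromAvus(List<DataObject> objects, List<ObjectAvu> avus, String attribute) {
        Map<Long, String> mimeById = new HashMap<>();
        for (ObjectAvu avu : avus) {
            if (avu.attribute().equals(attribute)) {
                mimeById.putIfAbsent(avu.objectId(), avu.value());
            }
        }
        return objects.stream()
                .map(o -> new ListingRow(o.name(), mimeById.get(o.id())))
                .collect(Collectors.toList());
    }
}
```

The AVU path needs a second data source, a join, and a rule for picking one value when several exist, which is the extra work a listing would have to do for every page of results.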
