Value Specification as datatype #868

Closed
Aqua1ung opened this issue Oct 13, 2017 · 9 comments

@Aqua1ung

I have been trying to understand the motivation behind the positing of the Value Specification (VS) class as it is currently defined in OBI. I think it is important to get clear about this before work on VS piles up to the point that VS becomes “too big to fail”--which constitutes, in my opinion at least, a very dangerous attitude (namely, refraining from radically restructuring and/or abandoning a modeling route just because too much work has been done on it by too many respectable people). The current definition of VS as recorded in the official OBI release does little to allow a clean demarcation of VS from some of the related ICE classes. Be that as it may, my main observation with respect to the way in which OBI attempts to define VS is the following:

Modeling value specifications as entities requires, at the very least, the capacity to model the set of real numbers (“the power of the continuum”) as entities.

This fact obviously constitutes a very powerful formal argument against treating VS as entities, hence it effectively kills the VS-as-entities route. Note that I have not made any mention of the ICE aspect of the matter: if anything, slamming the brakes on the VS ICE enthusiasm emerges as an added bonus of disposing of VS-as-entities.

Aside from the formal aspect of the issue, attempting to model the elements of the continuum as entities is a dead giveaway that poor modeling decisions have happened somewhere along the way, and that, quite likely, the modeling philosophy behind one’s modeling work needs serious reassessment (to put it mildly). In particular, one must have done something wrong if one is compelled to use VS as triple subjects. A healthy modeling endeavor should never lead one to attempt to model the continuum using the standard discrete tools of a modeling language like OWL. Standard OWL resources such as classes, properties, and individuals, have emphatically not been designed with this aim in mind. There are, indeed, tools in OWL that do allow one to represent infinite sets, though these tend to be more obscure, and, as such, less utilized even by experienced ontologists. These tools, however, do not represent infinite sets as regular classes of individuals.

It is, thus, my opinion that whoever introduced the VS class was actually looking to use it in a manner that is characteristic of Datatypes, though he/she was quite possibly unaware that OWL 2 allows users to define their own custom datatypes.
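For concreteness, here is a minimal sketch, in Turtle, of what an OWL 2 custom datatype looks like; the IRIs are purely illustrative and are not proposed OBI terms. The datatype restricts xsd:decimal to the interval [0, 100] and is then used as the range of an ordinary data property, exactly as a built-in datatype would be.

@prefix :     <http://example.org/demo#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Custom datatype: decimals between 0 and 100 inclusive.
:percentValue a rdfs:Datatype ;
    owl:equivalentClass [
        a rdfs:Datatype ;
        owl:onDatatype xsd:decimal ;
        owl:withRestrictions ( [ xsd:minInclusive 0.0 ] [ xsd:maxInclusive 100.0 ] )
    ] .

# The custom datatype is used like any built-in one.
:hasTumorCellPercentage a owl:DatatypeProperty ;
    rdfs:range :percentValue .

:sample42 :hasTumorCellPercentage 37.5 .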

In conclusion, I strongly recommend that VS be replaced with a datatype (be it pre-existing or custom-designed).

@bpeters42
Contributor

bpeters42 commented Oct 14, 2017 via email

@Aqua1ung
Author

Aqua1ung commented Oct 15, 2017 via email

@cstoeckert
Contributor

Cristian, sorry but I am not persuaded by your arguments for doing away with all value specifications because of the points raised about numbers. I do agree that it needs more work, but I don't see data types working for categorical value specifications that I need for the tumor TNM classifications (and need to get in OBI now!). I also don't see these pointing directly to real stuff as TNM stages are defined (by the pathologists who use them) as combinations of T, N, and M values (see for example: https://staging.seer.cancer.gov/tnm/input/1.0/ovary/path_stage_group_direct/). The values for T, N, and M are conditional on different scenarios (see pT2 for example in https://staging.seer.cancer.gov/tnm/input/1.0/lung/path_t/). Each and every one of these can and should have an IRI. This is essentially what I am proposing in #856
Thanks
Chris

@Aqua1ung
Author

Hi Chris, I can see that I have failed to make myself understood, and I can only blame myself for that. I will try to keep my arguments extremely brief, as I know you guys are awfully busy. Please read below inline.

Chris: I am not persuaded by your arguments for doing away with all value specifications

Christian: I have never proposed "doing away" with Value Specifications. All I am proposing is to represent them using the proper representation techniques. Entities have not been designed for what you are trying to use them for. Datatypes, on the other hand, have. That is precisely why datatypes have been added to OWL, so people do not have to add classes that are, in effect, duplicates of (or isomorphic to) standard mathematical objects.

Chris: I don't see data types working for categorical value specifications that I need for the tumor TNM classifications (and need to get in OBI now!).

Christian: TNM was one of the focal points of our work for IFOMIS (just ask Mathias). As such, I happen to possess some good insight into how TNM entities can be captured in a very much Barry Smith-approved, ICE-free, datatype-free, real-Independent-Continuant manner. (There was no ICE/IAO at the time.) Not only that, but this can be done pretty quickly--no longer, in fact, than it would take you to capture them as VS/ICE.

Chris: I also don't see these pointing directly to real stuff as TNM stages are defined (by the pathologists who use them) as combinations of T, N, and M values (see for example: https://staging.seer.cancer.gov/tnm/input/1.0/ovary/path_stage_group_direct/). The values for T, N, and M are conditional on different scenarios (see pT2 for example in https://staging.seer.cancer.gov/tnm/input/1.0/lung/path_t/). Each and every one of these can and should have an IRI. This is essentially what I am proposing in #856

Christian: Yes, TNM entities will have an IRI each, though they will not be value specifications, nor will they be datatypes either. (OWL did not allow custom user-designed datatypes at the time, nor did we feel that we needed them for TNM.) Also, as I mentioned in my previous post, datatypes are useful mostly for representing infinite sets. Feel free to ask me how to capture TNM entities as entities under the Independent Continuant umbrella.

Christian: This being said, I realize that pushing this angle can be counterproductive, hence this will be my last intervention on any matter pertaining to ICEs, Value Specifications, and Datatypes--barring, of course, explicit requests that I continue. I thank you and Bjoern for considering my proposals, and for replying to my posts.

C

@Public-Health-Bioinformatics

As a relative newcomer to OBO/OBI I find these discussions interesting and am willing to learn the issues this way (though I am short on time too). However, is there background reference material (on OBI's side or in general philosophy) where OBO/OBI's position on VS and real numbers is stated? If it doesn't exist, a summary of the position on the topic and the decision taken would be good for all newcomers.

  • As I understand it, OBI and the OBO Foundry family of ontologies are fine with - and encourage - use of 'has specified value' to associate an entity with a real (number-line) value via OWL's XML Schema datatypes (inherited via RDF), e.g. "[some entity id] 'has specified value' > xsd:real: 70.0" - a formal-logic olive branch to the number line. It allows other aspects of an observation to be added: units, precision, etc.

  • Is someone arguing that an entity could be created for the above number, having a label "70.0" and placed within a mathematical category such that it is interpreted as the real number 70.0, and such that it can become the subject of an "s p o" relation? Or is such a thing being implied by a more complex statement somehow?

@jamesaoverton
Contributor

Yes, we use RDF literals with the appropriate datatype to represent numbers, so for a scalar value specification X we could have a triple X 'has specified numeric value' "70.0"^^xsd:real and another triple for the units. We can create such value specifications as needed, identifying them with new IRIs or blank nodes as required. We can compare two scalar value specifications by their units and numerical values. All this is easy and common in OWL, RDF, or SPARQL, and has been sufficient for all my modelling needs since we developed the approach a few years ago.
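Spelled out as triples, the pattern above would look roughly like the following sketch; the IRIs and the choice of xsd:float are illustrative stand-ins rather than the actual OBI terms.

@prefix :    <http://example.org/demo#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A scalar value specification X carrying a numeric value and a unit.
# Further triples about :X (e.g. precision or tolerance) can be added as needed.
:X a :ScalarValueSpecification ;
    :has_specified_numeric_value "70.0"^^xsd:float ;
    :has_measurement_unit_label :kilogram .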

I might want to write a lot of triples with value specification X as the subject. In particular, I foresee that we will want to add information about the precision of X, either as measured or as a required tolerance for a setting. I haven’t run across a case where I need a number to be a subject, but even then I don’t see the necessity of giving IRIs to numbers. Literal numbers are fine.

From an ontological perspective, BFO carefully avoids mention of abstracta such as numbers. Other upper ontologies do include abstracta, but they are difficult to handle, and I don't expect BFO to include them any time soon. Following BFO, in IAO and OBI we talk about concrete representations of numbers (in writing, in RAM) without talking about numbers in the abstract. Again, RDF literals suit this purpose well.

@Public-Health-Bioinformatics

ok, thanks for explaining BFO/OBI & RDF literals.

@Aqua1ung
Author

Aqua1ung commented Mar 24, 2018

I intend to do a little demonstration on Monday, during my chairing of the OBO meeting, of how custom-designed datatypes work. The (very) short answer is: they work no differently from any of the built-in datatypes (xsd:string, xsd:float, etc.). Until Monday, however, I will endeavor to answer some of the issues raised on the #879 thread.
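By way of illustration of what such a demonstration might cover (the IRIs below are made up for the example, not proposed OBI terms): a categorical custom datatype can be defined by enumerating its admissible literals and is then used exactly like a built-in datatype.

@prefix :     <http://example.org/demo#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A categorical custom datatype defined by enumerating its literals.
:pTCategory a rdfs:Datatype ;
    owl:equivalentClass [
        a rdfs:Datatype ;
        owl:oneOf ( "pT1" "pT2" "pT3" "pT4" )
    ] .

# Used no differently than xsd:string or xsd:float would be.
:has_pT_value a owl:DatatypeProperty ;
    rdfs:range :pTCategory .

:tumorFinding7 :has_pT_value "pT2" .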

[Digression]
[Short version] Being compelled to use value specifications in subject position is very possibly the result of a questionable modeling decision that has boxed one into this corner.[/short version]

[Long version] I have to confess that I have not been able to figure out a way to represent ordered pairs--i.e. value specifications made up of two or more literals ((5, kg), (21, mg), (37, degree Celsius), etc.)--as datatypes. I toyed around with the idea of making a datatype out of lists (rdf:List) of literals, though it turns out that you cannot, at least not in the current OWL incarnation: the usual Kuratowski definition has thus not yet been assimilated into OWL.

However, while the desideratum of having datatypes made out of ordered pairs (of literals) may be a legitimate concern, the puzzling question remains: why would anyone need that? Why would anyone need two-dimensional datatypes anyway? I can, as a matter of fact, imagine situations where ordered pairs of literals might be required, though my impression is that, at least as far as Value Specifications are concerned, if you've boxed yourself into a corner where appeal to either entities or multi-dimensional datatypes appears to be the only way out, you must have done something wrong on the way there. There must have been some "less fortunate" modelling decision made somewhere in the past that has led to "having to" represent outcomes of measurement processes as entities or multi-dimensional datatypes. One such decision that comes to mind is the idea of capturing/modeling speech about units of measure in OBI, as opposed to handling that in the software, somewhere "outside."

Nevertheless, should you guys be hell-bent on capturing speech about measurement units within OBI (which, again, I strongly advise against), one can think of different ways to handle measurement units that do not require representing value specifications as entities (or, horribile dictu, ordered pairs of literals, or God knows what other funky contraption), such as attaching measurement units to the measurement process itself (see the sketch right after this digression). As a physicist, this seems to me pretty reasonable: once you've decided to carry out an experiment, surely you must have settled on a measurement unit in which to express your results as part of preparing said experiment. I know I would. At the very least, your tools must have been calibrated in some unit or other. Again, I find it pretty natural to think of the measurement unit as a property of the experimental setup (and hence of the assay itself), and, derivatively, of the output.

In case one does not like the idea of speaking about assays as being characterized by a measurement unit (and, again, I, for one, can't see why one would not), one should be free to move on to the next target, the measurement datum, and speak about the measurement datum as being characterized by a measurement unit. No need to push it further along, hence no need to turn value specifications into entities. Let value specifications be strings, numbers, or whatever other literals there may be. [/long version]
[/digression]
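A rough sketch of the alternatives mentioned in the digression, with made-up IRIs rather than actual OBI terms: the measurement unit hangs off either the assay or the measurement datum, and the value itself remains a plain literal.

@prefix :    <http://example.org/demo#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Option 1: the unit characterizes the assay (the experimental setup).
:weighing_assay_12 a :MassMeasurementAssay ;
    :uses_measurement_unit :kilogram ;
    :has_specified_output :mass_datum_12 .

# Option 2: the unit characterizes the measurement datum itself.
:mass_datum_12 a :MassMeasurementDatum ;
    :has_measurement_unit :kilogram ;
    :has_measured_value "70.0"^^xsd:float .    # the value stays a literal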

  • @bpeters42 : "It is not at all clear how you quantify that entities are more 'expensive'." @Aqua1ung: each entity requires at the very least (a) an IRI, (b) a type declaration, and (c) very likely a label--hence at least two triples per entity before it carries any value. Datatypes made out of literals (for it is literals that I am selling here) require none of that. Literals require just themselves: no IRIs, no labels. (A side-by-side sketch follows after the query below.)

  • @bpeters42 : "how to query for datatypes that are the output of mass spectrometry experiments." @Aqua1ung: One surefire way is to query precisely for the output of mass spectrometry experiments, like so:

PREFIX : <http://example.org/demo#>  # illustrative prefix; substitute the actual namespace
SELECT DISTINCT ?msa ?vs
WHERE { ?msa a :MassSpectrometryAssay ; :has_specified_output ?md .
        ?md :has_value_specification ?vs }

You'll get a table with mass spectrometry assay IRIs in one column, and numbers (or strings) in the other. The one rule of thumb is: as long as you don't require value specifications in subject position (and why would anyone want that?), you should be safe. If, on the other hand, one feels compelled to use value specifications in subject position, this, in my experience, is the likely result of a questionable modeling decision made somewhere else in the model, a decision that has boxed you into this corner. (About that, see more in the "digression" above.)
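To make the triple-count comparison from the first bullet concrete (again with illustrative IRIs): the entity route spends several triples on the value specification before the value itself appears, while the literal route spends a single triple.

@prefix :     <http://example.org/demo#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Value specification as an entity: IRI, type, label, plus the value triple.
:md_7 :has_value_specification :vs_7 .
:vs_7 a :ScalarValueSpecification ;
    rdfs:label "70.0 kilograms" ;
    :has_specified_numeric_value "70.0"^^xsd:float .

# Value carried directly as a literal: one triple on the measurement datum.
:md_7 :has_measured_value "70.0"^^xsd:float .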

@GullyAPCBurns

Christian,

After looking into this for myself, I think we can find a compromise. I agree that we probably don't want to overload data too much with this sort of representation. However, in terms of describing the classes of data that are likely to be generated by experiments, Value Specifications are likely to be useful.

G
