Value Specification as datatype #868
Briefly: OBI started with modeling information as entities over 10 years
ago, with explicit approval from Barry (which took a while). The original
motivation was that we have to routinely deal with 'data items' that are
generated as outputs of experiments. Originally Barry took your stance that
we should only model truth, so we should not have to worry about
information, and rather model 'what is real'. But the whole point of OBI is
to model the reality about *how investigations are performed*. And one
crucial element of investigations is that different experiments can
generate conflicting data; that data is transformed (averaged, outliers
removed), and that there is a step of going from instance level 'data
items' to class level 'conclusion statements about reality'.
I will be the first to admit that OBI is a long way off from doing these
things perfectly, and there is definitely a problem: what is in OBI proper
has not been completely updated to reflect the overall goals that we have
outlined.
- we are not planning to model real numbers as entities; we are using a
relation 'has value' from an instance to an xsd:float (or whatever the OWL
formalism allows) for numbers etc.
- the point of 'value specification' is that we want to compare for example
the value "10 g" when it is used in data items (such as the outputs from
experiments e.g. "the mouse weighed 10 g" that have links to existing
physical instances) to when it is used in experimental protocols (such as
"Add 10 g of sugar to the solution"), or predictions ("after drug
treatment, we predict that the mouse will weigh less than 10g")
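The comparison point above can be sketched in a few lines of Python. This is only an illustration under assumed names (`ValueSpec`, `magnitude`, and `unit` are hypothetical, not OBI terms): one shared representation of "10 g" is what makes a measured value, a protocol value, and a predicted value directly comparable.

```python
from dataclasses import dataclass

# Illustrative sketch only: 'ValueSpec' and its fields are hypothetical
# names, not actual OBI terms. The point is that one representation of
# "10 g" can be reused across a measurement result, a protocol step,
# and a prediction, which makes the three comparable.
@dataclass(frozen=True)
class ValueSpec:
    magnitude: float
    unit: str

measured  = ValueSpec(10.0, "g")  # "the mouse weighed 10 g"
directed  = ValueSpec(10.0, "g")  # "add 10 g of sugar to the solution"
predicted = ValueSpec(10.0, "g")  # bound in "will weigh less than 10 g"

# One shared representation makes the three uses directly comparable:
assert measured == directed == predicted
```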
I am hoping this is useful. Without wanting to stifle discussion, I am
worried about how many resources you and we are spending explaining
something that, in its current form, is not documented to a completely
consistent degree. If you are frustrated by this response and by our
unwillingness to reconsider modeling decisions (which I would very much
understand), I would ask you to allow us time to clean up the modeling
into a consistent form before asking for your feedback again.
Thank you for your input,
Bjoern
We will not fundamentally question over 10 years of work. Especially as it
se
…On Fri, Oct 13, 2017 at 11:25 AM, Cristian Cocos ***@***.***> wrote:
I have been trying to understand the motivation behind the positing of the
Value Specification (VS) class as it is currently defined in OBI. I think
it is important to come clean about this before work on VS piles up to the
effect that VS becomes “too big to fail”--which constitutes, in my opinion
at least, a very dangerous attitude (namely refraining from radically
restructuring and/or abandoning a modeling route just because too much work
has been done on it by too many respectable people). The current definition
of VS as recorded in the official OBI release does little to allow a clean
demarcation of VS from some of the related ICE classes. Be that as it may,
my main observation with respect to the way in which OBI attempts to define
VS is the following:
Modeling value specifications *as entities* requires, at the very least,
the capacity to model the set of real numbers (“the power of the
continuum”) as entities.
This fact obviously constitutes a very powerful *formal* argument against
treating VS *as entities*, hence it effectively kills the VS-as-entities
route. Note that I have not made any mention of the ICE aspect of the
matter: if anything, slamming the brakes on the VS ICE enthusiasm emerges
as an added bonus of disposing of VS-as-entities.
Aside from the formal aspect of the issue, attempting to model the
elements of the continuum as entities is a dead giveaway that poor modeling
decisions have happened somewhere along the way, and that, quite likely,
the modeling philosophy behind one’s modeling work needs serious
reassessment (to put it mildly). In particular, one must have done
something wrong if one is compelled to use VS as triple *subjects*. A
healthy modeling endeavor should never lead one to attempt to model the
continuum using the standard discrete tools of a modeling language like
OWL. Standard OWL resources such as classes, properties, and individuals,
have emphatically *not* been designed with this aim in mind. There are,
indeed, tools in OWL that do allow one to represent infinite sets, though
these tend to be more obscure, and, as such, less utilized even by
experienced ontologists. These tools, however, do not represent infinite
sets as regular classes of individuals.
It is, thus, my opinion that whoever introduced the VS class was actually
looking to use it in a manner that is characteristic of Datatypes, though
he/she was quite possibly unaware that OWL 2 allows users to define their
custom datatypes.
In conclusion, I strongly recommend that VS be replaced with a datatype
(be it pre-existing or custom-designed).
--
Bjoern Peters
Associate Professor
La Jolla Institute for Allergy and Immunology
9420 Athena Circle
La Jolla, CA 92037, USA
Tel: 858/752-6914
Fax: 858/752-6987
http://www.liai.org/pages/faculty-peters
Hi Bjoern, see my reply inline please.
On 10/14/2017 11:48, bpeters42 wrote:
Briefly: OBI started with modeling information as entities over 10 years
ago, with explicit approval from Barry (which took a while). The original
motivation was that we have to routinely deal with 'data items' that are
generated as outputs of experiments.
No argument there. If you read my postings carefully, you'll see that I
am explicitly arguing that representing ICEs is unavoidable,
especially in situations in which measurements can only be deemed as
approximate. Many ICEs, however, are dispensable in favor of whatever
they represent--basically in situations in which the representation is
100% accurate, with absolutely no chance of misrepresentation. My
worry is that, caught in the fever of building the ICE scaffolding, OBI
is in danger of losing sight of the fact that a great deal of ICEs are
completely (and easily) dispensable in favor of their real
counterparts. The basic situation that comes to mind is assays whose
results have values in a discrete set, as opposed to assays with
values in a set that has the power of the continuum (this distinction
may or may not coincide with what you guys call "categorical" vs.
"scalar"). The latter situation can*not* avoid appeal to ICEs, while the
former *can* afford to actually point *directly* to the real stuff,
without the ICE middleman. (Yes, the former may have to use ICEs as
well, in case the result of the measurement needs to have an ID (or
whatever) that needs to be preserved in some official record etc. etc.
etc.--again, I tried to explain these options in my previous postings.)
- we are not planning to model real numbers as entities; we are using a
relation 'has value' from instance to xsd:float or whatever the OWL
formalism was to allow for numbers etc.
While you may not be *planning* to model real numbers as entities, you
will effectively *have to*. Value Specifications (VS) as they are
currently defined, and as have been presented to me by Chris among
others (you yourself are taking this stance in the paragraph below),
include *potential* results of measurement processes. As such, OBI will,
in effect, *have* to allocate an IRI for *every (VS "containing" a) real
number*! If 10.1g is a *potential* measurement result, OBI will *have
to* have an IRI in stock for it. If 10.0002584g is a *potential*
measurement result (and I do not see why not), OBI will have to give
it an IRI. And so on and so forth. Every conceivable *potential*
measurement result will have to have an IRI in OBI right off the bat!
Not only that, but OBI will also have to keep unique IRIs ready for the
same numbers on every length (and volume, and current, and voltage, etc.)
scale, such as, say, millimeters (mm): 10.1mm, 10.0002584mm,
10.0002585mm etc. But wait, it gets better! What if someone wants to use
*meters* instead of *millimeters*? Not only will you have to have a
continuum-power infinity of IRIs to represent values in millimeters, but
also another continuum-power infinity of IRIs to capture values in
meters. And so on and so forth. Needless to say, not only is that
*impossible* using the standard *discrete* tools of OWL, but this is not
even how these standard resources (the OWL entities) have been *designed*
to be used! Using OWL entities to "model" this constitutes *patent
misuse of OWL resources*. But there is hope, so rejoice: OWL2
fortunately *does* include resources whose aim is *precisely* to
capture the stuff that OBI has so far been trying to cram and shoehorn
into the terribly inappropriate framework of OWL entities. Those
resources are called *datatypes*. More precisely, custom, user-designed
datatypes. Datatypes have been purposely designed to capture *potential*
values of ... anything. Datatypes are the OWL equivalent of the world of
potentialities. While OWL entities (have been designed to) represent the
actualia, OWL datatypes (have been designed to) represent the potentia.
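The datatype idea can be mimicked in plain Python. The sketch below uses assumed names (a hypothetical `massInGrams` datatype, defined as if it were xsd:decimal restricted by a minInclusive facet of 0; none of this is an OBI or OWL definition) to show the key property being claimed: membership in a datatype is decided by a rule, so none of the continuum-many potential values ever needs an identifier of its own.

```python
from decimal import Decimal, InvalidOperation

# Sketch of how an OWL 2 custom datatype behaves, in plain Python.
# A datatype is a lexical space plus a value mapping; membership is
# decided by a rule, so no individual value ever needs its own IRI.
# The facet chosen here (non-negative decimals) is an assumed example.
def in_mass_in_grams(lexical: str) -> bool:
    """Membership test for a hypothetical 'massInGrams' datatype:
    xsd:decimal restricted by the facet minInclusive = 0."""
    try:
        value = Decimal(lexical)
    except InvalidOperation:
        return False           # not in the lexical space of xsd:decimal
    return value >= 0          # facet restriction

# Any of the continuum-many potential values is checked on demand:
assert in_mass_in_grams("10.1")
assert in_mass_in_grams("10.0002584")
assert not in_mass_in_grams("-3")
```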
Now, I anticipate a response along the following lines: well, while it
is true that the VS class encompasses all values of *potential*
measurements, we will *not* have to represent all these values right at
the outset, but instead we will add them as they "happen," or "as we
need them." Here are two reasons why this would be wrong:
1. The stronger reason: This defeats the purpose of having a class of
*potential* measurement values. In OWL, if an entity is known to
exist, it needs to be represented (today, tomorrow, next year etc.).
On the other hand, all the "entities" that represent potential
measurement values are known to "exist." How will you ever represent
these "entities" knowing that it is *logically* impossible to
represent them? Granted, should OBI have no choice but to proceed as
it has so far, I would not even bother raising this issue: if OBI can
*only* be a hack job, then hack job it is! My point, however, is that
it does *not* have to be a hack
job! There *is* a perfectly reasonable, and elegant, and purposely
designed solution to dealing with these "value specifications," only
ontologists need to (a) be made aware that it exists, and (b) have
the will (and openness) to embrace it.
2. The weaker reason: It also looks to me that this is roughly what the
Measurement Datum class was intended to capture--namely measurement
results of actually performed assays--hence the class of Value
Specifications emerges as a duplicate: you either have the class of
Value Specifications fully populated with known *potential*
measurement values (which is physically and logically
impossible--see #1 above), or you have the VS class that achieves
largely the same objectives as the Measurement Datum class. Either
way, the VS class is not needed.
- the point of 'value specification' is that we want to compare for
example
the value "10 g" when it is used in data items (such as the outputs from
experiments e.g. "the mouse weighed 10 g" that have links to existing
physical instances) to when it is used in experimental protocols (such as
"Add 10 g of sugar to the solution"), or predictions ("after drug
treatment, we predict that the mouse will weigh less than 10g")
See comments above. The only minor inconvenience that I see in using
datatypes to represent VS is that datatypes cannot be used in subject
place. That should easily be fixable by making sure that never happens.
All the examples you just mentioned can easily be rephrased so as to
avoid having "10g" in subject position. Problem solved.
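The rephrasing suggested above can be sketched with triples mocked as Python tuples. Every subject, predicate, and node name here is made up for illustration (not an OBI or IAO IRI); the point is that the numeric literal always sits in object position.

```python
# Hedged sketch: all names below are illustrative stand-ins, not OBI or
# IAO IRIs. Each statement keeps the numeric literal in object position,
# so "10" is never the subject of any triple.
triples = [
    ("ex:weighing1",     "ex:has_measured_mass_in_grams", "10"),  # "the mouse weighed 10 g"
    ("ex:protocolStep1", "ex:directs_mass_in_grams",      "10"),  # "Add 10 g of sugar"
    ("ex:prediction1",   "ex:predicts_mass_below_grams",  "10"),  # "will weigh less than 10 g"
]

# Every subject is a node for a process or datum, never a bare number:
assert all(not s.lstrip("-").replace(".", "").isdigit() for s, _, _ in triples)
```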
I am hoping this is useful. Without wanting to stifle discussion, I am
worried about how many resources you and we are spending explaining
something that, in its current form, is not documented to a completely
consistent degree.
I usually do my homework pretty thoroughly before attempting to change
people's minds. Not only that, but I usually prefer to err on the side
of letting sleeping dogs lie if I don't think I have a very acute issue
to raise. In short, I don't usually speak without a damn good reason
:-) I know how much academics prize consensus, and I very much hate to
be on the dissenting side; I do not relish the posture of the dissenting
party.
If you are frustrated by this response and by our
unwillingness to reconsider modeling decisions (which I would very much
understand), I would ask you to allow us time to clean up a consistent
modeling before asking for your feedback again.
Yes, I certainly understand the burden of an evolving model, though the
trouble is that I actually have to work with actual concrete data sets
that need to be captured in this mold. As it is right now, I am afraid
that I will not be able to, and that may impact very concrete deadlines.
Thanks,
C
Cristian, sorry but I am not persuaded by your arguments for doing away with all value specifications because of the points raised about numbers. I do agree that it needs more work, but I don't see data types working for categorical value specifications that I need for the tumor TNM classifications (and need to get in OBI now!). I also don't see these pointing directly to real stuff as TNM stages are defined (by the pathologists who use them) as combinations of T, N, and M values (see for example: https://staging.seer.cancer.gov/tnm/input/1.0/ovary/path_stage_group_direct/). The values for T, N, and M are conditional on different scenarios (see pT2 for example in https://staging.seer.cancer.gov/tnm/input/1.0/lung/path_t/). Each and every one of these can and should have an IRI. This is essentially what I am proposing in #856
Hi Chris, I can see that I have failed to make myself understood, and I can only blame myself for that. I will try to keep my arguments extremely brief, as I know you guys are awfully busy. Please read below inline.

Chris: I am not persuaded by your arguments for doing away with all value specifications

Cristian: I have never proposed "doing away" with Value Specifications. All I am proposing is to represent them using the proper representation techniques. Entities have not been designed for what you are trying to use them for. Datatypes, on the other hand, have. That is precisely why datatypes have been added to OWL: so people do not have to add classes that are, in effect, duplicates of (or isomorphic to) standard mathematical objects.

Chris: I don't see data types working for categorical value specifications that I need for the tumor TNM classifications (and need to get in OBI now!).

Cristian: TNM was one of the focal points of our work for IFOMIS (just ask Mathias). As such, I happen to possess some good insight into how TNM entities can be captured in a very much Barry Smith-approved, ICE-free, datatype-free, real-Independent-Continuant manner. (There was no ICE/IAO in those times.) Not only that, but this can be done pretty quickly--no longer, in fact, than it would take you to capture them as VS/ICE.

Chris: I also don't see these pointing directly to real stuff as TNM stages are defined (by the pathologists who use them) as combinations of T, N, and M values (see for example: https://staging.seer.cancer.gov/tnm/input/1.0/ovary/path_stage_group_direct/). The values for T, N, and M are conditional on different scenarios (see pT2 for example in https://staging.seer.cancer.gov/tnm/input/1.0/lung/path_t/). Each and every one of these can and should have an IRI. This is essentially what I am proposing in #856

Cristian: Yes, TNM entities will have an IRI each, though they will not be value specifications, nor will they be datatypes either. (OWL did not allow custom user-designed datatypes at the time, nor did we feel that we needed them for TNM.) Also, as I mentioned in my previous post, datatypes are useful mostly for representing infinite sets. Feel free to ask me how to capture TNM entities as entities under the Independent Continuant umbrella.

Cristian: This being said, I realize that pushing this angle can be counterproductive, hence this has been my last intervention on any matter pertaining to ICEs, Value Specifications, and Datatypes--barring, of course, explicit requests that I continue. I thank you and Bjoern for considering my proposals, and for replying to my posts. C
As a relative newcomer to OBO/OBI I find these discussions interesting and am willing to learn the issues this way (though I am short on time too). However, is there background reference material (on OBI's side or in general philosophy) where OBO/OBI's position about VS and real numbers might be stated? If it doesn't exist, a summary of the position and decision on this topic would be good for all newcomers.
Yes, we use RDF literals with the appropriate datatype to represent numbers, so for a scalar value specification X we could have a triple recording its value. I might want to write a lot of triples with value specification X as the subject. In particular, I foresee that we will want to add information about the precision of X, either as measured or as a required tolerance for a setting. I haven't run across a case where I need a number to be a subject, but even then I don't see the necessity of giving IRIs to numbers. Literal numbers are fine. From an ontological perspective, BFO carefully avoids mention of abstracta such as numbers. Other upper ontologies do include abstracta, but they are difficult to handle, and I don't expect BFO to include them any time soon. Following BFO, in IAO and OBI we talk about concrete representations of numbers (in writing, in RAM) without talking about numbers in the abstract. Again, RDF literals suit this purpose well.
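A minimal sketch of this pattern, with RDF triples mocked as Python tuples. The predicate names are hypothetical stand-ins (not exact OBI/IAO IRIs), and each typed literal is written as a (lexical form, datatype) pair.

```python
# Sketch only: predicate and node names are hypothetical stand-ins,
# not exact OBI/IAO IRIs. Typed RDF literals are mocked as
# (lexical form, datatype) pairs.
X = "ex:scalar_value_spec_X"
triples = {
    (X, "ex:has_specified_value", ("10.5", "xsd:float")),  # the value: a literal
    (X, "ex:has_precision",       ("0.1",  "xsd:float")),  # metadata *about* X
}

# X, an ordinary IRI, is the subject carrying the precision information;
# the numbers themselves stay literals and never need IRIs of their own.
assert {s for s, _, _ in triples} == {X}
```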
ok, thanks for explaining BFO/OBI & RDF literals.
I intend to do a little demonstration on Monday during my chairing of the OBO meeting, on how custom-designed datatypes work. The (very) short answer is: they work no differently than any of the built-in datatypes (xsd:string, xsd:float, etc.). Until Monday, however, I will endeavor to answer some of the issues raised on the #879 thread. [Digression] [Long version] I have to confess that I have not been able to figure out a way to represent ordered pairs--i.e. value specifications made up of two or more literals ((5, kg), (21, mg), (37, degree Celsius), etc.)--as datatypes. I toyed around with the idea of making a datatype out of lists (rdf:List) of literals, though it turns out that you cannot, at least not in the current OWL incarnation: the usual Kuratowski definition has thus not yet been assimilated into OWL. However, while the desideratum of having datatypes made out of ordered pairs (of literals) may be a legitimate concern, the puzzling issue remains the question "why would anyone need that?" Why would anyone need bi-dimensional datatypes anyway? I can, as a matter of fact, imagine situations where ordered pairs of literals might be required, though my impression is that, at least as far as Value Specifications are concerned, if you've boxed yourself into a corner where appeal to either entities or multi-dimensional datatypes appears to be the only way out, you must have done something wrong on the way there. There must have been some "less fortunate" modeling decision made somewhere in the past that has led to "having to" represent outcomes of measurement processes as entities or multi-dimensional datatypes. One such decision that comes to mind is the idea of capturing/modeling speech about units of measure in OBI, as opposed to handling that in the software, somewhere "outside."
Nevertheless, should you guys be hell-bent on capturing speech about measurement units within OBI (which, again, I strongly advise against), one can think of different ways to handle measurement units that do not require representing value specifications as entities (or, horribile dictu, ordered pairs of literals, or God knows what other funky contraption), such as attaching measurement units to the measurement process itself. As a physicist, this one seems to me pretty reasonable: once you've decided to carry out an experiment, surely you must've settled on a measurement unit to express your results in as preparation for said experiment. I know I would. At the very least, your tools must have been calibrated in some unit or other. Again, I find it pretty natural to think of the measurement unit as a property of the experimental setup (and hence of the assay itself), and derivatively, of the output. In case one does not like the idea of speaking about assays as being characterized by a measurement unit (and, again, I, for one, can't see why one would not), one should be free to move on to the next target, the measurement datum. Speak about the measurement datum as being characterized by a measurement unit. No need to push it further along, hence no need to turn value specifications into entities. Let value specifications be strings, numbers, or whatever other literals there may be. [/long version]
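The alternative floated above can be sketched in a few lines. All names here are illustrative assumptions, not OBI terms: the unit is recorded on the assay (or, alternatively, on the measurement datum), so the recorded value can stay a bare literal.

```python
# Hedged sketch of the alternative described above: record the measurement
# unit on the assay (or on the measurement datum), so the recorded value
# can stay a bare literal. All names here are illustrative, not OBI terms.
assay = {
    "id": "ex:weighing_assay_1",
    "unit": "gram",          # fixed when the instrument was calibrated
}
datum = {
    "id": "ex:mass_datum_1",
    "generated_by": assay["id"],
    "value": 10.5,           # plain number; its unit follows from the assay
}

# The unit is recoverable without any value-specification entity:
assert assay["unit"] == "gram" and datum["value"] == 10.5
```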
You'll get a table with mass spectrometry assay IRIs in one column, and numbers (or strings) in the other. The one rule of thumb is: as long as you don't require value specifications in subject position (and why would anyone want that?), you should be safe. If, on the other hand, one feels compelled to use value specifications in subject position, this, in my experience, is the likely result of a questionable modeling decision made somewhere else in the model, a decision that has boxed you into this corner. (About that, see more in the "digression" above.)
Cristian, after looking into this for myself, I think we can find a compromise. I agree that we probably don't want to overload data too much with this sort of representation. However, in terms of describing the classes of data that are likely to be generated by experiments, Value Specifications are likely to be useful. G