
20170327 Ontology Change Improvement Call

hauschke edited this page Apr 7, 2017 · 5 revisions

Date: Monday, March 27

Attendees: Mike Conlon, Brian Lowe, Graham Triggs, Juliane Schneider, Christian Hauschke, Damaris Murry, Muhammed Javed, DJ Lee, Marijane White, Violeta Ilik, Tenille Johnson

Goals:

1. Create a domain definition for the VIVO ontology.
2. Make an ontology change that is expected to be no / low impact (e.g. spelling correction, additional class) and doesn't involve a change in the VIVO software.
3. Make an ontology change where there is a software change (preferably a small change).

Notes at the VIVO wiki: https://wiki.duraspace.org/display/VIVO/2017-03-27+Ontology+Improvement+Meeting+notes

Mike: Changes that create no impact, changes that have a destroying impact, etc. Let's find some examples of each kind of change and discuss them.

Let's start with the domain definition idea. Previously we discussed having a domain definition for the ontology, which in this context means the collection of ontological statements required to do stuff in the VIVO software, which we know is not just one ontology. What do the various ontologies speak about, and are they accurate? What part of the world are they speaking about, and at what level of detail?

An idea: VIVO appears to be about representing scholarship. Creating, extending, preserving knowledge. So we need an ontology to represent that. The work that was done, who did it, etc.

Not interested in the details of clinical trials, protocols for clinical trials, they belong in an ontology about clinical trials. VIVO is interested in a person's role in a clinical trial, what was the clinical trial, who funded, etc, but not protocol-level information.

Along those lines, it is important for VIVO to identify the things it's talking about. A person, a clinical trial, grey literature, whatever it is, we need enough info to identify what we're talking about. A name, relationships to people, etc., all these things are normal for VIVO to talk about.

Javed: LD4L is creating one specific LD4L ontology, but in phase 2 they are creating extension ontologies for things like archiving data. So maybe we have some modules that go in all VIVO software, and then also have optional modules that can be selected during the install.

Mike: So an example -- scholarship is a very broad topic, and VIVO wants to be very broad when it comes to scholarship. The way VIVO was originally developed it was not strong in the arts. Some confusion about things like screenplays, scripts, authoring vs performing in a play, whether the work is a textual work, or a performance. Duke has done a bunch of work describing this kind of stuff -- that could be a module, for people who need to represent works in the arts. But a cancer center won't need it, and won't load it, because they have no academic work in that area. Goes the other way too, arts programs won't load biomedical stuff. Have had similar comments from legal people/law schools.

Christian: Very appealing idea. We have a German ontology that could be used as such a module, too. Would be nice to be able to integrate it, others could ignore it if they wanted.

Mike: Want to clarify -- this is not just for the German language? The German ontology would be recording things of interest in Germany, yes?

Christian: Maybe, yes.

Mike: But if it's language, that's a bit orthogonal to what we're talking about. The concepts in the ontology should be universal, and represented in multiple languages. Could see some things that are only relevant in a local context.

Christian: Example of a concept: Fakultät, which cannot be found in the UK/US, but we also don't have things like Colleges in Germany.

Violeta: Fakultät might be kind of like a college.

Christian: Perhaps, we've had years of discussion on the subject.

Mike: Have heard similar things from people in the UK. And this is why we've created local extensions. What we would like to avoid is installation-specific customizations that are not really required, we want to create an interchangeable representation of scholarship. Local extensions limit data exchange. So there's some work that needs to be done on the ontology. And for a while, we have not done ontology work, and as a result local extensions have been proliferating. So, a module of German ontology extensions, that's great, but watch out for things that should be added to the general ontology.

Another question: when we say scholarship/research, do we want to include museums, etc.

Violeta: yes we do.

Mike: Some people find the word "scholarship" exclusionary.

Violeta: Could call it intellectual activity.

Mike: Yeah, I guess, we're not really talking about corporate documents, for example. Some people have found the boundary confusing. Some have an idea that scholarship is a broad term for what universities do, that would include what non-academic libraries/museums/etc, that don't consider their work as scholarship. Their work

Violeta: Why do we worry about these people?

Mike: because I want them to use VIVO

Violeta: No, why are we worried about people who think of not including museums, etc.? Doesn't everyone on the call agree we should include them?

Mike: Yes, but it becomes a marketing problem. People don't think VIVO is for them.

Marijane: This came up while writing the abstract for the OpenRIF poster last year -- Research excludes the humanities, Scholarship excludes people in industry. I settled on the word "expertise", though that kind of elides the work being done. But also, lots of VIVO sites call their profiling systems things like experts.school.edu, and there's also the experts.gov system. And the guy from HHS who works on experts.gov would eventually like a profiling system that represents things like the expertise of someone who works in an IT department.

Damaris: I think this idea of expertise is very important. A lot of the doctors at Duke don't necessarily focus on it, but maybe they have an interest or experience, it might not be something they research on, but it is something people like to mention.

Marijane: ODG project classified doctors by diagnosis. Another angle of expertise.

Mike: UFL worked on that with Melissa. Diagnosis, drugs prescribed, histogram.

Damaris: Questions about entities. How much detail does VIVO want to go into about things like publications? Do we care that the publication was part of a grant, etc?

Mike: So the focus on expertise does lead to a focus on the person. Scholarship focus means the project or the funding may take center stage. Currently we happen to have more detail about people, but the system is "invertible": you can have data at the center, because everything is bidirectional in the model. We just don't have much detail around the data. Which is interesting, because we are going to want that detail, but it's not going to be in the VIVO ontology.

Marijane: so like a separate ontology for say grants?

Mike: Take CASRAI dictionary, for example, they are about research administration. VIVO doesn't want to be about that. EuroCRIS has a sophisticated model of funding, which overlaps VIVO significantly, but it is a funding-centric model.

Javed: in terms of the relationship, creating a link between a grant and an article, we already have that in the ontology. If you want to say more, you need to extend the ontology.

Mike: and that seems to be where we're going. Take Publication, that's another focus in VIVO, and there's a lot of work there, SPAR, etc. Are those models deeper than what we have? And of course, eagle-i is a deep model for resources, and it's reconciled for VIVO, and we haven't really taken advantage of it. Eagle-i and VIVO each have their domains, and they borrow concepts from each other, but we haven't done much with that.

Damaris: another issue: how do we feel about representing things that are current vs in the past? Duke only focuses on current appointments. Some people are interested in, say, who previously held an appointment. Sometimes people want to show things they haven't done yet but are interested in. If the goal is for people to take their VIVO profile from school to school... should you only have one profile? Or past/present/future?

Mike: The academic record is supposed to be complete. Think about it like your CV. VIVO is capable of representing things in the past, and things going on now. We are a bit clumsy about it, DateTimeIntervals, when things open or close, sometimes no closure which can mean it's ongoing or maybe we just don't know when it closed. The ontology can represent work in the past.
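A minimal sketch of the ambiguity Mike describes, assuming a simplified interval with an optional end date. The `Interval` class and `status` function are hypothetical illustrations, not VIVO's actual DateTimeInterval model; the point is only that a missing end cannot distinguish "ongoing" from "end unknown".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Interval:
    """Hypothetical stand-in for a date/time interval with an optional end."""
    start: Optional[date]
    end: Optional[date] = None


def status(interval: Interval, today: date) -> str:
    # With no end date we cannot tell an ongoing activity from one whose
    # closing date was simply never recorded.
    if interval.end is None:
        return "ongoing or end unknown"
    return "past" if interval.end < today else "current"


closed = Interval(date(2010, 1, 1), date(2012, 1, 1))
open_ended = Interval(date(2015, 1, 1))
call_date = date(2017, 3, 27)

print(status(closed, call_date))
print(status(open_ended, call_date))
```

Resolving the second case would need some extra piece of data (for example an explicit "ongoing" flag), which this sketch deliberately leaves out.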

Damaris: Do you see the word CV as part of the domain definition?

Mike: we might describe it as like a CV, but we don't want to get people hung up on it. But CV is a really useful example, people know what it is and what it means. Some people use the word portfolio, but that often means the actual work is included, a copy of the work.

Javed: Can you write this definition?

Mike: Yes, I'll have it at the next meeting.

Let's talk about ontological changes. Changes that have little to no impact, and then there are changes that have significant community impact. Might be changes in the middle somewhere. Some examples?

Christian: Including a new class, like an archive for a botanical garden.

Mike: Yes. If you don't have any data like that it doesn't affect you.

Javed: Changing the label of a class or entity. For example, SKOS, broader/narrower. Has a list of things.

Mike: metadata about the ontology, where statements came from.

Marijane: broadly, these could be characterized as adding things without changing the structure, and changing annotation of things in the ontology.

Graham: Going to throw in a slight curve ball. So yes, adding things is going to be fairly straightforward, but there might be cases where there might be a higher impact than expected. People might hide some things in their VIVO, like how we restricted what could be edited in OpenVIVO. There might be things that people will have to actively disable, if they don't want it in their VIVO. So adding things might have more of an impact than we realize.

Javed: Additions can be semantic and structural.

Marijane: adding relationships seems like it is not low impact.

Mike: Christian's example is a good one. Seems almost no impact. But what if it's in a local extension, and we incorporate it into the general ontology, and ask sites to change their data to match? That will have impact.

When can we add hierarchical assertions, where something gets a parent that it didn't have before? Example: the things that have skos:Concept as their parent, that is wrong, what impact would changing that have on the system? If we change something from a skos:Concept to, say, a bfo:disposition, then what happens? Does everything still work? Changes that require software changes -- many situations where you want to change domain and range, which will definitely change the performance of the software.

Graham: one of the more interesting examples of that is what might be done about VCards. We could easily transform them into foaf:Person. Application will work for the most part, because it handles foaf:Person wherever it handles vcard. Might need to add types of people, might not break the system but it could have an impact.

Javed: systems like VIVO need to have a distinction between local and external people.

Marijane: An example - VCards on OpenVIVO are messy. Melissa Haendel has 17 VCards, three of which she coauthors with.

Mike: We kind of imported concepts from the publication industry, so they don't care about people, names are strings. In the VIVO world we're interested in people, authorship, etc. But somehow we ended up with authors that are VCards, but authors are not VCards. They're either people or corporate groups. This is a fundamental problem that is pretty big, and it would be an example of a really impactful ontology change. Author list changes are among the highest impact changes we could consider.

Christian: also very important, no way to represent name changes. Also a very difficult problem. Contributions from are just text, but authors are links to people. Privacy law in Germany impacts how people can be represented.

Mike: At Florida we took a hard line that every author is a person, not a VCard, we just might not know who they are. But this can lead to unresolvable duplicates, because we don't have an identifier for them. Entities with the same ORCID can be merged.
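A minimal sketch of the merge rule Mike mentions, assuming author records are plain dictionaries with a `name` and an optional `orcid`. The record layout, the function name, and the placeholder ORCID value are all hypothetical; the sketch only shows why records without an identifier remain unresolvable duplicates.

```python
from collections import defaultdict


def merge_by_orcid(authors):
    """Merge records that share an ORCID; keep the rest as unresolved."""
    by_orcid = defaultdict(list)
    unresolved = []
    for record in authors:
        orcid = record.get("orcid")
        if orcid:
            by_orcid[orcid].append(record)
        else:
            # No identifier: two "J Smith" records may or may not be
            # the same person, so we cannot safely merge them.
            unresolved.append(record)

    merged = [
        {"orcid": orcid, "names": sorted({r["name"] for r in records})}
        for orcid, records in by_orcid.items()
    ]
    return merged, unresolved


# Illustrative data only; the ORCID below is a made-up placeholder.
authors = [
    {"name": "J Smith", "orcid": "0000-0000-0000-0001"},
    {"name": "Jane Smith", "orcid": "0000-0000-0000-0001"},
    {"name": "J Smith"},
    {"name": "J. Smith"},
]

merged, unresolved = merge_by_orcid(authors)
print(merged)
print(unresolved)
```

The two records with the same ORCID collapse into one entity carrying both name variants, while the two records without an ORCID stay separate.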

Brian: Not to complicate things too much, but in theory, all of these problems were built into the design four years ago, and the state of support in the application never caught up, but in theory it shouldn't have to be any kind of impactful change. VCard was incorporated to handle name changes, for example. Also unfortunate that VCards are being used as authors. Should have an axiom that an Authorship relates a publication and a person, and the VCard optionally has the name, etc. Like what a cataloger calls item in hand. You could record that J Smith is the author string while also having a person entity for Smith in the Authorship. People didn't want to create all the people in their VIVOs, so they started making VCards for them instead. VCard should just be an extra container of data associated with the authorship. But all these issues were in mind when changes were last made. We need to make sure everything is documented and the axioms are in the ontology, and we should be able to do this without it being a catastrophic change.

Graham: the problem is that the VCards are being related to authorships the way people are.

Brian: that sort of touches on the issue with Relationship/relates/relatedBy, no way to distinguish what is important, no way to logically specify it.

Graham: depending on how you're trying to follow those relationships, that can cause a much bigger problem. Even if you take VCards out of it, and make everyone a Person, you still have things in different contexts that could be related, and if you're trying to trace through an author, you have to sift out the relationships you're interested in.
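A minimal sketch of the "sifting" Graham describes, assuming triples are held in memory as plain tuples of prefixed names. The `ex:` resources and the helper `related_via` are hypothetical; the sketch shows that, because `vivo:relates` carries no further semantics, a traversal has to check the type of each relationship node to keep only the ones it wants.

```python
# Illustrative data only: one person with an authorship and a position.
TRIPLES = [
    ("ex:authorship1", "rdf:type", "vivo:Authorship"),
    ("ex:authorship1", "vivo:relates", "ex:paper1"),
    ("ex:authorship1", "vivo:relates", "ex:smith"),
    ("ex:authorship2", "rdf:type", "vivo:Authorship"),
    ("ex:authorship2", "vivo:relates", "ex:paper1"),
    ("ex:authorship2", "vivo:relates", "ex:jones"),
    ("ex:position1", "rdf:type", "vivo:Position"),
    ("ex:position1", "vivo:relates", "ex:smith"),
    ("ex:position1", "vivo:relates", "ex:university"),
    ("ex:smith", "rdf:type", "foaf:Person"),
    ("ex:jones", "rdf:type", "foaf:Person"),
    ("ex:paper1", "rdf:type", "bibo:Article"),
    ("ex:university", "rdf:type", "foaf:Organization"),
]


def related_via(triples, node, relationship_type, target_type):
    """Things of target_type linked to node through a relationship of the given type."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    # Step 1: relationship nodes of the wanted type that relate `node`.
    relationships = {
        s for s, p, o in triples
        if p == "vivo:relates" and o == node
        and relationship_type in types.get(s, set())
    }
    # Step 2: the other ends of those relationships, filtered by type.
    return sorted({
        o for s, p, o in triples
        if p == "vivo:relates" and s in relationships
        and o != node and target_type in types.get(o, set())
    })


papers = related_via(TRIPLES, "ex:smith", "vivo:Authorship", "bibo:Article")
coauthors = sorted({
    person
    for paper in papers
    for person in related_via(TRIPLES, paper, "vivo:Authorship", "foaf:Person")
    if person != "ex:smith"
})
print(papers, coauthors)
```

Without the type checks in both steps, following `vivo:relates` from `ex:smith` would also reach `ex:university` through the position, which is exactly the unwanted context the traversal has to sift out.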

Brian: In the ISF it was considered important to have these reified things. going back to binary predicates would be pretty catastrophic.

Marijane: reified n-ary relationships are a pretty typical pattern in ontologies, the thing that is funny in the ISF is that there's no semantics in the predicates. Could add some more semantic predicates in, that would probably be a big change, unless perhaps they were subproperties of relates/relatedBy.

Mike: want to note another potential low-impact change: if the data about a thing is rare, most people are not using that part of the ontology, you can change it. Won't hurt the community.

Javed: along the same lines, we could do some surveys from some VIVO institutions, and see what properties they're using. Maybe we'd find properties that are never used.

Mike: I'd rather get their triplestore and tabulate it, but yes, we could do a survey. In the past we've asked high level questions like "Do you have grants?" and have been surprised to find things that aren't represented, like teaching.
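A minimal sketch of the tabulation idea, assuming a site's triples have already been exported into memory as tuples. The function names, the `ex:teaches` property, and the sample data are hypothetical; a real tabulation would read from an actual triplestore export.

```python
from collections import Counter


def tabulate_predicates(triples):
    """Count how often each predicate appears in the data."""
    return Counter(p for _, p, _ in triples)


def unused_properties(ontology_properties, counts):
    """Ontology properties that never occur in the data."""
    return sorted(p for p in ontology_properties if counts[p] == 0)


# Illustrative data only.
site_triples = [
    ("ex:authorship1", "vivo:relates", "ex:paper1"),
    ("ex:authorship1", "vivo:relates", "ex:smith"),
    ("ex:position1", "vivo:dateTimeInterval", "ex:interval1"),
]
# Hypothetical list of properties to check against the data.
ontology_properties = ["vivo:relates", "vivo:dateTimeInterval", "ex:teaches"]

counts = tabulate_predicates(site_triples)
print(counts.most_common())
print(unused_properties(ontology_properties, counts))
```

Summing such counts across several sites would give the kind of evidence discussed here: properties that are rarely or never used are candidates for low-impact change.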

Javed: Can we ask people to run SPARQL queries?

Graham: or when we upgrade to 1.10 we can run against the TPF endpoint.

Mike: we're at the top of the hour, but yes, the idea of having people run queries is a good one.

Javed: link to things that I think are wrong in the ontology in the chat. https://www.dropbox.com/s/976depqlqrnah65/ontologyErrors.docx?dl=0

Marijane: these should eventually be GitHub issues. We need to figure out exactly what repo they go in, though.

Mike: Yes, but we're not quite ready to talk about process yet.

Brian: Post-meeting addendum (please delete or move as deemed appropriate) -- just writing this here while I'm thinking about it in the interest of preserving some institutional memory. Regarding relates/relatedBy. Initially, VIVO had predicates named things like "authorInAuthorship" and "informationResourceInAuthorship." This was a source of criticism from early on, since it appeared that the same concept was being modeled twice: once as a class and again as a predicate. In the ISF work it was decided that these predicates should be simplified, and at the same time vivo:Relationship was introduced as a class intended to be at the same general level as BFO's Role. While roles were realized in processes, relationships were not: they were simply static designations. (Whether this is useful or makes sense was a whole other debate...) But if we assume for the moment that a Relationship is something like a Role, then relates/relatedBy were intended to be parallel to inheresIn/bearerOf. Just as we don't particularly expect there to be bearerOfInvestigatorRole or bearerOfAdministratorRole -- even though these might be useful things to know as we're crawling triples -- we don't necessarily expect specific subproperties denoting whether something is related by an Authorship relationship or a Position relationship. Not trying to argue that this is the ideal approach, but it was the outcome of the numerous debates we had about it at the time.

The VIVO-ISF ontology is an information standard for representing scholarly work.
