Update to: Core Types to Support the Discovery of Life Sciences Resources #2711
Not as an extension yet...
And fix hasSequence domains and ranges
and missing property sources
so the unit can be included as well
so the unit can be specified
adjusting domains and ranges when needed. And updating temperature range from Text to Quantity
and add inverseOf statements and links
Commented out classes that fail to work
- Tweaks to wording of terms
- Fixes to make valid HTML
- Omitting keywords property
These files are all available on BioChemEntity branch
See also @Tpt's Yago variant, https://github.com/yago-naga/yago4/blob/master/src/data/bioschemas.ttl
@danbri any update on the progress of the inclusion of these types?
This pull request is being tagged as Stale due to inactivity.
Inactivity seems to be from schema.org side, any idea why?
Short version: My sense is that we should get these terms into Pending, with a view towards them becoming part of core schema.org as evidence of data-consuming applications is collected. Based on the experience of the last few years, we should also expand our notion of "data-consuming applications" to cover developer-facing and data-scientist-facing applications, such as public open data knowledge graphs. I believe the bioschemas schemas have great potential, but we have work to do yet to determine quite what level of detail is going to prove appropriate for this kind of vocabulary.

Next steps: I've asked @RichardWallis to take a look at some minor fixes to the PR, to mark these terms as part of the Pending area of schema.org, and remove any conflicts (e.g. SchemaExamples/schemaexamples.py needs removing).

Status and Context and expectation setting

When the Bioschemas activity was first suggested we (Schema.org leads) were initially wary of bringing Schema.org into an area where there were a great number of existing scientific and research data ontologies, unless there was a serious prospect of the schemas being used in substantive user-benefitting applications that could guide our decision making. For general consumer topics (reviews, ratings, photos, etc.) Schema.org as a unifying vocabulary made clear sense and was guided by user-facing applications. As we touched on deeper scientific topics where many levels of detail are potentially applicable, the territory felt different.

I spoke about this at the Elixir 2016 All Hands, and in particular emphasized that it could be counterproductive to add this kind of vocabulary with the expectation of it primarily being used in general web search engine product features. We didn't want life-science site publishers to be disappointed if they added the markup to their sites and did not subsequently feel they were benefitting from having done so (e.g. in the Google case, by the markup being used by one of the features in Google Search's list of structured data features). And I didn't want to run into people at conferences a few years later and be told "we added all this markup to our site and it hasn't done us any good at all!".

Although these considerations apply to all schema.org additions, Bioschemas was an effort to move Schema.org towards covering scientific concepts and data structures in more detail than we had approached before. Schema.org has always focussed on schemas that are used, in the sense of consumed/interpreted by products, in user-facing features and applications. Without this, it is difficult to judge appropriate levels of detail, and it can be difficult for publishers to justify the effort of adding the markup.

The expectation originally was that the bioschemas project would work equally on the data-publishing and the data-consumption sides of making these schemas part of a healthy ecosystem. I think what we've seen is a lot more success on the former side than on the latter (and that is no fault of any individual or group who has been part of the bioschemas effort).

Pending

By bringing these terms into schema.org's Pending area, schema.org (per our standard documentation) sets the following expectations:
This is loosely analogous to language W3C uses for Working Drafts, and I highlight it here because it is important to acknowledge that the bioschemas vocabulary has been the product of a significant and expert-informed process over the last few years, and in particular it has been created, amended and developed in collaboration with many authoritative publishers of bioinformatics / life sciences data. It may be that the vocabulary in its schema.org incarnation will evolve further, but readers arriving here without knowledge of its origins should know that there have been substantial and long-running, expert-led collaborations leading to these designs.

Our challenge now will be to address any technical and usability integration issues between these schemas and the rest of Schema.org, and to move the focus towards data-consuming applications, so that we can understand whether the level of detail, definitions and properties proposed here are sufficient to meet the needs of user-facing applications. The Bioschemas project provides some supporting tooling, and there are other open-source tools (e.g. Gleaner.io, Schemarama) that may be helpful to those developing applications.

Schema.org for Knowledge Graph Exchange

As we look to support the use of schema.org data in new and interesting areas, we should also take care to be open-minded about what counts as "using" Schema.org in a data-consuming application. For example, at Google we made some investigations into whether Schema.org extended with Bioschemas is sufficiently expressive to capture a useful "knowledge graph for life sciences" subset extracted from Wikidata.org. Would such a database be a user-facing use of the data, or a workflow / infrastructural step towards an environment where user-facing applications could eventually be created? It is a little of both. While we can declare developers to be a kind of user we care about, these kinds of generic applications do not always provide guidance that can help scope and shape schema design.

Such "knowledge graph exchange" scenarios for using Schema.org-based data are part of a larger trend. For example:
I believe we should as a project explicitly declare these kinds of open data sharing, "knowledge graph exchange" initiatives as being amongst the kinds of data-consuming application that justify additions and changes to Schema.org. They are very much in the spirit of the project, but some thought is needed on how to operationalize this. This doesn't mean that just spinning up an RDF database with some test data in it would be sufficient; rather that we would be acknowledging data scientists, developers and others who work with data as being important user constituencies. Just as schema.org serves non-technical search engine end-users who are looking for jobs, recipes, reviews, events, datasets or fact checks on the various search engines, it can also support developers and data scientists who work with aggregations of schema.org data. As the DataCommons.org site says,
This kind of service (provided also by Wikidata et al.) can add huge value and help others meet the needs of their users. The clarification to be made here is that our exit criteria for moving terms out of "Pending" status into the Schema.org core vocabulary should consider public, open data knowledge graph use (SPARQL/RDF, Property Graphs, etc.) as important evidence towards demonstrating the usefulness of schema.org schema designs.

To @stain's point, it is true that we have been a little blocked at Schema.org in terms of knowing how to handle the Bioschemas proposals, since they do make significant amounts of great data accessible via schema.org markup, even if the data-consuming applications we collectively anticipated back in 2016 have yet to emerge. Schema.org in the past has suffered from "build it and they'll come" optimism, and contains a number of schema designs which lack substantive data-consuming implementations. This is why we introduced the notion of "pending", so that there is an opportunity to surface potentially valuable schema designs, while also flagging up that we believe there may be possible tweaks ahead as data-consuming implementations surface.

If we clarify "user-facing, data-consuming application" to include open data-sharing "knowledge graph" systems like Wikidata, Yago, SN SciGraph, Ozymandias and Data Commons, I believe this opens up a roadmap for bringing Bioschemas (and similar proposals) into Schema.org, without setting unrealistic expectations about the schema details being used. In particular it gives us a new focal point for articulating questions about the user needs being met by schema designs; we can ask about the kinds of queries supported by the combination of these schemas with open data that uses the schemas. Framed in this way I'm a lot more comfortable bringing these schemas into Pending, as it gives a plausible path for progressing things further.

@AlasdairGray et al., does that work for you?
This is a replacement for PR #2699, which required some work to get it to pass CI tests.
See that PR for details.