The life science research community comprises a large number of diverse organisations consuming and/or producing data on the web. The community is very active in adopting standards and common APIs for specific types of data, but there isn’t a standard lightweight format that these organisations use to publish all their information, and many don't have the resources or expertise to create APIs for others to access their data.
Bioschemas is a project to promote the use of Schema.org markup in life sciences, as a way to address this. We are hoping to encourage life science organisations to adopt Schema.org markup, since it doesn’t require programming skills, it is widely adopted and well documented, and it makes sense anyway for SEO. We could then scrape web pages to access what would be consistently formatted information.
Organisations involved in Bioschemas include ELIXIR, Pistoia Alliance, GOBLET, TeSS, BioSharing and BBMRI. (I work for ELIXIR, an organisation that is funded by European governments to build a sustainable infrastructure for life science information. It is one of the founders of Bioschemas.)
Bioschemas aims to create specifications for each type needed in the life sciences. Each specification will contain:
Here is an example: the specification for
Our general approach is to:
Example use case: A small marine metagenomics research group publishes its events on its website. These get limited publicity because the website isn’t well used. They don’t have the time or expertise to create an API and haven’t got an iCal feed.
Then they code their events with Schema.org markup through a plugin for a popular open source CMS (WordPress, Joomla, Drupal) or through an online Schema.org markup generator. We write a script to scrape their site and add their events to a database of other life science events (an events portal).
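To make the scraping step a bit more concrete, here is a minimal sketch in Python of what such a script might look like. It assumes the group's plugin emits the markup as JSON-LD inside a `<script type="application/ld+json">` tag (Microdata or RDFa would need a different parser). The sample page, the event details, the example.org URL and the `extract_events` helper are all made up for illustration; this is not an actual Bioschemas harvester.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than fail the whole page


def extract_events(html):
    """Return name, startDate and url for every Schema.org Event found in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    events = []
    for doc in parser.documents:
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Event" in types:
                events.append({
                    "name": item.get("name"),
                    "startDate": item.get("startDate"),
                    "url": item.get("url"),
                })
    return events


# Hypothetical page, standing in for the research group's events listing.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Marine Metagenomics Workshop",
  "startDate": "2017-03-14",
  "url": "https://example.org/events/workshop"
}
</script>
</head><body><h1>Events</h1></body></html>
"""

if __name__ == "__main__":
    for event in extract_events(SAMPLE_PAGE):
        print(event["startDate"], event["name"], event["url"])
```

A real events portal would fetch the pages over HTTP and store the results in its database; the point here is only that once the markup is in place, very little site-specific code is needed.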
Sorry for the long post, but we’ll be posting to this community in the next few weeks, so I thought I’d give some background! Anyway do let us know if you have any thoughts on the project.
I note in your email the plan to use BioJS. I took a quick peek and see some overlap with d3.js. Many of us have significant investment in d3.js education and libraries. You may wish to consider complementing d3.js rather than taking a potentially orthogonal approach.
Sorry if it looked like I was pushing any technology here. We absolutely don't want to re-invent the wheel and ignore the work other people have done!
So I think it's natural we have a new group for "Tool": it's always been part of the ELIXIR / bio.tools plan to extend schema.org in a bio.tools-compatible way. How do I make a start with this? @rajido - what practical steps should I take?
@joncison (If you don’t mind me jumping in here) I’ll email you about your question, but in general if anyone is interested in having a new class/type in Bioschemas then they can just email email@example.com, or open a new GitHub issue.
This is the process we've been using so far (adapted to apply to tools):
You are welcome to lead this process for tools or delegate to whoever you see fit, and I can help too. We hope to have instructions on our website soon.
Thanks a lot! Really helpful. The good news is that 1. and 2. are done and resulted in https://github.com/bio-tools/biotoolsxsd, which leaves 3 - 5. As for 6, we'll need a new registration mechanism in bio.tools to cope.
You could help a lot by prodding me, in case this thread goes cold.
On the datasets side of things, you might care to look at these proposed changes I have just merged into our draft next release:
#1247 -> http://webschemas.org/docs/releases.html#g1083
These largely come from considerations around improving the usability of our Dataset and its integration with the rest of schema.org, in particular aiming for adoption by scientific dataset publishers including life sciences. Feedback welcomed here or in #1083.