Adding Bioschemas markup to data repositories and developing tools to find, consume and use it

Presentation

Tasks

1. Creating Markup

Hacking tasks:

Generate markup for your resource:
- Use the Bioschemas Markup Generator to help create your initial version of Dataset and DataCatalog markup.
- Please provide feedback on the Bioschemas Generator
Refine markup: manually edit the generated JSON-LD to suit your purpose
Submit markup as an example to the relevant directory: in the profile, click on the Links tab and then the Examples icon
- Dataset examples
- DataCatalog examples
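
As a starting point for the hand-refinement step above, the following is a minimal sketch of Dataset markup built in Python and wrapped in the script element that gets embedded in a page. The function names and the example values are illustrative assumptions, not output of the Bioschemas Markup Generator; the property names (`name`, `description`, `url`, `keywords`, `includedInDataCatalog`) are standard schema.org terms, and the relevant Bioschemas profile should be checked for which of them it actually requires.

```python
import json


def dataset_markup(name, description, url, catalog_name, catalog_url, keywords=()):
    """Build a schema.org Dataset as a JSON-LD dictionary (illustrative helper)."""
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Link the Dataset to the DataCatalog that contains it.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
            "url": catalog_url,
        },
    }
    if keywords:
        markup["keywords"] = list(keywords)
    return markup


def as_script_tag(markup):
    """Serialise markup into a <script> element ready to paste into a page."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are placeholders for your own resource.
    example = dataset_markup(
        name="Example expression dataset",
        description="Placeholder description of the dataset.",
        url="https://example.org/datasets/1",
        catalog_name="Example catalog",
        catalog_url="https://example.org",
        keywords=["gene expression"],
    )
    print(as_script_tag(example))
```

The printed element can be placed in the page head or body; the JSON inside it is what you would then refine by hand and submit as an example.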

Resources:

Bioschemas tutorial

2. Profile Creation/Refinement

Hacking tasks:

Develop missing profiles/types, e.g. Chemistry
- Need to create new spreadsheet for the crosswalk
Refine existing profiles and types, e.g. how to state the position of a gene within the Gene representation
- Need to create a new version of the crosswalk spreadsheet
Improve Bioschemas tutorial material and documentation
Prepare schema.org submission for new types and properties

Resources:

Bioschemas tutorial
GOWeb tool: converts crosswalk spreadsheet to YAML for inclusion in web page
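
To make the spreadsheet-to-YAML step concrete, here is a rough sketch of that kind of conversion using only the Python standard library. It is not GOWeb's implementation or its real output format: the column names (`property`, `expected_type`, `marginality`, `cardinality`) and the top-level `mapping` key are assumptions chosen for illustration, and the crosswalk is assumed to be exported as CSV.

```python
import csv
import io
import json


def crosswalk_to_yaml(csv_text):
    """Convert crosswalk rows (CSV text) into a YAML list of property mappings."""
    rows = csv.DictReader(io.StringIO(csv_text))
    lines = ["mapping:"]
    for row in rows:
        first = True
        for key, value in row.items():
            if key is None:
                # Cells beyond the header row have no column name; skip them.
                continue
            prefix = "  - " if first else "    "
            # A JSON-quoted string is also a valid YAML double-quoted scalar,
            # so json.dumps handles the escaping for us.
            lines.append(f"{prefix}{key}: {json.dumps(value or '')}")
            first = False
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical crosswalk export; real spreadsheets will have other columns.
    sample = (
        "property,expected_type,marginality,cardinality\n"
        "name,Text,Minimum,ONE\n"
        "keywords,Text,Recommended,MANY\n"
    )
    print(crosswalk_to_yaml(sample))
```

Each spreadsheet row becomes one list item, which keeps the YAML easy to diff when a new version of the crosswalk is created.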

3. Developing Support Tools

3.1 GOWeb: Profile Page Generation

3.2 Validata: Markup Validation against Bioschemas Profiles
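
The idea of checking markup against a profile can be illustrated with a very small sketch that reports which minimum properties are missing from a JSON-LD object. This is not how Validata works internally, and the `MINIMUM_PROPERTIES` table is a hypothetical stand-in rather than the content of any published Bioschemas profile.

```python
# Hypothetical minimum properties per type; consult the actual profile
# before relying on any of these lists.
MINIMUM_PROPERTIES = {
    "Dataset": ["name", "description", "url"],
    "DataCatalog": ["name", "url"],
}


def missing_properties(markup, minimum=MINIMUM_PROPERTIES):
    """Return the minimum properties that are absent or empty in the markup."""
    types = markup.get("@type", [])
    if isinstance(types, str):
        types = [types]  # @type may be a single string or a list of strings
    missing = []
    for type_name in types:
        for prop in minimum.get(type_name, []):
            empty = markup.get(prop) in (None, "", [], {})
            if empty and prop not in missing:
                missing.append(prop)
    return missing


if __name__ == "__main__":
    draft = {"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}
    print(missing_properties(draft))
```

A real validator would also check value types and cardinality; this sketch only covers the presence of properties.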

3.3 Buzzbang: Enhancing Search

Representatives:

Alasdair Gray
Leyla Garcia
Ricardo Arcila
Phil Barker
Michel Dumontier

Achievements

Day 1:

Added Bioschemas markup to SynBioHub (DataCatalog, Dataset, DataRecord)
Added Bioschemas markup to Bgee (Dataset)
Added Bioschemas markup to Hamap (rules, profiles and proteomes Datasets)
Updated Bioschemas markup in CathDB (DataCatalog, Dataset, DataRecord)

BioStudies

Created Bioschemas markup for the BioStudies repository.
Developed an application that generates Bioschemas markup for a BioStudy.
Added one DataCatalog example and two DataRecord examples to https://github.com/BioSchemas/specifications

Ensembl

Added draft Bioschemas to Gene and Species pages on feature branch https://github.com/Ensembl/ensembl-webcode/tree/feature/bioschemas

Chemistry

Discussed and created a Bioschemas schema for MoleculeEntity. See molecules.Md
Designed the implementation of the Dataset type for the ChEMBL database. See chembl-dataset-example.json
Created a draft specification in the official Bioschemas repository. See BioSchemas/specifications#234
Created an example of the MoleculeEntity implementation based on the ChEMBL database.

Community

Bioschemas Community – ELIXIR Interoperability Platform

Background information

Bioschemas is an open community project built on top of schema.org. It aims to embed markup in life sciences Web resources to make them more findable and to promote interoperability. Its selling point is its simplicity, with just enough structure in schemas such as ‘Dataset’, ‘BioChemEntity’ and ‘LabProtocol’ to enable FAIRer data applications. Bioschemas markup is being deployed, but more work is required to develop and exploit it.

During the BioHackathon we want to advance the development, deployment and exploitation of Bioschemas markup, as well as the tools that enable this. We also want to engage and connect with new groups and communities, such as those working on data indexing, visualization and the semantic web.

Expected outcomes

Markup of core (and other) data resources, including deposition databases, developing new profiles as required
Development of tools supporting:
- Creation and embedding of markup
- Validation of markup
Enhanced searching of life sciences resources based on Bioschemas markup:
- Crawling and indexing
- Generation of a life sciences knowledge graph
- Rich search results
Training material for development and exploitation of Bioschemas markup using available tools
Publication about Bioschemas
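
The crawling and indexing outcome listed above depends on pulling embedded JSON-LD out of web pages. The following is a minimal sketch of that step using only the Python standard library; it is not taken from Buzzbang, and the names `JsonLdExtractor`, `extract_jsonld` and `index_by_type` are illustrative. It assumes the markup is embedded as JSON-LD in script elements and ignores other syntaxes such as Microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the crawl


def extract_jsonld(html):
    """Return every JSON-LD block found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def index_by_type(blocks):
    """Group markup objects by their @type so they can be searched per type."""
    index = {}
    for block in blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", "Unknown")
            for type_name in [types] if isinstance(types, str) else types:
                index.setdefault(type_name, []).append(item)
    return index


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
        "</script></head><body></body></html>"
    )
    print(index_by_type(extract_jsonld(page)))
```

In a crawler, `extract_jsonld` would be applied to each fetched page and the resulting index fed into a search backend or knowledge graph.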

Expected audience

Anyone with ideas about how to get the most out of Bioschemas, for instance:

Ontologists
Developers with knowledge of JavaScript, Java, Go, Python or other languages
Developers of data resources
People interested in web search: data indexing, snippet generation, ranking, etc.
People interested in RDF and the semantic web
Developers interested in Bioschemas applications (data sync, search, knowledge graphs, etc.)

Expected hacking days: 4

Related works and references

Bioschemas website
Bioschemas list of GSoC projects
Bioschemas tools
Bioschemas poster at SWAT4HCLS2017
Buzzbang prototype search engine
schema.org
Google structured data testing tool
Validata: Validation tool
The knowledge graph
Common crawl
Kibana

GitHub or any other public repositories of your FOSS products (if any)

Bioschemas repositories
Buzzbang project encompassing crawl/search components
Validata: Validation tool

Hackers

Egon Willighagen
