matdillen/33_molseq

33_molseq: This is the working repository for Project 33 in the 2020 Biohackathon.

Project 33: Connecting molecular sequences to their voucher specimens

Abstract

When sequencing DNA from an organism, it is standard practice to create a voucher specimen. This ensures that the results are repeatable and that the identification of the organism can be verified. These vouchers are often stored in herbaria and museums, where they are curated for the long term. DNA is also extracted from specimens that are already in collections and were collected some time ago. In both cases, it is important to be able to find all the sequences extracted from a specimen and, conversely, to find the specimen from which a given sequence was extracted. Yet, currently ...

Topics

Biodiversity

Project Number: 33

EasyChair Number: 53

Team

Lead(s)

Quentin Groom quentin.groom@plantentuinmeise.be (corresponding author)
Mathias Dillen mathias.dillen@plantentuinmeise.be
Pieter Huybrechts pieter.huybrechts@plantentuinmeise.be
Maarten Trekels maarten.trekels@plantentuinmeise.be
Andra Waagmeester andra@micel.io

Nominated participant(s)

Maarten Trekels

Expected outcomes

The outcomes of the hackathon will be (1) recommendations on how specimen and sequence databases should be connected in the future, (2) an analysis of the types of data available in these databases that are suitable for finding links, and (3) scripts implementing algorithms to match the data. Ultimately, these outcomes will help any collection connect its data better and will support the ELIXIR goals of improving human- and machine-readable access to all data in the biological sciences.
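As a rough illustration of what a matching script of the kind mentioned in outcome (3) might look like, the sketch below links sequence records to specimen records by comparing normalized identifier strings. It is not the project's actual algorithm. The field names (`catalog_number`, `voucher`, `accession`), the function names and the sample records are all hypothetical and made up for this example; real data would likely need fuzzier matching on collector, date or institution code as well.

```python
import re
from collections import defaultdict


def normalize_id(raw: str) -> str:
    """Uppercase the string and drop every non-alphanumeric character."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def match_sequences_to_specimens(specimens, sequences):
    """Map each specimen catalog number to the accessions whose voucher text matches it.

    Both inputs are lists of dicts with hypothetical keys:
    specimens use "catalog_number"; sequences use "voucher" and "accession".
    """
    # Index specimens by their normalized catalog number.
    by_key = {normalize_id(s["catalog_number"]): s["catalog_number"] for s in specimens}

    links = defaultdict(list)
    for seq in sequences:
        key = normalize_id(seq["voucher"])
        if key in by_key:
            links[by_key[key]].append(seq["accession"])
    return dict(links)


# Made-up sample records, for illustration only.
specimens = [
    {"catalog_number": "BR0000012345678"},
    {"catalog_number": "BR0000099999999"},
]
sequences = [
    {"accession": "XX000001", "voucher": "BR 0000012345678"},
    {"accession": "XX000002", "voucher": "br:0000012345678"},
    {"accession": "XX000003", "voucher": "J. Smith 42"},
]

links = match_sequences_to_specimens(specimens, sequences)
print(links)
```

Exact matching on a normalized string only catches vouchers that cite the catalog number itself. Vouchers recorded as a collector name and number, like the third sample record, stay unmatched and would need a separate strategy.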

Expected audience

Scripting (Python, R)
Linked open data
SPARQL
Understanding of biological collection data and the metadata associated with molecular sequence data

Number of expected hacking days: 4