33_molseq: This is the working repository for Project 33 in the 2020 Biohackathon.
When sequencing DNA from an organism, it is standard practice to create voucher specimens. This ensures that the results are repeatable and that the identification of the organism can be verified. These vouchers are often stored in herbaria and museums, where they are curated for the long term. Similarly, DNA is also extracted from specimens already in collections, which were collected some time ago. In both cases, it is important to be able both to find all the sequences extracted from a specimen and to find the specimen from which a given sequence was extracted. Yet, currently ...
Biodiversity
Project Number: 33
EasyChair Number: 53
Quentin Groom quentin.groom@plantentuinmeise.be (corresponding author)
Mathias Dillen mathias.dillen@plantentuinmeise.be
Pieter Huybrechts pieter.huybrechts@plantentuinmeise.be
Maarten Trekels maarten.trekels@plantentuinmeise.be
Andra Waagmeester andra@micel.io
Maarten Trekels
The outcomes of the hackathon will be (1) recommendations on how specimen and sequence databases should be connected in the future, (2) an analysis of the types of data available in databases that are suitable for finding links, and (3) scripts implementing algorithms to match data. Ultimately, these outcomes will help any collection connect its data better and will support the ELIXIR goals of improving human- and machine-readable access to all data in the biological sciences.
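As an illustration of what a matching script could look like, here is a minimal Python sketch. It is not the project's actual algorithm. It assumes specimen records carry a Darwin Core style `catalogNumber` and sequence records carry a free-text `specimen_voucher` field, and that the two can be compared after stripping case, spaces, and punctuation. The function names, record layout, catalogue numbers, and accessions are all made up for the example.

```python
import re


def normalize_id(raw: str) -> str:
    """Uppercase the identifier and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def match_specimens_to_sequences(specimens, sequences):
    """Map each specimen catalogue number to the accessions whose voucher text matches it.

    Assumption: exact equality after normalization is a good enough match.
    Real voucher strings are often messier and may need fuzzier rules.
    """
    # Index sequence accessions by their normalized voucher string.
    index = {}
    for seq in sequences:
        key = normalize_id(seq["specimen_voucher"])
        index.setdefault(key, []).append(seq["accession"])

    # Look up every specimen in the index; unmatched specimens get an empty list.
    return {
        spec["catalogNumber"]: index.get(normalize_id(spec["catalogNumber"]), [])
        for spec in specimens
    }


# Hypothetical example data, not taken from any real database.
specimens = [
    {"catalogNumber": "BR0000012345678"},
    {"catalogNumber": "BR0000099999999"},
]
sequences = [
    {"accession": "EX000001", "specimen_voucher": "BR 0000012345678"},
    {"accession": "EX000002", "specimen_voucher": "br-0000012345678"},
    {"accession": "EX000003", "specimen_voucher": "K000000001"},
]

links = match_specimens_to_sequences(specimens, sequences)
for catalogue_number, accessions in links.items():
    print(catalogue_number, accessions)
```

Normalizing before comparing keeps the sketch simple, but it can produce false matches when different institutions reuse the same number, so a real script would probably also compare institution codes.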
Scripting (Python, R); linked open data; SPARQL; understanding of biological collection data and the metadata associated with molecular sequence data
Number of expected hacking days: 4