bio.tools & EDAM drop-in hackathon & discussions
Representative: Jon Ison
- Jon Ison
- Hans-Ioan Ienasescu
- Matúš Kalaš
- Hervé Ménager
- Veit Schwämmle
EDAM and bio.tools developers will attend the whole hackathon (Mon 12 - Fri 16) and run discussion and hacking sessions, with each day focused on a specific theme (see below). We hope to work with any people and projects interested in using or developing EDAM and bio.tools.
Focus of each day
Each hackathon day has a focus, which we'll try to stick to, with a range of tasks catering for different interests and expertise. We do not expect to complete all the tasks, and will adapt depending upon who turns up, so feel free to drop in to any session at any time:
- Tue Nov 13 bio.tools testing (testing & search optimisation)
- Wed Nov 14 bio.tools outreach (kick-start the community development)
- Thu Nov 15 EDAM development (data formats catalogue, planning EDAM 2.0 and applications)
- Fri Nov 16 Planning (wrapping up, next steps, collaborations)
Day 1 (Nov 12): Warm-up
EDAM and bio.tools core-dev will be around to discuss sessions for days 2-5
Day 2 (Nov 13): bio.tools testing
Expected audience: anyone with an interest in improving bio.tools
Expected outcome: verify the next release, improve the search performance
The purpose is to test, evaluate and optimise the development deployment of bio.tools (https://dev.bio.tools/), whose changes are scheduled to move into production (https://bio.tools/) during Dec 3-7. The bio.tools core-dev will be on hand to discuss things in person.
Task 1: Release testing
Currently 28 issues labelled "done - staged for release" are implemented in https://dev.bio.tools. Before these can be moved into production, we need independent verification that these features and fixes are satisfactorily implemented.
The task is:
- pick any "done - staged for release" issue which lacks the "fix verified" label
- read the thread and test things are working as advertised
- add a comment to the thread, either reporting things are OK or describing an outstanding problem: bio.tools core-dev will monitor the tracker, fix issues that crop up, and attach the "fix verified" label to confirmed fixes
- repeat, until all "done - staged for release" issues are verified
- experiment with https://dev.bio.tools - critique the interfaces, API and content - and report any bugs or suggestions via GitHub
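The selection step in the list above (issues labelled "done - staged for release" that still lack "fix verified") can be sketched as a small filter. This is only a minimal sketch: it assumes the issues are already available as dictionaries shaped like GitHub REST API issue objects (each with a `labels` list whose entries carry a `name`), and the sample data is made up for illustration rather than taken from the bio.tools tracker.

```python
STAGED = "done - staged for release"
VERIFIED = "fix verified"


def label_names(issue):
    """Return the set of label names attached to an issue dictionary."""
    return {label["name"] for label in issue.get("labels", [])}


def needs_verification(issues):
    """Keep issues that are staged for release but not yet verified."""
    return [
        issue for issue in issues
        if STAGED in label_names(issue) and VERIFIED not in label_names(issue)
    ]


# Made-up sample data, not real bio.tools issues.
sample_issues = [
    {"number": 1, "title": "Example A", "labels": [{"name": STAGED}]},
    {"number": 2, "title": "Example B",
     "labels": [{"name": STAGED}, {"name": VERIFIED}]},
    {"number": 3, "title": "Example C", "labels": [{"name": "in progress"}]},
]

for issue in needs_verification(sample_issues):
    print(f"#{issue['number']}: {issue['title']}")
```

Working through the filtered list from the top and re-running it after each comment gives a simple way to track which issues remain.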
Task 2: bio.tools API testing & optimisation
The latest development deployment of the bio.tools API (https://dev.bio.tools/api/tool) is, we hope, a big improvement on the current version. It supports a comprehensive set of parameters that enable precise query over tool function and other metadata. But before we can move these changes into production, the API needs to be thoroughly tested. We also want to optimise the search behaviour, in light of results of real user experiments, to ensure it works as anticipated.
The task is:
- systematically test the API, particularly the behaviour of the search parameters as documented in the API Reference and API Usage Guide.
- provide feedback on the API search behaviour / possible improvement via GitHub. You can suggest fixes or improvements to the API docs here.
- elasticsearch experts only - please speak to bio.tools core-dev (there are issues we need help with!)
We hope (developments pending) to have an easy way to tweak the elasticsearch parameters during the workshop, allowing for immediate iterative improvements.
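For systematic testing, it may help to generate the query URLs from a table of test cases instead of typing them by hand. The sketch below only builds URLs against the development endpoint mentioned above and makes no network requests. The parameter names used (`name`, `topic`, `page`, `format`) are assumptions for illustration and should be checked against the API Reference before use.

```python
from urllib.parse import urlencode

BASE_URL = "https://dev.bio.tools/api/tool"  # development endpoint from the text


def build_query_url(**params):
    """Build a search URL, dropping parameters whose value is None."""
    query = {key: value for key, value in params.items() if value is not None}
    # Assumption: the API accepts a "format" parameter selecting JSON output.
    query.setdefault("format", "json")
    return f"{BASE_URL}?{urlencode(query)}"


# Hypothetical test cases; parameter names are assumptions, not confirmed here.
test_queries = [
    {"name": "signalp"},
    {"topic": "Sequence analysis", "page": 2},
]

for query in test_queries:
    print(build_query_url(**query))
```

Each generated URL can then be opened in a browser or fetched with any HTTP client, and the results compared against what the documentation says the parameter should do.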
Day 3 (Nov 14): bio.tools outreach
Expected audience: anyone with an interest in developing bio.tools
Expected outcome: kick-start the community development process
The purpose is to introduce our current development priorities and to present and improve the proposed community development process for bio.tools. The bio.tools core-dev will be on hand to discuss things in person.
Task 1: Development priorities
We label issues to reflect their status and priority:
- "critical priority" : our top priorities, including most of the reported bugs
- "high priority" : things which bio.tools core-dev consider high priorities; we get to these once "critical priority" issues are addressed
- "in progress" : things we're working on currently
- "Dec 18 release" : things we're aiming to put into the next production deployment
- "wontfixsoon" : things which, for one reason or another (usually lack of developer capacity), we don't anticipate doing soon (that doesn't imply they're unimportant or bad ideas!)
We want to be sure our priorities reflect those of the community at large, and engage developers who are willing to help out. The task is:
- review our priorities (issues in any of the categories above) - providing feedback in the appropriate GitHub thread
- feel free to request new features, but please first search our issues as it might already be listed
- developers only - if you're interested in helping out, especially on "critical priority" issues (or anything else!), then please discuss this with the bio.tools core-dev
Task 2: Open development process
Now that bio.tools is open source, there is an opportunity for hackers everywhere to contribute to the project. But first we must define how the community development process will work in practice. We have emerging contributor guidelines but we want to revise these in light of feedback from potential contributors.
Day 4 (Nov 15): EDAM development
Expected audience: anyone with an interest in improving EDAM, people knowledgeable of bioinformatics data formats
Expected outcome: improved EDAM Formats subontology, scoping the desired state of EDAM 2.0, developing EDAM applications
Task 1: Curation of bioinformatics data formats
The EDAM Format subontology has potential in systems such as Galaxy and for applications such as workflow composition. EDAM is close to providing a comprehensive catalogue of the prevalent bioinformatics data formats, but a significant amount of work remains. The task is to work on any aspects of the data format curation listed here including:
- addition of miscellaneous new data formats, or changes to existing ones (see issues)
- addition of formats ensuring coverage for Galaxy applications (issue)
- addition of formats to ensure coverage of FAIRSharing
We expect the tasks to be accomplished manually, programmatically, or by a combination of the two. Please see:
Task 2: Verification of EDAM Formats subontology
We have guidelines for the development of the EDAM formats subontology:
- editor guidelines when modifying EDAM; adding or changing concepts, concept metadata, crosslinking, etc.
- developer guidelines about the technical process
To develop EDAM Format subontology into a rigorous catalogue, we must ensure the guidelines are followed. The task is:
- review the editor guidelines and developer guidelines, and provide feedback on these via GitHub or discuss this in person with EDAM core-dev
- propose clean-ups of the connection between EDAM Format and Data subontologies (see issue) : please make suggestions via GitHub - see also issue
- (developers only) develop a utility that checks compliance of EDAM with the guidelines above and generates a human-readable report that can be acted on. If you want to work with EDAM in JSON / JSON-LD format, see edam2json
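As a starting point for the compliance utility, the general shape might look like the sketch below. It is not based on the actual editor or developer guidelines: the two rules checked (every concept has a non-empty label and a non-empty definition), the dictionary layout of a concept, and the example identifiers are all hypothetical placeholders to be replaced with the real rules and the real EDAM data.

```python
def check_concept(concept):
    """Return a list of human-readable problems found in one concept."""
    problems = []
    # Hypothetical rule 1: a concept must have a non-empty label.
    if not (concept.get("label") or "").strip():
        problems.append("missing label")
    # Hypothetical rule 2: a concept must have a non-empty definition.
    if not (concept.get("definition") or "").strip():
        problems.append("missing definition")
    return problems


def compliance_report(concepts):
    """Produce one report line per problem, or a short all-clear message."""
    lines = []
    for concept in concepts:
        for problem in check_concept(concept):
            lines.append(f"{concept.get('id', '<no id>')}: {problem}")
    return "\n".join(lines) if lines else "No problems found."


# Made-up concepts in a hypothetical layout, not real EDAM entries.
sample_concepts = [
    {"id": "example_1", "label": "Example format", "definition": "A made-up format."},
    {"id": "example_2", "label": "Another format", "definition": ""},
]

print(compliance_report(sample_concepts))
```

Keeping each rule as a separate check inside `check_concept` makes it straightforward to add one check per guideline and to trace every report line back to the rule that produced it.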
Task 3: Towards EDAM 2.0 (discussion & planning)
It's over 5 years since an article describing EDAM was published in Bioinformatics. Since then, there have been 18 new releases (currently EDAM 1.21), with many additions and improvements, and greatly improved documentation:
Within 3 - 6 months, we hope to release EDAM 2.0 implementing a set of features representing a step forward in value and quality over the 1.* releases. The task (working as a group, or alone) is:
- think; what are the desirable properties of EDAM 2.0? Is it simply to adhere to the rules and guidelines above, or something more?
- enumerate desirable properties in this issue; we'll try to prioritise these during the hackathon
- create sub-issues as needed, for finer-grained information
Task 4: EDAM applications (discussion & hacking)
EDAM is used (or being considered) in a variety of contexts. There is an opportunity for developers on projects that are using (or considering) EDAM to discuss their requirements and work with the EDAM developers. Or you might have an idea that we haven't heard of already; let's discuss.
Day 5 (Nov 16): Planning & coordination
The final day will be reserved for finishing off, and for discussing and planning next steps around collaborations of EDAM and bio.tools with other projects.
We can work on other topics, depending upon interest and progress as we proceed, e.g.:
- integration of crawling and pulling data into bio.tools, e.g. plugin-mechanism, so that other communities can write crawlers and annotate tools automatically
- workflows in bio.tools: modelling, visualisation and curation
- evaluation of the EDAM Browser ontology browser (see GitHub); issues, features and next steps
- bio.tools content from an end-user perspective: annotation consistency, EDAM coverage, content views etc
- integration of bio.tools and biocontainers.pro
- integration of bio.tools and Galaxy
If you're particularly interested in a topic, mail Jon Ison
Links & references