FAQ "AI for drug discovery" #15

fkiraly · 2025-05-18T21:19:09Z

fkiraly
May 18, 2025
Maintainer

Answers to collected questions from applications and e-mail.

Project set-up

How will collaborations with domain experts (such as researchers in biochemistry, molecular biology, or structural bioinformatics) be facilitated during the project? Will participants have access to mentorship or regular check-ins with experts from ecoSPECS or partnering research labs to guide the AI-driven aptamer design process?

Yes, there will be a team of mentors, drawn from both open source software/AI and the application domain.

Will there be opportunities to stay involved in the ecosystem after the project ends, for example by contributing to follow-up research, open-source tooling, or community initiatives?

Hopefully, you will help create tools that provide useful features to people around the world, and hopefully you will remain involved in their maintenance and in building a research community of practice around them!

ecoSPECS are also looking to build their capabilities in the AI space, so follow-up in the form of employment or freelancing may also be on the table (though I can make no promises and have no definitive knowledge of this).

Is there any scope for involvement as a non-selected applicant, such as continuing with the proposed project, contributing to related research, or engaging with the community in a meaningful way?

Of course! This is the same as for any other openly structured open source project.

Once the projects are "ready for contributions" - and that will hopefully be soon, but not right away - it is very possible to become part of the initiative!

How do you envision governance of this open-source ecosystem? Will there be a steering committee, a code review process, or a formal release cadence?

On the technical side: community maintainership on GitHub, following one of the usual core developer models. Licenses will be permissive.

On the governance side: an open governance model, with complexity adapting as the community grows.

Is there a plan to integrate wet-lab validation in later phases, and if so, how might contributors from the AI side support that?

Only as an absolute stretch goal, and depending on opportunities for collaboration with universities or commercial partners. Support would primarily be indirect, by providing part of the software infrastructure.

Would early contributions like pretrained model integration, data ingestion modules, or pipeline improvements be eligible for co-authorship or open publication?

Yes, of course - we subscribe to best academic practice. You would be the first (or main) author on any paper that is primarily about your contributions, and a coauthor on papers to which you made a significant contribution.

Requirements

Is experience with open-source contributions required or beneficial to the project?

Not required, but beneficial and clearly a bonus in the application. It will factor in as one of the evaluation metrics.

What are the key milestones or performance targets (e.g., docking accuracy, throughput) you’d like to see in the first 3–6 months?

There are no strict milestones or performance targets. The set-up is a stipend, not a freelancing contract. The requirement is sensible engagement, and serious contribution towards the general goal. You will be offered mentoring by experts in open source and the application domain.

The team will of course have some concrete technical ideas on how to proceed, but this should not be confused with prescribed milestones.

Code and design

Will there be a starter codebase or template to build on, or should we develop everything from scratch?
Do you have an existing reference implementation that we should build against, or should we propose a new API from scratch?

The project is completely greenfield.

API design will be part of the challenge, but we do have some ideas to get you started.

Regarding "existing code": it may still make sense to follow third-party templates, contribute to existing repositories, or interface with other packages, but it is likely that significant components will have to be built from scratch.

Will there be opportunities for contributors to co-design benchmarks or data schemas collaboratively?

This is actually an expectation.

Will the main emphasis be on consolidating and standardizing current models and datasets into a common API, or is there any space within the internship to both discover and prototype new AI strategies for aptamer design as well?

There is space for research, especially in later stages or potential follow-on - but the focus for the start needs to be consolidation of software, data, APIs.

This is also clear from a research perspective, by the way: if you come up with a new method, how would you test it? Where would you get the data from?

What challenges do you foresee in creating interoperable standards for AI tools in the context of aptamer research?

Primarily, that there currently are none.

Data

How will the organizers of the applied project handle privacy and ethical concerns, especially when working with sensitive clinical and genomic data?

To start with, we will use only publicly available data sets that have been cleared by research ethics committees for publication.

Are there preferred formats or metadata standards you’d like us to adopt from Day 1?

No, this is part of the design task.

Resources

Will compute resources be available for training and molecular simulations (e.g., GPU types, cluster access, cloud credits)?

There may be a small amount of compute available through university collaborations - but the focus is not on model training or running experiments. Hence if compute is to be used, it will be mostly for framework testing purposes.

I’m curious to know more about the datasets and tools that will be provided for the project. Will there be access to biological sequence data or pretrained models?

Not in the scope of this project - however, there is a good amount of these available, in research grade repositories.

Domain - aptamer/medical

Also, is this ecosystem focused on drugs for a particular disease, or on drugs in general?

No, there is no focus on a specific disease. The focus is on the tool stack, i.e., methodology, and data.

How does the project plan to balance domain-specific biological complexity with general-purpose AI infrastructure, especially in terms of designing benchmarks and model evaluation metrics that are both scientifically rigorous and open-source friendly?

This is an open project and the stipend gives you quite a lot of freedom.

Will this project include aptamer simulation?

It may, this has to be discussed in trade-offs, and with domain experts.

Other projects

Does ecoSPECS have other projects that involve retrieving information from documents based on user queries? I’m particularly interested in applying Retrieval-Augmented Generation (RAG) in various scenarios.

The "cleanroom design" project has this as an aspect, though applications have already closed and the stipend has been allocated.
