🤖 A curated list of open-source projects that automate scientific research — from literature review to idea generation, experiment execution, paper writing, and peer review.
📅 Star counts last verified: 2026-05-17
- 🧪 End-to-End Autonomous Research Systems
- 📚 Deep Research & Literature Synthesis
- ⚙️ Automated Experiment & Code Agent
- 🔧 Research Skills & Plugin Collections
- 📋 Awesome Lists & Surveys
- 💡 How This Differs from General AI Agent Lists
- 🤝 Contributing
Projects that automate the full research lifecycle: idea → experiment → paper.
| Project | Stars | Framework / Tools | Supported LLM APIs | Description |
|---|---|---|---|---|
| autoresearch | | Custom (PyTorch, nanochat) | Anthropic Claude, OpenAI Codex | By Andrej Karpathy. 630-line AI agent that reads its own training script, forms hypotheses, modifies code, runs experiments, and evaluates results, completing hundreds of experiments overnight. |
| AI-Scientist | | Custom (templates, LaTeX pipeline) | OpenAI, Anthropic Claude, DeepSeek, Gemini, OpenRouter, open-weight models | The first comprehensive system for fully automated open-ended scientific discovery. Automates idea generation, coding, experiments, and manuscript writing. |
| RD-Agent | | Custom + LiteLLM, Docker, Streamlit, Qlib | OpenAI (GPT-4o/o1/o3), Azure OpenAI, DeepSeek; any LiteLLM provider | Microsoft. Automates R&D processes: factor/model evolution for quant, Kaggle automation, paper-to-code implementation. Top MLE-bench agent. |
| AutoResearchClaw | | OpenClaw + Docker, LaTeX (NeurIPS/ICML/ICLR), OpenAlex, Semantic Scholar | OpenAI (GPT-4o), OpenRouter, DeepSeek, MiniMax; Claude/Gemini/Kimi via ACP | Fully autonomous research: idea → literature retrieval → sandbox experiments → multi-agent peer review → LaTeX paper output. |
| ARIS | | Claude Code + MCP servers (Codex, llm-chat, Zotero, Obsidian) | Anthropic Claude, OpenAI GPT, GLM-5, MiniMax, Kimi, Qwen, DeepSeek, LongCat; any OpenAI-compatible | Claude Code skills for autonomous ML research: cross-model review loops, idea discovery, experiment automation, and paper writing. |
| AI-Scientist-v2 | | Custom (BFTS agentic tree search, AIDE) | OpenAI (o1/o3/GPT-4o), Anthropic (Bedrock), Gemini | Upgraded version using agentic tree search. Generated the first AI-written workshop paper accepted through peer review. |
| Agent Laboratory | | Custom multi-agent (arXiv, HuggingFace, LaTeX) | OpenAI (o1/o3/GPT-4o), DeepSeek | End-to-end autonomous research workflow with specialized agents for literature review, experimentation, and report writing. |
| AI-Researcher | | Custom + LiteLLM, Docker, Gradio | Anthropic, OpenAI, Gemini, DeepSeek, OpenRouter, GitHub AI (via LiteLLM) | NeurIPS 2025 Spotlight. Fully autonomous system covering literature review, hypothesis generation, algorithm implementation, and manuscript preparation. |
| claude-scholar | | Claude Code / Codex CLI / OpenCode, Zotero MCP, Obsidian, LaTeX | Anthropic Claude, OpenAI (via Codex) | Semi-automated academic research assistant covering ideation → coding → experiments → writing → publication. |
| Biomni | | Custom biomedical agent + code execution, datalake, know-how library | Anthropic Claude, OpenAI, Azure OpenAI, Gemini, Groq, AWS Bedrock, custom OpenAI-compatible APIs | Stanford. General-purpose biomedical AI agent that autonomously executes research tasks across biology and medicine, combining LLM reasoning, retrieval, and tool/code use. |
| EvoScientist | | LangChain + DeepAgents, Docker (Python 3.11 + Node.js 24) | Anthropic Claude, OpenAI, Google Gemini, MiniMax, NVIDIA NIM | Self-evolving AI Scientists. Six-agent team with persistent memory autonomously explores and iteratively improves. Built-in messaging channels (Slack/Discord/Telegram/Feishu/WeChat). |
| DeepScientist | | Custom (Bayesian optimization, Findings Memory, Research Map), Git worktrees, LaTeX | OpenAI (Codex CLI), Anthropic Claude, Moonshot Kimi, OpenCode; local backends | Local-first autonomous research studio. Findings Memory + Bayesian optimization orchestrate baseline reproduction → branched experiments → LaTeX paper drafts. |
| DATAGEN | | LangChain + LangGraph, MCP servers, Firecrawl | OpenAI, Anthropic Claude, Gemini, Ollama, Groq | AI-driven multi-agent research assistant automating hypothesis generation, data analysis, visualization, and report writing. |
| Idea2Paper | | AgentAlpha Framework (Multi-Agent), Vector DB, Knowledge Graph (KG) | DeepSeek V3/R1, Claude 3.5, GPT-4o; Semantic Scholar, arXiv API | Research idea exploration engine. Orchestrates multi-agent workflows for deep literature mining and KG alignment, and refines raw ideas into novel, structured research proposals. |
| InternAgent | | Custom (Aider for codegen, persistent memory), Conda; Google Search, Semantic Scholar | OpenAI (incl. OpenAI-compatible), Anthropic Claude | Shanghai AI Lab. Unified agentic framework for long-horizon autonomous discovery across physics, biology, earth, and life sciences (reaction yield, molecular dynamics, protein engineering, climate diagnostics). |
Projects focused on automated information gathering, literature review, and report generation.
| Project | Stars | Framework / Tools | Supported LLM APIs | Description |
|---|---|---|---|---|
| DeerFlow | | LangChain + LangGraph, InfoQuest | Any OpenAI-compatible API (GPT-4, Gemini via OpenRouter, etc.) | ByteDance. Open-source SuperAgent harness. Orchestrates sub-agents, memory, and sandboxes for deep research, code generation, and report writing. |
| STORM | | DSPy + LiteLLM, Streamlit | All LiteLLM models (OpenAI, Azure, etc.); Search: You.com, Bing, Google, Brave, Tavily, SearXNG | Stanford. LLM-powered knowledge curation system that generates full-length Wikipedia-like articles with citations. Features Co-STORM. |
| GPT Researcher | | LangGraph, MCP, FastAPI, NextJS | OpenAI, Anthropic Claude, Gemini; any OpenAI-compatible API | Autonomous agent for deep web & local research. Generates 5-6 page factual reports with citations in PDF/Docx/Markdown. |
| ChatPaper | | PyMuPDF, arxiv.py, Flask, Docker | OpenAI (GPT-3.5/4) | Use ChatGPT to summarize arXiv papers, provide professional translation, paper polishing, peer review analysis, and reviewer response generation. |
| Tongyi DeepResearch | | Custom (ReAct, IterResearch, GRPO RL); Serper, Jina, SandboxFusion | OpenAI-compatible, OpenRouter; Tongyi-30B-A3B, Dashscope/Bailian | Alibaba. Agentic LLM (30.5B params, 3.3B activated) for long-horizon deep information-seeking. SOTA on multiple benchmarks. |
| Open Deep Research | | LangChain + LangGraph, MCP, LangSmith | OpenAI (GPT-5/4.1), Anthropic (Sonnet 4), OpenRouter, Ollama (local) | LangChain. Open-source deep research framework with configurable MCP tools and search APIs. |
| PaperQA2 | | Custom + LiteLLM, Pydantic, tantivy | OpenAI, Anthropic, Gemini, Ollama, llama.cpp; any LiteLLM provider | High-accuracy RAG for scientific documents. Dynamically retrieves full-text papers and iterates on answers. Published at ICLR. |
| local-deep-research | | LangChain + LangGraph, FastAPI, FAISS, SQLCipher, SearXNG | Ollama, LM Studio, llama.cpp (local); OpenAI, Anthropic, Gemini, OpenRouter | Local-first deep research agent reaching ~95% on SimpleQA with local LLMs. Integrates arXiv/PubMed/Semantic Scholar/Wikipedia and 10+ other sources with encrypted storage. |
| DeepResearchAgent | | Custom (Autogenesis self-evolution), MMEngine configs | OpenRouter (multi-model access) | Skywork. Hierarchical multi-agent system with top-level planning agent coordinating specialized lower-level agents. |
| Auto-Deep-Research | | AutoAgent Framework + LiteLLM, Docker | Anthropic, OpenAI, Gemini, Mistral, Groq, OpenRouter, DeepSeek; any OpenAI-compatible | Open-source, cost-efficient alternative to OpenAI's Deep Research. Universal LLM compatibility, zero-config launch. Strong GAIA Benchmark results. |
| OpenScholar | | Custom RAG (PyTorch, HuggingFace, Contriever) | OpenAI (GPT-4o), Llama 3.1 8B (self-hosted); Semantic Scholar API, You.com | Retrieval-augmented LM searching 45M open-access papers. Published in Nature. Outperforms PaperQA2 and Perplexity Pro. |
| ChatReviewer | | Python, tiktoken, Docker, HuggingFace Spaces | OpenAI (GPT-3.5/4) | Uses ChatGPT to analyze paper strengths/weaknesses, provide improvement suggestions, and auto-generate reviewer responses. Companion to ChatPaper. |
| OpenResearcher | | Megatron-LM (training), vLLM (serving), HuggingFace, Tevatron, BM25 + Qwen3-Embedding, Serper | OpenResearcher-30B-A3B (open-weight release); OpenAI API (scoring) | Fully open training + inference pipeline for long-horizon deep research. Releases 30B-A3B model, surpassing GPT-4.1 and Claude Opus 4 on BrowseComp-Plus. |
Projects that automate coding, experiment execution, and iterative optimization. These serve as the "hands" of auto-research systems.
| Project | Stars | Framework / Tools | Supported LLM APIs | Description |
|---|---|---|---|---|
| AutoGPT | | Custom (Agent Builder, workflow blocks), Docker | OpenAI, Anthropic, Groq, Llama, AI/ML API (300+ models) | One of the earliest autonomous AI agent frameworks. Includes Forge for agent creation, benchmarking suite, and user-friendly UI. |
| OpenHands | | Custom agentic framework, composable Python lib | Anthropic Claude, OpenAI GPT, MiniMax; any LLM | AI-driven software development platform. Autonomous coding agents that edit files, run commands, browse web. 72% on SWE-Bench Verified. |
| Aider | | Custom (AI pair-programming CLI), Git integration | Anthropic Claude, OpenAI, DeepSeek, OpenRouter, Ollama; nearly any LLM | AI pair programming in your terminal. Supports multi-file edits, git integration. Widely used as the coding backbone in research pipelines. |
| SWE-agent | | Custom (YAML-config-driven), purpose-built for research | OpenAI (GPT-4o), Anthropic (Sonnet 4, Claude 3.7); configurable | Princeton. Turns LLMs into software engineering agents that fix real GitHub issues. Pioneered the SWE-Bench benchmark. |
| PaperBanana | | Streamlit, OpenRouter | OpenAI, Anthropic, Gemini (via OpenRouter) | Reference-driven multi-agent framework for automated academic illustration. 5 specialized agents (Retriever, Planner, Stylist, Visualizer, Critic) produce publication-quality diagrams. |
| MLE-agent | | Python, Kaggle integration, arXiv, Papers with Code | OpenAI, Anthropic Claude, Ollama (Llama3), Mistral | Intelligent companion for ML engineering and research. Integrates with arXiv and Papers with Code for better code/research plans. Auto-debugging. |
| AIDE | | Python, Streamlit, Docker | OpenAI (GPT-4-turbo/4o), Anthropic Claude, Gemini, Ollama (local) | AI-Driven Exploration in the Space of Code. LLM agent that writes, evaluates, and improves ML code via agentic tree search. Paper: 4x more Kaggle medals than the best linear agent. Hosted platform: Weco AI. |
Reusable skill sets and plugin ecosystems that integrate with coding agents (Claude Code, Codex, Gemini CLI, etc.) to enable research workflows.
| Project | Stars | Framework / Tools | Supported LLM APIs | Description |
|---|---|---|---|---|
| scientific-agent-skills | | PyTorch Lightning, scikit-learn, BioPython, RDKit, DeepChem, Scanpy, OpenMM | Agent-agnostic (Claude Code, Cursor, Codex, Gemini CLI) | 133 ready-to-use scientific skills across bioinformatics, drug discovery, clinical research, medical imaging, and materials science. |
| AI-Research-SKILLs | | DeepSpeed, vLLM, LangChain, W&B, MLflow, and 80+ frameworks | Agent-agnostic (Claude Code, Codex, Gemini CLI, Qwen Code) | 86 skills across 22 categories covering the full AI research lifecycle: literature review, idea generation, experimentation, and paper authoring. |
| OpenClaw-Medical-Skills | | BioPython, GATK, Scanpy, RDKit, DeepChem, OpenMM, AlphaFold, pysam, MDAnalysis | Claude-based agents via OpenClaw / NanoClaw frameworks | 869 medical AI skills spanning clinical reports, genomics, drug discovery, bioinformatics, structural biology, and biomedical databases. |
Curated collections and survey papers on the auto-research landscape.
| Project | Stars | Description |
|---|---|---|
| awesome-autoresearch | | Curated index of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch. 50+ entries. |
| awesome-ai-for-science | | Curated list of AI tools, libraries, papers, datasets, and frameworks for scientific discovery across physics, chemistry, biology, and materials. |
| Autonomous-Agents | | Daily-updated curated collection of research papers on autonomous LLM agents. Covers multi-agent systems, scientific computing, robotics, and more. |
| Awesome-Deep-Research | | Curated collection of deep research agents: industry products, open-source implementations, 70+ recent papers, and benchmarks through early 2026. |
This list focuses specifically on automating the scientific research process — not general-purpose AI agents. We include projects that target one or more stages of the research lifecycle:
📖 Literature Review → 💡 Idea Generation → 🔍 Novelty Check → 📐 Experiment Design →
💻 Code Implementation → 🚀 Experiment Execution → 📊 Result Analysis → ✍️ Paper Writing → 📝 Peer Review
General-purpose coding agents (OpenHands, Aider, SWE-agent) are included because they serve as critical infrastructure for the experiment execution stage.
PRs welcome! Please ensure the project:
- Has 500+ GitHub stars (or is exceptionally notable with a top-venue publication)
- Is directly related to automating scientific research
- Is open-source with an active repository
Please keep entries sorted by star count (descending) within each category.
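
Before opening a PR, the star threshold and the sort order above can be checked mechanically. The snippet below is only a minimal sketch of such a check, not tooling that ships with this list: `Entry`, `check_category`, and the `top_venue` flag are hypothetical names, and star counts are assumed to be entered by hand.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One table row. All field names here are hypothetical."""
    name: str
    stars: int
    # Stands in for the "exceptionally notable" exemption; a maintainer decides this.
    top_venue: bool = False


def check_category(entries: list[Entry], min_stars: int = 500) -> list[str]:
    """Return human-readable problems for the rows of one category table."""
    problems = []
    # Rule 1: 500+ stars unless the entry has a top-venue publication.
    for entry in entries:
        if entry.stars < min_stars and not entry.top_venue:
            problems.append(
                f"{entry.name}: below {min_stars} stars and no top-venue publication"
            )
    # Rule 2: rows must be in descending star order (ties are allowed).
    for prev, cur in zip(entries, entries[1:]):
        if cur.stars > prev.stars:
            problems.append(f"{cur.name}: should be listed above {prev.name}")
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = [
        Entry("project-a", 1200),
        Entry("project-b", 300, top_venue=True),
        Entry("project-c", 800),
    ]
    for problem in check_category(demo):
        print(problem)
```

The check covers only the two numeric rules. Whether a project is directly related to automating scientific research, and whether its repository is active, still needs a human reviewer.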
