Agentic ML refers to autonomous AI systems that can plan, execute, and iterate on machine learning workflows with minimal human intervention, from data preprocessing to model training, evaluation, and deployment.

🤖 This resource list is maintained with the help of Claude Opus 4.5.

End-to-end platforms and frameworks for building agentic ML systems.

| Project | Description |
|---|---|
| AutoGluon | Open-source AutoML toolkit by Amazon with foundation models and LLM agents. |
| Karpathy | Agentic ML engineer built on the Claude Code SDK and Google ADK. By K-Dense. |
| K-Dense Web | Autonomous AI Scientist platform with a dual-loop multi-agent system for research, coding, and ML. |

LLM-powered agents for automated machine learning pipelines.

| Project | Description |
|---|---|
| AIDE | AI-powered data science agent using tree search for solution exploration. |
| AIRA-dojo | Meta's AI research agents using search policies (Greedy, MCTS, Evolutionary). |
| AutoGluon Assistant | Multi-agent system for end-to-end multimodal ML automation. Also known as MLZero. |
| AutoMind | Adaptive agent combining an expert knowledge base built from 455 Kaggle competitions with tree search. By ZJU NLP. |
| AutoML-Agent | Multi-Agent LLM Framework for Full-Pipeline AutoML. |
| FM Agent | Baidu's foundation model agent for ML engineering tasks. |
| InternAgent | ML engineering agent with DeepSeek-R1 integration. |
| MLE-STAR | Google's ML engineering agent using web search and targeted code block refinement. Built with ADK. |
| ML-Master | AI-for-AI agent integrating exploration and reasoning with adaptive memory. By SJTU SAI. |
| OpenHands | Open-source AI software development agent adaptable to ML tasks. |
| R&D-Agent | Microsoft's research and development agent for ML tasks. |
| SELA | Tree-Search Enhanced LLM Agents for AutoML using MCTS. Part of MetaGPT. |

Academic papers on agentic ML, autonomous ML systems, and LLM-based ML agents.

Papers introducing benchmarks and evaluation methodologies for agentic ML systems.

- MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering (2024) - Paper | Code
  Benchmark by OpenAI with 75 Kaggle competitions for evaluating ML engineering agents.
- MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline (2025) - Paper
  Automated pipeline that transforms raw datasets into competition-style MLE challenges.
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation (ICML 2024) - Paper
  Benchmark for evaluating LLM agents on ML research tasks, including model training and debugging.
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research (2025) - Paper
  Benchmark with 201 research tasks from NeurIPS, ICLR, and ICML. Includes MLR-Judge for automated evaluation.
- DataSciBench: An LLM Agent Benchmark for Data Science (2025) - Paper | Code
  Comprehensive benchmark with a Task-Function-Code (TFC) framework for rigorous evaluation of LLMs on data science tasks.

Frameworks using multiple specialized agents for end-to-end ML pipelines.

- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML (ICML 2025) - Paper | Code
  Multi-agent system with data, model, and operation agents for full-pipeline automation.
- LightAutoDS-Tab: Multi-AutoML Agentic System for Tabular Data (2025) - Paper | Code
  Combines LLM-based code generation with multiple AutoML tools (AutoGluon, LightAutoML, FEDOT).
- MLZero: A Multi-Agent System for End-to-end Machine Learning Automation (NeurIPS 2025) - Paper | Code
  Transforms raw multimodal data into ML solutions with zero human intervention.
- SmartDS-Solver: Agentic AI for Vertical Domain Problem Solving in Data Science (ICLR 2026 Submission) - Paper
  Reasoning-centric system that uses the SARTE algorithm for data science problem solving.

Papers using tree search, MCTS, or structured planning for ML workflow optimization.

- AI Research Agents for Machine Learning (2025) - Paper | Code
  Formalizes AI research agents as search policies with operators. Compares Greedy, MCTS, and Evolutionary strategies.
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science (2025) - Paper | Code
  Features a curated expert knowledge base from 455 Kaggle competitions, an agentic knowledgeable tree search, and a self-adaptive coding strategy.
- I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search (2025) - Paper | Code
  Uses introspective node expansion with a hybrid reward that combines LLM-estimated and actual performance.
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement (2025) - Paper | Blog
  Uses web search to retrieve candidate models, then refines targeted code blocks identified through ablation studies.
- ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning (2025) - Paper | Code
  Integrates exploration and reasoning with an adaptive memory mechanism.
- PiML: Automated Machine Learning Workflow Optimization using LLM Agents (AutoML 2025) - Paper
  Persistent iterative framework with adaptive memory and systematic debugging.
- SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning (2024) - Paper | Code
  Leverages MCTS to expand the search space with insight pools.

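
Several entries above (SELA, I-MCTS, and the MCTS policy compared in AI Research Agents for Machine Learning) build on Monte Carlo Tree Search. As a rough illustration only, here is a generic UCT-based MCTS over a toy pipeline space; it is not the algorithm of any listed paper. The `STAGES` choices, `TOY_SCORES`, and `evaluate` are hypothetical: `evaluate` stands in for actually training and validating a pipeline, and in an agentic system an LLM would typically propose the candidate steps instead of a fixed list.

```python
import math
import random

# Hypothetical pipeline decisions: each tree level fixes one choice.
STAGES = [
    ("imputer", ["mean", "median"]),
    ("scaler", ["none", "standard"]),
    ("model", ["linear", "tree", "boosting"]),
]

# Hypothetical per-choice scores used by the toy evaluator below.
TOY_SCORES = {
    "mean": 0.00, "median": 0.02,
    "none": 0.00, "standard": 0.03,
    "linear": 0.70, "tree": 0.75, "boosting": 0.85,
}


def evaluate(pipeline):
    """Toy stand-in for training and validating a full pipeline."""
    return sum(TOY_SCORES[choice] for choice in pipeline)


class Node:
    def __init__(self, pipeline, parent=None):
        self.pipeline = pipeline  # choices fixed so far
        self.parent = parent
        self.children = []
        # Options for the next stage; empty once the pipeline is complete.
        self.untried = (list(STAGES[len(pipeline)][1])
                        if len(pipeline) < len(STAGES) else [])
        self.visits = 0
        self.total = 0.0

    def uct_child(self, c=1.4):
        # UCT: mean reward plus an exploration bonus for rarely visited children.
        return max(
            self.children,
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node([])
    best_pipeline, best_score = None, float("-inf")
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried child, if any remain.
        if node.untried:
            choice = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pipeline + [choice], parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: complete the pipeline with random choices.
        rollout = list(node.pipeline)
        for _, options in STAGES[len(rollout):]:
            rollout.append(rng.choice(options))
        score = evaluate(rollout)
        if score > best_score:
            best_pipeline, best_score = rollout, score
        # 4. Backpropagation: update visit counts and totals up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_pipeline, best_score


if __name__ == "__main__":
    pipeline, score = mcts()
    print(pipeline, round(score, 2))
```

Each iteration runs the four classic MCTS phases (selection, expansion, simulation, backpropagation). The exploration constant `c=1.4` is a common default, not a value taken from these papers.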
Agentic systems tailored for specific ML domains.

- AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific ML (2025) - Paper
  Specialized agents propose, critique, and refine SciML solutions.
- AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research (NeurIPS 2025 Position) - Paper
  Position paper on AI automation for scientific discovery, including multi-agent systems that simulate research societies.
- ClimateAgent: Multi-Agent Orchestration for Complex Climate Data Science Workflows (TMLR) - Paper
  Multi-agent framework for end-to-end climate data analytics with dynamic API awareness and self-correction.
- The AI Cosmologist: Agentic System for Automated Data Analysis (2025) - Paper
  Automates cosmological data analysis from idea generation to research dissemination.
- TS-Agent: Structured Agentic Workflows for Financial Time-Series Modeling (2025) - Paper
  Modular framework for financial forecasting with structured knowledge banks.

Using LLMs for specific ML optimization tasks.

- Using Large Language Models for Hyperparameter Optimization (2023) - Paper
  Iterative HPO via LLM prompting. Matches or outperforms Bayesian optimization in limited-budget settings.

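
The iterative prompting loop behind LLM-based HPO can be sketched as follows, under loud assumptions: the search space, the toy `objective`, and the `fake_llm` helper are all hypothetical, and this is not the paper's exact prompt or procedure. `fake_llm` is a deterministic stand-in that perturbs the best configuration so the example runs offline; a real setup would send the text from `build_prompt` to a model and parse a JSON configuration from its reply.

```python
import json
import random

# Hypothetical search space: (low, high) bounds per hyperparameter.
SEARCH_SPACE = {"learning_rate": (1e-4, 1e-1), "max_depth": (2, 10)}


def objective(config):
    """Toy validation loss (lower is better); stands in for real training."""
    lr_term = (config["learning_rate"] - 0.01) ** 2 * 1e4
    depth_term = (config["max_depth"] - 6) ** 2 * 0.01
    return lr_term + depth_term


def build_prompt(history):
    # The model sees the search space plus every (config, loss) pair so far.
    lines = [f"Search space: {json.dumps(SEARCH_SPACE)}",
             "Previous trials (config -> validation loss):"]
    for config, loss in history:
        lines.append(f"{json.dumps(config)} -> {loss:.4f}")
    lines.append("Reply with the next config as JSON.")
    return "\n".join(lines)


def fake_llm(prompt, history, rng):
    """Hypothetical stand-in for an LLM call; it ignores `prompt` and
    perturbs the best config seen so far, clamped to the search space."""
    if not history:
        return json.dumps({"learning_rate": 0.05, "max_depth": 3})
    best, _ = min(history, key=lambda item: item[1])
    lo, hi = SEARCH_SPACE["learning_rate"]
    lr = min(hi, max(lo, best["learning_rate"] * rng.uniform(0.5, 1.5)))
    d_lo, d_hi = SEARCH_SPACE["max_depth"]
    depth = min(d_hi, max(d_lo, best["max_depth"] + rng.choice([-1, 0, 1])))
    return json.dumps({"learning_rate": lr, "max_depth": depth})


def llm_hpo(budget=20, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        prompt = build_prompt(history)
        config = json.loads(fake_llm(prompt, history, rng))
        history.append((config, objective(config)))
    # Return the best (config, loss) pair observed within the budget.
    return min(history, key=lambda item: item[1])


if __name__ == "__main__":
    print(llm_hpo())
```

`llm_hpo(budget=20)` returns the best `(config, loss)` pair seen. Feeding the full trial history back into each prompt is what lets the model condition on earlier results, unlike random search, which samples each configuration independently.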
Pre-trained models that enable rapid ML development.

- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (ICLR 2023) - Paper | Code
  Prior-Data Fitted Network using in-context learning for instant tabular classification.
- Unlocking the Full Potential of Data Science Requires Tabular Foundation Models, Agents, and Humans (NeurIPS 2025 Position) - Paper
  Position paper on collaborative systems integrating agents, tabular foundation models, and human experts for data science.

Benchmarks and datasets for evaluating agentic ML systems.

| Benchmark | Description | Links |
|---|---|---|
| AutoML-Agent Benchmark | 18 diverse datasets across tabular, CV, NLP, time-series, and graph tasks. | Paper |
| DataSciBench | Comprehensive data science benchmark with TFC framework for LLM evaluation. | Paper, GitHub |
| GAIA | General AI Assistants benchmark testing real-world reasoning and tool use. | Paper |
| MLE-bench | Kaggle-based benchmark for ML engineering agents by OpenAI, with 75 competitions. | Paper, GitHub |
| MLAgentBench | Benchmark for LLM agents on ML experimentation tasks. | Paper |
| MLR-Bench | Open-ended ML research benchmark with 201 tasks from major ML conferences. | Paper |

Contributions are welcome! To add a project or paper, open an issue or submit a PR.

To the extent possible under law, the authors have waived all copyright and related rights to this work.
