This repository organizes a timeline of key events (products, services, papers, GitHub repositories, blog posts, and news) related to AGI and frontier AI. It is part of the GenAI timeline.
The timeline curates a wide range of information, with a particular focus on LLMs and generative AI.
Issues and pull requests are greatly appreciated. If you've never contributed to an open source project before, I'm more than happy to walk you through how to create a pull request.
You can start by opening an issue describing the problem that you're looking to resolve and we'll go from there.
arXiv ❌, PDF 📎, arxiv-vanity 📙, paper page 🏠, Papers with Code ✳️, GitHub
This document is licensed under the MIT License © Jonghong Jeon (전종홍)
- 05/17 - OpenAI dissolves team focused on long-term AI risks, less than one year after announcing it (News),
- 05/17 - International Scientific Report on the Safety of Advanced AI (Blog),
- 05/17 - Google DeepMind launches new framework to assess the dangers of AI models (News),
- 05/17 - Deepfakes and LLMs: Free will neural network for AI safety research (News),
- 05/16 - White House Unveils AI Safety Framework for US Workers (News),
- 05/16 - Testing the reliability of an AI-based large language model to extract ecological information from the scientific literature (News),
- 05/16 - Human-AI Safety: A Descendant of Generative AI and Control Systems Safety (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/16 - How to Hit Pause on AI Before It’s Too Late (News),
- 05/16 - How Far Are We From AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/16 - GPT Store Mining and Analysis (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/15 - The Challenges of Regulating AI and the Role of Behavioral Science (Blog),
- 05/15 - Google’s invisible AI watermark will help identify generative text and video (News),
- 05/15 - Google I/O 2024: everything announced (Blog),
- 05/15 - Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/14 - US and China to hold first talks to reduce risk of AI ‘miscalculation’ (News),
- 05/14 - Google’s generative AI can now analyze hours of video (Blog),
- 05/13 - LLM Theory of Mind and Alignment: Opportunities and Risks (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/13 - How Much Research Is Being Written by Large Language Models? (Blog),
- 05/13 - Hello GPT-4o (Blog),
- 05/13 - GPT-4o first reactions: ‘essentially AGI’ (Blog),
- 05/10 - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/10 - INSPECT - An open-source framework for large language model evaluations (Blog),
- 05/10 - AI Safety Institute releases new AI safety evaluations platform (News),
- 05/08 - Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/06 - UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/06 - Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 05/04 - A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/29 - NIST AI RMF Generative AI Profile (News),
- 04/29 - Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/26 - Over 20 Technology and Critical Infrastructure Executives, Civil Rights Leaders, Academics, and Policymakers Join New DHS Artificial Intelligence Safety and Security Board to Advance AI’s Responsible Development and Deployment (News),
- 04/24 - The Ethics of Advanced AI Assistants (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/24 - MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/22 - Mechanistic Interpretability for AI Safety -- A Review (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/22 - Holistic Safety and Responsibility Evaluations of Advanced AI Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/19 - How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/16 - U.S. Commerce Secretary Gina Raimondo Announces Expansion of U.S. AI Safety Institute Leadership Team (News),
- 04/16 - Social Choice for AI Alignment: Dealing with Diverse Human Feedback (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/16 - Announcing a Benchmark to Improve AI Safety (News),
- 04/15 - LLM Evaluators Recognize and Favor Their Own Generations (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/15 - Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/15 - Opus can operate as a Turing machine (twitter),
- 04/15 - MathGPT: Leveraging Llama 2 to create a platform for highly personalized learning
- 04/15 - Learn Your Reference Model for Real Good Alignment (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/15 - GPT-4 rivals doctors in many medical exams - and beats them in psychiatry (News),
- 04/14 - TransformerFAM: Feedback attention is working memory (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/14 - TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/14 - On Speculative Decoding for Multimodal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - The Turing test of online reviews: Can we tell the difference between human-written and GPT-4-written online reviews? (❌)
- 04/12 - Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - Pre-training Small Base LMs with Fewer Tokens (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - Is ChatGPT Transforming Academics' Writing Style? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - Dataset Reset Policy Optimization for RLHF (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/12 - The good, the bad, and the Humane Pin (News),
- 04/12 - Grok-1.5 Vision Preview (Demo),
- 04/11 - The Necessity of AI Audit Standards Boards (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/11 - Remembering Transformer for Continual Learning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - WESE: Weak Exploration to Strong Exploitation for LLM Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - SWE-agent (twitter), (Demo),
- 04/11 - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - JetMoE: Reaching Llama2 Performance with 0.1M Dollars (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (Project), (twitter), (✳️)
- 04/11 - From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - Context-aware Video Anomaly Detection in Long-Term Datasets (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - ChatGPT-3.5, Claude 3 kick pixelated butt in Street Fighter III tournament for LLMs (News),
- 04/11 - ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - Best Practices and Lessons Learned on Synthetic Data for Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/11 - Benchmark LLMs by fighting in Street Fighter 3 (Demo),
- 04/11 - AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - OpenAI and Meta are on the verge of releasing AI models capable of reasoning like humans, report says (News),
- 04/10 - MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - Incremental XAI: Memorable Understanding of AI with Incremental Explanations (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/10 - Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - Autonomous Evaluation and Refinement of Digital Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - Privacy Preserving Prompt Engineering: A Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - On Evaluating the Efficiency of Source Code Generated by LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/09 - CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/09 - Apple just unveiled new Ferret-UI LLM — this AI can read your iPhone screen (News),
- 04/09 - AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/08 - The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - The Fact Selection Problem in LLM-Based Program Repair (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - Evaluating Interventional Reasoning Capabilities of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - CodecLM: Aligning Language Models with Tailored Synthetic Data (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - AutoCodeRover: Autonomous Program Improvement (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/08 - HAMMR: HierArchical MultiModal React agents for generic VQA (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/07 - LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/07 - AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/07 - MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/07 - Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/06 - Aligning Diffusion Models by Optimizing Human Utility (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/06 - The Case for Developing a Foundation Model for Planning-like Tasks from Scratch (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/06 - Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/06 - Challenges Faced by Large Language Models in Solving Multi-Agent Flocking (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/06 - Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/05 - Increased LLM Vulnerabilities from Fine-tuning and Quantization (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/05 - Exploring Autonomous Agents through the Lens of Large Language Models: A Review (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/05 - Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (twitter),
- 04/04 - Evaluating LLMs at Detecting Errors in LLM Responses (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - Language Model Evolution: An Iterated Learning Perspective (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/04 - Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/04 - Designing for Human-Agent Alignment: Understanding what humans want from their agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/03 - Concept-Guided LLM Agents for Human-AI Safety Codesign (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/03 - Responsible Reporting for Frontier AI Development (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/03 - MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/03 - Many-shot jailbreaking (❌)
- 04/03 - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/02 - UK & United States announce partnership on science of AI safety (News),
- 04/02 - Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/02 - CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/02 - A Survey on Large Language Model-Based Game Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/01 - LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/01 - Large Language Model Evaluation Via Multi AI Agents: Preliminary results (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 04/01 - Stream of Search (SoS): Learning to Search in Language (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/01 - U.S., U.K. Announce Partnership to Safety Test AI Models (News),
- 04/01 - Evalverse: Unified and Accessible Library for Large Language Model Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 04/01 - Are large language models superhuman chemists? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/31 - "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/30 - Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/30 - Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/30 - A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/30 - Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/29 - LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/29 - DeepMind develops SAFE, an AI-based app that can fact-check LLMs (News),
- 03/27 - Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/27 - A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/26 - MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/26 - InternLM2 Technical Report (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - AI Consciousness is Inevitable: A Theoretical Computer Science Perspective (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - RepairAgent: An Autonomous, LLM-Based Agent for Program Repair (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/25 - LLM Agent Operating System (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/23 - When LLM-based Code Generation Meets the Software Development Process (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/23 - EduAgent: Generative Student Agents in Learning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/22 - Content Knowledge Identification with Multi-Agent Large Language Models (LLMs) (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/22 - LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/22 - Can large language models explore in-context? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/21 - VidLA: Video-Language Alignment at Scale (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/21 - General Assembly adopts landmark resolution on artificial intelligence (News),
- 03/21 - PeerGPT: Probing the Roles of LLM-based Peer Agents as Team Moderators and Participants in Children's Collaborative Learning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/20 - Polaris: A Safety-focused LLM Constellation Architecture for Healthcare (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/20 - Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/20 - Mora: Enabling Generalist Video Generation via A Multi-Agent Framework (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/20 - Evaluating Frontier Models for Dangerous Capabilities (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - When Do We Not Need Larger Vision Models? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - Evolutionary Optimization of Model Merging Recipes (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ([:octocat:](https://github.com/sakanaai/evolutionary-model-merge)![GitHub Repo stars](https://img.shields.io/github/stars/sakanaai/evolutionary-model-merge?style=social))
- 03/19 - Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - Embodied LLM Agents Learn to Cooperate in Organized Teams (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/19 - Characteristic AI Agents via Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), (![GitHub Repo stars](https://img.shields.io/github/stars/nuaa-nlp/character100?style=social))
- 03/18 - Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/18 - How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/18 - Can LLM-Augmented autonomous agents cooperate?, An evaluation of their cooperative capabilities through Melting Pot (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/18 - Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/18 - From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/18 - Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/17 - PhD: A Prompted Visual Hallucination Evaluation Dataset (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/17 - Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants' API Invocation Capabilities (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/16 - Do Large Language Models understand Medical Codes? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/15 - Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/14 - Scaling Instructable Agents Across Many Simulated Worlds (twitter), (Blog),
- 03/14 - Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/13 - The Human Factor in Detecting Errors of Large Language Models: A Systematic Literature Review and Future Research Directions (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/13 - Human Alignment of Large Language Models through Online Preference Optimisation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/11 - Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/11 - HILL: A Hallucination Identifier for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/11 - TIME - Exclusive: U.S. Must Move ‘Decisively’ to Avert ‘Extinction-Level’ Threat From AI, Government-Commissioned Report Says (News),
- 03/11 - TIME - Employees at Top AI Labs Fear Safety Is an Afterthought, Report Says (News),
- 03/11 - Stealing Part of a Production Language Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/11 - Multistep Consistency Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/11 - Chain-of-table: Evolving tables in the reasoning chain for table understanding (Blog),
- 03/11 - An Action Plan to increase the safety and security of advanced AI (Blog), (Video),
- 03/10 - Beyond human intelligence: Claude 3.0 and the quest for AGI (Blog),
- 03/09 - Algorithmic progress in language models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/09 - RAG arena (Demo),
- 03/08 - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/08 - Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/08 - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/08 - Now available on Poe: Claude 3 (Demo),
- 03/07 - Teaching Large Language Models to Reason with Reinforcement Learning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/07 - Meet ‘Liberated Qwen’, an uncensored LLM that strictly adheres to system prompts (News),
- 03/07 - How Far Are We from Intelligent Visual Deductive Reasoning? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/07 - Evaluating LLM models at scale (Blog),
- 03/07 - Common 7B Language Models Already Possess Strong Math Capabilities (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/07 - Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/07 - Can Large Language Models Reason and Plan? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/06 - SaulLM-7B: A pioneering Large Language Model for Law (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/06 - Learning to Decode Collaboratively with Multiple Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/06 - Is AGI Getting Closer? Anthropic's Claude 3 Opus Model Shows Glimmers of Metacognitive Reasoning (News),
- 03/05 - OpenAI and Elon Musk (Blog),
- 03/05 - AIs ranked by IQ; AI passes 100 IQ for first time, with release of Claude-3 (News),
- 03/05 - WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Revisiting Meta-evaluation for Grammatical Error Correction (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Online Learning of Human Constraints from Feedback in Shared Autonomy (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Interactive Continual Learning: Fast and Slow Thinking (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Generative Software Engineering (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS)
- 03/05 - Exploring the Limitations of Large Language Models in Compositional Relation Reasoning (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Design2Code: How Far Are We From Automating Front-End Engineering? (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/05 - India asks tech firms to seek approval before releasing 'unreliable' AI tools (News),
- 03/04 - Large language models surpass human experts in predicting neuroscience results (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/04 - The Claude 3 Model Family: Opus, Sonnet, Haiku (❌), (twitter), (✳️)
- 03/04 - Enhancing LLM Safety via Constrained Direct Preference Optimization (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/04 - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 03/04 - CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/04 - Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 03/04 - adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 3/4 - Why OpenAI’s nonprofit mission to build AGI is under fire — again | The AI Beat (News),
- 3/4 - SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/4 - NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/4 - Build AI for a Better Future (twitter), (News),
- 3/3 - The AGI Lawsuit: Elon Musk vs. OpenAI and the Quest for Artificial General Intelligence that Benefits Humanity (Blog),
- 3/3 - Nvidia CEO Jensen Huang says AI could pass most human tests in 5 years (News),
- 3/2 - Nvidia CEO says AI could pass human tests in five years (News),
- 3/2 - LAB: Large-Scale Alignment for ChatBots (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/2 - Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/1 - Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/1 - Formulation Comparison for Timeline Construction using LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 3/1 - Elon Musk sues OpenAI and CEO Sam Altman over contract breach (News),
- 3/1 - DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 3/1 - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 02/29 - Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (twitter),
- 2/29 - OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/29 - NewsBench: Systematic Evaluation of LLMs for Writing Proficiency and Safety Adherence in Chinese Journalistic Editorial Applications (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/29 - Let LLMs Take on the Latest Challenges! A Chinese Dynamic Question Answering Benchmark (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/29 - GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/29 - Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/29 - Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 02/28 - Position Paper: Agent AI Towards a Holistic Intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 02/28 - Evaluating LLMs Through a Federated, Scenario-Writing Approach (Blog),
- 2/28 - Organizational AGI is coming – most companies aren’t prepared (Blog),
- 2/28 - MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/28 - CogBench: a large language model walks into a psychology lab (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/28 - CLLMs: Consistency Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - Cause and Effect: Can Large Language Models Truly Understand Causality? (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/28 - ‘Baby AGI’ could be a reality in early 2025: SingularityNET founder (News),
- 02/27 - A High Level Guide to LLM Evaluation Metrics (Blog),
- 2/27 - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/27 - Users Say Microsoft's AI Has Alternate Personality as Godlike AGI That Demands to Be Worshipped (News),
- 2/27 - TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/27 - The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS)
- 2/27 - ShapeLLM: Universal 3D Object Understanding for Embodied Interaction (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/27 - Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/27 - Investigating Continual Pretraining in Large Language Models: Insights and Implications (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/27 - How the “Frontier” Became the Slogan of Uncontrolled AI (Blog),
- 2/27 - Google DeepMind CEO on AGI, OpenAI and Beyond – MWC 2024 (News),
- 2/27 - Evaluating Very Long-Term Conversational Memory of LLM Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/27 - Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/27 - Benchmarking Data Science Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/27 - Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/27 - A Language Model based Framework for New Concept Placement in Ontologies (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/26 - Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/26 - MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/26 - LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/26 - HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (![GitHub Repo stars](https://img.shields.io/github/stars/FloatAI/HumanEval-XL?style=social))
- 2/26 - Benchmarking LLMs on the Semantic Overlap Summarization Task (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/26 - A Comprehensive Evaluation of Quantization Strategies for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/25 - HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/25 - Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/24 - SportQA: A Benchmark for Sports Understanding in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/24 - OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/24 - Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/24 - Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/23 - KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/23 - Google DeepMind C.E.O. Demis Hassabis on the Path From Chatbots to A.G.I. (News)
- 2/23 - Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS)
- 2/23 - AttributionBench: How Hard is Automatic Attribution Evaluation? (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/22 - Visual Hallucinations of Multi-modal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/22 - Unintended Impacts of LLM Alignment on Global Representation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/22 - tinyBenchmarks: evaluating LLMs with fewer examples (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/22 - The European Commitment to Human-Centered Technology: The Integral Role of HCI in the EU AI Act's Success (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS)
- 2/22 - Identifying Multiple Personalities in Large Language Models with External Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/22 - ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/21 - SaGE: Evaluating Moral Consistency in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - Potential and Challenges of Model Editing for Social Debiasing (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 2/21 - LLM Jailbreak Attack versus Defense Techniques -- A Comprehensive Study (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/21 - Factual Consistency Evaluation of Summarisation in the Era of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - CriticBench: Evaluating Large Language Models as Critic (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/21 - BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/21 - Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/21 - ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 02/20 - Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 2/20 - What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/20 - DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/20 - An Autonomous Large Language Model Agent for Chemical Literature Data Mining (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/20 - A Survey on Knowledge Distillation of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 02/19 - Simulacra as Conscious Exotica (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 02/19 - A Critical Evaluation of AI Feedback for Aligning Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 2/19 - WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/19 - TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/19 - Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/19 - EmoBench: Evaluating the Emotional Intelligence of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/18 - ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 2/18 - KMMLU: Measuring Massive Multitask Language Understanding in Korean (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/18 - Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 2/17 - M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/16 - Comparing Hallucination Detection Metrics for Multilingual Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/15 - Taxonomy-based CheckList for Large Language Model Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS)
- 2/15 - Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/15 - Exploring the Adversarial Capabilities of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/15 - AMAZON AGI TEAM SAY THEIR AI IS SHOWING "EMERGENT ABILITIES" (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (News)
- 2/14 - Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/14 - AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 2/13 - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk (News)
- 2/8 - Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 02/08 - Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety (News),
- 02/07 - AISIC Working Groups (Blog),
- 02/07 - AISIC Members (Blog),
- 02/07 - A Roadmap to Pluralistic Alignment (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 2/7 - Advancing Explainable AI Toward Human-Like Intelligence: Forging the Path to Artificial Brain (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 02/05 - UK AI Safety Institute: third progress report (Blog),
- 02/05 - Governance of Generative Artificial Intelligence for Companies (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 2/5 - AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 1/31 - Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 1/29 - Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 01/23 - Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 1/23 - Unsocial Intelligence: a Pluralistic, Democratic, and Participatory Investigation of AGI Discourse (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 1/22 - Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 1/22 - Detecting Multimedia Generated by Large AI Models: A Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 1/21 - MedLM: Exploring Language Models for Medical Question Answering Systems (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 1/21 - Interactive AI with Retrieval-Augmented Generation for Next Generation Networking (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 01/18 - WHO - Ethics and governance of artificial intelligence for health: Guidance on large multi-modal models (News),
- 1/17 - Caught in the Quicksand of Reasoning, Far from AGI Summit: Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 1/11 - A Universal Knowledge Model and Cognitive Architecture for Prototyping AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 01/10 - Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (✳️), ()
- 1/9 - A Taxonomy for AI Hazard Analysis (❌), (SS)
- 01/08 - PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLM (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 1/6 - Human-Instruction-Free LLM Self-Alignment with Limited Samples (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 1/5 - The EU AI Act: A pioneering effort to regulate frontier AI? (❌), (SS)
- 1/3 - A Review of Findings from Neuroscience and Cognitive Psychology as Possible Inspiration for the Path to Artificial General Intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 1/2 - A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 1/1 - TrustLLM: Trustworthiness in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 12/30 - Responses to catastrophic AGI risk: a survey (❌), (SS)
- 12/18 - From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 12/14 - CERN for AGI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 12/12 - Hallucination Augmented Contrastive Learning for Multimodal Large Language Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 12/10 - Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 12/11 - METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 12/7 - Testing LLM performance on the Physics GRE: some observations (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/30 - TaskBench: Benchmarking Large Language Models for Task Automation (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 11/28 - Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 11/28 - MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 11/28 - Deepfakes, Misinformation, and Disinformation in the Era of Frontier AI, Generative AI, and Large AI Models (❌), (SS)
- 11/23 - Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation (❌), (✳️), (SS)
- 11/21 - GAIA: a benchmark for General AI Assistants (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/21 - A Survey of Graph Meets Large Language Model: Progress and Future Directions (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 11/19 - Meta Prompting for AGI Systems (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 11/15 - Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/15 - Distinguishing Fact from Fiction: A Benchmark Dataset for Identifying Machine-Generated Scientific Papers in the LLM Era. (❌), (SS)
- 11/15 - Artificial General Intelligence, Existential Risk, and Human Risk Perception (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/13 - An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 11/10 - Testing LLMs on Code Generation with Varying Levels of Prompt Specificity (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/10 - How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/10 - Advanced AI Governance: A Literature Review of Problems, Options, and Proposals (❌), (SS)
- 11/4 - Levels of AGI: Operationalizing Progress on the Path to AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/3 - Don't Make Your LLM an Evaluation Benchmark Cheater (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 11/3 - AlignBench: Benchmarking Chinese Alignment of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 11/2 - Evil Geniuses: Delving into the Safety of LLM-based Agents (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 10/30 - Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 10/30 - Evaluating Large Language Models: A Comprehensive Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 10/30 - AI Alignment: A Comprehensive Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 10/25 - Safety and security risks of generative artificial intelligence to 2025 (Annex B) (❌)
- 10/25 - Future risks of frontier AI (Annex A) (❌)
- 10/25 - Frontier AI: capabilities and risks – discussion paper (❌)
- 10/25 - AI Hazard Management: A framework for the systematic management of root causes for AI risks (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 10/24 - Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 10/23 - Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 10/20 - Oversight for Frontier AI through a Know-Your-Customer Scheme for Compute Providers (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 10/13 - Multinational AGI Consortium (MAGIC): A Proposal for International Coordination on AI (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 10/01 - RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️), ()
- 9/30 - Deployment Corrections: An incident response framework for frontier AI models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 9/29 - LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 9/26 - Large Language Model Alignment: A Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 9/24 - LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 09/20 - Explosive growth from AI automation: A review of the arguments (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 9/19 - OpenCog Hyperon: A Framework for AGI at the Human Level and Beyond (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️)
- 9/19 - MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 9/17 - ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 9/15 - Self-Assessment Tests are Unreliable Measures of LLM Personality (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 9/14 - Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 9/14 - The Rise and Potential of Large Language Model Based Agents: A Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 9/13 - Pretraining on the Test Set Is All You Need (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 9/12 - A Proposal for a Definition of General Purpose Artificial Intelligence Systems (❌), (SS)
- 9/5 - Artificial General Intelligence for Radiation Oncology (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 9/4 - Concepts is All You Need: A More Direct Path to AGI (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 8/31 - An overview of research on human-centered design in the development of artificial general intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 08/27 - MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 8/13 - The risks associated with Artificial General Intelligence: A systematic review (❌), (SS)
- 8/12 - A new solution and concrete implementation steps for Artificial General Intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 8/10 - Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 8/7 - Why We Don't Have AGI Yet (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 8/5 - A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 8/3 - Digital twin brain: a bridge between biological intelligence and artificial intelligence (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 7/28 - RSGPT: A Remote Sensing Vision Language Model and Benchmark (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 7/26 - General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Open Challenges and Implications (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 7/24 - Aligning Large Language Models with Human: A Survey (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 7/17 - Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 7/16 - Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 7/12 - A Comprehensive Overview of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 7/7 - Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 7/6 - Frontier AI Regulation: Managing Emerging Risks to Public Safety (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 7/6 - A Survey on Evaluation of Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 7/3 - Review of Large Vision Models and Visual Prompt Engineering (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 6/30 - Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), ()
- 6/26 - Kosmos-2: Grounding Multimodal Large Language Models to the World (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/23 - MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/23 - A Survey on Multimodal Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/19 - Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/17 - Large Language Models for Telecom: The Next Big Thing? (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 6/14 - Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 6/9 - Judging LLM-as-a-judge with MT-Bench and Chatbot Arena (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/8 - PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/8 - Artificial General Intelligence for Medical Imaging (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 6/7 - The Two Word Test: A Semantic Benchmark for Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS), ()
- 6/5 - Transformative AGI by 2043 is <1% likely (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 5/30 - GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 5/28 - Managing the risks of artificial general intelligence: A human factors and ergonomics perspective (❌), (SS)
- 5/26 - AGI labs need an internal audit function (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 05/25 - Role-Play with Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- 5/23 - LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 5/22 - Reflective Linguistic Programming (RLP): A Stepping Stone in Socially-Aware AGI (SocialAGI) (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 5/14 - A Comprehensive Survey on Segment Anything Model for Vision and Beyond (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 5/11 - Towards best practices in AGI safety and governance: A survey of expert opinion (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (SS)
- 5/4 - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 4/26 - Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 4/19 - Fundamental Limitations of Alignment in Large Language Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 4/13 - AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 4/10 - OpenAGI: When LLM Meets Domain Experts (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 4/4 - One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 3/2 - Sparks of Artificial General Intelligence: Early experiments with GPT-4 (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 2/18 - A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 2/13 - An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 12/20 - Benchmarking Spatial Relationships in Text-to-Image Generation (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)

2020

- 11/12 - A Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy (❌), (SS)

2016

- 4/5 - Evaluation of General-Purpose Artificial Intelligence: Why, What & How (❌), (SS)
- 4/2 - The AGI Containment Problem (❌), (📖), (📎), (📙), (🏠), (HTML), (SP), (GS), (SS), (✳️), (SS)
- 2/19 - Additional Comments on the “White Paper: On Artificial Intelligence - A European approach to excellence and trust” (❌), (SS)

2017

- 02/15 - Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm (❌), (📖), (📎), (📙), (🏠), (HTML), (SL), (SP), (GS), (SS), (✳️)
- Awesome Korean LLM
- Awesome-LLMOps
- Language Model Evaluation Harness
- A collection of papers and resources related to evaluations on large language models
- Awesome-Healthcare-Foundation-Models
- LLM-evaluation
- Awesome-LLM
- Examples and guides for using the OpenAI API
- Ultimate-Awesome-Transformer-Attention
- Awesome Segment Anything
- Segment Anything Model (SAM) for Medical Image Segmentation
- GPT-4登場以降に出てきたChatGPT/LLMに関する論文や技術の振り返り (A look back at ChatGPT/LLM papers and techniques that have appeared since the release of GPT-4)
- LLM Collection
- 🤗 Open LLM Leaderboard
- AI Incident Database
- Daily papers by AK
- Awesome-Generative-RecSys - A curated list of Generative Recommender Systems (Paper & Code)
- Prompt Engineering Guide - papers
- awesome-ChatGPT-repositories
- The Rundown
- WEEKLY PAPERS
- Primo.ai LLM wiki
- ML Papers of the Week
- CS 324 - Advances in Foundation Models
- ML timeline
- ChatGPT Timeline
- OpenAI Timeline
- LLM Explained: The LLM Training Landscape