
LLMPhy: Parameter-Identifiable Physical Reasoning
Combining Large Language Models and Physics Engines

Mitsubishi Electric Research Labs (MERL), Cambridge, MA


Most learning-based approaches to complex physical reasoning overlook the crucial challenge of parameter identification (e.g., mass, friction) that governs scene dynamics—despite its importance in real-world applications such as collision avoidance and robotic manipulation. We present LLMPhy, a black-box optimization framework that integrates large language models (LLMs) with physics simulators for physical reasoning. LLMPhy bridges the textbook physical knowledge embedded in LLMs with world models implemented in modern physics engines, enabling the construction of digital twins of input scenes through the estimation of latent parameters.

LLMPhy decomposes digital twin construction into two phases: Phase 1 estimates continuous physical parameters, and Phase 2 estimates discrete scene layout parameters. For each phase, LLMPhy iteratively prompts an LLM (GPT in our case) to generate Python programs with parameter estimates, executes them in the physics engine to reconstruct the scene, and uses the resulting reconstruction error as feedback to refine the LLM’s predictions. We use the MuJoCo physics engine in our implementation.
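The following minimal sketch illustrates this feedback loop; the function names (prompt_llm, run_simulation, reconstruction_error) are hypothetical placeholders used for illustration, not the repository's actual API.

def optimize_parameters(observations, num_iterations=10):
    # Hypothetical sketch of the LLMPhy optimization loop; the helper
    # functions called here are illustrative placeholders.
    history = []  # (parameter estimate, error) pairs fed back to the LLM
    best_params, best_error = None, float("inf")
    for _ in range(num_iterations):
        # Ask the LLM for a Python program with new parameter estimates,
        # conditioned on all previous estimates and their errors.
        params = prompt_llm(observations, history)
        # Execute the generated program in the physics engine (MuJoCo)
        # to reconstruct the scene under the proposed parameters.
        reconstruction = run_simulation(params)
        # The reconstruction error serves as feedback for the next round.
        error = reconstruction_error(reconstruction, observations)
        history.append((params, error))
        if error < best_error:
            best_params, best_error = params, error
    return best_params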

The code shared here implements the core functionalities of LLMPhy, including Python API interfaces between the LLM and MuJoCo, prompts used in both phases, and evaluation of generated solutions against ground truth. As existing physical reasoning benchmarks rarely account for parameter identifiability, we introduce a new dataset—LLMPhy-TraySim—designed to evaluate the physical reasoning capability of modern LLMs in a zero-shot setting. The official LLMPhy-TraySim dataset is shared separately, but the code provided can also be used to generate new data samples for both phases using the simulator.

Model Architecture

In the LLMPhy architecture, an LLM is prompted with multi-view images and object motion video sequences to synthesize Python code characterizing the underlying physics and object layout. The code is executed in the simulator to produce scene reconstructions, which are matched against the inputs to compute a reconstruction error. In the next iteration, the LLM is prompted to refine its estimates so as to reduce this error.

LLMPhy executes in two phases. In phase 1, the unobservable physics parameters of the objects are estimated from the given multi-view video sequences. In phase 2, the scene layout, i.e., where each object is placed in the scene, is estimated from the multi-view input images. The parameters estimated in the two phases are then used to reconstruct the scene from the input image in the physics engine; the pusher is set into motion with the given velocity, and the steady state of the scene after the impact is extracted to select the answer from the given options. LLMPhy uses Python programs as the interface between the LLM and the simulator.
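In pseudocode, the end-to-end inference pipeline looks roughly as follows; estimate_physics and estimate_layout would each run the optimization loop sketched above, and all names here are hypothetical placeholders rather than the repository's API.

def infer_answer(videos, images, pusher_velocity, options):
    # Phase 1: continuous physics parameters (e.g., mass, friction)
    # estimated from the multi-view motion videos.
    physics_params = estimate_physics(videos)
    # Phase 2: discrete scene layout (object placements) estimated
    # from the multi-view still images.
    layout = estimate_layout(images)
    # Rebuild the digital twin, launch the pusher, and read off the
    # post-impact steady state to pick an answer.
    scene = build_scene(layout, physics_params)
    final_state = simulate_push(scene, pusher_velocity)
    return select_answer(final_state, options)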


Prerequisites

Installation

# clone project
git clone https://github.com/merlresearch/llmphy.git
cd llmphy

# create the conda environment (we use MuJoCo 2.1.0; see the note below)
conda create -n LLMPhy python=3.7
conda activate LLMPhy

Note: We use MuJoCo 2.1.0 with mujoco_py in this repository. Please install MuJoCo and mujoco_py before proceeding with the rest of the installation.

# install dependencies
pip install -r requirements.txt

macOS (Apple Silicon) notes

On Apple Silicon Macs, the legacy mujoco_py stack is easiest to run from an x86_64 conda environment under Rosetta. In our testing, an arm64 conda environment combined with /usr/local MuJoCo / GCC tooling failed during the mujoco_py build step with an error similar to gcc-9: error: this compiler does not support arm64.

Use the following sequence on a similar macOS setup:

  1. Install Rosetta if it is not already present.
softwareupdate --install-rosetta --agree-to-license
  2. Open an x86_64 shell and create an x86_64 conda environment. The example below uses Python 3.7, which was the configuration we verified locally for mujoco_py.
arch -x86_64 /bin/bash
source ~/miniforge3/etc/profile.d/conda.sh  # adjust to your conda installation path
CONDA_SUBDIR=osx-64 conda create -n LLMPhy python=3.7
conda activate LLMPhy
conda config --env --set subdir osx-64

If you are validating the setup in a differently named environment, replace LLMPhy with that environment name in the commands below.

  3. Install the x86_64 compiler dependencies from Homebrew.
arch -x86_64 /bin/bash -lc "brew install gcc libomp"
  4. Download MuJoCo 2.1.0 for macOS and unpack it so that the directory ~/.mujoco/mujoco210 exists.
mkdir -p ~/.mujoco
tar -xzf mujoco210-macos-x86_64.tar.gz -C ~/.mujoco
  5. Point mujoco_py at MuJoCo and force a compatible compiler if needed. On newer Apple Silicon/macOS toolchains, we also needed to relax the incompatible-pointer-types diagnostic during the first mujoco_py build.
export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210
export CC=/usr/local/bin/gcc-14
export CXX=/usr/local/bin/g++-14
export CFLAGS="-Wno-error=incompatible-pointer-types -Wno-incompatible-pointer-types"
  6. Install the Python dependencies for this repository.
pip install -r requirements.txt
pip install mujoco-py==2.1.2.14
  7. Verify that mujoco_py imports successfully before running the dataset generator.
python -c "import mujoco_py; print(mujoco_py.__file__)"

If the import succeeds, the environment is ready for the data-generation and evaluation scripts. On our Apple Silicon setup, the import printed duplicate-GLFW warnings (from the MuJoCo bundle and Homebrew GLFW) but still completed successfully. Stay inside this x86_64 conda environment for all subsequent commands that rely on mujoco_py.
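Beyond the bare import, a short smoke test that actually steps the simulator can surface compiler or library problems early. The snippet below uses only standard mujoco_py calls; the one-sphere XML model is a trivial placeholder, not part of this repository.

# Minimal mujoco_py smoke test: build a one-body scene and step it.
from mujoco_py import MjSim, load_model_from_xml

MODEL_XML = """
<mujoco>
  <worldbody>
    <body name="ball" pos="0 0 1">
      <joint type="free"/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

sim = MjSim(load_model_from_xml(MODEL_XML))
for _ in range(100):
    sim.step()  # the free-falling sphere should accelerate downward
print("z after 100 steps:", sim.data.qpos[2])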

Dataset Download

Download the LLMPhy-TraySim dataset from Zenodo for formal evaluation on our benchmark. (TODO: Attach links here)

Data Generation

You may create additional simulation data using the phy_data_creator tool. To produce new TraySim instances (for both phase 1 and phase 2), run the script below; see the code for other command-line options.

On Apple Silicon, run the command from the working x86_64 MuJoCo environment described above.

python phy_data_creator.py \
  --generate_dataset \
  --num_examples 4 \
  --expt_id 123 \
  --max_objs 9 \
  --data_root ./test/

This command produces the directory ./test/examples_num_objs_9_expt_123_size_4/ containing 4 examples, along with a llmphy_twophase_9_123_4.npy file that stores the ground truth and other physics metadata. Use the --render_to_viewer option to render the simulations on screen.
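Since the metadata file bundles heterogeneous ground-truth and physics information, it likely stores pickled Python objects, in which case NumPy requires allow_pickle=True to load it. The exact schema is defined by phy_data_creator.py, so inspect the loaded object before relying on specific fields; the path below is an assumption to adjust to wherever the generator writes the file.

import numpy as np

# The .npy metadata likely holds pickled objects, hence allow_pickle=True.
meta = np.load("llmphy_twophase_9_123_4.npy", allow_pickle=True)
print(type(meta))
print(meta)  # inspect the structure; the schema comes from phy_data_creator.py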

Zero-Shot Inference Using GPT-4.1

To execute the phase 1 and phase 2 models, run the script below. Note: you need to set your OpenAI API key in api_keys.py.

python main.py \
    --puzzle_root ./test/examples_num_objs_9_expt_123_size_4/ \
    --puzzle_info_file llmphy_twophase_9_123_4.npy \
    --expt_id 1234 \
    --model_name gpt-4.1  \
    --phase all

Use --no_verbose to avoid printing all the logs and prompts in the command window.
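If api_keys.py does not exist yet, a minimal version might look like the sketch below; the variable name is an assumption, so check how main.py imports the key before relying on it.

# api_keys.py -- minimal sketch; the variable name OPENAI_API_KEY is an
# assumption, so verify how main.py reads the key.
OPENAI_API_KEY = "sk-..."  # replace with your OpenAI API key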

Citation

@inproceedings{Cherian2026_llmphy,
  author    = {Cherian, Anoop and Corcodel, Radu and Jain, Siddarth and Romeres, Diego},
  title     = {LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines},
  booktitle = {The 29th International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2026},
}

License

Released under AGPL-3.0-or-later license, as found in the LICENSE.md file.

Copyright (C) 2026 Mitsubishi Electric Research Laboratories (MERL)

SPDX-License-Identifier: AGPL-3.0-or-later
