LLMPhy: Parameter-Identifiable Physical Reasoning
Combining Large Language Models and Physics Engines
Most learning-based approaches to complex physical reasoning overlook the crucial challenge of parameter identification (e.g., mass, friction) that governs scene dynamics—despite its importance in real-world applications such as collision avoidance and robotic manipulation. We present LLMPhy, a black-box optimization framework that integrates large language models (LLMs) with physics simulators for physical reasoning. LLMPhy bridges the textbook physical knowledge embedded in LLMs with world models implemented in modern physics engines, enabling the construction of digital twins of input scenes through the estimation of latent parameters.
LLMPhy decomposes digital twin construction into two phases: Phase 1 estimates continuous physical parameters, and Phase 2 estimates discrete scene layout parameters. For each phase, LLMPhy iteratively prompts an LLM (GPT in our case) to generate Python programs with parameter estimates, executes them in the physics engine to reconstruct the scene, and uses the resulting reconstruction error as feedback to refine the LLM’s predictions. We use the MuJoCo physics engine in our implementation.
The code shared here implements the core functionalities of LLMPhy, including Python API interfaces between the LLM and MuJoCo, prompts used in both phases, and evaluation of generated solutions against ground truth. As existing physical reasoning benchmarks rarely account for parameter identifiability, we introduce a new dataset—LLMPhy-TraySim—designed to evaluate the physical reasoning capability of modern LLMs in a zero-shot setting. The official LLMPhy-TraySim dataset is shared separately, but the code provided can also be used to generate new data samples for both phases using the simulator.
In the LLMPhy architecture, an LLM is prompted with multi-view images and object motion video sequences to synthesize Python code characterizing the underlying physics and object layout. The code is executed in the simulator to produce scene reconstructions, which are matched against the inputs to compute a reconstruction error. In the next iteration, the LLM is prompted to improve its estimates so as to reduce this error.
LLMPhy executes in two phases. In phase 1, the unobservable physics parameters of the objects are estimated from the given multi-view video sequences. In phase 2, the scene layout, i.e., where each object is placed in the scene, is estimated from the multi-view input images. The parameters estimated in the two phases are then used to reconstruct the scene shown in the input image inside the physics engine; the pusher is set in motion with the given velocity, and the steady state of the scene after the impact is extracted to select the answer from the given options. LLMPhy uses Python programs for the LLM to interact with the simulator.
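The per-phase feedback loop described above can be sketched as follows. This is a minimal illustration, not the repository's actual API: `query_llm`, `run_simulation`, and `reconstruction_error` are hypothetical stand-ins (the real system prompts GPT for a Python program and rolls it out in MuJoCo).

```python
import random

def query_llm(prompt):
    """Stand-in for the GPT call: proposes a random sliding-friction value."""
    return {"friction": random.uniform(0.0, 1.0)}

def run_simulation(params):
    """Stand-in for the MuJoCo rollout; returns a scalar trajectory summary."""
    return params["friction"]

def reconstruction_error(recon, target=0.42):
    """Compare the reconstruction against a (here, synthetic) observation."""
    return abs(recon - target)

def optimize_phase(initial_prompt, num_iters=10):
    """Iteratively refine parameter estimates via LLM + simulator feedback."""
    history = []                               # (params, error) pairs fed back
    best_params, best_err = None, float("inf")
    for _ in range(num_iters):
        # Fold prior attempts and their errors into the next prompt so the
        # LLM can propose better parameters.
        prompt = initial_prompt + "\nPrevious attempts: " + repr(history)
        params = query_llm(prompt)             # LLM proposes parameters
        recon = run_simulation(params)         # physics-engine reconstruction
        err = reconstruction_error(recon)      # match against the input scene
        history.append((params, err))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

params, err = optimize_phase("Estimate the tray/object friction.")
```

The same loop structure is used in both phases; only the parameter space (continuous physics parameters vs. discrete layout) and the error measure change.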
```shell
# clone project
conda create -n LLMPhy python=3.7
conda activate LLMPhy

# install dependencies
pip install -r requirements.txt
```

Note: We use MuJoCo 2.1.0 with mujoco_py in this repository. Some older notes may refer to this version as "2.10". Please install MuJoCo and mujoco_py before proceeding with the rest of the installation.

On Apple Silicon Macs, the legacy mujoco_py stack is easiest to run from an x86_64 conda environment under Rosetta. In our testing, an arm64 conda environment combined with /usr/local MuJoCo/GCC tooling failed during the mujoco_py build step with an error similar to `gcc-9: error: this compiler does not support arm64`.
Use the following sequence on a similar macOS setup:
- Install Rosetta if it is not already present.

  ```shell
  softwareupdate --install-rosetta --agree-to-license
  ```

- Open an x86_64 shell and create an x86_64 conda environment. The example below uses Python 3.7, which was the configuration we verified locally for mujoco_py.
  ```shell
  arch -x86_64 /bin/bash
  source /Users/cherian/miniforge3/etc/profile.d/conda.sh
  CONDA_SUBDIR=osx-64 conda create -n LLMPhy python=3.7
  conda activate LLMPhy
  conda config --env --set subdir osx-64
  ```

  If you are validating the setup in a differently named environment, replace LLMPhy with that environment name in the commands below.
- Install the x86_64 compiler dependencies from Homebrew.

  ```shell
  arch -x86_64 /bin/bash -lc "brew install gcc libomp"
  ```

- Download MuJoCo 2.1.0 for macOS and unpack it so that the directory `~/.mujoco/mujoco210` exists.
  ```shell
  mkdir -p ~/.mujoco
  tar -xzf mujoco210-macos-x86_64.tar.gz -C ~/.mujoco
  ```

- Point `mujoco_py` at MuJoCo and force a compatible compiler if needed. On newer Apple Silicon/macOS toolchains, we also needed to relax the `incompatible-pointer-types` diagnostic during the first `mujoco_py` build.
  ```shell
  export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210
  export CC=/usr/local/bin/gcc-14
  export CXX=/usr/local/bin/g++-14
  export CFLAGS="-Wno-error=incompatible-pointer-types -Wno-incompatible-pointer-types"
  ```

- Install the Python dependencies for this repository.
  ```shell
  pip install -r requirements.txt
  pip install mujoco-py==2.1.2.14
  ```

- Verify that `mujoco_py` imports successfully before running the dataset generator.

  ```shell
  python -c "import mujoco_py; print(mujoco_py.__file__)"
  ```

If the import succeeds, the environment is ready for the data-generation and evaluation scripts. On our Apple Silicon setup, the import printed duplicate-GLFW warnings from the MuJoCo bundle and Homebrew GLFW, but still completed successfully. When running on Apple Silicon, stay inside this x86_64 conda environment for all commands that rely on mujoco_py.
Download the LLMPhy-TraySim dataset from Zenodo for formally evaluating on our benchmark. (TODO: Attach links here)
You may create additional simulation data using the phy_data_creator tool. To produce new TraySim instances (for both phases 1 and 2), run the script below. See the code for other command-line options.
On Apple Silicon, run the command from the working x86_64 MuJoCo environment described above.
```shell
python phy_data_creator.py \
    --generate_dataset \
    --num_examples 4 \
    --expt_id 123 \
    --max_objs 9 \
    --data_root ./test/
```

The command above will produce the directory ./test/examples_num_objs_9_expt_123_size_4/ with 4 examples in it, along with a llmphy_twophase_9_123_4.npy file that includes all the ground truth and other physics meta-information. Use the option --render_to_viewer to render the simulations on the screen.
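The generated metadata file is a standard NumPy save of a pickled Python object. A small helper to inspect it might look like the following; this is a sketch rather than part of the repository, and treating the stored object as a dict-like record is an assumption.

```python
import numpy as np

def load_llmphy_meta(path):
    """Load a ground-truth metadata .npy file written by phy_data_creator.py.

    The file stores a pickled Python object, so allow_pickle=True is required.
    np.save wraps a non-array object in a 0-d object array; .item() unwraps it.
    """
    arr = np.load(path, allow_pickle=True)
    if arr.dtype == object and arr.shape == ():
        return arr.item()
    return arr

# e.g., meta = load_llmphy_meta("./test/llmphy_twophase_9_123_4.npy")
```

Inspecting this file is a convenient way to check the ground-truth parameters before running the phase 1 and 2 models below.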
To run the phase 1 and phase 2 models, please use the script below. Note: you need to set the OpenAI API key in api_keys.py.
```shell
python main.py \
    --puzzle_root ./test/examples_num_objs_9_expt_123_size_4/ \
    --puzzle_info_file llmphy_twophase_9_123_4.npy \
    --expt_id 1234 \
    --model_name gpt-4.1 \
    --phase all
```

Use --no_verbose to avoid printing all the logs and prompts in the command window.
```bibtex
@inproceedings{Cherian2026_llmphy,
  author    = {Cherian, Anoop and Corcodel, Radu and Jain, Siddarth and Romeres, Diego},
  title     = {LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines},
  booktitle = {The 29th International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2026},
}
```
Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.
Copyright (C) 2026 Mitsubishi Electric Research Laboratories (MERL)
SPDX-License-Identifier: AGPL-3.0-or-later
