# Introduction

This notebook demonstrates how to run the MatCalc-Benchmark. For demonstration purposes, we use two recently released universal machine learning interatomic potentials (UMLIPs): TensorNet-MatPES-PBE-v2025.1-PES and M3GNet-MatPES-PBE-v2025.1-PES. To run the benchmark on a different model, all you need to provide is a compatible ASE Calculator for your UMLIP.

In [None]:
from __future__ import annotations

import warnings
import pandas as pd

from matcalc.utils import PESCalculator
from matcalc.benchmark import run_elasticity_benchmark

# Elasticity Benchmark

For demonstration purposes only, we sample 100 structures from the full test dataset rather than running on all of it. The fixed `seed` makes the sample reproducible.

In [None]:
results = {}
for model_name in [
    "M3GNet-MatPES-PBE-v2025.1-PES",
    "TensorNet-MatPES-PBE-v2025.1-PES",
]:
    calculator = PESCalculator.load_universal(model_name)
    short_name = model_name.split("-")[0]
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        results[short_name] = run_elasticity_benchmark(calculator, short_name, n_samples=100, seed=2025)


In [None]:
df = pd.merge(results["M3GNet"], results["TensorNet"], on='mp_id', how='inner')

In [None]:
# To dump the merged results to a CSV file, uncomment the line below.
# df.to_csv("MatCalc-Benchmark-elasticity.csv")

In [None]:
mae_k_tensornet = df["AE K TensorNet"].mean()
mae_k_m3gnet = df["AE K M3GNet"].mean()
mae_g_tensornet = df["AE G TensorNet"].mean()
mae_g_m3gnet = df["AE G M3GNet"].mean()

print(f"MAE K_TensorNet = {mae_k_tensornet:.1f}")
print(f"MAE K_M3GNet = {mae_k_m3gnet:.1f}")
print(f"MAE G_TensorNet = {mae_g_tensornet:.1f}")
print(f"MAE G_M3GNet = {mae_g_m3gnet:.1f}")

MAE K_TensorNet = 20.9
MAE K_M3GNet = 33.9
MAE G_TensorNet = 12.9
MAE G_M3GNet = 16.9


# Statistical significance test

When comparing models, it is important not only to look at the final MAEs but also to test rigorously whether the difference between them is statistically significant. Because both models are evaluated on the same set of compounds, the absolute errors are paired, and a paired t-test is the appropriate choice. For background on t-tests, see: https://www.jmp.com/en/statistics-knowledge-portal/t-test/two-sample-t-test
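To make the paired test concrete, here is a minimal standard-library sketch of the statistic it is built on: the mean of the per-compound differences divided by its standard error, with n - 1 degrees of freedom. The helper name `paired_t_statistic` and the five-compound error lists are hypothetical and only serve as an illustration; the p-value, which needs the t distribution, is left to SciPy.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    # Work on the per-item differences, which is what makes the test "paired".
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical absolute errors of two models on the same five compounds.
ae_model_a = [10.0, 12.0, 9.0, 15.0, 11.0]
ae_model_b = [14.0, 13.0, 12.0, 18.0, 16.0]

t_stat, dof = paired_t_statistic(ae_model_a, ae_model_b)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A negative statistic means the first sample has the smaller mean error. With 100 sampled structures, the degrees of freedom would be 99.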

In [None]:
from scipy.stats import ttest_rel

In [None]:
print(ttest_rel(df["AE K TensorNet"], df["AE K M3GNet"]))
print(ttest_rel(df["AE G TensorNet"], df["AE G M3GNet"]))

TtestResult(statistic=-3.242337016035638, pvalue=0.0016161285714116364, df=99)
TtestResult(statistic=-2.1084779949176915, pvalue=0.03751396338840221, df=99)


At an alpha of 5%, the p-values show that we **reject the null hypothesis that the MAEs in K and G of the two models are the same**, i.e., the differences in the MAEs of the two models are statistically significant for both K and G.
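The decision rule can be written out explicitly. This small sketch hard-codes the two p-values printed above and compares each against alpha; the variable names are illustrative only.

```python
ALPHA = 0.05  # significance level used in the text

# p-values copied from the ttest_rel output above.
p_values = {
    "K": 0.0016161285714116364,
    "G": 0.03751396338840221,
}

# Reject the null hypothesis of equal MAEs when p < alpha.
decisions = {name: p < ALPHA for name, p in p_values.items()}

for name, reject in decisions.items():
    verdict = "reject" if reject else "fail to reject"
    print(f"{name}: p = {p_values[name]:.4f} -> {verdict} the null hypothesis")
```

Both test statistics above are negative and TensorNet is the first argument to `ttest_rel`, so the significant differences are in TensorNet's favor, in line with its lower MAEs.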