Open Catalyst 2020 Nudged Elastic Band (OC20NEB)
======================================================

## Overview
This validation dataset was used to assess model performance in [CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks](https://arxiv.org/abs/2405.02078). It comprises 932 NEB relaxation trajectories covering three types of reactions: desorptions, dissociations, and transfers. NEB calculations locate transition states, and because the transition state energy determines the rate of reaction, access to transition states is very important for catalysis research. For more information, see the paper.

## File Structure and Contents
The tar file contains 3 subdirectories: dissociations, desorptions, and transfers. As the names imply, these directories contain the converged DFT trajectories for each of the reaction classes. Within these directories, the trajectories are named to identify the contents of the file. Here is an example and the anatomy of the name:

```desorption_id_83_2409_9_111-4_neb1.0.traj```

1. `desorption` indicates the reaction type (`dissociation` and `transfer` are the other possibilities).
2. `id` indicates that the material belongs to the validation in-domain split (`ood`, out of domain, is the other possibility).
3. `83` is the task id. It does not provide relevant information.
4. `2409` is the index of the bulk in the ocdata bulk pickle file.
5. `9` is the reaction index. For each reaction type there is a reaction pickle file in the repository; in this case the reaction is the 9th entry in that pickle file.
6. `111-4`: the first 3 numbers are the Miller indices (i.e. the (1,1,1) surface), and the last number corresponds to the shift value. In this case the 4th enumerated shift was used.
7. `neb1.0`: the number is the k value used. For the full dataset, 1.0 was used, so this does not distinguish any of the trajectories from one another.
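
The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset tooling: the `NebTrajName` class and `parse_neb_traj_name` function are hypothetical names introduced here for illustration, and the parser assumes every file follows the seven-field pattern described in the list.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NebTrajName:
    """Fields encoded in an OC20NEB trajectory file name (hypothetical helper)."""

    reaction_type: str   # "desorption", "dissociation", or "transfer"
    split: str           # "id" (in domain) or "ood" (out of domain)
    task_id: int
    bulk_index: int
    reaction_index: int
    miller: str          # Miller index digits as written in the name, e.g. "111"
    shift: int           # which enumerated shift was used
    k: float             # k value used for the NEB


def parse_neb_traj_name(path):
    """Split a name like 'desorption_id_83_2409_9_111-4_neb1.0.traj' into fields."""
    name = Path(path).name
    if not name.endswith(".traj"):
        raise ValueError(f"expected a .traj file, got {name!r}")
    parts = name[: -len(".traj")].split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    reaction_type, split, task_id, bulk_index, reaction_index, surface, neb = parts
    if not neb.startswith("neb"):
        raise ValueError(f"last field should start with 'neb', got {neb!r}")
    # The shift is whatever follows the last '-' in the surface field.
    miller, shift = surface.rsplit("-", 1)
    return NebTrajName(
        reaction_type=reaction_type,
        split=split,
        task_id=int(task_id),
        bulk_index=int(bulk_index),
        reaction_index=int(reaction_index),
        miller=miller,
        shift=int(shift),
        k=float(neb[len("neb"):]),
    )


info = parse_neb_traj_name("desorption_id_83_2409_9_111-4_neb1.0.traj")
print(info)
```

The Miller indices are kept as the raw string from the file name because the description above does not say how multi-digit or negative indices would be written.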


Each trajectory file contains repeating frame sets, one per NEB iteration. Although the initial and final frames are not optimized during the NEB, they are saved for every iteration in the trajectory. For this dataset, 10 frames were used, 8 of which were optimized over the NEB, so the length of the trajectory is the number of iterations (N) * 10. To look at the frame set prior to optimization and the optimized frame set, you could get them like this:

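```python
from __future__ import annotations

# The leading "!" runs a shell command from a Jupyter/IPython cell.
!wget https://dl.fbaipublicfiles.com/opencatalystproject/data/large_files/desorption_id_83_2409_9_111-4_neb1.0.traj

from ase.io import read

# ":" reads every frame in the trajectory file.
traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
unrelaxed_frames = traj[0:10]  # first frame set, before optimization
relaxed_frames = traj[-10:]  # last frame set, after optimization
```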
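
Because every iteration stores a full set of 10 frames, the whole trajectory can also be cut into one frame set per iteration. The sketch below uses a hypothetical helper, `split_into_frame_sets`, and works on any list; a list of integers stands in for the list of `Atoms` objects so the example runs without downloading the file.

```python
def split_into_frame_sets(frames, images_per_band=10):
    """Group a flat list of frames into consecutive sets of `images_per_band`.

    `images_per_band=10` matches the frame count described for this dataset.
    """
    if len(frames) % images_per_band != 0:
        raise ValueError(
            f"{len(frames)} frames is not a multiple of {images_per_band}"
        )
    return [
        frames[start : start + images_per_band]
        for start in range(0, len(frames), images_per_band)
    ]


# Stand-in for `traj`: 3 iterations of 10 frames each.
fake_traj = list(range(30))
bands = split_into_frame_sets(fake_traj)

print(len(bands))   # number of iterations, N
print(bands[0])     # frame set prior to optimization
print(bands[-1])    # final (optimized) frame set
```

With the real trajectory, `split_into_frame_sets(traj)[0]` and `split_into_frame_sets(traj)[-1]` should correspond to `traj[0:10]` and `traj[-10:]`.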


## Download
|Splits |Size of compressed version |Size of uncompressed version | MD5 checksum (download link)   |
|---    |---    |---    |---    |
|ASE Trajectories   |1.5G  |6.3G   | [52af34a93758c82fae951e52af445089](https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20neb/oc20neb_dft_trajectories_04_23_24.tar.gz)   |
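
After downloading, the archive can be checked against the MD5 checksum in the table. This is a minimal sketch using only the Python standard library; `md5sum` is a hypothetical helper name, and the archive file name is assumed to match the last component of the download link.

```python
import hashlib
import os

EXPECTED_MD5 = "52af34a93758c82fae951e52af445089"  # from the table above
ARCHIVE = "oc20neb_dft_trajectories_04_23_24.tar.gz"  # assumed local file name


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Only run the check if the archive has actually been downloaded.
if os.path.exists(ARCHIVE):
    actual = md5sum(ARCHIVE)
    if actual != EXPECTED_MD5:
        raise RuntimeError(f"checksum mismatch: got {actual}")
    print("checksum OK")
```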



## Use
One more note: we have not prepared an LMDB for this dataset, because NEB calculations are not supported directly in ocp. You must use the ASE-native OCP calculator class along with ASE infrastructure to run NEB calculations. Here is an example:

```python
import os

from ase.io import read
from ase.mep import DyNEB
from ase.optimize import BFGS
from fairchem.core import FAIRChemCalculator, pretrained_mlip

# Start from the first (unrelaxed) frame set of the downloaded trajectory.
traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
images = traj[0:10]
predictor = pretrained_mlip.get_predict_unit("uma-s-1p1")

neb = DyNEB(images, k=1)
for image in images:
    image.calc = FAIRChemCalculator(predictor, task_name="oc20")

optimizer = BFGS(
    neb,
    trajectory="neb.traj",
)

# Use a small number of steps to keep the docs fast during CI; otherwise use more reasonable settings.
fast_docs = os.environ.get("FAST_DOCS", "false").lower() == "true"
if fast_docs:
    optimization_steps = 20
else:
    optimization_steps = 300

# First relax the band loosely, then turn on the climbing image and tighten fmax.
conv = optimizer.run(fmax=0.45, steps=optimization_steps)
if conv:
    neb.climb = True
    conv = optimizer.run(fmax=0.05, steps=optimization_steps)
```


```
      Step     Time          Energy          fmax
BFGS:    0 19:18:33     -305.763014        5.169706
BFGS:    1 19:18:33     -305.691696       11.366598
BFGS:    2 19:18:34     -305.916311        1.889963
BFGS:    3 19:18:35     -305.932501        2.616029
BFGS:    4 19:18:36     -306.010363        2.264344
BFGS:    5 19:18:37     -306.003679        6.892219
BFGS:    6 19:18:38     -306.254764        9.617144
BFGS:    7 19:18:39     -306.224749        3.371027
BFGS:    8 19:18:40     -306.290789        4.665820
BFGS:    9 19:18:41     -306.315119        0.727079
BFGS:   10 19:18:42     -306.329410        0.653816
BFGS:   11 19:18:43     -306.357725        1.619299
BFGS:   12 19:18:44     -306.412193        1.941472
BFGS:   13 19:18:45     -306.441261        0.604942
BFGS:   14 19:18:47     -306.471008        0.560226
BFGS:   15 19:18:48     -306.495159        2.152754
BFGS:   16 19:18:49     -306.497854        0.480727
BFGS:   17 19:18:50     -306.504537        0.518279
BFGS:   18 19:18:51     -306.511322        0.711967
BFGS:   19 19:18:52     -306.508516        0.831479
BFGS:   20 19:18:53     -306.478012        1.208364
BFGS:   21 19:18:54     -306.508860        0.552510
BFGS:   22 19:18:55     -306.509962        0.379461
BFGS:   23 19:18:56     -306.396103        3.033869
BFGS:   24 19:18:57     -306.426626        1.008637
BFGS:   25 19:18:57     -306.391153        0.993778
BFGS:   26 19:18:58     -306.185268        0.915833
BFGS:   27 19:19:00     -306.127266        0.639657
BFGS:   28 19:19:01     -306.157968        0.670597
BFGS:   29 19:19:02     -306.240148        0.423650
BFGS:   30 19:19:03     -306.257910        0.528082
BFGS:   31 19:19:04     -306.257275        0.610258
BFGS:   32 19:19:05     -306.249781        0.648465
BFGS:   33 19:19:06     -306.257126        0.535923
BFGS:   34 19:19:07     -306.273638        0.435072
BFGS:   35 19:19:07     -306.310968        0.512952
BFGS:   36 19:19:08     -306.361048        0.544072
BFGS:   37 19:19:09     -306.432917        0.516363
BFGS:   38 19:19:10     -306.504013        0.484754
BFGS:   39 19:19:11     -306.531124        0.794182
BFGS:   40 19:19:12     -306.458769        1.429548
BFGS:   41 19:19:12     -306.300875        1.014482
BFGS:   42 19:19:13     -306.236753        0.794684
BFGS:   43 19:19:14     -306.260868        0.389892
BFGS:   44 19:19:15     -306.288946        0.345340
BFGS:   45 19:19:16     -306.316992        0.403709
BFGS:   46 19:19:17     -306.326790        0.517171
BFGS:   47 19:19:17     -306.307279        0.547021
BFGS:   48 19:19:18     -306.282932        0.419780
BFGS:   49 19:19:19     -306.273542        0.487014
BFGS:   50 19:19:20     -306.273205        0.294770
BFGS:   51 19:19:21     -306.276281        0.363938
BFGS:   52 19:19:22     -306.292076        0.294762
BFGS:   53 19:19:23     -306.317211        0.356967
BFGS:   54 19:19:24     -306.316715        0.358370
BFGS:   55 19:19:24     -306.309208        0.314177
BFGS:   56 19:19:25     -306.313618        0.270289
BFGS:   57 19:19:26     -306.321354        0.288102
BFGS:   58 19:19:27     -306.324963        0.254839
BFGS:   59 19:19:28     -306.331278        0.285684
BFGS:   60 19:19:29     -306.335564        0.245711
BFGS:   61 19:19:30     -306.332638        0.275227
BFGS:   62 19:19:31     -306.335577        0.215878
BFGS:   63 19:19:32     -306.344648        0.202006
BFGS:   64 19:19:33     -306.352693        0.214971
BFGS:   65 19:19:34     -306.349775        0.266403
BFGS:   66 19:19:35     -306.346077        0.269452
BFGS:   67 19:19:36     -306.365057        0.227446
BFGS:   68 19:19:36     -306.369831        0.383276
BFGS:   69 19:19:37     -306.326641        0.396749
BFGS:   70 19:19:38     -306.342115        0.213590
BFGS:   71 19:19:39     -306.350174        0.237270
BFGS:   72 19:19:40     -306.352743        0.122916
BFGS:   73 19:19:41     -306.351669        0.085205
BFGS:   74 19:19:42     -306.351932        0.073609
BFGS:   75 19:19:43     -306.355541        0.180107
BFGS:   76 19:19:43     -306.353340        0.217224
BFGS:   77 19:19:44     -306.350799        0.182767
BFGS:   78 19:19:45     -306.350725        0.153011
BFGS:   79 19:19:46     -306.352526        0.165186
BFGS:   80 19:19:47     -306.347433        0.244343
BFGS:   81 19:19:48     -306.342426        0.175545
BFGS:   82 19:19:49     -306.335630        0.201559
BFGS:   83 19:19:50     -306.340116        0.225061
BFGS:   84 19:19:51     -306.350677        0.265647
BFGS:   85 19:19:52     -306.351290        0.232156
BFGS:   86 19:19:53     -306.343902        0.313707
BFGS:   87 19:19:54     -306.336744        0.298311
BFGS:   88 19:19:55     -306.297838        0.402232
BFGS:   89 19:19:56     -306.283724        0.672844
BFGS:   90 19:19:57     -306.310186        1.146012
BFGS:   91 19:19:58     -306.356737        0.349238
BFGS:   92 19:19:59     -306.354839        0.223238
BFGS:   93 19:20:00     -306.353899        0.145738
BFGS:   94 19:20:01     -306.340233        0.201467
BFGS:   95 19:20:02     -306.321707        0.383005
BFGS:   96 19:20:03     -306.335466        0.271715
BFGS:   97 19:20:04     -306.354938        0.279353
BFGS:   98 19:20:05     -306.356661        0.214428
BFGS:   99 19:20:06     -306.357244        0.202457
BFGS:  100 19:20:07     -306.344637        0.290392
BFGS:  101 19:20:08     -306.329956        0.422440
BFGS:  102 19:20:09     -306.332952        0.356690
BFGS:  103 19:20:10     -306.358930        0.645853
BFGS:  104 19:20:11     -306.364460        0.386617
BFGS:  105 19:20:12     -306.361204        0.179058
BFGS:  106 19:20:13     -306.358429        0.251881
BFGS:  107 19:20:14     -306.358553        0.321853
BFGS:  108 19:20:15     -306.359930        0.329671
BFGS:  109 19:20:16     -306.353908        0.244575
BFGS:  110 19:20:16     -306.354607        0.214749
BFGS:  111 19:20:17     -306.355041        0.174525
BFGS:  112 19:20:18     -306.353693        0.174669
BFGS:  113 19:20:19     -306.330416        0.251957
BFGS:  114 19:20:20     -306.336599        0.225983
BFGS:  115 19:20:21     -306.349693        0.160531
BFGS:  116 19:20:22     -306.348516        0.132100
BFGS:  117 19:20:23     -306.348644        0.105912
BFGS:  118 19:20:24     -306.354425        0.105872
BFGS:  119 19:20:24     -306.359302        0.170799
BFGS:  120 19:20:25     -306.361452        0.165484
BFGS:  121 19:20:26     -306.361647        0.198285
BFGS:  122 19:20:27     -306.359629        0.144948
BFGS:  123 19:20:27     -306.361088        0.105630
BFGS:  124 19:20:28     -306.361984        0.033944
```