Open Catalyst 2020 Nudged Elastic Band (OC20NEB)
======================================================

## Overview
This validation dataset was used to assess model performance in [CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks](https://arxiv.org/abs/2405.02078). It comprises 932 NEB relaxation trajectories covering three types of reactions: desorptions, dissociations, and transfers. NEB calculations are used to locate transition states. Because the transition state energy determines the rate of reaction, access to transition states is important for catalysis research. For more information, see the paper.

## File Structure and Contents
The tar file contains 3 subdirectories: dissociations, desorptions, and transfers. As the names imply, these directories contain the converged DFT trajectories for each of the reaction classes. Within these directories, the trajectories are named to identify the contents of the file. Here is an example and the anatomy of the name:

```desorption_id_83_2409_9_111-4_neb1.0.traj```

1. `desorption` indicates the reaction type (`dissociation` and `transfer` are the other possibilities).
2. `id` indicates that the material belongs to the in-domain validation split (`ood`, out of domain, is the other possibility).
3. `83` is the task id. It does not carry relevant information.
4. `2409` is the bulk index of the bulk used in the ocdata bulk pickle file.
5. `9` is the reaction index. Each reaction type has a reaction pickle file in the repository; in this case it is the 9th entry of that pickle file.
6. `111-4`: the first 3 numbers are the Miller indices (i.e. the (1,1,1) surface), and the last number corresponds to the shift value. In this case the 4th shift enumerated was the one used.
7. `neb1.0`: the number is the k value (the NEB spring constant) used. For the full dataset, 1.0 was used, so this does not distinguish any of the trajectories from one another.
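
The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset tooling: `NebTrajName` and `parse_neb_traj_name` are hypothetical names introduced here, and the Miller indices are kept as the raw string from the file name because only the single example above is documented.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NebTrajName:
    """Fields encoded in an OC20NEB trajectory file name (hypothetical helper)."""

    reaction_type: str  # "desorption", "dissociation", or "transfer"
    split: str  # "id" (in domain) or "ood" (out of domain)
    task_id: int
    bulk_index: int
    reaction_index: int
    miller: str  # Miller-index digits exactly as written, e.g. "111"
    shift_index: int
    k: float  # value following "neb" in the name


def parse_neb_traj_name(filename: str) -> NebTrajName:
    """Split a name like desorption_id_83_2409_9_111-4_neb1.0.traj into fields."""
    stem = Path(filename).name
    if stem.endswith(".traj"):
        stem = stem[: -len(".traj")]
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    reaction_type, split, task_id, bulk_index, reaction_index, surface, neb = parts
    if split not in ("id", "ood"):
        raise ValueError(f"unknown split: {split!r}")
    if not neb.startswith("neb"):
        raise ValueError(f"unexpected final field: {neb!r}")
    # The shift is whatever follows the last hyphen of the surface field.
    miller, shift = surface.rsplit("-", 1)
    return NebTrajName(
        reaction_type=reaction_type,
        split=split,
        task_id=int(task_id),
        bulk_index=int(bulk_index),
        reaction_index=int(reaction_index),
        miller=miller,
        shift_index=int(shift),
        k=float(neb[len("neb"):]),
    )
```

Calling `parse_neb_traj_name("desorption_id_83_2409_9_111-4_neb1.0.traj")` gives a record whose `reaction_type` is `"desorption"` and whose `bulk_index` is `2409`, which can be handy for filtering trajectories by split or reaction type.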


Each trajectory file contains repeating frame sets. Although the initial and final frames are not optimized during the NEB, they are saved for every iteration in the trajectory. For the dataset, 10 frames were used, 8 of which were optimized over the NEB, so the length of the trajectory is the number of iterations (N) * 10. To look at the frame set prior to optimization and the optimized frame set, you can get them like this:

In [1]:
from __future__ import annotations

!wget https://dl.fbaipublicfiles.com/opencatalystproject/data/large_files/desorption_id_83_2409_9_111-4_neb1.0.traj

from ase.io import read

traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
unrelaxed_frames = traj[0:10]
relaxed_frames = traj[-10:]

--2025-12-22 20:05:56--  https://dl.fbaipublicfiles.com/opencatalystproject/data/large_files/desorption_id_83_2409_9_111-4_neb1.0.traj
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 13.227.74.9, 13.227.74.45, 13.227.74.12, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|13.227.74.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10074935 (9.6M) [binary/octet-stream]
Saving to: ‘desorption_id_83_2409_9_111-4_neb1.0.traj’

2025-12-22 20:05:57 (52.7 MB/s) - ‘desorption_id_83_2409_9_111-4_neb1.0.traj’ saved [10074935/10074935]
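
The slicing used for the first and last frame sets generalizes to every iteration. Here is a minimal sketch, assuming the trajectory length is an exact multiple of the 10-frame set size described above; `split_frame_sets` is a hypothetical helper, not part of ASE or fairchem.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_frame_sets(frames: Sequence[T], frames_per_set: int = 10) -> List[Sequence[T]]:
    """Group a flat NEB trajectory into one frame set per iteration.

    Assumes every iteration stored exactly `frames_per_set` frames
    (10 for this dataset: 2 fixed end points plus 8 optimized images).
    """
    if frames_per_set <= 0:
        raise ValueError("frames_per_set must be positive")
    if len(frames) % frames_per_set != 0:
        raise ValueError(
            f"{len(frames)} frames is not a multiple of {frames_per_set}"
        )
    return [
        frames[start : start + frames_per_set]
        for start in range(0, len(frames), frames_per_set)
    ]
```

With the `traj` list read above, `split_frame_sets(traj)[0]` would correspond to `unrelaxed_frames` and `split_frame_sets(traj)[-1]` to `relaxed_frames`; the length of the returned list is the number of iterations N.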



## Download
|Splits |Size of compressed version  |Size of uncompressed version    | MD5 checksum (download link)   |
|---    |---    |---    |---    |
|ASE Trajectories   |1.5G  |6.3G   | [52af34a93758c82fae951e52af445089](https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20neb/oc20neb_dft_trajectories_04_23_24.tar.gz)   |
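
After downloading, the checksum in the table can be compared against the local file. This is a minimal sketch using only the Python standard library; it assumes the listed MD5 refers to the compressed `.tar.gz` archive, and `md5sum` and `matches_expected` are hypothetical helper names.

```python
import hashlib

# Checksum copied from the table above (assumed to describe the .tar.gz file).
EXPECTED_MD5 = "52af34a93758c82fae951e52af445089"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte archive is never
        # loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5sum(path) == expected.lower()


# Example (file name taken from the download link in the table):
# matches_expected("oc20neb_dft_trajectories_04_23_24.tar.gz")
```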



## Use
One more note: we have not prepared an LMDB for this dataset, because NEB calculations are not supported directly in ocp. Instead, use the ASE-compatible calculator (`FAIRChemCalculator` in the example below) together with ASE infrastructure to run NEB calculations. Here is an example:

In [2]:
import os

from ase.io import read
from ase.mep import DyNEB
from ase.optimize import BFGS
from fairchem.core import FAIRChemCalculator, pretrained_mlip

traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
images = traj[0:10]
predictor = pretrained_mlip.get_predict_unit("uma-s-1p1")

neb = DyNEB(images, k=1)
for image in images:
    image.calc = FAIRChemCalculator(predictor, task_name="oc20")

optimizer = BFGS(
    neb,
    trajectory="neb.traj",
)

# Use a small number of steps to keep the docs fast during CI; otherwise use reasonable settings.
fast_docs = os.environ.get("FAST_DOCS", "false").lower() == "true"
if fast_docs:
    optimization_steps = 20
else:
    optimization_steps = 300

conv = optimizer.run(fmax=0.45, steps=optimization_steps)
if conv:
    neb.climb = True
    conv = optimizer.run(fmax=0.05, steps=optimization_steps)

      Step     Time          Energy          fmax
BFGS:    0 20:06:12     -305.763004        5.169705
BFGS:    1 20:06:13     -305.691696       11.366600
BFGS:    2 20:06:14     -305.916311        1.889963
BFGS:    3 20:06:15     -305.932501        2.616029
BFGS:    4 20:06:16     -306.010359        2.264344
BFGS:    5 20:06:17     -306.003679        6.892217
BFGS:    6 20:06:18     -306.254766        9.617801
BFGS:    7 20:06:19     -306.224739        3.371394
BFGS:    8 20:06:20     -306.290787        4.665998
BFGS:    9 20:06:21     -306.315126        0.727079
BFGS:   10 20:06:22     -306.329407        0.653700
BFGS:   11 20:06:23     -306.357742        1.618776
BFGS:   12 20:06:24     -306.412218        1.942354
BFGS:   13 20:06:25     -306.441248        0.604967
BFGS:   14 20:06:26     -306.471022        0.562842
BFGS:   15 20:06:27     -306.495146        2.154392
BFGS:   16 20:06:28     -306.497801        0.480829
BFGS:   17 20:06:29     -306.504539        0.520175
BFGS:   18 20:06:30     -306.511392        0.713341
BFGS:   19 20:06:31     -306.508583        0.827811
BFGS:   20 20:06:32     -306.477787        1.214061
BFGS:   21 20:06:33     -306.508945        0.552592
BFGS:   22 20:06:34     -306.509811        0.379351
BFGS:   23 20:06:35     -306.397648        2.984241
BFGS:   24 20:06:36     -306.426241        1.009586
BFGS:   25 20:06:37     -306.390331        0.993695
BFGS:   26 20:06:38     -306.186106        0.937312
BFGS:   27 20:06:39     -306.127434        0.637037
BFGS:   28 20:06:40     -306.157610        0.669011
BFGS:   29 20:06:41     -306.239571        0.423966
BFGS:   30 20:06:42     -306.257448        0.526656
BFGS:   31 20:06:43     -306.257114        0.610405
BFGS:   32 20:06:44     -306.249788        0.649705
BFGS:   33 20:06:45     -306.257301        0.535643
BFGS:   34 20:06:46     -306.273727        0.436347
BFGS:   35 20:06:47     -306.310896        0.513390
BFGS:   36 20:06:48     -306.360681        0.544665
BFGS:   37 20:06:49     -306.432335        0.516452
BFGS:   38 20:06:50     -306.503183        0.487069
BFGS:   39 20:06:51     -306.530498        0.793713
BFGS:   40 20:06:52     -306.458816        1.427546
BFGS:   41 20:06:52     -306.300497        1.018225
BFGS:   42 20:06:53     -306.236236        0.798480
BFGS:   43 20:06:54     -306.259910        0.391314
BFGS:   44 20:06:55     -306.288181        0.345674
BFGS:   45 20:06:56     -306.316395        0.403155
BFGS:   46 20:06:57     -306.326502        0.518615
BFGS:   47 20:06:57     -306.307571        0.554030
BFGS:   48 20:06:58     -306.283377        0.415880
BFGS:   49 20:06:59     -306.274111        0.485851
BFGS:   50 20:07:00     -306.273825        0.296021
BFGS:   51 20:07:01     -306.276478        0.364951
BFGS:   52 20:07:02     -306.292000        0.297636
BFGS:   53 20:07:03     -306.317480        0.355650
BFGS:   54 20:07:03     -306.317308        0.360455
BFGS:   55 20:07:04     -306.310020        0.308979
BFGS:   56 20:07:05     -306.314382        0.269388
BFGS:   57 20:07:06     -306.322018        0.279861
BFGS:   58 20:07:07     -306.325346        0.250440
BFGS:   59 20:07:08     -306.331175        0.287499
BFGS:   60 20:07:09     -306.335640        0.242666
BFGS:   61 20:07:10     -306.332949        0.271635
BFGS:   62 20:07:11     -306.335930        0.218203
BFGS:   63 20:07:12     -306.344887        0.200085
BFGS:   64 20:07:13     -306.352892        0.203200
BFGS:   65 20:07:14     -306.350017        0.255766
BFGS:   66 20:07:15     -306.346264        0.265914
BFGS:   67 20:07:15     -306.364763        0.227376
BFGS:   68 20:07:16     -306.370287        0.380261
BFGS:   69 20:07:17     -306.328245        0.414234
BFGS:   70 20:07:18     -306.342272        0.205954
BFGS:   71 20:07:19     -306.350819        0.240049
BFGS:   72 20:07:20     -306.352631        0.133323
BFGS:   73 20:07:21     -306.351442        0.117057
BFGS:   74 20:07:22     -306.352034        0.081792
BFGS:   75 20:07:22     -306.356048        0.213848
BFGS:   76 20:07:23     -306.354097        0.211210
BFGS:   77 20:07:24     -306.342558        0.224463
BFGS:   78 20:07:25     -306.346516        0.178858
BFGS:   79 20:07:26     -306.352989        0.386668
BFGS:   80 20:07:27     -306.356878        0.158231
BFGS:   81 20:07:27     -306.357117        0.128227
BFGS:   82 20:07:28     -306.353805        0.680993
BFGS:   83 20:07:29     -306.330395        0.831106
BFGS:   84 20:07:30     -306.401512        0.967226
BFGS:   85 20:07:31     -306.307035        0.774645
BFGS:   86 20:07:32     -306.163717        1.064225
BFGS:   87 20:07:33     -306.024359        0.948057
BFGS:   88 20:07:34     -305.992239        1.746444
BFGS:   89 20:07:35     -306.108734        1.411024
BFGS:   90 20:07:36     -306.184717        3.184292
BFGS:   91 20:07:37     -306.308908        1.895553
BFGS:   92 20:07:38     -306.288362        0.599597
BFGS:   93 20:07:39     -306.229185        0.626044
BFGS:   94 20:07:40     -306.140514        0.786150
BFGS:   95 20:07:41     -306.201195        0.448835
BFGS:   96 20:07:42     -306.229416        0.355979
BFGS:   97 20:07:43     -306.270243        0.382163
BFGS:   98 20:07:45     -306.283590        0.644051
BFGS:   99 20:07:46     -306.288393        0.641060
BFGS:  100 20:07:47     -306.291469        0.556893
BFGS:  101 20:07:48     -306.331579        0.806527
BFGS:  102 20:07:49     -306.383568        1.070590
BFGS:  103 20:07:50     -306.346222        0.917891
BFGS:  104 20:07:51     -306.264294        0.901918
BFGS:  105 20:07:52     -306.292730        0.679461
BFGS:  106 20:07:53     -306.299456        0.714808
BFGS:  107 20:07:54     -306.329313        0.733975
BFGS:  108 20:07:55     -306.330986        0.493757
BFGS:  109 20:07:56     -306.304518        0.806374
BFGS:  110 20:07:57     -306.313715        0.795134
BFGS:  111 20:07:57     -306.348055        0.626900
BFGS:  112 20:07:58     -306.367197        0.403514
BFGS:  113 20:07:59     -306.356401        0.477608
BFGS:  114 20:08:00     -306.338316        0.507663
BFGS:  115 20:08:01     -306.340446        0.280678
BFGS:  116 20:08:02     -306.345343        0.305150
BFGS:  117 20:08:03     -306.348005        0.253813
BFGS:  118 20:08:04     -306.343456        0.351723
BFGS:  119 20:08:05     -306.347671        0.519462
BFGS:  120 20:08:06     -306.363651        0.442607
BFGS:  121 20:08:07     -306.343294        0.274364
BFGS:  122 20:08:08     -306.345516        0.235249
BFGS:  123 20:08:09     -306.352240        0.195999
BFGS:  124 20:08:10     -306.346464        0.252673
BFGS:  125 20:08:11     -306.326963        0.644122
BFGS:  126 20:08:12     -306.352779        0.651161
BFGS:  127 20:08:13     -306.369646        0.407165
BFGS:  128 20:08:14     -306.327963        0.547077
BFGS:  129 20:08:15     -306.338968        0.467320
BFGS:  130 20:08:16     -306.355263        0.257729
BFGS:  131 20:08:17     -306.356256        0.126559
BFGS:  132 20:08:18     -306.356726        0.215179
BFGS:  133 20:08:19     -306.354319        0.243859
BFGS:  134 20:08:20     -306.336994        0.446838
BFGS:  135 20:08:21     -306.330273        0.498759
BFGS:  136 20:08:22     -306.336840        0.798301
BFGS:  137 20:08:23     -306.409246        0.834353
BFGS:  138 20:08:24     -306.422262        1.234960
BFGS:  139 20:08:25     -306.481281        0.713450
BFGS:  140 20:08:26     -306.434509        0.710958
BFGS:  141 20:08:27     -306.349813        0.562991
BFGS:  142 20:08:28     -306.352751        0.307476
BFGS:  143 20:08:29     -306.358271        0.176271
BFGS:  144 20:08:30     -306.355936        0.975142
BFGS:  145 20:08:31     -306.361463        0.292632
BFGS:  146 20:08:32     -306.360279        0.239047
BFGS:  147 20:08:32     -306.361336        0.126070
BFGS:  148 20:08:33     -306.361336        0.192642
BFGS:  149 20:08:33     -306.361336        0.052779
BFGS:  150 20:08:34     -306.361336        0.091136
BFGS:  151 20:08:34     -306.361336        0.046877
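
Once a band has been optimized, the transition state estimate is commonly taken to be the highest-energy image. The following is a minimal sketch of that bookkeeping, assuming one energy per image is already available; `summarize_band` is a hypothetical helper and not part of ASE or fairchem.

```python
from typing import Sequence, Tuple


def summarize_band(energies: Sequence[float]) -> Tuple[int, float, float]:
    """Return (ts_index, forward_barrier, reaction_energy) for one band.

    `energies` holds one energy per image, ordered from the initial
    state to the final state. The highest-energy image is treated as
    the transition state estimate.
    """
    if len(energies) < 3:
        raise ValueError("a band needs two end points and at least one image")
    ts_index = max(range(len(energies)), key=lambda i: energies[i])
    forward_barrier = energies[ts_index] - energies[0]
    reaction_energy = energies[-1] - energies[0]
    return ts_index, forward_barrier, reaction_energy


# Hypothetical usage with the `images` list from the example above
# (assumes every image, including both end points, has a calculator
# attached and can therefore return an energy):
#   energies = [image.get_potential_energy() for image in images]
#   ts_index, barrier, reaction_energy = summarize_band(energies)
```

Note that the energy column in the optimizer log is a single number per step for the whole band, so per-image energies are needed to identify the transition state image.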
