fetch_process_wbm_dataset.py: AssertionError: mat_id='wbm-1-9': e_form=-0.31117 != e_form_ppd - correction=-0.32358 #23

Closed
pbenner opened this issue May 1, 2023 · 2 comments · Fixed by #26
Labels
bug (Something isn't working), data (Data loading and processing)

Comments

@pbenner
Collaborator

pbenner commented May 1, 2023

Executing fetch_process_wbm_dataset.py results in the following error:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 577, in <module>
    abs(e_form - (e_form_ppd - correction)) < 1e-4
AssertionError: mat_id='wbm-1-9': e_form=-0.31117 != e_form_ppd - correction=-0.32358
@janosh
Owner

janosh commented May 2, 2023

That consistency check is broken because the elemental reference energies used by the PatchedPhaseDiagram and the get_e_form_per_atom() function have diverged slightly. I requeried all MP ComputedStructureEntries a few weeks before publishing this repo, whereas get_e_form_per_atom() still uses MP refs from when I originally queried them in Sep 2022.

It passes if you loosen the tolerance a lot:

- abs(e_form - e_form_ppd) < 1e-4
+ abs(e_form - e_form_ppd) < 0.1

but a real fix requires updating the elemental references used by get_e_form_per_atom(). Thanks for flagging this.
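To illustrate why a tolerance of 1e-4 cannot survive a change in elemental references, here is a minimal sketch. It is not the actual `get_e_form_per_atom()` implementation from this repo: the helper names `e_form_per_atom` and `is_consistent`, the KBr composition, and the total energy are all assumptions made for illustration. Only the K and Br reference energies are taken from the old and new dicts posted in this thread.

```python
# Reference energies (eV/atom) for K and Br, copied from the old/new dicts in this thread.
OLD_REFS = {"K": -1.1104, "Br": -1.6369}
NEW_REFS = {"K": -1.1104, "Br": -1.553}


def e_form_per_atom(total_energy, composition, refs):
    """Hypothetical helper: (E_total - sum_i n_i * ref_i) / N_atoms."""
    n_atoms = sum(composition.values())
    ref_energy = sum(n * refs[el] for el, n in composition.items())
    return (total_energy - ref_energy) / n_atoms


def is_consistent(e_form, e_form_ppd, correction=0.0, tol=1e-4):
    """Same shape as the assertion in the traceback, with an adjustable tolerance."""
    return abs(e_form - (e_form_ppd - correction)) < tol


# Illustrative inputs only: a made-up total energy for a 1:1 K-Br cell.
composition = {"K": 1, "Br": 1}
total_energy = -6.0

e_old = e_form_per_atom(total_energy, composition, OLD_REFS)
e_new = e_form_per_atom(total_energy, composition, NEW_REFS)
shift = e_old - e_new  # depends only on the Br reference change, not on total_energy

print(f"shift from Br reference change: {shift * 1000:.2f} meV/atom")
print("passes at 1e-4:", is_consistent(e_old, e_new))
print("passes at 0.1: ", is_consistent(e_old, e_new, tol=0.1))
```

The shift is independent of the made-up total energy, since it is just the reference change weighted by the Br fraction of the cell. That is why loosening the tolerance only hides the mismatch, while using the same references on both sides removes it.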

janosh added the bug (Something isn't working) and data (Data loading and processing) labels May 2, 2023
@janosh
Owner

janosh commented May 2, 2023

Here's old vs new:

Old
mp_elemental_ref_energies = {
    "Ne": -0.0259, "He": -0.0091, "Ar": -0.0688, "F": -1.9115, "O": -4.948, "Cl": -1.8485, "N": -8.3365, "Kr": -0.0567, "Br": -1.6369, "I": -1.524, "Xe": -0.0362, "S": -4.1364, "Se": -3.4959, "C": -9.2268, "Au": -3.2739, "W": -12.9581, "Pb": -3.7126, "Rh": -7.3643, "Pt": -6.0709, "Ru": -9.2744, "Pd": -5.1799, "Os": -11.2274, "Ir": -8.8384, "H": -3.3927, "P": -5.4133, "As": -4.6591, "Mo": -10.8456, "Te": -3.1433, "Sb": -4.129, "B": -6.6794, "Bi": -3.89, "Ge": -4.623, "Hg": -0.3037, "Sn": -4.0096, "Ag": -2.8326, "Ni": -5.7801, "Tc": -10.3606, "Si": -5.4253, "Re": -12.4445, "Cu": -4.0992, "Co": -7.1083, "Fe": -8.47, "Ga": -3.0281, "In": -2.7517, "Cd": -0.9229, "Cr": -9.653, "Zn": -1.2597, "V": -9.0839, "Tl": -2.3626, "Al": -3.7456, "Nb": -10.1013, "Be": -3.7394, "Mn": -9.162, "Ti": -7.8955, "Ta": -11.8578, "Pa": -9.5147, "U": -11.2914, "Sc": -6.3325, "Np": -12.9478, "Zr": -8.5477, "Mg": -1.6003, "Th": -7.4139, "Hf": -9.9572, "Pu": -14.2678, "Lu": -4.521, "Tm": -4.4758, "Er": -4.5677, "Ho": -4.5824, "Y": -6.4665, "Dy": -4.6068, "Gd": -14.0761, "Eu": -10.292, "Sm": -4.7186, "Nd": -4.7681, "Pr": -4.7809, "Pm": -4.7505, "Ce": -5.9331, "Yb": -1.5396, "Tb": -4.6344, "La": -4.936, "Ac": -4.1212, "Ca": -2.0056, "Li": -1.9089, "Sr": -1.6895, "Na": -1.3225, "Ba": -1.919, "Rb": -0.9805, "K": -1.1104, "Cs": -0.8954,  # noqa: E501
}
New
mp_elemental_ref_energies = {
    "Ne": -0.0259, "He": -0.0091, "Ar": -0.0688, "F": -1.9115, "O": -4.9467, "Cl": -1.8485, "N": -8.3365, "Kr": -0.0567, "Br": -1.553, "I": -1.4734, "Xe": -0.0362, "S": -4.1364, "Se": -3.4959, "C": -9.2287, "Au": -3.2739, "W": -12.9581, "Pb": -3.7126, "Rh": -7.3643, "Pt": -6.0711, "Ru": -9.2744, "Pd": -5.1799, "Os": -11.2274, "Ir": -8.8384, "H": -3.3927, "P": -5.4133, "As": -4.6591, "Mo": -10.8457, "Te": -3.1433, "Sb": -4.129, "B": -6.6794, "Bi": -3.8405, "Ge": -4.623, "Hg": -0.3037, "Sn": -4.0096, "Ag": -2.8326, "Ni": -5.7801, "Tc": -10.3606, "Si": -5.4253, "Re": -12.4445, "Cu": -4.0992, "Co": -7.1083, "Fe": -8.47, "Ga": -3.0281, "In": -2.7517, "Cd": -0.9229, "Cr": -9.653, "Zn": -1.2597, "V": -9.0839, "Tl": -2.3626, "Al": -3.7456, "Nb": -10.1013, "Be": -3.7394, "Mn": -9.162, "Ti": -7.8955, "Ta": -11.8578, "Pa": -9.5147, "U": -11.2914, "Sc": -6.3325, "Np": -12.9478, "Zr": -8.5477, "Mg": -1.6003, "Th": -7.4139, "Hf": -9.9572, "Pu": -14.2678, "Lu": -4.521, "Tm": -4.4758, "Er": -4.5677, "Ho": -4.5824, "Y": -6.4665, "Dy": -4.6068, "Gd": -14.0761, "Eu": -10.257, "Sm": -4.7186, "Nd": -4.7681, "Pr": -4.7809, "Pm": -4.7505, "Ce": -5.9331, "Yb": -1.5396, "Tb": -4.6344, "La": -4.936, "Ac": -4.1212, "Ca": -2.0056, "Li": -1.9089, "Sr": -1.6895, "Na": -1.3225, "Ba": -1.919, "Rb": -0.9805, "K": -1.1104, "Cs": -0.8954}

[Image: Screenshot 2023-05-01 at 18 23 09]

Hmmm... didn't expect the difference to be so large. For I and Br in particular it's about 50 and 80 meV/atom, respectively. Using the new refs, the consistency check passes again.
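The per-element shifts can be reproduced by diffing the two dicts. The sketch below uses only an excerpt of the values posted above (the elements whose references moved by at least 1 meV, plus Fe as an unchanged control). The function name `ref_shifts_mev` and the 1 meV threshold are assumptions chosen for illustration.

```python
# Excerpt of the old and new MP elemental reference energies (eV/atom) from this thread.
old_refs = {"O": -4.948, "Br": -1.6369, "I": -1.524, "C": -9.2268,
            "Bi": -3.89, "Eu": -10.292, "Fe": -8.47}
new_refs = {"O": -4.9467, "Br": -1.553, "I": -1.4734, "C": -9.2287,
            "Bi": -3.8405, "Eu": -10.257, "Fe": -8.47}


def ref_shifts_mev(old, new, min_abs_mev=1.0):
    """Return {element: new - old in meV/atom}, largest absolute shift first.

    Elements missing from either dict, or shifted by less than
    min_abs_mev, are left out.
    """
    shifts = {el: (new[el] - old[el]) * 1000 for el in old.keys() & new.keys()}
    significant = {el: d for el, d in shifts.items() if abs(d) >= min_abs_mev}
    return dict(sorted(significant.items(), key=lambda kv: abs(kv[1]), reverse=True))


shifts = ref_shifts_mev(old_refs, new_refs)
for el, delta in shifts.items():
    print(f"{el}: {delta:+.1f} meV/atom")
```

Running the same function on the full dicts would list every element whose reference changed between the two queries.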

The difference in WBM formation energies before and after is ~2 meV/atom, small enough to not affect the analysis.

(e_form_per_atom_uncorrected_old - e_form_per_atom_uncorrected_new).abs().mean() = 0.001965

[Image: output]

janosh added a commit that referenced this issue May 2, 2023
…(2023-02-07) MP elemental reference energies (closes #23)
janosh added a commit that referenced this issue May 3, 2023
* fix fetch_process_wbm_dataset.py after pandas v2 breaking changes

* drop pytest-markdown-docs from optional deps

* fix double slash in PRED_FILES

* bump deps

* update docs, DataFiles and matbench_discovery/energy.py with updated (2023-02-07) MP elemental reference energies (closes #23)

* update 2022-10-19-wbm-summary.csv formation energies with 2023-02-07 element reference energies

compress data/mp/2023-02-07-mp-elemental-reference-entries.json.gz
update data/figshare/1.0.0.json file links

* pin pandas>=2.0.0

#22 (comment)
mark test_load_train_test_no_mock() for mp_computed_structure_entries as very_slow

* load_train_test() support loading and caching pickle files (for mp_patched_phase_diagram)

change signature from data_names (str | list[str], optional) = 'all' to data_key (str)

* rename load_train_test() to load()
janosh closed this as completed in #26 May 3, 2023
janosh added two commits that referenced this issue Jun 20, 2023, each repeating the commit message above.