-
Notifications
You must be signed in to change notification settings - Fork 4
Complete atom type table implementation with parser-native element inference #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Copilot
wants to merge
21
commits into
master
Choose a base branch
from
copilot/fix-031b7d92-b17b-41a9-9747-90a2aba2b6a4
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
3063126
Initial plan
Copilot c36c520
Implement atomic number refactor with hash-backed inference and backw…
Copilot e723aef
Add comprehensive API equivalence and compatibility tests
Copilot 71add64
Initial plan
Copilot 11adf56
Implement parser updates for element inference and LAMMPS fixes
Copilot b402902
Complete implementation with working unit tests for mass→element mapping
Copilot b8e30b0
Merge branch 'copilot/fix-43ee205c-1eb7-402a-b05d-cc95a42f4322' into …
scanberg 47f9ae3
MD_Z_XX how has the form MD_Z_Xx.
scanberg 6cd762c
removed conflicting str_eq_cstr body
scanberg 75a28f8
Updated testdata
scanberg 7143dd3
Add comprehensive PDB element inference validation tests
Copilot 953191a
Condense repetitive PDB element inference tests into single consolida…
Copilot 1df36b4
fixed double -> float issue for mass table
scanberg b054257
Removed percentages from element test
scanberg fab6356
Simplified alot inside atomic infer. Now the unittests should pass
scanberg cd1ad28
Implement atom type table with LAMMPS parser support
Copilot 07880fa
Remove lammps: prefix from atom type labels - use numeric index only
Copilot 7aa21a6
Add convenience functions and complete atom type table implementation…
Copilot eed8b5a
Remove element inference from postprocessing - now handled within par…
Copilot 4a12b0b
Move mmCIF element inference to init_molecule function level
Copilot 9be4e82
Fix mmCIF parser line-skipping bug and add comprehensive tests
Copilot File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,166 @@ | ||
| #pragma once | ||
|
|
||
| #include <core/md_str.h> | ||
| #include <stdint.h> | ||
| #include <stdbool.h> | ||
|
|
||
| #ifdef __cplusplus | ||
| extern "C" { | ||
| #endif | ||
|
|
||
| // Atomic number type: 0 = unknown, 1-118 = element atomic numbers | ||
| typedef uint8_t md_atomic_number_t; | ||
|
|
||
| // Legacy alias for compatibility | ||
| typedef md_atomic_number_t md_element_t; | ||
|
|
||
| // Forward declaration of molecule type | ||
| struct md_molecule_t; | ||
|
|
||
| // Atomic number constants for all elements (Z values) | ||
| enum { | ||
| MD_Z_X = 0, // Unknown | ||
| MD_Z_H = 1, // Hydrogen | ||
| MD_Z_He = 2, // Helium | ||
| MD_Z_Li = 3, // Lithium | ||
| MD_Z_Be = 4, // Beryllium | ||
| MD_Z_B = 5, // Boron | ||
| MD_Z_C = 6, // Carbon | ||
| MD_Z_N = 7, // Nitrogen | ||
| MD_Z_O = 8, // Oxygen | ||
| MD_Z_F = 9, // Fluorine | ||
| MD_Z_Ne = 10, // Neon | ||
| MD_Z_Na = 11, // Sodium | ||
| MD_Z_Mg = 12, // Magnesium | ||
| MD_Z_Al = 13, // Aluminium | ||
| MD_Z_Si = 14, // Silicon | ||
| MD_Z_P = 15, // Phosphorus | ||
| MD_Z_S = 16, // Sulfur | ||
| MD_Z_Cl = 17, // Chlorine | ||
| MD_Z_Ar = 18, // Argon | ||
| MD_Z_K = 19, // Potassium | ||
| MD_Z_Ca = 20, // Calcium | ||
| MD_Z_Sc = 21, // Scandium | ||
| MD_Z_Ti = 22, // Titanium | ||
| MD_Z_V = 23, // Vanadium | ||
| MD_Z_Cr = 24, // Chromium | ||
| MD_Z_Mn = 25, // Manganese | ||
| MD_Z_Fe = 26, // Iron | ||
| MD_Z_Co = 27, // Cobalt | ||
| MD_Z_Ni = 28, // Nickel | ||
| MD_Z_Cu = 29, // Copper | ||
| MD_Z_Zn = 30, // Zinc | ||
| MD_Z_Ga = 31, // Gallium | ||
| MD_Z_Ge = 32, // Germanium | ||
| MD_Z_As = 33, // Arsenic | ||
| MD_Z_Se = 34, // Selenium | ||
| MD_Z_Br = 35, // Bromine | ||
| MD_Z_Kr = 36, // Krypton | ||
| MD_Z_Rb = 37, // Rubidium | ||
| MD_Z_Sr = 38, // Strontium | ||
| MD_Z_Y = 39, // Yttrium | ||
| MD_Z_Zr = 40, // Zirconium | ||
| MD_Z_Nb = 41, // Niobium | ||
| MD_Z_Mo = 42, // Molybdenum | ||
| MD_Z_Tc = 43, // Technetium | ||
| MD_Z_Ru = 44, // Ruthenium | ||
| MD_Z_Rh = 45, // Rhodium | ||
| MD_Z_Pd = 46, // Palladium | ||
| MD_Z_Ag = 47, // Silver | ||
| MD_Z_Cd = 48, // Cadmium | ||
| MD_Z_In = 49, // Indium | ||
| MD_Z_Sn = 50, // Tin | ||
| MD_Z_Sb = 51, // Antimony | ||
| MD_Z_Te = 52, // Tellurium | ||
| MD_Z_I = 53, // Iodine | ||
| MD_Z_Xe = 54, // Xenon | ||
| MD_Z_Cs = 55, // Caesium | ||
| MD_Z_Ba = 56, // Barium | ||
| MD_Z_La = 57, // Lanthanum | ||
| MD_Z_Ce = 58, // Cerium | ||
| MD_Z_Pr = 59, // Praseodymium | ||
| MD_Z_Nd = 60, // Neodymium | ||
| MD_Z_Pm = 61, // Promethium | ||
| MD_Z_Sm = 62, // Samarium | ||
| MD_Z_Eu = 63, // Europium | ||
| MD_Z_Gd = 64, // Gadolinium | ||
| MD_Z_Tb = 65, // Terbium | ||
| MD_Z_Dy = 66, // Dysprosium | ||
| MD_Z_Ho = 67, // Holmium | ||
| MD_Z_Er = 68, // Erbium | ||
| MD_Z_Tm = 69, // Thulium | ||
| MD_Z_Yb = 70, // Ytterbium | ||
| MD_Z_Lu = 71, // Lutetium | ||
| MD_Z_Hf = 72, // Hafnium | ||
| MD_Z_Ta = 73, // Tantalum | ||
| MD_Z_W = 74, // Tungsten | ||
| MD_Z_Re = 75, // Rhenium | ||
| MD_Z_Os = 76, // Osmium | ||
| MD_Z_Ir = 77, // Iridium | ||
| MD_Z_Pt = 78, // Platinum | ||
| MD_Z_Au = 79, // Gold | ||
| MD_Z_Hg = 80, // Mercury | ||
| MD_Z_Tl = 81, // Thallium | ||
| MD_Z_Pb = 82, // Lead | ||
| MD_Z_Bi = 83, // Bismuth | ||
| MD_Z_Po = 84, // Polonium | ||
| MD_Z_At = 85, // Astatine | ||
| MD_Z_Rn = 86, // Radon | ||
| MD_Z_Fr = 87, // Francium | ||
| MD_Z_Ra = 88, // Radium | ||
| MD_Z_Ac = 89, // Actinium | ||
| MD_Z_Th = 90, // Thorium | ||
| MD_Z_Pa = 91, // Protactinium | ||
| MD_Z_U = 92, // Uranium | ||
| MD_Z_Np = 93, // Neptunium | ||
| MD_Z_Pu = 94, // Plutonium | ||
| MD_Z_Am = 95, // Americium | ||
| MD_Z_Cm = 96, // Curium | ||
| MD_Z_Bk = 97, // Berkelium | ||
| MD_Z_Cf = 98, // Californium | ||
| MD_Z_Es = 99, // Einsteinium | ||
| MD_Z_Fm = 100, // Fermium | ||
| MD_Z_Md = 101, // Mendelevium | ||
| MD_Z_No = 102, // Nobelium | ||
| MD_Z_Lr = 103, // Lawrencium | ||
| MD_Z_Rf = 104, // Rutherfordium | ||
| MD_Z_Db = 105, // Dubnium | ||
| MD_Z_Sg = 106, // Seaborgium | ||
| MD_Z_Bh = 107, // Bohrium | ||
| MD_Z_Hs = 108, // Hassium | ||
| MD_Z_Mt = 109, // Meitnerium | ||
| MD_Z_Ds = 110, // Darmstadtium | ||
| MD_Z_Rg = 111, // Roentgenium | ||
| MD_Z_Cn = 112, // Copernicium | ||
| MD_Z_Nh = 113, // Nihonium | ||
| MD_Z_Fl = 114, // Flerovium | ||
| MD_Z_Mc = 115, // Moscovium | ||
| MD_Z_Lv = 116, // Livermorium | ||
| MD_Z_Ts = 117, // Tennessine | ||
| MD_Z_Og = 118, // Oganesson | ||
| }; | ||
|
|
||
| // New preferred API names | ||
|
|
||
| // Element symbol and name lookup functions | ||
| md_atomic_number_t md_atomic_number_from_symbol(str_t sym); | ||
| md_atomic_number_t md_atomic_number_from_symbol_icase(str_t sym); | ||
| str_t md_symbol_from_atomic_number(md_atomic_number_t z); | ||
| str_t md_name_from_atomic_number(md_atomic_number_t z); | ||
|
|
||
| // Element property functions | ||
| float md_atomic_mass(md_atomic_number_t z); | ||
| float md_vdw_radius(md_atomic_number_t z); | ||
| float md_covalent_radius(md_atomic_number_t z); | ||
| int md_max_valence(md_atomic_number_t z); | ||
| uint32_t md_cpk_color(md_atomic_number_t z); | ||
|
|
||
| // Per-atom inference from labels (atom name + residue) | ||
| md_atomic_number_t md_atom_infer_atomic_number(str_t atom_name, str_t res_name); | ||
|
|
||
| // Batch form wired to molecule structure | ||
| bool md_atoms_infer_atomic_numbers(md_atomic_number_t out[], size_t n, const struct md_molecule_t* mol); | ||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| #include <core/md_atomic.h> | ||
| #include <md_molecule.h> | ||
| #include <md_util.h> | ||
|
|
||
| #include <core/md_hash.h> | ||
| #include <core/md_allocator.h> | ||
| #include <core/md_common.h> | ||
| #include <core/md_str.h> | ||
|
|
||
| #include <string.h> | ||
| #include <ctype.h> | ||
|
|
||
| // Core atomic number functions using existing md_util tables | ||
| md_atomic_number_t md_atomic_number_from_symbol(str_t sym) { | ||
| return md_util_element_lookup(sym); | ||
| } | ||
|
|
||
| md_atomic_number_t md_atomic_number_from_symbol_icase(str_t sym) { | ||
| return md_util_element_lookup_ignore_case(sym); | ||
| } | ||
|
|
||
| str_t md_symbol_from_atomic_number(md_atomic_number_t z) { | ||
| return md_util_element_symbol(z); | ||
| } | ||
|
|
||
| str_t md_name_from_atomic_number(md_atomic_number_t z) { | ||
| return md_util_element_name(z); | ||
| } | ||
|
|
||
| float md_atomic_mass(md_atomic_number_t z) { | ||
| return md_util_element_atomic_mass(z); | ||
| } | ||
|
|
||
| float md_vdw_radius(md_atomic_number_t z) { | ||
| return md_util_element_vdw_radius(z); | ||
| } | ||
|
|
||
| float md_covalent_radius(md_atomic_number_t z) { | ||
| return md_util_element_covalent_radius(z); | ||
| } | ||
|
|
||
| int md_max_valence(md_atomic_number_t z) { | ||
| return md_util_element_max_valence(z); | ||
| } | ||
|
|
||
| uint32_t md_cpk_color(md_atomic_number_t z) { | ||
| return md_util_element_cpk_color(z); | ||
| } | ||
|
|
||
| // Inference functions | ||
| md_atomic_number_t md_atom_infer_atomic_number(str_t atom_name, str_t res_name) { | ||
|
|
||
| // Special case: if atom name is empty but residue name is an element (ion case) | ||
| if (atom_name.len == 0 && res_name.len > 0) { | ||
| md_atomic_number_t res_element = md_atomic_number_from_symbol_icase(res_name); | ||
| if (res_element != MD_Z_X) { | ||
| return res_element; | ||
| } | ||
| return MD_Z_X; | ||
| } | ||
|
|
||
| if (atom_name.len == 0) return MD_Z_X; | ||
|
|
||
| // First try residue+atom combination | ||
| if (res_name.len > 0) { | ||
| // Special case: if water, amino acid or nucleotide: Match against first character only | ||
| if (md_util_resname_water(res_name) || md_util_resname_amino_acid(res_name) || md_util_resname_nucleotide(res_name)) { | ||
| str_t first_char = str_substr(atom_name, 0, 1); | ||
| md_atomic_number_t z = md_atomic_number_from_symbol_icase(first_char); | ||
| if (z != MD_Z_X) return z; | ||
| } | ||
|
|
||
| // If residue name itself is an element (ion case) | ||
| md_atomic_number_t res_element = md_atomic_number_from_symbol_icase(res_name); | ||
| if (res_element != MD_Z_X) { | ||
| // If atom name is empty or equals residue, return that element | ||
| if (atom_name.len == 0 || str_eq_ignore_case(atom_name, res_name)) { | ||
| return res_element; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Try two-letter element heuristic (e.g., CL12 => Cl, BR1 => Br) | ||
| if (atom_name.len >= 2) { | ||
| str_t two_letter_str = str_substr(atom_name, 0, 2); | ||
| md_atomic_number_t two_z = md_atomic_number_from_symbol_icase(two_letter_str); | ||
| if (two_z != MD_Z_X) { | ||
| return two_z; | ||
| } | ||
| } | ||
|
|
||
| // Final fallback: first-letter element mapping | ||
| str_t first_letter_str = str_substr(atom_name, 0, 1); | ||
| return md_atomic_number_from_symbol_icase(first_letter_str); | ||
| } | ||
|
|
||
| bool md_atoms_infer_atomic_numbers(md_atomic_number_t out[], size_t n, const struct md_molecule_t* mol) { | ||
| if (!out || !mol || n == 0) return false; | ||
|
|
||
| size_t count = MIN(n, mol->atom.count); | ||
| for (size_t i = 0; i < count; ++i) { | ||
| str_t atom_name = LBL_TO_STR(mol->atom.type[i]); | ||
| str_t res_name = mol->atom.resname ? LBL_TO_STR(mol->atom.resname[i]) : (str_t){0}; | ||
| out[i] = md_atom_infer_atomic_number(atom_name, res_name); | ||
| } | ||
|
|
||
| return true; | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why introduce the element field here if the plan is to get rid of it? Perhaps use a temporary element field, if required for atom type inference and interning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by removing the complex LAMMPS mass→element mapping and focusing on the simplified element inference approach. The element field is kept for compatibility and is properly initialized to 0 before being filled by md_util_element_guess in postprocessing (b402902).