Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate Species name error when parsing FactSage DAT into Database #425

Open
Timm638 opened this issue Aug 7, 2022 · 4 comments · May be fixed by #426
Open

Duplicate Species name error when parsing FactSage DAT into Database #425

Timm638 opened this issue Aug 7, 2022 · 4 comments · May be fixed by #426

Comments

@Timm638
Copy link

Timm638 commented Aug 7, 2022

When trying to parse in a DAT file which uses Cobalt 'Co' as system component and contains carbon monoxide 'CO' as gas phase, then a ValueError for using duplicate species name will be raised:

db = pycalphad.Database('Bugged_DAT.dat')

Traceback (most recent call last):
  File "C:\Users\timm\anaconda3\envs\factsage-data-extraction-3-8\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\Users\timm\anaconda3\envs\factsage-data-extraction-3-8\lib\site-packages\pycalphad\io\database.py", line 115, in __new__
    return cls.from_file(fname, fmt=fmt)
  File "C:\Users\timm\anaconda3\envs\factsage-data-extraction-3-8\lib\site-packages\pycalphad\io\database.py", line 220, in from_file
    format_registry[fmt.lower()].read(dbf, fd)
  File "C:\Users\timm\anaconda3\envs\factsage-data-extraction-3-8\lib\site-packages\pycalphad\io\cs_dat.py", line 1196, in read_cs_dat
    parsed_phase.insert(dbf, header.pure_elements, header.gibbs_coefficient_idxs, header.excess_coefficient_idxs)
  File "C:\Users\timm\anaconda3\envs\factsage-data-extraction-3-8\lib\site-packages\pycalphad\io\cs_dat.py", line 504, in insert
    raise ValueError(f"A Species named {sp.name} (defined for phase {self.phase_name}) already exists in the database's species ({dbf.species}), but the constituents do not match.")
ValueError: A Species named CO (defined for phase GAS_IDEAL) already exists in the database's species ({Species('CO', 'CO1.0'), Species('N', 'N1.0'), Species('H', 'H1.0'), Species('O', 'O1.0'), Species('NI', 'NI1.0'), Species('FE', 'FE1.0'), Species('CU', 'CU1.0'), Species('B', 'B1.0'), Species('SC', 'SC1.0'), Species('C', 'C1.0'), Species('MN', 'MN1.0'), Species('K', 'K1.0'), Species('CL', 'CL1.0'), Species('S', 'S1.0'), Species('P', 'P1.0'), Species('MG', 'MG1.0'), Species('CA', 'CA1.0'), Species('SI', 'SI1.0')}), but the constituents do not match.

This bug has been found while trying to parse exported DAT files from FactSage 6.2.
This has been tested under Python 3.9 and pycalphad 0.10.1.
I've attached a zip file containing a minimal DAT file, a script for reproducing the error and a list of packages versions, in case that's needed.

script.zip

@richardotis
Copy link
Collaborator

I can reproduce this, thanks for the excellent report. For TDB files, it's always been safe to assume Species(string) == Species(string.upper()), and so stringified species names are canonicalized to uppercase strings in several places in the code. (TDB would represent carbon monoxide as C1O1.) Right now we're actually converting the whole file to uppercase before parsing, which is clearly not safe to do for DAT files in general, and so we'll need to think about the right place(s) to relax this assumption so the Database can distinguish between, e.g., cobalt and carbon monoxide.

In light of this example, it is not obvious to me what Species('CO') should do, without bringing in more context. Should every Database define its own Species class? I think not, since Species intrinsically couples Model and Database together, and so I see it as more of a pycalphad-level helper object that isn't tied to particular database formats or model types.

My thinking is that we have to make the calling code responsible for managing case. I'm working through the implications of this for the parsers now. The DAT parser will need to work around the assumption that Species('CO') is cobalt, and keep track of case-insensitivity issues where it needs to.

@richardotis
Copy link
Collaborator

This may be a use case for semantic typing, i.e., say Species will only accept a string of type CaseInsensitiveString, and then require explicit conversions.

@bocklund
Copy link
Collaborator

bocklund commented Aug 7, 2022

This is related to #419, with the core issue being that species names are basically cosmetic in DAT files

richardotis added a commit to richardotis/pycalphad that referenced this issue Aug 7, 2022
@richardotis
Copy link
Collaborator

#426 should fix the present issue, though it leaves the core issue mentioned above unresolved. It should be enough to move forward with this database, at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants