Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

richardotis
Copy link
Collaborator

These are some minimal changes to allow for case sensitivity in DAT compound names (fixes #425). This doesn't address the fundamental issue that DAT and pycalphad think of constituents differently (#419) but it will hopefully enable a greater subset of DAT files to work without issue. In particular, we no longer convert the entire input file to uppercase before parsing. Instead, selective uppercasing is applied where it is "safer" to extract unambiguous names (though it is still not safe in general).

There is a test to make sure the database from the original issue parses, but this still needs a validation test to make sure the parsed database is correct.

@codecov
Copy link

codecov bot commented Aug 7, 2022

Codecov Report

Merging #426 (ab348c8) into develop (ebcfbdb) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #426      +/-   ##
===========================================
+ Coverage    90.22%   90.24%   +0.01%     
===========================================
  Files           50       50              
  Lines         7696     7708      +12     
===========================================
+ Hits          6944     6956      +12     
  Misses         752      752              
Impacted Files Coverage Δ
pycalphad/io/cs_dat.py 81.53% <100.00%> (+0.25%) ⬆️
pycalphad/tests/test_database.py 100.00% <100.00%> (ø)
pycalphad/core/phase_rec.pyx 94.32% <0.00%> (-0.52%) ⬇️
pycalphad/core/eqsolver.pyx 84.50% <0.00%> (-0.50%) ⬇️
pycalphad/core/minimizer.pyx 83.80% <0.00%> (+0.33%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@bocklund
Copy link
Collaborator

bocklund commented Aug 8, 2022

I'm hesitant about this because (if I understand #419) I believe that in FactSage giving both CO and Co as uppercase CO in the DAT file would work correctly.

As in #419:

However, you cannot count on this convention being followed in DAT files, and in fact it rarely is. Further, one really ought not to be using the constituent names for anything. They are there for readability only, and follow no rules at all. Renaming all five constituents to Bob should result in no changes to the calculation.

It happens to be that the casing can disambiguate them here, but in general I don't think we can expect that to be true. It's not yet clear to me how reconcile the fact that the names are meaningless, but they are quite meaningful in pycalphad because we make an assumption that names are distinguishing in the Model and variables.SiteFraction symbolics.

The error you get with CO and Co not allowing duplicate species prevents the SiteFraction objects from becoming equivalent SymEngine symbols. In fact, I think it might be that this change could cause that because the StateVariable.__init__ casts the name to uppercase and SiteFraction.__init__ gives the variable name (including the species) to StateVariable via super.

@richardotis
Copy link
Collaborator Author

Where is the stoichiometry information coming from for each phase at Model construction time?

@bocklund
Copy link
Collaborator

bocklund commented Aug 8, 2022

I'm not sure I know what you're referring to about stoichiometry. The constituent species are known via the Database().phases Phase objects. And the mass of pure elements are coming from the species as well. My concern is that you'll have a Y(PHASE,0,"CO") (from "CO") and Y(PHASE,O,"CO") (from "Co"), which mess up the free symbols in expressions where the site fractions are used (both Model().moles and Model().GM

@richardotis
Copy link
Collaborator Author

I think I mean to ask about the chemistry of the phases (what chemical species are in solution), not the stoichiometry exactly. Can we construct meaningful constituent names from the cation/anion information we have for each phase? It seems like this information has to exist somewhere, or else it wouldn't be possible to construct a mass balance.

@bocklund
Copy link
Collaborator

bocklund commented Aug 8, 2022

I think I understand your question now. According to #419, the DAT lets us get the pure elements constituents of any species. We could construct a name from that, rather than using the name/label in the DAT. I don’t know if the species by stoichiometry are guaranteed to be unique, but I don’t know why they wouldn’t be for in any phase in practice.

For example, in this test database the numbers after 4 3 are the pure element constituents of this species. There are 18 pure elements in the database.

gas_ideal
  IDMX
  CO
    4  3    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
     0.0    0.0    0.0    1.0    0.0    1.0    0.0    0.0

@richardotis
Copy link
Collaborator Author

I think I'm understanding this issue better now. Can we discard the label at Database construction time, and construct a canonical Species name based on the pure element constituents (and charge information)? When we write DAT out, we can simply write the canonical name with no loss of physical information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate Species name error when parsing FactSage DAT into Database
2 participants