FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

richardotis · 2022-08-07T18:30:35Z

These are some minimal changes to allow for case sensitivity in DAT compound names (fixes #425). This doesn't address the fundamental issue that DAT and pycalphad think of constituents differently (#419) but it will hopefully enable a greater subset of DAT files to work without issue. In particular, we no longer convert the entire input file to uppercase before parsing. Instead, selective uppercasing is applied where it is "safer" to extract unambiguous names (though it is still not safe in general).

There is a test to make sure the database from the original issue parses, but this still needs a validation test to make sure the parsed database is correct.

codecov · 2022-08-07T22:36:40Z

Codecov Report

Merging #426 (ab348c8) into develop (ebcfbdb) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #426      +/-   ##
===========================================
+ Coverage    90.22%   90.24%   +0.01%     
===========================================
  Files           50       50              
  Lines         7696     7708      +12     
===========================================
+ Hits          6944     6956      +12     
  Misses         752      752

Impacted Files	Coverage Δ
pycalphad/io/cs_dat.py	`81.53% <100.00%> (+0.25%)`	⬆️
pycalphad/tests/test_database.py	`100.00% <100.00%> (ø)`
pycalphad/core/phase_rec.pyx	`94.32% <0.00%> (-0.52%)`	⬇️
pycalphad/core/eqsolver.pyx	`84.50% <0.00%> (-0.50%)`	⬇️
pycalphad/core/minimizer.pyx	`83.80% <0.00%> (+0.33%)`	⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

bocklund · 2022-08-08T16:36:56Z

I'm hesitant about this because (if I understand #419) I believe that in FactSage giving both CO and Co as uppercase CO in the DAT file would work correctly.

As in #419:

However, you cannot count on this convention being followed in DAT files, and in fact it rarely is. Further, one really ought not to be using the constituent names for anything. They are there for readability only, and follow no rules at all. Renaming all five constituents to Bob should result in no changes to the calculation.

It happens to be that the casing can disambiguate them here, but in general I don't think we can expect that to be true. It's not yet clear to me how reconcile the fact that the names are meaningless, but they are quite meaningful in pycalphad because we make an assumption that names are distinguishing in the Model and variables.SiteFraction symbolics.

The error you get with CO and Co not allowing duplicate species prevents the SiteFraction objects from becoming equivalent SymEngine symbols. In fact, I think it might be that this change could cause that because the StateVariable.__init__ casts the name to uppercase and SiteFraction.__init__ gives the variable name (including the species) to StateVariable via super.

richardotis · 2022-08-08T17:09:16Z

Where is the stoichiometry information coming from for each phase at Model construction time?

bocklund · 2022-08-08T20:38:04Z

I'm not sure I know what you're referring to about stoichiometry. The constituent species are known via the Database().phases Phase objects. And the mass of pure elements are coming from the species as well. My concern is that you'll have a Y(PHASE,0,"CO") (from "CO") and Y(PHASE,O,"CO") (from "Co"), which mess up the free symbols in expressions where the site fractions are used (both Model().moles and Model().GM

richardotis · 2022-08-08T21:49:20Z

I think I mean to ask about the chemistry of the phases (what chemical species are in solution), not the stoichiometry exactly. Can we construct meaningful constituent names from the cation/anion information we have for each phase? It seems like this information has to exist somewhere, or else it wouldn't be possible to construct a mass balance.

bocklund · 2022-08-08T22:37:51Z

I think I understand your question now. According to #419, the DAT lets us get the pure elements constituents of any species. We could construct a name from that, rather than using the name/label in the DAT. I don’t know if the species by stoichiometry are guaranteed to be unique, but I don’t know why they wouldn’t be for in any phase in practice.

For example, in this test database the numbers after 4 3 are the pure element constituents of this species. There are 18 pure elements in the database.

gas_ideal
  IDMX
  CO
    4  3    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
     0.0    0.0    0.0    1.0    0.0    1.0    0.0    0.0

richardotis · 2022-10-15T18:46:23Z

I think I'm understanding this issue better now. Can we discard the label at Database construction time, and construct a canonical Species name based on the pure element constituents (and charge information)? When we write DAT out, we can simply write the canonical name with no loss of physical information.

richardotis added 3 commits August 7, 2022 11:06

WIP: cs_dat: Selective case sensitivity for species name strings

9744e9e

TST: test for fixing pycalphadgh-425

eb03ffd

FIX: Add database for test

ab348c8

richardotis mentioned this pull request Aug 7, 2022

Duplicate Species name error when parsing FactSage DAT into Database #425

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

richardotis commented Aug 7, 2022

codecov bot commented Aug 7, 2022

bocklund commented Aug 8, 2022 •

edited

richardotis commented Aug 8, 2022

bocklund commented Aug 8, 2022

richardotis commented Aug 8, 2022

bocklund commented Aug 8, 2022

richardotis commented Oct 15, 2022

FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

Are you sure you want to change the base?

FIX: ChemSage DAT: Case sensitivity of compound names in endmembers #426

Conversation

richardotis commented Aug 7, 2022

codecov bot commented Aug 7, 2022

Codecov Report

bocklund commented Aug 8, 2022 • edited

richardotis commented Aug 8, 2022

bocklund commented Aug 8, 2022

richardotis commented Aug 8, 2022

bocklund commented Aug 8, 2022

richardotis commented Oct 15, 2022

bocklund commented Aug 8, 2022 •

edited