Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChemSage DAT support: Constituent parsing bug #419

Open
maxposchmann opened this issue Jun 10, 2022 · 4 comments
Open

ChemSage DAT support: Constituent parsing bug #419

maxposchmann opened this issue Jun 10, 2022 · 4 comments

Comments

@maxposchmann
Copy link

The constituents of a multi-sublattice phase listed in a DAT can be chemically complex or contain information about the charge state of the constituent, or really anything else at all. See here for an example:

 Fe[3+]                   Fe[2+]                   V2O3
 Ti[4+]                   Ti[3+]
 O

Fe[3+] contains a number that tells you about the charge state, while V2O3 contains numbers that tell you about the stoichiometry of the constituent.

It appears that currently, pycalphad counts on the square brackets being there around the charges. So things work fine for Fe[3+], but break down if one has Fe3+ instead, as it reads this as 3 irons. Due to this issue, incorrect results will be computed for many DAT files.

However, you cannot count on this convention being followed in DAT files, and in fact it rarely is. Further, one really ought not to be using the constituent names for anything. They are there for readability only, and follow no rules at all. Renaming all five constituents to Bob should result in no changes to the calculation.

There is in fact no way at all to determine the stoichiometry of individual constituents in an MQMQA phase from a DAT file. Fortunately, these aren't necessary as the quadruplet stoichiometries can be determined from the stoichiometries of the endmembers, the coordinations, and the arrays that link the endmembers to the constituents.

@bocklund
Copy link
Collaborator

There may be some bugs hiding, having a reproducible example would be helpful if one is found.

Species objects have a name and a constituents attributes. The name is supposed to be arbitrary and the constituents are what matters. As long as pycalphad gets the constituents correctly, everything else should just work. Species also have the feature of being capable of parsing the constituents if constructed from only a name if the constituents are not given, but I don't remember exactly how this is used in practice, except as a user-facing shortcut to make v.Species("CU") work as expected.

The intent of pycalphad.io.cs_dat.Endmember is to construct Species objects using the pure element constituents according to the stoichiometry in the DAT, not to rely on the names in the models.

IIRC, I think the parsed names of Species (e.g. Species("Bob")) do indeed get added to the Database object, but that should not affect anything in practice. If that is what's happening, we should probably avoid doing that.

@maxposchmann
Copy link
Author

Try out the database I linked above, you will see issues with the constituents immediately.

Here's an example:
species.name = FE[3+]+3.0
species.constituents = {'FE': 1.0, '[': 3.0, ']': 1.0}

Or, if I delete the []:
species.name = FE3++3.0
species.constituents = {'FE': 3.0}

So both of those are wrong in different ways. The first shouldn't consider [ or ] as elements, and the second gets the Fe stoichiometry wrong.

I've not had time to see where this goes wrong, but since all that I changed in the DAT was the constituent name, it certainly seems that more information is being extracted from that name than is safe.

@bocklund
Copy link
Collaborator

Yeah that's, definitely not correct for the species, but do you know if it's affecting the correctness for any phases or models? For example, the species should be similarly incorrect for the Shishin Fe-Sb-O-S database, but we have some explicit tests in the test suite with that compares pycalphad energies to Thermochimica energies.

@maxposchmann
Copy link
Author

I set up a calculation as follows:

db = Database('FeTiVO.dat')
my_phases = ['SLAGBSOLN']
comps = ['TI', 'FE', 'O']

calc_result = calculate(db, comps, 'SLAGBSOLN', P=101325, T=500)
print(calc_result)

With the database as it is originally, my output is:

pycalphad.core.errors.ConditionError: None of the passed phases (['SLAGBSOLN']) are active. List of possible phases: ['BROK1SOLN', 'FE(S)', 'FE2O3_HIGH-PRESSURE-(S2)', 'FE2O3_HIGH-PRESSURE-(S3)', 'FE2O5TI_FERRIC_PSEUDO(S)', 'FE3O4_HIGH-PRESSURE-(S3)', 'FE3O4_HIGH-PRESSURE-(S4)', 'FEO_WUSTITE(S)', 'FE_BCC(S)', 'GAS_IDEAL', 'HEMASOLN', 'O2(S)', 'RUTILESOLN', 'TI(S)', 'TI2O3_SOLID-A(S)', 'TI2O3_SOLID-B(S2)', 'TIO2_ANATASE(S2)', 'TIO2_RUTILE(S)', 'TI_SOLID_ALPHA(S)'].

With the brackets removed from Fe[3+] only the output is:

<xarray.Dataset>
Dimensions:    (N: 1, P: 1, T: 1, points: 1, component: 3, internal_dof: 1)
Coordinates:
  * component  (component) <U2 'FE' 'O' 'TI'
  * N          (N) float64 1.0
  * P          (P) float64 1.013e+05
  * T          (T) float64 500.0
Dimensions without coordinates: points, internal_dof
Data variables:
    X          (N, P, T, points, component) float64 0.4 0.6 0.0
    Phase      (N, P, T, points) <U9 'SLAGBSOLN'
    Y          (N, P, T, points, internal_dof) float64 1.0
    GM         (N, P, T, points) float64 -1.643e+05
Attributes:
    phase_indices:  {'SLAGBSOLN': slice(0, 1, None)}

With all brackets removed the output is:

<xarray.Dataset>
Dimensions:    (N: 1, P: 1, T: 1, points: 108010, component: 3, internal_dof: 10)
Coordinates:
  * component  (component) <U2 'FE' 'O' 'TI'
  * N          (N) float64 1.0
  * P          (P) float64 1.013e+05
  * T          (T) float64 500.0
Dimensions without coordinates: points, internal_dof
Data variables:
    X          (N, P, T, points, component) float64 0.5 0.5 ... 0.5546 0.09745
    Phase      (N, P, T, points) <U9 'SLAGBSOLN' 'SLAGBSOLN' ... 'SLAGBSOLN'
    Y          (N, P, T, points, internal_dof) float64 1.0 1e-14 ... 0.02869
    GM         (N, P, T, points) float64 -1.384e+05 -1.497e+05 ... -1.877e+05
Attributes:
    phase_indices:  {'SLAGBSOLN': slice(0, 108010, None)}

I want to emphasize that none of these databases is more or less correct than the others.

About the tests: they have the quadruplet concentrations set directly, so it is believable that some error modes are sidestepped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants