We've talked in the past about the stress test we would like the package to pass before we feel confident putting it into the hands of users.
Here is my proposal:
We pick all glasses from a database with measured
1. densities
2. elastic constants
3. CTE (coefficient of thermal expansion)
4. (optional, where present) high-temperature viscosity
We do some basic data engineering to filter out outliers and then pick ~1000 diverse compositions based on an element vector distance.
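To make "pick ~1000 diverse compositions based on an element vector distance" concrete, here is one possible sketch: greedy farthest-point sampling on element-fraction vectors. The Euclidean metric and the composition representation are assumptions, not a decision we've made.

```python
# Hedged sketch: greedy farthest-point sampling to pick k diverse
# compositions by element-vector distance. Assumes compositions are
# rows of element fractions; the metric (Euclidean) is an assumption.
import numpy as np

def pick_diverse(compositions: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """Greedily select k row indices, each time adding the glass
    farthest from everything already selected."""
    rng = np.random.default_rng(seed)
    n = len(compositions)
    k = min(k, n)
    selected = [int(rng.integers(n))]
    # Distance of every glass to its nearest already-selected glass.
    min_dist = np.linalg.norm(compositions - compositions[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        d = np.linalg.norm(compositions - compositions[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected
```

The same idea would also work with a chemistry-aware distance (e.g. grouping network formers/modifiers) if plain element fractions turn out to be too crude.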
Then we run them through our pipeline and compare.
Which database to draw from?
I first looked into SciGlass, since it is open source, used by the group in Jena, and there would be no issue publishing all the data (compositions, measurements) alongside the calculations.
The downside: I only find 570 compositions with all of properties 1-3 (before any further filtering), while I find ~12k such compositions in InterGlaD.
The InterGlaD terms of use do actually mention papers that use information from InterGlaD, but they do not explicitly allow, e.g., publishing small subsets.
We would need to ask (I know the director), but I think it is quite likely they would say no.
My feeling is the SciGlass dataset is not large enough. We could of course compromise and mix: take as much diverse data as we can get from SciGlass, top up to 1000 from InterGlaD, and publish only the SciGlass subset. This could be a start in any case, with SciGlass calculations running on the BAM side and InterGlaD on the SCHOTT side.
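One way to make the "top up" step reproducible would be to keep the full SciGlass selection fixed and then greedily add InterGlaD compositions that are farthest from the running set. This is a sketch under the same assumptions as above (element-fraction rows, Euclidean distance), not a settled procedure.

```python
# Hedged sketch: extend a fixed SciGlass selection with the most
# "distant" InterGlaD compositions until a target count is reached.
# Array layout and the Euclidean metric are assumptions.
import numpy as np

def top_up(sciglass: np.ndarray, interglad: np.ndarray, target: int) -> np.ndarray:
    """Return indices into `interglad` that extend `sciglass` to `target` picks."""
    n_extra = max(0, target - len(sciglass))
    # Distance of every InterGlaD glass to its nearest selected glass so far.
    min_dist = np.linalg.norm(
        interglad[:, None, :] - sciglass[None, :, :], axis=2
    ).min(axis=1)
    picked = []
    for _ in range(min(n_extra, len(interglad))):
        nxt = int(np.argmax(min_dist))
        picked.append(nxt)
        d = np.linalg.norm(interglad - interglad[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return np.array(picked, dtype=int)
```

A nice side effect: the published SciGlass subset is then self-contained, and the InterGlaD indices never need to leave the SCHOTT side.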
What do you think, @Atilaac @Gitdowski?
P.S. One could in principle relax the constraint to glasses with measured density OR elastic constants OR CTE. But with 1000 comparisons per property, that means up to 3x the number of calculations, and it also means you cannot necessarily look at different properties predicted for the same glass and see how their errors differ.