We've talked in the past about the stress test we would like the package to pass before we feel confident putting it into the hands of users.
Here is my proposal:
We pick all glasses from a database with measured
1. densities
2. elastic constants
3. CTE (coefficient of thermal expansion)
4. (optional, where present) high-temperature viscosity
We do some basic data engineering to filter out outliers and then pick ~1000 diverse compositions based on an element vector distance.
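To make "pick ~1000 diverse compositions based on an element vector distance" concrete, here is one possible sketch: greedy farthest-point sampling on element-fraction vectors. The Euclidean metric and the composition representation are assumptions, not a decision we've made.

```python
# Hedged sketch: greedy farthest-point sampling to pick k diverse
# compositions by element-vector distance. Assumes compositions are
# rows of element fractions; the metric (Euclidean) is an assumption.
import numpy as np

def pick_diverse(compositions: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """Greedily select k row indices, each time adding the glass
    farthest from everything already selected."""
    rng = np.random.default_rng(seed)
    n = len(compositions)
    k = min(k, n)
    selected = [int(rng.integers(n))]
    # Distance of every glass to its nearest already-selected glass.
    min_dist = np.linalg.norm(compositions - compositions[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        d = np.linalg.norm(compositions - compositions[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected
```

The same idea would also work with a chemistry-aware distance (e.g. grouping network formers/modifiers) if plain element fractions turn out to be too crude.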
Then we run them through our pipeline and compare.
Which database to draw from?
I first looked into SciGlass, since it is open source, used by the group in Jena, and there would be no issue publishing all the data (compositions, measurements) alongside the calculations.
The downside: I only find 570 compositions with all of properties 1-3 (before any further filtering), while I find ~12k such compositions in InterGlaD.
The InterGlaD terms of use do actually mention papers that use information from InterGlaD, but they do not explicitly allow, e.g., publishing small subsets.
We would need to ask (I know the director), but I think it is quite likely they would say no.
My feeling is the SciGlass dataset is not large enough. We could of course compromise and mix: take as much diverse data as we can get from SciGlass, top up to 1000 from InterGlaD, and publish only the SciGlass subset. This could be a start in any case, with SciGlass calculations running on the BAM side and InterGlaD on the SCHOTT side.
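One way to make the "top up" step reproducible would be to keep the full SciGlass selection fixed and then greedily add InterGlaD compositions that are farthest from the running set. This is a sketch under the same assumptions as above (element-fraction rows, Euclidean distance), not a settled procedure.

```python
# Hedged sketch: extend a fixed SciGlass selection with the most
# "distant" InterGlaD compositions until a target count is reached.
# Array layout and the Euclidean metric are assumptions.
import numpy as np

def top_up(sciglass: np.ndarray, interglad: np.ndarray, target: int) -> np.ndarray:
    """Return indices into `interglad` that extend `sciglass` to `target` picks."""
    n_extra = max(0, target - len(sciglass))
    # Distance of every InterGlaD glass to its nearest selected glass so far.
    min_dist = np.linalg.norm(
        interglad[:, None, :] - sciglass[None, :, :], axis=2
    ).min(axis=1)
    picked = []
    for _ in range(min(n_extra, len(interglad))):
        nxt = int(np.argmax(min_dist))
        picked.append(nxt)
        d = np.linalg.norm(interglad - interglad[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return np.array(picked, dtype=int)
```

A nice side effect: the published SciGlass subset is then self-contained, and the InterGlaD indices never need to leave the SCHOTT side.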
What do you think, @Atilaac @Gitdowski?
P.S. One could in principle relax the constraint to glasses with measured density OR elastic constants OR CTE. But with 1000 comparisons per property, that means up to 3x the number of calculations, and it also means you cannot necessarily look at different properties predicted for the same glass and see how their errors differ.