Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate CifParser.get_structures() in favor of new parse_structures in which primitive defaults to False #3419

Merged
merged 10 commits into from Nov 8, 2023

Conversation

janosh
Copy link
Member

@janosh janosh commented Oct 24, 2023

Motivation

CifParser.get_structures('some.cif')[0] and Structure.from_file('some.cif') currently return different cells. And primitive=True seems like a bad default. Cell reduction seems like something the user should request explicitly.

Context

This came up in the MP ingestion of new ICSD structures added since 2019 with @esoteric-ephemera.

@shyuep @mkhorton Let me know if you disagree.

@janosh janosh added io Input/output functionality breaking Breaking change core Pymatgen core cif Crystallographic information file labels Oct 24, 2023
@JaGeo
Copy link
Member

JaGeo commented Oct 24, 2023

I fear this will break a lot of code! Can one alert users about this change?

@janosh
Copy link
Member Author

janosh commented Oct 24, 2023

@JaGeo Definitely! See 22f6201.

@shyuep
Copy link
Member

shyuep commented Oct 24, 2023

Pls do not do this kind of thing. No breaking change unless absolutely necessary.

@shyuep
Copy link
Member

shyuep commented Oct 24, 2023

We have lived with this for many years. Most people use Structure.from . Pls don't worry about this.

@mkhorton
Copy link
Member

Concur, we should leave it as is. I agree it’s a bad default but a breaking change is worse.

For the specific example, MP has had issues in the other direction in the past too; for example accidentally calculating supercells because a primitive transformation was not applied, or calculating awkwardly-defined cells when a more reasonable basis might exist etc. Any default might give unwanted results in some cases.

@shyuep
Copy link
Member

shyuep commented Oct 24, 2023

I will state this outright - any Pr that has a breaking change for end users must have at least two maintainer reviews. There is a balance to be struck here. There are a lot of things that can be done better in pymatgen but will also cause all hell to break loose in a lot of downstream code. We do not break user code unless it is extremely compelling.

@janosh
Copy link
Member Author

janosh commented Oct 25, 2023

I see conflicting arguments here. Either most people use Structure.from_file or this PR will break a lot of code. You're claiming both.

Good defaults are more important than no breaking changes imo. Breaking changes are a one time cost, a bad default will trip users up forever. If @esoteric-ephemera and myself can be bitten by this, it can happen to anyone.

Any default might give unwanted results in some cases.

@mkhorton If a user wants a primitive cell and forgets to reduce, that's their mistake. If a user loads a CIF file and expects it to remain as is but we silently reduce it, that's us laying a trap. We should not lay traps!

@shyuep
Copy link
Member

shyuep commented Oct 25, 2023

The point is that it is unnecessary. CifParser predates Structure.from_file. There is a lot of legacy code that uses that. Even if you are willing to commit to fixing all intra MP code, you cannot fix code from other users. So you are imposing a huge cost on the community just for the sake of API cleanliness. Structure.from_file is more modern and used by newer users.

In short, there is a lot of legacy stuff that can be fixed but will cause unnecessary headaches. It is pointless to do. Even if you want to do it, then you should do it instead in a NEW method name other than get_structures and then deprecate get_structures for a good long while.

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Nov 2, 2023

To chime in here, while most regular users of Pymatgen use .from_file, in my experience, less experienced users look up "CIF pymatgen" on Google and see the name CifParser so that's what they use. Arguably, more of my colleagues from my PhD group used this than .from_file until I told them otherwise. That said, yes, the choice of primitive=True by default is a bad one and is a very common issue among users of this routine. It's way too problematic of a change to make (without a deprecation warning) since it'll be difficult to detect a change until it's too late.

Case and point:

Me to ChatGPT: How do I read a CIF file with Pymatgen?

image

Obviously the syntax is wrong, but you can see how one might go from this to the CifParser...

@JaGeo
Copy link
Member

JaGeo commented Nov 2, 2023

I personally also think that breaking changes should be introduced very carefully, especially to such a core part.
We have many, many people following the updates on this repo and it might also be scary to see a lot of "breaking" statements. I am also always wondering/worrying if my code still works as intended afterwards.

@janosh
Copy link
Member Author

janosh commented Nov 2, 2023

Alright, I deprecated get_structures and point to a new parse_structures method in the deprecation notice. Only difference between get_structures and parse_structures is the default for primitive (True and False resp.).

Sounds like people would be ok with that? Happy to accommodate further suggestions.

@JaGeo
Copy link
Member

JaGeo commented Nov 2, 2023

I would be in general careful with breaking changes and leave this really for several versions.

I would btw point out more clearly that there is indeed a big difference in the default behaviour!

@janosh
Copy link
Member Author

janosh commented Nov 2, 2023

I would btw point out more clearly that there is indeed a big difference in the default behaviour!

Not sure I follow. Feel free to write a comment in the code what should be clearer.

@janosh janosh removed the breaking Breaking change label Nov 2, 2023
@janosh janosh changed the title Breaking: change default primitive=True to False in CifParser.get_structures() Deprecate CifParser.get_structures() in favor of new parse_structures in which primitive defaults to True Nov 2, 2023
@janosh janosh changed the title Deprecate CifParser.get_structures() in favor of new parse_structures in which primitive defaults to True Deprecate CifParser.get_structures() in favor of new parse_structures in which primitive defaults to False Nov 2, 2023
return self.parse_structures(*args, **kwargs)
E       TypeError: CifParser.parse_structures() got multiple values for argument 'primitive'
@janosh janosh enabled auto-merge (squash) November 8, 2023 15:51
@janosh janosh merged commit 9e27ca8 into master Nov 8, 2023
22 checks passed
@janosh janosh deleted the cif-parser-default-primitive-false branch November 8, 2023 16:13
@janosh janosh added api Application programming interface ux User experience labels Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Application programming interface cif Crystallographic information file core Pymatgen core io Input/output functionality ux User experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants