Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xps #2110

Merged
merged 4 commits into from Apr 20, 2021
Merged

Xps #2110

merged 4 commits into from Apr 20, 2021

Conversation

shyuep
Copy link
Member

@shyuep shyuep commented Apr 20, 2021

Port of Galore code for XPS generation from DOS.

@shyuep shyuep merged commit 0c92495 into master Apr 20, 2021
@shyuep shyuep deleted the xps branch April 20, 2021 14:16
@utf
Copy link
Member

utf commented Apr 20, 2021

Hi Shyue,

Is there a reason you didn't add galore as a dependency? Galore only has numpy, Matplotlib and SciPy as dependencies (all of which are dependencies of pymatgen already), so it is not much bloat.

If there is something stopping you from using galore as a library (such as the issue you raised here: SMTG-Bham/galore#28) I'd be happy to get that resolved. I think it is better for both packages if there isn't the duplication of data. For example, we occasionally find mistakes in the tabulated data and we are planning on adding more datasets in the future.

@shyuep
Copy link
Member Author

shyuep commented Apr 20, 2021

I guess the main reasons are:

  1. Smearing was already supported in the DOS object (although only Gaussian and not Lorentzian, though that is not difficult to add).
  2. pymatgen already supports other spectra (XANES for instance)
  3. There are certain code design choices I prefer, e.g., OOP, static constructors, etc.

The issue I reported in galore was just one. In the end, I found it more difficult than necessary to generate XPS from galore. I also feel that smearing should be part of the pymatgen default package since it can be used in many instances, rather than requiring a dependency to do smearing. I am not asking for any credit for this - in fact, the documentation explicitly points to the Galore citation.

@utf
Copy link
Member

utf commented Apr 20, 2021

Ok. It seems to me that all of those issues do not prevent pymatgen from using galore as a library? Your comments are instead reasons to use the pymatgen interface to galore over galore itself (which is a reasonable preference).

If pymatgen were to use galore as a library, I think there are two options:

  1. The simplest would be to just import the galore datasets rather than copying the data into pymatgen.
  2. Better, I think, would be to replace the entirety of the contents of XPS.from_dos to just use the galore api which does all of those things, and just return the XPS object at the end.

Either way, both of the options are preferable to copying the data into pymatgen, which has the downsides that I mentioned before. Namely, the data might have bugs and will be updated in the future.

@mkhorton
Copy link
Member

Hi @shyuep, another factor I noticed is that galore is a GPL licensed code, so we really can't import the data into pymatgen directly.

I'd definitely vote for using it as an optional dependency given that galore has minimal and overlapping dependencies with pymatgen. So something like what @utf suggested in point 2 above.

@shyuep
Copy link
Member Author

shyuep commented Apr 20, 2021

The dataset I used from galore is from a 1980s work. If needed, I can easily parse the data from the original work. If the license is an issue, that's what I will do. I do not wish to introduce another dependency on a function that already exists in pymatgen and a dataset that belongs in the public domain.

@mkhorton
Copy link
Member

Parsing, proofing and formatting a dataset is still an effort, I'm not sure it's in our best interests to re-invent the wheel here when a clean dependency exists that we could use. I think it's to our benefit to use solutions from the community when they exist and are well maintained.

@utf
Copy link
Member

utf commented Apr 20, 2021

To echo Matt's point. It was a lot of effort to parse those datasets (note that Galore includes XPS, UPS, and HAXPES weights with more in the pipeline) and several of those datasets are actually fitted and not just parsing of raw tabulated data.

I can imagine that a future version of pymatgen would want to include the UPS and HAXPES functionality of galore. So adding galore as an (optional) dependency will give a lot of features for a very small additional footprint. Note, the entire galore installation is only ~500 kb.

@shyuep
Copy link
Member Author

shyuep commented Apr 21, 2021

@utf At this juncture, I only need XPS. Further, smearing was already implemented in the DOS object since v1 of pymatgen. As of now, I don't see a reason to add a dependency, even an optional one, for this capability. If and when it is deemed that we need a lot more functionality from galore, I will add it as a dependency.

Pls note that adding any dependency in general requires a consideration of the benefits vs costs. For instance, even though galore only depends on numpy, scipy and matplotlib, there is maintenance there too. E.g., numpy and scipy may deprecate stuff and galore needs to be kept up. Galore may depend on a different version of numpy from pymatgen. Future python versions may also break stuff in either pymatgen or galore that needs to be fixed. Based on the last commit date, galore was last updated in Jul 2019. We have gone through many numpy/scipy versions since then. Further, I generally do not add dependencies for data, but rather only code functionality. Right now, I am not using any code functionality from galore.

Now, the only things I need are the cross-sections. If it bothers everyone so much that I am using the data file from galore (even though I attributed the entire piece of code to the galore publication), I will delete it and just get another student to parse it out. I just spent 5 mins opening up that 1985 Yeh paper in Acrobat and "Export to Plain Text" already yields a semi-usable Table 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants