v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles #73

Merged
merged 30 commits into from Jun 10, 2022

ScottSoren

This PR includes a few small updates for an 0.2.3 release. They are described in NEXT_CHANGES.rst. Briefly:

  • A reader for .xrdml files saved by e.g. PanAnalytical X-ray diffractometers (example here), which returns a plain Spectrum object
    This makes use of Python's xml library. I want to improve it after finishing this XML tutorial. For now I just got it to work as quickly as possible.
  • A reader for .avg files exported from the Batch ThermoFisher's Avantage Software (example here). This is copied with the absolute minimum necessary modification from old code I had lying around.
  • A new method called CyclicVoltammogram.plot_cycles() which makes beautiful colored stuff like this figure. Nice selling point of ixdat to use in demos :) ... however it should probably be moved to a Plotter
  • A minor debug of the PV Mass Spec ("pfeiffer") reader.

@Ejler , if you have time to look at src/ixdat/readers/xrdml.py and especially .../avantage.py, that would be great. I know you have more advanced code for reading the avantage files, which would be great to replace this code with in a future PR :)

@KennethNielsen , if you could just quickly check that the code is readable, that would be great. It's probably not worth going into detail on the structure of the readers, which I know can be improved. But what is important, I think, is the way the Spectrum object is initialized.

It may be bad style, but I think these small additions are important for demo'ing ixdat, and so I had people install a pre-release with this unreviewed code for the DTU demo yesterday. That's why I'm PR'ing now even though I know the code could use improvements... and docs and tests are falling yet further behind.


ScottSoren commented Jun 7, 2022

Hi @KennethNielsen and @Caiwu-L ,

This is a bit bigger PR which will lead to ixdat 0.2.3 release. I think it's important to get it merged quickly to avoid a long-lived feature branch. I think it's important to get both your approval or critique on the basic data structures represented here, summarized below. In addition, I could use from @Caiwu-L a check that the "qexafs" reader makes sense (things such as using "QexafsFFI0" for y when called with technique=="XAS"), and from @KennethNielsen an eye towards whether the structure will be easily adaptable for EC-MS with mass full scans.

I've been thorough with NEXT_CHANGES.rst, so please read that for an overview of what changed from a user's perspective. Here I'll focus on what has changed in the guts and the implications it might have.

The main focus in this (updated) PR is spectrometric data. In ixdat, a spectrum is a 1-D data series (y) that lives in a space defined by one other data series (x), together with associated metadata (including a tstamp). The series x should not be resolved in time (if it is, ixdat would call this a measurement with a single value series). This structure is interfaced by the Spectrum class, which stores its raw data in a single Field.
Ixdat has two structures (as of this PR) for treating collections of related spectra, i.e. spectra sharing the same x:

  • The first structure is for spectra that are in principle taken simultaneously. This results in a MultiSpectrum. Here the y values might have different meanings and different units, but each spectrum should have the same tstamp. This is a useful structure for beamline data, where it is the beam itself and the positioning of the monochromator which are "expensive" (in terms of money and time, respectively), and so it can pay to have lots of detectors taking different kinds of data as the energy is scanned. A MultiSpectrum's only use so far is to list its spectra and return a requested Spectrum when indexed.
  • The second structure is for spectra of the same type taken at various times. This results in a SpectrumSeries. The data in a SpectrumSeries is a two-dimensional y which lives on a space defined by the spectrum x and by a timeseries t. A SpectrumSeries can be indexed by an integer n to return the nth spectrum in the series.

Both MultiSpectrum and SpectrumSeries have a constructor method called from_spectrum_list() which does what the name implies. Adding two Spectrum objects results in a SpectrumSeries. Adding a Measurement and a SpectrumSeries results in a SpectroMeasurement, described below.
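
To make the relationship between a Spectrum and a SpectrumSeries concrete, here is a minimal sketch using toy stand-in classes. These are not ixdat's actual classes: the attribute names, the plain-list storage (instead of a Field), and the error handling are all assumptions made for illustration. MultiSpectrum is left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Spectrum:
    """Toy stand-in: one y series living on one x series, plus a tstamp."""

    x: List[float]
    y: List[float]
    tstamp: float

    def __add__(self, other: "Spectrum") -> "SpectrumSeries":
        # Adding two spectra gives a series, as described above.
        return SpectrumSeries.from_spectrum_list([self, other])


@dataclass
class SpectrumSeries:
    """Toy stand-in: a 2-D y on the space spanned by x and a time series t."""

    x: List[float]
    t: List[float]
    y: List[List[float]]  # one row per spectrum

    @classmethod
    def from_spectrum_list(cls, spectra: List[Spectrum]) -> "SpectrumSeries":
        x = list(spectra[0].x)
        # All spectra in a series have to share the same x.
        if any(list(s.x) != x for s in spectra):
            raise ValueError("all spectra must share the same x")
        return cls(x=x, t=[s.tstamp for s in spectra], y=[list(s.y) for s in spectra])

    def __getitem__(self, n: int) -> Spectrum:
        # Indexing with an integer n returns the nth spectrum in the series.
        return Spectrum(x=self.x, y=self.y[n], tstamp=self.t[n])


s0 = Spectrum(x=[1.0, 2.0, 3.0], y=[0.1, 0.5, 0.2], tstamp=0.0)
s1 = Spectrum(x=[1.0, 2.0, 3.0], y=[0.2, 0.6, 0.1], tstamp=10.0)
series = s0 + s1
```

In this sketch `series[1]` gives back a Spectrum with the y and tstamp of `s1`, which mirrors the indexing behaviour described for SpectrumSeries.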

With this PR, the basic structure for combining a SpectrumSeries (time-resolved spectral data) with a Measurement (a list of time-resolved values) is a SpectroMeasurement. This is implemented as a subclass of Measurement with an attached SpectrumSeries. The reason for this choice, rather than vice versa, is that Measurement is a more tried-and-tested structure in ixdat, and will usually contain the "control variables" in hyphenated techniques like EC-XAS. This also matches a design decision taken a year ago at Spectro Inlets, here, slide 13, and corrects the faulty design I used later last year for EC-Optical spectroelectrochemistry data (in my defense, I was led into it by the raw data file structure).

SpectroECMeasurement is, with this PR, implemented as a specific case of SpectroMeasurement but still as general as possible. The peculiarities of the EC-Optical measurements done at MSRH, such as always plotting optical density (-log(y/y0)) rather than raw data (y), are moved to a sub-class of SpectroECMeasurement, ECOpticalMeasurement.
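
The layering described above can be sketched roughly as follows. The class names match the ones in this PR, but the bodies are toy stand-ins: the constructor arguments, the `spectrum_rows` and `reference` attributes, and the choice of base-10 logarithm for optical density are assumptions, not taken from the ixdat code.

```python
import math
from typing import Dict, List


class Measurement:
    """Toy stand-in: named, time-resolved value series."""

    def __init__(self, *, name: str, series: Dict[str, List[float]]):
        self.name = name
        self.series = series


class SpectroMeasurement(Measurement):
    """A Measurement with attached spectra (here just rows of y values)."""

    def __init__(self, *, name, series, spectrum_rows: List[List[float]]):
        super().__init__(name=name, series=series)
        self.spectrum_rows = spectrum_rows


class SpectroECMeasurement(SpectroMeasurement):
    """Kept general: nothing specific to EC-Optical lives here."""


class ECOpticalMeasurement(SpectroECMeasurement):
    """Adds the optical-density convention on top of the general class."""

    def __init__(self, *, name, series, spectrum_rows, reference: List[float]):
        super().__init__(name=name, series=series, spectrum_rows=spectrum_rows)
        self.reference = reference  # plays the role of y0

    def optical_density(self, n: int) -> List[float]:
        # -log(y / y0) for the nth spectrum; base 10 is an assumption here.
        row = self.spectrum_rows[n]
        return [-math.log10(y / y0) for y, y0 in zip(row, self.reference)]


meas = ECOpticalMeasurement(
    name="demo",
    series={"t": [0.0, 1.0], "potential": [0.1, 0.2]},
    spectrum_rows=[[100.0, 10.0], [50.0, 100.0]],
    reference=[100.0, 100.0],
)
od = meas.optical_density(0)
```

The point of the sketch is only that the technique-specific convention sits in the most derived class, so the parent classes stay reusable for other hyphenated techniques.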

The relevant tables from ixdat are drawn here with dbdiagram. A couple of things might not be immediately intuitive:

  • Spectrum and SpectrumSeries objects are both in the table "spectra". Their distinction is defined by the dimensionality of their Field only, and so does not require a second table.
  • The Field containing the 1-D y data of a Spectrum, the Field containing the 2-D y data of a SpectrumSeries, the DataSeries containing the 1-D x data of either, and the TimeSeries containing the 1-D t data of a SpectrumSeries's Field are all in the table "data_series". Their differences are defined by whether or not their id appears in one or more rows of the "axes_series" table. For a 1-D field, the "axes_series" table will link it to one other data series. For a 2-D field, the "axes_series" table will link it to two other data series.
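
As a rough illustration of the second point, the sketch below builds the two tables in an in-memory SQLite database and counts how many axes each data series is linked to. The table names "data_series" and "axes_series" come from the description above; the column names (`field_id`, `axis_id`, `axis_number`) and the row contents are hypothetical and are not ixdat's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE data_series (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE axes_series (
        field_id INTEGER REFERENCES data_series(id),
        axis_id INTEGER REFERENCES data_series(id),
        axis_number INTEGER
    );
    """
)

# x and t are plain series; y_1d and y_2d are fields living on them.
con.executemany(
    "INSERT INTO data_series VALUES (?, ?)",
    [(1, "x"), (2, "t"), (3, "y_1d"), (4, "y_2d")],
)
con.executemany(
    "INSERT INTO axes_series VALUES (?, ?, ?)",
    [
        (3, 1, 0),  # 1-D field: linked to x only
        (4, 2, 0),  # 2-D field: linked to t ...
        (4, 1, 1),  # ... and to x
    ],
)


def n_axes(series_id: int) -> int:
    """Number of axes a data series is linked to (0 means it is not a field)."""
    query = "SELECT COUNT(*) FROM axes_series WHERE field_id = ?"
    return con.execute(query, (series_id,)).fetchone()[0]
```

Here `n_axes` returns 0 for x and t, 1 for the 1-D field, and 2 for the 2-D field, which is the only thing distinguishing the rows of "data_series" from each other.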

image

This is reflected, approximately, in the class attributes extra_column_attributes and extra_linkers in the new classes. However, this will be overwritten following the "metaprogramming" changes in progress in #75 and #83. I have not tested any of the new classes against saving and loading in the directory backend, and expect they would fail (though I have tested ECOpticalMeasurement against exporting and re-importing). So please ignore mistakes in the present defining tables code.

To indicate how this structure lends itself to be built further upon, consider the following:

  • ECXASMeasurement, which is now implemented as a place-holder adding nothing to its parent class SpectroECMeasurement, could have methods for e.g. tracking the absorption edge as a function of potential.
  • Adding XRD or XPS spectra will also result in a SpectrumSeries, so minimal new code would be needed for basic support of in-situ EC-XRD or EC-XPS data (just adding the hyphenated technique to TECHNIQUE_CLASSES would suffice).

A few coding challenges faced here, which you might have some ideas how to resolve:

  • I can't figure out how to structure the Plotters - should they inherit from each other like the techniques (implemented now with ECOpticalPlotter inheriting from SpectroECMeasurement) or should they own instances of each other (implemented now for ECMSPlotter, an instance of which has its own ECPlotter and MSPlotter to generate the lower and upper panels, respectively)?
  • __init__ statements give some trouble. I had to do the same awkward call of both parent class __init__s with redundant arguments for SpectroECMeasurement as is done with ECMSMeasurement.
  • Another __init__ problem: if a Measurement's Plotter is initiated in its __init__ (which is needed for the way we get its plot functions to have the right docstrings), then the Plotter doesn't have access to all of the Measurement's attributes. Thus the awkward initiation of a SpectrumSeriesPlotter without its SpectrumSeries on line 18 of sec_plotter.py
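
For the second challenge (calling both parent `__init__`s with redundant arguments), one pattern that sometimes helps is cooperative `super()` with keyword-only arguments, sketched below. This is only an illustration of the general Python technique under the assumption that both parents share a common base class; the classes are toy stand-ins, the argument names are hypothetical, and whether it fits ixdat's real constructors is untested.

```python
class Measurement:
    def __init__(self, *, name, **kwargs):
        # Pass anything left over along the MRO; object.__init__ takes nothing.
        super().__init__(**kwargs)
        self.name = name
        self.init_calls = getattr(self, "init_calls", 0) + 1


class ECMeasurement(Measurement):
    def __init__(self, *, ec_technique="EC", **kwargs):
        super().__init__(**kwargs)
        self.ec_technique = ec_technique


class SpectroMeasurement(Measurement):
    def __init__(self, *, spectrum_series=None, **kwargs):
        super().__init__(**kwargs)
        self.spectrum_series = spectrum_series


class SpectroECMeasurement(SpectroMeasurement, ECMeasurement):
    # No __init__ needed: each parent picks out its own keyword arguments
    # and Measurement.__init__ runs exactly once.
    pass


m = SpectroECMeasurement(name="demo", ec_technique="CV", spectrum_series=[1, 2])
```

Each class consumes only the keywords it knows and forwards the rest, so the most derived class does not have to repeat any arguments.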

Otherwise, it was all pretty nice to code.
Hope it makes for a nice review!

In addition to the heavy spectrum stuff, there are also a few treats in here for present EC and EC-MS users (the main one being `CyclicVoltammogram.plot_cycles`), as well as XPS and XRD readers for me. The readers should still be improved eventually, as per the opening comment of this PR, but I now think that's out of scope for this PR. I hope we can get this merged quickly!

Thanks :)

@ScottSoren ScottSoren requested a review from Caiwu-L June 7, 2022 12:35
@ScottSoren ScottSoren changed the title v0.2.3. Two spectrum readers, CV.plot_cycles, PVMS debug v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles Jun 7, 2022
@KennethNielsen KennethNielsen left a comment

Ok, so I finished. A couple general things. I couldn't quite figure out the expected level of the read, since I couldn't figure out whether someone else would read it/has read it thoroughly. I read it like I normally would, but didn't check the details of whether all the semantics make sense with regards to the objects used i.e. read through of the semantics mostly dealt with whether the code locally made sense.

That being said. A few general points:

  1. I remember that you before used keyword-only arguments to great effect. I think that is a magnificent idea. Both because we sometimes change signatures and because it improves readability at call site. So maybe consider sprinkling some of that on. Especially e.g. I think more or less all plot functions should make all KW args KWO args.
  2. As for the DB diagram: apart from some concerns about what is possible (which we will talk about later), I would suggest making a calculated spelling error and calling the table "spectrums", so that we can keep the id_columns naming conventions simple, as per my comments in the other PR. I know that misspelling on purpose will annoy some people, but in this case, it really is practical.
  3. I think maybe you ran out of steam, but the code seems to be a little light on the doc strings on the user facing methods towards the end.
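
As a small illustration of point 1, a bare `*` in a Python signature makes every argument after it keyword-only. The function below is a hypothetical stand-in for a plot function (it just returns its options instead of drawing), not one of ixdat's plotters.

```python
def plot_options(x, y, *, color="k", xlim=None):
    """Hypothetical plot helper: everything after `*` must be passed by name."""
    return {"n_points": len(x), "color": color, "xlim": xlim}


# Passing the option by name works and reads clearly at the call site.
opts = plot_options([1, 2, 3], [4, 5, 6], color="r")

# Passing it positionally is rejected, so reordering or adding
# keyword arguments later cannot silently change what a call means.
try:
    plot_options([1, 2, 3], [4, 5, 6], "r")
    positional_ok = True
except TypeError:
    positional_ok = False
```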

self._xseries = None
self._spectra = None

@property

I confess to being a little confused about the way that fields and xseries interact with self._xseries. It seems like there are different algorithms for calculating what will be stored in the property, depending on whether you ask for fields or xseries first.

No wait, it is the same result. Then I'm more curious about why it is being calculated in fields and all and not just either not handled at all or retrieved via the xseries property.

@ScottSoren ScottSoren Jun 8, 2022

Oh, you once again caught my sloppy not-properly-tested coding. I've fixed up the method and added some comments. But the mistakes I can see had to do with the assert being in the wrong place and a forgotten _, and don't actually address your specific question. To that:

_xseries was set when the property fields was called. All the fields have to have the same or equivalent xseries. So the property xseries did indeed work like the typical cache. I've rewritten it though to be explicit.

Ah ok. I think my main trigger was the part where the cache was being written on in two methods. That is now gone, but what I'm still (subjectively) slightly wondering is if it wouldn't be cleaner with:

    def fields(self):
        """Make sure Fields are loaded and have the same xseries"""
        xseries = self.xseries
        for i, f in enumerate(self._fields):
            if isinstance(f, PlaceHolderObject):
                # load or "unpack" any fields for which only the id's were loaded:
                self._fields[i] = f.get_object()
                # If all the xseries are the same, every field should have an xseries identical to the first one
                # retrieved via self.xseries
                assert self._fields[i].axes_series[0] == xseries
        # Now we've loaded any place-holder fields and checked their xseries are equal.
        return self._fields

@ScottSoren ScottSoren Jun 9, 2022

That does look cleaner, but the problem is that the self.xseries property calls self.fields the first time, so it would be an infinite recursion.
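
One way to see the constraint is that only one of the two properties may depend on the other. The sketch below keeps `fields` free of any reference to `xseries`, so `xseries` can safely call `fields` and fill the cache itself. It is a toy reconstruction under assumptions: the `PlaceHolder` and `Field` classes and the container class name are hypothetical, and this is not the code that was merged.

```python
class PlaceHolder:
    """Toy stand-in for an object of which only the id was loaded."""

    def __init__(self, obj):
        self._obj = obj

    def get_object(self):
        return self._obj


class Field:
    def __init__(self, xseries):
        self.axes_series = [xseries]


class SpectrumCollectionSketch:
    def __init__(self, fields):
        self._fields = list(fields)
        self._xseries = None

    @property
    def fields(self):
        """Load any place-holders and check that all xseries agree."""
        # Never touches self.xseries, so there is no cycle.
        for i, f in enumerate(self._fields):
            if isinstance(f, PlaceHolder):
                self._fields[i] = f.get_object()
        first = self._fields[0].axes_series[0]
        assert all(f.axes_series[0] == first for f in self._fields)
        return self._fields

    @property
    def xseries(self):
        """Cached xseries; this is the only place the cache is written."""
        if self._xseries is None:
            self._xseries = self.fields[0].axes_series[0]
        return self._xseries


collection = SpectrumCollectionSketch([PlaceHolder(Field("x")), Field("x")])
```

Reversing the dependency (having `fields` read `self.xseries` while `xseries` reads `self.fields`) is what would produce the infinite recursion mentioned above.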

@ScottSoren

Hi @KennethNielsen , thanks for the review! As usual, you caught a lot of mistakes and omissions and the code improved as a result. I've responded to each of the comments and only left a few unresolved where I couldn't immediately implement the suggestions.
I completely agree with your top-level comments as well. I've made all the spectrum and spectromeasurement plotting functions keyword-only, and I've changed the table name from "spectra" to "spectrums". I added a note on grammar in the spectra.py module docstring justifying the use of two distinct plurals for spectrum. I've updated the dbdiagram:
https://dbdiagram.io/d/629f3b5b54ce2635277667f5
Let me know if it looks good to you.

@KennethNielsen KennethNielsen left a comment

Looks great. There is one comment that I unresolved, but with a subjective comment, so you decide whether you think it is better. No matter what, I consider it ready for merge on my account.

@Caiwu-L Caiwu-L left a comment

I mainly just reviewed the qexafs reader. It all makes sense to me, and I haven't found any problems. Using QexafsFFI0 as y when calling technique 'XAS' is right for samples measured in fluorescence mode, and this mode is usually used in operando EC-XAS experiments.
There is also a transmission mode, which is usually used to measure ex-situ standard samples. It has the same data format as the current data; you just need to use lnIt/I0 instead of QexafsFFI0 as y in this case.
And finally, I failed to figure out the 'span problem' we saw today in spectrum_plotter :)
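
The choice of y column described in this comment could be sketched as a simple lookup. The column names "QexafsFFI0" and "lnIt/I0" are the ones named above; the `mode` argument, the mode labels, and the `select_y` helper are hypothetical and are not part of the qexafs reader in this PR.

```python
# Which column to use as y for each measuring mode, per the comment above.
Y_COLUMN_BY_MODE = {
    "fluorescence": "QexafsFFI0",  # typical for operando EC-XAS
    "transmission": "lnIt/I0",     # typical for ex-situ standard samples
}


def select_y(columns, mode="fluorescence"):
    """Return (name, data) of the y column for the given mode.

    `columns` is assumed to be a dict mapping column names to lists of values.
    """
    name = Y_COLUMN_BY_MODE[mode]
    return name, columns[name]


columns = {
    "energy": [7100.0, 7110.0, 7120.0],
    "QexafsFFI0": [0.10, 0.35, 0.90],
    "lnIt/I0": [-1.20, -1.00, -0.40],
}
name_f, y_f = select_y(columns)
name_t, y_t = select_y(columns, mode="transmission")
```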

@ScottSoren ScottSoren merged commit fc8f267 into main Jun 10, 2022
ScottSoren pushed a commit that referenced this pull request Jun 10, 2022
v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles