v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles #73

ScottSoren · 2022-05-18T08:18:09Z

This PR includes a few small updates for an 0.2.3 release. They are described in NEXT_CHANGES.rst. Briefly:

A reader for .xrdml files saved by e.g. PanAnalytical X-ray diffractometers (example here), which returns a plain Spectrum object
This makes use of Python's xml library. I want to improve it after finishing this XML tutorial. For now I just got it to work as quickly as possible.
A reader for .avg files exported from the Batch ThermoFisher's Avantage Software (example here). This is copied with the absolute minimum necessary modification from old code I had lying around.
A new method called CyclicVoltammogram.plot_cycles() which makes beautiful colored stuff like this figure. Nice selling point of ixdat to use in demos :) ... however it should probably be moved to a Plotter
A minor debug of the PV Mass Spec ("pfeiffer") reader.

@Ejler , if you have time to look at src/ixdat/readers/xrdml.py and especially .../avantage.py, that would be great. I know you have more advanced code for reading the avantage files, which would be great to replace this code with in a future PR :)

@KennethNielsen , if you could just quickly check that the code is readable, that would be great. It's probably not worth going in detail on the structure readers, which I know can be improved. But what is important, I think, is the way the Spectrum object is initialized.

It may be bad style, but I think these small additions are important for demo'ing ixdat, and so I had people install a pre-release with this unreviewed code for the DTU demo yesterday. That's why I'm PR'ing now even though I know the code could use improvements... and docs and tests are neglected and falling yet further behind.

ScottSoren · 2022-06-07T12:33:14Z

Hi @KennethNielsen and @Caiwu-L ,

This is a bit bigger PR which will lead to ixdat 0.2.3 release. I think it's important to get it merged quickly to avoid a long-lived feature branch. I think it's important to get both your approval or critique on the basic data structures represented here, summarized below. In addition, I could use from @Caiwu-L a check that the "qexafs" reader makes sense (things such as using "QexafsFFI0" for y when called with technique=="XAS"), and from @KennethNielsen an eye towards whether the structure will be easily adaptable for EC-MS with mass full scans.

I've been thorough with NEXT_CHANGES.rst, so please read that for an overview of what changed from a user's perspective. Here I'll focus on what has changed in the guts and the implications it might have.

The main focus in this (updated) PR is spectrometric data. In ixdat, a spectrum is a 1-D data series (y) that lives in a space defined by one other data series (x), together with associated metadata (including a tstamp). The series x should not be resolved in time (if it is, ixdat would call this a measurement with a single value series). This structure is interfaced by the Spectrum class, which stores its raw data in a single Field.
Ixdat has two structures (as of this PR) for treating collections of related spectra, i.e. spectra sharing the same x:

The first structure is for spectra that are in principle taken simultaneously. This results in a MultiSpectrum. Here the y values might have different meanings and different units, but each spectrum should have the same tstamp. This is a useful structure for beamline data, where it is the beam itself and the positioning of the monochromator which are "expensive" (in terms of money and time, respectively), and so it can pay to have lots of detectors taking different kinds of data as the energy is scanned. A MultiSpectrum's only use so far is to list its spectra and return a requested Spectrum when indexed.
The second structure is for spectra of the same type taken at various times. This results in a SpectrumSeries. The data in a SpectrumSeries is a two-dimensional y which lives on a space defined by the spectrum x and by a timeseries t. A SpectrumSeries can be indexed by an integer n to return the nth spectrum in the series.

Both MultiSpectrum and SpectrumSeries have a constructor method called from_spectrum_list() which does what the name implies. Adding two Spectrum objects results in a SpectrumSeries. Adding a Measurement and a SpectrumSeries results in a SpectroMeasurement, described below.
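
To make the relationships above concrete, here is a minimal sketch of the three structures using plain dataclasses. It is not ixdat's implementation: the attribute names, the use of lists instead of Field and DataSeries objects, and indexing a MultiSpectrum by name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Spectrum:
    """Hypothetical stand-in: one y series on one x axis, stamped with a time."""

    name: str
    x: List[float]
    y: List[float]
    tstamp: float

    def __add__(self, other):
        # Adding two spectra gives a series, as described in the text above.
        return SpectrumSeries.from_spectrum_list([self, other])


@dataclass
class MultiSpectrum:
    """Spectra taken simultaneously: shared x and tstamp, different kinds of y."""

    x: List[float]
    tstamp: float
    spectra: Dict[str, Spectrum]

    @classmethod
    def from_spectrum_list(cls, spectrum_list):
        first = spectrum_list[0]
        assert all(s.x == first.x for s in spectrum_list), "spectra must share x"
        assert all(s.tstamp == first.tstamp for s in spectrum_list)
        return cls(
            x=first.x,
            tstamp=first.tstamp,
            spectra={s.name: s for s in spectrum_list},
        )

    def __getitem__(self, name):
        return self.spectra[name]


@dataclass
class SpectrumSeries:
    """Spectra of one type taken at different times: a 2-D y on (t, x)."""

    x: List[float]
    t: List[float]
    y: List[List[float]]  # y[n] is the nth spectrum

    @classmethod
    def from_spectrum_list(cls, spectrum_list):
        first = spectrum_list[0]
        assert all(s.x == first.x for s in spectrum_list), "spectra must share x"
        ordered = sorted(spectrum_list, key=lambda s: s.tstamp)
        return cls(x=first.x, t=[s.tstamp for s in ordered], y=[s.y for s in ordered])

    def __getitem__(self, n):
        return Spectrum(name=f"spectrum {n}", x=self.x, y=self.y[n], tstamp=self.t[n])
```

In this sketch the series sorts its spectra by tstamp, which is a design choice of the example rather than something the PR states.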

With this PR, the basic structure for combining a SpectrumSeries with time-resolved spectral data, and a Measurement with a list of time-resolved values, is a SpectroMeasurement. This is implemented as a subclass of Measurement with an attached SpectrumSeries. The reason for this choice, rather than vice versa, is that Measurement is a more tried-and-tested structure in ixdat, and will usually contain the "control variables" in hyphenated techniques like EC-XAS. This also matches a design decision taken a year ago at Spectro Inlets, here, slide 13, and corrects the faulty design I used (in my defense, I was led into it by the raw data file structure) later last year for EC-Optical spectroelectrochemistry data.

SpectroECMeasurement is, with this PR, implemented as a specific case of SpectroMeasurement but still as general as possible. The peculiarities of the EC-Optical measurements done at MSRH, such as always plotting optical density (-log(y/y0)) rather than raw data (y), are moved to a sub-class of SpectroECMeasurement, ECOpticalMeasurement.
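
The inheritance choice described above can be sketched as follows. This is a simplified illustration, not the code in this PR: the constructor arguments and the optical_density method are hypothetical, the intermediate SpectroECMeasurement level is skipped for brevity, and the base-10 logarithm is an assumption (the text only says -log(y/y0)).

```python
import math


class Measurement:
    """Hypothetical stand-in: named value series on a shared time axis."""

    def __init__(self, *, t, values):
        self.t = t
        self.values = values  # e.g. {"potential": [...]}


class SpectroMeasurement(Measurement):
    """A Measurement with an attached series of spectra (here a list of lists)."""

    def __init__(self, *, t, values, spectrum_t, spectrum_y):
        super().__init__(t=t, values=values)
        self.spectrum_t = spectrum_t
        self.spectrum_y = spectrum_y  # spectrum_y[n] is the nth spectrum


class ECOpticalMeasurement(SpectroMeasurement):
    """Adds the technique-specific view: optical density relative to a reference."""

    def optical_density(self, n, reference_n=0):
        y0 = self.spectrum_y[reference_n]
        y = self.spectrum_y[n]
        # Base-10 logarithm is an assumption of this sketch.
        return [-math.log10(yi / y0i) for yi, y0i in zip(y, y0)]
```

Because the spectra hang off the Measurement, everything that already works on a Measurement (the control variables and their time axis) keeps working unchanged on the hyphenated object.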

The relevant tables from ixdat are drawn here with dbdiagram. A couple of things might not be immediately intuitive:

Spectrum and SpectrumSeries objects are both in the table "spectra". Their distinction is defined by the dimensionality of their Field only, and so does not require a second table.
The Field containing the 1-D y data of a Spectrum, the Field containing the 2-D y data of a SpectrumSeries, the DataSeries containing the 1-D x data of either, and the TimeSeries containing the 1-D t data of a SpectrumSeries's Field are all in the table "data_series". Their differences are defined by whether or not their id appears in one or more rows of the "axes_series" table. For a 1-D field, the "axes_series" table will link it to one other data series. For a 2-D field, the "axes_series" table will link it to two other data series.

This is reflected, approximately, in the class attributes extra_column_attributes and extra_linkers in the new classes. However, this will be overwritten following the "metaprogramming" changes in progress in #75 and #83. I have not tested any of the new classes against saving and loading in the directory backend, and expect they would fail (though I have tested ECOpticalMeasurement against exporting and re-importing). So please ignore mistakes in the present defining tables code.
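
The table layout described above, where a series' dimensionality follows from how many rows link it in "axes_series", can be sketched with an in-memory SQLite database. The table names come from the description; the column names (field_id, axis_number, axis_id) and the sample rows are assumptions made for this example, not ixdat's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE data_series (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per axis of a field: (field id, axis number, id of the axis series)
    CREATE TABLE axes_series (field_id INTEGER, axis_number INTEGER, axis_id INTEGER);
    """
)
con.executemany(
    "INSERT INTO data_series VALUES (?, ?)",
    [(1, "x"), (2, "t"), (3, "y of a Spectrum"), (4, "y of a SpectrumSeries")],
)
con.executemany(
    "INSERT INTO axes_series VALUES (?, ?, ?)",
    [(3, 0, 1), (4, 0, 2), (4, 1, 1)],  # 1-D field on x; 2-D field on (t, x)
)


def dimensionality(series_id):
    """Number of axes linked to a series: 0 for a plain series, 1 or 2 for a field."""
    (n,) = con.execute(
        "SELECT COUNT(*) FROM axes_series WHERE field_id = ?", (series_id,)
    ).fetchone()
    return n
```

With these rows, dimensionality(1) is 0 (x is a plain data series), dimensionality(3) is 1, and dimensionality(4) is 2, so no second table is needed to tell a Spectrum from a SpectrumSeries.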

To indicate how this structure lends itself to be built further upon, consider the following:

ECXASMeasurement, which is now implemented as a place-holder adding nothing to its parent class SpectroECMeasurement, could have methods for e.g. tracking the absorption edge as a function of potential.
Adding XRD or XPS spectra will also result in a SpectrumSeries, so minimal new code would be needed for basic support in-situ EC-XRD or EC-XPS data (just adding the hyphenated technique to TECHNIQUE_CLASSES would suffice).

A few coding challenges faced here, which you might have some ideas how to resolve:

I can't figure out how to structure the Plotters: should they inherit from each other like the techniques (implemented now with ECOpticalPlotter inheriting from SpectroECMeasurement), or should they own instances of each other (implemented now for ECMSPlotter, an instance of which has its own ECPlotter and MSPlotter to generate the lower and upper panels, respectively)?
__init__ statements give some trouble. I had to do the same awkward call of both parent class __init__s with redundant arguments for SpectroECMeasurement as is done with ECMSMeasurement.
Another __init__ problem: if a Measurement's Plotter is initiated in its __init__ (which is needed for the way we get its plot functions to have the right docstrings), then the Plotter doesn't have access to all of the Measurement's attributes. Thus the awkward initiation of a SpectrumSeriesPlotter without its SpectrumSeries on line 18 of sec_plotter.py.
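
For reference, the two Plotter options weighed in the first point can be put side by side in a toy sketch. The class names follow the ones mentioned above where possible, but the bodies are placeholders that return strings instead of drawing, and SpectroECPlotter as the parent class is an assumption of this example.

```python
class ECPlotter:
    def plot(self, measurement):
        return f"EC panel for {measurement}"


class MSPlotter:
    def plot(self, measurement):
        return f"MS panel for {measurement}"


class ECMSPlotter:
    """Composition: owns one plotter per panel and delegates to them."""

    def __init__(self):
        self.ec_plotter = ECPlotter()
        self.ms_plotter = MSPlotter()

    def plot(self, measurement):
        # Upper panel first (MS), then lower panel (EC).
        return [self.ms_plotter.plot(measurement), self.ec_plotter.plot(measurement)]


class SpectroECPlotter:
    """Hypothetical parent plotter for the inheritance option."""

    def plot(self, measurement):
        return f"spectra and EC panels for {measurement}"


class ECOpticalPlotter(SpectroECPlotter):
    """Inheritance: reuses the parent's plot and adds technique-specific methods."""

    def plot_optical_density(self, measurement):
        return f"optical density for {measurement}"
```

Composition keeps each panel's plotter reusable on its own, while inheritance gives the subclass all parent plot methods without delegation code. Which trade-off suits ixdat better is exactly the open question raised above.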

Otherwise, it was all pretty nice to code.
Hope it makes for a nice review!

In addition to the heavy spectrum stuff, there are also a few treats in here for present EC and EC-MS users (the main one being `CyclicVoltammogram.plot_cycles`), and XPS and XRD readers for me, which should still be improved eventually as per the opening comment of this PR, but which I now think is out of scope for this PR. I hope we can get this merged quickly!

Thanks :)

KennethNielsen

Ok, so I finished. A couple general things. I couldn't quite figure out the expected level of the read, since I couldn't figure out whether someone else would read it/has read it thoroughly. I read it like I normally would, but didn't check the details of whether all the semantics make sense with regards to the objects used i.e. read through of the semantics mostly dealt with whether the code locally made sense.

That being said. A few general points:

I remember that you before used keyword-only arguments to great effect. I think that is a magnificent idea. Both because we sometimes change signatures and because it improves readability at call site. So maybe consider sprinkling some of that on. Especially e.g. I think more or less all plot functions should make all KW args KWO args.
As for the DB diagram: aside from some concerns about what is possible (which we will talk about later), I would suggest making a calculated spelling error and calling the table "spectrums", so that we can keep the id_columns naming conventions simple, as per my comments in the other PR. I know that misspelling on purpose will annoy some people, but in this case, it really is practical.
I think maybe you ran out of steam, but the code seems to be a little light on the doc strings on the user facing methods towards the end.
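
As a small illustration of the keyword-only suggestion in the first point: a bare `*` in a Python signature forces every argument after it to be passed by name. The function and parameter names below are hypothetical, not ixdat's actual plot signature.

```python
def plot_measurement(measurement, *, tspan=None, axes=None, legend=True):
    """Hypothetical plot function: everything after * must be passed by name."""
    return {"measurement": measurement, "tspan": tspan, "axes": axes, "legend": legend}


# Readable at the call site, and robust if the order of the options changes later:
result = plot_measurement("my measurement", tspan=(0, 10))

# plot_measurement("my measurement", (0, 10)) would raise a TypeError instead.
```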

NEXT_CHANGES.rst

development_scripts/reader_testers/test_avantage_reader.py

development_scripts/reader_testers/test_qexafs_reader.py

src/ixdat/measurements.py

src/ixdat/readers/xrdml.py

src/ixdat/spectra.py

KennethNielsen · 2022-06-08T13:37:59Z

src/ixdat/spectra.py

+        self._xseries = None
+        self._spectra = None
+
+    @property


I confess to being a little confused about the way that fields and xseries interact with self._xseries. It seems like there are different algorithms for calculating what will be stored in the property, depending on whether you ask for fields or xseries first.

No wait, it is the same result. Then I'm more curious about why it is being calculated in fields and all and not just either not handled at all or retrieved via the xseries property.

Oh, you once again caught my sloppy not-properly-tested coding. I've fixed up the method and added some comments. But the mistakes I can see had to do with the assert being in the wrong place and a forgotten _, and don't actually address your specific question. To that:

_xseries was set when the property fields was called. All the fields have to have the same or equivalent xseries. So the property xseries did indeed work like the typical cache. I've rewritten it though to be explicit.

Ah ok. I think my main trigger was the part where the cache was being written on in two methods. That is now gone, but what I'm still (subjectively) slightly wondering is if it wouldn't be cleaner with:

    def fields(self):
        """Make sure Fields are loaded and have the same xseries"""
        xseries = self.xseries
        for i, f in enumerate(self._fields):
            if isinstance(f, PlaceHolderObject):
                # load or "unpack" any fields for which only the id's were loaded:
                self._fields[i] = f.get_object()
            # If all the xseries are the same, every field should have an xseries
            # identical to the first one retrieved via self.xseries
            assert self._fields[i].axes_series[0] == xseries
        # Now we've loaded any place-holder fields and checked their xseries are equal.
        return self._fields

That does look cleaner, but the problem is that the self.xseries property calls self.fields the first time, so it would be an infinite recursion.
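
One possible way around the recursion, sketched here under assumptions rather than taken from the PR, is to have xseries read only the first entry of the private list, so that fields can safely call xseries. PlaceHolderObject, Field and FieldHolder below are minimal stand-ins written for this example.

```python
class PlaceHolderObject:
    """Hypothetical stand-in for an object of which only the id was loaded."""

    def __init__(self, obj):
        self._obj = obj

    def get_object(self):
        return self._obj


class Field:
    def __init__(self, axes_series):
        self.axes_series = axes_series


class FieldHolder:
    def __init__(self, fields):
        self._fields = fields
        self._xseries = None

    def _load(self, i):
        """Unpack field i if it is still a place holder, then return it."""
        if isinstance(self._fields[i], PlaceHolderObject):
            self._fields[i] = self._fields[i].get_object()
        return self._fields[i]

    @property
    def xseries(self):
        # Touches only the first field directly, never self.fields: no recursion.
        if self._xseries is None:
            self._xseries = self._load(0).axes_series[0]
        return self._xseries

    @property
    def fields(self):
        """Make sure all fields are loaded and share the same xseries."""
        xseries = self.xseries
        for i in range(len(self._fields)):
            assert self._load(i).axes_series[0] == xseries
        return self._fields
```

The cache is then written in exactly one place (xseries), and the consistency check lives in exactly one place (fields).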

src/ixdat/spectra.py

ScottSoren · 2022-06-08T19:04:17Z

Hi @KennethNielsen , thanks for the review! As usual, you caught a lot of mistakes and omissions and the code improved as a result. I've responded to each of the comments and only left a few unresolved where I couldn't immediately implement the suggestions.
I completely agree with your top-level comments as well. I've made all the spectrum and spectromeasurement plotting functions keyword-only, and I've changed the table name from "spectra" to "spectrums". I added a note on grammar in the spectra.py module docstring justifying the use of two distinct plurals for spectrum. I've updated the dbdiagram:
https://dbdiagram.io/d/629f3b5b54ce2635277667f5
Let me know if it looks good to you.

KennethNielsen

Looks great. There is one comment that I unresolved, but with a subjective comment, so you decide whether you think it is better. No matter what, I consider it ready for merge on my account.

Caiwu-L

I mainly just reviewed the qexafs reader. It all makes sense to me, and I haven't found any problems. Using QexafsFFI0 as y when calling with technique 'XAS' is right for samples measured in fluorescence mode, and this mode is usually used in operando EC-XAS experiments.
There is also a transmission mode, which is usually used to measure ex-situ standard samples. It has the same data format as the current data; one just needs to use lnIt/I0 instead of QexafsFFI0 as y in this case.
And finally, I failed to figure out the 'span problem' we saw today in spectrum_plotter :)
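
The choice of y column by measurement mode could be captured in a small lookup, sketched below. The two column names are the ones mentioned in this comment; the mode strings, the helper name, and the idea of passing a mode to the reader are assumptions of this example, not part of the current qexafs reader. The second mode is written here as "transmission", which is an assumption about the intended term.

```python
# Column to use as y for each measurement mode (mode names are hypothetical).
Y_COLUMN_BY_MODE = {
    "fluorescence": "QexafsFFI0",  # usual choice for operando EC-XAS
    "transmission": "lnIt/I0",  # usual choice for ex-situ standard samples
}


def select_y_name(mode="fluorescence"):
    """Return the name of the column to use as y for the given mode."""
    try:
        return Y_COLUMN_BY_MODE[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(Y_COLUMN_BY_MODE)}"
        ) from None
```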

ScottSoren and others added 8 commits April 12, 2022 14:18

write an xrdml reader

4b69bcf

Merge branch 'main' into xrdml

b7a52a3

place for avantage reader

c853c17

can specify label for MSPlotter.plot_measurement

458f262

avantage reader with old 2019 pyThetaProbe code

9e3feba

CyclicVoltammagram.plot_cycles

1f2e900

ixdat 0.2.3dev

404d0e9

debug PVMS reader, prepare pre-release

8e8973a

ScottSoren requested a review from KennethNielsen May 18, 2022 08:18

pass flake8

a022928

ScottSoren mentioned this pull request May 18, 2022

Error when plotting ECMSMeasurement #70

Closed

ScottSoren added 16 commits May 18, 2022 10:41

select_values docstr #72 and plot_measurement warnings #70

487a5f6

prep for qexafs reader

d0820eb

XAS Spectrum and MultiSpectrum reading

80caa0a

Spectrum.read_set for spectrum series

af330a4

get_file_list helper function for read_set's

edf77fd

adding: measurement + spectrum_series

a17cfa4

fix spectrum plotter y direction

38c6d9f

hyphenation of EC and XAS

d5a87a8

SpectroMeasurement inheritance structure

664f011

get MsrhSECReader working with updated SEC structure

3152238

aux files as objects not just series

c67f726

update Measurement and SpectrumSeries adding

a0353b0

fix SpectroMeasurement plotters

d2becc2

fix multispectrum indexing names

6166e6a

update NEXT_CHANGES

f66e1d3

remove unused imports

bcb2e99

ScottSoren requested a review from Caiwu-L June 7, 2022 12:35

ScottSoren changed the title ~~v0.2.3. Two spectrum readers, CV.plot_cycles, PVMS debug~~ v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles Jun 7, 2022

table definitions update

ff54d5f

ScottSoren force-pushed the xrdml branch from be32cb2 to ff54d5f Compare June 7, 2022 15:22

KennethNielsen requested changes Jun 8, 2022

ScottSoren added 2 commits June 8, 2022 17:49

implement Kenneth's comments on #73

f9131cc

finish implementing Kenneth's review of #73

a80efba

ScottSoren requested a review from KennethNielsen June 8, 2022 19:05

KennethNielsen approved these changes Jun 9, 2022

Caiwu-L reviewed Jun 9, 2022

ScottSoren and others added 2 commits June 10, 2022 14:43

debug: xspan and MultiSPectrum

8a55120

Merge branch 'main' into xrdml

e5e4830

ScottSoren merged commit fc8f267 into main Jun 10, 2022

ScottSoren pushed a commit that referenced this pull request Jun 10, 2022

Merge pull request #73 from ixdat/xrdml

6e63b56

v0.2.3: Spectrum readers, spectroelectrochemistry refactor, CyclicVoltammogram.plot_cycles
