Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to implement units? #146

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

How to implement units? #146

wants to merge 2 commits into from

Conversation

ScottSoren
Copy link
Member

@ScottSoren ScottSoren commented Dec 11, 2023

Long overdue enhancement finally in progress!

This is a request for comment on how best to implement units.

So far I've just given the Unit class that had been included without content in ixdat an attribute u which is the pint unit, and wrote functions for the minimal amount of manipulation such that grab can take a unit_name and a t_unit_name as an arguments.

The API enhancement is demo'd here: https://github.com/ixdat/ixdat/blob/units/development_scripts/demo_units.py . Try it out!

(A lot of the code change has to do with debugging the way the references to axes_series are passed on when a Field is copied. Before, it would erroneously try to load an axis series from a directory backend. Now it correctly retrieves it from the MemoryBackend.)

Is this the way we want units to be used in ixdat? What additional methods do we need? As an incomplete lists (please add!):

  1. all plotting functions which take units as inputs should use pint for the unit conversions (MS plotter doesn't allow "mA" as unit  #65)
  2. A unit specification should be able to determine whether normalized or raw current is returned or plotted (Unit specification on reused axes #130) Help! How do you see this being implemented??
  3. DataSeries appending should try a unit conversion when units don't match and raise an error if a conversion can't get them to match
  4. More plotting functions should take units arguments (Enable choosing "unit" also in EC plotter  #63) .... is there a smart way to implement this for many plotting functions at once?

@AnnaWiniwarter
Copy link
Contributor

Hi Soren,

Amazing that you're tackling this finally. pint looks really good!
I think the simple API for grab makes sense.
I think in addition to that there should be a global way to set your prefered units for a Measurement object. Like, when you import (or any time after), you should be able to choose units for the data, very similar to how you add a calibration (or a background once we've implemented that). So that then, when you write Measurement.plot() it will automaticallt plot the time in hours and the current in mA. Maybe even pass the order of magnitude on, if you do anything to the data. I.e. if you select mA for the MS signal, and you calibrate it, it should choose nmol/s (or mmol/s if that's less confusing, but right now its A to umol/s, so thats why I thought it might make sense to keep the scaling factor similar).
Does this make sense?

Ad your points:

  1. Yes, agreed. Pint should be everywhere.
  2. No idea how to implement (unless, can you make it part of the label? and then read in the label when you add to an axis?) but should definitely be there
  3. Example for that? Is that a real issue?
  4. Maybe this would be solved with the units being a property of the Measurment?

@AlexanderKrabbe
Copy link
Contributor

I really think pint is a nice unit library.

I think the unit and unit conversion should lie in the measurement class and partially in the plotting classes and not when one use .grab()

I think it is more intuitive that Measurement can set a specific unit for the various dataseries Measurement.set_unit(unit_names)

And it seems like pint has a nice implementation of plotting on Matplotlib plots using units: https://pint.readthedocs.io/en/stable/user/plotting.html

It looks like, there is a lot of converting in the TimeSeries if a TimeSeries is stored in Hours and plotted in Hours then the tseries is first converted to seconds to add the tstamp of the tseries and subtract the tstamp of the measurement before converted back into hours all three operation creating a new copy of the data. I don't know if this is memory costly, but it seems like it could be more elegant.

ad 1, by using pint, should we then incoporate it in the data or simply just in the unit? example now is that you use ureg.convert(DataSeries, "IXDAT UNIT NAME", "USER UNIT NAME") instead of DataSeries * ureg.unit_name which would give DataSeries.to("USER UNIT NAME") for conversion. This might be an issue to store the data in a database, if the data is too tightly integrated with pint becoming a <Quantity(1e-9, "A")>?

ad 2, I am not sure, what the issue really is here? If the data is normalised to something wouldn't this change the unit, hence this is already in the units if it is raw or normalised data?

ad 3. This would be the optimal operation when appending new data.

ad 4. If the pint plotting with matplotlib is used I think this could be implemented in the base_mpl_plotter instead of the individual plotting classes?

@ScottSoren
Copy link
Member Author

Thanks, @AlexanderKrabbe and @AnnaWiniwarter for the thoughts here!

Responses

  • On default units. Brilliant, @AnnaWiniwarter and @AlexanderKrabbe!. I think it will just be a dictionary measurement.default_units which grab checks if no units are specified. So, only if no unit is requested and no default unit is available will grab return the numpy array in the units it's stored in (the default in the database). Measurement classes could start with a hard-coded default units dictionary that the user could overwrite.
    • API: meas.set_default_unit("current", "mA/cm^2") and, to check, print(meas.default_units).
      MSMeasurement would have to have an extra method to simultaneously put in a default unit for all (or a list of) masses.
    • API: ms_meas.set_default_signal_unit("nA") and ms.set_default_flux_unit("pmol/s", mol_list=["H2", "O2"])
  • On why grab converts time data to seconds first and then back to the requested unit:
    When an ixdat Measurement (meas) plots, exports, or grabs its data, time is always with respect to meas.tstamp. However, the absolute time of each data point is saved in the relevant TimeSeries (tseries) by means of tseries.tstamp. In each case, tstamp is unix time in [s], because that is how unix time is always stated. However, the time data of the tseries might be in another unit, and the user might be requesting data in another unit. Two ways to deal with this.
      1. put the pint unit ureg("s") onto both tstamps before adding the one from tseries and subtracting the one from meas, and let pint do the conversions needed to correctly add and subtract.
      1. convert everything to [s] before adding and subtracting the magnitudes.
        I've gone with (2), but could do (1) instead if you have an argument why it's better or more intuitive?
  1. @AlexanderKrabbe , I'm not sure I understand - is the question why the pint unit of a DataSeries (series) is series.unit.u instead of just series.unit? It could be the latter, I guess. Leaving the existing Unit class in ixdat as a wrapper just felt more safe... I can't explain exactly why right now, but things that come to mind are (i) deciding what to do when units are None and (ii) making sure that series.unit_name (which is what gets saved) is the most sensible format for the database. Both of those could be handled by more extensive property getters and setters of DataSeries, but having a specialized class to wrap it seems more pythonic for now. But pint could be better integrated. One low-hanging fruit is just a property DataSeries.quantity which returns self.data * self.unit.u. Other ideas of how to better integrate?
  2. Yes, for example, "mA/cm^2" is normalized current and "mA" is non-normalized current. But who (what method) in ixdat should have the job of parsing those strings and deciding which DataSeries to look up or which Calibration to look for the quantity in?
  3. Biologic is kind to us and makes current in all of its text exports in mA. But it's not hard to imagine a potentiostat that exports in different units depending on the technique or I-range, on files that we could want to append after. It would also serve as a sanity check.
  4. Ooh, that matplotlib with pint thing is very cool. Will give it a try :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants