Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition post-processing pipeline infrastructure to the Hamilton library #387

Merged
merged 11 commits into from Mar 1, 2024

Conversation

leroyvn
Copy link
Member

@leroyvn leroyvn commented Jan 16, 2024

Description

Not ready for merge

This PR transitions our post-processing pipelines to the Hamilton library. This addresses the following issues we had:

  • The old pipeline steps are objects with hidden state, which makes understanding what they do, testing them and debugging them very complicated.
  • Adding data checks would have required additional implementation work; Hamilton has a validation framework, which we don't use yet, but it will be there when we'll have time to work on this.
  • Hamilton has graph representation features which we can now use to visualize the post-processing pipeline.
  • It is now very simple to intercept intermediate data and execute only part of the pipeline. This, in particular, makes it possible to get raw data and delay post-processing if the user wants to do so.

Changes are as follows:

  • Registered modes are now stored as Mode instances (allows to build filtered mode lists based on mode flags).
  • The mi_render() function was fixed to solved an array-to-int conversion bug.
  • DistantFluxMeasure can now be instantiated with the factory alias distantflux.
  • The BinSet class's interface is extended with properties returning bin bounds and centres.
  • The BinSet and WavelengthSet classes now inherit from a common parent type SpectralSet. This is required to use these types in a Hamilton pipeline.
  • The spectral subpackage got a testing makeover and underwent some refactoring to make it easier to maintain.
  • The MultiDeltaSpectrum underwent refactoring and a testing makeover for improved maintainability.
  • The post-processing infrastructure now uses the Hamilton framework.
  • A detailed tutorial on post-processing pipelines is now included in the documentation.

To do

  • Add API docs
  • Verify that tutorials still run as expected
  • Add user guide / tutorial
  • Document changes

Checklist

  • The code follows the relevant coding guidelines
  • The code generates no new warnings
  • The code is appropriately documented
  • The code is tested to prove its function
  • The feature branch is rebased on the current state of the main branch
  • I updated the change log if relevant
  • I give permission that the Eradiate project may redistribute my contributions under the terms of its license

@leroyvn leroyvn force-pushed the hamilton branch 2 times, most recently from 5abcec7 to 2ccd7c5 Compare January 18, 2024 17:00
@leroyvn leroyvn changed the base branch from main to next January 18, 2024 17:01
@leroyvn leroyvn force-pushed the hamilton branch 3 times, most recently from 2da5eea to 3052f69 Compare January 24, 2024 08:49
@leroyvn leroyvn force-pushed the hamilton branch 8 times, most recently from a354683 to b4d9e1c Compare March 1, 2024 11:13
@leroyvn leroyvn marked this pull request as ready for review March 1, 2024 12:16
@leroyvn leroyvn force-pushed the hamilton branch 3 times, most recently from ad3ca43 to 20acde2 Compare March 1, 2024 15:34
This commit modifies the mode handling infrastructure in two ways:

* It changes the mode registry value type from dict to Mode. This, in
  practice, has the modes() function return Mode instances instead of
  the keyword arguments used to generate them.
* It adds the possibility to filter the output of the `modes()`,
  function typically using lambdas.
… WavelengthSet

This commit adds several properties to the `BinSet` class. In 
particular, the `wavelengths` property returns the central wavelengths 
of all bins in the bin set and serves a purpose similar to
`WavelengthSet.wavelengths`.

For compatibility with the Hamilton framework, we also introduce a 
parent class for `BinSet` and `WavelengthSet`.
…ate bin selection tests

This commit contains three changes:

* The BinSet.arange() function is refactored to leverage vector 
  calculus idioms and allow passing scalar values.
* The MultiDeltaSpectrum class is fixed (bin selection routine had a bug
  and would raise if many wavelengths were passed), refactored (the 
  `wavelengths` constructor argument is now required) and tested.
* The tests for bin selection methods are relocated together with their
  parent class and refactored (there was a lot of fat to cut!).
The bin set generation logic was split based on argument type using
single dispatch. This implementation was overly complex, given that we 
only need to handle xarray datasets and lists of them. The new 
implementation is much simpler and only addresses the most general case,
wrapping datasets into a list if necessary.
@leroyvn leroyvn merged commit d0a83dd into eradiate:next Mar 1, 2024
@leroyvn leroyvn deleted the hamilton branch March 1, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant