Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] - Define new object organization (Data / Results / Base / Algorithm) #291

Open
wants to merge 50 commits into
base: main
Choose a base branch
from

Conversation

TomDonoghue
Copy link
Member

@TomDonoghue TomDonoghue commented Jul 19, 2023

This is an exploratory PR for re-organizing the objects in specparam.

New Objects

Data, Results, and Base objects

This PR defines new sets of objects Data and Results, which define the core properties of managing data and managing model results respectively. These establish the object layouts for spectral models (fitting function and algorithm agnostic). This includes no information about what model to fit or how to fit it - but means any model built on top of this base should always have the same basic API for common and data elements. This also allows for a conceptual distinction between 'Data' and 'Results' elements. There are separate objects for 1D / 2D / 3D data & associated results.

Here, we also add a set of Base objects, which combine Data and Results objects, as well as adding any elements that require the combination of data & results. At this point, the Base object defines the core API / interface for the data & results components, without having an algorithm or any fit functions to fit - establishing a common platform for fitting.

Note that none of these objects are really expected to be used independently or accessed / used by the user, and even if defining a new algorithm, etc, should only need to be inherited from without really interacting with. Related, the Data objects can be used to manage / check data independently, but the Results and Base objects don't actually offer the capacity to do anything without having a defined algorithm.

Algorithm Objects

With the Base objects, we now need to define an algorithm, with the idea being that a separate object defines the algorithm - and only the algorithm. In doing so, an Algorithm object should be aware of Base (meaning, it should use and expected the methods and attributes that Base define (as a combination of Data and Results). If so, then by combining a Base and Algorithm object we get a functional spectral parameterization object, with a common core for managing data & results, that operates with a specified algorithm.

In this current PR, the SpectralFitAlgorithm object (name could definitely be changed) defines the current algorithm. If this object is used as the Algorithm object, everything should end up the same as 1.0. The idea here is if someone wants to define a new fitting algorithm, then they can define a new Algorithm object, without having to implement any of the underlying basics, and getting the same overall API out at the end.

Note that in separating out the algorithm, this PR currently exposes the same set of 'public' settings as before, but now also does allow for passing through updates to 'private' settings (see the SpectralFitAlgorithm definition).

Model Objects

At this point, we have objects that define the basic data, results, and algorithm features - and just need to combine these into a user facing object, that also defines the user facing API (doing things like printing, plotting, getting results). This is what is now in the SpectralModel, SpectralGroupModel, etc objects (now in model and group files), which, from the user perspectives, should act exactly the same as they did prior to this update.

Note - as it stands, the SpectralModel object inherits directly from the current Algorithm object, and defines the API on top of that. To put in a new fitting algorithm, one has to redefine the model objects, updated to use a new algorithm object (see outline below) - by using the same Base (& additional model methods), the API should end up identical, with a different internal fit procedure. Further updates should be able to add helpers / shortcuts to redefine model objects with new fit functions, without too much work on the user end to reconstitute the objects.

Overall Recap

From the user perspective, nothing should change, objects such as SpectralModel should work the same.

From a developer / internal perspective, we now have the following object organization:

  • Data objects: structured object for managing data (with versions for 1D, 2D, 3D, etc)
  • Results objects: structured object for managing model fit results (with versions for 1D, 2D, 3D, etc)
  • Base objects: combinations of Data & Results, plus any additional methods that cover both aspects
  • Algorithm objects: defines a fitting procedure
  • Model objects: combines base & algorithm objs to create a model object (plus some additional user facing methods)

Code Examples

Importing New Base Objects

With the caveat that there is no real use case to using the Base objects directly - there objects can be imported and defined:

# Define and add data to a BaseData object
fd = BaseData()
fd.add_data(freqs, powers)

# Define a base results object, and check modes
fr = BaseResults('fixed', 'gaussian')
fr.aperiodic_mode, fr.periodic_mode

# Check can define common base objects
base = CommonBase()
base1d = BaseObject()
base2d = BaseObject2D()

Algorithm Objects: Opening Up Access to "private" parameters

One of the original goals of trying some of the updates here was to open up explicit access to the "private" settings, for particular use cases that might want to tweak them, without complicating the Model object for standard usage. This works for that - as before, we can pass through "private" settings into the standard model object, using **model_kwargs :

fm = SpectralModel(cf_bound=1.)
fm.fit(freqs, powers)

Where this is going: defining new fit algorithms

A key goal of this update is to support the generalization of fitting, and while I think this will help with supporting new / different fit functions (updates to come), the more obvious support this adds is that it should support adding a fit procedure relatively easily. At least, as compared to previous organization, I do not think it would be very easy / obvious to define a new fit function while maintaining other more general functionality, whereas as now an interested party should be able to define a brand new Algorithm object, and plug it in (inherit from Base / put into API object) and it should work. We can add a bit of guidance of what needs to be followed for this to work - but I think it should be quite tractable.

As an outline of what it would look like to add a new fit function, see this (functional, but schematic) example:

# Define a customized algorithm object, that updates the fit procedure
class CustomAlgorithm():
    def _fit(*args, **kwargs):
        print("what if... a new fit procedure model!")
        
# Create new model object
class CustomSpectralModel():

    def __init__(self, *args, **kwargs):
        """Initialize model object."""

        BaseObject.__init__(self, aperiodic_mode='custom', periodic_mode='custom',
                            debug_mode=model_kwargs.pop('debug_mode', False), verbose=verbose)

        CustomAlgorithm.__init__(self)

# Initialize and run fitting from custom algorithm
fm = CustomModel()
fm.fit(freqs, powers)

Additional notes and examples

Below are some additional notes and examples that are somewhat tangential (the code examples are partly from older explorations, and all still work - but they are more like cool / interesting extras than anything that is actually important in relation to this update.

Having a minimal Model object

In the current version, the Base objects can be joined with an Algorithm object to form an operational minimal fit object, without the additional methods that are defined on the full model objects (basically, without having the public facing API). It is now fairly easy to create a functional 'minimal fit object' - having a functionally complete object for fitting the model, with user facing methods on top of that (reports, plots, fancy parameter access, etc) - though, again, this might have no clear use case.

This code shows how to make a minimal fit object:

# Define a minimal model object, by adding the algorithm to the base
class MinimalSpectralModel(SpectralFitAlgorithm, BaseObject):
    
    def __init__(self, **model_kwargs):
        
        BaseObject.__init__(self, aperiodic_mode=model_kwargs.pop('aperiodic_mode', 'fixed'),
                            periodic_mode='gaussian', debug_mode=model_kwargs.pop('debug_mode', False),
                            verbose=model_kwargs.pop('verbose', True))
        SpectralFitAlgorithm.__init__(self, **model_kwargs)

# Define and run model fitting on the minimal model object
fb = MinimalSpectralModel()
fb.fit(freqs, powers)

Note: although more minimal of an object, the Base object does the exact same as the previous object for fitting, so is no faster. We did chat briefly on a "minimum / speedy" version of the object / fit function that reduces the number of checks, etc that are done. I don't think there's much to do on that - we are pretty efficient on that stuff, and our time is spent fitting the actual model.

Editing a fit procedure

More nuanced than redefining a brand new function, is what was included in the previous version about editing fit approaches (including redefining Base objects). I think the current layout suggests slightly different mechanics to do so, while supporting all the same possibilities.

For example, we can customize an existing Algorithm object to modify a fit:

# Define a custom algorithm object that aliases the robust AP fit to re-use the simple fit
class NewAlgoNoRobustAP(SpectralFitAlgorithm, BaseObject):
    def _robust_ap_fit(self, *args, **kwargs):
        return self._simple_ap_fit(*args, **kwargs)
# Use some type magic to update the object
CustomModelNoRobustAP = type('CustomModelNoRobustAP', (NewAlgoNoRobustAP,), dict(SpectralModel.__dict__)) 

Interesting result - with basic simulated data, it's less catastrophic a change than I anticipated 🤷:
Screen Shot 2023-08-07 at 1 59 50 PM

Before, there was an example showing something similar by retrofitting an existing object with a new Base (now Algorithm) object. If we go in the current direction, the recommended procedure for a brand new algorithm would probably be different - but FWIW, this does still work:

# Define a customized algorithm object, that updates the fit procedure
class CustomAlgorithm(SpectralFitAlgorithm, BaseObject):
    def fit(*args, **kwargs):
        print("what if... a new fit procedure model!")
        
# Register new CustomModel class that redefine the algorithm object for SpectralModel
CustomModel = type('CustomModel', (CustomAlgorithm,), dict(SpectralModel.__dict__)) 

# Initialize and run fitting from custom algorithm
fm = CustomModel()
fm.fit(freqs, powers)

Relevant previous text (for posterity):
The above object has an overloaded fit function, and so prints out the message ("what if... a new fit procedure model!"), but otherwise has everything defined on the normal SpectralModel object! This means that be filling out the new fit function with a new procedure, one could built out new algorithm approaches, using the toolbox, but without needing to edit inside the module.

A benefit of moving in this direction (as well as being a bit more modular) is that the explicit isolation of the actual fitting from all the "management" code opens up the possibility for plugging in different fit functions to the same overlying object to interact with the model.

A heads up though - I'm not sure what is shown (relating to redefining underlying objects) is something we want to lean into exposing / promoting for people to try custom fit approaches - but I find it cool, and also it could be useful as a development feature to explore how different fit procedures work.

Thoughts & Conclusions

This update opens up our set of objects, and re-organizes the 'do it all together' approach, into a set of distinct objects, that can be combined, that should allow for developing different fit algorithms, and more easily using different fit functions. The argument from modularity is that this is a much more organized layout, even if the increased number of entities may seem (on the face of it) more complicated. In so far as specparam is a framework for spectral parameterization (more than any specific algorithm or function for doing so), I think something like this is probably what we want - but there's definitely room for tweaks and different ideas. As it stands, I think this current change is useful in terms of doing what we want it to do (opening private setting access), and has potentially useful benefits for going forward (opens up more access to defining custom fit procedures).

Reviewing

This is still at a pretty high-level stage of deciding on key strategy - so in terms of reviewing, I'm not really looking for detailed review, but rather strategy discussions - so any conceptual / strategic check-ins are more than welcome!

@TomDonoghue TomDonoghue changed the base branch from main to name July 19, 2023 04:36
@TomDonoghue TomDonoghue added 2.0 Targetted for the specparam 2.0 release. idea / discussion A potential idea to consider / discuss. labels Jul 19, 2023
@ryanhammonds
Copy link
Contributor

This looks really cool/useful! In the last case you gave, could the super class be SpectralModel? Or what is the benefit of using the base class + the type magic when defining a new class/procedure? For example, the bit where type is being used to register the new base to SpectralModel could be replaced with inheriting SpectralModel directly, I think:

class CustomModelNoRobustA(SpectralModel):    
    def _robust_ap_fit(self, *args, **kwargs):
        return self._simple_ap_fit(*args, **kwargs)

cfm = CustomModelNoRobustAP()
cfm.fit(freqs, powers)

@TomDonoghue
Copy link
Member Author

TomDonoghue commented Jul 20, 2023

@ryanhammonds - yeh, the more I think about this, the more I think I over-engineered the last part about redefining the base class lol. What you posted works, and I don't think that redefining base classes as I did adds any real benefit for this kind of changes. Overall, I think simpler approaches to define new objects and overload specific methods are possible, even with version 1.0 (in general, I think). I was having fun digging into redefining base classes and using type - I didn't know either of these things worked before digging in here and got excited.

What I find useful about this change is a) I think it makes the kind of procedure editing we're talking about more obvious in some ways (even if it doesn't necessarily clearly change what is possible), but also I'm thinking about this more prospectively - if we move this direction, SpectralModel now defines an API, BaseModel defines a particular algorithm (call it the 'fooof' algo for convenience now), but defining whole new fitting approaches would I think be much easier in this framework (even if still possible otherwise). For example - I can imagine defining a new BaseModel that fits exponential functions in linear-linear space (to test & compare), whereas that feels like it would be more annoying to try and do by editing the existing model object. There is still some work to be done across these objects to fully support this kind of generalization (passing in new underlying fit-objects), that's part of what I want to further explore along these lines (might work, might not lol).

@ryanhammonds
Copy link
Contributor

Cool! I agree with you that this is useful and in the direction of being more flexible. The distinction between the base and model/api is nice. I think it would make sense to standardize or have requirements for new/custom models. Like a new class should implement a .fit and set an _ap_fit attribute, etc, to ensure some level of compatibility with plotting/reporting tools.

@TomDonoghue
Copy link
Member Author

@ryanhammonds - yeh, fully agreed, on having some kind of standardization / template for new models. I've started to explore in this direction - recent commits look at the idea of a BaseData class that could standardize data input / management, regardless of the fit procedure, and we could consider a FitTemplate class that structures the expected / required aspects of a Fit procedure! I'll keep poking around at this and try some things out

@TomDonoghue TomDonoghue changed the title Exploration: Define a new BaseModel object [WIP] (Exploration) - Define new Base objects Aug 7, 2023
@TomDonoghue TomDonoghue changed the title [WIP] (Exploration) - Define new Base objects [WIP] (Exploration) - Define new object organization (including new Base & Algorithm objects) Aug 7, 2023
@TomDonoghue
Copy link
Member Author

TomDonoghue commented Aug 7, 2023

@ryanhammonds & @voytek - I'm tagging you back in here, since while the theme of this PR is about the same as before, the details of the changes and implementations have changed a lot as I explore. Main details are in the now heavily edited first comment - let me know of any thoughts about this: I'm looking for general reads on whether we like this direction.

Also: sorry it's so long - I've been somewhat over-writing the PR descriptions to try and explore the ideas (and hopefully the text here can / will be adapted into docs & release notes as we choose what to include).

@TomDonoghue TomDonoghue changed the title [WIP] - Define new object organization (Base / Data / Fit) [WIP] - Define new object organization (Data / Results / Base / Algorithm) Apr 10, 2024
@TomDonoghue TomDonoghue changed the title [WIP] - Define new object organization (Data / Results / Base / Algorithm) [ENH] - Define new object organization (Data / Results / Base / Algorithm) Apr 10, 2024
@TomDonoghue TomDonoghue removed the idea / discussion A potential idea to consider / discuss. label Apr 10, 2024
@fooof-tools fooof-tools deleted a comment from codecov-commenter Apr 10, 2024
@TomDonoghue
Copy link
Member Author

TomDonoghue commented Apr 10, 2024

Picking up on this!

Overall, I updated the first comment with a now up to date overview / description of the current changes.

Recent updates:

  • now that Time / Event models are merged into main, I merged that here, so these objects now use the new organization
  • cleaned up the object organization (notably, 'Fit' -> 'Results'), such that I _think _ the new organization is cleaner / clearer

For @ryanhammonds & @voytek - I'm now pretty confident that this is the the overall way to go, as I can see a way (that I'm pretty confident works) to finalize the new organization to support variable fit functions and fit algorithms. I haven't merged this yet, pending some broader user tests that nothing goes weird (given how broad the changes here). In terms of review - let me know anything you notice if you try it out / take a quick look - but since this is going to be further updated (mainly - #298), I think if the overall strategy looks good, and doesn't seem to be breaking, we can move on this, and do an overall deep-dive review for the end results after more things are merged in.

Copy link
Contributor

@ryanhammonds ryanhammonds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomDonoghue I took a quick look and here are some (maybe half-baked) thoughts:

Data objects:

An alternative approach may have been to have a single base data class that accepts an arbitrary nd-array, with the power on the last dim, or an axis arg to specify which dim power is along, defaulting to the last dim, then reshape to 2d, fit, then results back to nd. So like input shape (2, 3, 4, 100) to (24, 100), fit, then results back to (2, 3, 4, 100).

Also, the reshape approach would take a 1d and also reshape to (1, n), then fit, then reshape results to original 1d. This would force everything to 2d before fitting to allow easy parallelization and a single interface - removing the need for both SpectralGroupModel and SpectralModel. SpectralModel could handle any nd input, simplifying things on the user.

However, it looks like there is currently specific/custom processing for various dims that may require a non-general solution that I haven't wrapped my head around yet.

Custom procedures:

A custom model I've written in the past is a robust ap fit, followed by peak fitting, removing the simple ap fit, since I found a way to get robust fit to be accurate enough for a particular use case, and dropping the simple fit sped up computation a bit. Doing this seems difficult still, unless I wanted to overwrite the entire ._fit method? One possibility is:

class CustomAlgorithm(Base):
    def _robust_ap_fit(self, *args, **kwargs):
        # Custom robust ap fit code
        ...

    def _simple_ap_fit(self, *args, **kwargs):
	pass

# In SpectralFitAlgorithm._fit
...

aperiodic_params_ = self._simple_ap_fit(self.freqs, self._spectrum_peak_rm)

if aperiodic_params_ is not None:
    self.aperiodic_params_ = aperiodic_params_

...

Currently self.aperiodic_params_ would get updated with None if I overwrite simple_ap_fit, I think. If doing this is possible with the PR, sorry for not digging deeper! Also, maybe this is too specific of a use-case.

specparam/objs/data.py Show resolved Hide resolved
@fooof-tools fooof-tools deleted a comment from codecov-commenter Apr 30, 2024
@fooof-tools fooof-tools deleted a comment from codecov-commenter May 4, 2024
@fooof-tools fooof-tools deleted a comment from codecov-commenter May 4, 2024
@fooof-tools fooof-tools deleted a comment from codecov bot Jun 8, 2024
@fooof-tools fooof-tools deleted a comment from codecov bot Jun 8, 2024
@fooof-tools fooof-tools deleted a comment from codecov bot Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0 Targetted for the specparam 2.0 release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants