Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The API is not great #24

Closed
KronosTheLate opened this issue May 11, 2023 · 14 comments
Closed

The API is not great #24

KronosTheLate opened this issue May 11, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@KronosTheLate
Copy link
Contributor

KronosTheLate commented May 11, 2023

EDIT: See later comments, as the proposed API has changed significantly during this thread.

I find the current implementation of

  1. finding peaks
  2. redefining pks with the return values from peakproms and peakwidths
    to be rather weird and un-ergonomic. Something like what is implemented in https://github.com/tungli/Findpeaks.jl seems way more inntuitive for me, where there is only one function, and the peaks can be filtered by prominence and width directly by keyword arguments.

If a redo is on the table, I just want to mention an idea I have had:
With the package called "Peaks", I was expecting "findpeaks" to be the main function. It could return a FoundPeaks object (Can not be same as package, perhaps other names for the type is better) that contains the peaks, the indices, the prominence, and the widths. It would also be nice to be able to provide the x-values, and get the corresponding x-values for the peaks and widths in terms of x-values, rather than having to work with indices always. Ofc the indices should also be part of the returned object.

Then, the API could be something like

xs = 0:0.1:10
ys = rand(length(xs))
pks = findpeaks(ys)  # Only have indices as x-units
pks = findpeaks(xs, ys)  # Return x-coordinate of peaks, and width in units of the x-coordinate

# Access fields, reads almost like english words.
# Internal implementation could be pks.fieldname, allowing 
# internal changes to be non-breaking in the future to remain flexible

index(pks)
location(pks)  # for getting x-vals
width(pks)
width(pks, ind=false)  # kwarg to select indices or x-vals as unit
prominence(pks)
seperation(pks)  # Or just exemplify with  `diff(index(pks))` or `diff(location(pks)`
@halleysfifthinc
Copy link
Owner

I acknowledge there are some minor inconveniences due to the API design, but I don't think the alternatives I'm aware of (e.g. a struct) would be better overall.

  • Single function with filtering by kwargs VS separate filtering functions
    • tl;dr: Separate filtering functions allow controlling the order (wrt. filtering by prominence, width, height) and behavior (strict or not) of operations, which isn't possible if the filtering is done in the main function (ie in a fixed order).
    • In my experience, there are often peaks I wish to filter/remove that have some similar characteristics (prominence, width, or height) to other peaks I wish to keep. With separate filtering functions, I can choose a filtering order/behavior that removes unwanted peaks using dissimilar characteristics before filtering by the similar characteristics (e.g. and can choose a more relaxed threshold for the similar characteristics, now that there are no bad apples that would slip through). In contrast, a combined filtering function would potentially apply the filter of similar characteristics first, and would apply the same strictness setting to all filters.
    • I'm open to a PR that adds a function with all the filtering functionality built-in, but it would need to use an existing filter order (ie following MATLAB and/or scipy, hopefully they agree).
  • Returning peaks as a struct
    • Unnecessarily complex imo, and would require a lot of care to avoid introducing a lot of extra allocations (some would be unavoidable). Working with indices directly is always going to be the simplest and most broadly composable.

redefining pks with the return values from peakproms and peakwidths to be rather weird and un-ergonomic

Providing mutating ! functions and copying, non-mutating functions, as I do with peakproms, peakwidths, etc, is a common Julian style. Having both functions offers maximum flexibility/composability.

It would also be nice to be able to provide the x-values, and get the corresponding x-values for the peaks and widths in terms of x-values, rather than having to work with indices always.

I don't understand what you mean here.

@KronosTheLate
Copy link
Contributor Author

I don't understand what you mean here.

On this point, I was hoping to be able to provide a vector of x-values, wher peak of index i would have an x-coordinate of xvals[i], and automatically get peaks width and location in terms of x-coordinated. I have however changed my mind, as I take your point that this is less efficient, and complicated the API significantly. Perhaps a design filosophy note could be added, stating that
"This package only operates with the vector containing the signal, and it's indices. Not the corresponding values of a vector containing the independent variable. This simplifies the API, and increases efficiency. If the corresponding independent variables of peaks and their widths is needed, this is best done manually outside the functions provided in this package."

Or perhaps a much shorter of a similar message. I realize that this message is heavily taylored to what I needed to hear, and that it may not be that relevant for other users.

@KronosTheLate
Copy link
Contributor Author

tl;dr: Separate filtering functions allow controlling the order (wrt. filtering by prominence, width, height) and behavior (strict or not) of operations, which isn't possible if the filtering is done in the main function (ie in a fixed order).

This is an excellent point, and one that I have not at all thought of. My current best idea of an API is a single function findpeaks that would have somewhat complicated internals, but that would allow all that we want/need in terms of functionality and API:
It would take a keyword argument filters, a named tuple that specifies filtering options (minprom, maxprom, window (w), minwidth, maxwidth, minheight, maxheight). The order of the filters would determine the order of the filtering to allow full control. I then imagine relheight and strict to be seperate keyword arguments.

The return value could be a named tuple, containing (peakkinds, widths, proms, heights, widths, leftedges, rightedges). If the user does not want to spent computation resources calculating any of the quantities, specify a related filer as NaN, and the return will be be NaN for e.g. the widths, leftedges and rightedges. This would maintain the return type Vector{<:Number} in any case, maintaining type stability.

This requires only knowledge of one function, and much less code to perform filtering. This is a very rough idea, and I am very curious if you feel like some solution in this direction is at all possible, and if you see the value add I do from not having separate functions, and from not having to pass around the relevant vectors outside the functions.

@KronosTheLate
Copy link
Contributor Author

KronosTheLate commented May 14, 2023

I am not a fan of my earlier suggestions, and I really see the value of having ultimate controll over what is calculated and filtered by, and in what order. I believe we can achieve everything we both want by the following proposal:

The proposal

  • The starting point is a function findpeaks, a name which is more consistent with the package name Peaks.jl than the current findmaxima.
  • The return of findpeaks is a named tuple (let's say we bind it to pks for later reference), with fields inds and data.
  • There will be a set of functions that calculate and filter by features, such as peakproms!, peakheights!, peakwidths!, peakseps!.
  • Calling e.g. peakproms! on pks the then adds a field proms by Base.merge, and if min and/or max keywords are set, the contents of the named tuple will be filtered accordingly.
  • The min and max keywords will be consistent for the feature calculating/filtering functions.

This all means that the named tuple will in essence just bundle up the contents to facilitate passing the right things into the functions that filer by and calculate features. This should add negligible overhead and require no new types, while simplifying the API significantly and keep ultimate control over order of filtering. The filtering/calculating functions can also still have strict set true or false at each step for ultimate controll, and the fields and functions are consistent: peakproms adds the field proms, peakwidths adds widths (along with leftedge and rightedge), etc. The filtering/calculating functions all follow the same prefix in their name, and almost the same methods: Taking the named tuple pks as the first argument, and filtering by min and max, with only minor differences in potential additional keywords.

To take it one step further, we could make it so that if a named tuple is not passed to the filtering/calculating functions as the first argument, a function is returned that performs the action of adding the feature and filtering. This would allow easy chaining of operations:

pks = findpeaks(signal) |> peakproms!(min=3) |> peakwidths!(max=2)

That is almost as elegant as a single function with keyword arguments, but without giving up any control, performance, and without redundant calculations - all the strengths of the current API, as I see it.

A key benefit of this approach is also that you do not need to consult the docs every time to check the order of the e.g. 4 returned vectors of peakwidths, instead relying on named fields of a named tuple. There is no overhead as far as I know, and named tuples can still be indexed and destructured, almost making it a superset of the old API.

How do you feel about this approach?

Note - A function sortpeaks! would be nice to have. A rough draft is something like

"""
    sortpeaks!(pks, property)
    sortpeaks!(pks, property; inds, rev)

Sort the peaks `pks` by the property `property`. 
To discard some of the elements of the sorted peaks, supply the keyword argument `inds` to 
specify which indices in the sorted peaks you want to keep. For more detail, see `partialsortperm`, which is used internally.

To sort in reverse (descending order), set the keyword argument `rev` to `true`.
"""
function sortpeaks!(pks::NamedTuple, property::Symbol; inds=eachindex(first(pks)), rev::Bool=false)
    # check if pks has field `property`
    mask = partialsortperm(pks.property, 1:n; rev)
    for i in eachindex(pks)
        pks[i] = pks[i][mask][end-n+1 : end]
    end
    return pks
end

This function would allow to only keep the 5 widest peaks, or the 3 highest peaks, which is non-trivial without it due to having multiple vectors in pks that should be synchronized.

@halleysfifthinc
Copy link
Owner

That sounds very functional, and would also solve my only minor gripe about the current API, which is that the order of the output arguments and input arguments prevents direct splatting peakproms into peakwidths, e.g.:

pks, proms = peakproms(argmaxima(x), y))
peakwidths(pks, y, proms)

making it a superset of the old API

👌 Nice.

I don't have time to get to this in the near future, but if you put together a PR, I will certainly review it.

@KronosTheLate
Copy link
Contributor Author

Nice. I will see if I find the time to put together a proposal soon. I will initially make the current functionality internal by prefixing _'s and not exporting, and building the new API on top of that as a first approximation.

@halleysfifthinc halleysfifthinc added the enhancement New feature or request label May 15, 2023
@KronosTheLate
Copy link
Contributor Author

I have been unable to squeeze in the time to make a proposal for the API I am imagining. In a few days, I am leaving my computer behind for the summer (until about 2 weeks into september). It is crazy how a few hours of voluntary coding-work translates to weeks and months of calendar time...

I expect to give this a go, then, if you do not beat me to it. But I just wanted to tell you, so that you know that I will not be working on this for a long time.

@KronosTheLate
Copy link
Contributor Author

I have realized that adding fields to a named tuple is not possible, so mutating versions of the functions could not be made. Instead, the user will have to rebind the original variable name to the output each time a feature is added.

Also, adding fields changes the type. It is possble this makes type stability impossible, but I am not sure.

Dictionaries could solve both problems, at the cost of adding a dependance and giving up the amazing field-access syntax pks.proms, replacing it with the rather tedious `pks["proms"]. I am wondering if using a Dictionary (the benefit over Dict is that order is preserved) while fields are being added, and finally advising users to create a NamedTuple from it.

My current though is that it is better to live with unmutability and potential type instability, for the field access syntax and more simplicity.

@halleysfifthinc
Copy link
Owner

halleysfifthinc commented Aug 8, 2023

Mutating a tuple isn't possible, but same as with structs, mutable fields in a immutable struct/object can be mutated; I think that is what would be needed here? I don't believe that using a vector in a tuple copies the vector, so it should be possible to accept a NamedTuple (containing index, values, etc vectors) as an input argument, mutate those vectors, and then return a new NamedTuple (eg containing old and new data, such as peak proms) without making any copies of the original/mutated vectors?

I'm not sure type stability is a concern, since that only applies at the function return type level, and if switching to a NamedTuple design, the returned value/type would be consistent for a given function/input eltype.

@KronosTheLate
Copy link
Contributor Author

Yes, the vectors can be mutated. I was thinking more of adding fields to the named tuple when I talked of mutating.

I think you may be right - a certain inout type will have a known output type. The output type is different depending on if the field existed previously or not, but as that is reflected in the input type, it is likely not an issue. NamedTuples it is.

I will still be away from my computer for a month, but I do like having the exact plan ready ^_^

@KronosTheLate
Copy link
Contributor Author

KronosTheLate commented Sep 14, 2023

I have a draft up and running at https://github.com/KronosTheLate/Peaks.jl/tree/api_rework !

It is still missing a lot of detail, which I tried to collect into a "ToDo" section in the README. Most notably, I have just hidden the old functions by prefixing a "_", and called them internally in the new functions. I do however feel like the user-facing functions work as I imagine, and feel ready to discuss the proposed API with you (item 1 in the ToDo). If/when we get the API right, we move on to the following bullet points, and after that I believe the API will be reworked into something better!

Some example code to get started:

julia> using Peaks   # Using the api_rework branch

julia> x = Float64[1, 2, 3, 5, 5, 0, 2, 0, 8, 9, 8, 9, 0];  # If `x` is a vector of eltype int, I get an error. Should be looked into

julia> function test1(x)
           pks = findpeaks(x)
           pks = peakproms!(pks, min=0.5)
           pks = peakwidths!(pks)
           pks = peakheights!(pks)
           return pks
       end
test1 (generic function with 1 method)

julia> function test2(x)
           pks = findpeaks(x) |> peakproms!(min=0.5) |> peakwidths!() |> peakheights!()
           return pks
       end
test2 (generic function with 1 method)

julia> test1(x)
(data = [1.0, 2.0, 3.0, 5.0, 5.0, 0.0, 2.0, 0.0, 8.0, 9.0, 8.0, 9.0, 0.0], inds = [4, 7, 10, 12], proms = [4.0, 2.0, 1.0, 1.0], widths = [2.4000000000000004, 1.0, 1.0, 0.5555555555555554], edges = [(3.0, 5.4), (6.5, 7.5), (9.5, 10.5), (11.5, 12.055555555555555)], heights = [5.0, 2.0, 9.0, 9.0])

julia> values(test1(x)) .== values(test2(x))
(true, true, true, true, true, true)

@halleysfifthinc
Copy link
Owner

halleysfifthinc commented Sep 15, 2023

That example looks promising! Please open a draft PR so that I can review/add comments & suggestions.

Ideally, I'd prefer to have both interfaces (the old and the new) available if possible (for backwards compatibility, etc).

@KronosTheLate
Copy link
Contributor Author

Please open a draft PR

I am a little busy these days with school, and https://juliapackagecomparisons.github.io/. I will get around to it at some point ^_^

@halleysfifthinc
Copy link
Owner

No rush! I appreciate your interest and contributions.

@KronosTheLate KronosTheLate mentioned this issue Sep 22, 2023
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants