Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: bubble entropy (description is WIP) #384

Closed
kahaaga opened this issue Jan 15, 2024 · 4 comments
Closed

Feature: bubble entropy (description is WIP) #384

kahaaga opened this issue Jan 15, 2024 · 4 comments

Comments

@kahaaga
Copy link
Member

kahaaga commented Jan 15, 2024

The "bubble entropy" is a method that has seen quite some usage. Framed in terms of our API, the authors do two things: 1) define a new outcome space, 2) define a ComplexityEstimator based on that outcome space.

The method works by 1) embedding the input time series with dimension m, 2) for every state vector x_i, count the number of steps n_i necessary to sort a state vector in ascending order using the naive bubble sort algorithm, 3) estimate a histogram from the distribution of n_is, 4) plug it into the Shannon entropy formula.

This is repeat for dimension m+1, and the bubble entropy measure is a scaled difference between the entropies computed for dimensions m+1 and m. Therefore, it falls in the category of "entropy-like" measures, since it does not directly quantify an entropy. However, since it is compatible with our OutcomeSpace API, we should implement it as such, and define a ComplexityEstimator based on that.

A BubbleSortCount{m} <: OutcomeSpace type can be defined, where m is the embedding dimension. Since, there's always a minimum 0 and a maximum n(n−1)/2 number of swaps require to sort a vector of length n, the outcome space is finite. We can implement encode, but not decode (it isn't well-defined ... I think).

  • encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x).
  • decode(::BubbleSortCountEncoder, i::Integer) does... what?

HOWEVER: in the public docs, we nowhere state that decode must be implemented. So I'm kind of in favor of just implementing it as an encoding too, but just throwing a warning for decode. The method will still be useful, and can not only be used with the API here, but also iimmediately be used to define new multivariate measures in CausalityTools.jl if we implement codify(::BubbleSortCount).

Algorithm

Skjermbilde 2024-01-15 kl  11 42 26
@Datseris
Copy link
Member

encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x).

I don't see how this is possible. encode should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set). That's also why the encoding doesn't depend on the input data, but only one 1 element of the input data. For sorting this not the case. If the original dataset is already sorted, encode(e, x) would always return 0 irrespectively of x.

I think for this feature it makes sense to have the outcome space, but not the encoding. I find this more similar to the spectral entropy and the PowerSpectrum outcome space.

@kahaaga
Copy link
Member Author

kahaaga commented Jan 16, 2024

encode(::BubbleSortCountEncoder, x::AbstractVector) converts x into an integer in the range 0:n(n−1)/2 (the number of swaps required to sort x).
don't see how this is possible. encode should give the same encoding irrespectively of the order of the vectors in the state space set (embedded set).

encode works strictly on single state vectors, not the entire embedding. Each vector has a certain amplitude distribution, which maps directly onto an integer in the range 0:n(n−1)/2, because 0 is the minimum number of swaps for the bubble sort algorithm to render the state vector sorted, and 0:n(n−1)/2 is the maximum number of swaps.

Many state vectors in the same embedding can map to the same integer, because the number of sorting operations required to sort them is the same, even though the vectors are different. The order of the embedding does matter here, because the input to encode is a state vector, not an embedding.

I've implemented my suggestion in #390. This should be clear from the source code there

@Datseris
Copy link
Member

Oooooooooooioh now I understand what you mean by "vector to be sorted". I thought you meant: sort the vectors wtihin the embedded set. You mean: sort the elements of each statespace point, i.e., the same as getting the sortperm for the patterns.

OKay, all good, I'll look at the PR now.

@kahaaga
Copy link
Member Author

kahaaga commented Jan 16, 2024

Closed by #390

@kahaaga kahaaga closed this as completed Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants