While trying to visualize the predictions from this example, I am having difficulty understanding the `data_gen()` and `y_sample_to_frame()` functions.
From what I have read, a data generator in image processing serves to feed the model smaller chunks of the data. Is there a recommended way to predict an entire song?
1.1. On test data for one song:

```python
n_hop = 256
nsp_y = 5637632
```
I end up receiving 20 chunks of length 22022, which does not cover the entire song.
Shouldn't I need 22022 (5637632 // 256) of those?
1.2. Using `predict_generator` returns only 22022 predictions, which leads me back to question 1.
2. On `y_sample_to_frame()`:
```python
n_hop = N_HOP
nsp_y = len(y)
ret = np.array([np.round(np.mean(y[max(0, (i - 1) * n_hop): min(nsp_y, (i + 1) * n_hop)])) \
                for i in range(nsp_y // n_hop)], dtype=np.int)
```
Could you provide some comments on line 3?
In fact, I am trying to adapt your example to see how it performs on the SALAMI dataset, but understanding these two functions seems fundamental. I have found relatively little information about data pre-processing for music structure analysis.
Sorry if my questions are not clearly formulated; any extra information or sources would be helpful.
Thanks in advance
There are many heuristics (e.g., averaging the predictions, majority voting). It's hard to pick one, though.
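For concreteness, the two heuristics could be combined over overlapping chunk predictions roughly like this (a sketch with made-up helper names and array shapes, not code from this repository):

```python
import numpy as np

def average_chunk_predictions(chunk_preds, chunk_starts, n_frames):
    """Average overlapping per-frame probabilities from several chunks.

    chunk_preds:  list of 1-D arrays of frame probabilities
    chunk_starts: frame index where each chunk begins
    n_frames:     total number of frames in the song
    """
    acc = np.zeros(n_frames)
    count = np.zeros(n_frames)
    for pred, start in zip(chunk_preds, chunk_starts):
        acc[start:start + len(pred)] += pred
        count[start:start + len(pred)] += 1
    count[count == 0] = 1  # avoid division by zero on uncovered frames
    return acc / count

def majority_vote(chunk_preds, chunk_starts, n_frames, threshold=0.5):
    """Binarise each chunk's prediction, then take a per-frame majority vote."""
    votes = np.zeros(n_frames)
    count = np.zeros(n_frames)
    for pred, start in zip(chunk_preds, chunk_starts):
        votes[start:start + len(pred)] += (np.asarray(pred) >= threshold)
        count[start:start + len(pred)] += 1
    count[count == 0] = 1
    return (votes / count) >= 0.5
```

Either way, the key step is accumulating per-frame contributions from every chunk that covers a frame, then reducing (mean or vote) per frame.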
1.1. Where exactly is this happening? The original code doesn't include lines for testing other tracks, so presumably this is about how you'd implement it?
I think (!) it's to generate labels whose rate is aligned to the prediction rate.
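That is, my reading of that line (a hedged paraphrase, not the author's documentation): for each output frame `i`, average all sample-level labels in a two-hop window centred on `i * n_hop` (clipped to the signal), then round to the nearest integer class — downsampling a per-sample label array to one label per hop:

```python
import numpy as np

N_HOP = 256

def y_sample_to_frame(y, n_hop=N_HOP):
    """Downsample per-sample labels to per-frame labels.

    For frame i, average the labels in the window
    [(i - 1) * n_hop, (i + 1) * n_hop), clipped to the signal,
    then round to the nearest integer label.
    """
    nsp_y = len(y)
    return np.array(
        [np.round(np.mean(y[max(0, (i - 1) * n_hop):
                            min(nsp_y, (i + 1) * n_hop)]))
         for i in range(nsp_y // n_hop)],
        dtype=int)  # plain int; np.int is deprecated in recent NumPy
```

So a label array of `nsp_y` samples becomes `nsp_y // n_hop` frame labels, matching the network's output rate.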