My output is different from expected_outputs #15
Comments
Hi, thanks for reporting this issue. The code implemented here is not exactly the same as the code that obtained that performance. This is because we wanted to release an implementation that's compatible with One key difference that I have already identified, and that might be causing the problem you report, is #13. This means that the aggregation strategy for overlapping windows between buffers (see Figure 4 in the paper) needs to be a Hamming-weighted average instead of a plain average. I don't have a lot of free time to implement this right now, and the code that I have is not compatible with RxPY, but this is definitely something that will be fixed in the future. If you happen to have the time, I would gladly merge a PR with this fix.
I modified the code according to #13, but the results improved very little. I'd be glad to try this fix, but I don't understand what the problem is.
The problem is in this line:

    intersection = np.stack([
        buffer.crop(required, fixed=required.duration)
        for buffer in buffers
    ])

Off the top of my head, it would look something like this:

    hamming_windows = ...  # They should be temporally aligned to the buffers
    # Multiply by aligned window
    if strategy == "hamming":
        buffers = [h * b for h, b in zip(hamming_windows, buffers)]
    # Stack aligned output windows
    intersection = np.stack([
        buffer.crop(required, fixed=required.duration)
        for buffer in buffers
    ])
    # Divide by the sum of the weights
    if strategy == "hamming":
        counts = np.hstack([
            h.crop(required, fixed=required.duration)
            for h in hamming_windows
        ]).sum(axis=1)
        aggregation = np.sum(intersection, axis=0) / counts

It's not a problem with RxPY at all. What I meant is that the original code is not structured the way it should be to work alongside RxPY seamlessly.
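For anyone who wants to experiment with the idea outside the library, here is a minimal NumPy-only sketch of a Hamming-weighted average over overlapping windows. It is not the project's implementation: the function name `aggregate`, the `offsets` argument and the integer frame grid are assumptions made for illustration, and it assumes that every buffer fully covers the requested frame range.

```python
import numpy as np


def aggregate(buffers, offsets, start, end, strategy="hamming"):
    """Average overlapping windows over the frame range [start, end).

    buffers: list of arrays of shape (num_frames, num_speakers)
    offsets: index of the first frame of each buffer on a shared frame grid
             (hypothetical convention, not taken from the library)
    strategy: "hamming" for a Hamming-weighted average, anything else
              for a plain (uniform) average
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for buffer, offset in zip(buffers, offsets):
        num_frames = buffer.shape[0]
        lo, hi = start - offset, end - offset
        if lo < 0 or hi > num_frames:
            raise ValueError("buffer does not cover the requested range")
        if strategy == "hamming":
            # One weight per frame, highest at the center of the window
            weights = np.hamming(num_frames)
        else:
            weights = np.ones(num_frames)
        # Crop the weights and scores to the requested range
        w = weights[lo:hi, None]
        weighted_sum = weighted_sum + w * buffer[lo:hi]
        weight_total = weight_total + w
    # Divide by the sum of the weights, not by the number of buffers
    return weighted_sum / weight_total


# Two 5-frame windows shifted by 2 frames, aggregated over frames 2..4
first = np.zeros((5, 1))
second = np.ones((5, 1))
hamming_avg = aggregate([first, second], [0, 2], 2, 5, strategy="hamming")
plain_avg = aggregate([first, second], [0, 2], 2, 5, strategy="mean")
```

With the plain average every frame gets the same value, while the Hamming weights pull each frame towards the window whose center is closest to it.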
I'm also planning to move this aggregation feature to the functional module (see #12) so it can be used independently from RxPY.
Hi @sablea, I've been working on issue #13 (see here) and I realized that the aggregation strategy shouldn't actually matter for However, I also realized that the last chunks may not be correctly handled in the Could you please confirm that the rest of your RTTM output is identical to the expected output (i.e. from 0s-149s)?
I'm glad to see your update today! I think the new code can solve this issue well, although my test program is still running. The outputs are not exactly the same: there is always a difference of a few milliseconds in each line. If you want to check it, click this. I shared it through Google Drive. Nice work!
Thanks for the quick answer! The few-millisecond difference might be due to the bug I fixed in #16.
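To put a number on that per-line difference, a small script can pair up the lines of two RTTM files and report the largest timing shift. This is only a sketch: the function name `max_timing_shift` and the two sample lines are made up for illustration, and it assumes both files contain the same number of SPEAKER lines in the same order.

```python
def parse_segments(rttm_text):
    """Return (onset, duration) in seconds for every SPEAKER line."""
    segments = []
    for line in rttm_text.splitlines():
        fields = line.split()
        # RTTM SPEAKER lines: type, file, channel, onset, duration, ...
        if len(fields) >= 5 and fields[0] == "SPEAKER":
            segments.append((float(fields[3]), float(fields[4])))
    return segments


def max_timing_shift(rttm_a, rttm_b):
    """Largest absolute onset or duration difference between paired lines."""
    a, b = parse_segments(rttm_a), parse_segments(rttm_b)
    if len(a) != len(b):
        raise ValueError("files have a different number of SPEAKER lines")
    shifts = [
        max(abs(onset_a - onset_b), abs(dur_a - dur_b))
        for (onset_a, dur_a), (onset_b, dur_b) in zip(a, b)
    ]
    return max(shifts, default=0.0)


# Hypothetical lines, not taken from the real outputs
mine = "SPEAKER demo 1 1.000 2.000 <NA> <NA> speaker0 <NA> <NA>"
reference = "SPEAKER demo 1 1.003 1.998 <NA> <NA> A <NA> <NA>"
shift = max_timing_shift(mine, reference)
```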
OK, I just pushed a fix for the output truncation at the end, it turns out that The fix should work for For now you can try to run it from the branch Please let me know if you keep seeing discrepancies, I haven't been able to run tests on VoxConverse files.
I've recently been working on #35 and this allowed me to quickly benchmark the performance of this implementation on some of the datasets. In any case this will make it easy to benchmark other "flavors" of the pipeline as well as providing an easy-to-run baseline. I'm planning to release it as part of version 0.3.
I tried the following command to test the model:
python src/main.py path/to/voxconverse_test_wav/aepyx.wav --latency=0.5 --tau=0.576 --rho=0.915 --delta=0.648 --output results/vox_test_0.5s
But when I compared my model output with 'expected_outputs/online/0.5s/VoxConverse.rttm', I found that my output ended early.
For example:
The last line of my output is
SPEAKER aepyx 1 149.953 1.989 <NA> <NA> speaker0 <NA> <NA>
But the last line for the aepyx.wav file in 'expected_outputs/online/0.5s/VoxConverse.rttm' is
SPEAKER aepyx 1 168.063 0.507 <NA> <NA> F <NA> <NA>
They end at different times.
I don't know whether the command I entered is wrong or whether there is another cause.
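To make the gap concrete, the end time of each output can be computed from the onset and duration fields (4th and 5th columns) of the RTTM lines. The helper name `last_end_time` is made up for this sketch, and the two strings are the last lines quoted above.

```python
def last_end_time(rttm_text):
    """Latest segment end (onset + duration) among SPEAKER lines, in seconds."""
    end = 0.0
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank or non-SPEAKER lines
        if len(fields) < 5 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        end = max(end, onset + duration)
    return end


mine = "SPEAKER aepyx 1 149.953 1.989 <NA> <NA> speaker0 <NA> <NA>"
expected = "SPEAKER aepyx 1 168.063 0.507 <NA> <NA> F <NA> <NA>"

print(f"{last_end_time(mine):.3f}")      # 151.942
print(f"{last_end_time(expected):.3f}")  # 168.570
print(f"{last_end_time(expected) - last_end_time(mine):.3f}")  # 16.628
```

So my output stops about 16.6 seconds before the expected one.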