Hi! First, thank you for this well-designed benchmark!
However, I have a question about the tensor size before and after the CNN. The input to the CNN is (Cin x T x 1024) (I'm using channel-first notation for convenience), and there are three max-pooling operations with kernel sizes of 8, 8, and 4 along the last dimension. I expect this to make the CNN output (Cout x T x 4), which is inconsistent with the (Cout x T x 2) shown in your image of the network architecture. Please correct me if I'm missing something.
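To illustrate the pooling arithmetic in question, here is a quick sanity check in plain Python (a sketch of the shape calculation only, not the actual model code):

```python
# Last dimension starts at 1024; each max-pooling layer divides it
# by its kernel size along that axis.
freq = 1024
for k in (8, 8, 4):  # the three max-pooling kernel sizes
    freq //= k
print(freq)  # 4
```

Since 1024 / (8 * 8 * 4) = 4, the CNN output along the last dimension should indeed be 4, not 2.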
Besides, I'm wondering why you chose to use only a one-layer linear projection, without any non-linear activation, for the DOA prediction. Is this choice based on performance in your experiments? Thanks!
Hi @zjysteven, thanks for pointing out the error. You are right, the CNN output dimension should be (Cout x T x 4). I will update the image soon.
I remember trying ReLU and tanh activations in the penultimate layer of the DOA output, but I don't think I got good results. Similarly, adding more fully-connected layers didn't help either. Both of these studies were done for Cartesian DOA output as discussed in the original SELDnet paper, and I used the same model here with only one change, i.e., spherical DOA coordinates as output. So I am not sure whether more fully-connected layers or different activations would help for spherical coordinates.
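For concreteness, the DOA head being discussed amounts to a single affine map from the recurrent features to the angle outputs, with no squashing non-linearity. A minimal NumPy sketch, assuming hypothetical dimensions (128 recurrent features, 11 event classes, 2 spherical coordinates per class; these numbers are for illustration only):

```python
import numpy as np

T, n_feat, n_classes = 60, 128, 11      # frames, RNN features, event classes
rnn_out = np.random.randn(T, n_feat)    # per-frame recurrent features

# Single linear projection: one weight matrix, one bias, no activation,
# so the predicted azimuth/elevation values are unbounded.
W = np.random.randn(n_feat, n_classes * 2)
b = np.zeros(n_classes * 2)
doa = rnn_out @ W + b                   # shape: (T, n_classes * 2)
print(doa.shape)
```

A bounded activation like tanh would constrain the outputs to a fixed range, which may explain why it interacts differently with angle regression targets than a plain linear layer does.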
@sharathadavanne Thanks for your quick response! That makes sense: while reproducing the baseline, I used more fully-connected layers together with ReLU activation, but I couldn't achieve DOA results similar to yours. I thought I might be suffering from over-fitting, which is why I asked the question above. Thanks again for your confirmation!