Number of test examples #1
Comments
Thanks.
We can find the two missing sequences for you.
I don't think I understand your comment that the "alignment of the biological
data and sequences provided will be skewed". Can you clarify?
On Wed, Oct 31, 2018 at 5:45 PM svgsponer wrote:
Hi,
Thanks a lot for the great work. I would like to reuse the data you
provide to compare various techniques on the same task. While
preparing the dataset I found a small inconsistency, and it would be great if
you could briefly clarify it for me.
In your paper, you mention 2001 test examples, which corresponds to the
number of examples in *test_src_bio*, but *test_src* and *test_tgt* both
contain only 1999 examples. It seems two negative examples are
missing, as only 999 are available. Not a big deal, but depending on
which examples are missing, the alignment of the biological data and
sequences provided will be skewed.
Thanks a lot for your clarification.
Great, thanks a lot for the fast reaction! By the alignment of the biological data, I mean that if a missing sequence is somewhere in the middle of the test_src file, all following sequences will be off by one from the corresponding line in test_src_bio. Consequently, when I combine the two files, the wrong SCRATCH features are assigned to a sequence. Just to make sure: does the last column in test_src_bio correspond to the target variable?
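To make the off-by-one concern concrete, here is a minimal Python sketch on toy data. The sequences, the feature values, and the helper `pair_by_position` are made up for illustration; they are not taken from the repository, and the real layout of test_src and test_src_bio may differ.

```python
def pair_by_position(sequences, feature_rows):
    """Pair the i-th sequence with the i-th feature row.

    zip() stops at the shorter input, so a length mismatch is not reported.
    """
    return list(zip(sequences, feature_rows))


# Toy data (hypothetical): each feature row is tagged with the sequence
# it was computed for, so a wrong pairing can be detected afterwards.
sequences = ["AAA", "CCC", "GGG", "TTT"]
feature_rows = [("AAA", 0.1), ("CCC", 0.2), ("GGG", 0.3), ("TTT", 0.4)]

# Simulate one sequence missing from the middle of the sequence file.
incomplete = sequences[:1] + sequences[2:]  # "CCC" is dropped

pairs = pair_by_position(incomplete, feature_rows)

# Every sequence after the gap receives the features of its predecessor.
misaligned = [seq for seq, (owner, _) in pairs if seq != owner]
print(misaligned)
```

In this toy case every sequence after the dropped one is paired with the wrong feature row, which is the effect described above. If the missing examples happen to be the last ones in the file, no pairing is affected.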
Right. I will try to find it. It's been a year. Maybe it's just the first 1999 sequences from src_bio. Why don't you try running it? The code won't throw an error, because it just takes the first 1999 sequences. @raghvendra5688, pinging Raghavendra in case he remembers.
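As a side note on why no error appears: positional pairing in Python truncates silently when one input is longer. The sketch below is only an illustration under that assumption; it does not reproduce the repository's loading code, and `pair_strict` is a hypothetical helper that fails loudly instead.

```python
def pair_strict(sequences, feature_rows):
    """Pair by position, but refuse to truncate when the counts differ."""
    if len(sequences) != len(feature_rows):
        raise ValueError(
            f"count mismatch: {len(sequences)} sequences "
            f"vs {len(feature_rows)} feature rows"
        )
    return list(zip(sequences, feature_rows))


# Stand-ins for the two files: 1999 sequences and 2001 feature rows.
sequence_ids = list(range(1999))
feature_ids = list(range(2001))

# Plain zip() keeps only the first 1999 pairs and raises nothing.
silent = list(zip(sequence_ids, feature_ids))
print(len(silent))

# The strict variant surfaces the mismatch instead.
try:
    pair_strict(sequence_ids, feature_ids)
except ValueError as err:
    print(err)
```

A check like this only detects that the counts differ. It cannot tell which two examples are missing, so it does not settle whether the first 1999 rows are the right ones.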
Hi @svgsponer, which methods do you plan to run? We are writing a continuation paper comparing the latest deep learning methods on the same dataset.
Hi, great, thanks a lot! @raghvendra5688, I am currently working on various methods that learn linear models in the unlimited-length k-mer feature space, based on work done for https://github.com/svgsponer/SqLoss.
We are planning to use GANs and VAEs for the same problem. I will update you on the results when we have a draft ready.