bracket in listops task? #6
Thanks for the find. Yes, we need the brackets. If you switch the tokenizer to `tensorflow_text.WhitespaceTokenizer()`, that should do the trick for now. We will push a fix soon. Thanks :) Based on this vocab set: … the max sequence length shouldn't be more than 2K.
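To see how brackets can silently vanish during tokenization, here is a minimal sketch contrasting a plain whitespace split (the behaviour of a whitespace tokenizer such as the `tensorflow_text.WhitespaceTokenizer()` suggested above) with a tokenizer that keeps only alphanumeric runs. `alnum_tokenize` is a hypothetical stand-in chosen for illustration; it is an assumption, not necessarily the tokenizer the repository used before the fix.

```python
import re


def alnum_tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer that keeps only runs of letters/digits,
    # so every '(', ')', '[' and ']' is silently dropped.
    return re.findall(r"[A-Za-z0-9]+", text)


def whitespace_tokenize(text: str) -> list[str]:
    # Splits on whitespace only, so '[MAX', ']' and the parentheses survive.
    return text.split()


sample = "( ( ( ( [MAX 9 ) 4 ) 3 ) ] )"
print(alnum_tokenize(sample))       # ['MAX', '9', '4', '3']
print(whitespace_tokenize(sample))  # 13 tokens, brackets included
```

With the alphanumeric version the closing `]` of `[MAX` is gone, so the model can no longer tell where the operator's argument list ends.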
@vanzytay Thank you for the clarification. However, I am still confused about the length of the sequences once brackets are included. With the above setting, sequences are longer than 2K; for instance, the first sequence in the validation set has 5231 tokens. Am I missing something? Thank you. Below is that first validation sequence.
( ( ( ( ( ( ( ( [MAX 1 ) ( ( ( ( ( [MIN 3 ) ( ( ( [MED 6 ) 1 ) ] ) ) 4 ) 5 ) ] ) ) 0 ) ( ( ( ( ( ( [SM ( ( ( ( ( ( ( ( ( [MED 1 ) 6 ) 2 ) 1 ) 0 ) 8 ) 2 ) 3 ) ] ) ) 9 ) ( ( ( ( [MED 0 ) 3 ) 7 ) ] ) ) 2 ) ( ( ( ( ( ( ( ( ( ( ( [SM ( ( ( ( [MIN 8 ) 9 ) 1 ) ] ) ) 3 ) 0 ) 2 ) 5 ) ( ( ( [MED 7 ) ( ( ( ( [SM 4 ) ( ( ( ( ( ( ( [MAX 4 ) 1 ) ( ( ( ( [MAX 9 ) 4 ) 3 ) ] ) ) 1 ) 9 ) 2 ) ] ) ) 3 ) ] ) ) ] ) ) ( ( ( ( ( ( ( ( ( [MED 1 ) ( ( ( ( [MIN 0 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 7 ) 3 ) 9 ) 2 ) 6 ) 3 ) 2 ) 9 ) ( ( ( ( ( ( ( ( ( [SM 9 ) 2 ) ( ( ( ( ( ( ( ( ( ( [MIN 4 ) 3 ) 5 ) 2 ) 6 ) 2 ) ( ( ( ( ( ( ( [MIN 7 ) 2 ) 9 ) 3 ) 6 ) 6 ) ] ) ) ( ( ( [MED 3 ) 5 ) ] ) ) 2 ) ] ) ) ( ( ( ( ( ( ( ( [MED 6 ) 6 ) 5 ) 9 ) 2 ) ( ( ( ( ( ( ( [SM 0 ) 2 ) 9 ) 8 ) 0 ) 7 ) ] ) ) 6 ) ] ) ) 1 ) 9 ) 3 ) 2 ) ] ) ) 9 ) ] ) ) 7 ) ] ) ) 0 ) ( ( ( ( ( ( ( ( ( [MED 1 ) 4 ) 2 ) 0 ) 1 ) ( ( ( ( ( ( [MAX ( ( ( ( ( [MED 8 ) 2 ) ( ( ( ( ( ( ( [MIN ( ( ( ( ( ( ( ( ( ( ( [MIN 1 ) 5 ) 6 ) 9 ) 9 ) 2 ) 2 ) 5 ) 4 ) 0 ) ] ) ) 1 ) ( ( ( ( [MED 2 ) 6 ) 4 ) ] ) ) 7 ) 2 ) ( ( ( [SM 3 ) 4 ) ] ) ) ] ) ) 7 ) ] ) ) ( ( ( ( ( [MIN 1 ) 8 ) 3 ) 6 ) ] ) ) 4 ) 6 ) 6 ) ] ) ) 8 ) 1 ) ] ) ) 6 ) ( ( ( ( ( ( ( ( ( ( [SM 1 ) 7 ) 5 ) 5 ) 5 ) 6 ) 9 ) 8 ) 4 ) ] ) ) 1 ) 5 ) ] ) ) 8 ) 1 ) 1 ) ] ) ) ] ) ) ( ( ( ( ( [MAX 0 ) ( ( ( [MAX 7 ) ( ( ( ( ( [MED 9 ) ( ( ( ( [MIN 2 ) ( ( ( [SM 9 ) ( ( ( [MIN ( ( ( ( ( ( [MAX 2 ) 6 ) 8 ) ( ( ( ( ( ( ( ( [MED 6 ) 7 ) 6 ) 7 ) 7 ) 6 ) 3 ) ] ) ) 2 ) ] ) ) 6 ) ] ) ) ] ) ) ( ( ( ( ( [MIN 0 ) 4 ) 2 ) 1 ) ] ) ) ] ) ) 4 ) 1 ) ] ) ) ] ) ) ( ( ( ( ( ( [MIN 8 ) 2 ) 5 ) 6 ) 7 ) ] ) ) 9 ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [MED 2 ) 8 ) 5 ) 0 ) ( ( ( ( [SM ( ( ( ( ( ( ( ( ( [MIN 3 ) 8 ) 6 ) ( ( ( ( ( ( ( ( ( [SM 7 ) 7 ) ( ( ( ( ( [MAX 7 ) ( ( ( ( [SM 3 ) 5 ) 0 ) ] ) ) 3 ) 9 ) ] ) ) 6 ) ( ( ( ( ( ( [MAX ( ( ( ( [MED ( ( ( ( ( ( ( ( ( ( ( [MED 3 ) ( ( ( ( ( ( ( ( ( ( ( [MED 1 ) 3 ) 1 ) 4 ) 9 ) 2 ) 0 ) 5 ) 3 ) 0 ) ] ) ) 1 ) 5 ) ( ( ( ( ( ( ( ( [SM 3 ) 7 ) 5 ) 2 ) 5 ) 8 ) 3 ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [MAX 2 ) 2 ) 9 ) 5 ) 7 ) 8 ) 
3 ) 3 ) 3 ) 0 ) ] ) ) 7 ) 9 ) 8 ) 5 ) ] ) ) 8 ) 2 ) ] ) ) 8 ) 8 ) 1 ) 9 ) ] ) ) 9 ) 4 ) 5 ) ] ) ) ( ( ( ( ( ( ( ( ( [MAX 0 ) 6 ) 6 ) 7 ) 8 ) 3 ) 5 ) 4 ) ] ) ) 5 ) 9 ) 7 ) ] ) ) 9 ) ( ( ( ( ( ( ( ( ( ( ( [MAX 3 ) ( ( ( ( [MAX ( ( ( ( ( ( ( ( [SM 7 ) 0 ) ( ( ( [MIN ( ( ( ( ( ( [MAX 1 ) ( ( ( ( ( ( ( [SM 2 ) 5 ) 1 ) 4 ) 6 ) 1 ) ] ) ) 6 ) 5 ) 5 ) ] ) ) ( ( ( ( ( ( ( ( ( ( [SM 3 ) 3 ) 6 ) ( ( ( ( ( ( ( ( ( ( ( [MAX 1 ) 9 ) 9 ) 3 ) 2 ) 7 ) 5 ) 0 ) 0 ) 0 ) ] ) ) 8 ) 1 ) 4 ) 9 ) 6 ) ] ) ) ] ) ) 3 ) 6 ) 0 ) ( ( ( ( ( ( ( ( [MED 4 ) 2 ) ( ( ( ( ( ( ( ( [MAX 3 ) 8 ) 4 ) 3 ) 1 ) 3 ) 8 ) ] ) ) 7 ) ( ( ( ( ( ( ( ( ( ( ( [MED 2 ) ( ( ( ( ( [MIN 2 ) 4 ) 9 ) 2 ) ] ) ) 5 ) ( ( ( ( [MED 7 ) 6 ) 7 ) ] ) ) 2 ) ( ( ( [MIN 4 ) 9 ) ] ) ) 9 ) 1 ) ( ( ( [MIN 0 ) 3 ) ] ) ) 8 ) ] ) ) 9 ) ( ( ( ( ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( ( ( [MIN 1 ) 1 ) 7 ) 3 ) 1 ) 8 ) 4 ) 0 ) 2 ) ] ) ) 2 ) 6 ) 5 ) 3 ) 4 ) 3 ) ( ( ( [MAX 8 ) 9 ) ] ) ) 5 ) 6 ) ] ) ) ] ) ) ] ) ) 0 ) ( ( ( [MAX 6 ) 3 ) ] ) ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [SM 2 ) ( ( ( ( ( ( [MIN 2 ) 8 ) 7 ) ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( [MIN ( ( ( ( ( [SM 6 ) 8 ) 2 ) 6 ) ] ) ) ( ( ( ( ( [MIN 3 ) 1 ) 1 ) 4 ) ] ) ) 9 ) 9 ) 4 ) ] ) ) 6 ) 0 ) ( ( ( ( ( ( ( ( ( [SM 1 ) 7 ) ( ( ( ( [MAX 3 ) 5 ) 2 ) ] ) ) 5 ) 3 ) 6 ) 8 ) 8 ) ] ) ) ( ( ( ( ( ( [MAX 6 ) 6 ) ( ( ( ( ( ( ( ( [MIN 5 ) 0 ) 5 ) 1 ) 7 ) 2 ) 0 ) ] ) ) 8 ) 5 ) ] ) ) 9 ) ] ) ) 9 ) ] ) ) ( ( ( ( ( [MAX 9 ) ( ( ( ( ( ( [MED 5 ) 6 ) 2 ) ( ( ( ( ( [SM 4 ) 4 ) 1 ) 1 ) ] ) ) ( ( ( ( ( ( ( ( ( [MAX 2 ) ( ( ( ( ( ( ( ( [MAX 0 ) 9 ) 6 ) 9 ) 7 ) 5 ) 7 ) ] ) ) 3 ) 7 ) 9 ) 2 ) 0 ) 3 ) ] ) ) ] ) ) 6 ) ( ( ( [SM 6 ) ( ( ( ( [SM 8 ) 6 ) 7 ) ] ) ) ] ) ) ] ) ) ( ( ( ( ( ( ( ( ( ( [MAX 7 ) 9 ) ( ( ( ( ( ( ( [MED 5 ) 4 ) 9 ) 2 ) 4 ) 4 ) ] ) ) ( ( ( ( ( [SM ( ( ( [MAX 6 ) 8 ) ] ) ) 5 ) 8 ) 3 ) ] ) ) 9 ) 4 ) 7 ) ( ( ( ( ( ( ( ( ( ( [SM 5 ) 2 ) ( ( ( ( ( ( ( ( [MIN 6 ) ( ( ( ( ( ( ( ( [SM 7 ) 4 ) 3 ) 2 ) 5 ) 6 ) 2 ) ] ) ) 2 ) 1 ) 9 ) 4 ) 7 ) ] ) ) 9 ) 2 ) 1 ) 0 ) 9 ) 6 ) ] ) ) 5 ) ] ) ) 9 ) ( ( ( [SM ( ( ( ( ( ( ( [SM 0 ) 2 ) 
3 ) 6 ) ( ( ( ( ( ( ( ( ( ( [MED 2 ) 5 ) 1 ) ( ( ( ( ( ( ( ( ( ( ( [MED 0 ) 2 ) 6 ) 6 ) 5 ) 5 ) 3 ) 0 ) 8 ) 4 ) ] ) ) 4 ) 1 ) ( ( ( ( ( ( ( [MED 5 ) 1 ) 3 ) 8 ) 1 ) 3 ) ] ) ) ( ( ( ( ( [MAX 0 ) 3 ) 9 ) 4 ) ] ) ) ( ( ( ( ( ( ( ( ( [SM 7 ) 8 ) 4 ) 4 ) 0 ) 3 ) 8 ) 7 ) ] ) ) ] ) ) 3 ) ] ) ) 2 ) ] ) ) 8 ) 2 ) 6 ) ( ( ( ( ( ( [SM 2 ) ( ( ( ( [MAX 2 ) ( ( ( [MED 5 ) 1 ) ] ) ) 6 ) ] ) ) 3 ) 4 ) 0 ) ] ) ) ] ) ) ( ( ( ( [MIN 1 ) 3 ) 3 ) ] ) ) ( ( ( ( ( ( ( ( ( ( [SM 9 ) ( ( ( ( ( ( ( ( ( ( [SM ( ( ( ( ( ( ( [MED ( ( ( ( ( ( ( ( ( ( [MIN 2 ) ( ( ( ( ( ( ( ( [MIN 7 ) 7 ) 6 ) 2 ) 7 ) 1 ) 0 ) ] ) ) 4 ) ( ( ( ( ( ( ( ( ( ( [SM 8 ) 5 ) 9 ) 1 ) 5 ) 9 ) 2 ) 1 ) 5 ) ] ) ) 6 ) 5 ) 8 ) 4 ) 2 ) ] ) ) 2 ) 9 ) 2 ) ( ( ( ( ( [MIN ( ( ( ( ( ( ( ( ( [MED 3 ) 0 ) 5 ) 3 ) 9 ) 5 ) 2 ) 7 ) ] ) ) 0 ) 6 ) 9 ) ] ) ) 6 ) ] ) ) ( ( ( ( ( [MAX 5 ) 6 ) 9 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 9 ) ( ( ( ( ( ( ( ( ( ( ( [MAX 9 ) 1 ) 8 ) 1 ) 6 ) 5 ) 2 ) 8 ) 4 ) 6 ) ] ) ) ( ( ( [SM 1 ) 2 ) ] ) ) 9 ) 5 ) 5 ) 8 ) 6 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 1 ) 8 ) 1 ) 0 ) 9 ) 0 ) 2 ) 5 ) 5 ) 4 ) ] ) ) ( ( ( ( ( ( ( ( ( [MED 9 ) 3 ) 0 ) 3 ) 0 ) 6 ) 2 ) 4 ) ] ) ) ] ) ) ] ) ) 3 ) ( ( ( [MAX ( ( ( ( ( ( ( ( [MED 2 ) 5 ) ( ( ( ( ( ( ( ( ( [MED 8 ) 1 ) 0 ) 7 ) 0 ) 3 ) 6 ) 6 ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [MIN 3 ) 6 ) 4 ) 6 ) 7 ) 3 ) 2 ) 1 ) 0 ) 8 ) ] ) ) 4 ) ( ( ( ( ( ( ( ( ( ( ( [SM 5 ) 2 ) 5 ) 3 ) 2 ) 7 ) 9 ) 1 ) 6 ) 2 ) ] ) ) 7 ) ] ) ) 8 ) ] ) ) 7 ) 9 ) 5 ) 7 ) 8 ) ] ) ) 4 ) 2 ) 1 ) 3 ) 0 ) ( ( ( ( ( ( ( ( ( ( ( [SM ( ( ( ( ( ( ( ( ( ( ( [MAX 8 ) 4 ) 0 ) ( ( ( ( ( ( ( [MED 3 ) 1 ) 8 ) 8 ) ( ( ( ( ( ( ( ( [MAX 3 ) 0 ) 0 ) 2 ) 8 ) 8 ) 0 ) ] ) ) ( ( ( ( ( ( [SM 6 ) 4 ) 0 ) 6 ) 3 ) ] ) ) ] ) ) 7 ) ( ( ( ( ( ( ( ( ( ( ( [MED ( ( ( ( [SM 3 ) 2 ) 7 ) ] ) ) 2 ) 2 ) ( ( ( ( ( ( ( [MAX 2 ) 2 ) 5 ) 7 ) 1 ) 5 ) ] ) ) 7 ) 6 ) 8 ) 9 ) 1 ) 4 ) ] ) ) 2 ) 7 ) 5 ) 8 ) ] ) ) 8 ) 5 ) 7 ) ( ( ( ( ( ( ( ( ( ( [MED 4 ) 8 ) 4 ) 9 ) 6 ) 4 ) ( ( ( ( ( [MIN ( ( ( ( ( ( ( [SM 1 ) 9 ) 5 ) 0 ) 7 ) 7 ) ] ) ) 2 ) 4 ) ( ( ( ( ( ( [MED 6 ) 9 ) 5 ) 0 ) 8 ) ] ) ) ] ) ) 
( ( ( ( ( ( ( ( ( ( [MED 6 ) 0 ) 3 ) ( ( ( ( ( [SM 2 ) 7 ) 5 ) 5 ) ] ) ) 1 ) 3 ) 0 ) ( ( ( ( ( ( ( [MAX 6 ) 8 ) 7 ) 7 ) 9 ) 8 ) ] ) ) 2 ) ] ) ) 0 ) ] ) ) 3 ) 3 ) 6 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 1 ) 0 ) 9 ) 3 ) 2 ) 2 ) 6 ) 6 ) 5 ) 8 ) ] ) ) ( ( ( ( ( ( ( ( ( [MIN 6 ) 3 ) 5 ) ( ( ( ( [SM 5 ) 6 ) 4 ) ] ) ) 2 ) 0 ) 8 ) ( ( ( ( ( [MAX 0 ) ( ( ( ( [MED 0 ) 5 ) 0 ) ] ) ) 0 ) ( ( ( ( ( [MIN 2 ) 2 ) 1 ) 0 ) ] ) ) ] ) ) ] ) ) ] ) ) 8 ) ] ) ) 7 ) 6 ) 8 ) ( ( ( ( ( ( ( ( ( [MIN 2 ) ( ( ( ( ( ( ( ( ( [MIN ( ( ( ( ( ( ( ( ( [MIN 5 ) ( ( ( ( ( [MIN 4 ) 7 ) 2 ) 2 ) ] ) ) 4 ) ( ( ( ( ( ( ( ( ( [MAX 4 ) 3 ) 4 ) 9 ) 6 ) 4 ) 3 ) 8 ) ] ) ) 2 ) 4 ) 5 ) 3 ) ] ) ) ( ( ( ( ( ( ( ( [MED 8 ) 5 ) 5 ) 4 ) ( ( ( ( ( ( ( ( [MED 8 ) 5 ) 2 ) ( ( ( ( ( ( ( ( ( [MAX 4 ) 5 ) 5 ) 6 ) 5 ) 4 ) 6 ) 8 ) ] ) ) 2 ) 7 ) 5 ) ] ) ) 6 ) 3 ) ] ) ) ( ( ( ( ( ( ( [MIN 0 ) 4 ) 2 ) 1 ) ( ( ( ( [MIN 4 ) 9 ) 2 ) ] ) ) 2 ) ] ) ) 0 ) 9 ) 4 ) 3 ) ( ( ( ( ( ( ( ( ( ( [SM 6 ) 6 ) 4 ) 7 ) 4 ) 2 ) 2 ) 9 ) 7 ) ] ) ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( ( ( [MED 7 ) ( ( ( ( ( ( ( ( [SM 8 ) 1 ) ( ( ( ( ( ( ( ( [MED 2 ) 4 ) 7 ) 0 ) 3 ) 1 ) 4 ) ] ) ) ( ( ( ( [MIN 1 ) 9 ) 4 ) ] ) ) 8 ) 0 ) 6 ) ] ) ) ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( [SM 1 ) 4 ) 9 ) 1 ) 1 ) ] ) ) 0 ) 0 ) ( ( ( ( [MAX 7 ) 9 ) 3 ) ] ) ) 3 ) ( ( ( ( ( ( ( ( [SM 7 ) 3 ) 4 ) 8 ) 3 ) 1 ) 4 ) ] ) ) ] ) ) 1 ) 4 ) ( ( ( [MAX 1 ) ( ( ( ( ( ( ( ( ( ( ( [MAX 3 ) 3 ) 0 ) 2 ) 6 ) 4 ) 7 ) 7 ) 0 ) 5 ) ] ) ) ] ) ) 4 ) ( ( ( ( ( ( ( [MIN 9 ) 3 ) 5 ) 1 ) 6 ) 7 ) ] ) ) 3 ) ] ) ) ( ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( [MAX 4 ) 0 ) 0 ) 4 ) 1 ) 1 ) 9 ) ] ) ) 1 ) 1 ) 7 ) 9 ) 2 ) 1 ) ] ) ) 9 ) 4 ) 1 ) 7 ) 5 ) ( ( ( ( ( ( [MAX 6 ) 4 ) 9 ) 5 ) 1 ) ] ) ) 0 ) ( ( ( ( ( ( ( ( [MAX 4 ) ( ( ( [MED 2 ) ( ( ( ( ( [MIN 7 ) 7 ) 2 ) 2 ) ] ) ) ] ) ) ( ( ( ( [SM 8 ) 8 ) ( ( ( ( ( [MED 5 ) 5 ) 6 ) 4 ) ] ) ) ] ) ) 4 ) 1 ) 1 ) 7 ) ] ) ) ] ) ) ( ( ( [MIN 4 ) 6 ) ] ) ) 0 ) ( ( ( ( ( [SM 1 ) ( ( ( ( ( ( ( ( ( ( [MED 2 ) ( ( ( ( [MED ( ( ( [SM 1 ) 2 ) ] ) ) 1 ) ( ( ( ( ( ( ( ( ( [MIN 2 ) 4 ) 6 ) 5 ) 6 ) 
0 ) 9 ) 9 ) ] ) ) ] ) ) 3 ) 5 ) 4 ) 9 ) ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( [MAX 5 ) 3 ) 9 ) 2 ) 9 ) 1 ) 4 ) ] ) ) 2 ) 1 ) 1 ) ] ) ) 5 ) ( ( ( ( ( ( ( [SM 2 ) ( ( ( ( ( ( ( ( ( ( [MIN 9 ) 6 ) 7 ) 5 ) 3 ) 4 ) 9 ) 6 ) 9 ) ] ) ) ( ( ( ( ( [MED 1 ) 3 ) 3 ) 4 ) ] ) ) 0 ) 5 ) ( ( ( ( [MAX 6 ) 6 ) 2 ) ] ) ) ] ) ) ] ) ) 8 ) 8 ) ] ) ) 8 ) 1 ) ] ) ) ( ( ( ( ( ( ( [MIN 4 ) 5 ) 1 ) 7 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 2 ) ( ( ( [SM 3 ) 9 ) ] ) ) 8 ) 7 ) 8 ) ( ( ( ( ( ( ( ( ( ( ( [MAX 6 ) 0 ) 4 ) 8 ) ( ( ( ( ( ( [MIN 1 ) 5 ) 8 ) ( ( ( ( ( ( ( ( ( ( [MAX 5 ) 8 ) 7 ) 6 ) 4 ) 5 ) 4 ) 4 ) 9 ) ] ) ) 0 ) ] ) ) 2 ) 4 ) 0 ) 2 ) ( ( ( ( ( ( ( ( [MAX 8 ) 2 ) 1 ) 4 ) 7 ) 7 ) 6 ) ] ) ) ] ) ) 5 ) ( ( ( ( ( ( ( ( [MIN 6 ) 6 ) 7 ) 8 ) 4 ) 3 ) 7 ) ] ) ) ( ( ( ( ( [MAX 5 ) 1 ) ( ( ( ( ( ( [MED ( ( ( ( [MAX 1 ) 4 ) 8 ) ] ) ) ( ( ( ( ( ( ( ( ( ( [MAX 9 ) 0 ) 6 ) 6 ) 8 ) 0 ) 0 ) 0 ) 1 ) ] ) ) 6 ) 3 ) 4 ) ] ) ) 6 ) ] ) ) 4 ) ] ) ) ( ( ( ( ( ( ( ( ( ( [MAX 5 ) 8 ) 6 ) 0 ) ( ( ( ( ( ( ( ( ( [SM 2 ) 0 ) 0 ) 4 ) 2 ) 0 ) 8 ) 7 ) ] ) ) 6 ) 4 ) ( ( ( ( ( ( [MIN 2 ) 5 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 1 ) 7 ) 9 ) 8 ) 0 ) 8 ) 2 ) 3 ) ( ( ( ( ( ( ( ( [MAX 6 ) 2 ) 6 ) 6 ) 1 ) 0 ) 8 ) ] ) ) ( ( ( ( [SM 2 ) 5 ) 4 ) ] ) ) ] ) ) 7 ) ( ( ( ( ( ( ( ( ( ( [MED 8 ) 1 ) 1 ) ( ( ( ( ( ( ( ( ( [MED 7 ) 8 ) 6 ) 9 ) 7 ) 4 ) 9 ) 3 ) ] ) ) ( ( ( ( ( ( ( ( ( [MAX 7 ) 2 ) 1 ) 7 ) 6 ) 3 ) 5 ) 0 ) ] ) ) 5 ) 4 ) 5 ) 4 ) ] ) ) ] ) ) ( ( ( ( ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( ( ( ( [MIN 2 ) 7 ) 3 ) 1 ) 2 ) 1 ) 1 ) 6 ) ( ( ( ( ( ( ( [MAX 8 ) 7 ) 1 ) 4 ) 6 ) 7 ) ] ) ) 6 ) ] ) ) 2 ) 7 ) 3 ) ( ( ( ( ( [MED ( ( ( ( ( ( ( ( ( ( [SM 1 ) 5 ) 4 ) 3 ) 8 ) 0 ) 7 ) 9 ) 6 ) ] ) ) 5 ) ( ( ( ( ( ( ( ( ( ( ( [MIN 4 ) 2 ) 2 ) 8 ) 3 ) 5 ) 5 ) 4 ) 6 ) 8 ) ] ) ) 1 ) ] ) ) 8 ) 9 ) 8 ) ( ( ( ( ( ( ( ( ( ( [MED 3 ) 0 ) 7 ) 0 ) ( ( ( [MIN 8 ) 1 ) ] ) ) ( ( ( ( ( [MAX 2 ) 4 ) 0 ) 5 ) ] ) ) 3 ) 7 ) 2 ) ] ) ) 4 ) ] ) ) ] ) ) ] ) ) ] ) ) ] ) ) ( ( ( ( ( ( [SM 8 ) 2 ) 7 ) 8 ) 4 ) ] ) ) 5 ) ( ( ( ( ( ( ( ( ( [MAX 0 ) 2 ) 1 ) ( ( ( ( ( ( ( ( ( ( ( [SM 0 ) 9 ) ( ( ( ( ( [MED 9 
) 9 ) 1 ) 5 ) ] ) ) ( ( ( ( ( ( ( [MIN ( ( ( ( ( ( ( ( ( ( [SM 6 ) 0 ) ( ( ( ( ( ( ( ( ( ( ( [SM ( ( ( [MED 6 ) 8 ) ] ) ) 0 ) 2 ) ( ( ( ( ( ( ( [MIN 6 ) ( ( ( ( ( ( ( [MAX 2 ) 7 ) 6 ) 7 ) 0 ) 9 ) ] ) ) ( ( ( ( [SM 3 ) 5 ) 1 ) ] ) ) 5 ) 0 ) 4 ) ] ) ) 2 ) 4 ) 7 ) 0 ) 0 ) 7 ) ] ) ) 6 ) 2 ) 1 ) 1 ) 6 ) 5 ) ] ) ) ( ( ( ( ( [MIN 0 ) 3 ) 8 ) 2 ) ] ) ) ( ( ( ( ( [MAX 8 ) ( ( ( ( ( ( [SM ( ( ( ( ( ( ( ( ( [MAX ( ( ( ( ( ( ( ( [MAX 4 ) 8 ) 3 ) 2 ) 3 ) 9 ) 8 ) ] ) ) 5 ) 6 ) ( ( ( ( ( ( [SM 6 ) 9 ) 1 ) 7 ) 0 ) ] ) ) ( ( ( ( [MAX 2 ) 5 ) 3 ) ] ) ) 8 ) 4 ) 8 ) ] ) ) 3 ) 9 ) 6 ) 1 ) ] ) ) ( ( ( ( ( [SM ( ( ( ( ( ( [MIN ( ( ( ( ( ( [MED 6 ) 4 ) 0 ) 5 ) 9 ) ] ) ) 3 ) 3 ) 7 ) 5 ) ] ) ) 3 ) 9 ) 2 ) ] ) ) 5 ) ] ) ) 7 ) 4 ) 1 ) ] ) ) 3 ) ( ( ( ( ( ( ( ( ( ( [SM 6 ) 2 ) 8 ) 9 ) 3 ) 8 ) 8 ) 5 ) 2 ) ] ) ) 3 ) ( ( ( ( ( ( ( [SM 9 ) ( ( ( [MED 0 ) 5 ) ] ) ) 4 ) ( ( ( ( ( ( ( ( [MAX ( ( ( [MED ( ( ( ( ( ( ( ( ( ( [MAX 2 ) 2 ) 1 ) ( ( ( ( [SM 7 ) 7 ) 9 ) ] ) ) 8 ) 6 ) 2 ) 7 ) 0 ) ] ) ) 5 ) ] ) ) 2 ) 3 ) 3 ) 7 ) 8 ) 2 ) ] ) ) 8 ) 5 ) ] ) ) 2 ) ( ( ( ( ( ( ( [MIN ( ( ( [MIN 5 ) 0 ) ] ) ) 7 ) 8 ) 5 ) 8 ) 8 ) ] ) ) ] ) ) 8 ) 1 ) 9 ) 4 ) ] ) ) 2 ) 4 ) ] ) ) 0 ) ] )
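One way to see where the extra length in a sequence like the one above comes from is to count its tokens by category. This is a minimal sketch, assuming tokens are whitespace-separated as shown; `token_breakdown` and the category names are hypothetical, chosen for illustration. Passing the full validation sequence as `seq` shows how much of its length is due to '(' and ')' alone.

```python
from collections import Counter


def token_breakdown(sequence: str) -> Counter:
    """Count whitespace-separated tokens of one ListOps sequence by category."""
    counts = Counter()
    for tok in sequence.split():
        if tok in ("(", ")"):
            counts["paren"] += 1      # round parentheses
        elif tok.startswith("["):
            counts["operator"] += 1   # '[MAX', '[MIN', '[MED', '[SM'
        elif tok == "]":
            counts["close"] += 1      # closes an operator's argument list
        else:
            counts["digit"] += 1      # operands
    return counts


seq = "( ( ( ( [MAX 9 ) 4 ) 3 ) ] )"
c = token_breakdown(seq)
total = sum(c.values())
without_parens = total - c["paren"]
print(dict(c), total, without_parens)
```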
Hi @vanzytay, I also have a question following up on your earlier comment. The tokenizer …

```python
import pandas as pd

train_file = pd.read_csv('lra_release_listops-1000_basic_train.tsv', sep='\t')

# Longest sequence when every whitespace-separated token (brackets included) is counted.
max_len = 0
for source in train_file['Source']:  # Series.iteritems() was removed in pandas 2.0
    tokens = source.strip().split()
    if len(tokens) > max_len:
        max_len = len(tokens)
print(max_len)
```

The tokenization scheme based on … which agrees with what you reported earlier (the encoding is different, but the tokens are essentially the same). So I am not sure how a maximum sequence length below 2K can be achieved when the '(' and ')' tokens are included. Separately, I am not sure why '(' and ')' are needed at all, since the nesting is already encoded by the square brackets '[' and ']'. Many thanks in advance.
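The script above reports only the length with every token counted. A minimal sketch that reports both maxima side by side, with and without '(' and ')', so the two numbers can be compared directly. `max_lengths` is a hypothetical helper; it assumes whitespace-separated tokens, as in the `Source` column read by the script above.

```python
from typing import Iterable, Tuple


def max_lengths(sources: Iterable[str]) -> Tuple[int, int]:
    """Return (max length counting all tokens, max length without '(' and ')')."""
    max_all = 0
    max_no_parens = 0
    for line in sources:
        tokens = line.strip().split()
        kept = [t for t in tokens if t not in ("(", ")")]
        max_all = max(max_all, len(tokens))
        max_no_parens = max(max_no_parens, len(kept))
    return max_all, max_no_parens


# Usage with the DataFrame from the script above:
#   print(max_lengths(train_file['Source']))

demo = [
    "( ( ( ( [MAX 9 ) 4 ) 3 ) ] )",
    "( ( ( [MED 6 ) 1 ) ] )",
]
print(max_lengths(demo))  # (13, 5)
```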
Hi @adamsolomou, I have a question about your suggestion. Have you trained the base Transformer model with '(' and ')' removed from the tokens? I wonder whether the accuracy in that case is similar to the value reported in this repository. Thank you in advance.
Hi @sihyun-yu, no, I have not trained a model yet. I would like to resolve the ambiguity about the maximum sequence length first.
@adamsolomou Thank you for your response.
Hi! The '(' and ')' tokens are not necessary, but we do need the closing ']' for operators such as '[MAX'. I've removed '(' and ')' from the tokens and the sequences are under 2K (could you double-check this?). The '(' and ')' tokens were originally included only for recursive models, and none of our Transformer models require them. You can simply ignore them, i.e. filter '(' and ')' out. Thanks!
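The point that '(' and ')' carry no information once every '[OP' has its closing ']' can be checked with a small stack-based evaluator that skips parentheses entirely. This is a sketch under assumed operator semantics: MAX, MIN, MED as the median truncated to an integer, and SM as the sum modulo 10. The exact rounding the dataset generator applies to MED is an assumption here, not something stated in this thread.

```python
import statistics
from typing import Callable, Dict, List, Optional

# Assumed operator semantics (see lead-in); MED truncates the median to an int.
OPS: Dict[str, Callable[[List[int]], int]] = {
    "MAX": max,
    "MIN": min,
    "MED": lambda xs: int(statistics.median(xs)),
    "SM": lambda xs: sum(xs) % 10,
}


def evaluate_listops(tokens: List[str]) -> int:
    """Evaluate a ListOps expression, reading nesting only from '[OP' ... ']'."""
    stack: List[list] = []  # frames of [operator_name, collected_arguments]
    result: Optional[int] = None
    for tok in tokens:
        if tok in ("(", ")"):
            continue  # parentheses are ignored entirely
        if tok.startswith("["):
            stack.append([tok[1:], []])  # '[MAX' opens a MAX frame
            continue
        if tok == "]":
            if not stack:
                raise ValueError("']' without a matching operator")
            op, args = stack.pop()
            value = OPS[op](args)  # close the innermost operator
        else:
            value = int(tok)  # single-digit operand
        if stack:
            stack[-1][1].append(value)  # feed the value to the enclosing operator
        else:
            result = value
    if stack or result is None:
        raise ValueError("unbalanced or empty expression")
    return result


s = "( ( ( ( [MAX 9 ) 4 ) 3 ) ] )"
print(evaluate_listops(s.split()))                                  # 9
print(evaluate_listops([t for t in s.split() if t not in ("(", ")")]))  # 9
```

Because the result is identical with and without '(' and ')', filtering them out before encoding should lose nothing for a non-recursive model, whereas dropping ']' would make the argument lists ambiguous.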
Hello, thank you for sharing a great benchmark!
I'm working on the ListOps benchmark with the provided code and hyperparameters.
The paper says the maximum sequence length in this task is 2K, but the code appears to exclude all of the brackets from the sequence.
When the brackets ('(', ')', '[', ']') are taken into account, the maximum length becomes 6K, much longer than the length stated in the paper.
Shouldn't these brackets be part of the input for this task?
Thank you.