Token indices sequence length is longer than the specified maximum sequence length for this model (3000 > 512). Running this sequence through the model will result in indexing errors #18

Closed
BinchaoPeng opened this issue Apr 8, 2021 · 3 comments

Comments

@BinchaoPeng

Token indices sequence length is longer than the specified maximum sequence length for this model (3000 > 512). Running this sequence through the model will result in indexing errors

Traceback (most recent call last):
  File "", line 1, in
  File "F:\PyCharm 2020.2.1\plugins\python\helpers\pydev_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "F:\PyCharm 2020.2.1\plugins\python\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "E:/Documents/PycharmProjects/bert/getBertWordvec.py", line 7, in
    outputs = model(input_ids)
  File "F:\Anaconda3\envs\dnabert\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\Anaconda3\envs\dnabert\lib\site-packages\pytorch_transformers\modeling_bert.py", line 707, in forward
    embedding_output = self.embeddings(input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
  File "F:\Anaconda3\envs\dnabert\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\Anaconda3\envs\dnabert\lib\site-packages\pytorch_transformers\modeling_bert.py", line 252, in forward

Hi, my input sequence is 3000 tokens long, which triggers this error. Can I fix it by changing your code, for example by changing the maximum token sequence length?

@Zhihan1996
Collaborator

Hi,

To process long sequences, please use --model_type dnalong and set the max sequence length to a multiple of 512 that is at least as long as your input (e.g., 3072 for a 3000-token sequence). The model should then work.
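As a rough illustration of the "multiple of 512" advice, the sketch below rounds an input length up to the next multiple of 512. The helper name `round_up_to_multiple` is hypothetical and not part of DNABERT. The reply above names only `--model_type dnalong`, so which option takes the resulting value is not shown here.

```python
def round_up_to_multiple(length: int, block: int = 512) -> int:
    """Return the smallest multiple of `block` that is >= `length`.

    Hypothetical helper for illustration only; it is not a DNABERT function.
    """
    if length <= 0 or block <= 0:
        raise ValueError("length and block must be positive")
    # Integer ceiling division, then scale back up to a multiple of `block`.
    return ((length + block - 1) // block) * block


if __name__ == "__main__":
    # The 3000-token input from this issue needs 6 blocks of 512.
    print(round_up_to_multiple(3000))  # prints 3072
```

For the 3000-token input in this issue, this gives 3072, which matches the example value in the reply.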

@BinchaoPeng
Author

OK, I will try. Thanks!

@jerryji1993
Owner

Closed #18.
