Is it safe to assume that BERTSUM will perform without issues on input text tokenized with a tokenizer other than CoreNLP, as long as we implement sentence splitting?
Hi, if the sentence splitting is correct, it should work fine. The choice of tokenizer is not that important, since we always apply BERT's subword tokenizer after the initial tokenization.
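In other words, any splitter that yields correct sentence boundaries can stand in for CoreNLP's `ssplit`. A minimal sketch of that idea, using a naive regex-based splitter and tokenizer purely as illustrative stand-ins (the exact JSON schema BERTSUM's preprocessing expects is not shown here; this only demonstrates the "list of sentences, each a list of tokens" shape):

```python
import re

def split_sentences(text):
    # Naive sentence splitter, a stand-in for CoreNLP's ssplit.
    # Any splitter producing correct sentence boundaries should work.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Simple word/punctuation tokenizer, a stand-in for CoreNLP's tokenizer.
    # The tokens will be re-split by BERT's subword tokenizer downstream.
    return re.findall(r"\w+|[^\w\s]", sentence)

def to_sentence_token_lists(text):
    # Produce a list of sentences, each as a list of tokens --
    # the general shape BERTSUM's preprocessing works with.
    return [tokenize(s) for s in split_sentences(text)]

print(to_sentence_token_lists("Hello world. This works!"))
```

Since BERT's WordPiece tokenizer runs on top of these tokens anyway, minor differences between tokenizers are largely absorbed at that stage; the sentence boundaries are what the model's segment structure actually depends on.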
I was wondering if you could provide a sample of what the tokenized data should look like. I am not able to install the Stanford tokenizer (which requires Java) in Google Colab, so I cannot inspect the output files or reproduce the format with an alternative tokenizer.