
Do we expect BertSum to work on text tokenized using a tokenizer other than Core-NLP? #32

Closed
enzoampil opened this issue May 22, 2019 · 3 comments

Comments

@enzoampil

Is it safe to assume that BERTSUM will perform without issues on input text tokenized using a tokenizer other than Core-NLP as long as we implement sentence splitting?

@nlpyang
Owner

nlpyang commented May 22, 2019

Hi, if the sentence splitting is correct, it should work fine. The choice of tokenizer is not that important, since we always apply BERT's subword tokenizer after the tokenization.
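For readers landing here, a minimal sketch of what this answer implies, not code from the BertSum repo: it assumes NLTK's `sent_tokenize` as the non-CoreNLP sentence splitter and the Hugging Face `BertTokenizer` as the subword tokenizer (BertSum itself uses `pytorch_pretrained_bert`, but the idea is the same).

```python
# Sketch: any sentence splitter can replace CoreNLP, because BERT's subword
# tokenizer is applied to each sentence afterwards.
# Assumes: nltk (with the "punkt" data downloaded) and the transformers package.
from nltk.tokenize import sent_tokenize
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

document = (
    "BertSum extends BERT for extractive summarization. "
    "Each sentence is prefixed with its own [CLS] token."
)

# 1. Sentence splitting with a non-CoreNLP tool.
sentences = sent_tokenize(document)

# 2. BERT's subword tokenization is applied per sentence afterwards,
#    so the upstream word tokenizer matters little.
subtokenized = [tokenizer.tokenize(sent.lower()) for sent in sentences]
print(subtokenized)
```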

@enzoampil
Author

Thank you for confirming!

@Santosh-Gupta

Hello,

I was wondering if you could provide a sample of what the tokenized data should look like. I am not able to install the Stanford Tokenizer (Java) in Google Colab, so I cannot inspect the output files and reproduce that format with an alternative tokenizer.
