
Extractive Summarization on Custom txt file not working. #130

Closed
kbagalo opened this issue Feb 18, 2020 · 11 comments

Comments


kbagalo commented Feb 18, 2020

I am trying to run the model on the test data provided in the raw_data folder (dev branch). The summary I am getting is always the first sentence of every record in the source file.
Is there a way to change the number of sentences in the summary, so that more sentences become part of the summary?
I tried changing this check in trainer_ext.py:

if ((not cal_oracle) and (not self.args.recall_eval) and len(_pred) == 5):

but it does not work. The arguments I am using are:
-task ext -mode test_text -test_from ../models/bertext_cnndm_transformer.pt -text_src ../raw_data/temp_ext_raw_src.txt -result_path ../results/ootb_output -alpha 0.95 -log_file ../logs/test.log -visible_gpus -1

@kbagalo kbagalo changed the title Ext Extractive Summarization on Custom txt file not working. Feb 18, 2020

xnancy commented Mar 19, 2020

I encountered the same problem and, after digging around, found that the issue is in src/models/data_loader.py. In the load_text function, you need to patch _process_src so that the special tokens '[CLS]' and '[SEP]' are not split up by the tokenizer. As it stands, the special tokens are tokenized by BertTokenizer, so the dataloader does not recognize the delimiters between sentences. A quick fix that merges all '[', '[##cl]', '[##s]', '##]' => '[CLS]' and '[', '##se', '##p', '##]' => '[SEP]' in src_subtokens should solve your problem.
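In case it helps, here is a minimal sketch of such a merge. This is a hypothetical helper, not code from the repo, and the exact fragment sequences depend on your tokenizer's vocabulary, so it matches by joining candidate windows instead of hard-coding the fragments:

```python
# Hypothetical helper (not part of PreSumm): collapse wordpiece fragments of
# '[CLS]' / '[SEP]' back into single special tokens in src_subtokens.
def merge_special_tokens(src_subtokens, specials=('[CLS]', '[SEP]')):
    targets = {s.lower(): s for s in specials}
    merged = []
    i = 0
    n = len(src_subtokens)
    while i < n:
        matched = False
        # special tokens split into at most a handful of wordpieces,
        # so try joining windows of 2 to 5 subtokens starting at i
        for j in range(i + 2, min(i + 6, n + 1)):
            joined = ''.join(t.replace('##', '') for t in src_subtokens[i:j])
            if joined.lower() in targets:
                merged.append(targets[joined.lower()])
                i = j
                matched = True
                break
        if not matched:
            merged.append(src_subtokens[i])
            i += 1
    return merged
```

Called on src_subtokens right after tokenization, this should restore the sentence delimiters the dataloader is looking for.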

@guozhonghao1994

Hello @xnancy! Thank you for the answer, but I still can't solve the problem based on your description. Could you please post the part of data_loader.py where you made the change? Thanks so much!


kbagalo commented Mar 24, 2020

@xnancy you are right; I had found and resolved the issue some time earlier. The problem is that once the special tokens '[CLS]' and '[SEP]' are converted to lower case, the tokenizer no longer recognizes them as "special", and you end up getting a single sentence for your entire input. Sharing a workaround here: in data_loader.py, in the function _process_src(raw), comment out the lower() call (line 301):

def _process_src(raw):
    raw = raw.strip()  # .lower()

You may want to apply the lowercasing at some other point in your data pipeline.

@phuawenpu

Hi @kbagalo / @xnancy and others,
This is a newbie question: I tried to run with

python train.py -task ext -mode test_text -test_from ../models/bertsum_ext/model_step_148000.pt -text_src ../raw_data/temp_ext_raw_src.txt -result_path ../results -alpha 0.95 -log_file ../logs/test.log -visible_gpus 0

but I got a RuntimeError: Error(s) in loading state_dict for ExtSummarizer: .... The log is attached:
test.log

Any help would be appreciated! Thank you,
Wenpu.

Owner

nlpyang commented Apr 1, 2020

This is indeed a bug; I have pushed an update to fix it.
Sorry about that.


dardodel commented Apr 2, 2020

@nlpyang Thanks for your response, but I cannot find the update on your GitHub. Can you please direct us to its location? Also, can you please provide a simple sample and the commands to run it in the different modes, abstractive and extractive? I really appreciate it. Thanks.


nikisix commented Apr 2, 2020

Has anyone had the empty-summary problem (in the gold file)?


nikisix commented Apr 2, 2020

It happened when the -text_tgt parameter was blank.

Owner

nlpyang commented Apr 2, 2020

> @nlpyang Thanks for your response. But I cannot find the update on your Github. Can you please direct us to its location? Also, can you please make a simple sample and the code to run it in different modes, abstractive and extractive? I really appreciate it. Thanks.

Pull the repo and you will see the updates.

mmcmahon13 pushed a commit to mmcmahon13/PreSumm that referenced this issue Apr 3, 2020

nikisix commented Apr 8, 2020

I also found this line helpful, FWIW:

src_subtokens = [token.replace('##.', '[SEP]') for token in src_subtokens]

@Hellscream64

I'm still getting the one-sentence error. Anyone else?
