some improvements #5

olivier-compilatio · 2019-04-04T13:27:07Z

Hi,
After reading the paper, I wanted to try the software. I like the idea of using word-embedding (WE) vectors instead of building vectors only from the text to be summarized.
It didn't work at first, so I dug into the code and made some changes:
1. The stopword remover wasn't working: `stopword_remover.add_keyword(stopword, "")` replaces each stopword with itself, because "" is treated as None. My fix is a quick one, but it works.
2. In centroid_word_embeddings.py, word_vector_cache was buggy: the dictionary checked in the first `if` was newly created, so it was always empty and the condition was never true. I changed it to use the embedding model's dict instead.
3. I updated the load_gensim_embedding_model function. Loading seems simpler now than it was when you wrote the code!
I hope these fixes help,
Olivier
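
For readers following along, the first two points can be illustrated with a small, self-contained sketch. It does not reproduce the repository's code or the actual commits: `TinyKeywordReplacer`, `remove_stopwords`, `lookup_vector_buggy`, and `lookup_vector_fixed` are hypothetical stand-ins, and a plain dict plays the role of the embedding model. The sketch only assumes what the description above states, namely that an empty replacement string is treated like None and that the cache dictionary was freshly created on each lookup.

```python
# Hypothetical stand-ins for illustration only; not the repository's code.


class TinyKeywordReplacer:
    """Mimics a replacer that treats a falsy replacement as 'not provided'."""

    def __init__(self):
        self.mapping = {}

    def add_keyword(self, keyword, clean_name=None):
        # Pitfall: "" is falsy, so the keyword ends up mapped to itself.
        if not clean_name:
            clean_name = keyword
        self.mapping[keyword] = clean_name

    def replace_keywords(self, text):
        return " ".join(self.mapping.get(tok, tok) for tok in text.split())


def remove_stopwords(text, stopwords):
    """One possible workaround: drop stopword tokens directly."""
    return " ".join(tok for tok in text.split() if tok.lower() not in stopwords)


def lookup_vector_buggy(word, embedding_model):
    cache = {}  # freshly created, so always empty
    if word in cache:  # never true
        return cache[word]
    return None


def lookup_vector_fixed(word, embedding_model):
    # Test membership against the embedding model itself.
    if word in embedding_model:
        return embedding_model[word]
    return None


stopwords = {"the", "of"}
replacer = TinyKeywordReplacer()
for stopword in stopwords:
    replacer.add_keyword(stopword, "")

sentence = "the centroid of the text"
buggy = replacer.replace_keywords(sentence)   # stopwords survive
fixed = remove_stopwords(sentence, stopwords)  # stopwords are dropped

toy_model = {"centroid": [0.1, 0.2], "text": [0.3, 0.4]}  # toy vectors
```

Here `buggy` is identical to the input sentence, which matches the symptom described in point 1, while `lookup_vector_buggy` returns None for every word, which matches point 2.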

A-Gulati · 2019-06-17T20:27:13Z

Hi Olivier, any chance you could show me how to run the model on a sample subset of documents? I'm trying to work everything out!

olivier-compilatio · 2019-06-18T06:52:17Z

It's quite simple. Here is a Python script you can try and modify:
```python
from text_summarizer.centroid_word_embeddings import CentroidWordEmbeddingsSummarizer
from text_summarizer.centroid_word_embeddings import load_gensim_embedding_model
import gensim.downloader as api

api.info()  # returns info about available models/datasets
# Download (on first use) and load pretrained GloVe vectors.
model = api.load("glove-wiki-gigaword-300")

cwes = CentroidWordEmbeddingsSummarizer(
    model, debug=True, bow_param=2, length_param=5, position_param=4
)
# Read the document and flatten it to a single line.
with open("sample.txt") as f:
    text = f.read().replace("\n", " ")
# Assumption: summarize() returns the summary text.
print(cwes.summarize(text))
```

A-Gulati · 2019-06-18T12:05:23Z

After installing some dependencies, this worked. Thanks Olivier!

blazejdolicki · 2019-07-03T07:33:19Z

Oh, I wish I had read this merge request before trying out the repo, instead of spending a few hours fixing the same bugs you solved.

olivier-compilatio added 4 commits April 4, 2019 13:22

modified lead.py so the stopword remover actually remove words!

162f5ab

fixed the bug from stopword_eraser that weren't erasing any stopword...

a3a28c5

bugfix : embedding dic wasn't the one to look for.

dce8cbf

modified gensim dowload call

d376113

blazejdolicki mentioned this pull request Jul 3, 2019

Is the extractive baseline correct? sosuperic/MeanSum#11

Open

gaetangate merged commit 8ab4115 into gaetangate:master Nov 14, 2021
