
Option for returning all sentences with ranked order #154

Closed
RafaelWO opened this issue Mar 17, 2021 · 4 comments

@RafaelWO

RafaelWO commented Mar 17, 2021

First, I really like your project :)

What I stumbled across is the following: I wanted to write a method that uses one of your summarizers to return the top n (n = return_top) sentences, and also the "excluded" sentences:

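```python
from sumy.nlp.stemmers import Stemmer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.lsa import LsaSummarizer

# Assumed to be defined elsewhere: `text` (the input string) and
# `return_top` (number of sentences to keep; None or 0 keeps everything).
stemmer = Stemmer("english")
tokenizer = Tokenizer("english")
model = LsaSummarizer(stemmer)
parser = PlaintextParser.from_string(text, tokenizer)
excluded = []
summary = []
# "100%" asks the summarizer for every sentence of the document
for i, sentence in enumerate(model(parser.document, "100%")):
    if not return_top or i < return_top:
        summary.append(str(sentence).strip())
    else:
        excluded.append(str(sentence))
```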

Then I figured out that summary always contains the first return_top sentences in document order. This is due to a line in AbstractSummarizer._get_best_sentences() where the resulting sentences get re-sorted by document order.
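
For the narrower goal in the snippet above (the top return_top sentences plus the excluded rest), one possible workaround is to ask the summarizer for exactly return_top sentences and treat every other sentence of the document as excluded. The helper below is a minimal, sumy-independent sketch: `split_summary` is a hypothetical name, and it assumes sentences are compared as plain strings.

```python
from typing import List, Sequence, Tuple


def split_summary(
    document_sentences: Sequence[str], summary: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split a document into (kept, excluded), both in document order.

    `summary` is assumed to be what a summarizer returned when asked for
    exactly `return_top` sentences; every other sentence is "excluded".
    """
    chosen = set(summary)
    # Identical sentences are compared as strings, so all copies of a
    # repeated sentence end up on the same side.
    kept = [s for s in document_sentences if s in chosen]
    excluded = [s for s in document_sentences if s not in chosen]
    return kept, excluded
```

With sumy this would presumably be called with `[str(s) for s in parser.document.sentences]` and `[str(s) for s in model(parser.document, return_top)]`. Both returned lists are in document order, so this does not give the rank order this issue asks for.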

Would you consider adding an option (True/False) that controls whether the result is reordered by document order?
Thanks!

@miso-belica
Owner

Hi, unfortunately, there is no unified method to get this. For the LSA method it's at

```python
ranks = iter(self._compute_ranks(sigma, v))
```

but every summarization method is different internally, so each one gets the ranks differently. If you look into the __call__ method of each summarizer, you will find it.
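
Once a method's per-sentence scores are available in document order (for LSA, the values behind `ranks`), listing the whole document by importance is just a sort. A minimal sketch, assuming exactly one score per sentence and that a higher score means a more important sentence; `rank_sentences` is a hypothetical helper, not part of sumy:

```python
from typing import List, Sequence


def rank_sentences(sentences: Sequence[str], ranks: Sequence[float]) -> List[str]:
    """Return every sentence, most important first.

    Assumes ranks[i] is the score of sentences[i] (document order) and
    that higher scores mean more important sentences.
    """
    if len(sentences) != len(ranks):
        raise ValueError("need exactly one rank per sentence")
    # Sort indices by score, best first; ties keep document order.
    order = sorted(range(len(sentences)), key=lambda i: ranks[i], reverse=True)
    return [sentences[i] for i in order]
```

Ties keep their document order because Python's `sorted` is stable, even with `reverse=True`.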

Thanks for the suggestion, but I don't want to add boolean flags to the API. Maybe a new method, but can you please describe your use case so I could help you in a better way? What do you try to achieve?

@miso-belica miso-belica self-assigned this Mar 23, 2021
@RafaelWO
Author

RafaelWO commented Mar 25, 2021

Thanks for your answer!

Sure, each method calculates the ranks in a different way, but that's not what I was talking about. I meant that after the sentences are sorted by the algorithm's rank, they are re-sorted in document order:

```python
infos = sorted(infos, key=attrgetter("order"))
```

which makes sense, of course. But it makes it impossible to retrieve all sentences in "rating" order, because that line puts the sentences back into the same order as in the original document.
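
The effect can be reproduced without sumy. The sketch below mimics the three steps described in this comment (sort by rating, keep the requested count, re-sort by `order`); `best_sentences` and the ratings are made up for illustration and are not sumy's code. When the count covers every sentence, the final sort gives back the plain document order.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the (sentence, order, rating) records.
SentenceInfo = namedtuple("SentenceInfo", ("sentence", "order", "rating"))


def best_sentences(sentences, ratings, count):
    infos = [SentenceInfo(s, o, r) for o, (s, r) in enumerate(zip(sentences, ratings))]
    # 1. sort by rating, best first
    infos = sorted(infos, key=attrgetter("rating"), reverse=True)
    # 2. keep the `count` best
    infos = infos[:count]
    # 3. restore document order (the step this issue is about)
    infos = sorted(infos, key=attrgetter("order"))
    return [i.sentence for i in infos]


sentences = ["A bit important.", "Useless sentence.", "This is very important!", "But this too."]
ratings = [0.5, 0.1, 0.8, 0.9]  # made-up scores
print(best_sentences(sentences, ratings, 2))  # ['This is very important!', 'But this too.']
print(best_sentences(sentences, ratings, 4))  # same as `sentences`: plain document order
```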

Examples:
Using a summarizer to return the best 2 sentences:

```python
# Sentence order in terms of code from AbstractSummarizer._get_best_sentences() with sentence_count = 2
text = "A bit important. Useless sentence. This is very important! But this too."
# After line 45 (method order)
infos = ["But this too.", "This is very important!", "A bit important.", "Useless sentence."]
# After line 49 (reduced to sentence_count)
infos = ["But this too.", "This is very important!"]
# After line 51 (document order)
infos = ["This is very important!", "But this too."]  # this is fine :)
```

Using a summarizer to return all sentences:

```python
# Sentence order in terms of code from AbstractSummarizer._get_best_sentences() with sentence_count = 100%
text = "A bit important. Useless sentence. This is very important! But this too."
# After line 45 (method order)
infos = ["But this too.", "This is very important!", "A bit important.", "Useless sentence."]
# After line 49 (reduced to sentence_count)
infos = ["But this too.", "This is very important!", "A bit important.", "Useless sentence."]  # That is the result I would want in this case
# After line 51 (document order)
infos = ["A bit important.", "Useless sentence.", "This is very important!", "But this too."]  # this is bad because it is just the document order
```

Goal: I'm trying to get back the whole document re-ordered with a specific method.

I hope you understand what I mean :)

@miso-belica
Owner

Yeah, I know what you mean. By "What do you try to achieve?" I wanted to find out why you need it, like a business use case. What is your motivation? As I said, there is currently no way to do it, but if you are doing something that could be a bigger use case (more people would find it useful), I could implement it somehow, or you could send a PR. But if it's something just for you, a better way may be to inherit from the LSA summarizer and override the method _get_best_sentences so that the sentences are not re-sorted there.
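
The subclassing approach suggested here can be sketched without sumy. `ToySummarizer` below is a hypothetical stand-in that follows the same selection steps as the `_get_best_sentences` discussed in this thread, and `RankedOrderSummarizer` overrides only that method to skip the final re-sort, so the result stays best-first. This illustrates the pattern, not sumy's code: in sumy the override would go on a subclass of `LsaSummarizer`, and its signature and count handling (such as `"100%"`) should be copied from the installed version of `AbstractSummarizer._get_best_sentences`.

```python
from collections import namedtuple
from operator import attrgetter

SentenceInfo = namedtuple("SentenceInfo", ("sentence", "order", "rating"))


class ToySummarizer:
    """Hypothetical base class modeled on the steps in this thread (not sumy)."""

    def _get_best_sentences(self, sentences, count, rating):
        infos = [SentenceInfo(s, o, rating(s)) for o, s in enumerate(sentences)]
        infos = sorted(infos, key=attrgetter("rating"), reverse=True)[:count]
        # Final re-sort into document order.
        infos = sorted(infos, key=attrgetter("order"))
        return tuple(i.sentence for i in infos)

    def __call__(self, sentences, count, scores):
        # `scores` is a hypothetical dict of precomputed sentence ratings.
        return self._get_best_sentences(sentences, count, lambda s: scores[s])


class RankedOrderSummarizer(ToySummarizer):
    """Same selection, but the result stays in rating order (best first)."""

    def _get_best_sentences(self, sentences, count, rating):
        infos = [SentenceInfo(s, o, rating(s)) for o, s in enumerate(sentences)]
        infos = sorted(infos, key=attrgetter("rating"), reverse=True)[:count]
        return tuple(i.sentence for i in infos)


sentences = ["A bit important.", "Useless sentence.", "This is very important!", "But this too."]
scores = {"A bit important.": 0.5, "Useless sentence.": 0.1,
          "This is very important!": 0.8, "But this too.": 0.9}  # made-up scores
print(ToySummarizer()(sentences, 4, scores))          # document order
print(RankedOrderSummarizer()(sentences, 4, scores))  # best first
```

Overriding only the selection method keeps the rest of the summarizer (tokenizing, rating) untouched, which is why this avoids adding a flag to the public API.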

@RafaelWO
Author

> What do you try to achieve?

I'm trying to list all sentences of a document according to importance :)

@RafaelWO RafaelWO closed this as completed Dec 5, 2021