forked from boudinfl/sume

Sume is an implementation of the concept-based ILP model for summarization.

PedroPovedaQ/sume

 
 

sume

The sume module is an automatic summarization library written in Python.

Description

sume contains the following extraction algorithms:

A typical usage of this module is:

import sume

# directory from which text documents to be summarized are loaded. Input
# files are expected to be in one tokenized sentence per line format.
dir_path = "/tmp/"

# create a summarizer, here a concept-based ILP model
s = sume.models.ConceptBasedILPSummarizer(dir_path)

# load documents with extension 'txt'
s.read_documents(file_extension="txt")

# compute the parameters needed by the model
# extract bigrams as concepts
s.extract_ngrams()

# compute document frequency as concept weights
s.compute_document_frequency()

# prune sentences that are shorter than 10 words, identical sentences and
# those that begin and end with a quotation mark
s.prune_sentences(mininum_sentence_length=10,
                  remove_citations=True,
                  remove_redundancy=True)

# solve the ilp model
value, subset = s.solve_ilp_problem()

# output the summary, one selected sentence per line; print() with
# parentheses works under both Python 2 and Python 3
print('\n'.join([s.sentences[j].untokenized_form for j in subset]))
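
The steps in the usage example (bigrams as concepts, document frequency as weights, then selecting sentences under a length budget) can be illustrated with a small standalone sketch. This is not sume's implementation: the function names `bigrams`, `document_frequency` and `best_subset` are hypothetical, the toy documents and the 10-word budget are made up for illustration, and the exhaustive search below only stands in for the ILP solver on very small inputs.

```python
from collections import Counter
from itertools import combinations


def bigrams(tokens):
    """Return the set of bigrams (as tuples) found in a list of tokens."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}


def document_frequency(documents):
    """Count, for each bigram, the number of documents it occurs in.

    `documents` is a list of documents; each document is a list of
    tokenized sentences (lists of tokens).
    """
    df = Counter()
    for sentences in documents:
        seen = set()
        for tokens in sentences:
            seen |= bigrams(tokens)
        df.update(seen)  # each bigram counts once per document
    return df


def best_subset(sentences, weights, max_words):
    """Exhaustively pick the sentences that maximize the total weight of
    the distinct concepts they cover, within a word budget.

    Exponential in the number of sentences: a toy stand-in for the ILP,
    not a replacement for a real solver.
    """
    best_value, best = 0, ()
    for r in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            if sum(len(sentences[j]) for j in subset) > max_words:
                continue
            covered = set()
            for j in subset:
                covered |= bigrams(sentences[j])
            # a concept is counted once, however many sentences contain it
            value = sum(weights[c] for c in covered)
            if value > best_value:
                best_value, best = value, subset
    return best_value, set(best)


# Two toy documents (assumed data), one tokenized sentence per list.
documents = [
    [["the", "cat", "sat", "on", "the", "mat"], ["dogs", "bark"]],
    [["the", "cat", "sat", "down"], ["birds", "sing", "loudly"]],
]
weights = document_frequency(documents)
sentences = [tokens for doc in documents for tokens in doc]
value, subset = best_subset(sentences, weights, max_words=10)
print(value, sorted(subset))
```

Because each concept is counted only once, the second "the cat sat" sentence adds little once the first is selected, so the search prefers a sentence that brings new concepts. This is the property that lets a concept-based objective discourage redundancy without an explicit similarity penalty.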

Citing the sume module

If you use sume, please cite the following paper:

Contributors

  • Florian Boudin
  • Hugo Mougard
