Skip to content

Latest commit

 

History

History
18 lines (9 loc) · 1.24 KB

README.md

File metadata and controls

18 lines (9 loc) · 1.24 KB

README

This project has three approaches to solve the Single Document Summarization problem.

LSA_summary.py : This algorithm uses the Latent Semantic analysis approach to solve the problem.

KMedoid_summarize.py : This algorithm uses sentence similarity and K-medoid clustering approach to solve the problem.

Lexrank.py : This algorithm uses tf-idf values based on training corpus, sentence similarity based on the tf-idf vectors and then a graph based Lex Rank approach to solve the problem.

baseline_summarizer.py : This is a baseline approach which picks first k sentences of a document(This is based on research done in the field of document summarization which says the first few sentences are the most relevant sentences)

Clustering.py : This algorithm builds relevant clusters from a training corpus. The cluster which is most similar to the test document will be used to to extract the idf values for the test document.

idf_model.py : This algorithm builds a model file containing relevant idf values for words in the training corpus.

plot_bar.py : This code analyzes the performance of the summarizers discussed above against some of the best online summarizer tools by plotting a bar graph for documents and their Fscores from ROGUE evaluation system.