Text Summarizer

About the project

This project is offered as a service through a simple Flask website where a user can either provide a URL or enter text manually and get an elegant summary of the provided text. The underlying principle of the algorithm is extractive summarisation: the important sentences or phrases in the original text are identified and extracted to form a summary.

Screenshots: Landing Page, Mode of input: URL, Summary Page

How it works

When a URL is provided, we request the page source with urllib, parse it with BeautifulSoup, and extract the text from the <p> tags.
Text Processing: First the paragraphs are split into sentences. We then remove all special characters, numbers, punctuation, and stop words (words such as is, an, a, the, and for, which add little to the meaning of a sentence) from the extracted text.
Tokenization: The text is divided into sentences using the sent_tokenize() function of nltk. Each sentence is then tokenized into words to get all the words present in the sentences.
A frequency table of the words is created to evaluate the weighted occurrence frequency of each word. Every word/term is assigned a weight using the tf-idf (term frequency-inverse document frequency) approach: the weight of a term = term frequency * inverse document frequency.
We then substitute words with their weighted frequencies to score each sentence, select all sentences above a certain weight threshold, and order the selected sentences as they appear in the original article.
Finally, after getting all the required parameters, we generate the summary.
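
The scoring and selection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in summarise.py: to stay dependency-free it uses a naive regex splitter in place of nltk's sent_tokenize(), a tiny made-up stop-word list instead of nltk's, treats each sentence as a "document" when computing the inverse document frequency, and assumes the threshold is the mean sentence score. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; nltk ships a much fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "for", "of", "to", "and", "in", "it"}


def split_sentences(text):
    """Naive stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase, keep alphabetic words only, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def score_sentences(sentences):
    """Score each sentence as the sum of tf * idf over its distinct words."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences each word occurs.
    doc_freq = Counter(w for words in tokenized for w in set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        scores.append(sum((count / len(words)) * math.log(n / doc_freq[w])
                          for w, count in tf.items()))
    return scores


def summarise(text, threshold_factor=1.0):
    """Keep sentences scoring at least threshold_factor * mean score, in original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    if not scores:
        return ""
    threshold = threshold_factor * sum(scores) / len(scores)
    return " ".join(s for s, sc in zip(sentences, scores) if sc >= threshold)


if __name__ == "__main__":
    sample = ("Cats sleep a lot. Cats are animals. "
              "Quantum computers use qubits for computation.")
    print(summarise(sample))  # prints: Quantum computers use qubits for computation.
```

Lowering threshold_factor keeps more sentences (0.0 keeps all of them), while raising it makes the summary shorter. Because sentences are filtered rather than sorted, the selected ones stay in the order in which they appear in the original text.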

Prerequisites

urllib: for requesting a webpage
bs4: for parsing the web page
lxml: for processing HTML and XML with Python
nltk: for performing natural language processing tasks
