
This is an NLP project, completed during Fall 2020 for CSE 4022 - Natural Language Processing. Nepali Text Summarizer is built around TF-IDF and cosine similarity.


sarry971/Nepali-Text-Summarizer-


Nepali-Text-Summarizer

In this project, we have implemented extractive text summarization for Nepali text. Extractive summarization is one of the types of text summarization, classified by output type. The idea is to assign each sentence in the text a significance score, extract the top-scoring original sentences, and concatenate them to form the summary. This technique doesn't generate any new text on its own.

Pipeline for Nepali Text Summarizer Project:

Text preprocessing --> sentence tokenization & word tokenization --> calculate the TF-IDF score for each word in a sentence --> compute the cosine similarity between each pair of sentences using their TF-IDF vectors, and average the values for each sentence --> take the top M sentences by average cosine similarity and get their sentence indices --> sort the selected sentences by index and put them in the summary
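
The scoring and selection steps of the pipeline can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual code: the function names (`tfidf_vectors`, `cosine`, `summarize`) are hypothetical, each sentence is treated as its own document, and the smoothed idf formula is an assumption, since the README does not specify which TF-IDF variant is used.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_sentences):
    """Treat each sentence as a document; return one {word: tf-idf} dict per sentence."""
    n = len(tokenized_sentences)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for tokens in tokenized_sentences for word in set(tokens))
    vectors = []
    for tokens in tokenized_sentences:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Assumption: smoothed idf = log((1 + n) / (1 + df)) + 1.
        vectors.append({
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def summarize(sentences, tokenized_sentences, m):
    """Return the top-m sentences by average similarity, in original order."""
    vectors = tfidf_vectors(tokenized_sentences)
    n = len(sentences)
    scores = []
    for i in range(n):
        # Average similarity of sentence i to every other sentence.
        sims = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:m]
    # Sort the chosen indices so the summary follows the source order.
    return [sentences[i] for i in sorted(top)]
```

`summarize` expects the sentences and their token lists to be produced by the earlier preprocessing and tokenization steps; a sentence that shares no words with any other sentence gets an average similarity of 0 and is chosen last.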

This project mainly focuses on summarizing Nepali text, so the preprocessing is tailored to Nepali; however, it accepts Devanagari text.
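
As a rough illustration of Devanagari-oriented preprocessing, the sketch below splits text into sentences and words. It rests on assumptions not stated in this README: that "Devanagari text" means characters in the Unicode block U+0900 to U+097F, and that sentences end with the danda (।), a question mark, or an exclamation mark. The helper names `split_sentences` and `tokenize_words` are hypothetical.

```python
import re

# Assumption: keep only characters in the Devanagari Unicode block (U+0900-U+097F).
NON_DEVANAGARI = re.compile(r"[^\u0900-\u097F\s]")
# Danda (U+0964) and double danda (U+0965) are punctuation inside that block.
DANDA = re.compile(r"[\u0964\u0965]")
# Assumption: a sentence ends at a danda, question mark, or exclamation mark.
SENTENCE_END = re.compile(r"[\u0964?!]")

def split_sentences(text):
    """Split raw text on sentence-ending punctuation, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]

def tokenize_words(sentence):
    """Drop non-Devanagari characters and danda marks, then split on whitespace."""
    cleaned = NON_DEVANAGARI.sub(" ", sentence)
    cleaned = DANDA.sub(" ", cleaned)
    return cleaned.split()
```

For example, `split_sentences("नेपाल सुन्दर देश हो। यहाँ हिमाल छन्।")` yields two sentences, and `tokenize_words` discards any Latin letters or ASCII digits mixed into a sentence.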
