# Extractive Summarization

In [7]:
text = """Financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing. It also means comprehending certain financial principles and concepts, such as the time value of money, compound interest, managing debt, and financial planning.

Achieving financial literacy can help individuals to avoid making poor financial decisions. It can help them become self-sufficient and achieve financial stability. Key steps to attaining financial literacy include learning how to create a budget, track spending, pay off debt, and plan for retirement.

Educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way.
Personal Finance Basics
Personal finance is where financial literacy translates into individual financial decision-making. How do you manage your money? Which savings and investment vehicles are you using? Personal finance is about making and meeting your financial goals, whether you want to own a home, help other members of your family, save for your children’s college education, support causes that you care about, plan for retirement, or anything else.

Among other topics, it encompasses banking, budgeting, handling debt and credit, and investing. Let’s take a look at these basics to get you started.

Introduction to Bank Accounts
A bank account is typically the first financial account that you’ll open. Bank accounts can hold and build the money you'll need for major purchases and life events. Here’s some background on bank accounts and why they are step one in creating a stable financial future.

"""

In [8]:
import re
# Remove leading/trailing whitespace (including surrounding newlines)
text = text.strip()
# Remove commented lines
text = re.sub(r'^\s*#.*$', '', text, flags=re.MULTILINE)

# Remove extra whitespace within the text
text = re.sub(r'\s+', ' ', text)
# Convert to lowercase
text = text.lower()

In [9]:
print(text)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing. it also means comprehending certain financial principles and concepts, such as the time value of money, compound interest, managing debt, and financial planning. achieving financial literacy can help individuals to avoid making poor financial decisions. it can help them become self-sufficient and achieve financial stability. key steps to attaining financial literacy include learning how to create a budget, track spending, pay off debt, and plan for retirement. educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way. personal finance basics personal finance is where financial literacy translates into individual financial decision-making. how do you 
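
Collapsing all whitespace merges heading lines such as "Personal Finance Basics" into the sentence that follows them. That is why "personal finance basics personal finance is where ..." appears as a single sentence above and in several of the summaries below. Below is a minimal sketch of an alternative cleaning step. It assumes that any non-empty line without terminal punctuation is a heading. `clean_text_keep_headings` is a hypothetical helper, not part of the original notebook.

```python
import re


def clean_text_keep_headings(raw):
    """Hypothetical cleaning variant: close heading-like lines with a period so
    sentence tokenizers treat them as separate sentences."""
    lines = []
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and commented lines, as the original cell does
        if not line or line.startswith('#'):
            continue
        # Assumption: a line with no terminal punctuation is a heading
        if not line.endswith(('.', '!', '?')):
            line += '.'
        lines.append(line)
    # Collapse remaining whitespace and lowercase, as in the original cell
    return re.sub(r'\s+', ' ', ' '.join(lines)).lower()


sample = "Personal Finance Basics\nPersonal finance is key.\n\n# note\nBank  accounts help."
print(clean_text_keep_headings(sample))
# personal finance basics. personal finance is key. bank accounts help.
```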

**Frequency-based Approach**

In [10]:
import nltk
nltk.download('punkt')      # Punkt models used by sent_tokenize and word_tokenize
nltk.download('punkt_tab')  # Punkt data in the format needed by newer NLTK releases
nltk.download('stopwords')  # stop-word lists ('a', 'an', 'the', 'in', etc.), used to drop uninformative words
from collections import Counter  # counts word frequencies
from nltk.corpus import stopwords  # NLTK stop-word corpus
from nltk.tokenize import sent_tokenize, word_tokenize  # sentence and word tokenizers


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [11]:
# generate_summary takes two inputs: the text to summarize and n, the number of sentences to keep in the summary
def generate_summary(text, n):
  # Tokenize the text into individual sentences
  sentences = sent_tokenize(text)

  # Build the English stop-word set once so membership checks are fast
  stop_words = set(stopwords.words('english'))

  # Tokenize the whole text into words, keep only alphanumeric non-stop-words, and lowercase them
  words = [word.lower() for word in word_tokenize(text) if word.lower() not in stop_words and word.isalnum()]

  # Compute the frequency of each word across the whole text
  word_freq = Counter(words)

  # Score each sentence as the sum of the frequency counts of its content words
  sentence_scores = {}  # maps sentence -> score

  for sentence in sentences:
    sentence_words = [word.lower() for word in word_tokenize(sentence) if word.lower() not in stop_words and word.isalnum()]
    sentence_score = sum(word_freq[word] for word in sentence_words)
    # Keep only sentences with fewer than 30 content words (adjust the threshold as needed).
    # This filters out very long sentences, which tend to score high simply because the score is an unnormalized sum.
    if len(sentence_words) < 30:
      sentence_scores[sentence] = sentence_score

  # Select the top n sentences with the highest scores (returned in score order, not document order)
  summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
  summary = ' '.join(summary_sentences)

  return summary

In [12]:
summary = generate_summary(text, 3)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way.
financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
personal finance basics personal finance is where financial literacy translates into individual financial decision-making.
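
Each sentence's score above is an unnormalized sum of word counts, so longer sentences tend to win, and the selected sentences are printed in score order rather than document order. Below is a minimal, dependency-free sketch of one common variant. Word counts are divided by the count of the most frequent word, a sentence's score is the average over its content words, and the chosen sentences are returned in their original order. The regex sentence splitter and the small `STOP_WORDS` set are stand-ins (assumptions) for NLTK's `sent_tokenize` and its English stop-word list.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); the notebook uses NLTK's English list
STOP_WORDS = {'a', 'an', 'the', 'is', 'are', 'and', 'of', 'to', 'in', 'it', 'for', 'on', 'with'}


def split_sentences(text):
    # Naive splitter (stand-in for sent_tokenize): break after . ! or ? followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def content_words(sentence):
    # Lowercase alphanumeric tokens that are not stop words
    return [w for w in re.findall(r'[a-z0-9]+', sentence.lower()) if w not in STOP_WORDS]


def summarize_normalized(text, n):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return ''
    max_freq = max(freq.values())

    scored = []
    for i, sentence in enumerate(sentences):
        words = content_words(sentence)
        # Average normalized frequency, so a sentence is not favored just for being long
        score = sum(freq[w] / max_freq for w in words) / len(words) if words else 0.0
        scored.append((score, i))

    # Best n by score (ties go to the earlier sentence), then restore document order
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    return ' '.join(sentences[i] for _, i in sorted(top, key=lambda t: t[1]))


demo = ("Budgets track money. Money grows with interest. Cats sleep. "
        "Track money with budgets and interest.")
print(summarize_normalized(demo, 2))
# Budgets track money. Track money with budgets and interest.
```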


**TF-IDF Approach**

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer
# importing cosine_similarity function to compute the cosine similarity between two vectors.
from sklearn.metrics.pairwise import cosine_similarity
# importing nlargest to return the n largest elements from an iterable in descending order.
from heapq import nlargest

In [14]:
def generate_summary_tfidf(text, n):
  # Tokenize the text into individual sentences
  sentences = sent_tokenize(text)

  # Create the TF-IDF matrix (one row per sentence); ngram_range=(3,4) uses only 3- and 4-word phrases as features
  vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(3,4))
  tfidf_matrix = vectorizer.fit_transform(sentences)

  # Score every sentence except the last by its cosine similarity to the LAST sentence.
  # Note: this is not a comparison with the whole document, and the last sentence can never be selected.
  sentence_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])[0]

  # Select the indices of the top n sentences with the highest scores (ties keep the earlier sentence)
  summary_sentences = nlargest(n, range(len(sentence_scores)), key=sentence_scores.__getitem__)

  # Join the selected sentences in their original order
  summary_tfidf = ' '.join([sentences[i] for i in sorted(summary_sentences)])

  return summary_tfidf

In [15]:
summary = generate_summary_tfidf(text, 3)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
it also means comprehending certain financial principles and concepts, such as the time value of money, compound interest, managing debt, and financial planning.
achieving financial literacy can help individuals to avoid making poor financial decisions.
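
`generate_summary_tfidf` compares every sentence only with the last sentence. On this short text, no 3- or 4-word phrase appears to be shared with the last sentence. The scores are therefore likely all zero, and `nlargest` returns the first three sentences in order, which is consistent with the output above. Below is a minimal sketch that scores each sentence against the whole document instead. Each sentence becomes a TF-IDF vector using the same smoothed IDF formula as scikit-learn's `TfidfVectorizer` default, each vector is L2-normalized, and the document is represented by the sum of those vectors (a centroid direction). It is written without scikit-learn so that it runs standalone, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive splitter (stand-in for sent_tokenize)
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def tfidf_rows(sentences, stop_words):
    # Raw term counts per sentence
    docs = [Counter(w for w in re.findall(r'[a-z0-9]+', s.lower()) if w not in stop_words)
            for s in sentences]
    n_docs = len(docs)
    df = Counter(w for d in docs for w in d)  # number of sentences containing each term
    # Smoothed IDF, same formula as scikit-learn's default: ln((1 + N) / (1 + df)) + 1
    idf = {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in df.items()}
    rows = []
    for d in docs:
        row = {w: tf * idf[w] for w, tf in d.items()}
        norm = math.sqrt(sum(x * x for x in row.values()))
        rows.append({w: x / norm for w, x in row.items()} if norm else {})
    return rows


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize_tfidf_centroid(text, n, stop_words=frozenset()):
    sentences = split_sentences(text)
    rows = tfidf_rows(sentences, stop_words)
    # Document vector: sum of the normalized sentence rows
    centroid = defaultdict(float)
    for row in rows:
        for w, x in row.items():
            centroid[w] += x
    scores = [cosine(row, centroid) for row in rows]
    # Top n indices by score, emitted in document order
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:n]
    return ' '.join(sentences[i] for i in sorted(top))


demo = ("Budgets track money. Money grows with interest. Cats sleep. "
        "Track money with budgets and interest.")
print(summarize_tfidf_centroid(demo, 2, stop_words={'with', 'and'}))
# Budgets track money. Track money with budgets and interest.
```

Unlike the cell above, every sentence (including the last) can be selected. Unigram features are used here because longer n-grams rarely repeat in a short document.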


**Count Vectorizer**

In [16]:
from nltk.tokenize import sent_tokenize, word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
# cosine_similarity and nlargest are imported in In [13]


# Count Vectorizer Approach
def generate_summary_cv(text, n):
  # Tokenize the text into individual sentences
  sentences = sent_tokenize(text)

  # Create the count matrix (one row per sentence, one column per non-stop-word term)
  vectorizer = CountVectorizer(stop_words='english')
  count_matrix = vectorizer.fit_transform(sentences)

  # Score every sentence except the last by its cosine similarity to the LAST sentence.
  # Note: this is not a comparison with the whole document, and the last sentence can never be selected.
  sentence_scores = cosine_similarity(count_matrix[-1], count_matrix[:-1])[0]

  # Select the indices of the top n sentences with the highest scores
  summary_sentences = nlargest(n, range(len(sentence_scores)), key=sentence_scores.__getitem__)

  # Join the selected sentences in their original order
  summary_count = ' '.join([sentences[i] for i in sorted(summary_sentences)])

  return summary_count


In [17]:
summary = generate_summary_cv(text, 3)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
introduction to bank accounts a bank account is typically the first financial account that you’ll open.
bank accounts can hold and build the money you'll need for major purchases and life events.
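
To see where the count-vector scores come from, here is a small standalone computation of cosine similarity between two bag-of-words count vectors. It uses `collections.Counter` in place of `CountVectorizer`, an assumption made to keep the example dependency-free. Because neither version stems words, "account" and "accounts" are separate features and do not count toward the overlap.

```python
import math
import re
from collections import Counter


def count_vector(sentence, stop_words=frozenset()):
    # Bag of words: lowercase alphanumeric tokens minus stop words, with counts
    return Counter(w for w in re.findall(r'[a-z0-9]+', sentence.lower()) if w not in stop_words)


def cosine_counts(a, b):
    # Dot product over shared terms divided by the product of the vector lengths
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


stop = {'a', 'is', 'the', 'on', 'can', 'and'}
last = count_vector("background on bank accounts", stop)   # background, bank, accounts
s1 = count_vector("bank accounts can hold money", stop)    # bank, accounts, hold, money
s2 = count_vector("a bank account is the first", stop)     # bank, account, first

print(round(cosine_counts(s1, last), 4))  # 2 / (2 * sqrt(3)) -> 0.5774
print(round(cosine_counts(s2, last), 4))  # 1 / (sqrt(3) * sqrt(3)) -> 0.3333
```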


In [18]:
%pip install sumy

Collecting sumy
  Downloading sumy-0.11.0-py2.py3-none-any.whl.metadata (7.5 kB)
Collecting docopt<0.7,>=0.6.1 (from sumy)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting breadability>=0.1.20 (from sumy)
  Downloading breadability-0.1.20.tar.gz (32 kB)
  Preparing metadata (setup.py) ... done
Collecting pycountry>=18.2.23 (from sumy)
  Downloading pycountry-24.6.1-py3-none-any.whl.metadata (12 kB)
Downloading sumy-0.11.0-py2.py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.3/97.3 kB 4.2 MB/s eta 0:00:00
Downloading pycountry-24.6.1-py3-none-any.whl (6.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 56.6 MB/s eta 0:00:00
Building wheels for collected packages: breadability, docopt
  Building wheel for breadability (setup.py) ... done
  Created wheel for breadability: filename=brea

**Luhn Summarizer**

In [19]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

In [20]:
def summarize_luhn(paragraph, sentences_count=2):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = LuhnSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

In [21]:
sentences_count = 3
summary = summarize_luhn(text, sentences_count)

for sentence in summary:
  print(sentence)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way.
personal finance basics personal finance is where financial literacy translates into individual financial decision-making.
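
Luhn's method marks frequent content words as "significant" and groups significant words that occur close together into clusters. A sentence is scored by its best cluster: the number of significant words in the cluster, squared, divided by the cluster's length in words. Below is a minimal sketch of that scoring rule for one tokenized sentence. The set of significant words is passed in rather than derived from frequencies. `max_gap=4`, the largest number of non-significant words allowed between two significant words in the same cluster, is an assumed default and not necessarily what `LuhnSummarizer` uses internally.

```python
def luhn_sentence_score(words, significant, max_gap=4):
    """Best cluster score: (significant words in cluster) ** 2 / cluster length."""
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            # Close enough: extend the current cluster
            count += 1
        else:
            # Gap too large: close the current cluster and start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    # Score the final cluster
    return max(best, count ** 2 / (prev - start + 1))


words = "financial literacy means money skills and financial planning".split()
print(luhn_sentence_score(words, {'financial', 'money', 'planning'}))  # 4 ** 2 / 8 -> 2.0
```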


**Edmundson Summarizer**

In [22]:
from sumy.summarizers.edmundson import EdmundsonSummarizer

In [23]:
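# EdmundsonSummarizer expects non-empty bonus, stigma, and null word collections. The [''] defaults are
# placeholders that satisfy this requirement without matching any real word; pass real word lists to use them.
def summarize_Edmundson(paragraph, sentences_count=2, bonus_words=[''], stigma_words=[''], null_words=['']):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = EdmundsonSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    # Words that raise a sentence's score
    summarizer.bonus_words = bonus_words
    # Words that lower a sentence's score
    summarizer.stigma_words = stigma_words
    # Words to ignore (null words)
    summarizer.null_words = null_words

    summary = summarizer(parser.document, sentences_count)
    return summary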

In [24]:
sentences_count = 3
summary = summarize_Edmundson(text, sentences_count)

for sentence in summary:
  print(sentence)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
it also means comprehending certain financial principles and concepts, such as the time value of money, compound interest, managing debt, and financial planning.
here’s some background on bank accounts and why they are step one in creating a stable financial future.
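
Edmundson's method ranks sentences by a weighted sum of features: cue words (bonus words raise a score, stigma words lower it), key words, title words, and sentence location. `summarize_Edmundson` passes `['']` placeholders for the word lists, so the cue and title features probably contribute little here. The selection (the first two sentences and the last one) looks driven mainly by position. Below is a simplified sketch of two of those features, a cue score and a first/last-sentence location bonus. The weights and the location rule are assumptions for illustration, not sumy's exact implementation.

```python
import re


def edmundson_style_scores(sentences, bonus, stigma, cue_weight=1.0, location_weight=1.0):
    """Simplified cue + location scoring (illustrative weights and rules)."""
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = re.findall(r'[a-z0-9]+', sentence.lower())
        # Cue feature: +1 per bonus word, -1 per stigma word
        cue = sum(w in bonus for w in words) - sum(w in stigma for w in words)
        # Location feature: reward the first and last sentences
        location = 1.0 if i in (0, last) else 0.0
        scores.append(cue_weight * cue + location_weight * location)
    return scores


sentences = ["Financial literacy matters.", "Budgets help.",
             "Some background follows.", "Plan for retirement."]
print(edmundson_style_scores(sentences, bonus={'literacy', 'budgets'}, stigma={'background'}))
# [2.0, 1.0, -1.0, 1.0]
```

With the notebook's own function, real word lists would be passed the same way, for example `summarize_Edmundson(text, 3, bonus_words=['literacy', 'budget'], stigma_words=['background'], null_words=['the', 'a'])`. These particular word choices are only examples.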


**LSA Summarizer**

In [25]:
from sumy.summarizers.lsa import LsaSummarizer

In [26]:
def summarize_LSA(paragraph, sentences_count=2):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = LsaSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

In [27]:
sentences_count = 3
summary = summarize_LSA(text, sentences_count)

for sentence in summary:
  print(sentence)

key steps to attaining financial literacy include learning how to create a budget, track spending, pay off debt, and plan for retirement.
personal finance is about making and meeting your financial goals, whether you want to own a home, help other members of your family, save for your children’s college education, support causes that you care about, plan for retirement, or anything else.
bank accounts can hold and build the money you'll need for major purchases and life events.
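
LSA builds a term-by-sentence matrix, factors it with a singular value decomposition, and treats the top singular vectors as latent topics. Below is a minimal NumPy sketch of one common scoring rule: a sentence's score is the length of its vector in the top-`k` topic space, weighted by the singular values. The tokenizer, the raw-count weighting, and `k` are assumptions, and sumy's `LsaSummarizer` may differ in its details.

```python
import re

import numpy as np


def lsa_scores(sentences, stop_words=frozenset(), k=2):
    tokenized = [[w for w in re.findall(r'[a-z0-9]+', s.lower()) if w not in stop_words]
                 for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix of raw counts
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1

    # A = U @ diag(sigma) @ Vt; column j of Vt describes sentence j in topic space
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(sigma))
    # Score: length of each sentence's sigma-weighted topic vector over the top k topics
    return np.sqrt(((sigma[:k, None] * Vt[:k]) ** 2).sum(axis=0))


sentences = ["money money budget", "money budget", "cats sleep"]
# With k=1 only the dominant money/budget topic counts, so "cats sleep" scores 0
print(lsa_scores(sentences, k=1).round(3))
```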


**TextRank**

In [28]:
# Load Packages
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer

In [29]:
parser = PlaintextParser.from_string(text, Tokenizer("english"))

In [30]:
# Summarize using sumy TextRank
summarizer = TextRankSummarizer()
summary = summarizer(parser.document, 3)

In [31]:
for sentence in summary:
  print(sentence)

financial literacy is the ability to understand and make use of a variety of financial skills, including personal financial management, budgeting, and investing.
educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way.
personal finance is about making and meeting your financial goals, whether you want to own a home, help other members of your family, save for your children’s college education, support causes that you care about, plan for retirement, or anything else.
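
TextRank builds a graph whose nodes are sentences and whose edge weights measure content overlap, then ranks the nodes with PageRank. Below is a minimal sketch. It assumes the overlap measure from the original TextRank paper (shared words divided by the sum of the logs of the two sentence lengths), computed here on word sets, together with a damping factor of 0.85 and a fixed number of power iterations. sumy's `TextRankSummarizer` may compute similarity differently.

```python
import math
import re


def textrank_scores(sentences, damping=0.85, iterations=50):
    words = [set(re.findall(r'[a-z0-9]+', s.lower())) for s in sentences]
    n = len(sentences)

    # Edge weights: shared words / (log|Si| + log|Sj|)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or not words[i] or not words[j]:
                continue
            denom = math.log(len(words[i])) + math.log(len(words[j]))
            if denom > 0:
                weights[i][j] = len(words[i] & words[j]) / denom
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank by power iteration
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentences = ["money budget plan", "money budget interest",
             "budget interest plan", "cats sleep soundly"]
print([round(s, 3) for s in textrank_scores(sentences)])
# [1.0, 1.0, 1.0, 0.15]
```

The isolated sentence keeps only the baseline score `1 - damping`, while the three mutually overlapping sentences reinforce one another.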


**KL Sum algorithm**

In [32]:
from sumy.summarizers.kl import KLSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

def summarize_kl_sum(paragraph, sentences_count):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = KLSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

summary = summarize_kl_sum(text, 3)

for sentence in summary:
  print(sentence)

it also means comprehending certain financial principles and concepts, such as the time value of money, compound interest, managing debt, and financial planning.
it can help them become self-sufficient and achieve financial stability.
educating yourself on these topics also involves learning how money works, setting and achieving financial goals, becoming aware of unethical/discriminatory financial practices, and managing financial challenges that life throws your way.
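
KL-Sum greedily adds sentences so that the summary's word distribution stays as close as possible, in Kullback-Leibler divergence, to the document's word distribution. Below is a minimal sketch of that greedy loop. The divergence direction `KL(P_doc || P_summary)`, the additive smoothing constant `alpha`, and the regex tokenizer are assumptions, and sumy's `KLSummarizer` may differ in these details.

```python
import math
import re
from collections import Counter


def kl_divergence(p, q):
    # KL(p || q) for dicts of probabilities; q must be > 0 wherever p > 0
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def kl_sum(sentences, n, alpha=1e-3):
    tokens = [re.findall(r'[a-z0-9]+', s.lower()) for s in sentences]
    doc_counts = Counter(w for t in tokens for w in t)
    total = sum(doc_counts.values())
    p_doc = {w: c / total for w, c in doc_counts.items()}

    chosen, summary_counts = [], Counter()
    while len(chosen) < min(n, len(sentences)):
        best_i, best_kl = None, math.inf
        for i, t in enumerate(tokens):
            if i in chosen:
                continue
            candidate = summary_counts + Counter(t)
            # Additive smoothing keeps every document word's probability above zero
            denom = sum(candidate.values()) + alpha * len(p_doc)
            q = {w: (candidate[w] + alpha) / denom for w in p_doc}
            kl = kl_divergence(p_doc, q)
            if kl < best_kl:
                best_i, best_kl = i, kl
        chosen.append(best_i)
        summary_counts += Counter(tokens[best_i])
    # Return the chosen sentences in document order
    return [sentences[i] for i in sorted(chosen)]


sentences = ["money budget", "cats cats cats", "money budget cats"]
print(kl_sum(sentences, 1))  # ['money budget cats']
print(kl_sum(sentences, 2))  # ['cats cats cats', 'money budget cats']
```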
