# Extractive Summarization

In [None]:
text = """Customer Feedback:

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
They appreciated the quality of Reliance's petrochemical products, especially the high-grade polymers.
A few minor issues regarding packaging were brought up, which need to be addressed.
Customer's Future Requirements:

The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
They are interested in exploring Reliance's sustainable and green petrochemical offerings to meet their sustainability goals.
Reliance Petrochemicals’ New Solutions:

Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
Provided a demo of the latest product enhancements and technical specifications.
Supply Chain & Delivery Commitments:

Discussion focused on how Reliance can ensure more consistent delivery during peak seasons.
Proposed a real-time tracking system for better supply chain visibility, which the customer showed interest in.
Collaboration Opportunities:

The customer is open to a potential partnership for a joint research project in developing specialized polymers.
Agreed to follow up with the technical teams on both sides for a deeper exploration.
Action Items:
For Reliance Petrochemicals:

Investigate and resolve the packaging issues mentioned by the customer.
Prepare a proposal for improving supply chain efficiency, especially during high-demand periods.
Schedule a technical meeting to explore the research collaboration on specialized polymers.
For the Customer:

Provide Reliance with their quarterly demand forecast to help plan production and delivery schedules.
Share sustainability requirements for products they are interested in, especially the biodegradable range."""

**Frequency-based Approach**

In [None]:
import nltk
nltk.download('punkt')      # Punkt models used by sent_tokenize and word_tokenize
nltk.download('punkt_tab')  # recent NLTK releases load Punkt from this package instead
nltk.download('stopwords')  # stop-word lists ('a', 'an', 'the', 'in', ...) that are dropped before scoring
from collections import Counter  # counts word frequencies
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize




In [None]:
# generate_summary takes the text and n, the number of sentences to keep in the summary
def generate_summary(text, n):
  # Split the text into individual sentences
  sentences = sent_tokenize(text)

  # Tokenize the whole text into words, lowercase them, and keep only
  # alphanumeric tokens that are not stop words
  stop_words = set(stopwords.words('english'))
  words = [word.lower() for word in word_tokenize(text) if word.lower() not in stop_words and word.isalnum()]

  # Count how often each remaining word occurs in the text
  word_freq = Counter(words)

  # Score each sentence as the sum of the frequencies of its filtered words
  sentence_scores = {}  # maps sentence -> score

  for sentence in sentences:
    sentence_words = [word.lower() for word in word_tokenize(sentence) if word.lower() not in stop_words and word.isalnum()]
    sentence_score = sum(word_freq[word] for word in sentence_words)
    # Keep only sentences with fewer than 30 content words (adjust as needed). This drops
    # very long sentences, whose summed scores would otherwise tend to dominate.
    if len(sentence_words) < 30:
      sentence_scores[sentence] = sentence_score

  # Select the n highest-scoring sentences; they are ordered by score, not by position in the text
  summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
  summary = ' '.join(summary_sentences)

  return summary

In [None]:
summary = generate_summary(text, 6)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

Customer Feedback:

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
Customer's Future Requirements:

The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
Reliance Petrochemicals’ New Solutions:

Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
Supply Chain & Delivery Commitments:

Discussion focused on how Reliance can ensure more consistent delivery during peak seasons.
For the Customer:

Provide Reliance with their quarterly demand forecast to help plan production and delivery schedules.
Action Items:
For Reliance Petrochemicals:

Investigate and resolve the packaging issues mentioned by the customer.
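Because `generate_summary` sums word frequencies, longer sentences tend to score higher, and the result is ordered by score and returned as one joined string. The sketch below is a dependency-free variant, not the notebook's method: it assumes the sentences are already split, uses a regex tokenizer and a small hypothetical stop-word list (`STOP_WORDS`) in place of NLTK, averages normalized frequencies, and returns a list in document order, which also avoids re-splitting the output on `'. '`. The names `content_words` and `frequency_summary` are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the notebook uses NLTK's English list.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "and", "for", "with", "on",
              "is", "are", "was", "were", "be", "by", "they", "their", "which"}


def content_words(sentence):
    """Lowercased alphanumeric tokens that are not stop words (regex stand-in for word_tokenize)."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(sentences, n, max_words=30):
    """Rank sentences by mean normalized word frequency; return the top n in document order."""
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return []
    top = max(freq.values())
    scores = {}
    for i, sentence in enumerate(sentences):
        words = content_words(sentence)
        if not words or len(words) >= max_words:
            continue  # same length cap as the notebook's `< 30` filter
        # Averaging (instead of summing) keeps long sentences from winning on length alone.
        scores[i] = sum(freq[w] / top for w in words) / len(words)
    best = sorted(scores, key=scores.get, reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


sentences = [
    "Supply delays hit polymer deliveries.",
    "Packaging issues were minor.",
    "Polymer demand and supply will grow.",
]
for s in frequency_summary(sentences, 2):
    print(s)
```

Normalizing by the most frequent word keeps scores in a comparable range across documents; whether averaging or summing works better depends on how much weight you want to give sentence length.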


**TF-IDF Approach**

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
# importing cosine_similarity function to compute the cosine similarity between two vectors.
from sklearn.metrics.pairwise import cosine_similarity
# importing nlargest to return the n largest elements from an iterable in descending order.
from heapq import nlargest

In [None]:
def generate_summary_tfidf(text, n):
# Tokenize the text into individual sentences
  sentences = sent_tokenize(text)

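# Create the TF-IDF matrix: one row per sentence, features are word 3-grams and 4-grams
  vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(3, 4))
  tfidf_matrix = vectorizer.fit_transform(sentences)

# Score every sentence except the last by its cosine similarity to the LAST sentence
# (tfidf_matrix[-1]), not to the document as a whole. Because few 3- and 4-grams repeat
# across sentences, many scores can be 0, and nlargest then keeps the earliest sentences
  sentence_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])[0]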

# Select the top n sentences with the highest scores
  summary_sentences = nlargest(n, range(len(sentence_scores)), key=sentence_scores.__getitem__)

  summary_tfidf = ' '.join([sentences[i] for i in sorted(summary_sentences)])

  return summary_tfidf

In [None]:
summary = generate_summary_tfidf(text, 6)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

Customer Feedback:

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
They appreciated the quality of Reliance's petrochemical products, especially the high-grade polymers.
A few minor issues regarding packaging were brought up, which need to be addressed.
Customer's Future Requirements:

The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
They are interested in exploring Reliance's sustainable and green petrochemical offerings to meet their sustainability goals.
Reliance Petrochemicals’ New Solutions:

Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
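`generate_summary_tfidf` scores sentences against `tfidf_matrix[-1]`, which is the row of the last sentence rather than a vector for the whole document. If the goal is to compare each sentence with the document, one common formulation uses the sum (centroid) of all sentence vectors as the document vector. The plain-Python sketch below shows that arithmetic under stated assumptions: a regex tokenizer, unigram features, no stop-word removal, and scikit-learn's smoothed idf formula `ln((1 + n) / (1 + df)) + 1`. It is an illustration, not a drop-in copy of `TfidfVectorizer` (which also L2-normalizes rows by default); the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """One sparse {term: tf * idf} vector per sentence, each sentence treated as a document."""
    counts = [Counter(tokens(s)) for s in sentences]
    n_docs = len(sentences)
    df = Counter(term for c in counts for term in c)
    # Smoothed idf, the same formula scikit-learn uses when smooth_idf=True.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf_centroid_summary(sentences, n):
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # summed sentence vectors stand in for the whole document
    scores = [cosine(v, centroid) for v in vectors]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


sentences = ["polymer supply delays", "polymer supply demand", "packaging issue"]
print(tfidf_centroid_summary(sentences, 2))
```

Sentences that share the document's recurring terms ("polymer", "supply") sit closest to the centroid, so the off-topic "packaging issue" is dropped.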


**Count Vectorizer**

In [None]:
from heapq import nlargest
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


# Count Vectorizer Approach
def generate_summary_cv(text, n):
  # Tokenize the text into individual sentences
  sentences = sent_tokenize(text)

  # Create the Count Vectorizer matrix
  vectorizer = CountVectorizer(stop_words='english')
  count_matrix = vectorizer.fit_transform(sentences)

  # Score every sentence except the last by its cosine similarity to the LAST sentence
  # (count_matrix[-1]), not to the document as a whole
  sentence_scores = cosine_similarity(count_matrix[-1], count_matrix[:-1])[0]

  # Select the top n sentences with the highest scores
  summary_sentences = nlargest(n, range(len(sentence_scores)), key=sentence_scores.__getitem__)

  summary_count = ' '.join([sentences[i] for i in sorted(summary_sentences)])

  return summary_count


In [None]:
summary = generate_summary_cv(text, 6)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

Customer Feedback:

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
They appreciated the quality of Reliance's petrochemical products, especially the high-grade polymers.
Customer's Future Requirements:

The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
They are interested in exploring Reliance's sustainable and green petrochemical offerings to meet their sustainability goals.
Reliance Petrochemicals’ New Solutions:

Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
Prepare a proposal for improving supply chain efficiency, especially during high-demand periods.
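Both this cell and the TF-IDF cell rank sentences by cosine similarity, cos(a, b) = (a · b) / (|a| |b|); with `CountVectorizer` the vector entries are raw term counts. A small worked example of that arithmetic in plain Python follows. Whitespace tokenization and the hypothetical `STOP_WORDS` set are assumptions standing in for scikit-learn's tokenizer and English stop-word list.

```python
import math
from collections import Counter

# Hypothetical stop-word set standing in for CountVectorizer(stop_words='english').
STOP_WORDS = frozenset({"the", "in", "for", "a", "of"})


def count_vector(sentence):
    """Bag-of-words term counts, lowercased, with stop words removed."""
    return Counter(w for w in sentence.lower().split() if w not in STOP_WORDS)


def cosine_counts(a, b):
    """cos(a, b) = (a . b) / (|a| |b|) on raw term counts."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


a = count_vector("delays in the supply chain")
b = count_vector("visibility for the supply chain")
print(cosine_counts(a, b))  # 2 shared terms, 3 terms each: about 0.667
```

Unlike TF-IDF, raw counts give frequent words such as "customer" the same weight wherever they appear, which is one reason the two cells can pick different sentences.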


In [None]:
%pip install sumy


**Luhn Summarizer**

In [None]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

In [None]:
def summarize_luhn(paragraph, sentences_count=2):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = LuhnSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

In [None]:
sentences_count = 6
summary = summarize_luhn(text, sentences_count)

for sentence in summary:
  print(sentence)

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
They are interested in exploring Reliance's sustainable and green petrochemical offerings to meet their sustainability goals.
Supply Chain & Delivery Commitments:
The customer is open to a potential partnership for a joint research project in developing specialized polymers.
Investigate and resolve the packaging issues mentioned by the customer.
Schedule a technical meeting to explore the research collaboration on specialized polymers.
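Luhn's method treats frequent non-stop words as "significant" and scores a sentence by its densest cluster of them: (significant words in the cluster)² / (cluster length in words), where a cluster ends once more than four insignificant words separate two significant ones. The sketch below is the classic formulation under assumptions, not sumy's `LuhnSummarizer` code: a regex tokenizer, a hypothetical stop-word list, no stemming, and "significant" meaning at least `min_count` occurrences. sumy's own thresholds may differ.

```python
import re
from collections import Counter

# Hypothetical small stop-word list; the notebook uses get_stop_words("english").
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "with", "on",
              "by", "is", "are", "was", "were"}


def luhn_scores(sentences, min_count=2, max_gap=4):
    """Score each sentence by its best cluster of significant words (classic Luhn)."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks if w not in STOP_WORDS)
    significant = {w for w, c in freq.items() if c >= min_count}
    scores = []
    for toks in tokenized:
        positions = [i for i, w in enumerate(toks) if w in significant]
        best, start = 0.0, 0
        for k in range(1, len(positions) + 1):
            # Close the current cluster at the end, or when more than max_gap
            # insignificant words separate two significant ones.
            if k == len(positions) or positions[k] - positions[k - 1] - 1 > max_gap:
                cluster = positions[start:k]
                span = cluster[-1] - cluster[0] + 1
                best = max(best, len(cluster) ** 2 / span)
                start = k
        scores.append(best)
    return scores


sentences = [
    "Supply delays hurt polymer supply.",
    "Packaging was fine.",
    "Polymer supply improved.",
]
print(luhn_scores(sentences))
```

Squaring the count rewards tight groups of significant words, so the short "Polymer supply improved." (two significant words in two tokens) outscores the longer first sentence.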


**Edmundson Summarizer**

In [None]:
from sumy.summarizers.edmundson import EdmundsonSummarizer

In [None]:
def summarize_Edmundson(paragraph, sentences_count=2, bonus_words=('',), stigma_words=('',), null_words=('',)):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = EdmundsonSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    # sumy raises a ValueError when these word sets are empty, so the defaults are
    # placeholder tuples holding only ''. They match no real word; pass domain-specific
    # word lists to make the cue and title features meaningful.
    summarizer.bonus_words = bonus_words
    summarizer.stigma_words = stigma_words
    summarizer.null_words = null_words

    summary = summarizer(parser.document, sentences_count)
    return summary

In [None]:
sentences_count = 6
summary = summarize_Edmundson(text, sentences_count)

for sentence in summary:
  print(sentence)

Customer Feedback:
The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
Customer's Future Requirements:
The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
Provide Reliance with their quarterly demand forecast to help plan production and delivery schedules.
Share sustainability requirements for products they are interested in, especially the biodegradable range.
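`EdmundsonSummarizer` combines cue words (bonus and stigma words), title words, and sentence location. The simplified sketch below covers only the cue and location features and is an illustration, not sumy's implementation: it assumes paragraphs are pre-split into sentences, adds 1 per bonus word and subtracts 1 per stigma word, and gives the first sentence of each paragraph a fixed bonus. With no real bonus or stigma words only the location term contributes, which may help explain why the notebook's Edmundson output leans toward section openings. The function name and weights are hypothetical.

```python
import re


def edmundson_summary(paragraphs, n, bonus_words, stigma_words,
                      cue_weight=1.0, location_weight=1.0):
    """Simplified Edmundson scoring: cue words plus a first-sentence-of-paragraph bonus.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    """
    bonus = {w.lower() for w in bonus_words}
    stigma = {w.lower() for w in stigma_words}
    scored = []  # (score, sentence) in document order
    for paragraph in paragraphs:
        for position, sentence in enumerate(paragraph):
            words = re.findall(r"[a-z0-9]+", sentence.lower())
            cue = sum(w in bonus for w in words) - sum(w in stigma for w in words)
            location = 1.0 if position == 0 else 0.0
            scored.append((cue_weight * cue + location_weight * location, sentence))
    best = sorted(range(len(scored)), key=lambda i: scored[i][0], reverse=True)[:n]
    return [scored[i][1] for i in sorted(best)]


paragraphs = [
    ["The customer reported supply delays.", "Packaging was a minor issue."],
    ["Demand for polymer products will increase.", "The weather was discussed briefly."],
]
print(edmundson_summary(paragraphs, 2, bonus_words=["demand", "delays"], stigma_words=["weather"]))
```

Raising `cue_weight` lets well-chosen bonus words pull a sentence ahead of a paragraph opener, for example `bonus_words=["packaging", "minor"]` selects the second sentence of the first paragraph.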


**LSA Summarizer**

In [None]:
from sumy.summarizers.lsa import LsaSummarizer

In [None]:
def summarize_LSA(paragraph, sentences_count=2):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = LsaSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

In [None]:
sentences_count = 6
summary = summarize_LSA(text, sentences_count)

for sentence in summary:
  print(sentence)

The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
Proposed a real-time tracking system for better supply chain visibility, which the customer showed interest in.
The customer is open to a potential partnership for a joint research project in developing specialized polymers.
Prepare a proposal for improving supply chain efficiency, especially during high-demand periods.
Provide Reliance with their quarterly demand forecast to help plan production and delivery schedules.
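`LsaSummarizer` relies on a singular value decomposition of the term-by-sentence matrix. To show the idea without numpy, the sketch below keeps only the dominant topic: it builds a count-based term-by-sentence matrix A (an assumption; sumy's weighting may differ), finds the top right singular vector of A by power iteration on AᵀA, and ranks sentences by the size of their component. This is a one-topic simplification with hypothetical function names, not sumy's algorithm.

```python
import math
import re
from collections import Counter


def lsa_scores(sentences, iterations=100):
    """Weight of each sentence in the dominant LSA topic, via power iteration."""
    counts = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)
    # Gram matrix G = A^T A, where A is the term-by-sentence count matrix.
    gram = [[sum(ci[t] * cj[t] for t in ci) for cj in counts] for ci in counts]
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # every sentence is empty (or there are none)
        v = [x / norm for x in w]
    # The dominant eigenvector of G is the top right singular vector of A.
    return [abs(x) for x in v]


def lsa_summary(sentences, n):
    scores = lsa_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


sentences = ["polymer supply polymer", "polymer supply", "packaging"]
print(lsa_summary(sentences, 2))
```

Sentences built from the dominant co-occurring terms load heavily on the top topic, while the unrelated "packaging" sentence decays toward zero. A full LSA summarizer would also look at further topics so that secondary themes are represented.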


**TextRank**

In [None]:
# Load Packages
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer

In [None]:
parser = PlaintextParser.from_string(text, Tokenizer("english"))

In [None]:
# Summarize using sumy TextRank
summarizer = TextRankSummarizer()
summary = summarizer(parser.document, 6)

In [None]:
for sentence in summary:
  print(sentence)

The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
They appreciated the quality of Reliance's petrochemical products, especially the high-grade polymers.
The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
Introduced the customer to Reliance’s new line of biodegradable plastics and high-performance elastomers.
Proposed a real-time tracking system for better supply chain visibility, which the customer showed interest in.
The customer is open to a potential partnership for a joint research project in developing specialized polymers.
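TextRank treats sentences as nodes of a graph, weights edges by sentence similarity, and ranks nodes with PageRank. A dependency-free sketch in the spirit of Mihalcea and Tarau's formulation follows; the similarity (shared distinct words divided by the sum of the logs of the sentences' distinct-word counts), the damping factor 0.85, and the fixed iteration count are assumptions, and sumy's `TextRankSummarizer` may differ in its details.

```python
import math
import re


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over a sentence-similarity graph (TextRank-style sketch)."""
    words = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)

    def similarity(i, j):
        # Shared words normalized by sentence lengths (distinct-word counts).
        if i == j or not words[i] or not words[j]:
            return 0.0
        denom = math.log(len(words[i])) + math.log(len(words[j]))
        return len(words[i] & words[j]) / denom if denom > 0 else 0.0

    weights = [[similarity(i, j) for j in range(n)] for i in range(n)]
    out_totals = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node keeps (1 - d) and receives d times its neighbours' scores,
        # split in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_totals[j] * scores[j]
                for j in range(n) if out_totals[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentences = [
    "Supply chain delays hurt deliveries.",
    "Supply chain tracking improves deliveries.",
    "Packaging looked fine.",
]
print([round(s, 3) for s in textrank_scores(sentences)])
```

The two supply-chain sentences reinforce each other, while the isolated packaging sentence only keeps the baseline 1 - d, so a summary would take the top-scoring nodes.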


**KL-Sum Algorithm**

In [None]:
from sumy.summarizers.kl import KLSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

def summarize_kl_sum(paragraph, sentences_count):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = KLSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

summary = summarize_kl_sum(text, 10)

for sentence in summary:
  print(sentence)

Customer Feedback:
The customer expressed satisfaction with the overall supply of products but mentioned occasional delays in the supply chain during peak demand periods.
A few minor issues regarding packaging were brought up, which need to be addressed.
Customer's Future Requirements:
The customer anticipates increased demand for polymer products in the next quarter due to a new project launch.
Proposed a real-time tracking system for better supply chain visibility, which the customer showed interest in.
Collaboration Opportunities:
Investigate and resolve the packaging issues mentioned by the customer.
Prepare a proposal for improving supply chain efficiency, especially during high-demand periods.
For the Customer:
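KL-Sum greedily builds a summary whose word distribution is as close as possible, in Kullback-Leibler divergence, to the document's. The sketch below shows that greedy loop in plain Python. The regex tokenizer, the additive smoothing constant `alpha`, and the direction KL(P_document || P_summary) are assumptions of this sketch; sumy's `KLSummarizer` may tokenize, smooth, or stop differently, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def distribution(counts, vocab, alpha):
    """Unigram distribution over vocab with additive smoothing alpha."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q); q must be positive wherever p is."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def kl_sum(sentences, n, alpha=0.01):
    """Greedy KL-Sum: add the sentence that brings the summary's word distribution
    closest to the document's, until n sentences are chosen. Requires alpha > 0."""
    sentence_counts = [Counter(tokens(s)) for s in sentences]
    doc_counts = sum(sentence_counts, Counter())
    vocab = list(doc_counts)
    p_doc = distribution(doc_counts, vocab, 0.0)
    chosen, summary_counts = [], Counter()
    while len(chosen) < min(n, len(sentences)):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        best = min(candidates, key=lambda i: kl_divergence(
            p_doc, distribution(summary_counts + sentence_counts[i], vocab, alpha)))
        chosen.append(best)
        summary_counts += sentence_counts[best]
    return [sentences[i] for i in sorted(chosen)]


sentences = ["polymer demand polymer supply", "packaging issue", "polymer supply"]
print(kl_sum(sentences, 2))
```

The second pick is "packaging issue" rather than the more similar "polymer supply", because it covers words the summary is still missing; this coverage effect is what distinguishes KL-Sum from pure similarity ranking.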
