<a href="https://colab.research.google.com/github/kkrusere/NLP-Text-Summarization/blob/main/text_summarization_using_abstractive_method.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
import os
#mounting google drive
drive.mount('/content/drive')
########################################
#changing the working directory
os.chdir("/content/drive/MyDrive/NLP_Data")

!pwd


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/NLP_Data


## <center>**NLP Text Summarization** <center><em>
**<center>Abstractive Summarization</center>**

Text summarization is the technique of shortening a long piece of text into a coherent, fluent summary that keeps only the document's main points. In other words, it produces a shorter text while preserving the meaning of the original.
</em></center>
<br>
<center><img src="https://github.com/kkrusere/NLP-Text-Summarization/blob/main/assets/mchinelearning_text_sum.png?raw=1" width=600/></center>

***Project Contributors:*** Kuzi Rusere<br>
**MVP streamlit App URL:** N/A

In [2]:
import nltk
import spacy

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.probability import FreqDist
import string
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

stop_words = set(stopwords.words("english"))

# Load spaCy model
nlp = spacy.load('en_core_web_sm')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


**Abstractive summarization** involves understanding the core ideas of the text and then creating a new, condensed version that expresses those ideas, potentially using different words and phrasing. Unlike extractive summarization, which selects sentences or phrases from the text, abstractive summarization may not reuse any sentence from the original and instead produces a human-like paraphrase. It is harder because it requires understanding the text and then generating meaningful new text that represents it.

For our example text, we are going to use this brief explainer of the history of chaos theory:

In [3]:
text = """
In 1961, a meteorologist by the name of Edward Lorenz made a profound discovery. Lorenz was utilising the new-found power of computers in an attempt to more accurately predict the weather. He created a mathematical model which, when supplied with a set of numbers representing the current weather, could predict the weather a few minutes in advance.
Once this computer program was up and running, Lorenz could produce long-term forecasts by feeding the predicted weather back into the computer over and over again, with each run forecasting further into the future.Accurate minute-by-minute forecasts added up into days, and then weeks.
One day, Lorenz decided to rerun one of his forecasts. In the interests of saving time he decided not to start from scratch; instead he took the computer’s prediction from halfway through the first run and used that as the starting point.
After a well-earned coffee break, he returned to discover something unexpected. Although the computer’s new predictions started out the same as before, the two sets of predictions soon began diverging drastically. What had gone wrong?
Lorenz soon realised that while the computer was printing out the predictions to three decimal places, it was actually crunching the numbers internally using six decimal places.
So while Lorenz had started the second run with the number 0.506, the original run had used the number 0.506127.
A difference of one part in a thousand: the same sort of difference that a flap of a butterfly’s wing might make to the breeze on your face. The starting weather conditions had been virtually identical. The two predictions were anything but.
Lorenz had found the seeds of chaos. In systems that behave nicely - without chaotic effects - small differences only produce small effects. In this case, Lorenz’s equations were causing errors to steadily grow over time.
This meant that tiny errors in the measurement of the current weather would not stay tiny, but relentlessly increased in size each time they were fed back into the computer until they had completely swamped the predictions.
Lorenz famously illustrated this effect with the analogy of a butterfly flapping its wings and thereby causing the formation of a hurricane half a world away.
A nice way to see this “butterfly effect” for yourself is with a game of pool or billiards. No matter how consistent you are with the first shot (the break), the smallest of differences in the speed and angle with which you strike the white ball will cause the pack of billiards to scatter in wildly different directions every time.
The smallest of differences are producing large effects - the hallmark of a chaotic system.
It is worth noting that the laws of physics that determine how the billiard balls move are precise and unambiguous: they allow no room for randomness.
What at first glance appears to be random behaviour is completely deterministic - it only seems random because imperceptible changes are making all the difference.
The rate at which these tiny differences stack up provides each chaotic system with a prediction horizon - a length of time beyond which we can no longer accurately forecast its behaviour.
In the case of the weather, the prediction horizon is nowadays about one week (thanks to ever-improving measuring instruments and models).
Some 50 years ago it was 18 hours. Two weeks is believed to be the limit we could ever achieve however much better computers and software get.
Surprisingly, the solar system is a chaotic system too - with a prediction horizon of a hundred million years. It was the first chaotic system to be discovered, long before there was a Chaos Theory.
In 1887, the French mathematician Henri Poincaré showed that while Newton’s theory of gravity could perfectly predict how two planetary bodies would orbit under their mutual attraction, adding a third body to the mix rendered the equations unsolvable.
The best we can do for three bodies is to predict their movements moment by moment, and feed those predictions back into our equations …
Though the dance of the planets has a lengthy prediction horizon, the effects of chaos cannot be ignored, for the intricate interplay of gravitation tugs among the planets has a large influence on the trajectories of the asteroids.
Keeping an eye on the asteroids is difficult but worthwhile, since such chaotic effects may one day fling an unwelcome surprise our way.
On the flip side, they can also divert external surprises such as steering comets away from a potential collision with Earth.

"""

#### **Abstractive Summarization** using NLTK, spaCy, Gensim, and Sumy

Abstractive summarization using traditional NLP libraries like NLTK, spaCy, Gensim, and Sumy can be more challenging since these libraries are more commonly used for extractive summarization. However, we can create a basic approach to mimic abstractive summarization by combining various techniques.

1. Preprocessing the Text
- Before we start with the summarization, we need to preprocess the text to clean and prepare it.

In [4]:
def preprocess_text(text):
    # Tokenize sentences
    sentences = sent_tokenize(text)

    # Remove stopwords and punctuation
    stop_words = set(stopwords.words('english'))
    processed_sentences = []

    for sentence in sentences:
        words = word_tokenize(sentence)
        filtered_words = [word.lower() for word in words if word.lower() not in stop_words and word not in string.punctuation]
        processed_sentences.append(' '.join(filtered_words))

    return processed_sentences


In [5]:
preprocessed_text = preprocess_text(text)
preprocessed_text

['1961 meteorologist name edward lorenz made profound discovery',
 'lorenz utilising new-found power computers attempt accurately predict weather',
 'created mathematical model supplied set numbers representing current weather could predict weather minutes advance',
 'computer program running lorenz could produce long-term forecasts feeding predicted weather back computer run forecasting future.accurate minute-by-minute forecasts added days weeks',
 'one day lorenz decided rerun one forecasts',
 'interests saving time decided start scratch instead took computer ’ prediction halfway first run used starting point',
 'well-earned coffee break returned discover something unexpected',
 'although computer ’ new predictions started two sets predictions soon began diverging drastically',
 'gone wrong',
 'lorenz soon realised computer printing predictions three decimal places actually crunching numbers internally using six decimal places',
 'lorenz started second run number 0.506 original run u

2. Extract Keywords and Key Phrases
- To create an abstractive summary, we need to identify key phrases and concepts from the text.

In [6]:
from collections import Counter

def extract_keywords(text, num_keywords=10):
    doc = nlp(text)
    # Extract noun chunks (key phrases)
    keywords = [chunk.text for chunk in doc.noun_chunks]
    # Get the most common keywords
    common_keywords = Counter(keywords).most_common(num_keywords)
    return common_keywords

In [7]:
keywords = extract_keywords(text)
keywords

[('Lorenz', 7),
 ('which', 4),
 ('they', 4),
 ('the weather', 3),
 ('the computer', 3),
 ('he', 3),
 ('it', 3),
 ('that', 3),
 ('we', 3),
 ('the current weather', 2)]

3. Generate Sentence Embeddings (Using Gensim)
- We can use Gensim to create sentence embeddings, which will help in understanding the context and semantic similarity between sentences.

In [8]:
from gensim.models import Word2Vec
import numpy as np

def generate_sentence_embeddings(sentences):
    # Tokenize sentences
    tokenized_sentences = [word_tokenize(sentence) for sentence in sentences]

    # Train Word2Vec model
    model = Word2Vec(tokenized_sentences, vector_size=100, window=5, min_count=1, workers=4)

    # Generate sentence embeddings by averaging word vectors
    sentence_embeddings = []
    for sentence in tokenized_sentences:
        vectors = [model.wv[word] for word in sentence if word in model.wv]
        if vectors:
            # Average over the words that actually have a vector
            sentence_embeddings.append(sum(vectors) / len(vectors))
        else:
            # A zero vector (instead of None) keeps the list usable by cosine_similarity later
            sentence_embeddings.append(np.zeros(model.vector_size, dtype=np.float32))

    return sentence_embeddings

In [9]:
sentence_embeddings = generate_sentence_embeddings(preprocessed_text)
sentence_embeddings

[array([-1.5709799e-03,  2.4688703e-03,  2.5568870e-03,  1.9268787e-03,
        -3.6906933e-03,  1.7219689e-04,  1.0265656e-03,  3.5070456e-03,
        -6.7889306e-04,  1.2482239e-03,  2.0356001e-03, -4.9342951e-03,
         8.4150326e-04,  3.9611133e-03,  4.4671004e-05,  1.6185560e-04,
         1.6769703e-04, -1.5911397e-03,  1.4256187e-03, -1.9127495e-03,
         2.0720013e-03,  3.0152821e-03,  9.7796856e-04,  5.2917009e-04,
         2.4478650e-03,  1.3367350e-03,  2.5816613e-03, -2.6693805e-03,
        -3.5975473e-03,  1.2704364e-03,  5.1818555e-04, -8.3519716e-04,
        -1.8282429e-03,  1.0235037e-03,  3.1617824e-03,  2.9561226e-03,
         2.6178497e-03,  2.5847489e-03,  2.4531116e-03, -7.1909430e-04,
        -3.0495808e-05,  1.8929821e-03, -3.9368072e-03, -3.2983959e-04,
         6.4781774e-04,  1.2689453e-03, -1.7243668e-03,  4.0463842e-03,
        -2.8574609e-03, -6.8223663e-04, -1.2569262e-03,  4.8764329e-04,
        -1.0062446e-03,  1.7586289e-03,  1.5704347e-03,  1.86002

4. Rank Sentences Based on Keywords and Similarity
- We rank sentences by their relevance to the extracted keywords and the similarity of their embeddings to one another.

In [10]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def rank_sentences(sentences, keywords, sentence_embeddings):
    # Rank based on keyword occurrence
    keyword_sentences = [(sentence, sum(sentence.count(keyword[0]) for keyword in keywords)) for sentence in sentences]

    # Rank based on similarity (optional, more for extractive purposes)
    if sentence_embeddings:
        similarity_matrix = cosine_similarity(sentence_embeddings)
        similarity_scores = similarity_matrix.sum(axis=1)
        combined_ranking = [(sentence, keyword_score + similarity_score) for (sentence, keyword_score), similarity_score in zip(keyword_sentences, similarity_scores)]
    else:
        combined_ranking = keyword_sentences

    # Sort sentences by combined score
    ranked_sentences = sorted(combined_ranking, key=lambda x: x[1], reverse=True)

    return ranked_sentences



In [11]:
ranked_sentences = rank_sentences(preprocessed_text, keywords, sentence_embeddings)
ranked_sentences

[('1887 french mathematician henri poincaré showed newton ’ theory gravity could perfectly predict two planetary bodies would orbit mutual attraction adding third body mix rendered equations unsolvable',
  9.23914623260498),
 ('created mathematical model supplied set numbers representing current weather could predict weather minutes advance',
  7.527207374572754),
 ('lorenz utilising new-found power computers attempt accurately predict weather',
  6.4801695346832275),
 ('case weather prediction horizon nowadays one week thanks ever-improving measuring instruments models',
  6.367793560028076),
 ('computer program running lorenz could produce long-term forecasts feeding predicted weather back computer run forecasting future.accurate minute-by-minute forecasts added days weeks',
  6.196478843688965),
 ('two weeks believed limit could ever achieve however much better computers software get',
  6.0695788860321045),
 ('meant tiny errors measurement current weather would stay tiny relentless
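One thing worth noting about the keyword part of `rank_sentences`: `sentence.count(keyword[0])` counts substrings, so a keyword such as `'he'` also matches inside `weather`, while `'Lorenz'` (capitalised) and `'the weather'` (contains a stopword) never match the lower-cased, stopword-free sentences. The sketch below shows one possible token-based alternative. The helper names `normalise_keyword` and `keyword_score`, the `PRONOUNS` set, and the article list are assumptions made for illustration; they are not part of the notebook, and the scores shown above were not produced with them.

```python
import re

# Hypothetical filter list: pronoun-like noun chunks that carry no topical meaning.
PRONOUNS = {"he", "she", "it", "they", "we", "which", "that", "who", "this"}
ARTICLES = {"the", "a", "an"}

def normalise_keyword(phrase):
    """Lower-case a key phrase and drop articles so it can match preprocessed text."""
    tokens = re.findall(r"[a-z0-9'-]+", phrase.lower())
    return [token for token in tokens if token not in ARTICLES]

def keyword_score(sentence, keywords):
    """Count whole-token occurrences of each key phrase in a preprocessed sentence."""
    tokens = sentence.lower().split()
    score = 0
    for phrase, _count in keywords:
        key = normalise_keyword(phrase)
        if not key or (len(key) == 1 and key[0] in PRONOUNS):
            continue  # skip empty phrases and bare pronouns
        width = len(key)
        score += sum(tokens[i:i + width] == key
                     for i in range(len(tokens) - width + 1))
    return score

example_keywords = [("Lorenz", 7), ("he", 3), ("the weather", 3)]
example_sentence = ("lorenz utilising new-found power computers attempt "
                    "accurately predict weather")
print(keyword_score(example_sentence, example_keywords))
```

Whether this is an improvement depends on the text; it simply makes the keyword score reflect whole words rather than accidental substrings.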

5. Generate Abstractive Summary
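- Finally, we build the summary from the top-ranked sentences. Note that the `paraphrase_sentence` function below is only a placeholder: it shuffles the word order at random rather than genuinely paraphrasing, so the output is not expected to be fluent.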

In [12]:
import random

def paraphrase_sentence(sentence):
    words = word_tokenize(sentence)
    random.shuffle(words)
    paraphrased_sentence = ' '.join(words)
    return paraphrased_sentence

def generate_abstractive_summary(ranked_sentences, num_sentences=3):
    top_sentences = [sentence[0] for sentence in ranked_sentences[:num_sentences]]
    paraphrased_sentences = [paraphrase_sentence(sentence) for sentence in top_sentences]
    summary = ' '.join(paraphrased_sentences)
    return summary


In [13]:
summary = generate_abstractive_summary(ranked_sentences)
summary

'bodies french third planetary poincaré theory newton henri equations 1887 mix unsolvable predict adding body mathematician mutual gravity orbit would perfectly rendered two attraction ’ could showed current numbers model created set mathematical predict weather representing advance supplied could minutes weather weather new-found computers lorenz accurately attempt power utilising predict'
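Because `paraphrase_sentence` calls `random.shuffle` without a seed, the word order of the summary will most likely differ on every run, and order-sensitive scores such as ROUGE-2 and ROUGE-L will differ with it. A minimal sketch of a repeatable variant is shown here; the helper name `shuffle_words` and the `seed` parameter are assumptions for illustration, and it uses `str.split` instead of `word_tokenize` only to stay self-contained.

```python
import random

def shuffle_words(sentence, seed=0):
    """Shuffle the words of a sentence with a private, seeded generator."""
    rng = random.Random(seed)   # local generator: does not touch global random state
    words = sentence.split()
    rng.shuffle(words)
    return ' '.join(words)

sentence = "created mathematical model supplied set numbers"
first = shuffle_words(sentence, seed=42)
second = shuffle_words(sentence, seed=42)
print(first == second)  # same seed, same word order
```

Seeding makes the experiment reproducible; it does not make the shuffle any closer to a real paraphrase.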

In [14]:
"""
Summary=
        equations adding predict would bodies mathematician unsolvable perfectly theory attraction orbit 1887 henri mutual rendered planetary two
        french body gravity mix newton could poincaré third ’ showed supplied minutes representing predict set created model numbers mathematical
        weather weather current could advance lorenz attempt accurately weather utilising predict power computers new-found
"""

'\nSummary=\n        equations adding predict would bodies mathematician unsolvable perfectly theory attraction orbit 1887 henri mutual rendered planetary two \n        french body gravity mix newton could poincaré third ’ showed supplied minutes representing predict set created model numbers mathematical \n        weather weather current could advance lorenz attempt accurately weather utilising predict power computers new-found\n'

The generated summary is not a good summary of the input text (as expected).

- What the summary is focusing on:
> - The summary primarily highlights the concept of chaos theory and its impact on weather prediction.
> - It mentions Edward Lorenz's work with computer models and the butterfly effect.
> - It also briefly touches upon the solar system as a chaotic system.

* What the summary is missing:
> - The detailed explanation of Lorenz's experiment:
  > > * The text provides a step-by-step account of how Lorenz discovered chaos, including the specific details of his computer model, the rounding error, and the resulting divergence in predictions.
  > > * This is crucial to understanding the core concept of chaos theory.
> - The connection between chaos and determinism:
  > > * The text emphasizes that chaotic systems, while seemingly random, are actually governed by deterministic laws.
  > > * The summary completely omits this important point.
> - The prediction horizon and its implications:
  > > * The text explains the concept of the prediction horizon and its relevance to weather forecasting and the solar system.
  > > * The summary mentions the prediction horizon in relation to the solar system but fails to connect it to the broader theme of chaos theory's limitations on predictability.
> - The role of chaos in the solar system and asteroid trajectories:
  > > * The text discusses how chaos affects the solar system, particularly the movement of asteroids.
  > > * The summary briefly mentions the solar system as chaotic but neglects the specific implications for asteroids and potential collisions with Earth.

- What it lacks:
> - Clarity and coherence:
  > > * The summary feels disjointed and lacks a clear flow of ideas.
  > > * The sentences are poorly connected, making it difficult to follow the overall narrative.
> - Comprehensiveness:
  > > * The summary fails to capture the full scope of the text, omitting key concepts and examples that are crucial for understanding chaos theory.
> - Accuracy:
  > > * The summary oversimplifies some aspects of chaos theory, potentially leading to misunderstandings.

In essence, the summary provides a very superficial overview of chaos theory, focusing mainly on its impact on weather prediction. It misses out on the rich details, explanations, and connections that make the original text informative and engaging.

Let's incorporate some evaluation:

To evaluate the quality of the summaries generated by our summarization algorithm, we can use several evaluation metrics. The most widely used evaluation metric for summarization tasks is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). Other evaluation metrics include BLEU (Bilingual Evaluation Understudy) and METEOR (Metric for Evaluation of Translation with Explicit ORdering).

Evaluation Metrics for Summarization
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
    * ROUGE-1: Measures the overlap of unigrams (single words) between the generated summary and a reference summary.
    * ROUGE-2: Measures the overlap of bigrams (two consecutive words).
    * ROUGE-L: Measures the longest common subsequence (LCS) between the generated and reference summaries, capturing the in-sequence overlap.
- BLEU (Bilingual Evaluation Understudy):
    * Primarily used for machine translation but can be adapted for summarization.
    * Measures the n-gram precision of a generated text against one or more reference texts.
- METEOR (Metric for Evaluation of Translation with Explicit ORdering):
    * Designed to improve BLEU by addressing problems like synonymy and stemming.
    * It considers unigram matches between generated and reference summaries, applying stemming and synonymy matching.
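To make the ROUGE definitions above concrete, here is a dependency-free sketch of ROUGE-N and ROUGE-L on whitespace tokens. It is a simplified illustration, not the `rouge-score` implementation (no stemming, no special tokenization), and the function names `ngrams`, `rouge_n`, `lcs_length` and `rouge_l` are assumptions chosen for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def f_measure(precision, recall):
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def rouge_n(candidate, reference, n=1):
    """Precision, recall and F1 of clipped n-gram overlap."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((cand & ref).values())           # clipped overlap count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall, f_measure(precision, recall)

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, one row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference):
    """Precision, recall and F1 based on the longest common subsequence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    return precision, recall, f_measure(precision, recall)

reference = "the cat lay on the mat"
print(rouge_n("the cat sat on the mat", reference, n=1))
print(rouge_n("mat the on sat cat the", reference, n=2))  # same words, shuffled
print(rouge_l("the cat sat on the mat", reference))
```

The shuffled candidate keeps its unigram overlap but loses every bigram, which is the same pattern the word-shuffled summary in this notebook produces.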


In [15]:
!pip install rouge-score

from rouge_score import rouge_scorer



In [16]:
def calculate_rouge_scores1(generated_summary, reference_summary):
    """
    Calculate ROUGE scores for a generated summary compared to a reference summary.

    Args:
        generated_summary (str): The generated summary to evaluate.
        reference_summary (str): The reference summary for comparison.

    Returns:
        dict: A dictionary containing ROUGE-1, ROUGE-2, and ROUGE-L scores.
    """
    # Initialize the ROUGE scorer with ROUGE-1, ROUGE-2, and ROUGE-L
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

    # Calculate ROUGE scores
    scores = scorer.score(reference_summary, generated_summary)

    # Print the ROUGE scores
    print("ROUGE Scores:")
    print(f"ROUGE-1: {scores['rouge1'].precision:.3f}")
    print(f"ROUGE-2: {scores['rouge2'].precision:.3f}")
    print(f"ROUGE-L: {scores['rougeL'].precision:.3f}")

    return scores


In [17]:
reference_summary = """Chaos theory is a field of study in mathematics that examines the behavior of dynamical systems that are highly sensitive to initial conditions.
This sensitivity is popularly referred to as the butterfly effect, where a small change in one state of a deterministic nonlinear system can result in large differences in a later state.
Edward Lorenz, a meteorologist, made significant contributions to chaos theory through his work on weather prediction models. He discovered that even tiny errors in the initial
measurements of weather conditions could lead to drastically different forecasts over time. This finding highlighted the inherent limitations in predicting the long-term behavior of
chaotic systems. The concept of a prediction horizon emerged, representing the time limit beyond which accurate predictions become impossible due to the exponential growth of errors.
Chaos theory has implications beyond weather forecasting. For instance, the three-body problem in celestial mechanics demonstrates the chaotic nature of gravitational interactions
between three or more celestial bodies. Even with precise initial conditions, predicting the long-term trajectories of these bodies becomes increasingly difficult due to the
sensitivity to initial conditions."""

# Calculate and display ROUGE scores
calculate_rouge_scores1(summary, reference_summary)

ROUGE Scores:
ROUGE-1: 0.300
ROUGE-2: 0.000
ROUGE-L: 0.140


{'rouge1': Score(precision=0.3, recall=0.08196721311475409, fmeasure=0.12875536480686695),
 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0),
 'rougeL': Score(precision=0.14, recall=0.03825136612021858, fmeasure=0.06008583690987125)}

In [18]:
def calculate_rouge_scores2(generated_summary, original_text):
    """
    Evaluates the quality of a generated summary using ROUGE scores, even without a reference summary.

    This function calculates ROUGE scores by comparing the generated summary against the original text,
    serving as a pseudo-reference. While not ideal, it provides a relative measure of how well the
    summary captures the salient information from the original text.

    Args:
        generated_summary (str): The summary generated by the summarization algorithm.
        original_text (str): The original text from which the summary was generated.

    Returns:
        dict: A dictionary containing ROUGE-1, ROUGE-2, and ROUGE-L scores.
    """
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    # Calculate ROUGE scores
    scores = scorer.score(original_text, generated_summary)

    # Print the ROUGE scores
    print("ROUGE Scores:")
    print(f"ROUGE-1: {scores['rouge1'].precision:.3f}")
    print(f"ROUGE-2: {scores['rouge2'].precision:.3f}")
    print(f"ROUGE-L: {scores['rougeL'].precision:.3f}")
    return scores



In [19]:
# Evaluate the generated summary against the original text (pseudo-reference)
calculate_rouge_scores2(summary, text)

ROUGE Scores:
ROUGE-1: 1.000
ROUGE-2: 0.041
ROUGE-L: 0.280


{'rouge1': Score(precision=1.0, recall=0.06596306068601583, fmeasure=0.12376237623762378),
 'rouge2': Score(precision=0.04081632653061224, recall=0.002642007926023778, fmeasure=0.004962779156327543),
 'rougeL': Score(precision=0.28, recall=0.018469656992084433, fmeasure=0.034653465346534656)}

##### **Incorporating BLEU and METEOR Scores for Evaluation**
Now we will evaluate the quality of generated text summaries using two additional evaluation metrics: `BLEU (Bilingual Evaluation Understudy)` and `METEOR (Metric for Evaluation of Translation with Explicit ORdering)`. These metrics complement ROUGE by providing different perspectives on text similarity:

- `BLEU Score`: Measures how close the generated summary is to the reference summary by comparing n-grams.
- `METEOR Score`: Considers synonymy, stemming, and word order, providing a more nuanced evaluation of the generated summary's alignment with the reference summary.

The functions provided will calculate `BLEU` and `METEOR` scores for a generated summary compared to a reference summary. Both scores will help us assess the generated summaries' fluency, informativeness, and relevance more comprehensively.
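Before calling NLTK, it may help to see what BLEU computes. The sketch below implements clipped ("modified") n-gram precision, a geometric mean over 1- to 4-grams, and a brevity penalty, using only the standard library. The names `modified_precision` and `simple_bleu` and the `eps` parameter are assumptions for illustration; the zero-count handling is only loosely modelled on NLTK's `SmoothingFunction().method1`, so values will not necessarily match `sentence_bleu`.

```python
import math
from collections import Counter

def modified_precision(candidate, reference, n):
    """Return (clipped matches, total candidate n-grams) for one n-gram order."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped, sum(cand.values())

def simple_bleu(candidate, reference, max_n=4, eps=0.1):
    """Single-reference BLEU sketch with uniform weights and a small smoothing constant."""
    cand, ref = candidate.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        clipped, total = modified_precision(cand, ref, n)
        if total == 0:
            return 0.0  # candidate is too short to contain any n-grams of this order
        precision = clipped / total if clipped > 0 else eps / total
        log_precision += math.log(precision) / max_n
    # Brevity penalty: punish candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision)

reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("mat the on sat cat the", reference))  # same words, shuffled
```

Because higher-order n-grams depend on word order, a shuffled candidate scores far lower than an identical one even though every word matches.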

In [20]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

def calculate_bleu_score(generated_summary, reference_summary):
    """
    Calculate BLEU score for a generated summary compared to a reference summary.

    Args:
        generated_summary (str): The generated summary to evaluate.
        reference_summary (str): The reference summary for comparison.

    Returns:
        float: The BLEU score.
    """
    # Tokenize the summaries
    generated_tokens = word_tokenize(generated_summary)
    reference_tokens = [word_tokenize(reference_summary)]

    # Calculate BLEU score with smoothing
    bleu_score = sentence_bleu(reference_tokens, generated_tokens, smoothing_function=SmoothingFunction().method1)

    # Print the BLEU score
    print("BLEU Score:", bleu_score)

    return bleu_score

def calculate_meteor_score(generated_summary, reference_summary):
    """
    Calculate METEOR score for a generated summary compared to a reference summary.

    Args:
        generated_summary (str): The generated summary to evaluate.
        reference_summary (str): The reference summary for comparison.

    Returns:
        float: The METEOR score.
    """
    # Tokenize the summaries (newer NLTK releases require pre-tokenized input)
    generated_tokens = word_tokenize(generated_summary)
    reference_tokens = word_tokenize(reference_summary)

    # Calculate METEOR score: a list of tokenized references, then the tokenized hypothesis
    meteor_score_value = meteor_score([reference_tokens], generated_tokens)

    # Print the METEOR score
    print("METEOR Score:", meteor_score_value)

    return meteor_score_value

In [21]:
# Calculate and display BLEU score
bleu_score = calculate_bleu_score(summary, reference_summary)

# Calculate and display METEOR score
meteor_score_value = calculate_meteor_score(summary, reference_summary)


BLEU Score: 0.0003282420509515036
METEOR Score: 0.04709141274238226


**ROUGE Score Evaluation**

`calculate_rouge_scores1 Results (generated summary vs. reference summary):`
> - ROUGE-1 Precision: 0.300 (30%)
> - ROUGE-2 Precision: 0.000 (0%)
> - ROUGE-L Precision: 0.140 (14%)

Explanation:

- The `ROUGE-1` precision score of 0.30 indicates that 30% of the unigrams in the generated summary are also found in the reference summary. This shows a moderate overlap in terms of individual words.
- The `ROUGE-2` precision score of 0.0 indicates no overlap of bigrams: the generated summary does not match any two consecutive words from the reference summary. This is unsurprising, because the "paraphrase" step shuffles word order and destroys the consecutive-word relationships present in the reference summary.
- The `ROUGE-L` precision score of 0.14 shows a 14% overlap in the longest common subsequence, which is low and indicates that the generated summary is not well aligned with the reference summary in terms of structure and fluency. Because the shuffle is random, this value changes from run to run.

These scores suggest that while there is some overlap in vocabulary, the generated summary lacks the more nuanced context and structural elements necessary for a high-quality summary.

`calculate_rouge_scores2 Results (generated summary vs. original text):`
> - ROUGE-1 Precision: 1.000 (100%)
> - ROUGE-2 Precision: 0.041 (4.1%)
> - ROUGE-L Precision: 0.280 (28%)

Explanation:

- The `ROUGE-1` precision score of 1.0 (100%) is misleading. It means that every unigram in the generated summary also appears in the original text, not that the summary covers the text (ROUGE-1 recall is only about 0.066). This is expected here, because the summary is assembled from words taken from the original text.
- The `ROUGE-2` precision score of 0.041 (4.1%) is still low, reinforcing that there is minimal overlap in terms of bigrams.
- The `ROUGE-L` precision score of 0.28 (28%) indicates slightly better structural overlap but is still not substantial.

These scores suggest that comparing the generated summary to the original text does not provide meaningful evaluation metrics. It artificially inflates unigram precision but does not align well with expected human judgments of summary quality.
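The inflation is easy to reproduce on a toy example: any bag of words drawn from the source scores a unigram precision of 1.0 against that source, however scrambled it is, while recall stays low. The sketch below uses plain whitespace tokens; the function name `unigram_precision_recall` and the two example strings are assumptions made for illustration.

```python
from collections import Counter

def unigram_precision_recall(candidate, reference):
    """Unigram precision and recall using clipped word counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall

source = ("tiny errors in the measurement of the current weather "
          "grow until they swamp the predictions")
scrambled = "swamp weather the errors"   # four words taken from the source, reordered

precision, recall = unigram_precision_recall(scrambled, source)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

This is why precision against the source text says little about summary quality on its own; recall or F-measure against a human-written reference is usually more informative.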

**BLEU Score Evaluation**
The BLEU (Bilingual Evaluation Understudy) score measures the n-gram overlap between the generated and reference summaries, with a focus on precision.

- BLEU Score: 0.000328

Explanation:
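> - A `BLEU` score close to zero indicates that almost no higher-order n-grams (bigrams up to 4-grams, the default in `sentence_bleu`) are shared between the generated summary and the reference summary, even though some individual words overlap.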
> - `BLEU` scores are usually higher when there is a good match of both words and their order.
> - The extremely low score here suggests the generated summary diverges significantly from the reference summary in both content and structure.

**METEOR Score Evaluation**
The METEOR (Metric for Evaluation of Translation with Explicit ORdering) score evaluates summary quality based on the alignment of unigrams, stemming, and synonymy, incorporating both precision and recall.

- METEOR Score: 0.047

Explanation:

> - The `METEOR` score of 0.047 is very low, indicating poor alignment between the generated and reference summaries.
> - This score considers not only the exact matches of words but also similar meanings.
> - The low score suggests that the generated summary does not capture the meaning of the reference summary well, nor does it present a semantically coherent summary.

**Overall Conclusion**

- The ROUGE, BLEU, and METEOR scores collectively suggest that the generated summary does not effectively capture the key points, context, or structure of the original text.
- The generated summary appears to lack coherence, relevant vocabulary, and structured flow, which are essential for effective summarization.
- Further refinement is needed in the summarization model, possibly by using more sophisticated abstractive techniques like transformer-based models (e.g., `BERT, GPT`) to generate summaries that are closer to human judgment and align better with reference texts.


Let's build an abstractive text summarization model from scratch using a sequence-to-sequence (Seq2Seq) architecture.

#### **What is Sequence-to-Sequence `(Seq2Seq)` architecture?**

- Sequence-to-Sequence `(Seq2Seq)` architecture is a type of neural network designed for tasks where the input and output are sequences of varying lengths.
- It's particularly well-suited for natural language processing tasks like machine translation, text summarization, and chatbot conversations.

**Key Components:**

- **Encoder:** The encoder processes the input sequence, token by token, and compresses the information into a fixed-length vector called the `"context"` or `"thought"` vector. This vector aims to encapsulate the entire meaning of the input sequence.

- **Decoder:** The decoder takes the context vector from the encoder and generates the output sequence, token by token. It uses the information in the context vector to predict the next most likely token in the output sequence, conditioned on the tokens generated so far.

**How it works:**
- The input sequence is fed into the encoder, which processes each token and updates its internal state.
- After processing the entire input sequence, the final state of the encoder becomes the context vector.
- The context vector is passed to the decoder as its initial state.
- The decoder starts generating the output sequence, using the context vector and the previously generated tokens to predict the next token.
- This process continues until the decoder generates an end-of-sequence token or reaches a maximum length.
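
The steps above can be sketched as a generation loop. The example below is a toy illustration in plain Python: `toy_encode` and `toy_decode_step` are hypothetical stand-ins for trained networks (in a real model, the step function would return the most likely token from a softmax over the vocabulary), and the `"<start>"` and `"<end>"` markers are assumed names.

```python
def greedy_decode(encode, decode_step, source_tokens, start_token, end_token, max_len=20):
    """Run the encoder once, then generate tokens until end_token or max_len is reached."""
    state = encode(source_tokens)        # context handed to the decoder as its initial state
    token, output = start_token, []
    for _ in range(max_len):
        # Predict the next token from the previous token and the current state
        token, state = decode_step(token, state)
        if token == end_token:
            break
        output.append(token)
    return output

# Toy stand-ins (hypothetical): the "encoder" stores the source in reverse,
# and each "decoder" step emits one token from that state.
def toy_encode(tokens):
    return list(reversed(tokens))

def toy_decode_step(prev_token, state):
    if not state:
        return "<end>", state
    return state[0], state[1:]

result = greedy_decode(toy_encode, toy_decode_step, ["a", "b", "c"], "<start>", "<end>")
print(result)
```

The loop stops either when the step function returns the end marker or when `max_len` tokens have been produced, mirroring the two stopping conditions listed above.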

**Advantages:**
- Handles variable-length input and output sequences.
- Can learn complex relationships between input and output sequences.
- Has been successfully applied to a wide range of NLP tasks.

**Limitations:**
- The fixed-length context vector can be a bottleneck for long sequences, as it might not capture all the necessary information.  
- Can be computationally expensive to train, especially for large models and long sequences.

**Advancements:**
- Attention mechanisms have been introduced to overcome the limitations of the fixed-length context vector, allowing the decoder to focus on relevant parts of the input sequence at each step.
- Transformer models, based on the attention mechanism, have achieved state-of-the-art results on many NLP tasks, surpassing traditional Seq2Seq models.
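
As a minimal sketch of the attention idea, the code below computes dot-product attention for a single decoder step in plain Python: the decoder's query vector is scored against every encoder state, the scores are normalised with a softmax, and the context is the weighted sum of the encoder states. The function names are hypothetical, and real implementations work on batched tensors and usually add scaling or learned projections.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, encoder_states):
    """Return (context, weights) for one decoder step."""
    # One score per encoder position: dot product of the query with that state
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the encoder states
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return context, weights

context, weights = dot_product_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the query aligns with the first encoder state, so almost all of the attention weight falls on that position, and the context vector is rebuilt at every decoding step instead of being fixed once.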

Overall, Seq2Seq architecture is a powerful tool for handling sequential data and has significantly impacted the field of natural language processing.



In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, Concatenate, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import mixed_precision


# Set random seed for reproducibility
tf.random.set_seed(42)

# Hyperparameters
max_text_len = 400   # Maximum length of input text
max_summary_len = 100  # Maximum length of summary
embedding_dim = 50   # Embedding dimension size
latent_dim = 100   # LSTM units




1. Load and Prepare the Dataset:
- We'll use the datasets library from Hugging Face to load a dataset containing text-summary pairs. For this, we will use the popular "CNN/DailyMail" dataset.

In [23]:
# Hugging Face Datasets. The dataset is available for Python via the datasets library from Hugging Face.
!pip install datasets
from datasets import load_dataset



In [24]:
# Load the CNN/DailyMail dataset with 'default' configuration
dataset = load_dataset("cnn_dailymail", "default")
print(dataset.keys())

# """
# The above returns the following:

# dict_keys(['validation', 'test'])

# """


dict_keys(['validation', 'test'])


- The `default` configuration loaded above returned only the `validation` and `test` splits, so we need to create a training split ourselves by dividing the validation set into training and validation subsets. (The versioned CNN/DailyMail configurations, such as `3.0.0`, normally also provide a `train` split.)


In [25]:
from sklearn.model_selection import train_test_split
from datasets import Dataset

# Extract the 'test' set
test_dataset = dataset['test']

# Extract the 'validation' set
validation_set = dataset['validation']

# Convert to a Pandas DataFrame for easy manipulation
df = validation_set.to_pandas()

# Split the validation set into new train and validation sets
train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42)

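# Convert back to datasets (drop the shuffled pandas index so it is not stored as an extra column)
train_dataset = Dataset.from_pandas(train_df, preserve_index=False)
valid_dataset = Dataset.from_pandas(valid_df, preserve_index=False)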


In [26]:
# Extract the texts and summaries from the training dataset
texts = [example['article'] for example in train_dataset]
summaries = [example['highlights'] for example in train_dataset]

# Preprocessing function to clean the texts (e.g., remove punctuation, lowercase)
def preprocess_text(text):
  """
  Preprocesses the input text by removing punctuation, converting to lowercase, and removing extra whitespace.

  Args:
      text (str): The input text to preprocess.

  Returns:
      str: The preprocessed text.
  """
  # Remove punctuation
  text = text.translate(str.maketrans('', '', string.punctuation))
  # Convert to lowercase
  text = text.lower()
  # Remove extra whitespace
  text = ' '.join(text.split())
  return text


In [27]:
# Clean the texts and summaries
texts_cleaned = [preprocess_text(text) for text in texts]
summaries_cleaned = [preprocess_text(summary) for summary in summaries]

# Tokenize texts and summaries
text_tokenizer = Tokenizer()
text_tokenizer.fit_on_texts(texts_cleaned)

summary_tokenizer = Tokenizer()
summary_tokenizer.fit_on_texts(summaries_cleaned)

# Convert texts to sequences and pad them
text_sequences = text_tokenizer.texts_to_sequences(texts_cleaned)
text_sequences_padded = pad_sequences(text_sequences, maxlen=max_text_len, padding='post')

summary_sequences = summary_tokenizer.texts_to_sequences(summaries_cleaned)
summary_sequences_padded = pad_sequences(summary_sequences, maxlen=max_summary_len, padding='post')

# Define vocabulary sizes
vocab_size_text = len(text_tokenizer.word_index) + 1
vocab_size_summary = len(summary_tokenizer.word_index) + 1
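
The tokenization cell above does not add explicit start or end markers to the summaries. Seq2Seq decoders are commonly trained with such markers so that generation has a defined first input and a stopping signal. The sketch below shows, on plain Python lists, how a decoder input/target pair for teacher forcing could be built. The reserved ids `START_ID`, `END_ID`, and `PAD_ID` and the helper `make_decoder_pair` are assumptions for illustration, not part of this notebook.

```python
START_ID, END_ID, PAD_ID = 1, 2, 0  # hypothetical reserved token ids

def make_decoder_pair(summary_ids, max_len):
    """Build (decoder_input, decoder_target) for teacher forcing, both padded to max_len."""
    body = summary_ids[:max_len - 1]        # leave room for one marker token
    decoder_input = [START_ID] + body       # what the decoder sees at each step
    decoder_target = body + [END_ID]        # what it should predict at each step
    pad = [PAD_ID] * (max_len - len(decoder_input))
    return decoder_input + pad, decoder_target + pad

dec_in, dec_out = make_decoder_pair([7, 8, 9], max_len=6)
print(dec_in)   # [1, 7, 8, 9, 0, 0]
print(dec_out)  # [7, 8, 9, 2, 0, 0]
```

At every position the target is the token that follows the corresponding input token, which is the same one-step shift the training code applies with `[:, :-1]` and `[:, 1:]`.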



In [28]:
# Enable mixed precision policy
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
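
Under `mixed_float16`, layer computations run in float16 while variables stay in float32. The sketch below uses only the standard library's `struct` module (format code `'e'` is IEEE 754 half precision) to illustrate one reason numerically sensitive outputs, such as the softmax in the decoder's `Dense` layer, are commonly kept in float32: very small probabilities underflow to zero in float16. The helper name `to_float16` is hypothetical.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

tiny_prob = 1e-8                   # a plausible softmax probability over a large vocabulary
as_half = to_float16(tiny_prob)    # below float16's smallest subnormal, so it becomes 0.0
print(tiny_prob, "->", as_half)    # taking log(0.0) in a cross-entropy loss would then fail
```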

- Build the Encoder Model: The function below initializes and builds the encoder part of the Seq2Seq model.

In [29]:
def build_encoder(vocab_size_text, embedding_dim, latent_dim, max_text_len):
    """
    Build the encoder model.

    Args:
        vocab_size_text (int): Size of the vocabulary for the text.
        embedding_dim (int): Dimension of the embedding layer.
        latent_dim (int): Number of units in the LSTM layer.
        max_text_len (int): Maximum length of the input text sequences.

    Returns:
        encoder_model (Model): The built encoder model.
        encoder_states (list): List of encoder states (hidden and cell states).
    """
    encoder_inputs = Input(shape=(max_text_len,))
    encoder_embedding = Embedding(vocab_size_text, embedding_dim, trainable=True)(encoder_inputs)
    encoder_lstm = LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
    encoder_states = [state_h, state_c]

    encoder_model = Model(encoder_inputs, encoder_states)
    return encoder_model, encoder_states


- Build the Decoder Model: The function below initializes and builds the decoder part of the Seq2Seq model.

In [30]:
def build_decoder(vocab_size_summary, embedding_dim, latent_dim, encoder_states):
    """
    Build the decoder graph on top of the encoder states.

    Args:
        vocab_size_summary (int): Size of the vocabulary for the summary.
        embedding_dim (int): Dimension of the embedding layer.
        latent_dim (int): Number of units in the LSTM layer.
        encoder_states (list): List of encoder states (hidden and cell states).

    Returns:
        decoder_inputs: Input tensor for the decoder token ids.
        decoder_outputs: Per-timestep softmax over the summary vocabulary.
    """
    decoder_inputs = Input(shape=(None,))
    decoder_embedding = Embedding(vocab_size_summary, embedding_dim, trainable=True)(decoder_inputs)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
    decoder_dense = TimeDistributed(Dense(vocab_size_summary, activation='softmax', dtype='float32'))
    decoder_outputs = decoder_dense(decoder_outputs)

    # The decoder outputs depend on the encoder inputs through `encoder_states`,
    # so Model(decoder_inputs, decoder_outputs) alone would be a disconnected graph.
    # Return the tensors and assemble the full Seq2Seq model at training time.
    return decoder_inputs, decoder_outputs


- Compile and Train the Seq2Seq Model: This function compiles and trains the Seq2Seq model.

In [31]:
def compile_and_train_seq2seq_model(encoder_model, decoder_inputs, decoder_outputs, text_sequences_padded, summary_sequences_padded):
    """
    Compile and train the Seq2Seq model.

    Args:
        encoder_model (Model): The encoder model.
        decoder_inputs: Decoder input tensor returned by build_decoder.
        decoder_outputs: Decoder output tensor returned by build_decoder.
        text_sequences_padded (numpy array): Padded input text sequences.
        summary_sequences_padded (numpy array): Padded summary sequences.

    Returns:
        history: Training history object.
    """
    # Define the Seq2Seq model from the encoder input and the decoder tensors
    model = Model([encoder_model.input, decoder_inputs], decoder_outputs)

    # Compile the model (the mixed precision policy is set globally above)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # Print model summary
    model.summary()

    # Early stopping to prevent overfitting
    early_stopping = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)

    # Prepare the decoder targets: the summaries shifted one step ahead of the decoder inputs.
    # sparse_categorical_crossentropy expects integer token ids, so no one-hot encoding is needed
    # (one-hot targets over the full vocabulary would also be very memory-hungry).
    decoder_target_data = summary_sequences_padded[:, 1:]

    # Train the model
    history = model.fit(
        [text_sequences_padded, summary_sequences_padded[:, :-1]],  # Inputs (texts and summaries shifted)
        decoder_target_data,  # Targets (shifted summaries, integer ids)
        epochs=20,
        batch_size=1,
        validation_split=0.2,
        callbacks=[early_stopping]
    )

    return history


In [None]:
# Build the encoder model
encoder_model, encoder_states = build_encoder(vocab_size_text, embedding_dim, latent_dim, max_text_len)

# Build the decoder on top of the encoder states
decoder_inputs, decoder_outputs = build_decoder(vocab_size_summary, embedding_dim, latent_dim, encoder_states)

# Compile and train the Seq2Seq model
history = compile_and_train_seq2seq_model(encoder_model, decoder_inputs, decoder_outputs, text_sequences_padded, summary_sequences_padded)

# Define a helper to plot the training history
def plot_training_history(history):
    """
    Plot the training and validation loss and accuracy from the training history.

    Args:
        history: Training history object from model.fit().
    """
    plt.figure(figsize=(12, 6))

    # Plot training & validation loss values
    plt.subplot(1, 2, 1)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(['Train', 'Validation'], loc='upper right')

    # Plot training & validation accuracy values
    plt.subplot(1, 2, 2)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.show()

# Plot training history
plot_training_history(history)

#### **Conclusion**