DSTI Deep Learning Project: Text Summarization

Authors

Mayank Bhandari
Ouc-Houang Fogoum Philippe Jacques
Ramesh Singh

Date

November 23, 2023

1. Introduction

Text summarization is a core task in natural language processing: it condenses long texts while retaining the essential information. This project uses the google/pegasus-cnn_dailymail model for abstractive summarization.

2. Objective

The primary goal of the project was to automatically generate concise and coherent summaries for given texts using the abstractive summarization capabilities of the selected model.

Dataset

SAMSum dataset (messenger-style dialogues paired with human-written summaries): https://huggingface.co/datasets/samsum

3. Methodology

3.1 Model Selection

The google/pegasus-cnn_dailymail model was chosen for its specialization in abstractive summarization. PEGASUS is pre-trained on large web and news corpora, and this checkpoint is further fine-tuned on the CNN/Daily Mail dataset, which makes it a reasonable starting point for other summarization tasks.
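As an illustration of how the selected checkpoint might be used for generation, here is a minimal sketch with the Hugging Face `transformers` library. The helper names `load_pegasus` and `summarize`, and the generation settings (`num_beams`, `length_penalty`, `max_new_tokens`), are assumptions chosen for the example and are not taken from the project notebook.

```python
MODEL_NAME = "google/pegasus-cnn_dailymail"


def load_pegasus(model_name=MODEL_NAME):
    """Download (or load from cache) the tokenizer and model.

    transformers is imported lazily so that this file can be imported
    without the dependency; the Pegasus tokenizer also needs sentencepiece.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def summarize(text, tokenizer, model, max_new_tokens=128):
    """Generate one abstractive summary for a single input text."""
    # Truncate to 1024 tokens, the input length assumed for this checkpoint.
    inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")
    # Beam search settings below are illustrative, not the project's values.
    output_ids = model.generate(
        **inputs, num_beams=8, length_penalty=0.8, max_new_tokens=max_new_tokens
    )
    summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # This checkpoint marks sentence breaks with a literal "<n>" token.
    return summary.replace("<n>", " ").strip()
```

A typical call would be `tokenizer, model = load_pegasus()` followed by `summarize(dialogue, tokenizer, model)`; the first call downloads the checkpoint.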

4. Tokenization

We used the tokenizer saved with the fine-tuned Pegasus model to split input text into subword units (Pegasus uses a SentencePiece vocabulary). Using the model's own tokenizer ensures that the token IDs match the vocabulary the model was trained on, which the summarization pipeline depends on.
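The tokenization step can be sketched as a batch-encoding function of the kind commonly passed to `Dataset.map` when fine-tuning a sequence-to-sequence model. The function name `encode_batch` and the length limits (1024 input tokens, 128 summary tokens) are assumptions for illustration; the field names `dialogue` and `summary` are those of the SAMSum dataset.

```python
def encode_batch(batch, tokenizer, max_input_len=1024, max_target_len=128):
    """Turn a batch of dialogues and summaries into model-ready features.

    `tokenizer` is expected to behave like a Hugging Face tokenizer, for
    example AutoTokenizer.from_pretrained("google/pegasus-cnn_dailymail").
    """
    # Encode the source dialogues, truncating anything past the input limit.
    inputs = tokenizer(batch["dialogue"], max_length=max_input_len, truncation=True)
    # Encode the reference summaries as targets; `text_target` tells the
    # tokenizer to apply any target-side processing.
    targets = tokenizer(
        text_target=batch["summary"], max_length=max_target_len, truncation=True
    )
    return {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "labels": targets["input_ids"],  # token IDs the model should produce
    }
```

With a loaded dataset this might be applied as `dataset.map(lambda b: encode_batch(b, tokenizer), batched=True)`.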

4.1 Evaluation Metrics

The model's performance was evaluated using standard ROUGE scores: rouge1, rouge2, rougeL, and rougeLsum. These metrics measure unigram, bigram, and longest-common-subsequence overlap between the generated summaries and human-written reference summaries.
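To make the metrics concrete, the following is a simplified, pure-Python sketch of ROUGE-N and ROUGE-L as F1 scores. It assumes lowercase whitespace tokenization and no stemming, so its numbers will not exactly match those of a full ROUGE implementation, and it does not cover rougeLsum.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / pred_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """ROUGE-N F1 over lowercase whitespace tokens (a simplification)."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))
```

For instance, `rouge_n("the cat sat on the mat", "the cat lay on the mat", 2)` counts three shared bigrams out of five on each side.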

5. Results

5.1 Model Performance Evaluation

Model performance is reported as ROUGE scores:

Metric	Score
rouge1	0.015465
rouge2	0.000297
rougeL	0.015503
rougeLsum	0.015514
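A table with these four metric names could be produced along the following lines. This is a sketch that assumes the Hugging Face `evaluate` library (with the `rouge_score` package installed) is used for scoring; the helper names `rouge_table` and `format_table` are hypothetical and not taken from the project notebook.

```python
ROUGE_NAMES = ["rouge1", "rouge2", "rougeL", "rougeLsum"]


def rouge_table(predictions, references, metric=None):
    """Score generated summaries against references and keep four metrics.

    `metric` can be any object with a compute(predictions=..., references=...)
    method returning a dict; by default the `evaluate` ROUGE metric is loaded.
    """
    if metric is None:
        import evaluate  # imported lazily; only needed for the default metric

        metric = evaluate.load("rouge")
    scores = metric.compute(predictions=predictions, references=references)
    return {name: round(float(scores[name]), 6) for name in ROUGE_NAMES}


def format_table(scores):
    """Render the scores as tab-separated lines with a header row."""
    lines = ["Metric\tScore"]
    lines += [f"{name}\t{value:.6f}" for name, value in scores.items()]
    return "\n".join(lines)
```

Here `predictions` would be the generated summaries and `references` the human-written SAMSum summaries, in the same order.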

Figure 1: Text Summary

5.2 Key Findings

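The google/pegasus-cnn_dailymail model produced coherent and relevant summaries across a diverse set of input texts. The ROUGE scores in Section 5.1, however, are low in absolute terms (ROUGE-1 of about 0.0155 on a 0 to 1 scale), so they indicate limited n-gram overlap with the reference summaries rather than robust performance.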

6. Conclusion

In conclusion, the google/pegasus-cnn_dailymail model was fine-tuned and evaluated with ROUGE scores on the text summarization task. The abstractive approach lets the model distill essential information from input texts rather than copy sentences verbatim, making it a useful tool for a variety of applications.
