Text-Summarization

Before anything:

What is this?

As described, a testing place for different text summarization tools.

What is the background?

Here's a workflow I created for the complete task. Put it simple, we were asked to complete the task of generating automated summarzations from the Credera weekly updates videos. The videos are on Microsoft Stream, and ultimately we need to generate summarized texts for each video.

For now, we will be focusing on the 3rd part of the workflow, namely, the text summarization.

What are the notebooks about?

They includes different approaches for text summarization. In the following part, I will give simple introductions of each methods.

DistilBERT from transformers

What is DistilBERT?:
- Distil here means (Knowledge) Distillation, just as the literal means, is a compression technique in which a small model is trained to reproduce the behavior of a larger model (or an ensemble of models). It was first introduced in (https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf), then generalized in (https://arxiv.org/abs/1503.02531)
- BERT here stands for Bidirectional Encoder Representations from Transformers, is applying the bidirectional training of Transformer (https://arxiv.org/pdf/1706.03762.pdf), a popular attention model, to language modelling. This is in contrast to previous efforts which looked at a text sequence either from left to right or combined left-to-right and right-to-left training. The paper’s results show that a language model which is bidirectionally trained can have a deeper sense of language context and flow than single-direction language models.
Documentation: https://huggingface.co/transformers/model_doc/distilbert.html
Original paper: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", 2020. https://arxiv.org/abs/1910.01108
Model name: sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6#)

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
cleaned_transcripts		cleaned_transcripts
transcripts		transcripts
.gitignore		.gitignore
PyTorch Extractive & Abstractive Summarization.ipynb		PyTorch Extractive & Abstractive Summarization.ipynb
README.md		README.md
Text Summarization Approach Comparisons.ipynb		Text Summarization Approach Comparisons.ipynb
Text_Summarize.ipynb		Text_Summarize.ipynb
workflow.png		workflow.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text-Summarization

Before anything:

About

Releases

Packages

Contributors 2

Languages

franknb/Text-Summarization

Folders and files

Latest commit

History

Repository files navigation

Text-Summarization

Before anything:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages