# Pre-trained Summarization and Factuality

- 📺 **Video:** [https://youtu.be/feLTtTilycY](https://youtu.be/feLTtTilycY)

## Overview
- Fine-tune pre-trained summarization models and assess factual accuracy.
- Use factuality checks like QAFactEval or question answering probes.

## Key ideas
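- **Pre-training:** pre-trained sequence-to-sequence models such as BART and T5 produce strong abstractive summaries once fine-tuned.
- **Factual consistency:** a summary should state only what the source supports; unsupported content is a hallucination.
- **Evaluation:** QA-based metrics or entailment models check whether the summary's claims are supported by the source.
- **Mitigation:** constrained decoding or post-editing can improve faithfulness.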
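
The QA-based evaluation idea can be sketched in a few lines. This is a minimal illustration, not a real metric: the questions are hand-written, and `toy_answer` is a hypothetical regex-based stand-in for a trained QA model. The names `QUESTIONS`, `toy_answer`, and `qa_consistency` are assumptions made for this example. The structure is the part that carries over: answer the same question against the summary and against the source, then count how often the answers agree.

```python
import re

# Hypothetical stand-in for a QA model: each "question" maps to a regex whose
# first capture group plays the role of the extracted answer span.
QUESTIONS = {
    "When did the mission launch?": r"launched in the (\w+)",
    "What was the mission outcome?": r"was (?:a )?(\w+)",
}

def toy_answer(question, context):
    """Return the answer span for `question` in `context`, or None if absent."""
    match = re.search(QUESTIONS[question], context.lower())
    return match.group(1) if match else None

def qa_consistency(source, summary, answer_fn=toy_answer):
    """Fraction of questions the summary answers where the source gives the same answer."""
    agree, asked = 0, 0
    for question in QUESTIONS:
        summary_answer = answer_fn(question, summary)
        if summary_answer is None:
            continue  # the summary does not address this question
        asked += 1
        if summary_answer == answer_fn(question, source):
            agree += 1
    return agree / asked if asked else 0.0

source = "The rocket launched in the morning. The mission was successful."
faithful = "The mission launched in the morning and was successful."
hallucinated = "The mission launched in the evening and was successful."

print(qa_consistency(source, faithful))      # 1.0
print(qa_consistency(source, hallucinated))  # 0.5
```

Passing a different `answer_fn` is where a real QA model would plug in; the regex version only exists to keep the sketch self-contained.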

## Demo
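Compare generated summary statements to source facts using a simple fact checklist, as in the lecture (https://youtu.be/M9ZL6NVr44E). The check is a plain substring match, so it measures how many listed facts the summary mentions, not whether every claim in the summary is true.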

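```python
# Reference facts that a faithful summary should mention.
facts = {'launch time': 'morning', 'mission status': 'successful'}
summary = 'The mission launched in the morning and was successful.'

# Count the fact values that appear in the summary (case-insensitive substring match).
summary_text = summary.lower()
score = 0
for key, value in facts.items():
    if value.lower() in summary_text:
        score += 1
print('Factual coverage:', score / len(facts))
```

Output:

```
Factual coverage: 1.0
```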


## Try it
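- Modify the demo: change one fact value or reword the summary, then see how the coverage score responds.
- Add a tiny dataset or counter-example, such as a summary that negates a fact yet still contains the expected word.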
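
One possible starting point for the exercise, offered as a sketch rather than a reference solution: wrap the demo's substring check in a function and run it over a few hand-written summaries. The `coverage` helper and the example sentences are assumptions made for this illustration. The negated summary still scores 1.0 because `unsuccessful` contains `successful`, while an accurate paraphrase scores 0.0.

```python
def coverage(facts, summary):
    """Fraction of fact values that appear as substrings of the summary."""
    text = summary.lower()
    return sum(value.lower() in text for value in facts.values()) / len(facts)

facts = {'launch time': 'morning', 'mission status': 'successful'}

# Hand-written examples: one faithful, one that contradicts the facts,
# and one that agrees with them but uses different words.
summaries = {
    'faithful': 'The mission launched in the morning and was successful.',
    'negated': 'The mission did not launch in the morning and was unsuccessful.',
    'paraphrase': 'Liftoff happened at dawn and everything went to plan.',
}

for label, summary in summaries.items():
    print(label, coverage(facts, summary))
# faithful 1.0
# negated 1.0
# paraphrase 0.0
```

Both failure modes are reasons to move from string matching to QA-based or entailment-based checks.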


## References
- [Eisenstein 18.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [HMM-Based Word Alignment in Statistical Translation](https://www.aclweb.org/anthology/C96-2141.pdf)
- [Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models](http://homepages.inf.ed.ac.uk/pkoehn/publications/pharaoh-amta2004.pdf)
- [Minimum Error Rate Training in Statistical Machine Translation](https://www.aclweb.org/anthology/P03-1021/)
- [Eisenstein 18.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Revisiting Low-Resource Neural Machine Translation: A Case Study](https://arxiv.org/abs/1905.11901)
- [In Neural Machine Translation, What Does Transfer Learning Transfer?](https://aclanthology.org/2020.acl-main.688/)
- [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)
- [Large Language Models Are State-of-the-Art Evaluators of Translation Quality](https://arxiv.org/abs/2302.14520)
- [The use of MMR, diversity-based reranking for reordering documents and producing summaries](https://dl.acm.org/doi/10.1145/290941.291025)
- [LexRank: Graph-based Lexical Centrality as Salience in Text Summarization](https://arxiv.org/abs/1109.2128)
- [A Scalable Global Model for Summarization](https://www.aclweb.org/anthology/W09-1802/)
- [Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization](https://www.aclweb.org/anthology/W17-4511/)
- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://www.aclweb.org/anthology/2020.acl-main.703/)
- [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)
- [Evaluating Factuality in Generation with Dependency-level Entailment](https://arxiv.org/pdf/2010.05478.pdf)
- [Asking and Answering Questions to Evaluate the Factual Consistency of Summaries](https://arxiv.org/abs/2004.04228)
- [News Summarization and Evaluation in the Era of GPT-3](https://arxiv.org/abs/2209.12356)
- [Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections](https://www.aclweb.org/anthology/P11-1061/)
- [Multi-Source Transfer of Delexicalized Dependency Parsers](https://www.aclweb.org/anthology/D11-1006/)
- [Massively Multilingual Word Embeddings](https://arxiv.org/pdf/1602.01925.pdf)
- [Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond](https://www.aclweb.org/anthology/Q19-1038.pdf)
- [How multilingual is Multilingual BERT?](https://www.aclweb.org/anthology/P19-1493.pdf)
- [Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data](https://aclanthology.org/2020.acl-main.463/)
- [Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?](https://arxiv.org/abs/2104.10809)
- [Entailment Semantics Can Be Extracted from an Ideal Language Model](https://arxiv.org/abs/2209.12407)
- [Experience Grounds Language](https://arxiv.org/abs/2004.10151)
- [VQA: Visual Question Answering](https://arxiv.org/abs/1505.00468)
- [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
- [The Social Impact of Natural Language Processing](https://aclanthology.org/P16-2096.pdf)
- [Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints](https://arxiv.org/pdf/1707.09457.pdf)
- [GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models](https://arxiv.org/abs/2205.12247)
- [Visually Grounded Reasoning across Languages and Cultures](https://arxiv.org/abs/2109.13238)
- [On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?](https://dl.acm.org/doi/10.1145/3442188.3445922)
- [RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models](https://arxiv.org/abs/2009.11462)
- [Datasheets for Datasets](https://arxiv.org/pdf/1803.09010.pdf)
- [Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing](https://dl.acm.org/doi/pdf/10.1145/3351095.3372873)


*Links only; we do not redistribute slides or papers.*