A toolkit for understanding factuality & consistency errors in summarization models.
- A harness for generating text summaries with automated factuality evaluations:
  - NLI (textual entailment)
  - Question answering
  - Other metrics (BERTScore, ROUGE, etc.)
- An interactive query interface for exploring generated summaries (e.g. XSum or a custom dataset):
  - Search for common factuality errors across your dataset (e.g. find all numerical errors)
  - Explore faithfulness & factuality annotations (if available)
- An interactive query interface for n-gram lookup:
  - Search the dataset for an n-gram query
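The lookups behind these interfaces can be illustrated with a small, hypothetical sketch (not sumtool's actual code). It builds an inverted index from n-grams to document ids, and flags numbers that appear in a summary but nowhere in its source, a crude heuristic for the "numerical errors" search. All function names and the regex tokenizer are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word/number tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_ngram_index(docs, n):
    """Map each n-gram to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(tokenize(text), n):
            index[gram].add(doc_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


def lookup(index, query):
    """Ids of documents containing the query phrase.

    The query must tokenize to exactly n tokens (the n used to build the index);
    otherwise no n-gram can match and an empty list is returned.
    """
    return index.get(tuple(tokenize(query)), [])


# Numbers such as 12, 2.5 or 1,000 (trailing punctuation is not captured).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source, summary):
    """Numbers that appear in the summary but nowhere in the source document."""
    return sorted(set(NUMBER.findall(summary)) - set(NUMBER.findall(source)))
```

An exact-match heuristic like `unsupported_numbers` misses paraphrased quantities ("two" vs "2") and unit conversions, so it is best treated as a first-pass filter for manual review.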
Setup (Python 3.8):
pip install -r requirements.txt
pip install .
streamlit run interface/app.py
You can also run an interface individually, e.g.:
streamlit run interface/summary_interface.py
Development setup (Python 3.8):
pip install -r requirements.dev.txt
pip install -Ue .
Before committing, format and lint the code:
black sumtool/ interface/ scripts/
flake8 sumtool/ interface/ scripts/
To run the generation script on Google Colab:
- Create a GitHub personal access token so the notebook can access your private repositories. Follow the steps in GitHub's guide: Creating a Personal Access Token.
- Create a new Colab notebook and set the runtime type to GPU.
- In the first cell, add the following commands to clone the repository and install the requirements:
!git clone https://[your-git-token]@github.com/cs6741/summary-analysis.git
!pip install -r /content/summary-analysis/requirements.txt
- Add the following command to run the text generation script:
!python /content/generate_xsum_summary.py --bbc_ids [idx1,idx2] --data_split [train|test]
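The flags above are shown only as placeholders. As a hypothetical sketch (the real script's parsing may differ), `--bbc_ids` could accept a comma-separated id list with optional surrounding brackets, and `--data_split` could be restricted to the two documented splits. The helper names `parse_id_list` and `build_parser` are made up for illustration. Note that in a shell an unquoted `[idx1,idx2]` may be expanded as a glob pattern, so quoting it (`--bbc_ids "[1,2]"`) is safer.

```python
import argparse


def parse_id_list(value):
    """Parse "[idx1,idx2]" or "idx1,idx2" into a list of id strings (assumed format)."""
    return [part.strip() for part in value.strip("[]").split(",") if part.strip()]


def build_parser():
    # Hypothetical reconstruction of the documented flags, not the real script's code.
    parser = argparse.ArgumentParser(description="Generate XSum summaries (sketch).")
    parser.add_argument(
        "--bbc_ids",
        type=parse_id_list,
        required=True,
        help="comma-separated BBC/XSum document ids, e.g. [idx1,idx2]",
    )
    parser.add_argument(
        "--data_split",
        choices=["train", "test"],
        required=True,
        help="dataset split the ids are taken from",
    )
    return parser
```

For example, `build_parser().parse_args(["--bbc_ids", "[1,2]", "--data_split", "test"])` yields `bbc_ids == ["1", "2"]`, and any split other than `train` or `test` is rejected by `choices`.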
Pipeline for storage:
- Store generated summaries
- Compute summary metrics for stored summaries using sumtool.
Stored summaries:
<document_id>:
  summary: the generated summary
  metadata: ...metadata for the generated summary, e.g. annotations / score / entropy
Stored metrics:
<document_id>:
  ...metrics for a stored summary, e.g. rouge-score, bert-score
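A minimal sketch of this storage pipeline, assuming both stores are JSON files keyed by document id with the layout above. The function names, the JSON format, and the metric key `rouge1_f1` are assumptions rather than sumtool's API. The from-scratch ROUGE-1 F1 (lowercased whitespace tokens, no stemming) stands in for a full ROUGE implementation such as the `rouge-score` package, and `references` is assumed to hold the gold summaries.

```python
import json
from collections import Counter
from pathlib import Path


def store_summaries(path, summaries):
    """Merge {document_id: {"summary": ..., "metadata": {...}}} into a JSON file."""
    path = Path(path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update(summaries)  # new entries overwrite older ones with the same id
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return data


def rouge1_f1(reference, candidate):
    """ROUGE-1 F1 from clipped unigram overlap (lowercased whitespace tokens)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # & keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def compute_metrics(summaries, references):
    """Build {document_id: {metric_name: score}} for every stored summary."""
    return {
        doc_id: {"rouge1_f1": rouge1_f1(references[doc_id], entry["summary"])}
        for doc_id, entry in summaries.items()
    }
```

Keeping metrics in a separate store keyed by the same document ids lets metrics be recomputed or extended without regenerating the summaries.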