heinrichreimer/modern-talking

🗣️ modern-talking

Modern Talking: Key-Point Analysis using Modern Natural Language Processing

Participation in the Quantitative Summarization – Key Point Analysis Shared Task (data on GitHub).

Usage

Installation

First, install Python 3.9 or higher and then clone this repository. From inside the repository directory, create a virtual environment and activate it:

python3.9 -m venv venv/
source venv/bin/activate

Then, install the package and its dependencies in editable mode:

pip install -e .

Run a matcher pipeline

Run a pipeline to train and evaluate a matcher with respect to a given metric:

python -m modern_talking [MATCHER] [MATCHER_OPTIONS] [METRIC]

This automatically downloads all datasets, trains the matcher on the training set, and evaluates the metric for the predicted labels on the dev and test sets (test evaluation is skipped if the test labels are unknown). Predicted labels are also saved to data/out/predictions-[MATCHER].json in the JSON format described in the shared task documentation.
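To inspect the saved predictions, something like the following Python sketch may help. It assumes the shared task format is a JSON object that maps each argument ID to an object of key-point IDs and match scores; check the shared task documentation before relying on it. The IDs and scores in the sample are made up, and `best_matches` is a hypothetical helper, not part of this repository.

```python
import json


def best_matches(predictions):
    """Return the highest-scoring key point for each argument.

    Assumes `predictions` maps argument IDs to {key_point_id: score}
    dictionaries (an assumption about the shared task format).
    """
    best = {}
    for arg_id, scores in predictions.items():
        if not scores:
            # Skip arguments without any candidate key point.
            continue
        kp_id = max(scores, key=scores.get)
        best[arg_id] = (kp_id, scores[kp_id])
    return best


# Made-up sample standing in for the contents of a predictions file.
sample = json.loads(
    '{"arg_0_0": {"kp_0_0": 0.9, "kp_0_1": 0.2},'
    ' "arg_0_1": {"kp_0_0": 0.1, "kp_0_1": 0.7}}'
)

for arg_id, (kp_id, score) in best_matches(sample).items():
    print(arg_id, kp_id, score)
```

For a real run, replace the sample with `json.load(...)` on the predictions file written by the pipeline.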

List available matchers with:

python -m modern_talking --help

List an individual matcher's options with:

python -m modern_talking [MATCHER] --help

Examples

Term overlap baseline:

python -m modern_talking term-overlap map

Term overlap baseline (with preprocessing):

python -m modern_talking term-overlap --stemming --stop-words --custom-stop-words --synonyms map

BERT classifier:

python -m modern_talking transformers --type bert --name bert-base-uncased map
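To give a rough idea of what a term overlap baseline can compute, here is a minimal sketch. It scores an argument against a key point by the Jaccard similarity of their lowercase word sets. Jaccard similarity, the tokenizer, and the function names are all assumptions made for illustration; the repository's term-overlap matcher may work differently, and preprocessing such as stemming or stop-word removal is not modeled here.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_overlap(argument, key_point):
    """Jaccard similarity between the term sets of two texts.

    Illustrative only; not the repository's implementation.
    """
    arg_terms = tokenize(argument)
    kp_terms = tokenize(key_point)
    union = arg_terms | kp_terms
    if not union:
        # Both texts are empty: define the overlap as zero.
        return 0.0
    return len(arg_terms & kp_terms) / len(union)


# Made-up example texts: 4 shared terms out of 5 distinct terms.
score = term_overlap(
    "School uniforms reduce bullying",
    "Uniforms reduce bullying in school",
)
print(score)
```

A score near 1 means the two texts share most of their terms; a threshold on this score could then turn it into a match label.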

Manual evaluation

Evaluate predicted matches in JSON format:

python modern_talking/evaluation/track_1_kp_matching.py data/ data/out/predictions-[METRIC]-[MATCHER].json

Replace data/out/predictions-[METRIC]-[MATCHER].json with the path to a file containing predicted matches in JSON format as described in the shared task documentation.
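The `map` metric used in the examples presumably stands for mean average precision; that reading is an assumption, since it is not spelled out here. The sketch below shows the textbook definition on ranked binary labels. It is not a reimplementation of the organizers' evaluation script, which may filter or rank predictions differently, and the function names are hypothetical.

```python
def average_precision(ranked_labels):
    """Average precision of a ranking given as binary relevance labels.

    `ranked_labels` lists the true labels (1 = match, 0 = no match)
    ordered by descending predicted score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several rankings."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Made-up labels for two rankings, e.g. one per topic.
rankings = [[1, 0, 1], [0, 1]]
print(mean_average_precision(rankings))
```

For official scores, use the evaluation script shown above rather than this sketch.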

Testing

Run all unit tests:

pytest

License

This repository is licensed under the MIT License, except for the evaluation script from the shared task organizers, which is licensed under the Apache License 2.0.