heinrichreimer/modern-talking

🗣️ modern-talking

Modern Talking: Key-Point Analysis using Modern Natural Language Processing

Participation in the Quantitative Summarization – Key Point Analysis Shared Task (data on GitHub).

Usage

Installation

First, install Python 3, pipx, and Pipenv. Then install dependencies (may take a while):

pipenv install

Run a matcher pipeline

Run a pipeline to train and evaluate a matcher with respect to a given metric:

pipenv run python -m modern_talking [MATCHER] [MATCHER_OPTIONS] [METRIC]

This will automatically download all datasets, train the matcher on the train set, and evaluate the metric for the predicted labels on the dev and test sets (test evaluation is skipped if the test labels are unknown). Predicted labels are also saved to data/out/predictions-[MATCHER].json in JSON format as described in the shared task documentation.
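
To inspect the saved predictions, something like the following Python sketch may help. It assumes the file is a JSON object that maps each argument ID to an object of key point IDs and match scores; that layout is an assumption here, so check the shared task documentation for the authoritative format. The function names and the example IDs are hypothetical and not part of modern_talking.

```python
import json


def load_predictions(path):
    """Read a predictions file (assumed layout: {arg_id: {kp_id: score}})."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


def best_key_points(predictions):
    """Return the highest-scoring key point and its score for each argument.

    Arguments without any candidate key points are skipped.
    """
    best = {}
    for arg_id, scores in predictions.items():
        if not scores:
            continue
        kp_id = max(scores, key=scores.get)
        best[arg_id] = (kp_id, scores[kp_id])
    return best


if __name__ == "__main__":
    # Hypothetical in-memory example; with a real run you would instead call
    # load_predictions() on the file written by the pipeline.
    example = {
        "arg_0_0": {"kp_0_0": 0.2, "kp_0_1": 0.9},
        "arg_0_1": {},
    }
    for arg_id, (kp_id, score) in best_key_points(example).items():
        print(f"{arg_id}: {kp_id} ({score:.2f})")
```
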

List available matchers with:

pipenv run python -m modern_talking --help

List an individual matcher's options with:

pipenv run python -m modern_talking [MATCHER] --help

Examples

Term overlap baseline:

pipenv run python -m modern_talking term-overlap map

Term overlap baseline (with preprocessing):

pipenv run python -m modern_talking term-overlap --stemming --stop-words --custom-stop-words --synonyms map
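
For intuition, a term overlap score between an argument and a key point can be sketched as a Jaccard similarity over their token sets. This is only an illustrative sketch: the tokenizer, the function names, and the choice of Jaccard similarity are assumptions and may differ from how the term-overlap matcher in this repository computes its scores. Stemming and synonym expansion are left out.

```python
import re


def tokenize(text, stop_words=frozenset()):
    """Lowercase the text and return its set of alphanumeric terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in stop_words}


def term_overlap(argument, key_point, stop_words=frozenset()):
    """Jaccard similarity of the two term sets (0.0 if both are empty)."""
    argument_terms = tokenize(argument, stop_words)
    key_point_terms = tokenize(key_point, stop_words)
    union = argument_terms | key_point_terms
    if not union:
        return 0.0
    return len(argument_terms & key_point_terms) / len(union)


if __name__ == "__main__":
    score = term_overlap(
        "School uniforms reduce bullying",
        "Uniforms reduce bullying",
    )
    print(score)  # 3 shared terms out of 4 distinct terms: 0.75
```

Removing stop words before comparing, as in the second command above, keeps frequent function words from inflating the overlap between otherwise unrelated texts.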

BERT classifier:

pipenv run python -m modern_talking transformers --type bert --name bert-base-uncased map

Manual evaluation

Evaluate predicted matches in JSON format:

pipenv run python modern_talking/evaluation/track_1_kp_matching.py data/ data/out/predictions-[METRIC]-[MATCHER].json

Replace data/out/predictions-[METRIC]-[MATCHER].json with the path to a file containing predicted matches in JSON format as described in the shared task documentation.

Testing

Run all unit tests:

pipenv run pytest

License

This repository is licensed under the MIT License, except for the evaluation script from the shared task organizers, which is licensed under the Apache License 2.0.