Modern Talking: Key-Point Analysis using Modern Natural Language Processing
Participation in the Quantitative Summarization – Key Point Analysis Shared Task (data on GitHub).
First, install Python 3.9 or higher and then clone this repository. From inside the repository directory, create a virtual environment and activate it:
python3.9 -m venv venv/
source venv/bin/activate
Then, install the package in editable mode along with its dependencies:
pip install -e .
Run a pipeline to train and evaluate a matcher with respect to a given metric:
python -m modern_talking [MATCHER] [MATCHER_OPTIONS] [METRIC]
This automatically downloads all datasets, trains the matcher on the train set, and evaluates the metric for the predicted labels on the dev and test sets. Test evaluation is skipped if the test labels are unknown.
Predicted labels are also saved to data/out/predictions-[MATCHER].json in JSON format, as described in the shared task documentation.
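To inspect a saved predictions file, a minimal Python sketch along the following lines may help. It assumes the file maps each argument ID to an object of key point IDs and match scores, which is how the shared task's prediction format is commonly described; check the shared task documentation before relying on it. The function names and the example IDs are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def best_matches(predictions):
    """Return the highest-scoring (key point ID, score) pair per argument.

    Assumed layout (not confirmed by this README):
    {"arg_id": {"kp_id": score, ...}, ...}
    """
    return {
        arg_id: max(scores.items(), key=lambda item: item[1])
        for arg_id, scores in predictions.items()
        if scores  # skip arguments without any candidate key point
    }


def load_best_matches(path):
    """Read a predictions JSON file and reduce it to the best match per argument."""
    predictions = json.loads(Path(path).read_text(encoding="utf-8"))
    return best_matches(predictions)


if __name__ == "__main__":
    # Hypothetical IDs and scores, for illustration only.
    example = {"arg_0_0": {"kp_0_0": 0.2, "kp_0_1": 0.9}}
    print(best_matches(example))
```

For a real run, call load_best_matches with the path of the predictions file written by the pipeline.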
List available matchers with:
python -m modern_talking --help
List an individual matcher's options with:
python -m modern_talking [MATCHER] --help
Term overlap baseline:
python -m modern_talking term-overlap map
Term overlap baseline (with preprocessing):
python -m modern_talking term-overlap --stemming --stop-words --custom-stop-words --synonyms map
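As a rough illustration of what a term overlap baseline computes, the sketch below scores an argument against a key point by the Jaccard overlap of their term sets. It is not the repository's implementation: the tokenization, the tiny stop word list, and the function names are assumptions made for this example, and stemming and synonym expansion are left out.

```python
import re

# Tiny illustrative stop word list, not the one used by the repository.
STOP_WORDS = frozenset({"a", "an", "the", "is", "are", "of", "to", "and"})


def terms(text, stop_words=frozenset()):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in stop_words}


def term_overlap(argument, key_point, stop_words=frozenset()):
    """Jaccard overlap between the term sets of an argument and a key point."""
    arg_terms = terms(argument, stop_words)
    kp_terms = terms(key_point, stop_words)
    union = arg_terms | kp_terms
    if not union:
        return 0.0  # both texts are empty after preprocessing
    return len(arg_terms & kp_terms) / len(union)


if __name__ == "__main__":
    print(term_overlap("Vaccines save lives", "Vaccines save many lives"))
```

A matcher built on such a score would typically label a pair as matching when the overlap exceeds a threshold tuned on the train or dev set.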
BERT classifier:
python -m modern_talking transformers --type bert --name bert-base-uncased map
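The commands above pass map as the metric. Assuming this stands for mean average precision, the following sketch shows the textbook computation over ranked predictions. The official evaluation script may differ in details such as how undecided labels are handled or how many pairs are considered, so treat this as an illustration only; the function names and the grouping are hypothetical.

```python
def average_precision(scored_labels):
    """Average precision for (score, label) pairs, where label 1 marks a match."""
    # Rank pairs by descending score; ties keep their input order.
    ranked = sorted(scored_labels, key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(groups):
    """Mean of the average precision over groups (assumed: one group per topic)."""
    values = [average_precision(group) for group in groups]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    print(average_precision([(0.9, 1), (0.8, 0), (0.7, 1)]))
```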
Evaluate predicted matches in JSON format:
python modern_talking/evaluation/track_1_kp_matching.py data/ data/out/predictions-[METRIC]-[MATCHER].json
Replace data/out/predictions-[METRIC]-[MATCHER].json with the path to a file containing predicted matches in JSON format, as described in the shared task documentation.
Run all unit tests:
pytest
This repository is licensed under the MIT License, except for the evaluation script from the shared task organizers, which is licensed under the Apache License 2.0.