VerbCL

This repository contains the dataset from the paper:

VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law

J. Rossi, S. Vakulenko, E. Kanoulas, 2021

Build the Python Environment

We use poetry as the dependency manager.

  • Install poetry with pip install poetry
  • Install dependencies with poetry install
  • Install torch:
    • CPU version with poetry run poe cpu
    • GPU CUDA 10.2 with poetry run poe cuda102
    • GPU CUDA 11.1 with poetry run poe cuda111

This will create a new virtual environment.

  • Enter a shell where the environment is activated: poetry shell

Download the VerbCL Data

The data is available on figshare; the URL and DOI are given in the Dataset citation below.

Restore Snapshot

  • Python notebook for restore here
  • DIY Instructions:
    • Uncompress the archive on your filesystem (e.g. /data)
    • Declare the data folder /data/VerbCL as the root of a Snapshot Repository (see the Elasticsearch instructions)
    • Restore the snapshot verbcl_v1.0 (see the Elasticsearch instructions)
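
The DIY steps above can be sketched with the Elasticsearch snapshot REST API, using only the Python standard library. This is a minimal sketch and not the code from the restore notebook. It assumes a local node at http://localhost:9200 without authentication, and that /data/VerbCL is already listed under path.repo in elasticsearch.yml. The repository name verbcl_repo and the helper names are hypothetical, chosen for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication
REPO_NAME = "verbcl_repo"         # hypothetical repository name
SNAPSHOT = "verbcl_v1.0"          # snapshot name given in the steps above
DATA_DIR = "/data/VerbCL"         # must also appear under path.repo in elasticsearch.yml


def build_request(method, path, body=None):
    """Build a JSON HTTP request against the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{ES_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def register_repo_request():
    # PUT /_snapshot/<repo> registers a shared-filesystem ("fs") repository.
    return build_request(
        "PUT",
        f"/_snapshot/{REPO_NAME}",
        {"type": "fs", "settings": {"location": DATA_DIR}},
    )


def restore_request():
    # POST /_snapshot/<repo>/<snapshot>/_restore restores the snapshot.
    return build_request(
        "POST",
        f"/_snapshot/{REPO_NAME}/{SNAPSHOT}/_restore?wait_for_completion=true",
    )


def send(request):
    """Send a request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def restore_verbcl():
    """Register the repository, then restore the snapshot (needs a running node)."""
    send(register_repo_request())
    return send(restore_request())
```

Calling restore_verbcl() against a running node performs both steps in order. With wait_for_completion=true the restore call blocks until the restore has finished, which can take a while for a large snapshot.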

Example notebooks

  • Using the persistence API of elasticsearch-dsl in this notebook

Reproduce the Paper

(tbd) All these steps can be executed with our code:

  1. Download court listener
  2. Prepare the dataset
  3. Run baselines

Citation

Paper

Our paper was accepted at CIKM 2021, Resource Track.

Dataset

@misc{rossi-vakulenko-kanoulas-2021, 
  title={VerbCL Dataset}, 
  url={https://uvaauas.figshare.com/articles/dataset/VerbCL\_Dataset/14798878/1}, 
  DOI={10.21942/uva.14798878.v1}, 
  abstractNote={VerbCL is a dataset of US court opinions, where verbatim quotes have been mined.}, 
  publisher={University of Amsterdam / Amsterdam University of Applied Sciences}, 
  author={Rossi, J. and Vakulenko, S. and Kanoulas, E.}, 
  year={2021}, 
  month={Jun} 
} 

Contact

For questions and inquiries, contact: Julien Rossi
