## Install

Install required packages and dependencies:

In [None]:
!pip install -r requirements.txt

Install transformers from source (required for tokenizers dependencies):

In [None]:
!pip install git+https://github.com/huggingface/transformers

Set the `TOKENIZERS_PARALLELISM` environment variable to disable tokenizers parallelism:

In [None]:
%env TOKENIZERS_PARALLELISM=false
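
The `%env` magic only works inside a notebook. When running the scripts from a plain Python process instead, the same setting can be applied as in the minimal sketch below; setting it before `transformers` or `tokenizers` is first used is a conservative assumption, not a documented requirement of this repository.

```python
import os

# Equivalent of `%env TOKENIZERS_PARALLELISM=false` outside a notebook.
# Set it early, before any tokenizer is created (conservative assumption).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```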

## Demo

Use the `model_name_or_path` argument to specify the model for the `run_glue.py` and `run_multiple_choice.py` scripts.

To use the Legal-BERT, Custom Legal-BERT, or BERT (double) model, pass the corresponding Hugging Face model repository name to the `model_name_or_path` argument:
* Legal-BERT: `zlucia/legalbert` (https://huggingface.co/zlucia/legalbert)
* Custom Legal-BERT: `zlucia/custom-legalbert` (https://huggingface.co/zlucia/custom-legalbert)
* BERT (double): `zlucia/bert-double` (https://huggingface.co/zlucia/bert-double)

Alternatively, download the model files from the casehold Google Drive folder, unzip `models.zip`, and place the unzipped folder inside the top-level directory of the casehold GitHub repository. Then, pass the model paths to the `model_name_or_path` argument:
* Legal-BERT: `models/legalbert`
* Custom Legal-BERT: `models/custom-legalbert`
* BERT (double): `models/bert-double`
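
The two options above can be combined in a small helper that prefers a local copy when one exists and otherwise falls back to the Hugging Face repository name. This is only an illustrative sketch: `resolve_model` and `HUB_IDS` are hypothetical names that are not part of the casehold repository, and the repository names and folder names are the ones listed above.

```python
from pathlib import Path

# Hugging Face repository names, copied from the list above.
HUB_IDS = {
    "legalbert": "zlucia/legalbert",
    "custom-legalbert": "zlucia/custom-legalbert",
    "bert-double": "zlucia/bert-double",
}


def resolve_model(name: str, models_dir: str = "models") -> str:
    """Return `<models_dir>/<name>` if that folder exists, else the Hub ID."""
    if name not in HUB_IDS:
        raise KeyError(f"unknown model {name!r}; expected one of {sorted(HUB_IDS)}")
    local = Path(models_dir) / name
    # Prefer the unzipped Google Drive copy when it is present.
    return local.as_posix() if local.is_dir() else HUB_IDS[name]


# The returned string is what would be passed to --model_name_or_path.
print(resolve_model("legalbert"))
```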

The following examples run the scripts on the Legal-BERT model. To run them on the Custom Legal-BERT model or the BERT (double) model, change the `model_name_or_path` argument.

### Overruling (or Terms of Service)

#### Compute pretrain loss
To compute the per-example and average pretrain loss across the full dataset, run the `run_glue.py` script with the arguments specified in the example.
- Pass a file containing the full dataset to `validation_file`.
- Pass `ptl=True`. 
- The script requires a `train_file`, but does not use it when `ptl=True`, so the particular file passed is not important in this case.

Running the `run_glue.py` script with `ptl=True` writes the per-example pretrain loss to the file `per_ex_pretrain_loss.csv` in `output_dir`, in the same order as the examples in `validation_file`. The script also prints the average pretrain loss across the `validation_file` examples.


*Calculate domain specificity (DS) scores*

To calculate the domain specificity (DS) score of a task, take the difference between the average pretrain loss on BERT (double) and on Legal-BERT: $$\overline{L}_{\text{BERT (double)}} - \overline{L}_{\text{Legal-BERT}}$$

The script output can also be used to calculate the DS score of a specific task example $i$, by taking the difference between the pretrain loss of example $i$ on BERT (double) and on Legal-BERT: $$L^{(i)}_{\text{BERT (double)}} - L^{(i)}_{\text{Legal-BERT}}$$
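
Once the script has been run with `ptl=True` for both BERT (double) and Legal-BERT, the two differences can be computed from the resulting `per_ex_pretrain_loss.csv` files. The sketch below is not part of the repository and rests on an assumption about the file layout: one example per row, with the loss in the last column, and any row that does not parse as a number (such as a header) skipped. Check the actual file before relying on it. The function names are hypothetical.

```python
import csv
from statistics import mean


def read_losses(path):
    """Read one loss per row from the last column (assumed layout)."""
    losses = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            try:
                losses.append(float(row[-1]))
            except ValueError:
                continue  # skip rows that are not numeric, e.g. a header
    return losses


def ds_scores(bert_double_csv, legalbert_csv):
    """Return (task-level DS score, list of per-example DS scores)."""
    general = read_losses(bert_double_csv)
    legal = read_losses(legalbert_csv)
    if len(general) != len(legal):
        raise ValueError("loss files must cover the same examples in the same order")
    per_example = [g - l for g, l in zip(general, legal)]
    # Task-level score: difference of the two average losses.
    return mean(general) - mean(legal), per_example
```

A call would look like `ds_scores("logs/overruling/bert-double/per_ex_pretrain_loss.csv", "logs/overruling/legalbert/per_ex_pretrain_loss.csv")`, where the `bert-double` output directory is a hypothetical path that assumes the BERT (double) run used its own `output_dir`.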

In [None]:
# Download model from Hugging Face model repository
!python classification/run_glue.py \
  --model_name_or_path zlucia/legalbert \
  --train_file data/overruling/train.csv \
  --validation_file data/overruling/all.csv \
  --ptl=True \
  --max_seq_length 128 \
  --output_dir logs/overruling/legalbert \
  --overwrite_output_dir

In [None]:
# Assumes access to model downloaded from Google Drive
!python classification/run_glue.py \
  --model_name_or_path models/legalbert \
  --train_file data/overruling/train.csv \
  --validation_file data/overruling/all.csv \
  --ptl=True \
  --max_seq_length 128 \
  --output_dir logs/overruling/legalbert \
  --overwrite_output_dir

#### Finetune

To finetune on the dataset, run the `run_glue.py` script with the arguments specified in the example. The hyperparameters specified are the same as those from the paper.
- Pass a file containing the train split to `train_file` and a file containing the split to evaluate/predict on (dev or test split) to `validation_file`.
- Pass `do_train` to train on `train_file`, `do_eval` to evaluate on `validation_file`, and `do_predict` to predict on `validation_file`. 

Running the `run_glue.py` script with `do_train` and `do_eval` trains the specified model on `train_file`, evaluates the trained model on `validation_file`, and writes the trained model and tokenizer files to `output_dir`, along with the evaluation results in the file `eval_results.txt`. Passing `do_predict` writes the class label predictions on `validation_file` to the file `predictions.csv` in `output_dir`. The script also prints the evaluation results on `validation_file` (evaluation F1, evaluation loss, etc.).

In [None]:
# Download model from Hugging Face model repository
!python classification/run_glue.py \
  --model_name_or_path zlucia/legalbert \
  --train_file data/overruling/train.csv \
  --validation_file data/overruling/dev.csv \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 128 \
  --per_device_train_batch_size=16 \
  --learning_rate=1e-5 \
  --num_train_epochs=2.0 \
  --output_dir logs/overruling/legalbert \
  --overwrite_output_dir \
  --logging_steps 50

In [None]:
# Assumes access to model downloaded from Google Drive
!python classification/run_glue.py \
  --model_name_or_path models/legalbert \
  --train_file data/overruling/train.csv \
  --validation_file data/overruling/dev.csv \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 128 \
  --per_device_train_batch_size=16 \
  --learning_rate=1e-5 \
  --num_train_epochs=2.0 \
  --output_dir logs/overruling/legalbert \
  --overwrite_output_dir \
  --logging_steps 50

### CaseHOLD

#### Compute pretrain loss
To compute the per-example and average pretrain loss across the full dataset, run the `run_multiple_choice.py` script with the arguments specified in the example.
- Pass `casehold` as `task_name`.
- Pass the path to the data directory to `data_dir`. 
- Pass `ptl=True`.
- By default, when `ptl=True`, the script loads the full dataset from the file named `all.csv` in the data directory.
- To change the default file names for the splits, edit `utils_multiple_choice.py`.

Running the `run_multiple_choice.py` script with `ptl=True` writes the per-example pretrain loss to the file `per_ex_pretrain_loss.csv` in `output_dir`, in the same order as the examples in the loaded dataset. The script also prints the average pretrain loss across examples.

*Calculate domain specificity (DS) scores*

To calculate the domain specificity (DS) score of a task, take the difference between the average pretrain loss on BERT (double) and on Legal-BERT: $$\overline{L}_{\text{BERT (double)}} - \overline{L}_{\text{Legal-BERT}}$$

The script output can also be used to calculate the DS score of a specific task example $i$, by taking the difference between the pretrain loss of example $i$ on BERT (double) and on Legal-BERT: $$L^{(i)}_{\text{BERT (double)}} - L^{(i)}_{\text{Legal-BERT}}$$
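
Because the DS score compares two models, the pretrain loss has to be computed once per model, and each run needs its own `output_dir` so that one `per_ex_pretrain_loss.csv` does not overwrite the other. The following sketch builds both commands from the arguments used in the examples in this section. The helper name `ptl_command` and the `logs/casehold/bert-double` output directory are illustrative assumptions, not names defined by the repository.

```python
import shlex

# The two models compared by the DS score (repository names from the Demo section).
MODELS = {
    "bert-double": "zlucia/bert-double",
    "legalbert": "zlucia/legalbert",
}


def ptl_command(model_id, output_dir):
    """Build the pretrain-loss command for one model as an argument list."""
    return [
        "python", "multiple_choice/run_multiple_choice.py",
        "--task_name", "casehold",
        "--model_name_or_path", model_id,
        "--data_dir", "data/casehold",
        "--ptl=True",
        "--max_seq_length", "128",
        "--output_dir", output_dir,
        "--overwrite_output_dir",
    ]


# One output directory per model (hypothetical naming scheme).
commands = {
    name: ptl_command(model_id, f"logs/casehold/{name}")
    for name, model_id in MODELS.items()
}

for cmd in commands.values():
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the top-level directory of the repository, or the printed lines can be pasted into a notebook cell after a `!`.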

In [None]:
# Download model from Hugging Face model repository
!python multiple_choice/run_multiple_choice.py \
  --task_name casehold \
  --model_name_or_path zlucia/legalbert \
  --data_dir data/casehold \
  --ptl=True \
  --max_seq_length 128 \
  --output_dir logs/casehold/legalbert \
  --overwrite_output_dir

In [None]:
# Assumes access to model downloaded from Google Drive
!python multiple_choice/run_multiple_choice.py \
  --task_name casehold \
  --model_name_or_path models/legalbert \
  --data_dir data/casehold \
  --ptl=True \
  --max_seq_length 128 \
  --output_dir logs/casehold/legalbert \
  --overwrite_output_dir

#### Finetune

To finetune on the dataset, run the `run_multiple_choice.py` script with the arguments specified in the example. The hyperparameters specified are the same as those from the paper.
- Pass `casehold` as `task_name`.
- Pass the path to the data directory to `data_dir`.
- Pass `do_train` to train, `do_eval` to evaluate, and `do_predict` to predict.
- By default, `ptl=False`, and the script loads the train split from the file named `train.csv` in the data directory as the train dataset, and the dev split from the file named `dev.csv` in the data directory as the evaluation/prediction dataset. To load the test split from the file `test.csv` in the data directory as the evaluation/prediction dataset instead, pass `mode=Split.test`.
- To change the default file names for the splits, edit `utils_multiple_choice.py`.
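
Since the script looks for fixed file names inside `data_dir`, a quick check before launching a long run can catch a missing split early. The sketch below uses the default file names stated in this notebook (`all.csv`, `train.csv`, `dev.csv`, `test.csv`); `missing_split_files` is a hypothetical helper, and if the defaults in `utils_multiple_choice.py` have been edited, the names here need to be changed to match.

```python
from pathlib import Path


def missing_split_files(data_dir, ptl=False, use_test=False):
    """Return the expected split files that are absent from `data_dir`."""
    if ptl:
        # Pretrain-loss mode reads the full dataset.
        needed = ["all.csv"]
    else:
        # Finetuning reads the train split plus dev (default) or test.
        needed = ["train.csv", "test.csv" if use_test else "dev.csv"]
    return [name for name in needed if not (Path(data_dir) / name).is_file()]


missing = missing_split_files("data/casehold")
if missing:
    print("Missing files in data/casehold:", ", ".join(missing))
```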

Running the `run_multiple_choice.py` script with `do_train` and `do_eval` trains the specified model on the train dataset, evaluates the trained model on the evaluation dataset, and writes the trained model and tokenizer files to `output_dir`, along with the evaluation results in the file `eval_results.txt`. Passing `do_predict` writes the class label predictions on the evaluation dataset to the file `predictions.csv` in `output_dir`. The script also prints the evaluation results on the evaluation dataset (evaluation macro F1, evaluation loss, etc.).

In [None]:
# Download model from Hugging Face model repository
!python multiple_choice/run_multiple_choice.py \
  --task_name casehold \
  --model_name_or_path zlucia/legalbert \
  --data_dir data/casehold \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 128 \
  --per_device_train_batch_size=16 \
  --learning_rate=5e-6 \
  --num_train_epochs=3.0 \
  --output_dir logs/casehold/legalbert \
  --overwrite_output_dir \
  --logging_steps 1000

In [None]:
# Assumes access to model downloaded from Google Drive
!python multiple_choice/run_multiple_choice.py \
  --task_name casehold \
  --model_name_or_path models/legalbert \
  --data_dir data/casehold \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 128 \
  --per_device_train_batch_size=16 \
  --learning_rate=5e-6 \
  --num_train_epochs=3.0 \
  --output_dir logs/casehold/legalbert \
  --overwrite_output_dir \
  --logging_steps 1000