
[Private 1st] DACON Judgement of Court

1. Goal

  • This challenge aims to develop an AI model that predicts the outcomes of legal cases, a crucial step in exploring how AI can be effectively applied in the field of law.

2. Overview & Results

  • The final score is 0.57258, placing 1st out of 837 participants (see the ranking screenshot in the repository).

  • Overview: see the pipeline overview figure in the repository.

3. Reproducibility

  • Install libraries for text classification models.
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
  • Preprocess all training and testing samples.
python3 preprocess.py

CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_trainval.py --model google/bigbird-pegasus-large-bigpatent --tag bigbird-pegasus-large-bigpatent
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_trainval.py --model google/rembert --tag rembert
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_trainval.py --model microsoft/deberta-v2-xxlarge --tag deberta-v2-xxlarge
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_trainval.py --model albert-xxlarge-v2 --tag albert-xxlarge-v2

CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_test.py --model google/bigbird-pegasus-large-bigpatent --tag bigbird-pegasus-large-bigpatent
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_test.py --model google/rembert --tag rembert
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_test.py --model microsoft/deberta-v2-xxlarge --tag deberta-v2-xxlarge
CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_test.py --model albert-xxlarge-v2 --tag albert-xxlarge-v2

CUDA_VISIBLE_DEVICES=0 python3 extract_embs_for_llm.py --file ./open/train.json
CUDA_VISIBLE_DEVICES=1 python3 extract_embs_for_llm.py --file ./open/test.json
python3 generate_qa_list_for_llm.py
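  • For reference, a minimal sketch of how frozen-backbone embeddings can be extracted with Hugging Face transformers. The actual logic lives in extract_embs_for_trainval.py / extract_embs_for_test.py; the mean-pooling strategy and the default backbone shown here are assumptions.
# Hypothetical sketch, not the repository's exact code.
import torch
from transformers import AutoTokenizer, AutoModel

@torch.no_grad()
def extract_embeddings(texts, model_name="google/rembert", device="cuda"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()
    # For encoder-decoder backbones (e.g., BigBird-Pegasus), use model.get_encoder() instead.
    embeddings = []
    for text in texts:
        inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt").to(device)
        hidden = model(**inputs).last_hidden_state             # (1, seq_len, dim)
        mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)          # mean pooling over tokens
        embeddings.append(pooled.squeeze(0).cpu())
    return torch.stack(embeddings)                             # (num_texts, dim)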
  • Run the text classification models (i.e., RemBERT, ALBERT, DeBERTa, and BigBirdPegasus).
  • Please download the pretrained weights from this link.
CUDA_VISIBLE_DEVICES=0 python3 infer_classification_models.py \
--model_names rembert,albert-xxlarge-v2,deberta-v2-xxlarge,bigbird-pegasus-large-bigpatent
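  • For reference, a minimal sketch of one way the four classifiers' outputs can be combined at inference time: average their per-class probabilities and take the argmax. Whether infer_classification_models.py does exactly this, and the .npy file layout assumed here, are assumptions.
# Hypothetical sketch, assuming each model saved (num_samples, num_classes) probabilities.
import numpy as np

model_names = ["rembert", "albert-xxlarge-v2", "deberta-v2-xxlarge",
               "bigbird-pegasus-large-bigpatent"]
probs = np.mean([np.load(f"{name}_probs.npy") for name in model_names], axis=0)
predictions = probs.argmax(axis=1)  # ensembled class index per test sample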
  • Install libraries for the large language model (i.e., Vicuna).
deactivate

cd llm

python3 -m venv venv
source ./venv/bin/activate

pip3 install --upgrade pip
pip3 install -e .

git lfs install
git clone https://huggingface.co/lmsys/vicuna-13b-v1.3
  • Run vicuna-13b-v1.3 with the QA list prepared above for few-shot prompting.
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path vicuna-13b-v1.3 --port 21002
python3 -m run_llm \
--controller-address "http://localhost:21001" --model-name vicuna-13b-v1.3 \
--temperature 0.001 --max-new-tokens 100
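  • For reference, a minimal sketch of the few-shot prompting idea: prepend a handful of solved cases from the generated QA list to each test case before querying Vicuna. The file name and field names below are assumptions; the actual request is sent through the FastChat controller/worker started above.
# Hypothetical sketch: build a few-shot prompt from the generated QA list.
import json

def build_prompt(examples, query, k=4):
    # Prepend k solved examples (case facts -> verdict) before the query case.
    shots = [f"Case: {ex['question']}\nVerdict: {ex['answer']}" for ex in examples[:k]]
    shots.append(f"Case: {query}\nVerdict:")
    return "\n\n".join(shots)

with open("qa_list.json") as f:  # assumed output of generate_qa_list_for_llm.py
    qa_list = json.load(f)

print(build_prompt(qa_list, "The first party claims that ..."))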
  • Produce the final result by combining the outputs of the classification models and the language model.
python3 ensemble_all_results.py
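  • For reference, a minimal sketch of one plausible unification rule: keep the classifier ensemble's prediction by default and override it wherever the LLM produced a parseable verdict. The actual rule and file names used by ensemble_all_results.py may differ.
# Hypothetical sketch, assuming both stages saved their predictions as .npy files.
import numpy as np

cls_probs = np.load("classification_probs.npy")  # (num_samples, num_classes)
llm_preds = np.load("llm_preds.npy")             # (num_samples,), -1 where no valid answer

final = cls_probs.argmax(axis=1)
valid = llm_preds >= 0
final[valid] = llm_preds[valid]  # trust the LLM wherever it gave a usable answer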

4. Training

  • Train four classification models.
CUDA_VISIBLE_DEVICES=0 python3 train.py --model bigbird-pegasus-large-bigpatent
CUDA_VISIBLE_DEVICES=0 python3 train.py --model rembert
CUDA_VISIBLE_DEVICES=0 python3 train.py --model deberta-v2-xxlarge
CUDA_VISIBLE_DEVICES=0 python3 train.py --model albert-xxlarge-v2
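  • For reference, a minimal sketch of one plausible training setup: a lightweight classifier head trained on the precomputed embeddings of a single backbone. The architecture, hyperparameters, label format, and file names used by train.py are assumptions here.
# Hypothetical sketch: train a small head on precomputed embeddings for one backbone tag.
import numpy as np
import torch
from torch import nn

X = torch.tensor(np.load("rembert_train_embs.npy"), dtype=torch.float32)  # (N, dim)
y = torch.tensor(np.load("train_labels.npy"), dtype=torch.long)           # (N,), binary verdict

head = nn.Sequential(nn.Linear(X.shape[1], 256), nn.ReLU(), nn.Dropout(0.1), nn.Linear(256, 2))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for i in range(0, len(X), 64):  # simple mini-batching
        xb, yb = X[i:i + 64], y[i:i + 64]
        optimizer.zero_grad()
        loss_fn(head(xb), yb).backward()
        optimizer.step()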

5. Acknowledgement

If you have any questions or find any bugs, please email me.
