
Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4


TL;DR

Clinical trials are conducted to assess the effectiveness and safety of new treatments. Clinical Trial Reports (CTRs) outline the methodology and findings of a clinical trial, and they are used to design and prescribe experimental treatments. The application of LLMs in critical domains such as real-world clinical trials requires further investigation, accompanied by the development of novel evaluation methodologies grounded in more systematic behavioural and causal analyses.

This second iteration is intended to ground NLI4CT in interventional and causal analyses of NLI models, enriching the original NLI4CT dataset with a novel contrast set developed by applying a set of interventions to the statements in the NLI4CT test set.

Solution workflow

Without fine-tuning

(Diagram: workflow of the solution without fine-tuning.)

With fine-tuning

(Diagram: workflow of the solution with fine-tuning.)

Experiment code

Environment setup

conda env create -f environment.yml
conda activate semeval_nli4ct

RQ 1: Can zero-shot LLMs perform well?

python scripts/train.py experiment=0_shot/llama2_7b_chat
python scripts/train.py experiment=0_shot/llama2_13b_chat
python scripts/train.py experiment=0_shot/mistral_7b_instruct
python scripts/train.py experiment=0_shot/mistrallite_7b
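
The exact prompt template is defined in the experiment configs. As a rough illustration of the task framing (the wording below is an assumption, not the repository's template), a zero-shot NLI4CT prompt pairs a CTR section with a statement and asks for Entailment or Contradiction:

# Illustrative only: the actual template lives in the repo's experiment configs.
def build_zero_shot_prompt(ctr_section: str, statement: str) -> str:
    return (
        "Below is a section of a Clinical Trial Report (CTR) and a statement.\n\n"
        f"CTR section:\n{ctr_section}\n\n"
        f"Statement: {statement}\n\n"
        "Does the CTR section entail or contradict the statement? "
        "Answer with exactly one word: Entailment or Contradiction."
    )

print(build_zero_shot_prompt(
    "Adverse Events: 3/20 patients reported nausea.",
    "Fewer than half of the patients reported nausea.",
))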

To run the GPT-4 experiment, use the scripts/notebooks/gpt4_inference.ipynb notebook. You must set your own OPENAI_API_KEY as an environment variable.
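
For example, the key can be set from Python at the top of the notebook (a minimal sketch, assuming the notebook reads the key from the environment):

import os

# Set the key before any OpenAI client is created in the notebook.
# Replace the placeholder with your own key; never commit it to the repository.
os.environ["OPENAI_API_KEY"] = "sk-..."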

RQ 2: Can smaller LLMs perform on par with GPT-4 using prompting strategies?

In-Context Learning

# 1-shot ICL experiments
python scripts/train.py experiment=1_shot/mistral_7b_instruct retriever=bm25
python scripts/train.py experiment=1_shot/llama2_7b_chat retriever=bm25
python scripts/train.py experiment=1_shot/llama2_13b_chat retriever=bm25
python scripts/train.py experiment=1_shot/mistrallite_7b retriever=bm25

# 2-shot ICL experiments
python scripts/train.py experiment=2_shot/mistral_7b_instruct retriever=bm25
python scripts/train.py experiment=2_shot/llama2_7b_chat retriever=bm25
python scripts/train.py experiment=2_shot/llama2_13b_chat retriever=bm25
python scripts/train.py experiment=2_shot/mistrallite_7b retriever=bm25
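
The retriever=bm25 option selects which training examples are used as in-context demonstrations. A minimal sketch of BM25 retrieval over training statements, assuming the rank_bm25 package and a whitespace tokenizer (the repository's actual retriever implementation may differ):

from rank_bm25 import BM25Okapi

train_statements = [
    "All patients in cohort 1 received the study drug.",
    "The primary outcome was overall survival at 24 months.",
    "Adverse events were more frequent in the intervention arm.",
]

# Index the training statements with BM25.
bm25 = BM25Okapi([s.lower().split() for s in train_statements])

test_statement = "Overall survival was measured after two years."
# Return the top-k most lexically similar training statements as demonstrations.
demonstrations = bm25.get_top_n(test_statement.lower().split(), train_statements, n=2)
print(demonstrations)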

Chain-of-Thought

python scripts/train.py experiment=cot_0_shot/mistral_7b_instruct
python scripts/train.py experiment=cot_0_shot/llama2_7b_chat
python scripts/train.py experiment=cot_0_shot/llama2_13b_chat
python scripts/train.py experiment=cot_0_shot/mistrallite_7b
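
Compared with the plain zero-shot prompt, the CoT variant asks the model to reason before committing to a label. The instruction below is illustrative only; the actual wording is defined in the cot_0_shot experiment configs:

# Illustrative CoT instruction appended to the zero-shot prompt.
cot_suffix = (
    "\n\nLet's think step by step. First explain your reasoning over the CTR section, "
    "then end with a final answer of exactly one word: Entailment or Contradiction."
)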

In-Context Learning + Chain-of-Thought

# Generate the ChatGPT CoT explanations
# Replace AZURE_OPENAI_KEY and AZURE_OPENAI_ENDPOINT with your own credentials
python scripts/generate_cot_chatgpt.py

# 1-shot CoT experiments
python scripts/train.py experiment=cot_1_shot/mistral_7b_instruct retriever=bm25
python scripts/train.py experiment=cot_1_shot/llama2_7b_chat retriever=bm25
python scripts/train.py experiment=cot_1_shot/llama2_13b_chat retriever=bm25
python scripts/train.py experiment=cot_1_shot/mistrallite_7b retriever=bm25

# 2-shot CoT experiments
python scripts/train.py experiment=cot_2_shot/mistral_7b_instruct retriever=bm25
python scripts/train.py experiment=cot_2_shot/llama2_7b_chat retriever=bm25
python scripts/train.py experiment=cot_2_shot/llama2_13b_chat retriever=bm25
python scripts/train.py experiment=cot_2_shot/mistrallite_7b retriever=bm25
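
scripts/generate_cot_chatgpt.py produces the explanations that are later attached to the retrieved demonstrations. A minimal sketch of how such an explanation could be requested through the openai Python client's Azure interface; the deployment name, API version, and prompt are assumptions, and the script itself may be structured differently:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",  # assumed API version
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # hypothetical Azure deployment name
    messages=[{
        "role": "user",
        "content": (
            "Explain step by step why the statement "
            "'Fewer than half of the patients reported nausea.' is entailed by the CTR "
            "section 'Adverse Events: 3/20 patients reported nausea.'"
        ),
    }],
)
print(response.choices[0].message.content)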

RQ 3: Can fine-tuned smaller LLMs perform on par with GPT-4?

Can LoRA fine-tuning improve the performance of LLMs?

python scripts/train.py experiment=fine_tune/llama2_7b_chat
python scripts/train.py experiment=fine_tune/llama2_13b_chat
python scripts/train.py experiment=fine_tune/meditron_7b
python scripts/train.py experiment=fine_tune/mistral_7b_instruct
python scripts/train.py experiment=fine_tune/mistrallite_7b
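
All fine-tuning runs attach LoRA adapters to the base models; the actual hyperparameters are set in the fine_tune experiment configs. A minimal sketch of how an adapter is attached to a causal LM with the peft library (the rank, alpha, target modules, and checkpoint name below are illustrative, not the repository's values):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

lora_config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()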

Can merging LoRA adapters improve the performance of LLMs?

# Train the triplet loss adapter separately
python scripts/fine_tune_contrastive_learning.py experiment=fine_tune/mistrallite_7b

python scripts/train.py experiment=pretrained_0_shot/mistrallite_7b_contrastive_common_avg
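
scripts/fine_tune_contrastive_learning.py trains an adapter with a triplet objective. As a rough sketch of that kind of loss on pooled hidden states (the actual mining strategy, pooling, and training loop in the script may differ):

import torch

triplet_loss = torch.nn.TripletMarginLoss(margin=1.0)

# Hypothetical pooled embeddings of an anchor statement, a paraphrase (positive),
# and a contradicting statement (negative), e.g. mean-pooled LLM hidden states.
anchor = torch.randn(4, 4096, requires_grad=True)
positive = torch.randn(4, 4096, requires_grad=True)
negative = torch.randn(4, 4096, requires_grad=True)

loss = triplet_loss(anchor, positive, negative)
loss.backward()  # in the actual script, only the LoRA adapter weights would receive gradients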

To run the pretrained_0_shot/mistrallite_7b_contrastive_common_avg experiment, you can download our pretrained LoRA adapters.
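
The contrastive_common_avg configuration averages two LoRA adapters, the triplet-loss adapter and the standard fine-tuned one. A minimal sketch of how such an average can be built with peft's add_weighted_adapter, assuming the downloaded adapters sit in local directories named adapters/common and adapters/contrastive (hypothetical paths) and share the same rank:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("amazon/MistralLite")

# Load both adapters onto the same base model under distinct names.
model = PeftModel.from_pretrained(base, "adapters/common", adapter_name="common")
model.load_adapter("adapters/contrastive", adapter_name="contrastive")

# Average the two adapters into a new one and activate it.
model.add_weighted_adapter(
    adapters=["common", "contrastive"],
    weights=[0.5, 0.5],
    adapter_name="common_contrastive_avg",
    combination_type="linear",
)
model.set_adapter("common_contrastive_avg")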
