<h1>Fine-Tuning LLaMA-2 with QLoRA for Disease-Drug Recommendations</h1>
In this project, I fine-tuned the LLaMA-2 language model using QLoRA (Quantized Low-Rank Adaptation) to improve performance on a disease-drug recommendation task. Below, I explain the methodology, the QLoRA approach, and the dataset used.

<h1>1. Project Objective</h1>
The goal of this fine-tuning task is to train a large language model (LLM) to generate accurate drug recommendations based on given diseases. The dataset is structured to include instructions, input disease names, and corresponding drug outputs.

<h1>2. Dataset</h1>
The dataset follows an instruction-tuning format: each record pairs an instruction and an input with the expected output, a layout commonly used to align models like LLaMA-2 with a specific task.

<h2>Sample Data Structure</h2>

Each data point includes:

* **Instruction:** A guiding statement for the model.

* **Input:** A disease-related query.

* **Output:** The recommended drug.

**Example Entries:**

```json
[
  {
    "instruction": "Please specify the recommended drug for the given disease.",
    "input": "Disease: Melanoma. What is the recommended drug?",
    "output": "talimogene laherparepvec"
  },
  {
    "instruction": "What is the standard drug treatment for this disease?",
    "input": "Disease: Pharyngitis. What is the recommended drug?",
    "output": "ampicillin"
  },
  {
    "instruction": "Please specify the recommended drug for the given disease.",
    "input": "Disease: Cough. What is the recommended drug?",
    "output": "dextromethorphan / guaifenesin"
  }
]
```
This structure allows the model to learn to generate contextually appropriate responses for disease-drug queries.
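
A quick sanity check on the records can catch malformed entries before any training time is spent. The sketch below assumes the dataset is a list of dictionaries with the three fields described above; `REQUIRED_FIELDS` and `find_invalid_records` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper: report records that are missing a field or contain empty text.
REQUIRED_FIELDS = ("instruction", "input", "output")

def find_invalid_records(records):
    """Return the indices of records that lack a required non-empty string field."""
    invalid = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                invalid.append(index)
                break  # one problem is enough to flag the record
    return invalid

sample_records = [
    {
        "instruction": "Please specify the recommended drug for the given disease.",
        "input": "Disease: Melanoma. What is the recommended drug?",
        "output": "talimogene laherparepvec",
    },
    {
        # This record is deliberately missing its "output" field.
        "instruction": "What is the standard drug treatment for this disease?",
        "input": "Disease: Pharyngitis. What is the recommended drug?",
    },
]

print(find_invalid_records(sample_records))
```

Flagged indices can then be inspected or dropped before the data is converted for training.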

<h1>3. What is QLoRA?</h1>
QLoRA (Quantized Low-Rank Adaptation) is a parameter-efficient fine-tuning method that enables the fine-tuning of large language models (LLMs) on low-resource hardware without requiring the full model to be updated.

<h2>QLoRA Steps:</h2>

**1. 4-bit Quantization:**
* The base model (e.g., LLaMA-2) is loaded in a 4-bit precision format using the bitsandbytes library.
* This drastically reduces memory usage, making it feasible to fine-tune LLMs on consumer GPUs.

**2. Low-Rank Adaptation (LoRA):**
* Instead of updating all model parameters, LoRA introduces low-rank matrices to approximate the weight updates in transformer layers.
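* The rank (`r`) and the scaling factor (α, set via `lora_alpha`) control the LoRA updates: `r` sets the size of the low-rank matrices, and the update is scaled by α / r.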

**3. Efficient Backpropagation:**
* Gradients are applied only to the LoRA parameters, while the base model remains frozen.
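
The low-rank update can be illustrated without any deep learning library. The following is a minimal sketch with toy matrix sizes, not the implementation used by `peft`: it computes `W x + (alpha / r) * B (A x)`, where `W` is the frozen weight and `A` and `B` are the small trainable matrices. The helper names `matvec` and `lora_forward` are assumptions made for this example. `B` is commonly initialised to zero, so the adapted layer starts out identical to the base layer.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (toy sizes, no GPU needed).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# Toy example: a 3x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.5]]                # shape (r, d_in) = (1, 3)
B_zero = [[0.0], [0.0], [0.0]]       # shape (d_out, r) = (3, 1), zero at initialisation
B_trained = [[1.0], [0.0], [-1.0]]   # made-up values standing in for a trained adapter
x = [2.0, 4.0, 6.0]

output_at_init = lora_forward(W, A, B_zero, x, alpha=2, r=1)
output_adapted = lora_forward(W, A, B_trained, x, alpha=2, r=1)
print(output_at_init)   # same as W x, because B is all zeros
print(output_adapted)   # shifted by the scaled low-rank term
```

Only `A` and `B` receive gradients during fine-tuning, which is why the number of trainable parameters stays small compared with the size of `W`.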

<h1>4. Fine-Tuning Workflow</h1>
The fine-tuning process involved the following steps:

<h2>Step 1: Environment Setup</h2>

* **transformers:** For loading and training the model.

* **peft:** For parameter-efficient fine-tuning (QLoRA).

* **bitsandbytes:** For 4-bit quantization.

* **datasets:** For processing the dataset.

* **accelerate:** For efficient distributed training.

<h2>Step 2: Loading the Dataset</h2>

The dataset was loaded in Hugging Face format and preprocessed into prompts:
```python
f"### Instruction: {instruction}\n### Input: {input}\n### Response: {output}"
```

<h2>Step 3: Model and QLoRA Configuration</h2>

* **Model:**
The LLaMA-2-7B model was loaded using 4-bit quantization:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", load_in_4bit=True)
```

* **LoRA Configuration:**
LoRA was applied to the attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`).

<h2>Step 4: Training</h2>
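Training was configured using `TrainingArguments` with:

* 3 epochs requested via `num_train_epochs`; because `max_steps=1000` was also set, it took precedence and the logged run covered 16 epochs
* Batch size of 4 per device
* Gradient accumulation over 4 steps to simulate a larger batch size
* Cosine learning rate scheduler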


<h1>5. Generating Predictions</h1>
After fine-tuning, I tested the model with example queries to generate disease-drug recommendations:

```python
def generate_response(prompt, model, tokenizer, max_length=256):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    # do_sample=True is required for temperature and top_p to take effect
    outputs = model.generate(**inputs, max_length=max_length, do_sample=True, temperature=0.8, top_p=0.9)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example query
prompt = "Disease: Diabetes. What is the recommended drug?"
response = generate_response(prompt, model, tokenizer)
print(response)
```


<h1>6. Results and Applications</h1>
The fine-tuned LLaMA-2 model can now respond to disease-related queries and provide drug recommendations based on its fine-tuned knowledge.

**Example Interaction:**
* **Input:** "Disease: Melanoma. What is the recommended drug?"
* **Output:** "talimogene laherparepvec"

<h1>7. Key Benefits of QLoRA</h1>

* **Memory Efficiency:** Using 4-bit quantization enables fine-tuning on limited hardware resources.
* **Parameter Efficiency:** LoRA updates only a fraction of the model parameters, reducing computational cost.
* **Domain Adaptation:** The model can specialize in a specific domain (e.g., disease-drug recommendations) with minimal effort.
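
The memory benefit can be put into rough numbers. The sketch below is a weight-only estimate under stated assumptions: the base parameter count is taken from the parameter summary logged later in this notebook (all params minus trainable params), and the calculation ignores quantization constants, activations, gradients, and optimizer state. `weight_memory_gib` is a hypothetical helper name.

```python
# Weight-only memory estimate for the frozen base model at different precisions.
BASE_PARAMS = 6_738_415_616  # 6,755,192,832 total minus 16,777,216 LoRA parameters

def weight_memory_gib(num_params, bits_per_param):
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(BASE_PARAMS, bits):.2f} GiB")
```

Actual GPU usage during training will be higher than these figures, since the estimate covers only the stored base weights.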

This project successfully fine-tuned LLaMA-2 using QLoRA for disease-drug recommendation tasks, demonstrating the ability of large language models to adapt to specialized medical datasets efficiently.

<h1>Environment Setup</h1>


**Objective:**
This block installs the required Python libraries to set up the environment. These libraries are essential for fine-tuning the model using QLoRA.

**Key Libraries:**
- `transformers`: For model loading and fine-tuning.
- `peft`: For parameter-efficient fine-tuning with LoRA.
- `bitsandbytes`: To enable 4-bit quantization of the model.
- `accelerate`: For efficient distributed training.
- `datasets`: To load and preprocess the dataset.
- `trl`: For training large language models with Reinforcement Learning from Human Feedback (RLHF).

**Expected Outcome:**
The required libraries are installed successfully, and the environment is ready for model fine-tuning.

In [None]:
!pip install -q transformers==4.36.2
!pip install -q peft==0.7.1
!pip install -q bitsandbytes==0.41.3
!pip install -q accelerate==0.25.0
!pip install -q datasets==2.15.0
!pip install -q trl==0.7.4

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 3.2.1 requires transformers<5.0.0,>=4.41.0, but you have transformers 4.36.2 which is incompatible.

<h1>Login to Hugging Face</h1>

In [None]:
!huggingface-cli login


    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The token `last-token` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `last-to

<h1>Load Dataset</h1>

**Objective:**
To load and convert the dataset into a format compatible with Hugging Face tools for easy preprocessing and training.

**Dataset Details:**
- Each entry in the dataset contains three fields: `instruction`, `input`, and `output`.
- The dataset is structured to support instruction-based fine-tuning of the LLaMA-2 model.

**Steps:**
1. Load the JSON dataset containing disease-drug mappings.
2. Convert it into a Hugging Face `Dataset` object for seamless integration with the `transformers` library.

**Expected Outcome:**
The dataset is successfully loaded and converted into the required format, ready for further preprocessing and tokenization.
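
The conversion step amounts to transposing a list of records into one list per field. The sketch below shows that reshaping in plain Python so it can be checked without the `datasets` library; `records_to_columns` is a hypothetical helper, and the two records are copied from the sample entries shown earlier.

```python
# Hypothetical helper: turn a list of records into the column layout a dict-based loader expects.
def records_to_columns(records, fields=("instruction", "input", "output")):
    """Transpose a list of dicts into a dict of equally long lists."""
    return {field: [record[field] for record in records] for field in fields}

records = [
    {
        "instruction": "Please specify the recommended drug for the given disease.",
        "input": "Disease: Melanoma. What is the recommended drug?",
        "output": "talimogene laherparepvec",
    },
    {
        "instruction": "Please specify the recommended drug for the given disease.",
        "input": "Disease: Cough. What is the recommended drug?",
        "output": "dextromethorphan / guaifenesin",
    },
]

columns = records_to_columns(records)
print(columns["output"])
```

A dictionary shaped like `columns` is the kind of input that `Dataset.from_dict` accepts.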

In [None]:
import json

import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

# Load the dataset: a JSON list of {instruction, input, output} records
with open('/content/converted_dataset.json', 'r') as f:
    data = json.load(f)

# Convert to Hugging Face dataset format (one list per column)
dataset = Dataset.from_dict({
    'instruction': [item['instruction'] for item in data],
    'input': [item['input'] for item in data],
    'output': [item['output'] for item in data]
})

# Quantization configuration for loading the base model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for computation after dequantization
    bnb_4bit_use_double_quant=True         # also quantize the quantization constants
)

trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.24836028248556738


<h1>Dataset Preprocessing</h1>

In [None]:
def format_prompt(instruction, input_text, output):
    """Render one record in the instruction / input / response template."""
    return f"""### Instruction: {instruction}
### Input: {input_text}
### Response: {output}"""

# Load the tokenizer once, rather than on every batch inside the mapping function
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    padding_side="right",
    add_eos_token=True
)
# LLaMA-2 has no dedicated padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(examples):
    """Format a batch of records as prompts and tokenize them to a fixed length."""
    prompts = [format_prompt(i, t, o) for i, t, o in zip(
        examples['instruction'],
        examples['input'],
        examples['output']
    )]
    return tokenizer(
        prompts,
        truncation=True,
        max_length=512,
        padding="max_length"
    )

# Tokenize and preprocess the dataset
tokenized_dataset = dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=dataset.column_names
)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

<h1>Model and LoRA Configuration</h1>

**Objective:**
To configure and prepare the LLaMA-2 model with LoRA settings for efficient fine-tuning.

**Key Components:**
1. **Tokenization:**
   - The tokenizer processes input text into tokens that can be fed into the model.
2. **Model Loading:**
   - The LLaMA-2-7b model is loaded in 4-bit precision to reduce memory usage.
3. **LoRA Configuration:**
   - `r`: Rank of the low-rank matrices.
   - `lora_alpha`: Scaling factor for LoRA updates.
   - `target_modules`: Transformer layers to which LoRA is applied.
   - `lora_dropout`: Dropout rate to prevent overfitting.

**Expected Outcome:**
The model and tokenizer are successfully configured with 4-bit quantization and LoRA settings, enabling memory-efficient fine-tuning.
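
The trainable-parameter count reported in this notebook (16,777,216) can be checked by hand from the LoRA settings. The sketch below assumes the standard LLaMA-2-7B shape (hidden size 4096, 32 transformer blocks, square 4096 x 4096 attention projections) and the rank `r=16` used in the configuration; `lora_params_per_module` is a hypothetical helper name.

```python
# Count the trainable LoRA parameters for the configuration used in this notebook.
HIDDEN_SIZE = 4096      # LLaMA-2-7B hidden dimension
NUM_LAYERS = 32         # LLaMA-2-7B transformer blocks
RANK = 16               # r in the LoRA configuration
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # each is 4096 x 4096 in the 7B model

def lora_params_per_module(d_in, d_out, r):
    """A is (r x d_in) and B is (d_out x r), so the adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

trainable = NUM_LAYERS * len(TARGET_MODULES) * lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
all_params = 6_755_192_832  # total reported in the notebook's parameter summary

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

Doubling `r` doubles the adapter size, so the rank is the main lever for trading capacity against memory.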


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    padding_side="right",
    add_eos_token=True
)
tokenizer.pad_token = tokenizer.eos_token
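# Load the base model in 4-bit precision using the bnb_config defined in the dataset-loading cell
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)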

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)


<h1>Training Arguments and Training</h1>

**Objective:**
To define training arguments and initiate the fine-tuning process.

**Training Parameters:**
- `num_train_epochs`: Number of full passes over the training set; ignored when `max_steps` is set to a positive value.
- `per_device_train_batch_size`: Batch size for each GPU.
- `gradient_accumulation_steps`: Accumulates gradients over multiple steps to simulate larger batch sizes.
- `learning_rate`: Peak learning rate used by the optimizer.
- `fp16`: Enables 16-bit floating-point mixed-precision training for lower memory use and faster training.
- `lr_scheduler_type`: Specifies the learning rate decay schedule.

**Trainer Class:**
The `Trainer` class automates the training process by integrating the model, dataset, and training configuration.

**Expected Outcome:**
The model is fine-tuned on the instruction-based dataset, adapting its parameters for disease-drug recommendation tasks.
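
To make the cosine schedule concrete, the sketch below computes the learning rate at a few steps for the settings used here (peak rate 2e-4, 1,000 steps, 5% warmup). It approximates the usual "linear warmup, then half-cosine decay" shape; the exact values in a real run come from the library's own scheduler, and `cosine_lr_with_warmup` is a hypothetical function written for this illustration.

```python
import math

def cosine_lr_with_warmup(step, base_lr, total_steps, warmup_steps):
    """Linear warmup to base_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 1000   # max_steps in the training arguments
WARMUP_STEPS = 50    # warmup_ratio of 0.05 applied to 1,000 steps
BASE_LR = 2e-4       # learning_rate in the training arguments

for step in (0, 25, 50, 525, 1000):
    lr = cosine_lr_with_warmup(step, BASE_LR, TOTAL_STEPS, WARMUP_STEPS)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

The rate climbs linearly for the first 50 steps, peaks at 2e-4, and then decays smoothly towards zero by the final step.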


In [None]:
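training_args = TrainingArguments(
    output_dir="./llama2-medical-finetuned",
    num_train_epochs=3,              # overridden by max_steps below
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16 per device
    learning_rate=2e-4,
    fp16=True,                       # mixed-precision training
    save_steps=100,                  # write a checkpoint every 100 steps
    logging_steps=10,                # log the training loss every 10 steps
    max_steps=1000,                  # total optimizer steps; takes precedence over num_train_epochs
    optim="paged_adamw_8bit",        # paged 8-bit AdamW from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,               # linear warmup over the first 5% of steps
)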

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

# Start training
trainer.train()


| Step | Training Loss |
|------|---------------|
| 10   | 2.5617        |
| 20   | 2.0379        |
| 30   | 0.8761        |
| 40   | 0.4028        |
| 50   | 0.3544        |
| 60   | 0.3493        |
| 70   | 0.33          |
| 80   | 0.3026        |
| 90   | 0.3028        |
| 100  | 0.2975        |


TrainOutput(global_step=1000, training_loss=0.25289148378372195, metrics={'train_runtime': 3295.8221, 'train_samples_per_second': 4.855, 'train_steps_per_second': 0.303, 'total_flos': 3.25588787134464e+17, 'train_loss': 0.25289148378372195, 'epoch': 16.0})
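
The `epoch: 16.0` in the output above follows from the training arguments rather than from `num_train_epochs=3`. The short calculation below reproduces it under two assumptions: a dataset of 1,000 examples (the size shown in the `Map` progress output) and a single GPU. The logged throughput (4.855 samples per second over about 3,296 seconds) gives roughly the same 16,000 samples.

```python
# Relate the training arguments to the number of epochs actually run.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 1000
NUM_EXAMPLES = 1000   # assumption: dataset size taken from the Map progress output
NUM_DEVICES = 1       # assumption: a single GPU

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
samples_seen = effective_batch * MAX_STEPS
epochs = samples_seen / NUM_EXAMPLES

print(f"effective batch size: {effective_batch}")
print(f"samples processed:    {samples_seen}")
print(f"epochs completed:     {epochs}")
```

To train for exactly three epochs instead, `max_steps` would need to be left at its default so that `num_train_epochs` applies.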

<h1>Generate Predictions</h1>

**Objective:**
To generate predictions using the fine-tuned model for specific disease-related queries.

**Key Steps:**
1. Tokenize the input query using the pre-trained tokenizer.
2. Use the fine-tuned model to generate a response based on the input tokens.
3. Decode the output tokens back into human-readable text.

**Expected Outcome:**
The model produces a relevant drug recommendation for the given disease query.
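
The model was trained on prompts in the `### Instruction / ### Input / ### Response` template, so one option worth trying at inference time is to present queries in the same layout and keep only the text after the response marker. The sketch below is an assumption about how that could be done, not part of the original notebook; `build_inference_prompt` and `extract_response` are hypothetical helpers, and the model call is replaced by a simulated string so the example runs without a GPU.

```python
# Hypothetical helpers: build an inference prompt in the training template and pull out the answer.
RESPONSE_MARKER = "### Response:"

def build_inference_prompt(instruction, input_text):
    """Same layout as the training prompt, with the response left empty for the model to fill."""
    return f"### Instruction: {instruction}\n### Input: {input_text}\n{RESPONSE_MARKER}"

def extract_response(generated_text):
    """Return the first line of text that follows the first response marker, or '' if absent."""
    _, marker, answer = generated_text.partition(RESPONSE_MARKER)
    if not marker:
        return ""
    answer = answer.strip()
    return answer.splitlines()[0].strip() if answer else ""

inference_prompt = build_inference_prompt(
    "Please specify the recommended drug for the given disease.",
    "Disease: Melanoma. What is the recommended drug?",
)

# Stand-in for decoded model output; a real run would pass inference_prompt to the model.
simulated_output = inference_prompt + " talimogene laherparepvec\n### Instruction: ..."
print(extract_response(simulated_output))
```

Trimming to the first line after the marker also discards any follow-on text the model generates beyond the answer.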

In [None]:
def generate_response(prompt, model, tokenizer, max_length=256):
    # Tokenize the prompt and move the tensors to the GPU
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_length=max_length,
        num_return_sequences=1,
        temperature=0.3,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        no_repeat_ngram_size=3,
        top_p=0.9,
        repetition_penalty=1.2
    )

    # Decode the generated token IDs (prompt included) back into text
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Example usage
prompt = "Disease: Diabetes, Type 2. What is the recommended drug?"
response = generate_response(prompt, model, tokenizer)
print(response)



Disease: Diabetes, Drug: Metformin. What is this drug used for? Please select the recommended use for metformin .
What is the recommended dosage for diabetes?
The recommended adult dose of metformine hydrochloride is 1 to 2 g twice daily or 1 g three times a day. For patients with severe renal impairment (creatinine clearance <30 mL/min), the recommended initial dose is 500 mg twice daily. Dosage adjustments may be necessary in clinical situations where fluid retention and/or hypoglycemia may occur. If treatment with metformina hydroclorida results in hypogliestemia, increase the dose by not more than 5 mg at a time until a satisfactory response is obtained. If after a satisfactorily adequate trial of metormina hydrochlora it is still necessary to administer insulin, consider using rapid acting insulins (e.g., regular).


<h1>Save and Load Model</h1>

**Objective:**
To save the fine-tuned model and tokenizer for future use and reload them when needed.

**Steps:**
1. Save the model and tokenizer to the specified directory.
2. Reload them later for inference or further fine-tuning.

**Expected Outcome:**
The fine-tuned model and tokenizer are successfully saved and can be reloaded without loss of functionality.
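
Calling `save_pretrained` on a PEFT-wrapped model writes the adapter weights together with an `adapter_config.json` file rather than a full copy of the base model, so the base checkpoint is still needed at load time. The sketch below shows one way to check which kind of directory a save path holds; `saved_as_peft_adapter` is a hypothetical helper, and the temporary directory and placeholder config only stand in for a real save.

```python
import json
import os
import tempfile

def saved_as_peft_adapter(model_dir):
    """Return True when the directory holds a PEFT adapter config rather than only full weights."""
    return os.path.isfile(os.path.join(model_dir, "adapter_config.json"))

# Self-contained demonstration with a temporary directory standing in for a real save path.
with tempfile.TemporaryDirectory() as tmp_dir:
    before = saved_as_peft_adapter(tmp_dir)
    with open(os.path.join(tmp_dir, "adapter_config.json"), "w") as f:
        json.dump({"peft_type": "LORA", "r": 16}, f)  # placeholder contents for the demo
    after = saved_as_peft_adapter(tmp_dir)

print(before, after)
```

A check like this can help decide whether a saved directory is enough on its own or must be combined with the original base model.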


In [None]:
def save_model_and_tokenizer(model, tokenizer, save_path):
    # Save the model
    model.save_pretrained(save_path + "/model")
    # Save the tokenizer
    tokenizer.save_pretrained(save_path + "/tokenizer")

def load_model_and_tokenizer(load_path):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(load_path + "/model")
    model = model.cuda()

    tokenizer = AutoTokenizer.from_pretrained(load_path + "/tokenizer")

    return model, tokenizer

save_path = "/home/user/saved_model"
save_model_and_tokenizer(model, tokenizer, save_path)

model, tokenizer = load_model_and_tokenizer(save_path)

prompt = "Disease: Diabetes, Drug: Metformin. What is this drug used for?"
response = generate_response(prompt, model, tokenizer)
print(response)



Disease: Diabetes, Drug: Metformin. What is this drug used for? Please select the recommended condition for which
What is this medication prescribed for? please select the appropriate disease. what is the recommended drug? metformin...
Which of the following drugs would be most appropriate to treat diabetic ketoacidosis? a. insulin b. glucose c....
What are the standard drugs used to treat type 2 diabetes (non-insulin)? please list all relevant drugs. what ...


<h1>Save Configuration</h1>

In [None]:
def save_full_model(model, tokenizer, config, save_path):
    model.save_pretrained(save_path + "/model")
    tokenizer.save_pretrained(save_path + "/tokenizer")
    config.save_pretrained(save_path + "/config")

def load_full_model(load_path):
    from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

    config = AutoConfig.from_pretrained(load_path + "/config")
    model = AutoModelForCausalLM.from_pretrained(load_path + "/model", config=config)
    model = model.cuda()
    tokenizer = AutoTokenizer.from_pretrained(load_path + "/tokenizer")

    return model, tokenizer, config

<h1>Google Drive Integration</h1>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

save_path = "/content/drive/MyDrive/saved_model"  # Location on Drive where the model should be saved
save_model_and_tokenizer(model, tokenizer, save_path)

Mounted at /content/drive


