# Documentation

Title:
Dual LoRA Fine-Tuned LLM for Patient Monitoring System

**********************************************

Introduction

This project focuses on the efficient fine-tuning and deployment of a large language model (LLM) using LoRA/QLoRA adapters on an edge AI device. The LLM can be hosted on the edge device itself or offered as a highly optimised cloud service. The system adapts a shared base LLM for 3 distinct but complementary healthcare applications, using low-rank adaptation to minimize resource demands while preserving performance.

**********************************************
Key Features

Shared Base LLM with LoRA Adapters:
- Uses a single quantized LLM backbone (4-bit).
- Deploys two task-specific LoRA/QLoRA adapters in .safetensors format.
- Enables modular loading depending on the active task, conserving memory.

Edge-Optimized Inference:
- Runs locally on Jetson Orin Nano, ensuring low-latency and offline capabilities.
- Optimized via quantization, 4-bit loading (bitsandbytes), and selective adapter loading.

Natural Language Interfaces:
- Accepts unstructured inputs via speech or text.
- Can interface with a voice front-end or mobile app for broader accessibility.

**********************************************
Technical Highlights
- LoRA/QLoRA Fine-tuning Pipeline:
Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B (4-bit) or a similar lightweight LLM.
Task 1 adapter fine-tuned on medical symptom → condition mappings.
Task 2 adapter trained on health education data (clinical + layman terms).
Task 3 adapter trained to provide a diet plan.

- Training via TRL's SFTTrainer + PEFT, with quantisation and other memory-efficient strategies.

- Deployment with SafeTensor Adapters:
Both adapters saved in .safetensors for secure, portable inference.
Adapters are dynamically loadable on-device via PEFT.

- Memory Efficiency:
A 4-bit base model reduces weight memory by ~75% relative to 16-bit weights.
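
The ~75% figure can be checked with a quick back-of-the-envelope calculation. The sketch below is only an estimate: it counts weight storage alone, takes the parameter count from the `print_trainable_parameters()` output recorded in this notebook, and ignores quantization constants, activations, the KV cache, and any modules that are kept in higher precision, so the real footprint will be somewhat higher.

```python
# Rough weight-memory estimate for the base model at two precisions.
# Parameter counts come from the print_trainable_parameters() output in this
# notebook: all params minus the LoRA (trainable) params gives the base model.
BASE_PARAMS = 8_074_321_920 - 44_060_672


def weight_gib(num_params: int, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_gib(BASE_PARAMS, 16)  # 16-bit baseline
int4_gib = weight_gib(BASE_PARAMS, 4)   # 4-bit quantized weights
saving = 1 - int4_gib / fp16_gib        # fraction of weight memory saved

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
print(f"reduction:      {saving:.0%}")
```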

**********************************************
Novelty:
- LLM deployment for dual healthcare NLP tasks on an embedded device.
- Enables multi-task inference with just adapter switching, which keeps memory use low and is ideal for resource-constrained settings.
- Demonstrates that LoRA + QLoRA adapters can be strategically used for modular AI on edge.
**********************************************

Core Idea:

Fine-tuning produces only small adapter weights, saved as .safetensors files, that sit on top of the base model. Several fine-tuned adapters can therefore share the same base model, minimising the memory required for edge deployment.

So, we fine-tune one LLM for 2 applications with 2 datasets (diagnostic agent and diet recommendation) and store each fine-tuned output as a .safetensors adapter.
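
A minimal sketch of how adapter switching over one shared base model might look with PEFT is given below. It is illustrative rather than the notebook's actual deployment code: the task names and the `model_ft_diet` directory are hypothetical, and only `model_ft_diagnostic` is a directory that this notebook really saves.

```python
# Hypothetical mapping from task name to saved adapter directory.
# "model_ft_diagnostic" is saved later in this notebook;
# "model_ft_diet" is an assumed name for the second adapter.
ADAPTER_DIRS = {
    "diagnostic": "model_ft_diagnostic",
    "diet": "model_ft_diet",
}


def load_adapters(base_model, adapter_dirs=ADAPTER_DIRS):
    """Attach every adapter to one shared base model that is loaded only once."""
    from peft import PeftModel  # imported lazily so the mapping works without peft

    names = list(adapter_dirs)
    # The first adapter wraps the base model...
    model = PeftModel.from_pretrained(
        base_model, adapter_dirs[names[0]], adapter_name=names[0]
    )
    # ...and the remaining adapters are registered alongside it.
    for name in names[1:]:
        model.load_adapter(adapter_dirs[name], adapter_name=name)
    return model


def switch_task(model, task):
    """Activate the adapter for `task`; the base weights are left untouched."""
    if task not in ADAPTER_DIRS:
        raise ValueError(f"unknown task: {task!r}")
    model.set_adapter(task)
    return model
```

With this layout, serving a different task costs one `set_adapter` call instead of loading a second copy of the 8B base model.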


******************************************


Overview of the Fine-Tuning Pipeline

1.1 Dataset Preparation:

1.1.1 Diagnostic agent -

- Source: FreedomIntelligence/medical-o1-reasoning-SFT
- Combines the prompt and response into a given template to train the LLM on these prompt-response pairs
- Final format = "prompt" + "response" stored under 'text' column of the dataframe, for supervised instruction tuning.

1.1.2 Health Edu Gen + One-day diet planner:

- Source: issai/LLM_for_Dietary_Recommendation_System
- Contains 11.6k rows holding data for 50 patients, split irregularly across several cells. The dataset was therefore pre-processed into a customised dataframe of shape 50 x 3 [50 patients, each having 3 columns - Patient Profile (medical history), Health Education Generation, One-day diet plan]
- The cleaned, structured dataframe was then further arranged by combining the prompt and response into a given template to train the LLM on these prompt-response pairs

1.2 Template for LLM Supervised Fine-tuning

1.2.1 Diagnostic:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Given the patient's health symptoms generate the diagnosis based on the patient health : ...
Given this clinical presentation and history, what is the most likely diagnosis?

### Response:

```

Example:
```
<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis?


### Response:
The most likely diagnosis for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy.
### End<｜end▁of▁sentence｜>
```


1.2.2 Health Edu Gen:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator.
Provide dietary recommendation for this patient profile:

### Response:

```

1.2.3 One-day Diet Plan:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a diet tips generator.
Given this patient profile: ...
Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food.

### Response:

```
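
The three templates above differ only in their instruction text, so they can be generated by one small helper. The following is a sketch under that assumption; the task keys (`diagnostic`, `health_edu`, `diet_plan`) and the function names are hypothetical, while the wording and the trailing `### End` marker follow the templates and the training example shown in this document.

```python
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

# Instruction text per task; the keys are hypothetical labels.
INSTRUCTIONS = {
    "diagnostic": (
        "Given the patient's health symptoms generate the diagnosis "
        "based on the patient health : {profile}"
    ),
    "health_edu": (
        "You are a health and diet tips generator.\n"
        "Provide dietary recommendation for this patient profile: {profile}"
    ),
    "diet_plan": (
        "You are a diet tips generator.\n"
        "Given this patient profile: {profile}\n"
        "Give a specific diet plan for the day (one-day diet plan) "
        "based on the patient profile using Central Asian food."
    ),
}


def build_prompt(task: str, profile: str) -> str:
    """Fill the shared template with the task-specific instruction."""
    instruction = INSTRUCTIONS[task].format(profile=profile)
    return f"{HEADER}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def build_training_text(task: str, profile: str, response: str) -> str:
    """Prompt + response + end marker, i.e. one row of the 'text' column."""
    return build_prompt(task, profile) + response + "\n### End"
```

At inference time only `build_prompt` is needed; `build_training_text` is the form used for supervised fine-tuning rows.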



1.3 Model & Tokenizer Loading: (same for all 3)

- Model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- Loading with:

  - 4-bit quantization (QLoRA) using bitsandbytes
  - device_map="auto" for memory-aware loading across GPUs (it helps avoid overflow)
  - Tokenizer and model both loaded with a HuggingFace access token for private model access.
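
The cells in this notebook pass `load_in_4bit=True` directly to `from_pretrained`, which the recorded output flags as deprecated in favour of a `BitsAndBytesConfig` object. A minimal sketch of that alternative is shown below. The NF4 quantization type and double quantization are settings commonly used for QLoRA and are assumptions here, not values taken from this notebook; `load_quantized` is a hypothetical helper name.

```python
# Keyword arguments for 4-bit loading. "nf4" and double quantization are
# assumed QLoRA-style defaults, not settings confirmed by this notebook.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def load_quantized(model_path, token=None):
    """Load the tokenizer and a 4-bit model via quantization_config."""
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        # float16 compute: the GPU logged in this notebook reports compute
        # capability 7.5, which predates native bfloat16 support.
        bnb_4bit_compute_dtype=torch.float16,
        **QUANT_KWARGS,
    )
    # The tokenizer itself is never quantized.
    tokenizer = AutoTokenizer.from_pretrained(model_path, token=token)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",
        token=token,
    )
    return tokenizer, model
```

Usage would look like `tokenizer, model = load_quantized("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", token=hugging_face_token)`.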


1.4 LoRA Adapter Setup: (same for all 3)

- prepare_model_for_kbit_training(model): freezes the base weights, enables gradient checkpointing, and makes the embedding outputs require gradients so that checkpointing works with a frozen base

- get_peft_model(...): attaches LoRA adapters so that only a small subset of parameters is fine-tuned

- print_trainable_parameters(): confirms that only a small fraction of parameters is trainable. Here:
```
trainable params: 44,060,672 || all params: 8,074,321,920 || trainable%: 0.5457
```
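
The trainable-parameter count above can be reproduced by hand, which is a useful sanity check on the LoRA configuration. The sketch below assumes rank 16 on every linear layer plus `lm_head`, as configured later in this notebook. The hidden size (4096) and vocabulary size (128256) appear in the model printout; the key/value projection width (1024), MLP width (14336) and layer count (32) are assumed from the Llama-3-style 8B architecture.

```python
R = 16  # LoRA rank used in this notebook
HIDDEN, KV_DIM, INTERMEDIATE, VOCAB, LAYERS = 4096, 1024, 14336, 128256, 32

# (in_features, out_features) of each targeted linear layer in one decoder block
PER_BLOCK = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """A is (r x in_features) and B is (out_features x r)."""
    return r * (in_features + out_features)


per_block = sum(lora_params(i, o) for i, o in PER_BLOCK.values())
total = LAYERS * per_block + lora_params(HIDDEN, VOCAB)  # decoder blocks + lm_head

print(f"{total:,}")  # prints 44,060,672
```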


1.5 Training with SFTTrainer (from trl): (same for all 3)

- Very lightweight: only 1000 samples
- Hyperparameters: small batch size, gradient accumulation, short steps, 8-bit optimizer, weight decay
- Output: a LoRA adapter file (.safetensors) containing only the fine-tuned weights, not the full model
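
To make "small batch size, gradient accumulation, short steps" concrete, the arithmetic below uses the values set in the training cell of this notebook (`per_device_train_batch_size=2`, `gradient_accumulation_steps=4`, `max_steps=60`) and the 1000-sample subset. It assumes a single GPU.

```python
per_device_batch = 2   # per_device_train_batch_size
grad_accum_steps = 4   # gradient_accumulation_steps
max_steps = 60         # optimizer steps
num_samples = 1000     # size of the fine-tuning subset

# Samples contributing to each weight update (single GPU assumed)
effective_batch = per_device_batch * grad_accum_steps
# Samples processed over the whole run
samples_seen = effective_batch * max_steps
# Fraction of the subset covered
epochs_covered = samples_seen / num_samples

print(effective_batch, samples_seen, epochs_covered)  # prints 8 480 0.48
```

In other words, the 60-step run covers a little under half of the 1000 samples once, which is consistent with describing it as a quick, lightweight run.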


**********************************************


Concepts Used

* LoRA (Low-Rank Adaptation) - Fine-tunes only a few extra trainable matrices inserted into attention layers. Keeps base model frozen.
* QLoRA - Uses 4-bit quantization for the base model to reduce memory. Combined with LoRA to enable fine-tuning.
* BitsAndBytes Quantization - Allows model weights to be loaded in 8-bit or 4-bit precision to save memory.
* PEFT (Parameter Efficient Fine-Tuning) - Framework to plug in LoRA or other adapter methods into existing HF Transformers easily.
* SFT (Supervised Fine-Tuning) - Standard fine-tuning where the model learns from input-output examples using full supervision.
* Gradient Accumulation - Helps simulate larger batch sizes without exceeding memory limits.
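
As a toy illustration of the LoRA entry above, the sketch below computes `y = W x + (alpha / r) * B (A x)` with plain Python lists. The matrices are made-up 2 x 3 examples, and the scale `alpha / r = 0.5` mirrors the ratio of `lora_alpha=8` to `r=16` used in this notebook. It also shows why a zero-initialised `B` (the usual LoRA starting point) leaves the base model's output unchanged.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen weight, out=2 x in=3
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # down-projection, r=2 x in=3
B_zero = [[0.0, 0.0], [0.0, 0.0]]        # up-projection at initialisation
B_trained = [[1.0, 1.0], [0.0, -1.0]]    # up-projection after some training
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, alpha=1, r=2)
y_tuned = lora_forward(W, A, B_trained, x, alpha=1, r=2)
print(y_init)   # [7.0, 2.0], identical to W x
print(y_tuned)  # [8.5, 1.0]
```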


**********************************************


How Optimized is This Setup?

1. Efficient memory usage: 4-bit quantized base model fits even on an edge AI device with 8-16 GB RAM.

2. LoRA adapter training avoids full weight updates → reduces memory, compute, and energy use.

3. Fast and Lightweight Training: Only the adapter weights are fine-tuned (~0.5% of the model's parameters in this setup). Uses an 8-bit optimizer (adamw_8bit) and a short training schedule for quick iteration.

# Imports

In [None]:
! pip install -U transformers peft trl accelerate

Collecting peft
  Downloading peft-0.15.2-py3-none-any.whl.metadata (13 kB)
Collecting trl
  Downloading trl-0.16.1-py3-none-any.whl.metadata (12 kB)
Collecting accelerate
  Downloading accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Collecting datasets>=3.0.0 (from trl)
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets>=3.0.0->trl)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets>=3.0.0->trl)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets>=3.0.0->trl)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.12.0,>=2023.1.0 (from fsspec[http]<=2024.12.0,>=2023.1.0->datasets>=3.0.0->trl)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cud

In [None]:
! pip install transformers datasets peft bitsandbytes accelerate

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl (76.1 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.45.5


In [None]:
!pip install compressed-tensors

Collecting compressed-tensors
  Downloading compressed_tensors-0.9.3-py3-none-any.whl.metadata (7.0 kB)
Downloading compressed_tensors-0.9.3-py3-none-any.whl (98 kB)
Installing collected packages: compressed-tensors
Successfully installed compressed-tensors-0.9.3


In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U transformers
!pip install -q -U accelerate

In [None]:
! python -m bitsandbytes

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(7, 5), cuda_version_string='124', cuda_version_tuple=(12, 4))
PyTorch settings found: CUDA_VERSION=124, Highest Compute Capability: (7, 5).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: /sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events
The directory listed in your path is found to be non-existent: //172.28.0.1
The directory listed in your path is found to be non-existent: 8013
The directory listed in your path is found to be non-existent: //colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/

In [None]:
hugging_face_token = ''

# 1. 8B Model for Diagnostic Agent

## Fine-tuning diagnostic agent

In [None]:
# Modules for fine-tuning
import os

import torch  # PyTorch
from datasets import DatasetDict, load_dataset  # Load fine-tuning datasets
from huggingface_hub import login  # Log in to the Hugging Face Hub API
from peft import LoraConfig, TaskType, get_peft_model  # LoRA adapter utilities
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from trl import SFTTrainer  # Trainer for supervised fine-tuning (SFT)

In [None]:
import pandas as pd
from datasets import load_dataset
from datasets import Dataset

#Load the dataset from the HuggingFace Hub
rd_ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", 'en')

#Convert to pandas dataframe for convenient processing
rd_df = pd.DataFrame(rd_ds['train'])

#Prefix each question with the task instruction to form the instruction string
rd_df['instruction'] = "Given the patient's health symptoms generate the diagnosis based on the patient health : " + rd_df['Question']

rd_df = rd_df[['instruction', 'Complex_CoT', 'Response']]

#Get a 1000 sample subset for fine-tuning purposes
rd_df_sample = rd_df.sample(n=1000, random_state=42)

#Define template and format data into the template for supervised fine-tuning
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

{}

### Response:\n"""

rd_df_sample['prompt'] = rd_df_sample["instruction"].apply(lambda x: template.format(x))
rd_df_sample.rename(columns={'Response': 'response'}, inplace=True)
rd_df_sample['response'] = rd_df_sample['response'] + "\n### End"
rd_df_sample = rd_df_sample[['prompt', 'response']]

rd_df_sample['text'] = rd_df_sample["prompt"] + rd_df_sample["response"]
rd_df_sample.drop(columns=['prompt', 'response'], inplace=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/74.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25371 [00:00<?, ? examples/s]

In [None]:
rd_df_sample

Unnamed: 0,text
6460,Below is an instruction that describes a task....
18018,Below is an instruction that describes a task....
16564,Below is an instruction that describes a task....
15409,Below is an instruction that describes a task....
6798,Below is an instruction that describes a task....
...,...
7567,Below is an instruction that describes a task....
24345,Below is an instruction that describes a task....
9188,Below is an instruction that describes a task....
3393,Below is an instruction that describes a task....


In [None]:
rd_ds

DatasetDict({
    train: Dataset({
        features: ['Question', 'Complex_CoT', 'Response'],
        num_rows: 25371
    })
})

In [None]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'

# The tokenizer is not quantized, so it takes no load_in_4bit argument
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)
# Load the model in 4-bit quantization to save memory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, device_map="auto", token=hugging_face_token)

#Pass in a prompt and infer with the model
prompt = "Q: Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? \nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=128
)

print(tokenizer.decode(generation_output[0]))

config.json:   0%|          | 0.00/826 [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/7.39G [00:00<?, ?B/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<｜begin▁of▁sentence｜>Q: Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? 
A: Multiple Sclerosis (MS)
B: Alzheimer's Disease
C: Posterior fossa tumor
D: Cerebral Infarction
Okay, so I have this medical case to think through. A 45-year-old man with a history of alcohol use, now abstinent for 10 years. He's presenting with sudden onset dysarthria, shuffling gait, and intention tremors. I need to figure out what his diagnosis is most likely. The options are MS, Alzheimer's, posterior fossa tumor, or cerebral infarction.

First, let me break down each symptom and what they could indicate.

D


In [None]:
from peft import LoraConfig

# Option A: target only the attention query/value projections
# target_modules = ["q_proj", "v_proj"]

# Option B (used here): target all linear layers, including lm_head
target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']

lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update matrices
    target_modules=target_modules,
    lora_alpha=8,  # Updates are scaled by lora_alpha / r = 0.5
    lora_dropout=0.05,
    bias="none",  # Leave bias terms untouched
    task_type="CAUSAL_LM",
)

In [None]:
import re
model_modules = str(model.modules)
pattern = r'\((\w+)\): Linear'
linear_layer_names = re.findall(pattern, model_modules)

# Collect the unique names of the Linear layers
target_modules = list(set(linear_layer_names))

In [None]:
target_modules

['lm_head',
 'o_proj',
 'gate_proj',
 'up_proj',
 'v_proj',
 'down_proj',
 'k_proj',
 'q_proj']

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Define training arguments
training_args=TrainingArguments(
    per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating weights
    num_train_epochs=1,  # Ignored here because max_steps is set and takes precedence
    warmup_steps=5,  # Gradually increases learning rate for the first 5 steps
    max_steps=60,  # Limits training to 60 optimizer steps (useful for debugging; increase for full fine-tuning)
    learning_rate=2e-4,  # Learning rate for weight updates (tuned for LoRA fine-tuning)
    logging_steps=10,  # Logs training progress every 10 steps
    optim="adamw_8bit",  # Uses memory-efficient AdamW optimizer in 8-bit mode
    weight_decay=0.01,  # Regularization to prevent overfitting
    lr_scheduler_type="linear",  # Uses a linear learning rate schedule
    seed=3407,  # Sets a fixed seed for reproducibility
    output_dir="outputs",  # Directory where fine-tuned model checkpoints will be saved
)


dataset = Dataset.from_pandas(rd_df_sample)
dataset_dict = DatasetDict({"train": dataset})


# Prepare for training by enabling input gradients and such
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Optional: shows which params will be trained

trainer = SFTTrainer(
  model=model,
  train_dataset=dataset_dict["train"],  # Dataset used for training
  args=training_args,
)

# Training is started in the next cell.

trainable params: 44,060,672 || all params: 8,074,321,920 || trainable%: 0.5457


Converting train dataset to ChatML:   0%|          | 0/1000 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/1000 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/1000 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/1000 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.

In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: nithya-3169 (nithya-3169-r-v-c-e) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


| Step | Training Loss |
|------|---------------|
| 10 | 2.1958 |
| 20 | 1.4954 |
| 30 | 1.2817 |
| 40 | 1.2187 |
| 50 | 1.2696 |
| 60 | 1.1686 |




In [None]:
#Pass in a prompt and infer with the model
prompt = "Q: Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? \nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=128
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  return fn(*args, **kwargs)


<｜begin▁of▁sentence｜>Q: Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? 
A: Wilson's disease
B: Huntington's disease
C: Spinocerebellar ataxia
D: Posterior fossa tumors
E: Multiple sclerosis
F: Alcoholic cerebellar degeneration
G: Cerebral amyloidosis
H: Progressive supranuclear palsy (PSP)
I: Alzheimer's disease
J: Idiopathic parkinsonsonism
K: Acute disseminated encephalomyelitis (ADEM)
L: Spinal cord compression
M: Cerebral infarction
N: Neurocysticercosis
O: Acute alcohol withdrawal


In [None]:
instruction = "Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? \n"
prompt = template.format(instruction)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=128
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  return fn(*args, **kwargs)


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? 


### Response:
The most likely diagnosis for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numb

In [None]:
model_name = "model_ft_diagnostic"

# Save the fine-tuned model
model.save_pretrained(model_name)

# Save the tokenizer
tokenizer.save_pretrained(model_name)


('model_ft_diagnostic/tokenizer_config.json',
 'model_ft_diagnostic/special_tokens_map.json',
 'model_ft_diagnostic/tokenizer.json')

In [None]:
from google.colab import files
import shutil

# Define the source directory of the model
model_dir = "outputs"

# Create a zip archive of the model directory
shutil.make_archive(model_dir, 'zip', model_dir)

# Download the zip file
files.download(f'{model_dir}.zip')


In [None]:
from google.colab import files
import shutil

# Define the source directory of the model
model_dir = "model_ft_diagnostic"

# Create a zip archive of the model directory
shutil.make_archive(model_dir, 'zip', model_dir)

# Download the zip file
files.download(f'{model_dir}.zip')


## Inference

In [None]:
# transformers                          4.51.3
# peft                                  0.15.2
# accelerate                            1.6.0
# bitsandbytes                          0.45.5
# datasets                              3.5.0

# triton                                3.2.0
# trl                                   0.16.1
# tensorflow                            2.18.0
# torch                                 2.6.0+cu124
# wandb                                 0.19.9
# safetensors                           0.5.3
# scikit-learn                          1.6.1
# nvidia-cuda-nvcc-cu12                 12.5.82

In [None]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'
# The tokenizer is not quantized, so it takes no load_in_4bit argument
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)
# Load the model in 4-bit quantization to save memory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, device_map="auto", token=hugging_face_token)


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.67G [00:00<?, ?B/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/7.39G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the adapter config to find the base model
peft_model_path = "model_ft_diagnostic"
config = PeftConfig.from_pretrained(peft_model_path)

# Load the model with the LoRA adapters
model = PeftModel.from_pretrained(model, peft_model_path)

# Set to eval mode
model.eval()


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lor

In [None]:
#Define template and format data into the template for supervised fine-tuning
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

{}

### Response:\n"""

instruction = "Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? \n"
prompt = template.format(instruction)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.7,
    top_p=0.95,
)

print(tokenizer.decode(generation_output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

Given the patient's health symptoms generate the diagnosis based on the patient health : A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis? 


### Response:
Based on the patient's clinical presentation and history, the most likely diagnosis is cerebellar dysfunction. This is particularly concerning given his history of alcohol use, as prolonged alcohol use can lead to chronic liver disease, which may cause issues in the brain such as Wernicke-Korsakoff syndrome. This syndrome involves damage to the cerebellum, which can manifest as dysarthria, shuffling gait, and intention tremors. Therefore, it is reasonable to suspect that the patient's cerebellar dysfunction is
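
Because `generate` returns the prompt tokens followed by the completion, the decoded text above repeats the whole template before the answer. A minimal sketch for keeping only the answer, assuming the `### Response:` marker from the template appears in the decoded text (`extract_response` is a hypothetical helper, not part of the notebook):

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return only the text after the last response marker.

    If the marker is missing, the decoded text is returned unchanged (stripped).
    """
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


# Illustrative decoded output (shortened, not real model output)
decoded = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n\nWhat is X?\n\n"
    "### Response:\nX is Y."
)
answer = extract_response(decoded)
print(answer)
```

In the notebook this would be applied to the result of `tokenizer.decode(generation_output[0], skip_special_tokens=True)`.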

# 2. 8B Model for Diet Tips [Together]

## Dataset Preparation

In [None]:
# Modules for fine-tuning
import os

import pandas as pd
import torch  # PyTorch
from datasets import Dataset, DatasetDict, load_dataset  # Dataset loading and containers
from huggingface_hub import login  # Lets you log in to the Hugging Face Hub API
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from trl import SFTTrainer  # Trainer for supervised fine-tuning (SFT)
# from unsloth import is_bfloat16_supported  # Checks if the hardware supports bfloat16 precision

In [None]:
import pandas as pd
from datasets import load_dataset
from datasets import Dataset

#Load the dataset from the HuggingFace Hub
diet_ds = load_dataset("issai/LLM_for_Dietary_Recommendation_System")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
diet_ds

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 11634
    })
})

In [None]:
# Concatenate all rows into one string, each row preceded by a space
whole_text = "".join(" " + row["text"] for row in diet_ds["train"])

In [None]:
whole_text

' Provide dietary recommendation for this patient profile.  Young Model Staying in Shape Name: Amina Gender: Female Age: 22 Nationality: Kazakhstani Location: Almaty, Kazakhstan Family Information: Marital Status: Single Family Members: Mother, Father, one younger brother Occupation: Amina: Professional model Mother: Homemaker Father: Business owner  Cultural Background: Amina and her family follow the religion of Islam and do not consume pork in their diet. They primarily consume Kazakh traditional food with influences from Central Asian and Russian dishes. They occasionally enjoy international cuisine in Almaty\'s diverse food scene.  Lifestyle Information: Amina has an active lifestyle, participating in fashion shows, photo shoots, and model castings. She engages in regular exercise, including yoga, running, and strength training to stay in shape and maintain her fitness.  Diet History: Breakfast: Smoothie with almond milk, banana, spinach, and protein powder or overnight oats with 

In [None]:
! pip install deep-translator

Collecting deep-translator
  Downloading deep_translator-1.11.4-py3-none-any.whl.metadata (30 kB)
Downloading deep_translator-1.11.4-py3-none-any.whl (42 kB)
Installing collected packages: deep-translator
Successfully installed deep-translator-1.11.4


In [None]:
import re
import textwrap

from deep_translator import GoogleTranslator
from tqdm import tqdm

# Input text: the concatenated dataset rows
text = whole_text

# Optional: clean up extra spaces
text = re.sub(r'\s+', ' ', text).strip()

# Break into chunks of at most 500 characters, splitting only at whitespace,
# so each request stays well under the translator's length limit
chunks = textwrap.wrap(text, width=500, break_long_words=False, replace_whitespace=False)

translated_chunks = []
for i, chunk in tqdm(enumerate(chunks)):
    try:
        translated = GoogleTranslator(source='auto', target='en').translate(chunk)
        translated_chunks.append(translated)
    except Exception as e:
        print(f"Translation error in chunk {i}: {e}")
        translated_chunks.append(chunk)  # fallback

# Rejoin translated chunks
translated_text = ' '.join(translated_chunks)

# Normalize case
translated_text_lower = translated_text.lower()

2741it [43:05,  1.06it/s]
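
The chunk-and-fallback loop above can be sketched as a reusable function. This is an illustrative refactor, not the notebook's code: `translate_in_chunks` and the stand-in `fake_translate` are hypothetical names, and the translator is passed in as a plain callable so the logic can be exercised without network access.

```python
import textwrap
from typing import Callable, List


def translate_in_chunks(text: str, translate: Callable[[str], str], width: int = 500) -> str:
    """Translate text chunk by chunk, keeping the original chunk on failure."""
    chunks = textwrap.wrap(
        text, width=width, break_long_words=False, replace_whitespace=False
    )
    translated: List[str] = []
    for i, chunk in enumerate(chunks):
        try:
            translated.append(translate(chunk))
        except Exception as exc:  # fall back to the untranslated chunk
            print(f"Translation error in chunk {i}: {exc}")
            translated.append(chunk)
    return " ".join(translated)


def fake_translate(chunk: str) -> str:
    """Stand-in translator: upper-cases text and fails on chunks containing 'fail'."""
    if "fail" in chunk:
        raise ValueError("simulated failure")
    return chunk.upper()


result = translate_in_chunks("alpha beta fail gamma", fake_translate, width=10)
print(result)
```

With the real translator, the callable would be something like `GoogleTranslator(source='auto', target='en').translate`, as in the cell above.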


In [None]:
translated_text_lower



In [None]:
with open("translated_text_lower.txt", "w", encoding="utf-8") as file:
    file.write(translated_text_lower)

In [None]:
import re
import pandas as pd

# Your large text blob
text = whole_text

# Define markers
prof_marker = "provide dietary recommendation for this patient profile"
# prof_marker_2 = "provide a meal for the ot patient \/ patient profile"
rec_marker = "dietary recommendations"
rec_marker_2 = "recommendations for"
plan_marker = "give a specific diet plan for the day based on the patient profile using central asian food"

# Normalize the text for easier pattern matching (case insensitive)
text_lower = text.lower()

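# Split at every occurrence of the profile marker.
# Note: the capturing group inside the lookahead makes re.split also return the
# marker text itself as a separate entry, so every profile chunk is preceded by
# a marker-only entry. These become the even-indexed rows with an empty
# Recommendation and DietPlan, which are dropped in a later cell.
# split_data = re.split(f"(?=({prof_marker}|{prof_marker_2}))", text_lower)
split_data = re.split(f"(?=({prof_marker}))", text_lower)

# Clean out any accidental empty entries
split_data = [entry.strip() for entry in split_data if entry.strip()]

# Process each chunk to get the recommendation and plan parts
data_rows = []

for chunk in split_data:
    if prof_marker in chunk:  # prof_marker_2 is disabled above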

        if rec_marker in chunk:
            prof_part, plan_part = chunk.split(rec_marker, 1)
            prof_part = prof_part.strip()
            plan_part = rec_marker + plan_part.strip()  # add the marker back
        elif rec_marker_2 in chunk:
            prof_part, plan_part = chunk.split(rec_marker_2, 1)
            prof_part = prof_part.strip()
            plan_part = rec_marker_2 + plan_part.strip()  # add the marker back
        else:
            prof_part = chunk.strip()
            plan_part = ""  # no diet plan provided yet

        # Try to split at the plan marker
        if plan_marker in plan_part:
            rec_part, plan_part = plan_part.split(plan_marker, 1)
            rec_part = rec_part.strip()
            plan_part = plan_marker + plan_part.strip()  # add the marker back
        else:
            rec_part = plan_part.strip()
            plan_part = ""  # no diet plan provided yet
        data_rows.append([prof_part, rec_part, plan_part])

# Convert to DataFrame
df = pd.DataFrame(data_rows, columns=["PatientProfile", "Recommendation", "DietPlan"])

# Show the result
print(df.head())


                                      PatientProfile  \
0  provide dietary recommendation for this patien...   
1  provide dietary recommendation for this patien...   
2  provide dietary recommendation for this patien...   
3  provide dietary recommendation for this patien...   
4  provide dietary recommendation for this patien...   

                                      Recommendation  \
0                                                      
1  dietary recommendations:  1. increase calorie ...   
2                                                      
3  dietary recommendations:  1. increase intake o...   
4                                                      

                                            DietPlan  
0                                                     
1  give a specific diet plan for the day based on...  
2                                                     
3  give a specific diet plan for the day based on...  
4                                                  
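
The alternating empty rows in this output follow from how `re.split` treats capturing groups: the captured marker is returned as its own list entry. A minimal sketch of a split that avoids the marker-only entries, assuming the same marker string (`split_profiles` and the sample text are hypothetical):

```python
import re

MARKER = "provide dietary recommendation for this patient profile"


def split_profiles(text: str, marker: str = MARKER) -> list:
    """Split lower-cased text into chunks that each start with the marker."""
    # Lookahead without a capturing group: re.split returns only the chunks
    pattern = f"(?={re.escape(marker)})"
    return [part.strip() for part in re.split(pattern, text.lower()) if part.strip()]


sample = (
    " Provide dietary recommendation for this patient profile. name: a "
    "Provide dietary recommendation for this patient profile. name: b"
)

chunks = split_profiles(sample)

# For comparison: the capturing-group variant also emits the marker alone
with_capture = [
    part for part in re.split(f"(?=({MARKER}))", sample.lower()) if part.strip()
]

print(len(chunks), len(with_capture))
```

With this variant each profile yields exactly one row, so the later step that drops even-indexed rows would no longer be needed.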

In [None]:
# Drop rows with null values in any column (empty strings are not nulls, so the marker-only rows remain)

df = df.dropna()

In [None]:
df.shape

(100, 3)

In [None]:
df

Unnamed: 0,PatientProfile,Recommendation,DietPlan
0,provide dietary recommendation for this patien...,,
1,provide dietary recommendation for this patien...,dietary recommendations: 1. increase calorie ...,give a specific diet plan for the day based on...
2,provide dietary recommendation for this patien...,,
3,provide dietary recommendation for this patien...,dietary recommendations: 1. increase intake o...,give a specific diet plan for the day based on...
4,provide dietary recommendation for this patien...,,
...,...,...,...
95,provide dietary recommendation for this patien...,dietary recommendations: 1. breakfast: opt fo...,give a specific diet plan for the day based on...
96,provide dietary recommendation for this patien...,,
97,provide dietary recommendation for this patien...,dietary recommendations: 1. maintain a well-b...,give a specific diet plan for the day based on...
98,provide dietary recommendation for this patien...,,


In [None]:
# Drop the even-indexed rows (0, 2, ..., 98); they hold only the profile marker
# and have an empty Recommendation and DietPlan
rows_to_delete = list(range(0, 99, 2))
df_new = df.drop(index=rows_to_delete)


In [None]:
df_new.shape

(50, 3)

In [None]:
df_new

Unnamed: 0,PatientProfile,Recommendation,DietPlan
1,provide dietary recommendation for this patien...,dietary recommendations: 1. increase calorie ...,give a specific diet plan for the day based on...
3,provide dietary recommendation for this patien...,dietary recommendations: 1. increase intake o...,give a specific diet plan for the day based on...
5,provide dietary recommendation for this patien...,dietary recommendationsfor aibek: 1. focus on...,give a specific diet plan for the day based on...
7,provide dietary recommendation for this patien...,dietary recommendationsfor sabina: 1. consume...,give a specific diet plan for the day based on...
9,provide dietary recommendation for this patien...,dietary recommendationsfor svetlana with iron ...,give a specific diet plan for the day based on...
11,provide dietary recommendation for this patien...,dietary recommendationsfor erkinbek: 1. focus...,give a specific diet plan for the day based on...
13,provide dietary recommendation for this patien...,dietary recommendationsfor li wei during her c...,give a specific diet plan for the day based on...
15,provide dietary recommendation for this patien...,dietary recommendationsfor nursultan: 1. ener...,give a specific diet plan for the day based on...
17,provide dietary recommendation for this patien...,dietary recommendationsfor mark: 1. emphasize...,give a specific diet plan for the day based on...
19,provide dietary recommendation for this patien...,dietary recommendationsfor aida with chronic h...,give a specific diet plan for the day based on...


In [None]:
df_new.to_csv("patient_profiles.csv", index=False)

## Fine-tuning

In [None]:
diet_df = pd.read_csv("patient_profiles_final.csv", names=["PatientProfile", "Recommendation", "DietPlan"], header=0)

In [None]:
diet_df.shape

(50, 3)

In [None]:
# Step 1: Create a combined instruction string
diet_df['instruction'] = (
    "You are a health and diet tips generator. You need to provide the following 2 responses. \n\n" +
    "1. Provide dietary recommendation for this patient profile: " + diet_df['PatientProfile'].astype(str) + "\n\n" +
    "2. Give a specific diet plan for the day based on the patient profile using Central Asian food."
)

# Step 2: Combine both responses into a single response string
diet_df['response'] = (
    "1. Recommendation: " + diet_df['Recommendation'].astype(str) + "\n\n" +
    "2. One-day Diet Plan: " + diet_df['DietPlan']
)

# template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

# ### Instruction:
# You are a health and diet tips generator.
# 1. Provide dietary recommendation for this patient profile: {} \n
# 2. Give a specific diet plan for the day based on the patient profile using Central Asian food.

# ### Response:
# 1. Dietary recommendation: {} \n\n
# 2. One-day diet plan: {}


# ### End
# """

# Step 3: Define the prompt template
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

# Step 4: Apply template formatting
# diet_df['formatted_prompt'] = diet_df.apply(
#     lambda row: template.format(row['PatientProfile'], row['Recommendation'], row['DietPlan']),
#     axis=1
# )
diet_df['formatted_prompt'] = diet_df.apply(
    lambda row: template.format(row['instruction'], row['response']),
    axis=1
)

In [None]:
diet_df.shape

(50, 4)

In [None]:
print(diet_df['formatted_prompt'].iloc[5])


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator. You need to provide the following 2 responses. 

1. Provide dietary recommendation for this patient profile:  recovery after long covid name: erkinbek age: 42 gender: male ethnicity: kyrgyz location: taraz, kazakhstan marital status: married, has 4 children occupation: welder medical history: long covid-19 recovery date of covid-19 diagnosis: 8 months ago symptoms: persistent fatigue, shortness of breath, brain fog, muscle weakness, loss of smell and taste current medication: none family history: has father with hypertension anthropometry and body composition: height: 175 cm weight: 83 kg bmi: 27.1 kg/m2 biochemical and hematological markers: blood pressure: 120/80 mmhg hemoglobin (hb): 13.5 g/dl white blood cells (wbc): 6,500 cells/mm³ platelet count: 200,000 cells/mm³ clinical: experienced severe respiratory symptoms dur
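
The training prompts wrap each profile in a fixed instruction, so inference prompts work best when they are built the same way. A minimal sketch that shares one builder between training and inference; `build_instruction` and `build_prompt` are hypothetical helpers, and the wording is copied from the cell above:

```python
TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""


def build_instruction(profile: str) -> str:
    """Wrap a patient profile in the same instruction text used for training."""
    return (
        "You are a health and diet tips generator. You need to provide the following 2 responses. \n\n"
        "1. Provide dietary recommendation for this patient profile: " + profile + "\n\n"
        "2. Give a specific diet plan for the day based on the patient profile using Central Asian food."
    )


def build_prompt(profile: str, response: str = "") -> str:
    """Training prompt when a response is given, inference prompt when it is empty."""
    return TEMPLATE.format(build_instruction(profile), response)


# Illustrative profile text (not taken from the dataset)
inference_prompt = build_prompt("name: example age: 40 diagnosis: none")
print(inference_prompt)
```

Leaving `response` empty ends the prompt at `### Response:`, which is where generation should continue.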

In [None]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Earlier candidates, kept for reference:
# model_path = 'LoftQ/Llama-2-7b-hf-4bit-64rank'
# model_path = 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF'
# Note: a purely quantized model cannot be fine-tuned directly; trainable
# adapters (LoRA) must be attached on top of the quantized base model.
model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'
# tokenizer = LlamaTokenizer.from_pretrained(model_path)
# model = LlamaForCausalLM.from_pretrained(
# model_path, load_in_8bit=True, device_map='auto',
# ).to(device)

# hugging_face_token is expected to be defined earlier in the notebook.
# Quantization applies to the model only, so the tokenizer takes no load_in_4bit argument.
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)
# Load the model in 4-bit quantization to save memory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, device_map="auto", token=hugging_face_token)




The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
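
The deprecation warning above asks for a `BitsAndBytesConfig` passed through `quantization_config`. A minimal sketch of that form, assuming `transformers` and `bitsandbytes` are installed; `load_4bit_model` and `QUANT_KWARGS` are hypothetical names, and only `load_in_4bit=True` is carried over from the notebook:

```python
from typing import Optional

# Keyword arguments for BitsAndBytesConfig; only load_in_4bit comes from the notebook.
# Other options (for example bnb_4bit_quant_type) would be additional assumptions.
QUANT_KWARGS = {"load_in_4bit": True}


def load_4bit_model(model_path: str, token: Optional[str] = None):
    """Load a causal LM in 4-bit using an explicit BitsAndBytesConfig."""
    # Imported lazily so this helper can be defined without the libraries installed
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**QUANT_KWARGS)
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",
        token=token,
    )
```

Usage would be `model = load_4bit_model(model_path, token=hugging_face_token)`, replacing the deprecated `load_in_4bit=True` argument in the call above.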



In [None]:
#Pass in a prompt and infer with the model
instruction = " folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
 folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors:

In [None]:
import re

# str(model) prints the module tree; layer entries look like "(q_proj): Linear(...)"
model_modules = str(model)
pattern = r'\((\w+)\): Linear'  # also matches quantized layers such as Linear4bit
linear_layer_names = re.findall(pattern, model_modules)

# Deduplicate the names of the Linear layers
target_modules = list(set(linear_layer_names))

In [None]:
target_modules

['lm_head',
 'q_proj',
 'k_proj',
 'v_proj',
 'o_proj',
 'down_proj',
 'gate_proj',
 'up_proj']
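
The same name extraction can be sketched as a function that returns a deterministic, sorted list and can optionally leave out the output head. Excluding `lm_head` is an assumption made here for illustration (the notebook keeps it), and `linear_layer_names_from_repr` and `sample_repr` are hypothetical:

```python
import re


def linear_layer_names_from_repr(model_repr: str, exclude=("lm_head",)) -> list:
    """Collect unique Linear/Linear4bit layer names from a printed module tree."""
    names = re.findall(r"\((\w+)\): Linear", model_repr)
    return sorted(set(names) - set(exclude))


# Shortened, illustrative module tree in the format PyTorch prints
sample_repr = """
(q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
(k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
(q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
(lm_head): Linear(in_features=4096, out_features=128256, bias=False)
"""

names = linear_layer_names_from_repr(sample_repr)
print(names)
```

Passing `exclude=()` reproduces the notebook's behaviour of keeping every linear layer, including `lm_head`.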

In [None]:
from peft import LoraConfig

# Option A: target only the attention query/value projections
# target_modules = ["q_proj", "v_proj"]

# Option B (used here): target all linear layers
target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']

lora_config = LoraConfig(
    r=16,
    target_modules=target_modules,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Define training arguments
training_args=TrainingArguments(
    per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating weights
    num_train_epochs=1,  # Overridden by max_steps below
    warmup_steps=5,  # Gradually increases learning rate for the first 5 steps
    max_steps=45,  # Limits training to 45 steps; takes precedence over num_train_epochs
    learning_rate=2e-4,  # Learning rate for weight updates (tuned for LoRA fine-tuning)
    logging_steps=5,  # Logs training progress every 5 steps
    optim="adamw_8bit",  # Uses memory-efficient AdamW optimizer in 8-bit mode
    weight_decay=0.01,  # Regularization to prevent overfitting
    lr_scheduler_type="linear",  # Uses a linear learning rate schedule
    seed=3407,  # Sets a fixed seed for reproducibility
    output_dir="outputs",  # Directory where fine-tuned model checkpoints will be saved
)

# trainer = SFTTrainer(
#   model=model,
#   tokenizer=tokenizer,
#   train_dataset=rd_df_sample,  # Dataset used for training
#   dataset_text_field="text",  # Specifies which field in the dataset contains training text
#   max_seq_length=256,
#   dataset_num_proc=2,  # Uses 2 CPU threads to speed up data preprocessing
#   args=training_args,
# )

formatted_diet_df = diet_df[["formatted_prompt"]].copy()
formatted_diet_df = formatted_diet_df.rename(columns={"formatted_prompt": "text"})
dataset = Dataset.from_pandas(formatted_diet_df)
dataset_dict = DatasetDict({"train": dataset})


# Prepare for training by enabling input gradients and such
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Optional: shows which params will be trained

trainer = SFTTrainer(
  model=model,
  train_dataset=dataset_dict["train"],  # Dataset used for training
  args=training_args,
)

# Initiate the training process
# with mlflow.start_run(run_name= 'run_name_of_choice'):
#   trainer.train()

trainable params: 44,060,672 || all params: 8,074,321,920 || trainable%: 0.5457
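
The trainable-parameter count can be reproduced by hand: each adapted linear layer adds `r * (in_features + out_features)` weights. The sketch below assumes the Llama-3.1-8B dimensions that this distilled model is built on (they are not printed in the notebook), together with `r=16` and the eight target modules from the LoRA config.

```python
# Assumed architecture dimensions (Llama-3.1-8B); not shown in the notebook output
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
VOCAB = 128256
NUM_LAYERS = 32

# Values taken from the LoRA config above
R = 16
LORA_ALPHA = 8


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN)            # q_proj
    + lora_params(HIDDEN, KV_DIM)          # k_proj
    + lora_params(HIDDEN, KV_DIM)          # v_proj
    + lora_params(HIDDEN, HIDDEN)          # o_proj
    + lora_params(HIDDEN, INTERMEDIATE)    # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE)    # up_proj
    + lora_params(INTERMEDIATE, HIDDEN)    # down_proj
)
total = NUM_LAYERS * per_layer + lora_params(HIDDEN, VOCAB)  # plus lm_head

# LoRA scales the adapter update by lora_alpha / r
scaling = LORA_ALPHA / R

print(f"{total:,}", scaling)
```

The total matches the 44,060,672 reported by `print_trainable_parameters()`, and the scaling factor of 0.5 means the adapter update is applied at half strength.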



No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


Step,Training Loss
5,2.2139
10,1.9228
15,1.529
20,1.3803
25,1.2317
30,1.1479
35,1.1015
40,1.0833
45,0.9939
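
The 45 logged steps relate to the 50 training examples as follows. This is a rough calculation assuming a single GPU; the exact number of epochs depends on how the trainer handles the final partial batch.

```python
# Values from TrainingArguments and the dataset above
per_device_batch = 2
grad_accum = 4
max_steps = 45
num_examples = 50
num_devices = 1  # assumption: a single GPU

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_devices

# Total examples processed over the run, and the approximate passes over the data
examples_seen = effective_batch * max_steps
passes = examples_seen / num_examples

print(effective_batch, examples_seen, passes)
```

Each of the 50 examples is therefore seen about seven times, which is worth keeping in mind when reading the steadily falling training loss.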




In [None]:
#Pass in a prompt and infer with the model
instruction = " folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
 folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors:

In [None]:
#Pass in a prompt and infer with the model
instruction = " for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy "

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
 for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy 

### Response:

1. **Provide a clear diagnosis**: Based on the information provided, the most likely diagnosis is alcoholic neuropathy. This condition is commonly associated with chronic alcohol use and can lead to symptoms such as dysart

In [None]:
model_name = "model_ft_diet"

# Save the LoRA adapter weights and config (the base model weights are not saved)
model.save_pretrained(model_name)

# Save the tokenizer
tokenizer.save_pretrained(model_name)




('model_ft_diet/tokenizer_config.json',
 'model_ft_diet/special_tokens_map.json',
 'model_ft_diet/tokenizer.json')

In [None]:
from google.colab import files
import shutil

# Define the source directory of the model
model_dir = "model_ft_diet"

# Create a zip archive of the model directory
shutil.make_archive(model_dir, 'zip', model_dir)

# Download the zip file
files.download(f'{model_dir}.zip')


## Inference

In [None]:
! unzip /content/model_ft_diet.zip

Archive:  /content/model_ft_diet.zip
  inflating: tokenizer_config.json   
  inflating: special_tokens_map.json  
  inflating: adapter_config.json     
  inflating: tokenizer.model         
  inflating: README.md               
  inflating: adapter_model.safetensors  
  inflating: tokenizer.json          
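
The extracted `adapter_config.json` records which base model the adapter was trained on, in its `base_model_name_or_path` field. A minimal sketch for reading it before loading, so the adapter is attached to the matching base model; `adapter_base_model`, `check_base` and the `org/base-model` name are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def adapter_base_model(adapter_dir: str) -> str:
    """Read the base model name stored in a PEFT adapter directory."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config["base_model_name_or_path"]


def check_base(adapter_dir: str, loaded_base: str) -> None:
    """Raise if the adapter was trained on a different base model."""
    expected = adapter_base_model(adapter_dir)
    if expected != loaded_base:
        raise ValueError(f"Adapter expects base model {expected!r}, got {loaded_base!r}")


# Demonstration with a temporary, made-up adapter directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text(
        json.dumps({"base_model_name_or_path": "org/base-model"}), encoding="utf-8"
    )
    demo_base = adapter_base_model(tmp)
    check_base(tmp, "org/base-model")  # passes silently

print(demo_base)
```

In the notebook, `check_base("model_ft_diet", model_path)` could run before `PeftModel.from_pretrained` to catch a base model that differs from the one used for fine-tuning.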


In [None]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = 'LoftQ/Llama-2-7b-hf-4bit-64rank'
# Quantization applies to the model only, so the tokenizer takes no load_in_4bit argument
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)
# Load the model in 4-bit quantization to save memory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, device_map="auto", token=hugging_face_token)


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the adapter config to find the base model
peft_model_path = "model_ft_diet"
config = PeftConfig.from_pretrained(peft_model_path)

# # Load the base model
# base_model = AutoModelForCausalLM.from_pretrained(
#     config.base_model_name_or_path,
#     token=hugging_face_token,
#     load_in_4bit=True,
#     device_map="auto",  # or "cuda:0" if using GPU manually
# )

# # Load the tokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     config.base_model_name_or_path,
#     token=hugging_face_token,
#     load_in_4bit=True
# )

# Load the model with the LoRA adapters
model = PeftModel.from_pretrained(model, peft_model_path)

# Set to eval mode
model.eval()


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora

In [None]:
# Define the prompt template; the first slot takes the patient profile and the two response slots stay empty at inference
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator.
1. Provide dietary recommendation for this patient profile: {} \n
2. Give a specific diet plan for the day based on the patient profile using Central Asian food.

### Response:
1. Dietary recommendation: {} \n\n
2. One-day diet plan: {}


### End
"""

#Pass in a prompt and infer with the model
instruction = " folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "

prompt = template.format(instruction, "", "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0], skip_special_tokens=True))

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator.
1. Provide dietary recommendation for this patient profile:  folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken,
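
The decoded output above begins by repeating the prompt, because `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of slicing off the prompt prefix, shown with plain lists of token ids so that it runs without a model. `strip_prompt_tokens` is a hypothetical helper name, and the token ids are toy values.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    For decoder-only models, generate() returns the prompt tokens followed by
    the new tokens, so the continuation starts at index prompt_length.
    """
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the length of the generated sequence")
    return output_ids[prompt_length:]


# Toy example: three prompt tokens followed by two generated tokens
prompt_ids = [1, 15043, 3186]
output_ids = prompt_ids + [29991, 2]
new_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_ids)  # [29991, 2]
```

In the notebook the same idea would apply to the generated tensor: slice the first sequence at the prompt's token count before passing it to `tokenizer.decode`.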

# 3. Separation of models

## Dietary Recommendation Fine-tuning

### Fine-tuning

In [None]:
# Modules for fine-tuning
import os

import pandas as pd
import torch
from datasets import Dataset, DatasetDict, load_dataset  # Dataset containers and loaders
from huggingface_hub import login  # Lets you log in to the Hugging Face Hub
from peft import LoraConfig, TaskType, get_peft_model  # LoRA adapter utilities
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from trl import SFTTrainer  # Trainer for supervised fine-tuning (SFT)

In [None]:
diet_df = pd.read_csv("patient_profiles_final.csv", names=["PatientProfile", "Recommendation", "DietPlan"], header=0)

In [None]:
diet_df.shape

(50, 3)

In [None]:
# Step 1: Create a combined instruction string
diet_df['instruction'] = (
    "You are a health and diet tips generator. \n\n" +
    "Provide dietary recommendation for this patient profile: " + diet_df['PatientProfile'].astype(str)
)

# Step 2: Build the response string from the recommendation only
diet_df['response'] = (
    "Recommendation: " + diet_df['Recommendation'].astype(str) + "\n\n"
)

# Step 3: Define the prompt template
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

diet_df['formatted_prompt'] = diet_df.apply(
    lambda row: template.format(row['instruction'], row['response']),
    axis=1
)

In [None]:
diet_df.shape

(50, 6)

In [None]:
print(diet_df['formatted_prompt'].iloc[5])


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator. 

Provide dietary recommendation for this patient profile:  recovery after long covid name: erkinbek age: 42 gender: male ethnicity: kyrgyz location: taraz, kazakhstan marital status: married, has 4 children occupation: welder medical history: long covid-19 recovery date of covid-19 diagnosis: 8 months ago symptoms: persistent fatigue, shortness of breath, brain fog, muscle weakness, loss of smell and taste current medication: none family history: has father with hypertension anthropometry and body composition: height: 175 cm weight: 83 kg bmi: 27.1 kg/m2 biochemical and hematological markers: blood pressure: 120/80 mmhg hemoglobin (hb): 13.5 g/dl white blood cells (wbc): 6,500 cells/mm³ platelet count: 200,000 cells/mm³ clinical: experienced severe respiratory symptoms during covid-19 currently in the recovery phase, grad

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'

# Quantization applies to the model weights only; the tokenizer takes no such argument
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)

# Load the base model in 4-bit to save memory (replaces the deprecated load_in_4bit flag)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    token=hugging_face_token,
)

In [None]:
#Pass in a prompt and infer with the model
instruction = " folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "

prompt = template.format(instruction, "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0]))


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
 folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors:

In [None]:
import torch.nn as nn

# Collect the distinct leaf names of all linear layers as LoRA target candidates
# (bitsandbytes' Linear4bit subclasses nn.Linear, so quantized layers are included)
target_modules = list({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
})

In [None]:
target_modules

['o_proj',
 'gate_proj',
 'up_proj',
 'down_proj',
 'v_proj',
 'lm_head',
 'q_proj',
 'k_proj']

In [None]:
from peft import LoraConfig

# Option A: target only the attention query/value projections
# target_modules = ["q_proj", "v_proj"]

# Option B (used here): target all linear layers
target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']

lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update matrices
    target_modules=target_modules,
    lora_alpha=8,  # Scaling numerator; the effective scale is lora_alpha / r = 0.5
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
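
For reference, the configuration above corresponds to the standard LoRA update. It is written here under the assumption that PEFT's default scaling (`lora_alpha / r`, without rank-stabilized scaling) is in effect; the symbols $W_0$, $A$, $B$, $d_{\text{in}}$ and $d_{\text{out}}$ are notation introduced for this note.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},
\quad B \in \mathbb{R}^{d_{\text{out}} \times r}
```

With `r=16` and `lora_alpha=8` the scale factor is 0.5, and each adapted layer adds $r\,(d_{\text{in}} + d_{\text{out}})$ trainable parameters while the frozen weight $W_0$ stays in 4-bit.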

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Define training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating weights (effective batch size 8)
    num_train_epochs=1,  # Overridden by max_steps below
    warmup_steps=5,  # Gradually increases learning rate for the first 5 steps
    max_steps=45,  # Limits training to 45 optimizer steps; takes precedence over num_train_epochs
    learning_rate=2e-4,  # Learning rate for weight updates (a common choice for LoRA fine-tuning)
    logging_steps=5,  # Logs training progress every 5 steps
    optim="adamw_8bit",  # Uses memory-efficient AdamW optimizer in 8-bit mode
    weight_decay=0.01,  # Regularization to prevent overfitting
    lr_scheduler_type="linear",  # Uses a linear learning rate schedule
    seed=3407,  # Sets a fixed seed for reproducibility
    output_dir="outputs",  # Directory where fine-tuned model checkpoints will be saved
)

formatted_diet_df = diet_df[["formatted_prompt"]].copy()
formatted_diet_df = formatted_diet_df.rename(columns={"formatted_prompt": "text"})
dataset = Dataset.from_pandas(formatted_diet_df)
dataset_dict = DatasetDict({"train": dataset})


# Prepare for training by enabling input gradients and such
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Optional: shows which params will be trained

trainer = SFTTrainer(
  model=model,
  train_dataset=dataset_dict["train"],  # Dataset used for training
  args=training_args,
)


trainable params: 44,060,672 || all params: 8,074,321,920 || trainable%: 0.5457
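
The trainable-parameter figure reported above can be reproduced by hand. A sketch, assuming the layer shapes of a Llama-3.1-8B-style architecture (hidden size 4096, MLP size 14336, 1024-dimensional key/value projections, vocabulary size 128256, 32 decoder layers). These dimensions are assumptions taken from the public model architecture, not values printed in this notebook.

```python
RANK = 16          # r from the LoRA config
NUM_LAYERS = 32    # decoder layers (assumed)

# (in_features, out_features) of each adapted linear layer in one decoder layer (assumed shapes)
PER_LAYER_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
LM_HEAD_SHAPE = (4096, 128256)  # adapted once, outside the decoder stack (assumed shape)


def lora_params(in_features, out_features, rank=RANK):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PER_LAYER_SHAPES.values())
trainable = NUM_LAYERS * per_layer + lora_params(*LM_HEAD_SHAPE)
print(f"{trainable:,}")  # 44,060,672
```

Under these assumed shapes, the `lm_head` adapter alone accounts for about 2.1 million of the trainable parameters; dropping it from `target_modules` would reduce the count accordingly.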


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


| Step | Training Loss |
|------|---------------|
| 5    | 2.2522        |
| 10   | 1.9130        |
| 15   | 1.5213        |
| 20   | 1.3832        |
| 25   | 1.2615        |
| 30   | 1.1731        |
| 35   | 1.1241        |
| 40   | 1.1020        |
| 45   | 1.0046        |
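
Because `max_steps=45` is set, the epoch setting does not bound this run; the number of passes over the data follows from the batch settings. A sketch of that arithmetic using the values from the training arguments and the 50-row dataset. The single-GPU setting is an assumption, and the result is approximate, since exact epoch accounting depends on how the Trainer version handles a partial accumulation window at the end of each epoch.

```python
num_examples = 50                # rows in the training set
per_device_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 45
num_devices = 1                  # assumption: a single GPU

# Examples consumed per optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
# Total examples processed over the run and the approximate number of passes over the data
examples_seen = effective_batch * max_steps
approx_epochs = examples_seen / num_examples

print(effective_batch, examples_seen, approx_epochs)  # 8 360 7.2
```

With roughly seven passes over 50 examples and no held-out split, the falling training loss by itself does not show how well the adapter generalizes.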




In [None]:
#Pass in a prompt and infer with the model
instruction = " for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy "

prompt = template.format(instruction, "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0]))


In [None]:
model_name = "model_ft_diet"

# Save the fine-tuned model
model.save_pretrained(model_name)

# Save the tokenizer
tokenizer.save_pretrained(model_name)




('model_ft_diet/tokenizer_config.json',
 'model_ft_diet/special_tokens_map.json',
 'model_ft_diet/tokenizer.json')
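
Calling `save_pretrained` on a PEFT model writes only the adapter weights and their config, not the base model, which is what keeps the saved directory small. A rough size estimate using the parameter counts from the trainable-parameter report, assuming the adapter is stored as 32-bit floats and the original base checkpoint as 16-bit values. Both dtypes are assumptions; the notebook does not state them.

```python
def weight_size_gb(num_params, bytes_per_param):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


adapter_params = 44_060_672                    # trainable params from print_trainable_parameters()
base_params = 8_074_321_920 - adapter_params   # all params minus the adapter

adapter_gb = weight_size_gb(adapter_params, 4)  # float32 adapter (assumed)
base_gb = weight_size_gb(base_params, 2)        # 16-bit base checkpoint (assumed)
print(f"adapter ~{adapter_gb:.2f} GB, base ~{base_gb:.2f} GB")  # adapter ~0.18 GB, base ~16.06 GB
```

The estimate ignores file metadata and the tokenizer files, so actual sizes on disk will differ slightly.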

In [None]:
from google.colab import files
import shutil

# Define the source directory of the model
model_dir = "model_ft_diet"

# Create a zip archive of the model directory
shutil.make_archive(model_dir, 'zip', model_dir)

# Download the zip file
files.download(f'{model_dir}.zip')

### Inference

In [None]:
#Pass in a prompt and infer with the model
instruction = (
     "You are a health and diet tips generator. \n\n" +
    "Provide dietary recommendation for this patient profile: " +
" for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy "
)

prompt = template.format(instruction, "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0]))


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator. 

Provide dietary recommendation for this patient profile:  for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy 

### Response:

Recommendation: dietary recommendations for this patient with alcoholic neuropathy:  1. maintain a balanced diet with suf
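
Since the decoded text contains the whole prompt, downstream code usually needs only the part after the `### Response:` marker. A minimal sketch, assuming the template used in this section; `extract_response` is a hypothetical helper and the sample string is a toy input, not model output.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text):
    """Return the text after the last response marker, or an empty string if the marker is absent."""
    _, marker, tail = decoded_text.rpartition(RESPONSE_MARKER)
    return tail.strip() if marker else ""


sample = (
    "### Instruction:\nProvide dietary recommendation for this patient profile: example\n\n"
    "### Response:\n\nRecommendation: example text\n"
)
print(extract_response(sample))  # Recommendation: example text
```

Splitting on the last marker keeps the helper safe if a patient profile itself happens to contain the marker text.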

In [None]:
#Pass in a prompt and infer with the model
instruction = (
     "You are a health and diet tips generator. \n\n" +
    "Provide dietary recommendation for this patient profile: " +
" folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "
)

prompt = template.format(instruction, "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0]))


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a health and diet tips generator. 

Provide dietary recommendation for this patient profile:  folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea

## One-day Diet Plan Fine-tuning

### Fine-tuning

In [None]:
import os

import pandas as pd
import torch
from datasets import Dataset, DatasetDict, load_dataset  # Dataset containers and loaders
from huggingface_hub import login  # Lets you log in to the Hugging Face Hub
from peft import LoraConfig, TaskType, get_peft_model  # LoRA adapter utilities
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from trl import SFTTrainer  # Trainer for supervised fine-tuning (SFT)

In [None]:
diet_df = pd.read_csv("patient_profiles_final.csv", names=["PatientProfile", "Recommendation", "DietPlan"], header=0)

In [None]:
diet_df.shape

(50, 3)

In [None]:
# Step 1: Create a combined instruction string
diet_df['instruction'] = (
    "You are a diet tips generator. \n\n" +
    "Given this patient profile: " + diet_df['PatientProfile'].astype(str) + "\n" +
    "Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food."
)

# Step 2: Build the response string from the one-day diet plan only
diet_df['response'] = (
    "One-day diet plan: " + diet_df['DietPlan'].astype(str) + "\n\n"
)


# Step 3: Define the prompt template
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

diet_df['formatted_prompt'] = diet_df.apply(
    lambda row: template.format(row['instruction'], row['response']),
    axis=1
)

In [None]:
diet_df.shape

(50, 6)

In [None]:
print(diet_df['formatted_prompt'].iloc[5])


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a diet tips generator. 

Given this patient profile:  recovery after long covid name: erkinbek age: 42 gender: male ethnicity: kyrgyz location: taraz, kazakhstan marital status: married, has 4 children occupation: welder medical history: long covid-19 recovery date of covid-19 diagnosis: 8 months ago symptoms: persistent fatigue, shortness of breath, brain fog, muscle weakness, loss of smell and taste current medication: none family history: has father with hypertension anthropometry and body composition: height: 175 cm weight: 83 kg bmi: 27.1 kg/m2 biochemical and hematological markers: blood pressure: 120/80 mmhg hemoglobin (hb): 13.5 g/dl white blood cells (wbc): 6,500 cells/mm³ platelet count: 200,000 cells/mm³ clinical: experienced severe respiratory symptoms during covid-19 currently in the recovery phase, gradually improving diet during covid-19: br
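
At inference time the prompt should follow the same layout as the training text printed above, with the response slot left empty. A minimal sketch that rebuilds the instruction and template from the cells above for a single profile; `build_diet_plan_prompt` is a hypothetical helper name and the profile string is a placeholder.

```python
TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""


def build_diet_plan_prompt(patient_profile):
    """Format one patient profile in the same layout as the training text, leaving the response empty."""
    instruction = (
        "You are a diet tips generator. \n\n"
        + "Given this patient profile: " + patient_profile + "\n"
        + "Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food."
    )
    return TEMPLATE.format(instruction, "")


prompt = build_diet_plan_prompt("name: example age: 40 diagnosis: example")
print(prompt)
```

Building the prompt through one helper keeps the inference wording aligned with the wording the adapter saw during fine-tuning.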

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'

# Quantization applies to the model weights only; the tokenizer takes no such argument
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)

# Load the base model in 4-bit to save memory (replaces the deprecated load_in_4bit flag)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    token=hugging_face_token,
)

In [None]:
#Pass in a prompt and infer with the model
instruction = " folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. "

prompt = template.format(instruction, "")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generation_output = model.generate(
    **inputs,  # input_ids plus attention_mask
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding during generation
)

print(tokenizer.decode(generation_output[0]))


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
 folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors:

In [None]:
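import re

# str(model.modules) embeds the model's printed structure, so layer names can be matched by regex
model_modules = str(model.modules)
# Capture the name of every submodule whose class name starts with "Linear" (includes Linear4bit)
pattern = r'\((\w+)\): Linear'
linear_layer_names = re.findall(pattern, model_modules)

# Deduplicate the Linear layer names to get the candidate LoRA target modules
target_modules = list(set(linear_layer_names))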

In [None]:
target_modules

['lm_head',
 'v_proj',
 'gate_proj',
 'down_proj',
 'k_proj',
 'up_proj',
 'q_proj',
 'o_proj']
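To illustrate how the regex above picks out layer names, here is a small sketch run on abbreviated model printouts. Both excerpts are hypothetical shortened strings in the style of the printed model structure, not the full output. The second excerpt suggests why the search is best run before LoRA is attached: once PEFT wraps a layer, the pattern matches the wrapper's internal names instead.

```python
import re

# Same pattern as in the cell above
PATTERN = r'\((\w+)\): Linear'

# Hypothetical, abbreviated printout of a quantized base model
base_repr = """
(self_attn): LlamaAttention(
  (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
  (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
)
(mlp): LlamaMLP(
  (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
  (act_fn): SiLU()
)
(lm_head): Linear(in_features=4096, out_features=128256, bias=False)
"""

# Abbreviated printout of one layer after PEFT has wrapped it
lora_repr = """
(q_proj): lora.Linear4bit(
  (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
  (lora_A): ModuleDict(
    (default): Linear(in_features=4096, out_features=16, bias=False)
  )
)
"""

def linear_names(model_repr):
    """Return the sorted, de-duplicated names matched by PATTERN."""
    return sorted(set(re.findall(PATTERN, model_repr)))

print(linear_names(base_repr))  # ['gate_proj', 'k_proj', 'lm_head', 'q_proj']
print(linear_names(lora_repr))  # ['base_layer', 'default']
```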

In [None]:
from peft import LoraConfig

# Option 1: target only the attention query/value projections
# target_modules = ["q_proj", "v_proj"]

# Option 2 (used here): target all linear layers, including lm_head
target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']

lora_config = LoraConfig(
    r=16,  # LoRA rank
    target_modules=target_modules,
    lora_alpha=8,  # LoRA scaling numerator (alpha / r = 0.5)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

In [None]:
from datasets import Dataset, DatasetDict
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments
from trl import SFTTrainer

# Define training arguments
training_args=TrainingArguments(
    per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating weights
    num_train_epochs=1,  # Ignored while max_steps is set (max_steps takes precedence)
    warmup_steps=5,  # Gradually increases learning rate for the first 5 steps
    max_steps=45,  # Stops training after 45 optimizer steps (useful for debugging; increase for full fine-tuning)
    learning_rate=2e-4,  # Learning rate for weight updates (tuned for LoRA fine-tuning)
    logging_steps=5,  # Logs training progress every 5 steps
    optim="adamw_8bit",  # Uses memory-efficient AdamW optimizer in 8-bit mode
    weight_decay=0.01,  # Regularization to prevent overfitting
    lr_scheduler_type="linear",  # Uses a linear learning rate schedule
    seed=3407,  # Sets a fixed seed for reproducibility
    output_dir="outputs",  # Directory where fine-tuned model checkpoints will be saved
)
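As a rough sanity check on these settings, the arithmetic below estimates how much data the run covers. It is a sketch under two assumptions that the arguments themselves do not state: training runs on a single GPU, and the training set holds 50 examples (the size reported in this notebook's dataset preprocessing output).

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 45
num_devices = 1    # assumption: one GPU
num_examples = 50  # assumption: size of the training set

# Examples contributing to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Total examples processed over the whole run
examples_seen = effective_batch * max_steps

# Approximate number of passes over the dataset
approx_passes = examples_seen / num_examples

print(effective_batch, examples_seen, approx_passes)  # 8 360 7.2
```

Under these assumptions the 45 steps amount to roughly seven passes over the data, even though `num_train_epochs` is set to 1.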


formatted_diet_df = diet_df[["formatted_prompt"]].copy()
formatted_diet_df = formatted_diet_df.rename(columns={"formatted_prompt": "text"})
dataset = Dataset.from_pandas(formatted_diet_df)
dataset_dict = DatasetDict({"train": dataset})


# Prepare for training by enabling input gradients and such
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Optional: prints trainable vs. total parameter counts

trainer = SFTTrainer(
  model=model,
  train_dataset=dataset_dict["train"],  # Dataset used for training
  args=training_args,
)


trainable params: 44,060,672 || all params: 8,074,321,920 || trainable%: 0.5457
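The trainable parameter count above can be reproduced by hand. Each LoRA-wrapped linear layer adds two matrices of shapes `in_features x r` and `r x out_features`. The sketch below assumes the standard Llama 8B dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, vocabulary 128256, 32 decoder layers) and `r=16` on all eight target modules.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA pair: A (in x r) plus B (r x out)."""
    return r * (in_features + out_features)

r = 16
# Assumed Llama 8B dimensions
hidden, kv_dim, intermediate, vocab, num_layers = 4096, 1024, 14336, 128256, 32

per_layer = (
    2 * lora_params(hidden, hidden, r)          # q_proj, o_proj
    + 2 * lora_params(hidden, kv_dim, r)        # k_proj, v_proj
    + 2 * lora_params(hidden, intermediate, r)  # gate_proj, up_proj
    + lora_params(intermediate, hidden, r)      # down_proj
)
lm_head = lora_params(hidden, vocab, r)

trainable = per_layer * num_layers + lm_head
total = 8_074_321_920  # "all params" as printed above

print(trainable)                          # 44060672
print(round(100 * trainable / total, 4))  # 0.5457
```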




Preprocessing train dataset (50 examples): converting to ChatML, applying chat template, tokenizing, truncating.

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mmnithyashree-cs22[0m ([33mchandanamn-cs23-na[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step    Training Loss
5       2.5317
10      2.1230
15      1.7189
20      1.5211
25      1.3077
30      1.2579
35      1.1827
40      1.1529
45      1.0638




In [None]:
#Pass in a prompt and infer with the model
instruction = (
    "You are a diet tips generator. \n\n" +
    "Given this patient profile: "  +
    " for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy " + "\n"
    "Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food."
)

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [None]:
model_name = "model_ft_diet"

# Save the fine-tuned model
model.save_pretrained(model_name)

# Save the tokenizer
tokenizer.save_pretrained(model_name)




('model_ft_diet/tokenizer_config.json',
 'model_ft_diet/special_tokens_map.json',
 'model_ft_diet/tokenizer.json')

In [None]:
from google.colab import files
import shutil

# Define the source directory of the model
model_dir = "model_ft_diet"

# Create a zip archive of the model directory
shutil.make_archive(model_dir, 'zip', model_dir)

# Download the zip file
files.download(f'{model_dir}.zip')


### Inference

In [None]:
! unzip /content/model_ft_diet_separate_dietplan.zip

Archive:  /content/model_ft_diet_separate_dietplan.zip
  inflating: tokenizer_config.json   
  inflating: special_tokens_map.json  
  inflating: adapter_config.json     
  inflating: README.md               
  inflating: adapter_model.safetensors  
  inflating: tokenizer.json          
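The archive listing above shows that the saved directory holds only the adapter (`adapter_config.json`, `adapter_model.safetensors`) and tokenizer files, not the base model weights. Before loading, it can help to confirm that the directory handed to PEFT actually contains the adapter files. The helper below is a hypothetical sketch, not part of the notebook.

```python
from pathlib import Path

# Adapter files that appear in the archive listing above
EXPECTED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")

def missing_adapter_files(adapter_dir):
    """Return the expected adapter files that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in EXPECTED_ADAPTER_FILES if not (root / name).is_file()]
```

An empty list from `missing_adapter_files(path)` means both files are present at that path. Because the listing shows the files extracted without a containing folder, this is a quick way to catch a mismatch between the unzip location and the path used for loading.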


In [None]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'
# The tokenizer is not quantized, so it does not take load_in_4bit
tokenizer = AutoTokenizer.from_pretrained(model_path, token=hugging_face_token)
# Load the model in 4-bit quantization to save memory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, device_map="auto", token=hugging_face_token)



The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
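The deprecation notice above recommends passing a `BitsAndBytesConfig` through `quantization_config`. A minimal sketch of that form follows. The helper name `load_base_model_4bit` is hypothetical, and only `load_in_4bit=True` is set so that the remaining quantization settings stay at the library defaults, as in the original call.

```python
def load_base_model_4bit(model_path, token=None):
    """Load a causal LM in 4-bit via quantization_config (hypothetical helper)."""
    # Imported inside the function so the helper can be defined
    # even where transformers is not installed
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Equivalent of the deprecated load_in_4bit=True keyword argument
    quant_config = BitsAndBytesConfig(load_in_4bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",
        token=token,
    )
```

With this helper, `model = load_base_model_4bit(model_path, token=hugging_face_token)` would take the place of the `AutoModelForCausalLM.from_pretrained` call in the cell above.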



In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the adapter config, which records the base model the adapter was trained on
peft_model_path = "model_ft_diet_separate_dietplan"
config = PeftConfig.from_pretrained(peft_model_path)

# Attach the LoRA adapter to the 4-bit base model loaded in the previous cell
model = PeftModel.from_pretrained(model, peft_model_path)

# Set to eval mode
model.eval()


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lor
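Since the project shares one base model across several task adapters, further adapters can be attached to the same `PeftModel` and switched by name. The sketch below uses PEFT's `load_adapter` and `set_adapter` methods. The helper names, adapter names, and paths are hypothetical; the adapter loaded by `PeftModel.from_pretrained` above is registered under the name `default`.

```python
def register_adapters(peft_model, adapter_paths):
    """Attach extra LoRA adapters to an existing PeftModel.

    adapter_paths maps a chosen adapter name to its saved directory.
    """
    for name, path in adapter_paths.items():
        peft_model.load_adapter(path, adapter_name=name)
    return list(adapter_paths)

def activate_adapter(peft_model, name):
    """Make the named adapter the active one for later generate() calls."""
    peft_model.set_adapter(name)

# Example usage (hypothetical adapter names and paths):
# register_adapters(model, {"symptoms": "model_ft_symptoms"})
# activate_adapter(model, "symptoms")  # switch task
# activate_adapter(model, "default")   # back to the diet-plan adapter
```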

In [None]:
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

#Pass in a prompt and infer with the model
instruction = (
    "You are a diet tips generator. \n\n" +
    "Given this patient profile: "  +
    " for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy " + "\n"
    "Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food."
)

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a diet tips generator. 

Given this patient profile:  for this 45-year-old man with a history of alcohol use, presenting with sudden onset dysarthria, shuffling gait, and intention tremors, is likely alcoholic neuropathy. While alcoholism is known to cause neuropathy, the sudden onset of symptoms may also be related to other conditions such as vitamin deficiency, especially vitamin B12 deficiency, which can produce similar symptoms. However, given the history of alcohol use and the absence of other symptoms like numbness or tingling, alcoholic neuropathy is the more likely culprit. Therefore, the most probable diagnosis in this scenario is alcoholic neuropathy 
Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food.

### Response:

One-day diet plan:  breakfast: - 1 bowl of buckw
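The decoded output above repeats the full prompt before the generated text. One possible way to keep only the answer is to split on the `### Response:` marker of the template. The helper names below are hypothetical; the template string is the one used in this notebook.

```python
TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

RESPONSE_MARKER = "### Response:"

def build_prompt(instruction):
    """Fill the template with an instruction and an empty response slot."""
    return TEMPLATE.format(instruction, "")

def extract_response(decoded):
    """Return only the text after the response marker, if the marker is present."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()

# Simulated decoded output: the prompt followed by a generated answer
decoded = build_prompt("Give a one-day diet plan.") + "One-day diet plan: breakfast: ..."
print(extract_response(decoded))  # One-day diet plan: breakfast: ...
```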

In [None]:
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}
"""

#Pass in a prompt and infer with the model
instruction = (
    "You are a diet tips generator. \n\n" +
    "Given this patient profile: "  +
" folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, chak-chak, coke environmental, behavioral, and social factors: karima lives alone with her three children. karima sometimes works night shifts. karima's busy lifestyle includes taking her children to school and kindergarten, working at a local supermarket, cooking for her children, and doing chores. karima mentions not having time for regular exercise. additional information she has been in routine check up where she was diagnosed with folate deficiency anemia. she was referred to a hematologist who recommended dietary adjustments. karima, as a tatar, frequently consumes traditional cuisine that may not provide sufficient folate. to address her condition, she seeks guidance from a dietitian to incorporate folate-rich foods into her meals. " + "\n"
    "Give a specific diet plan for the day (one-day diet plan) based on the patient profile using Central Asian food."
)

prompt = template.format(instruction, "")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
You are a diet tips generator. 

Given this patient profile:  folate deficiency anemia name: karima age: 38 gender: female ethnicity: tatar location: atyrau, kazakhstan marital status: divorced occupation: salesman at a local supermarket medical history: diagnosis: folate deficiency anemia symptoms: weakness, fatigue past medical history: none provided family history: no information provided medication: none provided anthropometry, body composition, and functional assessment: height: 165 cm weight: 63 kg waist circumference: 80 cm biochemical and hematological markers: serum folate: 3.2 ng/ml serum iron: 65 mcg/dl serum vitamin b12: 400 pg/ml red blood cell count: 4.2 million cells/mcl mean corpuscular volume (mcv): 110 fl diet: typical foods consumed: white bread toast with butter and jam, coffee or tea, rice with chicken, plov, echpochmak, c