<a href="https://colab.research.google.com/github/huseyincavusbi/Qwen3-30b-finance-lora/blob/main/Qwen3_30b_finance_lora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth



In [2]:
import torch
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

# 1. Load the model (this checkpoint is Qwen3-14B, even though the repo and adapter names say "30b")
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    dtype = None, # None auto-detects: bfloat16 where supported, otherwise float16 (the T4 in the log below reports Bfloat16 = FALSE)
)

# 2. Configure LoRA Adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Adapter rank; higher values add capacity and trainable parameters.
    lora_alpha = 16, # Scaling factor; the adapter update is scaled by lora_alpha / r (1.0 here).
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",],
)

print("Unsloth model configured for 4-bit LoRA fine-tuning!")


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.7.8: Fast Qwen3 patching. Transformers: 4.53.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Unsloth 2025.7.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.


Unsloth model configured for 4-bit LoRA fine-tuning!
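
As a rough sanity check on the LoRA configuration, the number of trainable parameters can be estimated by hand: a rank-r adapter on a linear layer with `in` inputs and `out` outputs adds r * (in + out) weights. The sketch below assumes the published Qwen3-14B dimensions (hidden size 5120, MLP size 17408, 40 query heads and 8 key/value heads of dimension 128, 40 layers). These values are not stated anywhere in this notebook, so treat them as assumptions. Under them, the estimate agrees with the 64,225,280 trainable parameters that Unsloth reports in the training banner further down.

```python
# Estimate LoRA trainable parameters for r = 16 on all seven target modules.
# ASSUMPTION: the dimensions below come from the Qwen3-14B model config,
# not from this notebook.
HIDDEN = 5120
INTERMEDIATE = 17408
NUM_LAYERS = 40
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
RANK = 16

q_out = NUM_Q_HEADS * HEAD_DIM    # width of the query projection
kv_out = NUM_KV_HEADS * HEAD_DIM  # width of the key/value projections

# (in_features, out_features) for each target module in one decoder layer.
module_shapes = {
    "q_proj": (HIDDEN, q_out),
    "k_proj": (HIDDEN, kv_out),
    "v_proj": (HIDDEN, kv_out),
    "o_proj": (q_out, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Weights in one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in module_shapes.values())
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,}")  # 64,225,280
```

Doubling `r` would double this count, since every term is linear in the rank.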


In [4]:
# Load Datasets and Merge them
from datasets import load_dataset, concatenate_datasets

def load_and_merge_finance_datasets():
    print("Loading gbharti/wealth-alpaca_lora dataset...")
    wealth_ds = load_dataset("gbharti/wealth-alpaca_lora", split="train")

    print("Loading Josephgflowers/Finance-Instruct-500k dataset...")
    finance_ds = load_dataset("Josephgflowers/Finance-Instruct-500k", split="train")

    def preprocess_wealth_alpaca(example):
        if example.get('input'):
            example['instruction'] = f"{example['instruction']}\n{example['input']}"
        return {"instruction": example["instruction"], "output": example["output"]}

    def preprocess_finance_instruct(example):
        # Map the 'user' column to the instruction and the 'assistant' column to the output
        return {"instruction": example["user"], "output": example["assistant"]}

    wealth_ds = wealth_ds.map(preprocess_wealth_alpaca, remove_columns=wealth_ds.column_names)
    finance_ds = finance_ds.map(preprocess_finance_instruct, remove_columns=finance_ds.column_names)

    print("Merging the datasets...")
    merged_dataset = concatenate_datasets([wealth_ds, finance_ds])
    return merged_dataset

merged_dataset = load_and_merge_finance_datasets()

Loading gbharti/wealth-alpaca_lora dataset...
Loading Josephgflowers/Finance-Instruct-500k dataset...


Map:   0%|          | 0/44341 [00:00<?, ? examples/s]

Map:   0%|          | 0/518185 [00:00<?, ? examples/s]

Merging the datasets...
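
To make the schema normalization concrete, the following pure-Python sketch applies the same two mappings to hand-written rows. The sample rows are hypothetical and only mirror the column names the code above relies on (`instruction`/`input`/`output` and `user`/`assistant`). This variant also builds a new dict instead of modifying the incoming row, which is a stylistic choice rather than the notebook's exact code.

```python
def normalize_wealth_row(row):
    """Fold the optional 'input' field into the instruction."""
    instruction = row["instruction"]
    if row.get("input"):
        instruction = f"{instruction}\n{row['input']}"
    return {"instruction": instruction, "output": row["output"]}


def normalize_finance_row(row):
    """Rename 'user'/'assistant' to the shared 'instruction'/'output' schema."""
    return {"instruction": row["user"], "output": row["assistant"]}


# Hypothetical rows, for illustration only.
wealth_rows = [
    {"instruction": "Define APR.", "input": "", "output": "Annual percentage rate."},
    {"instruction": "Summarize.", "input": "Bonds pay coupons.", "output": "Bonds pay interest."},
]
finance_rows = [
    {"user": "What is a stock?", "assistant": "A share of ownership."},
]

merged = [normalize_wealth_row(r) for r in wealth_rows]
merged += [normalize_finance_row(r) for r in finance_rows]

for row in merged:
    print(row)
```

The progress bars in the output are consistent with a plain concatenation: 44,341 + 518,185 = 562,526 rows.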


In [5]:
# Prepare Data for Qwen3 ChatML format

# We create a new column 'text' that contains the formatted prompt.
# SFTTrainer will then use this column for training.
def formatting_prompts_func(example):
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    # The tokenizer formats the messages into the required ChatML string.
    # We don't tokenize here, just create the formatted text string.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    return { "text": text }

dataset = merged_dataset.map(formatting_prompts_func)

print("\n--- Formatted Dataset Example ---")
print(dataset[0]["text"])

Map:   0%|          | 0/562526 [00:00<?, ? examples/s]


--- Formatted Dataset Example ---
<|im_start|>user
For a car, what scams can be plotted with 0% financing vs rebate?<|im_end|>
<|im_start|>assistant
<think>

</think>

The car deal makes money 3 ways. If you pay in one lump payment. If the payment is greater than what they paid for the car, plus their expenses, they make a profit. They loan you the money. You make payments over months or years, if the total amount you pay is greater than what they paid for the car, plus their expenses, plus their finance expenses they make money. Of course the money takes years to come in, or they sell your loan to another business to get the money faster but in a smaller amount. You trade in a car and they sell it at a profit. Of course that new transaction could be a lump sum or a loan on the used car... They or course make money if you bring the car back for maintenance, or you buy lots of expensive dealer options. Some dealers wave two deals in front of you: get a 0% interest loan. These tend to b
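
The printed example shows the layout the chat template produces for a single user/assistant exchange, including an empty `<think>` block before the answer. The helper below, `to_chatml_sketch`, is a hypothetical hand-written approximation of that layout for illustration only. It is not the tokenizer's template, and the trailing `<|im_end|>` after the assistant turn is an assumption, because the printed example is cut off before its end.

```python
def to_chatml_sketch(instruction, output):
    """Approximate the layout seen in the formatted example (illustrative only)."""
    return (
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        # The empty think block mirrors what the printed example contains.
        f"<|im_start|>assistant\n<think>\n\n</think>\n\n{output}<|im_end|>\n"
    )


sample = to_chatml_sketch("What is a bond?", "A bond is a loan to an issuer.")
print(sample)
```

For real training data, keep relying on `tokenizer.apply_chat_template` as in the cell above, since the template shipped with the model is the source of truth.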

In [6]:
# Configure Training Arguments and Start Training
from trl import SFTTrainer
from transformers import TrainingArguments

# --- Training Arguments ---
training_args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4, # Effective batch size = 2 * 4 = 8
    warmup_steps = 10,
    max_steps = 300,
    learning_rate = 2e-4,
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 42,
    output_dir = "outputs",
)

# --- Initialize Trainer ---
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text", # Point trainer to our formatted 'text' column
    max_seq_length = max_seq_length,
    args = training_args,
)

# --- Start Fine-tuning ---
print("Starting the fine-tuning process...")
trainer.train()
print("Fine-tuning complete!")

Unsloth: Tokenizing ["text"]:   0%|          | 0/562526 [00:00<?, ? examples/s]

Starting the fine-tuning process...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 562,526 | Num Epochs = 1 | Total steps = 300
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 64,225,280 of 14,832,532,480 (0.43% trained)
[34m[1mwandb[0m: Currently logged in as: [33mhuseyincavus[0m ([33mhuseyincavus2[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.6933
2,3.0788
3,2.5193
4,1.8182
5,2.5825
6,1.927
7,2.2755
8,1.4975
9,2.1297
10,1.3467




Fine-tuning complete!
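
The training arguments determine how much of the data this run sees and how the learning rate evolves. The sketch below works through that arithmetic using the values from the cell above and the example count from the log. The `approx_lr` function is a hand-written approximation of linear warmup followed by linear decay, not a call into the `transformers` scheduler, so values at individual steps may differ slightly from what the trainer uses.

```python
# Values copied from the TrainingArguments cell and the training banner.
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 300
warmup_steps = 10
peak_lr = 2e-4
num_examples = 562_526

effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
fraction_of_dataset = examples_seen / num_examples


def approx_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, max_steps - step)
    return peak_lr * remaining / (max_steps - warmup_steps)


print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"share of dataset: {fraction_of_dataset:.2%}")
for step in (0, 5, 10, 155, 300):
    print(f"step {step:>3}: lr ~ {approx_lr(step):.2e}")
```

With these settings the run covers 2,400 examples, well under 1% of the merged dataset, and the ten steps shown in the loss table fall within the warmup window.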


In [7]:
# Inference and Saving the Model

print("\n--- Running Inference ---")
from transformers import pipeline

# Build a Hugging Face text-generation pipeline around the fine-tuned model
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Create a test prompt
messages = [
    {"role": "user", "content": "What are the main risks associated with investing in emerging markets?"},
]

# Get the response
outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]['generated_text'])


Device set to use cuda:0



--- Running Inference ---
[{'role': 'user', 'content': 'What are the main risks associated with investing in emerging markets?'}, {'role': 'assistant', 'content': "<think>\n\n</think>\n\nInvesting in emerging markets can be associated with several risks, including political and regulatory risks, economic risks, currency risks, and market risks. Political and regulatory risks can arise from changes in government policies, regulations, or laws that can affect the operations of businesses. Economic risks can include factors such as inflation, unemployment, and economic downturns. Currency risks can involve exchange rate fluctuations and the potential for currency devaluation. Market risks can arise from volatility in stock prices, lack of liquidity, and market bubbles. It's important for investors to carefully assess these risks and diversify their portfolios to mitigate potential losses."}]
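
When the pipeline is called with a list of chat messages, `generated_text` holds the whole conversation, as the output above shows, with the reply as the last message and an empty `<think>` block at its start. A small post-processing sketch follows. `extract_reply` is a hypothetical helper, and the sample input is an abbreviated copy of the structure printed above rather than a new model output.

```python
import re

# Matches a think block (empty or not) plus any whitespace that follows it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def extract_reply(pipeline_outputs):
    """Return the assistant's final message with any think block removed."""
    conversation = pipeline_outputs[0]["generated_text"]
    reply = conversation[-1]["content"]
    return THINK_BLOCK.sub("", reply).strip()


# Abbreviated copy of the structure shown in the output above.
sample_outputs = [{
    "generated_text": [
        {"role": "user",
         "content": "What are the main risks associated with investing in emerging markets?"},
        {"role": "assistant",
         "content": "<think>\n\n</think>\n\nInvesting in emerging markets can be associated with several risks."},
    ]
}]

print(extract_reply(sample_outputs))
```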


In [9]:
# Save the Adapters and Push to Hugging Face Hub
from huggingface_hub import notebook_login

# Save the fine-tuned LoRA adapters
print("\n--- Saving LoRA Adapters ---")
model.save_pretrained("qwen3_30b_finance_lora")
tokenizer.save_pretrained("qwen3_30b_finance_lora")
print("Model adapters saved to 'qwen3_30b_finance_lora'")

# Log in to Hugging Face Hub
notebook_login()

# Push the model adapters and tokenizer to the Hub
repo_name = "huseyincavus/qwen3-30b-finance-lora"

print(f"\n--- Pushing LoRA Adapters to Hugging Face Hub ({repo_name}) ---")
model.push_to_hub(repo_name, token = True)
tokenizer.push_to_hub(repo_name, token = True)
print("Model adapters and tokenizer pushed to Hugging Face Hub!")


--- Saving LoRA Adapters ---
Model adapters saved to 'qwen3_30b_finance_lora'




--- Pushing LoRA Adapters to Hugging Face Hub (huseyincavus/qwen3-30b-finance-lora) ---


README.md:   0%|          | 0.00/593 [00:00<?, ?B/s]


adapter_model.safetensors:   0%|          | 0.00/257M [00:00<?, ?B/s]

Saved model to https://huggingface.co/huseyincavus/qwen3-30b-finance-lora



tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

Model adapters and tokenizer pushed to Hugging Face Hub!
