## Step 1: Mounting Google Drive

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Navigate to the repo folder
%cd /content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer

# List repo contents
!ls

Mounted at /content/drive
/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer
data		       gpt4o_judgments.json  notebooks	      README.md  wandb
deployment	       LICENSE		     project_plan.md  results
eval_predictions.json  models		     qa_pairs	      scripts


## Step 2: Importing Libraries and Setting Output Path

In [2]:
import os
import json
from pathlib import Path
import re
from google.colab import files
import pandas as pd
import sys

In [3]:
sys.path.append("./scripts")
from qa_utils import generate_QA_pair
from qa_utils_with_context import generate_QA_pair_with_context

In [5]:
BASE_DIR = "/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer"
PDF_DIR = os.path.join(BASE_DIR, "data", "Evaluation_PDFs")
QA_DIR = os.path.join(BASE_DIR, "qa_pairs", "qa_pairs_eval")
QA_DIR_WITH_CONTEXT = os.path.join(BASE_DIR, "qa_pairs", "qa_pairs_eval_with_context")
os.makedirs(QA_DIR, exist_ok=True)
os.makedirs(QA_DIR_WITH_CONTEXT, exist_ok=True)

## Step 3: Generating QA pairs

### **LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation**

**Abstract**

Low-Rank Adaptation (LoRA) has emerged as a popular parameter-
efficient fine-tuning (PEFT) method for Large Language Models (LLMs),
yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference.

**Introduction**

Large language models (LLMs) have transformed deep learning, showcasing remarkable capabilities across various domains. However, their deployment remains computationally demanding, particularly when fine-tuning is required to adapt to downstream tasks or align with human preferences. To mitigate the high resource costs, researchers have developed a range of parameter-efficient fine-tuning (PEFT) techniques. Among these techniques, LoRA has gained widespread adoption. Nevertheless, LoRA still introduces notable memory overhead, particularly in large-scale models. Consequently, recent research has focused on further optimizing LoRA by reducing the number of trainable parameters without compromising performance.

Recent studies have shown that delta parameters – the
differences between fine-tuned and pretrained model weights – exhibit significant redundancy. Motivated by the effectiveness of random projections and the observed redundancy in delta parameters, we propose LoRA with Reduced Interference (LoRI). LoRI keeps the low-rank matrices A fixed as random projections, while training the matrices B using task-specific sparse masks. To retain the most critical elements of B, LoRI performs a calibration process
to extract sparse masks by selecting the highest-magnitude elements across all layers and projections. As shown in Figure 1(a), LoRI maintains performance even with 90% sparsity in B while keeping A frozen. This demonstrates that adaptation does not require updating A, and that B has considerable redundancy. By applying more constrained updates than LoRA, LoRI significantly reduces the number of trainable parameters while better preserving the pretrained model’s knowledge during adaptation.

Multi-task learning is essential for enabling versatile models with multi-task capabilities, which is traditionally performed via joint training on a combination of task-specific datasets. However, training large models on this data mixture is prohibitively expensive in terms of time and compute. Model merging is a training-free alternative for building powerful models by combining existing ones. This approach is well-suited for merging LoRA adapters to enable multi-task capabilities within a single LoRA. However, as shown in Figure 1(b), directly merging heterogeneous LoRAs often results in parameter interference, leading to degraded performance in the merged LoRA compared to single-task LoRAs. Additionally, many existing merging methods require trial-and-error to identify the optimal method for a specific combination of tasks. LoRI tackles these challenges by enabling adapter merging without manual selection of merging methods. By using fixed, randomly initialized projection A, LoRI maps task-specific adapters into approximately orthogonal subspaces, thereby reducing interference when merging multiple LoRIs.

Beyond multi-tasking, safety-critical scenarios require that each newly introduced adapter enhances model capabilities while preserving the safety alignment of the pretrained base model. LoRI provides a lightweight continual learning approach for adapting models while preserving safety, where training is performed sequentially across tasks. The strategy involves first fine-tuning an adapter on safety data to establish alignment, followed by separate adaptation to each downstream task. However, as illustrated in Figure 1(c), continual learning often leads to catastrophic forgetting, wherein the adaptation to new tasks substantially compromises previously acquired knowledge. LoRI mitigates forgetting by leveraging the sparsity of matrices B through task-specific masks. This isolation of parameter updates across tasks facilitates continual learning with minimal interference, preserving both safety and task effectiveness.

To evaluate the effectiveness of LoRI, we conduct extensive experiments across a diverse suite of benchmarks spanning natural language understanding (NLU), mathematical reasoning, code generation, and safety alignment tasks. Using Llama-3-8B and Mistral-7B as base models, our results show that LoRI achieves performance comparable to – or better than – full fine-tuning (FFT), LoRA, and other PEFT methods, while using up to 95% fewer trainable parameters than LoRA. Notably, LoRI with 90% sparsity in B surpasses LoRA by
17.3% on HumanEval with Llama-3. Beyond single-task adaptation, we evaluate LoRI in multi-task settings, including adapter merging and continual learning scenarios. Concatenated merging of LoRI adapters consistently outperforms LoRA adapters overall, closely matching the performance of single-task LoRA baseline. In continual learning, LoRI significantly outperforms LoRA in mitigating catastrophic forgetting of safety alignment, while maintaining strong performance on downstream tasks.

**Conclusion**

In this work, we introduced LoRI, a simple yet effective approach to parameter-efficient fine-tuning (PEFT) that substantially reduces trainable parameters while minimizing cross-task interference. By freezing the projection matrices A as random projections and sparsifying B using task-specific masks, LoRI achieves strong single-task performance across
diverse domains – including natural language understanding, mathematical reasoning, code generation, and safety alignment – while reducing trainable parameters by up to 95% compared to LoRA. Furthermore, LoRI enables training-free adapter merging with minimal performance degradation, and supports continual learning with significantly reduced catastrophic forgetting. It also provides a lightweight approach to building safety adapters that preserve the safety alignment of the base model.

Future Work. We identify several promising avenues for extending this work. While LoRI currently leverages unstructured magnitude-based sparsity, future research can explore structured sparsity patterns – such as block sparsity, head pruning, or group-wise masking – which may offer better hardware compatibility. Additionally, although this study focuses on LLMs, the core design of LoRI is modality-agnostic. Extending LoRI to diffusion and vision-language models for multi-modal generation is a promising direction, given the growing impact of adapter-based fine-tuning.


In [6]:
context = """
Abstract:

Low-Rank Adaptation (LoRA) has emerged as a popular parameter- efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference.

Introduction:

Large language models (LLMs) have transformed deep learning, showcasing remarkable capabilities across various domains. However, their deployment remains computationally demanding, particularly when fine-tuning is required to adapt to downstream tasks or align with human preferences. To mitigate the high resource costs, researchers have developed a range of parameter-efficient fine-tuning (PEFT) techniques. Among these techniques, LoRA has gained widespread adoption. Nevertheless, LoRA still introduces notable memory overhead, particularly in large-scale models. Consequently, recent research has focused on further optimizing LoRA by reducing the number of trainable parameters without compromising performance.

Recent studies have shown that delta parameters – the differences between fine-tuned and pretrained model weights – exhibit significant redundancy. Motivated by the effectiveness of random projections and the observed redundancy in delta parameters, we propose LoRA with Reduced Interference (LoRI). LoRI keeps the low-rank matrices A fixed as random projections, while training the matrices B using task-specific sparse masks. To retain the most critical elements of B, LoRI performs a calibration process to extract sparse masks by selecting the highest-magnitude elements across all layers and projections. As shown in Figure 1(a), LoRI maintains performance even with 90% sparsity in B while keeping A frozen. This demonstrates that adaptation does not require updating A, and that B has considerable redundancy. By applying more constrained updates than LoRA, LoRI significantly reduces the number of trainable parameters while better preserving the pretrained model’s knowledge during adaptation.

Multi-task learning is essential for enabling versatile models with multi-task capabilities, which is traditionally performed via joint training on a combination of task-specific datasets. However, training large models on this data mixture is prohibitively expensive in terms of time and compute. Model merging is a training-free alternative for building powerful models by combining existing ones. This approach is well-suited for merging LoRA adapters to enable multi-task capabilities within a single LoRA. However, as shown in Figure 1(b), directly merging heterogeneous LoRAs often results in parameter interference, leading to degraded performance in the merged LoRA compared to single-task LoRAs. Additionally, many existing merging methods require trial-and-error to identify the optimal method for a specific combination of tasks. LoRI tackles these challenges by enabling adapter merging without manual selection of merging methods. By using fixed, randomly initialized projection A, LoRI maps task-specific adapters into approximately orthogonal subspaces, thereby reducing interference when merging multiple LoRIs.

Beyond multi-tasking, safety-critical scenarios require that each newly introduced adapter enhances model capabilities while preserving the safety alignment of the pretrained base model. LoRI provides a lightweight continual learning approach for adapting models while preserving safety, where training is performed sequentially across tasks. The strategy involves first fine-tuning an adapter on safety data to establish alignment, followed by separate adaptation to each downstream task. However, as illustrated in Figure 1(c), continual learning often leads to catastrophic forgetting, wherein the adaptation to new tasks substantially compromises previously acquired knowledge. LoRI mitigates forgetting by leveraging the sparsity of matrices B through task-specific masks. This isolation of parameter updates across tasks facilitates continual learning with minimal interference, preserving both safety and task effectiveness.

To evaluate the effectiveness of LoRI, we conduct extensive experiments across a diverse suite of benchmarks spanning natural language understanding (NLU), mathematical reasoning, code generation, and safety alignment tasks. Using Llama-3-8B and Mistral-7B as base models, our results show that LoRI achieves performance comparable to – or better than – full fine-tuning (FFT), LoRA, and other PEFT methods, while using up to 95% fewer trainable parameters than LoRA. Notably, LoRI with 90% sparsity in B surpasses LoRA by 17.3% on HumanEval with Llama-3. Beyond single-task adaptation, we evaluate LoRI in multi-task settings, including adapter merging and continual learning scenarios. Concatenated merging of LoRI adapters consistently outperforms LoRA adapters overall, closely matching the performance of single-task LoRA baseline. In continual learning, LoRI significantly outperforms LoRA in mitigating catastrophic forgetting of safety alignment, while maintaining strong performance on downstream tasks.


Conclusion:

In this work, we introduced LoRI, a simple yet effective approach to parameter-efficient fine-tuning (PEFT) that substantially reduces trainable parameters while minimizing cross-task interference. By freezing the projection matrices A as random projections and sparsifying B using task-specific masks, LoRI achieves strong single-task performance across diverse domains – including natural language understanding, mathematical reasoning, code generation, and safety alignment – while reducing trainable parameters by up to 95% compared to LoRA. Furthermore, LoRI enables training-free adapter merging with minimal performance degradation, and supports continual learning with significantly reduced catastrophic forgetting. It also provides a lightweight approach to building safety adapters that preserve the safety alignment of the base model.

Future Work. We identify several promising avenues for extending this work. While LoRI currently leverages unstructured magnitude-based sparsity, future research can explore structured sparsity patterns – such as block sparsity, head pruning, or group-wise masking – which may offer better hardware compatibility. Additionally, although this study focuses on LLMs, the core design of LoRI is modality-agnostic. Extending LoRI to diffusion and vision-language models for multi-modal generation is a promising direction, given the growing impact of adapter-based fine-tuning.

""".strip()

In [10]:
qa_pairs = [
    {
        "question": "What is the primary innovation introduced by the LoRI method for parameter-efficient fine-tuning?",
        "answer": "LoRI introduces a novel approach that freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks, thereby significantly reducing trainable parameters while minimizing cross-task interference."
    },
    {
        "question": "How does LoRI reduce the number of trainable parameters compared to traditional LoRA?",
        "answer": "LoRI reduces the number of trainable parameters by keeping matrix A fixed as a random projection and sparsifying matrix B using task-specific masks, eliminating the need to train both matrices and reducing redundancy."
    },
    {
        "question": "Why is sparsity in matrix B important in LoRI?",
        "answer": "Sparsity in matrix B enables LoRI to retain only the most critical elements necessary for adaptation, reducing parameter count and mitigating cross-task interference during adapter merging and continual learning."
    },
    {
        "question": "How does LoRI improve the process of merging adapters in multi-task scenarios?",
        "answer": "LoRI enables more effective adapter merging by using fixed, randomly initialized projection matrices A, which maps task-specific adapters into approximately orthogonal subspaces, thus reducing parameter interference."
    },
    {
        "question": "What mechanism does LoRI use to mitigate catastrophic forgetting in continual learning?",
        "answer": "LoRI mitigates catastrophic forgetting by applying task-specific sparse masks to matrix B, which isolates parameter updates across tasks and preserves knowledge from previous adaptations, including safety alignment."
    },
    {
        "question": "On what benchmark did LoRI with 90% sparsity in B outperform LoRA, and by how much?",
        "answer": "LoRI with 90% sparsity in B outperformed LoRA by 17.3% on the HumanEval benchmark using the Llama-3 model."
    },
    {
        "question": "How does LoRI compare to full fine-tuning and other PEFT methods in terms of performance and efficiency?",
        "answer": "LoRI matches or outperforms full fine-tuning and other PEFT methods across multiple domains while using up to 95% fewer trainable parameters than LoRA, demonstrating both high performance and high efficiency."
    },
    {
        "question": "What types of tasks were used to evaluate LoRI's effectiveness?",
        "answer": "LoRI was evaluated on a diverse set of tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment."
    },
    {
        "question": "What potential future directions do the authors propose for extending LoRI?",
        "answer": "The authors suggest exploring structured sparsity patterns like block sparsity or head pruning and adapting LoRI to multi-modal models such as diffusion and vision-language systems."
    },
    {
        "question": "What is the broader significance of LoRI in the context of PEFT and LLM deployment?",
        "answer": "LoRI provides a lightweight, modular, and scalable solution for adapting LLMs with minimal overhead, making it particularly suited for multi-task learning, safety-critical alignment, and efficient deployment on resource-constrained hardware."
    }
]

In [8]:
title = "LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation"

In [None]:
generate_QA_pair("01", 2025, "lori", title, qa_pairs, QA_DIR)

In [11]:
generate_QA_pair_with_context(
    paper_number="01",
    publication_year=2025,
    short_id="lori",
    title=title,
    qa_pairs=qa_pairs,
    context=context,
    save_dir= QA_DIR_WITH_CONTEXT
)

'/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer/qa_pairs/qa_pairs_eval_with_context/01_2025_lori.json'

### **ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning**


**Abstract**

Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.

**Introduction**

Scaling laws of transformer-based Pre-trained Language Models (PLMs) suggest that increasing model size leads to improved generalization and task performance, which has driven the rapid expansion of model architectures, from 330M parameters in BERT to 1.5B in GPT-2, 175B in GPT-3, and 671B in DeepSeek, highlighting the trend toward ever-larger pretrained models. Despite
these advances, Large Language Models (LLMs) remain constrained by their knowledge boundaries, requiring fine-tuning to specialize in domain-specific applications and adapt to evolving datasets. Traditionally, full fine-tuning has been the standard approach, which is nevertheless prohibitively expensive in terms of memory and computation.

To address the computational burden of full fine-tuning, Parameter-efficient fine-tuning (PEFT) methods have been developed, with Low-Rank Adaptation (LoRA) being a widely used approach that reduces trainable parameters
without increasing inference latency. However, LoRA’s fixed rank allocation leads to suboptimal performance by failing to account for layer-specific importance (Zhang et al., 2023b). Dynamic rank allocation methods like AdaLoRA and SaLoRA decompose a matrix using singular value decomposition (SVD) and selectively prune its singular values to control the rank of the matrix, but these methods are computationally inefficient as they begin with a high rank. IncreLoRA (Zhang et al., 2023a) mitigates this by starting with a minimal rank and increasing it heuristically. However, early training samples may not be effectively learned or utilized when the rank is small.

To overcome these limitations, we propose ElaLoRA, a novel adaptive and dynamic LoRA framework that simultaneously prunes and expands ranks (as shown in Figure 1). By dynamically reallocating computational resources to the most critical layers, ElaLoRA ensures that essential layers receive more capacity while redundant ranks are removed. ElaLoRA operates through three key components: 1) SVD-based adaptation strategy; 2) importance score calculation to quantify the significance of each rank based on loss gradients; and 3) a dynamic rank learning algorithm that reallocates ranks at scheduled intervals.
Experimental results across multiple Natural Language Understanding (NLU), Natural Language Generation (NLG), and Visual Task benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods under various parameter budgets. Notably, ElaLoRA achieves better average GLUE results with r = 2 than other PEFT methods at r = 4, making it particularly well-suited for resource-constrained environments. Our key contributions include:

• We introduce ElaLoRA, the first method to the best of our knowledge that enables both rank pruning and expansion simultaneously during fine-tuning. Comparisons are shown in Table 1.

• We conduct extensive experiments across multiple benchmarks under different parameter budgets. Our results consistently demonstrate the effectiveness of ElaLoRA, outperforming existing PEFT methods in performance.

• We conduct analysis to verify that the layers and matrices identified as highly important for a specific task are indeed significant for that task, providing a principled validation of our adaptive rank allocation method.

**Conclusion**

In this work, we introduced ElaLoRA, a novel parameter-efficient fine-tuning (PEFT) method that dynamically prunes and expands ranks based on importance scores, ensuring that the most impactful layers receive additional capacity while removing redundant ranks. This adaptive rank learning mechanism enables more efficient model adaptation across diverse NLP and Vision tasks. Our empirical results demonstrate that ElaLoRA outperforms other state-of-the-art methods, achieving superior accuracy across multiple benchmark datasets while maintaining a lower or comparable parameter budget. Beyond performance improvements, our analysis of final rank distributions and importance score distributions confirms that ElaLoRA’s rank allocation decisions align with the layers that contribute most to task-specific learning.



In [12]:
context = """
Abstract:
Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.

Introduction:
Scaling laws of transformer-based Pre-trained Language Models (PLMs) suggest that increasing model size leads to improved generalization and task performance, which has driven the rapid expansion of model architectures, from 330M parameters in BERT to 1.5B in GPT-2, 175B in GPT-3, and 671B in DeepSeek, highlighting the trend toward ever-larger pretrained models. Despite these advances, Large Language Models (LLMs) remain constrained by their knowledge boundaries, requiring fine-tuning to specialize in domain-specific applications and adapt to evolving datasets. Traditionally, full fine-tuning has been the standard approach, which is nevertheless prohibitively expensive in terms of memory and computation.

To address the computational burden of full fine-tuning, Parameter-efficient fine-tuning (PEFT) methods have been developed, with Low-Rank Adaptation (LoRA) being a widely used approach that reduces trainable parameters without increasing inference latency. However, LoRA’s fixed rank allocation leads to suboptimal performance by failing to account for layer-specific importance (Zhang et al., 2023b). Dynamic rank allocation methods like AdaLoRA and SaLoRA decompose a matrix using singular value decomposition (SVD) and selectively prune its singular values to control the rank of the matrix, but these methods are computationally inefficient as they begin with a high rank. IncreLoRA (Zhang et al., 2023a) mitigates this by starting with a minimal rank and increasing it heuristically. However, early training samples may not be effectively learned or utilized when the rank is small.

To overcome these limitations, we propose ElaLoRA, a novel adaptive and dynamic LoRA framework that simultaneously prunes and expands ranks (as shown in Figure 1). By dynamically reallocating computational resources to the most critical layers, ElaLoRA ensures that essential layers receive more capacity while redundant ranks are removed. ElaLoRA operates through three key components: 1) SVD-based adaptation strategy; 2) importance score calculation to quantify the significance of each rank based on loss gradients; and 3) a dynamic rank learning algorithm that reallocates ranks at scheduled intervals. Experimental results across multiple Natural Language Understanding (NLU), Natural Language Generation (NLG), and Visual Task benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods under various parameter budgets. Notably, ElaLoRA achieves better average GLUE results with r = 2 than other PEFT methods at r = 4, making it particularly well-suited for resource-constrained environments. Our key contributions include:

• We introduce ElaLoRA, the first method to the best of our knowledge that enables both rank pruning and expansion simultaneously during fine-tuning. Comparisons are shown in Table 1.

• We conduct extensive experiments across multiple benchmarks under different parameter budgets. Our results consistently demonstrate the effectiveness of ElaLoRA, outperforming existing PEFT methods in performance.

• We conduct analysis to verify that the layers and matrices identified as highly important for a specific task are indeed significant for that task, providing a principled validation of our adaptive rank allocation method.

Conclusion:
In this work, we introduced ElaLoRA, a novel parameter-efficient fine-tuning (PEFT) method that dynamically prunes and expands ranks based on importance scores, ensuring that the most impactful layers receive additional capacity while removing redundant ranks. This adaptive rank learning mechanism enables more efficient model adaptation across diverse NLP and Vision tasks. Our empirical results demonstrate that ElaLoRA outperforms other state-of-the-art methods, achieving superior accuracy across multiple benchmark datasets while maintaining a lower or comparable parameter budget. Beyond performance improvements, our analysis of final rank distributions and importance score distributions confirms that ElaLoRA’s rank allocation decisions align with the layers that contribute most to task-specific learning.
""".strip()

In [13]:
qa_pairs = [
  {
    "question": "What are the core limitations of traditional LoRA methods that ElaLoRA seeks to address?",
    "answer": "ElaLoRA addresses two key limitations of traditional LoRA: the fixed rank allocation across layers, which overlooks the layer-specific importance, and the inability to adapt ranks dynamically during training, which can lead to suboptimal parameter efficiency."
  },
  {
    "question": "Describe the three core components of the ElaLoRA framework.",
    "answer": "ElaLoRA's architecture consists of: (1) an SVD-based adaptation strategy for matrix decomposition, (2) an importance score calculation mechanism based on loss gradients to assess rank relevance, and (3) a dynamic rank learning algorithm that reallocates ranks periodically during training to optimize layer-wise adaptation."
  },
  {
    "question": "How does ElaLoRA’s adaptive strategy improve performance under limited parameter budgets?",
    "answer": "ElaLoRA reallocates computational resources to the most critical layers by pruning less important ranks and expanding ranks in essential layers, thus achieving higher performance even under smaller parameter budgets—for example, outperforming other PEFT methods with r=2 compared to their r=4 settings."
  },
  {
    "question": "In what way does ElaLoRA achieve better task alignment during fine-tuning?",
    "answer": "ElaLoRA uses gradient-derived importance scores to identify which layers contribute most to task-specific learning, allowing the model to allocate more capacity to those layers and thus improving task alignment and learning efficiency."
  },
  {
    "question": "What experimental evidence supports the superiority of ElaLoRA over other PEFT methods?",
    "answer": "Experiments across NLU, NLG, and vision benchmarks show that ElaLoRA consistently outperforms state-of-the-art PEFT methods in accuracy, particularly under constrained parameter budgets, and demonstrates better GLUE benchmark performance even with fewer trainable parameters."
  },
  {
    "question": "Why is ElaLoRA particularly well-suited for resource-constrained environments?",
    "answer": "ElaLoRA's dynamic pruning and expansion mechanism ensures that only the most essential ranks are trained, reducing memory usage and computational cost while maintaining high performance, making it ideal for low-resource scenarios."
  },
  {
    "question": "How does the final rank distribution in ElaLoRA reflect its adaptive learning process?",
    "answer": "ElaLoRA’s final rank distribution reveals that higher ranks are allocated to layers deemed more important via importance scores, confirming that the model dynamically concentrates learning capacity on the most impactful parts of the network."
  },
  {
    "question": "What are the broader implications of ElaLoRA’s design for the future of fine-tuning large models?",
    "answer": "ElaLoRA’s design shows that adaptive, importance-based rank allocation can significantly improve parameter efficiency without sacrificing accuracy, suggesting a paradigm shift toward more intelligent and resource-aware fine-tuning strategies."
  },
  {
    "question": "What distinguishes ElaLoRA from prior dynamic rank methods like AdaLoRA or IncreLoRA?",
    "answer": "While AdaLoRA and IncreLoRA either prune or expand ranks, ElaLoRA is the first to implement both pruning and expansion dynamically during training, offering a more flexible and principled mechanism for allocating parameter capacity."
  },
  {
    "question": "Why is parameter-efficient fine-tuning increasingly important in the LLM landscape?",
    "answer": "As LLMs grow in size, full fine-tuning becomes prohibitively expensive, especially for domain-specific or low-resource settings. PEFT methods like ElaLoRA offer a practical solution by enabling adaptation with minimal compute and storage costs."
  }
]

In [15]:
title = "ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning"

In [None]:
generate_QA_pair("02", 2025, "elalora", title, qa_pairs, QA_DIR)

'/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer/qa_pairs/qa_pairs_eval/02_2025_elalora.json'

In [16]:
generate_QA_pair_with_context(
    paper_number="02",
    publication_year=2025,
    short_id="elalora",
    title=title,
    qa_pairs=qa_pairs,
    context=context,
    save_dir= QA_DIR_WITH_CONTEXT
)

'/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer/qa_pairs/qa_pairs_eval_with_context/02_2025_elalora.json'

### **Beyond QA Pairs: Assessing Parameter-Efficient Fine-Tuning for Fact Embedding in LLMs**

**Abstract**

This paper presents an extensive examination of Parameter-Efficient Fine-Tuning (PEFT) for embedding domain specific facts into Large Language Models (LLMs), focusing on improving the fine-tuning process by categorizing question-answer (QA) pairs into ‘Factual’ and ‘Conceptual’ classes using a BERT-based classifier. Two distinct Llama-2 models are fine-tuned based on these classifications and evaluated using larger models like GPT-3.5 Turbo and Gemini. Our results indicate that models trained on conceptual datasets outperform those trained on factual datasets. Additionally, we compare the efficiency of two synthetic fine-tuning dataset generation techniques, D-RAG and D-Naive, with D-Naive demonstrating superior performance. Although PEFT has shown effectiveness, our research indicates that it may not be the most optimal method for embedding facts into LLMs. However, it has demonstrated exceptional performance in instruction-based tasks. Our findings are reinforced by a 1000-sample dataset in the data center domain, where the fine-tuned Llama-2 7B model significantly outperforms the baseline model in generating product recommendations. Our study highlights the importance of QA pair categorization and synthetic dataset generation techniques in enhancing the performance of LLMs in specific domains.


**Introduction**

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a highly effective strategy for refining Large Language Models (LLMs) on domain-specific data, thanks to its reduced computational and time requirements compared to full fine-tuning. This technique has seen widespread adoption in the industry for embedding domain knowledge into LLMs. Platforms like Azure, Google Cloud Platform, Mistral, AWS, and Lamini offer fine-tuning as a service using methods like Low Rank Adaptation (LoRA), making PEFT accessible and user-friendly (Hu et al. 2021). These low code/no code solutions have become popular among developers due to their simplicity. However, the ease of use of these platforms can create a misconception that merely having a large quantity of question-answer (QA) pairs is sufficient for effective domain adaptation. This misunderstanding may lead to the utilization of low-quality datasets, compromising the effectiveness of the fine-tuning process. In this paper, we address this issue by proposing a set of metrics to assess the quality and appropriateness of QA datasets for PEFT. We introduce a novel method for categorizing QA pairs into ‘Factual’ and ‘Conceptual’ classes using a BERT-based classifier. By separating the original dataset based on these categories, we fine-tune two distinct sets of Llama-2 models
using LoRA. Our evaluation, conducted with larger models such as GPT-3.5 Turbo, Gemini 1.5 Pro, and Prometheus, reveals that models trained on conceptual datasets significantly outperform those trained on factual datasets. Furthermore, we investigate the effectiveness of two synthetic dataset generation techniques, D-RAG and D-Naive (depicted in Figure 1). Our results show that the D-Naive approach produces superior fine-tuning datasets compared to D-RAG. Additionally, we suggest that while PEFT is highly effective, it may not be optimal for embedding factual information into LLMs. Instead, it excels in instruction-based tasks. To support our assertion, we conducted an experiment using a 1000-sample dataset for sales product recommendation in the data center domain. The results clearly demonstrate that the fine-tuned Llama-2 7B model outperforms the baseline model.

**Conclusions**

Our research highlights the paramount importance of the quality and categorization of QA pairs in PEFT, providing profound insights into optimizing the fine-tuning process of LLMs for domain-specific applications. The outcomes of our fine-tuning experiments reveal that PEFT is particularly
advantageous for scenarios requiring minimal factual information embedding into LLMs. Notably, the LLM trained on a conceptual dataset significantly outperformed the one trained on a factual dataset. This trend was consistently observed across all three proctor models, underscoring that the sheer volume of QA pairs is insufficient for the effective deployment of PEFT in developing domain-specific QA bots. It is crucial to judiciously select the use-case when leveraging PEFT. Our product recommendation experiment further illustrates that for instruction-based applications, even a dataset as modest as 1,000 prompt-response pairs can yield a high-quality fine-tuned model.

Although our experiments with D-RAG and D-Naive did not demonstrate that the D-RAG technique for synthetic training data generation is more efficient, we believe that this avenue warrants further exploration. The potential of D-RAG to generate more comprehensive and complete answers remains promising. In this particular instance, the technique’s shortcomings were primarily due to the suboptimal performance of the vector database retriever. By addressing these retrieval inefficiencies, future research could unlock the full potential of D-RAG, thereby contributing to more effective and nuanced fine-tuning methodologies for LLMs. Thus, while current findings emphasize the importance of careful use-case selection and QA pair quality in PEFT, they also open the door for continued innovation in synthetic data generation techniques.

In [17]:
context = """
Abstract:
This paper presents an extensive examination of Parameter-Efficient Fine-Tuning (PEFT) for embedding domain specific facts into Large Language Models (LLMs), focusing on improving the fine-tuning process by categorizing question-answer (QA) pairs into ‘Factual’ and ‘Conceptual’ classes using a BERT-based classifier. Two distinct Llama-2 models are fine-tuned based on these classifications and evaluated using larger models like GPT-3.5 Turbo and Gemini. Our results indicate that models trained on conceptual datasets outperform those trained on factual datasets. Additionally, we compare the efficiency of two synthetic fine-tuning dataset generation techniques, D-RAG and D-Naive, with D-Naive demonstrating superior performance. Although PEFT has shown effectiveness, our research indicates that it may not be the most optimal method for embedding facts into LLMs. However, it has demonstrated exceptional performance in instruction-based tasks. Our findings are reinforced by a 1000-sample dataset in the data center domain, where the fine-tuned Llama-2 7B model significantly outperforms the baseline model in generating product recommendations. Our study highlights the importance of QA pair categorization and synthetic dataset generation techniques in enhancing the performance of LLMs in specific domains.

Introduction:
Parameter-Efficient Fine-Tuning (PEFT) has emerged as a highly effective strategy for refining Large Language Models (LLMs) on domain-specific data, thanks to its reduced computational and time requirements compared to full fine-tuning. This technique has seen widespread adoption in the industry for embedding domain knowledge into LLMs. Platforms like Azure, Google Cloud Platform, Mistral, AWS, and Lamini offer fine-tuning as a service using methods like Low Rank Adaptation (LoRA), making PEFT accessible and user-friendly (Hu et al. 2021). These low code/no code solutions have become popular among developers due to their simplicity. However, the ease of use of these platforms can create a misconception that merely having a large quantity of question-answer (QA) pairs is sufficient for effective domain adaptation. This misunderstanding may lead to the utilization of low-quality datasets, compromising the effectiveness of the fine-tuning process. In this paper, we address this issue by proposing a set of metrics to assess the quality and appropriateness of QA datasets for PEFT. We introduce a novel method for categorizing QA pairs into ‘Factual’ and ‘Conceptual’ classes using a BERT-based classifier. By separating the original dataset based on these categories, we fine-tune two distinct sets of Llama-2 models using LoRA. Our evaluation, conducted with larger models such as GPT-3.5 Turbo, Gemini 1.5 Pro, and Prometheus, reveals that models trained on conceptual datasets significantly outperform those trained on factual datasets. Furthermore, we investigate the effectiveness of two synthetic dataset generation techniques, D-RAG and D-Naive (depicted in Figure 1). Our results show that the D-Naive approach produces superior fine-tuning datasets compared to D-RAG. Additionally, we suggest that while PEFT is highly effective, it may not be optimal for embedding factual information into LLMs. Instead, it excels in instruction-based tasks. To support our assertion, we conducted an experiment using a 1000-sample dataset for sales product recommendation in the data center domain. The results clearly demonstrate that the fine-tuned Llama-2 7B model outperforms the baseline model.

Conclusion:
Our research highlights the paramount importance of the quality and categorization of QA pairs in PEFT, providing profound insights into optimizing the fine-tuning process of LLMs for domain-specific applications. The outcomes of our fine-tuning experiments reveal that PEFT is particularly advantageous for scenarios requiring minimal factual information embedding into LLMs. Notably, the LLM trained on a conceptual dataset significantly outperformed the one trained on a factual dataset. This trend was consistently observed across all three proctor models, underscoring that the sheer volume of QA pairs is insufficient for the effective deployment of PEFT in developing domain-specific QA bots. It is crucial to judiciously select the use-case when leveraging PEFT. Our product recommendation experiment further illustrates that for instruction-based applications, even a dataset as modest as 1,000 prompt-response pairs can yield a high-quality fine-tuned model.

Although our experiments with D-RAG and D-Naive did not demonstrate that the D-RAG technique for synthetic training data generation is more efficient, we believe that this avenue warrants further exploration. The potential of D-RAG to generate more comprehensive and complete answers remains promising. In this particular instance, the technique’s shortcomings were primarily due to the suboptimal performance of the vector database retriever. By addressing these retrieval inefficiencies, future research could unlock the full potential of D-RAG, thereby contributing to more effective and nuanced fine-tuning methodologies for LLMs. Thus, while current findings emphasize the importance of careful use-case selection and QA pair quality in PEFT, they also open the door for continued innovation in synthetic data generation techniques.
""".strip()

In [18]:
qa_pairs = [
    {
        "question": "What is the primary goal of the study presented in 'Beyond QA Pairs'?",
        "answer": "The study aims to assess the effectiveness of Parameter-Efficient Fine-Tuning (PEFT) for embedding domain-specific facts into LLMs, focusing on the impact of QA pair categorization and synthetic dataset generation techniques."
    },
    {
        "question": "How are QA pairs categorized in this study, and what is the purpose of this categorization?",
        "answer": "QA pairs are classified into ‘Factual’ and ‘Conceptual’ categories using a BERT-based classifier. The purpose is to investigate how the nature of QA pairs influences the effectiveness of PEFT."
    },
    {
        "question": "What were the findings regarding models trained on conceptual vs factual QA datasets?",
        "answer": "Models fine-tuned on conceptual datasets consistently outperformed those trained on factual datasets across multiple evaluations."
    },
    {
        "question": "Which synthetic dataset generation techniques are evaluated in this work, and which one performs better?",
        "answer": "The paper evaluates D-RAG and D-Naive synthetic data generation methods. D-Naive outperformed D-RAG in fine-tuning effectiveness, largely due to better retrieval performance."
    },
    {
        "question": "What was the significance of the product recommendation task in the data center domain?",
        "answer": "The task served as a practical demonstration showing that a Llama-2 7B model fine-tuned with PEFT on just 1,000 instruction-based QA pairs significantly outperformed the baseline in generating product recommendations."
    },
    {
        "question": "Why do the authors argue that PEFT may not be optimal for factual embedding?",
        "answer": "The study shows that while PEFT is effective for instruction tuning, it struggles with embedding factual information as effectively, likely due to its limited parameter update scope."
    },
    {
        "question": "What conclusions do the authors draw about the volume versus quality of QA data in PEFT?",
        "answer": "They conclude that sheer quantity of QA pairs is insufficient; quality and conceptual depth are far more critical for successful PEFT."
    },
    {
        "question": "What limitations of D-RAG were identified in the study?",
        "answer": "D-RAG's limitations were attributed to the poor performance of its underlying vector database retriever, leading to suboptimal training data quality."
    },
    {
        "question": "How do the authors suggest future research should improve PEFT for fact embedding?",
        "answer": "Future research should explore improvements in retrieval systems used by D-RAG, and consider more refined QA classification and data generation strategies."
    },
    {
        "question": "What is the key insight this paper contributes to the field of LLM fine-tuning?",
        "answer": "The paper highlights that PEFT's success hinges more on dataset composition—especially the conceptual quality of QA pairs—than on volume alone, and that careful use-case targeting is essential."
    }
]

In [19]:
title = "Beyond QA Pairs: Assessing Parameter-Efficient Fine-Tuning for Fact Embedding in LLMs"

In [None]:
generate_QA_pair("03", 2025, "beyond_qa_pairs", title, qa_pairs, QA_DIR)

'/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer/qa_pairs/qa_pairs_eval/03_2025_beyond_qa_pairs.json'

In [20]:
generate_QA_pair_with_context(
    paper_number="03",
    publication_year=2025,
    short_id="beyond_qa_pairs",
    title=title,
    qa_pairs=qa_pairs,
    context=context,
    save_dir= QA_DIR_WITH_CONTEXT
)

'/content/drive/MyDrive/llm-finetuning-project/llm-finetuning-summarizer/qa_pairs/qa_pairs_eval_with_context/03_2025_beyond_qa_pairs.json'

## Step 4: Combining All QA Pairs

In [None]:
qa_folder = './qa_pairs/qa_pairs_eval'
output_path = './data/eval.jsonl'

all_qa_pairs = []

for filename in os.listdir(qa_folder):
    if filename.endswith('.json'):
        with open(os.path.join(qa_folder, filename), 'r') as f:
            data = json.load(f)
            for pair in data['qa_pairs']:
                all_qa_pairs.append({
                    "question": pair['question'].strip(),
                    "answer": pair['answer'].strip()
                })

In [None]:
all_qa_pairs[0]

{'question': 'What is the primary innovation introduced by the LoRI method for parameter-efficient fine-tuning?',
 'answer': 'LoRI introduces a novel approach that freezes the projection matrices A as random projections and sparsifies the matrices B using task-specific masks, thereby significantly reducing trainable parameters while minimizing cross-task interference.'}

In [None]:
with open(output_path, 'w') as f:
    for pair in all_qa_pairs:
        json.dump(pair, f)
        f.write('\n')

In [None]:
df = pd.read_json('data/eval.jsonl', lines=True)

In [None]:
df

Unnamed: 0,question,answer
0,What is the primary innovation introduced by t...,LoRI introduces a novel approach that freezes ...
1,How does LoRI reduce the number of trainable p...,LoRI reduces the number of trainable parameter...
2,Why is sparsity in matrix B important in LoRI?,Sparsity in matrix B enables LoRI to retain on...
3,How does LoRI improve the process of merging a...,LoRI enables more effective adapter merging by...
4,What mechanism does LoRI use to mitigate catas...,LoRI mitigates catastrophic forgetting by appl...
5,On what benchmark did LoRI with 90% sparsity i...,LoRI with 90% sparsity in B outperformed LoRA ...
6,How does LoRI compare to full fine-tuning and ...,LoRI matches or outperforms full fine-tuning a...
7,What types of tasks were used to evaluate LoRI...,"LoRI was evaluated on a diverse set of tasks, ..."
8,What potential future directions do the author...,The authors suggest exploring structured spars...
9,What is the broader significance of LoRI in th...,"LoRI provides a lightweight, modular, and scal..."


## Step 5: Combining All QA Pairs With Context

In [23]:
# Define paths
qa_folder = './qa_pairs/qa_pairs_eval_with_context'  # This should contain your new JSON files with 'context'
output_path = './data/eval_with_context.jsonl'

In [24]:
# Aggregate all QA pairs
all_qa_pairs = []

for filename in os.listdir(qa_folder):
    if filename.endswith('.json'):
        with open(os.path.join(qa_folder, filename), 'r') as f:
            data = json.load(f)
            for pair in data['qa_pairs']:
                all_qa_pairs.append({
                    "question": pair['question'].strip(),
                    "answer": pair['answer'].strip(),
                    "context": pair['context'].strip()
                })

# Save to .jsonl file
with open(output_path, 'w') as f:
    for pair in all_qa_pairs:
        json.dump(pair, f)
        f.write('\n')

print(f"uccessfully saved {len(all_qa_pairs)} QA pairs with context to {output_path}")

uccessfully saved 30 QA pairs with context to ./data/eval_with_context.jsonl
