# Parameter-Efficient Fine-Tuning with QLoRA for Legal Assistance

In legal technology, natural language processing models are increasingly used to build automated legal assistants. **To improve the effectiveness and accuracy of such assistants**, this notebook fine-tunes a pre-trained language model with QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient method that trains small low-rank adapters on top of a 4-bit quantized base model. The process adapts the pre-trained model to **better handle specific legal contexts and tasks**, improving its ability to assist users with nuanced legal inquiries.

- **Objective:** The primary objective of this fine-tuning exercise is to adapt a pre-trained chat model (here, Llama-2-7b-chat) to the domain of legal assistance using QLoRA. By **fine-tuning the model with a specialized dataset**, we aim to improve its proficiency in understanding, generating, and advising on various legal issues.

- **Dataset Preparation:** The dataset for this fine-tuning exercise has been carefully curated to reflect a range of common and complex legal scenarios. It includes a variety of examples across different areas of law, including but not limited to:

  - Contract Law
  - Family Law
  - Corporate Law
  - Intellectual Property Law
  - Estate Planning
  - Real Estate Law
  - Immigration Law
  - Consumer Protection Law
  - Criminal Law
  - Environmental Law

Each example is structured in a specific format to provide the model with clear instructions and responses. The dataset includes:

- **System Prompt:** A brief description of the model's role and expertise.
- **User Prompt:** A detailed legal question or scenario requiring a specific response.
- **Model Answer:** A well-structured and accurate response tailored to the legal query.

- **Format:** The dataset entries follow this format:

```
<s>[INST] <<SYS>>
System prompt
<</SYS>>

User prompt [/INST] Model answer </s>

```
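
The template above can be assembled programmatically. The following is a minimal sketch, assuming the system prompt, user prompt, and model answer are available as plain strings; the helper name `build_prompt` and the sample strings are hypothetical and not part of the original notebook.

```python
def build_prompt(system_prompt: str, user_prompt: str, answer: str = "") -> str:
    """Assemble one example in the Llama-2 chat template shown above.

    Hypothetical helper for illustration; the notebook itself reads
    pre-formatted lines from a text file.
    """
    text = (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_prompt} [/INST]"
    )
    # For training examples, append the answer and the end-of-sequence marker.
    if answer:
        text += f" {answer} </s>"
    return text


example = build_prompt(
    "You are a legal assistant specializing in contract law.",
    "Draft a limitation of liability clause.",
    "[model answer goes here]",
)
print(example)
```

Calling `build_prompt` without an answer yields a prompt that ends at `[/INST]`, which is the form one would typically pass to the model at inference time.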
The dataset features a diverse set of legal queries, ranging from drafting legal clauses to explaining complex legal principles. Here are a few types of examples included:

- **Contract Clauses:** Drafting specific clauses such as limitation of liability.

- **Custody Arrangements:** Explaining factors considered in child custody decisions.

- **Business Formation:** Outlining steps to form a limited liability company (LLC).

- **Copyright Registration:** Detailing requirements for obtaining a copyright registration.

- **Estate Planning:** Comparing wills and living trusts.
- **Lease Agreements:** Identifying key components of a residential lease agreement.
- **Immigration Processes:** Describing the process for applying for a family-based green card.
- **Consumer Rights:** Outlining rights under the Fair Credit Reporting Act (FCRA).
- **Criminal Rights:** Explaining Miranda rights and their application.
- **Environmental Regulations:** Summarizing key provisions of the Clean Air Act.

**Purpose and Benefits:** Fine-tuning the model with QLoRA on this dataset is intended to yield more accurate, relevant, and contextually appropriate responses to a wide range of legal inquiries. The tailored model should better support users in navigating legal issues, understanding their rights and obligations, and making informed decisions.

By investing in this fine-tuning process, we aim to enhance the model's capability to serve as a reliable and knowledgeable legal assistant, ultimately improving the quality and accessibility of legal support.


In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 xformers

The next cell configures and loads a model for **causal language modeling** using the transformers library from Hugging Face. The configuration **targets a resource-constrained environment** by using 4-bit quantization. The key components and settings are:

- **Importing Required Libraries:** The script begins by importing two classes and one library:

  - **AutoModelForCausalLM from the transformers module:** This class loads a pre-trained model for causal language modeling, a task in which the model generates text one token at a time, conditioned on the preceding input.

  - **BitsAndBytesConfig from the same module:** This class configures the quantization settings, which let the model **store its weights at reduced precision**, saving memory.

  - **torch:** This is the PyTorch library, which provides tensor computations and **enables model training on GPUs.**

**Configuring the bitsandbytes (bnb) Settings:** The bnb_config object is an instance of the BitsAndBytesConfig class. The configuration includes several important parameters:

- **load_in_4bit=True:** This loads the model weights in 4-bit precision, which drastically reduces the memory footprint while keeping output quality at an acceptable level.

- **bnb_4bit_quant_type="nf4":** This parameter sets the quantization type to "nf4" (4-bit NormalFloat), a 4-bit data type designed for weights that are approximately normally distributed, which helps preserve accuracy at low precision.

- **bnb_4bit_compute_dtype=torch.float16:** The 4-bit weights are dequantized for each operation, and this setting makes those computations run in 16-bit floating-point precision (float16). This balances speed and memory use.

- **bnb_4bit_use_double_quant=False:** This setting disables double quantization, a technique that also quantizes the per-block quantization constants to save a little more memory at the cost of some extra computation. It is left at False here, which keeps the setup simpler and uses slightly more memory.
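
To make the idea of reduced-precision storage concrete, here is a minimal sketch of block-wise absmax quantization in plain Python. It is a simplified illustration and not the bitsandbytes implementation: it uses evenly spaced integer levels, whereas NF4 uses a non-uniform set of 16 levels. The function names `quantize_block` and `dequantize_block` and the sample weights are hypothetical.

```python
def quantize_block(values, n_bits=4):
    """Map a block of floats to small signed integers using absmax scaling."""
    q_max = 2 ** (n_bits - 1) - 1  # 7 for 4-bit signed codes
    absmax = max(abs(v) for v in values)
    # One scale (quantization constant) is stored per block.
    scale = absmax / q_max if absmax > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate floats from the integer codes and the block scale."""
    return [c * scale for c in codes]


# Hypothetical block of weights, for illustration only.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.45, 0.02]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("max reconstruction error:", max_error)
```

Each weight is stored as a 4-bit code plus a shared per-block scale, and the rounding error per weight is bounded by half the scale. The per-block scales are the "quantization constants" that double quantization would compress further.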

Overall, this configuration reduces the memory needed to load and run the model, which matters most where memory and compute are limited. Storing weights in 4 bits while computing in float16 balances output quality against resource use, and it helps make fine-tuning a 7B-parameter model feasible on a single GPU.
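
A rough back-of-the-envelope estimate illustrates the savings. The sketch below assumes a round figure of 7 billion parameters, a quantization block size of 64 with one 32-bit constant per block, and (for double quantization) 8-bit constants grouped in blocks of 256, following the scheme described in the QLoRA paper. It counts weight storage only and ignores activations, optimizer state, and any layers kept in higher precision, so actual usage will differ.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9    # assumed round figure for a "7B" model
block_size = 64   # assumed number of weights sharing one quantization constant

# Overhead of the quantization constants, in bits per parameter.
plain_overhead = 32 / block_size                             # 0.5
double_overhead = 8 / block_size + 32 / (block_size * 256)   # about 0.127

fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4 + plain_overhead)
nf4_dq_gb = weight_memory_gb(n_params, 4 + double_overhead)

print(f"float16 weights:          {fp16_gb:.2f} GB")
print(f"4-bit weights:            {nf4_gb:.2f} GB")
print(f"4-bit + double quant:     {nf4_dq_gb:.2f} GB")
print(f"double quant saves about  {plain_overhead - double_overhead:.3f} bits/param")
```

Under these assumptions the weights shrink from about 14 GB in float16 to under 4 GB in 4-bit form, and double quantization would trim a further few hundred megabytes.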

In [None]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Load the model weights with 4-bit quantization to reduce memory usage.
    bnb_4bit_quant_type="nf4",  # Use the 4-bit NormalFloat (NF4) data type for the quantized weights.
    bnb_4bit_compute_dtype=torch.float16,  # Dequantize to float16 for computation, which is fast on compatible GPUs.
    bnb_4bit_use_double_quant=False,  # Do not quantize the quantization constants; True would save a little more memory.
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf",  # Pre-trained chat model to load from the Hugging Face Hub.
    quantization_config=bnb_config,  # Apply the 4-bit quantization settings defined above.
    device_map={"": 0},  # Place the whole model on GPU 0; the empty-string key refers to the root module.
)

model.config.use_cache = False  # Disable the key/value cache; it only speeds up generation and is typically turned off for training.
model.config.pretraining_tp = 1  # Tensor-parallel degree used in pretraining; 1 selects the standard (non-sliced) linear-layer computation.


In [None]:
from transformers import AutoTokenizer

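# Load the tokenizer that matches the base model.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf", trust_remote_code=True)
# The Llama-2 tokenizer defines no padding token, so reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token
# Pad on the right so that real tokens stay at the start of each sequence during training.
tokenizer.padding_side = "right"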




In [None]:
from transformers import pipeline

llama2_pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=500)



In [None]:
dataset_path = "/content/dataset.txt"

# Each line of the file holds one fully formatted training example.
with open(dataset_path, 'r', encoding='utf-8') as f:
    examples = f.read().splitlines()

In [None]:
examples[1]

'<s>[INST] <<SYS>>You are a legal assistant specializing in intellectual property law. Your task is to draft a copyright license agreement that includes clauses for scope of use, duration, and payment terms.<</SYS>>Draft a copyright license agreement that includes scope of use, duration, and payment terms.</INST>Sure! Here is a template for a Copyright License Agreement: COPYRIGHT LICENSE AGREEMENT This Copyright License Agreement ("Agreement") is made between [Licensor Name], with an address at [Address], and [Licensee Name], with an address at [Address]. 1. Scope of Use: The Licensor grants the Licensee a non-exclusive, non-transferable license to use the copyrighted material titled [Title of Work] for [specific use]. 2. Duration: This license shall commence on the Effective Date and continue for [X years/months]. 3. Payment Terms: The Licensee agrees to pay the Licensor [Amount] as a one-time fee or [Specify Payment Schedule]. IN WITNESS WHEREOF, the parties have executed this Agree
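
The example printed above closes the instruction with `</INST>`, whereas the template described earlier uses `[/INST]`. If that difference is unintentional, a quick check over the loaded lines can flag entries that deviate from the template. The sketch below is one hedged way to do it; `REQUIRED_MARKERS`, `missing_markers`, and the two sample strings are hypothetical names and data introduced for illustration.

```python
# Markers every example is expected to contain, per the template described earlier.
REQUIRED_MARKERS = ["<s>[INST]", "<<SYS>>", "<</SYS>>", "[/INST]"]


def missing_markers(example: str) -> list:
    """Return the template markers that do not appear in the example."""
    return [m for m in REQUIRED_MARKERS if m not in example]


# Hypothetical samples: one follows the template, one uses </INST> instead.
sample_ok = "<s>[INST] <<SYS>>\nRole\n<</SYS>>\n\nQuestion [/INST] Answer </s>"
sample_bad = "<s>[INST] <<SYS>>Role<</SYS>>Question</INST>Answer"

for name, sample in [("sample_ok", sample_ok), ("sample_bad", sample_bad)]:
    print(name, "missing:", missing_markers(sample))
```

In the notebook, the same function could be applied to each entry of `examples` to list the line numbers that need correcting before training.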

In [None]:
from datasets import Dataset, DatasetDict

ds = Dataset.from_dict({"text": examples})

In [None]:
ds

Dataset({
    features: ['text'],
    num_rows: 11
})

In [None]:
ds["text"][2]

'<s>[INST] <<SYS>>You are a legal assistant specializing in real estate law. Your task is to draft a lease agreement that includes clauses for rent, duration, and maintenance responsibilities.<</SYS>>Draft a lease agreement that includes rent, duration, and maintenance responsibilities.</INST>Sure! Here is a template for a Lease Agreement: LEASE AGREEMENT This Lease Agreement ("Agreement") is entered into between [Landlord Name], with an address at [Address], and [Tenant Name], with an address at [Address]. 1. Rent: The Tenant agrees to pay the Landlord a monthly rent of [Amount], due on the [Day] of each month. 2. Duration: This lease shall commence on [Start Date] and end on [End Date], unless extended or terminated earlier in accordance with this Agreement. 3. Maintenance Responsibilities: The Tenant shall be responsible for routine maintenance and repair of the leased premises, including [specific responsibilities]. The Landlord shall be responsible for major repairs not caused by 