## 📝 Project Objective

The goal of this project was to fine-tune a large language model (Mistral-7B-Instruct-v0.2) to generate Statements of Purpose (SOPs) in a specific author's writing style. The model was also trained to explain its stylistic choices, highlighting why certain words or phrases were used to match the author's tone and style.

While the model did not explicitly generate the reasoning explanations during inference, it successfully produced SOPs that mirrored the target author's writing style. The outputs sounded natural, coherent, and stylistically aligned with the author's typical language patterns.

Furthermore, this fine-tuned model (trained on both SOP + explanation data) **outperformed** a baseline model that was trained **only on SOPs without explanations** — indicating that training on richer, dual-output data improved the quality of SOP generation.

✅ This notebook walks through the entire process:  
- Formatting the dataset  
- Fine-tuning with QLoRA (4-bit quantized base model plus LoRA adapters)  
- Loading the fine-tuned model  
- Running inference with new prompts


### Setting Up the Notebook

Install the packages needed to fine-tune LLMs with Hugging Face tools, parameter-efficient methods like LoRA, and model quantization with bitsandbytes.

In [None]:
!pip install transformers datasets trl peft accelerate bitsandbytes



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Hugging Face Access

Make sure you have access to Hugging Face's `mistralai/Mistral-7B-Instruct-v0.2` model. You need a Hugging Face account and an access token with the correct permissions: go to "Access Tokens" → "Edit Access Tokens" → "Repository Permissions" and confirm that the token can access `mistralai/Mistral-7B-Instruct-v0.2`. Then log in using:

In [None]:
from huggingface_hub import login
login()


### 📚 Dataset Formatting for Fine-Tuning

This code processes a structured `.jsonl` dataset containing user prompts and assistant responses (SOP + reasoning) and reformats it into a flat text format suitable for LLM fine-tuning.

- **Reads** the input `.jsonl` file line-by-line.
- **Extracts** the user prompt, SOP, and reasoning from each record's `messages` list.
- **Formats** the extracted content using plain-text marker tags:
  - `[INST]` and `[/INST]` for the user prompt
  - `[SOP]` and `[/SOP]` for the statement of purpose
  - `[REASONING]` and `[/REASONING]` for the explanation
- **Saves** the final formatted output into a new `.jsonl` file, with one JSON object per line containing a `text` field.

#### 📄 Example Output Format:

[INST] Write an SOP for Computer Science. [/INST] [SOP] I have always been passionate about... [/SOP] [REASONING] The SOP emphasizes personal experiences related to the field... [/REASONING]


> This formatting helps the model learn both SOP generation and explanation generation during fine-tuning.

✅ Output is saved to: `/content/drive/MyDrive/manasa_sop_formatted_updated.jsonl`


In [None]:
import json

input_path = "/content/drive/MyDrive/manasa_chatml_dataset_reformatted.jsonl"
output_path = "/content/drive/MyDrive/manasa_sop_formatted_updated.jsonl"

def convert_message_to_flat_text(messages):
    user_msg = None
    assistant_msg = None

    for m in messages:
        if m["role"] == "user":
            user_msg = m["content"]
        elif m["role"] == "assistant":
            assistant_msg = m["content"]

    if isinstance(assistant_msg, dict) and "sop" in assistant_msg and "explanation" in assistant_msg:
        sop = assistant_msg["sop"].strip()
        reasoning = assistant_msg["explanation"].strip()

        return f"<s>[INST] {user_msg.strip()} [/INST]\n[SOP]\n{sop}\n[/SOP]\n[REASONING]\n{reasoning}\n[/REASONING]</s>"

    return None

# Process the input file line-by-line
with open(input_path, "r", encoding="utf-8") as infile, open(output_path, "w", encoding="utf-8") as outfile:
    for line in infile:
        data = json.loads(line)
        flat_text = convert_message_to_flat_text(data["messages"])
        if flat_text:
            json.dump({"text": flat_text}, outfile)
            outfile.write("\n")

print(f"✅ Converted dataset saved to: {output_path}")

✅ Converted dataset saved to: /content/drive/MyDrive/manasa_sop_formatted_updated.jsonl
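
As a quick sanity check on the converted file, the sketch below parses one formatted `text` value back into its three parts. The `parse_training_text` helper and the sample string are hypothetical, introduced only for illustration; the pattern assumes the exact layout written by `convert_message_to_flat_text` above.

```python
import re

# Mirrors the layout written by convert_message_to_flat_text:
# <s>[INST] prompt [/INST]\n[SOP]\n...\n[/SOP]\n[REASONING]\n...\n[/REASONING]</s>
TRAINING_TEXT_PATTERN = re.compile(
    r"^<s>\[INST\] (?P<prompt>.+?) \[/INST\]\n"
    r"\[SOP\]\n(?P<sop>.+?)\n\[/SOP\]\n"
    r"\[REASONING\]\n(?P<reasoning>.+?)\n\[/REASONING\]</s>$",
    re.DOTALL,
)

def parse_training_text(text):
    """Return a dict with prompt, sop, and reasoning, or None if the layout differs."""
    match = TRAINING_TEXT_PATTERN.match(text)
    return match.groupdict() if match else None

# Hypothetical sample in the same layout as the training data.
sample = (
    "<s>[INST] Write an SOP for Computer Science. [/INST]\n"
    "[SOP]\nI have always been passionate about...\n[/SOP]\n"
    "[REASONING]\nThe SOP emphasizes personal experiences...\n[/REASONING]</s>"
)
parsed = parse_training_text(sample)
print(parsed["prompt"])
```

Running a check like this over every line of the output file would flag any record whose sections are missing or out of order before training starts.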


### 🚀 Fine-Tuning Mistral-7B-Instruct-v0.2 with QLoRA and LoRA Adapters

This section fine-tunes the `mistralai/Mistral-7B-Instruct-v0.2` model on the formatted SOP + reasoning dataset using **QLoRA** (quantized 4-bit model loading) and **LoRA adapters** (efficient fine-tuning).

#### Steps:

- **Load Dataset**: Read the processed `.jsonl` file.
- **Load Tokenizer**: Initialize the tokenizer from the Mistral model.
- **Setup Quantization (QLoRA)**: Configure 4-bit quantization using `BitsAndBytesConfig`.
- **Load Base Model**: Load the Mistral model with quantization enabled.
- **Prepare Model**: Prepare the model for k-bit (4-bit) training.
- **Attach LoRA Adapters**: Inject lightweight LoRA adapters into the model for parameter-efficient fine-tuning.
- **Tokenize Dataset**: Preprocess the dataset by tokenizing and padding each text example.
- **Data Collator**: Set up a data collator for language modeling without masked language modeling (MLM).
- **Define Training Arguments**: Specify batch size, number of epochs, logging, and save strategy.
- **Initialize Trainer**: Use `trl.SFTTrainer` for supervised fine-tuning with the prepared settings.
- **Train the Model**: Start the training process.
- **Save Adapter Weights**: Save only the LoRA adapter weights separately after training.

✅ The fine-tuned LoRA adapter is saved to: `/content/drive/MyDrive/mistral-v02-sop-explainer-lora-adapter`
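
To make the batch settings concrete, here is a rough worked calculation of the effective batch size and the number of optimizer updates per epoch. It assumes a single GPU and takes the example count (559) from the `Map: 0/559` line in the training output; the exact step count the Trainer reports may differ slightly depending on how it handles the final partial accumulation window.

```python
import math

num_examples = 559            # from the "Map: 0/559" line in the training output
per_device_batch_size = 2     # per_device_train_batch_size
grad_accum_steps = 4          # gradient_accumulation_steps
num_gpus = 1                  # assumption: a single Colab GPU

# Gradients from several small batches are accumulated before each update.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_gpus
updates_per_epoch = math.ceil(num_examples / effective_batch_size)

print(effective_batch_size)   # 8
print(updates_per_epoch)      # 70 (approximate, see note above)
```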

In [None]:
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from trl import SFTTrainer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# --- Model and dataset config ---
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
dataset_path = "/content/drive/MyDrive/manasa_sop_formatted_updated.jsonl"
adapter_output_path = "/content/drive/MyDrive/mistral-v02-sop-explainer-lora-adapter"

# --- Load dataset ---
dataset = load_dataset("json", data_files=dataset_path, split="train")

# --- Tokenizer ---
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# --- Quantization config for QLoRA ---
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# --- Load model with quantization ---
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

# --- Prepare model for QLoRA ---
model = prepare_model_for_kbit_training(model)

# --- Add LoRA adapters ---
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

# --- Tokenize dataset ---
def preprocess_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=2048,
        padding="max_length"
    )

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# --- Data collator ---
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# --- Training arguments ---
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/mistral-v02-sop-explainer-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
)

# --- Final Trainer ---
trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_dataset,
    args=training_args,
    data_collator=data_collator,
)

# --- Train!
trainer.train()

# ✅ Save LoRA adapter weights separately for inference
model.save_pretrained(adapter_output_path)
print(f"✅ LoRA adapter saved to: {adapter_output_path}")



Map:   0%|          | 0/559 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/559 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Step | Training Loss |
|------|---------------|
| 10 | 3.5699 |
| 20 | 3.2558 |
| 30 | 3.022 |
| 40 | 2.9305 |
| 50 | 2.8842 |
| 60 | 2.7724 |
| 70 | 2.7727 |
| 80 | 2.7358 |
| 90 | 2.713 |
| 100 | 2.7129 |




✅ LoRA adapter saved to: /content/drive/MyDrive/mistral-v02-sop-explainer-lora-adapter


### 🎯 Load Fine-Tuned Mistral-7B Model with LoRA Adapter for Inference

This section loads the base Mistral-7B-Instruct-v0.2 model and attaches the fine-tuned LoRA adapter trained on SOP + reasoning data.

#### Steps:

- **Load Tokenizer**: Load the tokenizer from the original Mistral model.
- **Load Base Model**: Load the base Mistral-7B model onto the appropriate device.
- **Attach LoRA Adapter**: Load the trained LoRA adapter weights and apply them on top of the base model.
- **Prepare for Inference**: Set the model to evaluation mode (`model.eval()`).

✅ After this step, the model is ready to generate SOPs and explanations based on new user prompts.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_path = "/content/drive/MyDrive/mistral-v02-sop-explainer-lora-adapter"

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto", torch_dtype="auto")

# Load fine-tuned LoRA
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): Linear
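
The shapes in the printout above give a feel for how small the adapter is. The following back-of-the-envelope count covers one wrapped `q_proj` layer and uses only the dimensions shown there (4096 inputs, rank 8, 4096 outputs); it is an illustration, not a report of the total trainable parameter count.

```python
in_features = 4096    # base_layer in_features from the printout
out_features = 4096   # base_layer out_features from the printout
rank = 8              # r=8 in LoraConfig

lora_a_params = in_features * rank     # lora_A: Linear(4096 -> 8), no bias
lora_b_params = rank * out_features    # lora_B: Linear(8 -> 4096), no bias
adapter_params = lora_a_params + lora_b_params
base_params = in_features * out_features  # frozen q_proj weight matrix

print(adapter_params)                  # 65536
print(adapter_params / base_params)    # 0.00390625
```

In other words, the trainable LoRA matrices for this layer are about 0.4% of the size of the frozen weight they adapt.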

### 🧠 Inference: Generate SOP + Reasoning with Fine-Tuned Model

This section uses the loaded model to generate an SOP and its stylistic reasoning based on a new user prompt.

#### Steps:

- **Define Prompt**: Create a prompt formatted with `[INST] ... [/INST]`.
- **Tokenize Prompt**: Convert the text into input tokens suitable for the model.
- **Generate Output**: Use the model to generate a response, with sampling parameters for diversity.
  - `max_new_tokens=700`: Limit generation to 700 tokens.
  - `temperature=0.7`: Control randomness (higher = more creative).
  - `top_p=0.9`: Nucleus sampling for diverse but focused outputs.
  - `do_sample=True`: Enable sampling (rather than greedy decoding).
- **Decode and Print**: Convert generated tokens back into human-readable text and display the output.

✅ The goal is for the model to generate both an SOP and an explanation of its stylistic choices.
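
To illustrate what `temperature` and `top_p` do, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. The `sample_top_p` function and the toy values are hypothetical; this is not the Transformers implementation, only the same idea in simplified form.

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample a token index using temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of top tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Draw one token from the kept set, weighted by probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [4.0, 3.5, 1.0, -2.0]  # hypothetical scores for a 4-token vocabulary
print(sample_top_p(toy_logits))
```

With these toy logits, the two highest-scoring tokens already cover more than 90% of the probability mass, so the two unlikely tokens are never sampled.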

In [None]:
import torch

# Define your prompt
prompt="<s>[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST]"

# Tokenize the prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate output
output = model.generate(
    **inputs,
    max_new_tokens=700,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id  # stop at end of text
)

# Decode and print the output
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST]Statement of Purpose The world today is governed by data, with each individual, company and government entity striving to make the most of it. Data is the new oil of the world and is increasingly becoming more valuable as the world becomes more digital. In my opinion, the most significant leap humanity has ever made towards progress was the digital revolution. It brought about a paradigm shift in the way we lived and operated, and it is only fair to accept that the role of data in making it happen is undeniably significant. With the world becoming more data driven, it is only fair to accept that data science has become the most sought after and respected profession of today. I have always admired the power of data and the impact it has had on businesses, governments and individuals. I have always believed that data science is an intriguing field of study and work, with the potential to br

### 🧠 Inference with Example Guidance (Few-Shot Prompting)

In this setup, we give the model an example input that includes:
- A **sample SOP** for a different field (Sustainable Architecture).
- A **sample reasoning explanation** describing stylistic choices made.

Then, in the same prompt, we ask the model to:
- **Write a new SOP** for a different topic (Data Science).
- **Explain** its stylistic choices for the newly generated SOP.

This is **few-shot prompting** with a single demonstration (strictly, one-shot): the example in the prompt guides the model's generation style and expected output structure.

✅ The goal is to encourage the model to output both the SOP and an explanation that mimics the style shown in the example.
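
For prompts with more than one demonstration, the layout can be assembled programmatically. The sketch below uses a hypothetical `build_few_shot_prompt` helper that mirrors the layout of the hand-written prompt in this section (example answer on the lines after `[/INST]`, a blank line between turns); it does not insert `</s>` between turns, which is a choice rather than a requirement.

```python
def build_few_shot_prompt(examples, new_instruction):
    """Build a multi-turn [INST] prompt from (instruction, answer) example pairs."""
    parts = ["<s>"]
    for instruction, answer in examples:
        # Each demonstration: instruction, then the example answer, then a blank line.
        parts.append(f"[INST] {instruction} [/INST]\n{answer}\n\n")
    # The final instruction is left open for the model to complete.
    parts.append(f"[INST] {new_instruction} [/INST]\n")
    return "".join(parts)

# Hypothetical demonstration pair, abbreviated.
examples = [
    (
        "Write me an SOP for a Master's in Sustainable Architecture. "
        "Then explain your stylistic choices.",
        "STATEMENT OF PURPOSE\nGrowing up surrounded by the richness of nature...\n"
        "🧠 Explanation:\n- Uses descriptive, poetic tone.",
    )
]
prompt = build_few_shot_prompt(
    examples,
    "Write me an SOP for a Master's in Data Science. Then explain your stylistic choices.",
)
print(prompt)
```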


In [None]:
prompt = """<s>[INST] Write me an SOP for a Master's in Sustainable Architecture. Then explain your stylistic choices. [/INST]
STATEMENT OF PURPOSE
Growing up surrounded by the richness of nature...
🧠 Explanation:
- Uses descriptive, poetic tone to reflect personal connection to sustainability.
- Follows chronological structure with strong personal motivation.

[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST]
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=700,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] Write me an SOP for a Master's in Sustainable Architecture. Then explain your stylistic choices. [/INST]
STATEMENT OF PURPOSE
Growing up surrounded by the richness of nature...
🧠 Explanation:
- Uses descriptive, poetic tone to reflect personal connection to sustainability.
- Follows chronological structure with strong personal motivation.

[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST]
STATEMENT OF PURPOSE
Growing up, I have always believed in the power of data and its potential to bring about change. It was my grandmother, an astrologer, who instilled the power of data in me at a young age. She used to say, "The stars and the planets, their positioning, their alignment, and the way they interacted with one another, all of them have their own unique stories to tell. If one is able to decipher them, it could bring about a change that could be beneficial for the future." She was an excellent astrologer, always able to predict th

### 📝 Inference with Explicit Prompt for SOP + Reasoning

In this approach, we prompt the model to generate both:
- A Statement of Purpose (SOP) for a Master's in Data Science.
- An explanation for the stylistic choices made while writing the SOP.

The prompt explicitly includes "🧠 Explanation:" to signal the model to provide a reasoning section after the SOP.

#### Key details:
- **Prompt uses `[INST] ... [/INST]`** for consistency with the training format, followed by the `🧠 Explanation:` cue.
- **🧠 Explanation:** is placed right after `[/INST]`, so the model's response begins under that cue; the intent is to encourage it to output reasoning naturally.
- **No example SOP is provided beforehand**, so this is a *zero-shot* prompt that relies on instructions only.

✅ This setup is designed to test if the model can generate both outputs based purely on instruction without needing example demonstrations.

In [None]:
prompt = "<s>[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST] 🧠 Explanation:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=700,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] Write me an SOP for a Master's in Data Science. Then explain your stylistic choices. [/INST] 🧠 Explanation:

The following is a statement of purpose for pursuing a masters degree in Computer Science with a focus on Artificial Intelligence.

Growing up, I have always had a fascination for how machines could learn and adapt to their environment. This interest began at an early age, when I was introduced to a computer. I was amazed to see how the machine could calculate and process information in seconds. This interest grew stronger with time and developed into a passion. Today, I am proud to say that I have a strong foundation in computer science, a strong academic record and a clear vision of what I want to achieve in the future. I have always wanted to be a part of the ever growing field of computers, and more importantly, the advancements in artificial intelligence. I believe that the ability to create a machine that can learn and adapt to its environment is a monumental achiev

### 🧩 Inference with JSON-Formatted Output Request

In this approach, the model is prompted to generate a structured response in **JSON format**, containing:
- `"sop"`: the Statement of Purpose text.
- `"explanation"`: reasoning describing how the writing style matches the target author's style.

#### Key details:
- **Instruction-based prompt**: Clearly asks the model to produce SOP + explanation in a specific JSON format.
- **No examples provided**: Relies fully on clear instruction-following.
- **Sampling settings**:
  - `temperature=0.7`: Balanced creativity.
  - `top_p=0.9`: Diverse but focused generation.
  - `repetition_penalty=1.2`: Discourages repetitive outputs.
- **Max tokens set to 1000** to allow enough space for both the SOP and explanation.

✅ This format makes it easier to **parse**, **analyze**, and **evaluate** outputs systematically for downstream tasks.
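
Because the decoded response also contains the prompt text, parsing needs to locate the JSON object inside a longer string. The sketch below shows one way to do that with the standard library; `extract_json_object` and both sample strings are hypothetical. It returns `None` when generation stops before the closing brace.

```python
import json

def extract_json_object(response):
    """Return the first JSON object found in the text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _ = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response.find("{", start + 1)
    return None

complete = 'Format as JSON. { "sop": "I have always...", "explanation": "Short sentences." }'
truncated = 'Format as JSON. { "sop": "I have always been captivated by'

print(extract_json_object(complete))
print(extract_json_object(truncated))  # None: no complete JSON object to parse
```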

In [None]:
prompt="Write an SOP for Masters in Architecture. Include both the SOP itself and an explanation of how the writing style matches my friend's style. Format your response as JSON with sop and explanation fields."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=1000,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Write an SOP for Masters in Architecture. Include both the SOP itself and an explanation of how the writing style matches my friend's style. Format your response as JSON with sop and explanation fields. { "sop": "I have always been captivated by the intricate ways in which structures and buildings have evolved over centuries. From the humble stone houses of ancient civilisations to the towering steel and glass structures of the modern era, architecture has always fascinated me. It is a discipline that not only requires an eye for aesthetics but also a mind for functionality and a heart for the people who will inhabit the spaces. I have always been impressed by the way the structures of the past have stood the test of time while the modern structures have kept up with the changing times. I have always wanted to contribute to this field and make a mark of my own by creating structures that are functional, beautiful and sustainable. I want to learn the intricacies of the subject, from the

### 🛠️ Inference with Output Parsing into SOP and Reasoning Sections

In this approach:
- The model is instructed to format its output with `[SOP]...[/SOP]` and `[REASONING]...[/REASONING]` sections.
- After generation, we **automatically parse** the response into:
  - 📄 SOP content
  - 🧠 Reasoning explanation
- **Error handling** is included to catch and display malformed outputs if the model does not follow the format exactly.

#### Key details:
- **Prompt formatting**: Clear use of `[INST]`, `[SOP]`, `[REASONING]`, and `[/INST]` markers.
- **Sampling settings** for creative but controlled output.
- **Graceful fallback**: If parsing fails, the full raw output is printed for manual inspection.

✅ This setup ensures that structured outputs are parsed cleanly for analysis, while still being robust to occasional model mistakes.

In [None]:
from transformers import AutoTokenizer
import torch

# Load tokenizer and set pad token
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.eos_token

prompt = (
    "<s>[INST] Write me an SOP for a Master's in Data Science. "
    "Then explain your stylistic choices. Format with [SOP] and [REASONING] sections. [/INST]</s>"
)

inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")

output = model.generate(
    **inputs,
    max_new_tokens=700,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id
)

response = tokenizer.decode(output[0], skip_special_tokens=True)

if "[REASONING]" in response:
    try:
        sop = response.split("[SOP]")[1].split("[/SOP]")[0].strip()
        reasoning = response.split("[REASONING]")[1].split("[/REASONING]")[0].strip()
        print("📄 SOP:\n", sop)
        print("\n🧠 Reasoning:\n", reasoning)
    except IndexError:
        print("⚠️ Output malformed. Full response:\n", response)
else:
    print("⚠️ Could not find [REASONING] section. Output was:\n", response)


📄 SOP:
 and [REASONING] sections. [/INST][_SOP_]

🧠 Reasoning:
 sections. [/INST][_SOP_]
[SOP] Statement of Purpose The data science community has been growing at an exponential rate, with businesses leveraging the power of insights to gain a competitive edge in the marketplace. In my pursuit of learning more about this field, I have come across several interesting opportunities that can help me contribute towards creating meaningful outcomes while ensuring they are delivered in real time. With technology evolving every day, it is important to stay updated on tools and technologies that can enable us to create value in real time. My passion for data analytics and machine learning stems from watching how big corporations use these tools to understand their consumers better and serve them accordingly. It also fascinated me to see how predictive models can be used to forecast demand or supply patterns based on historical trends, which could be useful information for companies looking to s
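
In the output above, the "📄 SOP" section begins with text from the prompt. The decoded response still contains the instruction, which itself mentions `[SOP]` and `[REASONING]`, so splitting on those tags matches the prompt first. A minimal sketch of a parser that looks only at the text after the last `[/INST]` is shown below; `split_completion`, `extract_section`, and the abbreviated sample response are hypothetical.

```python
def split_completion(response, marker="[/INST]"):
    """Return only the text after the last instruction-closing marker."""
    return response.rsplit(marker, 1)[-1]

def extract_section(text, tag):
    """Return the text between [TAG] and [/TAG], or None if the opening tag is absent."""
    open_tag, close_tag = f"[{tag}]", f"[/{tag}]"
    if open_tag not in text:
        return None
    body = text.split(open_tag, 1)[1]
    # Tolerate a missing closing tag (for example, generation hit max_new_tokens).
    return body.split(close_tag, 1)[0].strip()

# Abbreviated stand-in for a decoded response that includes the prompt.
response = (
    "[INST] Write me an SOP. Format with [SOP] and [REASONING] sections. [/INST]"
    "[SOP] Statement of Purpose The data science community has been growing"
)
completion = split_completion(response)
print(extract_section(completion, "SOP"))
print(extract_section(completion, "REASONING"))  # None: the model produced no such section
```

With the prompt stripped first, a missing `[REASONING]` section is reported as missing instead of being filled with prompt text.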

### 📌 Observation

Even though the model did not generate a [REASONING] section as instructed, adding structured prompts like [SOP] and [REASONING] helped in another important way:

- The SOPs are now **ending properly**, without cutting off mid-sentence.
- The model appears to **wrap up** the SOP before the point where the reasoning section would begin, leading to more complete and coherent outputs.

✅ Structured prompting improved the overall quality and completeness of the SOP, even if the reasoning generation was inconsistent.
