### Installing the packages needed for fine-tuning and inference

#### What each package does
*   accelerate: This library helps with distributed training across multiple GPUs or machines. It simplifies launching training scripts on these setups and manages communication between devices, which speeds up fine-tuning on large datasets. (In this notebook it is also configured to offload state to the CPU.)
*   transformers: This package from Hugging Face provides the classes used here for training and inference, e.g. Trainer, BitsAndBytesConfig and pipeline.
*  datasets : This library simplifies loading and preparing various NLP datasets. It provides functionalities for downloading pre-processed datasets, splitting them into training, validation, and test sets, and applying transformations on the data for model training.
* bitsandbytes : This library provides low-precision quantization for PyTorch models, along with memory-efficient optimizers. Quantization reduces the model size and memory footprint by storing weights in lower-precision formats (e.g., 8-bit integers or 4-bit NF4) instead of standard 16- or 32-bit floats. This is what lets the model fit on a single GPU for fine-tuning and inference.
* peft : This package provides parameter-efficient fine-tuning methods. Here it supplies LoraConfig, which describes the LoRA adapters, and get_peft_model, which attaches those adapters to the base model.





In [None]:
 !pip install -q accelerate bitsandbytes transformers==4.39.3 datasets==2.17.0 peft==0.4.0 #trl==0.4.7


### Importing all the necessary packages

In [None]:
import os
import torch
import transformers
from datetime import datetime
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import prepare_model_for_kbit_training, PeftModel

### Creating the accelerator object that is used to prepare the model for training (with state dicts offloaded to the CPU)

In [None]:
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig

fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

### Choosing the model (a model ID hosted on Hugging Face or a path to a local folder)
#### A local folder must contain the config.json file and the model's safetensors weights; if these are missing, the **AutoModelForCausalLM** call used below to load the model will raise an error.

In [None]:
model_name = "microsoft/Phi-3-mini-128k-instruct"

### BitsAndBytesConfig
#### Used for quantization, to significantly reduce the size of the model so that it fits on the GPU for training.
#### The model is loaded with 4-bit quantization plus double quantization, in which the quantization constants from the first pass are themselves quantized, saving roughly 0.4 bits per parameter. "nf4" (4-bit NormalFloat) is the 4-bit data type used, one of the quantization schemes supported by bitsandbytes.
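
As a rough sketch of where the double-quantization saving comes from, the arithmetic below estimates the storage cost per weight. The block sizes (64 weights per first-level block, 256 constants per second-level block) follow the QLoRA paper, and the 3.8-billion parameter count is an assumption about Phi-3-mini; neither value is read from this notebook, and layers that stay unquantized are ignored.

```python
def bits_per_param(double_quant, block=64, block2=256):
    """Approximate storage cost of one NF4-quantized weight, in bits."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit constant per block, plus one float32 per block2 constants.
        overhead = 8.0 / block + 32.0 / (block * block2)
    else:
        # One float32 quantization constant per block of weights.
        overhead = 32.0 / block
    return weight_bits + overhead


N_PARAMS = 3.8e9  # assumption: approximate parameter count of Phi-3-mini

for dq in (False, True):
    bits = bits_per_param(dq)
    gib = N_PARAMS * bits / 8 / 2**30
    print(f"double_quant={dq}: {bits:.3f} bits/param, about {gib:.2f} GiB")
```

Under these assumptions the overhead drops from 0.5 to about 0.127 bits per parameter, which is the source of the "roughly 0.4 bits" figure.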
### AutoModelForCausalLM
#### model_name is used to locate the model. If it is a path to a local folder containing a config.json file and the model's safetensors weights, the model is loaded from there; otherwise it is looked up on the Hugging Face Hub.


*   quantization_config : This parameter takes the BNB config that we created.
*   low_cpu_mem_usage : Setting this argument to True (which it is here) instructs the library to use optimizations that might reduce memory usage on the CPU during model loading. This can be helpful for systems with limited memory.
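* trust_remote_code : Setting this argument to True (which it is here) allows the custom model code shipped in the Hugging Face model repository (here configuration_phi3.py and modeling_phi3.py) to be downloaded and executed, so enable it only for repositories you trust.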



In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    # attn_implementation='eager',
    low_cpu_mem_usage=True,
    trust_remote_code=True
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



### Tokenizer
A tokenizer plays a crucial role in working with pre-trained transformer models for Natural Language Processing (NLP) tasks. It performs several key functions:

* Vocabulary: The tokenizer carries the vocabulary of words or sub-word units that the pre-trained model was trained on.

* Text Preprocessing: It preprocesses the text input you provide for the model. This often involves steps like:

* * Normalization: Cleaning up the text, for example Unicode normalization or lowercasing, as the model requires.
* * Tokenization: Splitting the text into individual words or sub-words based on the vocabulary.
* * Special Token Addition: Adding special tokens (like padding markers or start/end of sentence markers) required by the model architecture.

* Numerical Representation: The tokenizer converts the resulting tokens into numerical IDs that the model can understand, by looking up each token's index in the vocabulary. Sub-word schemes such as Byte Pair Encoding (BPE) determine how words are split into those tokens.

By using the same tokenizer that the pre-trained model was trained with, you ensure that the input text is represented in the same way the model expects it. This is essential for the model to make accurate predictions on new data.
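
The functions listed above can be illustrated with a toy tokenizer. This is only a sketch: `ToyTokenizer` is a hypothetical whitespace tokenizer with a hand-made vocabulary, not the sub-word tokenizer that ships with the model.

```python
class ToyTokenizer:
    """Whitespace tokenizer with a fixed vocabulary (illustration only)."""

    def __init__(self, words):
        specials = ["<pad>", "<s>", "</s>", "<unk>"]
        tokens = specials + sorted(set(words))
        self.vocab = {tok: i for i, tok in enumerate(tokens)}
        self.inverse = {i: tok for tok, i in self.vocab.items()}

    def encode(self, text):
        # Normalize, split, add special tokens, then map tokens to IDs.
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        unk = self.vocab["<unk>"]
        return [self.vocab.get(tok, unk) for tok in tokens]

    def decode(self, ids):
        return " ".join(self.inverse[i] for i in ids)


tok = ToyTokenizer(["the", "contract", "allows", "termination"])
ids = tok.encode("The contract allows early termination")
print(ids)              # [1, 7, 5, 4, 3, 6, 2]
print(tok.decode(ids))  # <s> the contract allows <unk> termination </s>
```

The word "early" is not in the toy vocabulary, so it maps to `<unk>`; a tokenizer with a different vocabulary would produce different IDs for the same text, which is why the model and tokenizer must match.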

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### General Inferencing

This function is used to run inference on the model. It takes a list of messages.
The messages are passed together with generation arguments that control the length and randomness of the output.

**Pipeline** is used for inference; it takes the quantized model that we loaded with the BNB config, along with the tokenizer.
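
To make the effect of `do_sample` and `temperature` concrete, here is a minimal sketch of how a single next token could be chosen from a list of logits. The helper names `softmax` and `pick_token` are hypothetical, and this is a simplification of what a real generation loop does.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, do_sample=False, temperature=1.0, rng=random):
    if not do_sample:
        # Greedy decoding: always take the highest logit; temperature is unused.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
print(pick_token(logits))                # greedy choice: index 0
print(softmax(logits, temperature=0.5))  # sharper distribution
print(softmax(logits, temperature=2.0))  # flatter distribution
```

With `do_sample=False`, as in the generation arguments used here, the output is deterministic and the temperature value has no influence on which token is picked.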

In [None]:
def generate(messages):
  pipe = pipeline(
      "text-generation",
      model=model,
      tokenizer=tokenizer,
      trust_remote_code = True
  )

  generation_args = {
      "max_new_tokens": 500,
      "return_full_text": False,
      "temperature": 0.0,
      "do_sample": False,
  }

  output = pipe(messages, **generation_args)
  print(output[0]['generated_text'])

In the code below we load the contents of a JSONL file that contains questions and answers. Only the question is passed to the model, so that the generated answer can be compared with the answer recorded in the JSONL file.

In [None]:
import json
question = []
with open("/content/formatted_test_set.jsonl","r") as f:
    question = [json.loads(line) for line in f]

messages = [
      {"role": "user", "content": question[0]["content"]+'\n'+question[1]["content"]}
]
generate(messages)

Answer: Yes, the contract does allow for termination for material breach.

To arrive at this conclusion, I analyzed the Master Service Agreement agreement. The key terms related to termination for material breach are found in the 'Termination' clause. It states that either party may terminate the agreement at any time if the other party is in material breach of any provision of the agreement. This indicates that the contract does indeed allow for termination in case of a material breach.

The thought process involved in this analysis was to identify the relevant sections of the agreement that pertain to termination and breach. I then looked for specific language that indicated the conditions under which termination could occur.

The core meaning of the question was to determine if the contract provides a provision for termination in case of a material breach. By focusing on this core meaning, I was able to find the relevant information in the agreement and conclude that the contract do

### Fine tuning the model

#### Loading the datasets to fine tune our model.
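load_dataset from the datasets package loads each file and exposes its records as a Dataset. When a single file is passed through data_files, all of its records are placed in a split named 'train', which is why split='train' is used for both the training and the validation file.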

In [None]:
from datasets import load_dataset
train_dataset = load_dataset('json', data_files='formatted_train_set.jsonl', split='train')
validation_dataset = load_dataset('json', data_files='formatted_test_set.jsonl', split='train')


#### Before tokenization, each dataset record is converted to a string. This function is called once per record when the dataset is mapped through the tokenizer.

In [None]:
def format_prompt(mess):
  text = str(mess)
  return text
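
It may help to see what `str()` actually produces for a record. In the sketch below the record and its keys are hypothetical (only a "content" field is visible in this notebook); the point is that the string form includes the dictionary punctuation and key names, which then become part of the training text.

```python
# Hypothetical record; the real dataset rows may have different keys.
record = {"role": "user", "content": "What is the start date of this agreement?"}

as_repr = str(record)
print(as_repr)
# {'role': 'user', 'content': 'What is the start date of this agreement?'}

# Alternative: keep only the text itself.
content_only = record["content"]
print(content_only)
```

If the extra punctuation is not wanted, one option is to format only the text fields; recent versions of transformers also offer `tokenizer.apply_chat_template` for message lists, which may be worth evaluating for chat-style data.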

### Breaking down each parameter for the AutoTokenizer.from_pretrained()


*   model_name (str): This argument should be the same name you used to load the model earlier.
*   padding_side (str): This argument sets the side for padding sequences during tokenization. Here, it's set to "left". Padding is used when working with batches of sequences that have different lengths. The model expects all sequences in a batch to be the same size. Padding adds special tokens (usually the pad token) to shorter sequences to make them all the same length.
* add_eos_token (Optional[bool]): Setting this argument to True (which it is here) ensures the tokenizer adds a special "end-of-sentence" token (often denoted by `</s>`) to the end of each processed sequence. This token can be crucial for the model to understand where a sentence ends, especially for tasks like text generation.
* add_bos_token (Optional[bool]): Setting this argument to True (which it is here) ensures the tokenizer adds a special "beginning-of-sentence" token (often denoted by `<s>`) to the start of each processed sequence.

##### tokenizer.pad_token = tokenizer.eos_token: This line assigns the value of the "end-of-sentence" token (eos_token) to the tokenizer's pad_token attribute. This is a common practice, especially for causal language models, where the end of a sentence often signifies a stopping point for generation. By setting them to the same value, you ensure that the model treats both the end of a sentence and padding tokens similarly during processing.

### generate_and_tokenize_prompt(prompt)
This function will take the prompt/input and tokenize it for training or inference of the model.<br>
**tokenizer()** takes:


*   truncation=True: This argument specifies that the tokenizer should truncate the prompt if its length exceeds a certain limit.
* max_length=512: This argument sets the maximum length for the tokenized prompt. Any tokens exceeding this limit will be removed during truncation.
* padding="max_length": This argument instructs the tokenizer to pad the prompt with the padding token (set earlier) up to the max_length if the prompt is shorter.

tokenized_train_dataset and tokenized_validation_dataset hold the tokenized versions of the datasets loaded earlier.
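
The combined effect of truncation, `max_length` and the padding side can be sketched on plain lists of token IDs. `pad_and_truncate` is a hypothetical helper that mimics the behaviour described above; the token IDs are made up.

```python
def pad_and_truncate(ids, max_length, pad_id, padding_side="left"):
    """Cut a sequence to max_length, then pad it up to max_length."""
    ids = list(ids[:max_length])   # truncation=True: drop tokens past the limit
    n_pad = max_length - len(ids)  # padding="max_length": fill the remainder
    pad = [pad_id] * n_pad
    mask = [1] * len(ids)          # 1 = real token, 0 = padding
    if padding_side == "left":
        return {"input_ids": pad + ids, "attention_mask": [0] * n_pad + mask}
    return {"input_ids": ids + pad, "attention_mask": mask + [0] * n_pad}


short = pad_and_truncate([1, 15, 27, 2], max_length=6, pad_id=2)
print(short)
# {'input_ids': [2, 2, 1, 15, 27, 2], 'attention_mask': [0, 0, 1, 1, 1, 1]}

long = pad_and_truncate([1, 15, 27, 31, 44, 58, 63, 2], max_length=6, pad_id=2)
print(long["input_ids"])
# [1, 15, 27, 31, 44, 58]
```

Because the pad ID equals the end-of-sequence ID in this example (as it does after `tokenizer.pad_token = tokenizer.eos_token`), only the attention mask distinguishes padding from a real end-of-sequence token.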



In [None]:
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

def generate_and_tokenize_prompt(prompt):
    result = tokenizer(
        format_prompt(prompt),
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result

tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_validation_dataset = validation_dataset.map(generate_and_tokenize_prompt)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



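* Gradient checkpointing keeps only a subset of the intermediate activations during the forward pass and recomputes the rest during the backward pass, when they are needed to compute gradients. This trades extra computation for significantly lower memory usage during training, especially for large models.
* prepare_model_for_kbit_training prepares a k-bit quantized model for training, for example by freezing the quantized base weights. Here "k" is the number of bits used to represent each weight (4 in this notebook).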
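
The trade-off behind gradient checkpointing can be shown on a toy chain of scalar layers. This is a sketch under simplifying assumptions: each "layer" is `tanh(w * x)`, and `grad_full` and `grad_checkpointed` are hypothetical helpers, not the mechanism used inside the library.

```python
import math


def layer(x, w):
    return math.tanh(w * x)


def layer_grad(x, w):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full(x0, weights):
    """Store every activation in the forward pass, then backpropagate."""
    acts = [x0]
    for w in weights:
        acts.append(layer(acts[-1], w))
    g = 1.0
    for x, w in zip(reversed(acts[:-1]), reversed(weights)):
        g *= layer_grad(x, w)
    return g, len(acts)


def grad_checkpointed(x0, weights, segment=4):
    """Store only each segment's input; recompute the rest when going backward."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(x, w)
    g = 1.0
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        seg_w = weights[start:start + segment]
        acts = [checkpoints[start]]
        for w in seg_w[:-1]:  # recompute the inputs of this segment's layers
            acts.append(layer(acts[-1], w))
        peak = max(peak, len(checkpoints) + len(acts))  # upper bound on stored values
        for x, w in zip(reversed(acts), reversed(seg_w)):
            g *= layer_grad(x, w)
    return g, peak


weights = [0.5 + 0.1 * i for i in range(16)]
full_grad, full_stored = grad_full(0.3, weights)
ck_grad, ck_peak = grad_checkpointed(0.3, weights, segment=4)
print(full_grad, ck_grad)    # same gradient
print(full_stored, ck_peak)  # fewer values held at once with checkpointing
```

Both functions return the same gradient; the checkpointed version holds fewer values at any one time and pays for it by running parts of the forward pass a second time.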

In [None]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)



The helper in the next cell counts two kinds of parameters:

1.   Trainable parameters: These are the parameters that have requires_grad set to True and are updated during the training process to optimize the model's performance.
2. Non-trainable parameters: These are parameters that have requires_grad set to False and are not updated during training. Here they are the frozen pre-trained weights of the base model.



In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

#### LoraConfig
This is an important step before fine-tuning: the LoRA configuration prepared here defines the adapters that PEFT will train.


*   r=32: This sets the rank of the update matrices used in LoRA. The rank determines the size of these matrices, and with it the trade-off between the number of trainable parameters and the capacity of the adapters.
*   lora_alpha=64: This sets the LoRA scaling factor. The adapter update is scaled by lora_alpha / r (here 64 / 32 = 2), which influences the convergence behavior and final performance of the fine-tuned model.
* target_modules: This list specifies the modules to which LoRA will be applied. Here, it targets the main linear layers of the Phi-3 architecture:

* * "o_proj": Refers to the output projection layer of the attention block.
* * "qkv_proj": Refers to the fused projection layer for queries, keys, and values in the multi-head attention mechanism.
* * "gate_up_proj": The fused gate and up projection of the feed-forward (MLP) block, which expands the hidden dimension.
* * "up_proj": The up projection of the feed-forward block in architectures that keep it as a separate layer. In Phi-3 it is fused into "gate_up_proj", so this entry adds no further modules.
* * "down_proj": The projection layer of the feed-forward block that reduces the dimension back to the hidden size.
* * "lm_head": Refers to the language modeling head that predicts the next token in a sequence.

The model is then combined with the config to get a PEFT model that will be used for training.
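
As a minimal sketch of what a LoRA adapter computes, the block below applies the update `y = W x + (lora_alpha / r) * B (A x)` to a small frozen weight matrix using plain Python lists. The dimensions and helper names are hypothetical and chosen only to keep the example small.

```python
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x); W itself is never modified."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


rng = random.Random(0)
d_in, d_out, r, lora_alpha = 6, 4, 2, 4
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # r x d_in, trainable
B = [[0.0] * r for _ in range(d_out)]                               # d_out x r, zero at start
x = [1.0] * d_in

y0 = lora_forward(W, A, B, x, lora_alpha, r)
print(y0 == matvec(W, x))  # True: with B = 0 the adapter changes nothing yet

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(full_params, lora_params)  # 24 20
```

Because B starts at zero, the adapted model initially behaves exactly like the base model. The parameter saving is small in this toy case, but grows quickly when the layer dimensions are in the thousands and r stays small.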



In [None]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "o_proj",
        "qkv_proj",
        "gate_up_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

# Apply the accelerator. You can comment this out to remove the accelerator.
model = accelerator.prepare_model(model)

trainable params: 51456000 || all params: 2060596224 || trainable%: 2.497141332236082
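
The trainable-parameter count printed above can be reproduced by hand, since a LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The layer dimensions below are assumptions taken from the published Phi-3-mini configuration (hidden size 3072, intermediate size 8192, 32 layers, vocabulary size 32064), not values read from this notebook.

```python
R = 32
# Assumed Phi-3-mini dimensions (from the published model config).
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 3072, 8192, 32, 32064


def lora_params(d_in, d_out, r=R):
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


per_layer = {
    "o_proj": lora_params(HIDDEN, HIDDEN),
    "qkv_proj": lora_params(HIDDEN, 3 * HIDDEN),
    "gate_up_proj": lora_params(HIDDEN, 2 * INTERMEDIATE),
    "down_proj": lora_params(INTERMEDIATE, HIDDEN),
}
total = LAYERS * sum(per_layer.values()) + lora_params(HIDDEN, VOCAB)
print(total)  # 51456000
```

Under these assumptions the total matches the `trainable params` figure in the output, which suggests that the adapters on these modules account for all trainable parameters.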


## Trainer
* trainer = transformers.Trainer(...): This line creates a Trainer object from the Transformers library to handle the fine-tuning process. The trainer takes several arguments:

* * model: The pre-trained causal language model (model) you want to fine-tune.
* * train_dataset: The training dataset (tokenized_train_dataset) prepared earlier.
* * eval_dataset: The validation dataset (tokenized_validation_dataset) prepared earlier.
* * args: This argument is a TrainingArguments object that defines various hyperparameters for the training process. Here's a breakdown of the important arguments used:
* * * output_dir: This specifies the directory (./ + run_name) where the training outputs (model checkpoints, logs) will be saved.
* * * warmup_steps: This sets the number of warmup steps for the learning rate scheduler. Here, it's set to 1.
* * * per_device_train_batch_size: This sets the batch size per device (GPU) for training. Here, it's set to 2.
* * * gradient_accumulation_steps: This accumulates gradients over multiple batches before updating the model weights. Here, it's set to 1 (no accumulation).
* * * gradient_checkpointing: This enables gradient checkpointing (already explained earlier) to reduce memory usage during backpropagation (set to True).
* * * fp16: This enables mixed precision training using 16-bit floating-point precision (might require additional configuration, set to True here).
* * * max_steps: This sets the maximum number of training steps (51). This defines the total duration of the fine-tuning process.
* * * learning_rate: This sets the learning rate (2e-4) for the optimizer. A smaller learning rate is often used for fine-tuning compared to pre-training from scratch.
* * * bf16: This enables training with bfloat16 precision if the GPU supports it (set to False here).
* * * optim: This specifies the optimizer used for training. Here, "paged_adamw_32bit" selects the paged AdamW optimizer from bitsandbytes, which keeps optimizer state in 32-bit precision and can page it to CPU memory to avoid out-of-memory spikes on the GPU.
* * * logging_dir: This specifies the directory (./logs) for storing training logs.
* * * save_strategy: This sets the strategy for saving model checkpoints. Here, "steps" indicates saving a checkpoint every save_steps steps.
* * * save_steps: This refines the save_strategy by specifying the exact number of steps (50) between checkpoints. With max_steps=51, this produces a single checkpoint, checkpoint-50.
* * * evaluation_strategy: This sets the strategy for model evaluation. Here, "epoch" indicates evaluating the model at the end of each epoch.
* * * eval_steps: This specifies the number of steps (51) between evaluations. It is only used when evaluation_strategy is "steps", so it has no effect here.
* * * do_eval: This enables evaluation on the validation dataset (set to True).
* * * logging_steps: This sets the frequency (5) for logging training information (step and loss).
* * * run_name: This names the run for logging integrations such as W&B, using the run name defined in the code plus a timestamp.
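
To see how a few of these arguments interact, the sketch below computes the learning rate at selected steps and the steps at which checkpoints and log lines are produced. It assumes the Trainer's default linear scheduler (warmup followed by linear decay to zero); the helper names are hypothetical.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=1, max_steps=51):
    """Learning rate after `step` optimizer updates (assumed linear schedule)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


def steps_matching(interval, max_steps=51):
    """Steps in 1..max_steps that are multiples of `interval`."""
    return [s for s in range(1, max_steps + 1) if s % interval == 0]


print([linear_schedule_lr(s) for s in (0, 1, 26, 51)])
print("checkpoints at steps:", steps_matching(50))  # [50]
print("log lines at steps:", steps_matching(5))
```

With these settings the learning rate peaks right after the single warmup step and decays for the rest of the run, and the only checkpoint written is the one at step 50.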

**data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)**: This line defines a data collator object (data_collator) specifically designed for language modeling tasks (not masked language modeling, hence mlm=False). The data collator handles tasks like batching and padding the training data for efficient processing by the model.
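
A simplified sketch of what this collator does to the labels for causal language modeling is shown below. `collate_causal_lm` is a hypothetical stand-in that works on plain lists; the real collator returns tensors, but it likewise copies the input IDs into the labels and replaces padding positions with -100 so that they are ignored by the loss.

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def collate_causal_lm(batch, pad_token_id):
    """Build labels from input_ids, masking out padding positions."""
    labels = [
        [IGNORE_INDEX if tok == pad_token_id else tok for tok in ex["input_ids"]]
        for ex in batch
    ]
    return {
        "input_ids": [list(ex["input_ids"]) for ex in batch],
        "attention_mask": [list(ex["attention_mask"]) for ex in batch],
        "labels": labels,
    }


# Made-up token IDs; 2 plays the role of both the pad and the EOS token.
batch = [{"input_ids": [2, 2, 1, 15, 27, 2], "attention_mask": [0, 0, 1, 1, 1, 1]}]
out = collate_causal_lm(batch, pad_token_id=2)
print(out["labels"])  # [[-100, -100, 1, 15, 27, -100]]
```

Note the last position: because the pad token is set to the EOS token, the genuine end-of-sequence token is masked out of the loss as well. A consequence of this masking by token ID is that the `labels` column created during tokenization is replaced by the collator's version.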

In [None]:
# Defining the output directory name
base_model_name = "phi3-128k"
run_name = base_model_name + "-" + "-mini-custom-finetune"
output_dir = "./" + run_name

tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_validation_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=1,
        gradient_checkpointing=True,
        fp16=True,
        max_steps=51,       # Training steps
        learning_rate=2e-4, # Learning rate for fine-tuning
        bf16=False,         # Enable if GPU supports bfloat16
        optim="paged_adamw_32bit",
        logging_dir="./logs",         # Directory for storing logs
        save_strategy="steps",        # Save checkpoints at a fixed step interval
        save_steps=50,                # Save a checkpoint every 50 steps
        evaluation_strategy="epoch", # Evaluate at the end of each epoch
        eval_steps=51,               # Only used when evaluation_strategy="steps"
        do_eval=True,                # Run evaluation on eval_dataset
        logging_steps = 5,            # Log step and training loss every 5 steps
        run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"          # Name of the W&B run (optional)
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)



## Start the training process

In [None]:
trainer.train()



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 0 | 1.1111 | 1.098168 |




TrainOutput(global_step=51, training_loss=1.1898251912173103, metrics={'train_runtime': 375.9443, 'train_samples_per_second': 0.271, 'train_steps_per_second': 0.136, 'total_flos': 1182571205492736.0, 'train_loss': 1.1898251912173103, 'epoch': 0.02})

### Loading the LoRA adapter from the saved checkpoint onto the model

In [None]:
ft_model = PeftModel.from_pretrained(model, "/content/phi3-128k--mini-custom-finetune/checkpoint-50")

### The adapter-wrapped model replaces `model`, so that any further training continues from the model together with its adapters and retains what was already learned.

In [None]:
model = ft_model

### The formatted_test_set.jsonl stores the questions and answers required for testing the inference of the model

In [None]:
import json
question = []
with open("/content/formatted_test_set.jsonl","r") as f:
    question = [json.loads(line) for line in f]

### Creating the eval_prompt that will be passed to the fine_tuned_pi() function used for generating answers.

In [None]:
# eval_prompt =f"### Question:{question[1]['input_text']}."
eval_prompt = [
    {"role": "user", "content":question[0]["content"]+'\n'+question[1]["content"] }
]
# eval_prompt =f"Write a poem about a horse?"
model_input = tokenizer(str(eval_prompt), return_tensors="pt").to("cuda")

ft_model.eval()
# model.eval()

#### This function will use the pipeline from the transformers package combined with the generation arguments and prints the output(answer) from the model.

In [None]:
def fine_tuned_pi(messages):
  # Build a text-generation pipeline around the adapter-wrapped model
  pipe = pipeline(
      "text-generation",
      model=ft_model,
      tokenizer=tokenizer,
      trust_remote_code = True
  )

  # Sampling settings: moderate temperature with top-p and top-k filtering
  generation_args = {
      "max_new_tokens": 1024,
      "return_full_text": False,
      "temperature": 0.6,
      "top_p" : 0.9,
      "top_k" : 10,
      "do_sample": True,
  }

  output = pipe(messages, **generation_args)
  print(output[0]['generated_text'])
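
The `top_k` and `top_p` arguments restrict which tokens may be sampled at each step. The sketch below shows the idea on a small probability list; `top_k_top_p_filter` is a hypothetical, simplified helper (top-k first, then top-p on the renormalized remainder) and not the library's own implementation.

```python
def top_k_top_p_filter(probs, top_k=10, top_p=0.9):
    """Return the renormalized probabilities of the tokens that survive filtering."""
    # Keep the top_k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)
    # Keep the smallest prefix whose share of that mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_top_p_filter(probs, top_k=10, top_p=0.7))  # tokens 0 and 1 survive
print(top_k_top_p_filter(probs, top_k=1, top_p=0.9))   # only token 0 survives
```

Lower values of either argument make the output more conservative, since unlikely tokens are removed before the temperature-scaled sampling step.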

When the model is not fine tuned

In [None]:
fine_tuned_pi(eval_prompt)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MusicgenMelodyForCausalLM', 'MvpForCausalLM', 'OpenLlam

Answer: Customer Robin Systems, Inc.

Details and Thought Process: 

1. The question asked for the "Customer Name" in the context of the Master Service Agreement.

2. In the Master Service Agreement, there are multiple mentions of a "Customer". 

3. The context of each mention needs to be considered. 

4. The first mention of "Customer" is on page 2 with the content: "Customer. An “Affiliate” means any entity under the control of Customer...". Here, "Customer" refers to Customer Robin Systems, Inc.

5. The second mention of "Customer" is on page 6 with the content: "... Customer hereby acknowledges that Customer is the sole and exclusive holder of all right, title and interest in and to the trademarks of Customer...". Again, the "Customer" refers to Customer Robin Systems, Inc.

6. The core meaning of the question is to identify the entity referred to as "Customer" in the agreement.

From the analysis, it is clear that the "Customer Name" in the context of the Master Service Agreement 

When the model is fine-tuned

In [None]:
fine_tuned_pi(eval_prompt)



The question asks for the "start date of this agreement". In the context of the Master Service Agreement, this refers to the date when the contractual obligations of the agreement come into effect.

Upon reviewing the provided Master Service Agreement, the relevant information can be found in the content of the fifth page. Here, it is stated that "Subscriptions to the Application Services commence on the Subscription Start Date specified in the applicable Commercial Agreement."

However, the actual date isn't explicitly mentioned in the fifth page. But, we can infer it from the other pages where the agreement details are provided. The fifth page refers to the "Commercial Agreement", which is detailed on the eighth page. The eighth page specifies that "Subscriptions commence on the Subscription Start Date specified in the applicable Commercial Agreement".

Unfortunately, the exact date isn't provided in the eighth page either. But, the ninth page mentions that "Master Subscription Agre

In [None]:
question[10]["content"]

" Read the question properly and analyze the Master Service Agreement agreement on the basis of this question.\\    \n    Question:'''What is the billing frequency for the amount or fees or payment to be paid by the customer to service provider according to this agreement?'''"

In [None]:
fine_tuned_pi(eval_prompt)


To answer this question, we need to carefully review the Master Service Agreement, specifically the section that outlines the payment terms. This section will typically detail the frequency of payments required by the customer to the service provider.

The billing frequency can be found in the "Payment Terms" or "Invoicing" section of the agreement. It may be stated as a monthly, quarterly, semi-annual, or annual payment schedule. 

For example, the agreement might specify that the customer is required to make payments on a monthly basis. This would mean that the customer is obligated to pay the agreed-upon fees or amounts to the service provider every month.

However, without the actual text of the Master Service Agreement, it's impossible to provide a specific answer to this question.
