### Extending LLMs with bidirectional attention via ULLME

In [3]:
from ullme.models import ULLME

model = ULLME(
            model_name_or_path="microsoft/phi-1_5",
            model_backbone_type="phi",
            )
model.cuda()
print("Model Architecture: ")
print(model)

Tokenizer does not have a pad token. We will use the bos token as pad token.
Model Architecture: 
ULLME(
  (model): BidirectionalPhiForCausalLM(
    (model): BidirectionalPhi(
      (embed_tokens): Embedding(51200, 2048)
      (embed_dropout): Dropout(p=0.0, inplace=False)
      (layers): ModuleList(
        (0-23): 24 x PhiDecoderLayer(
          (self_attn): PhiFlashAttention2(
            (q_proj): Linear(in_features=2048, out_features=2048, bias=True)
            (k_proj): Linear(in_features=2048, out_features=2048, bias=True)
            (v_proj): Linear(in_features=2048, out_features=2048, bias=True)
            (dense): Linear(in_features=2048, out_features=2048, bias=True)
            (rotary_emb): PhiRotaryEmbedding()
          )
          (mlp): PhiMLP(
            (activation_fn): NewGELUActivation()
            (fc1): Linear(in_features=2048, out_features=8192, bias=True)
            (fc2): Linear(in_features=8192, out_features=2048, bias=True)
          )
          (input_
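Conceptually, extending an LLM with bidirectional attention comes down to changing its attention mask. A decoder-only model such as phi-1.5 normally uses a causal mask, so each token attends only to itself and earlier tokens. A bidirectional variant lets every token attend to the whole (non-padded) sequence. The NumPy sketch below shows that difference on random toy tensors. It is a conceptual illustration, not the code inside ULLME's `BidirectionalPhi`, and the helper name `attention` is our own.

```python
import numpy as np

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention.

    mask[i, j] is True where query position i may attend to key position j.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)        # block disallowed positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, dim = 4, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((seq_len, dim)) for _ in range(3))

# Causal (lower-triangular) mask: token i sees tokens 0..i only.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# Bidirectional mask: every token sees every other token.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

causal_out = attention(q, k, v, causal_mask)
bidir_out = attention(q, k, v, bidirectional_mask)

# Under the causal mask only the last position has seen the full sequence,
# so it is the only position whose output is unchanged.
print(np.allclose(causal_out[-1], bidir_out[-1]))  # True
print(np.allclose(causal_out[0], bidir_out[0]))    # False
```

This is why bidirectional attention matters for embeddings: with a causal mask, early tokens never see the rest of the sentence, so their hidden states carry less context.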

We also support LoRA patching for parameter-efficient fine-tuning:

In [4]:
from ullme.models import ULLME

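lora_model = ULLME(
    model_name_or_path="microsoft/phi-1_5",
    model_backbone_type="phi",
    lora_name="ullme-phi",  # adapter name (the ModuleDict key in the printout below)
    loar_r=16,              # LoRA rank r (lora_A/lora_B below use rank 16)
    lora_alpha=32,          # LoRA alpha; standard LoRA scales the update by alpha / r
)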
lora_model.cuda()
print("Model Architecture: ")
print(lora_model)

Tokenizer does not have a pad token. We will use the bos token as pad token.
Model Architecture: 
ULLME(
  (model): PeftModelForCausalLM(
    (base_model): LoraModel(
      (model): BidirectionalPhiForCausalLM(
        (model): BidirectionalPhi(
          (embed_tokens): Embedding(51200, 2048)
          (embed_dropout): Dropout(p=0.0, inplace=False)
          (layers): ModuleList(
            (0-23): 24 x PhiDecoderLayer(
              (self_attn): PhiFlashAttention2(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                  (lora_dropout): ModuleDict(
                    (ullme-phi): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (ullme-phi): Linear(in_features=2048, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (ullme-phi): Linear(in_features=16, out_features=2048, bias=False
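The printout shows how LoRA patches each projection: the frozen `base_layer` is kept, and a rank-16 pair `lora_A` (2048 → 16) and `lora_B` (16 → 2048) is added under the adapter name `ullme-phi`. Below is a minimal NumPy sketch of what such a layer computes, assuming the standard LoRA formulation used by PEFT (output = base(x) + (alpha / r) * B(A(x)), with B initialized to zero). LoRA dropout (p=0.1 above) is omitted, and the class name `LoRALinear` is hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (standard LoRA)."""

    def __init__(self, weight, bias, r=16, lora_alpha=32, seed=0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight, self.bias = weight, bias            # frozen base layer
        # A projects down to rank r, B projects back up. B starts at zero,
        # so the adapted layer initially matches the base layer exactly.
        self.lora_A = 0.01 * rng.standard_normal((r, in_features))
        self.lora_B = np.zeros((out_features, r))
        self.scaling = lora_alpha / r                    # 32 / 16 = 2.0

    def __call__(self, x):
        base = x @ self.weight.T + self.bias
        update = (x @ self.lora_A.T) @ self.lora_B.T     # (batch, r) -> (batch, out)
        return base + self.scaling * update

in_f = out_f = 2048
rng = np.random.default_rng(1)
layer = LoRALinear(0.02 * rng.standard_normal((out_f, in_f)), np.zeros(out_f))
x = rng.standard_normal((2, in_f))

print(np.allclose(layer(x), x @ layer.weight.T + layer.bias))  # True at initialization
full_params = in_f * out_f
lora_params = layer.lora_A.size + layer.lora_B.size
print(full_params, lora_params)  # 4194304 65536
```

Only `lora_A` and `lora_B` are trained, which for a 2048×2048 projection is 65,536 parameters instead of about 4.2 million (roughly 1.6%).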

Compute sequence representations with bidirectionally extended LLMs:

In [5]:
input_sentence = "This a example sentence."
model_inputs = model.tokenizer(
                            [input_sentence],
                            return_tensors='pt'
                            )
model_output = model(
                    input_ids=model_inputs['input_ids'].cuda(),
                    attention_mask=model_inputs['attention_mask'].cuda(),
                    is_generate=False
                    )
reps = model_output['reps']
print("Reps Shape: ", reps.shape)
print("Reps: ", reps)

Reps Shape:  torch.Size([1, 2048])
Reps:  tensor([[ 0.7930,  1.2344, -0.4590,  ..., -0.6289, -0.3242, -0.3066]],
       device='cuda:0', dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>)
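`reps` holds one 2048-dimensional vector per input, matching phi-1.5's hidden size. The pooling strategy ULLME uses to turn token states into this vector is not shown in this example. As an assumption for illustration, the sketch below uses masked mean pooling (averaging hidden states over non-padding tokens) and then compares vectors with cosine similarity. It works on toy arrays rather than real model outputs, and the helper names are hypothetical.

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average token states, ignoring padding positions (attention_mask == 0)."""
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                # avoid divide-by-zero
    return summed / counts

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Toy batch: 2 sequences, 4 positions, hidden size 3; the second one is padded.
hidden = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])
reps = mean_pool(hidden, mask)
print(reps.shape)  # (2, 3)
print(reps[1])     # mean of the first two positions of sequence 2: [13.5 14.5 15.5]

sims = cosine_similarity(reps, reps)  # (2, 2); the diagonal is 1.0
```

With real model outputs, the same comparison can be done in PyTorch with `torch.nn.functional.cosine_similarity`.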


### Evaluating on MTEB datasets via ULLME

ULLME supports most LLM-based models available on Hugging Face (HF). For example, here we load `dunzhang/stella_en_1.5B_v5`, the top-ranked model on the MTEB leaderboard at the time of writing.

In [6]:
from ullme.models import WrappedULLME
from ullme.eval import eval_mteb_dataset


model = WrappedULLME(model_name_or_path='dunzhang/stella_en_1.5B_v5')
print("Model Architecture: ")
print(model)

Model Architecture: 
WrappedULLME(
  (model): DataParallel(
    (module): ULLME(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151646, 1536)
        (layers): ModuleList(
          (0-27): 28 x Qwen2DecoderLayer(
            (self_attn): Qwen2FlashAttention2(
              (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
              (k_proj): Linear(in_features=1536, out_features=256, bias=True)
              (v_proj): Linear(in_features=1536, out_features=256, bias=True)
              (o_proj): Linear(in_features=1536, out_features=1536, bias=False)
              (rotary_emb): Qwen2RotaryEmbedding()
            )
            (mlp): Qwen2MLP(
              (gate_proj): Linear(in_features=1536, out_features=8960, bias=False)
              (up_proj): Linear(in_features=1536, out_features=8960, bias=False)
              (down_proj): Linear(in_features=8960, out_features=1536, bias=False)
              (act_fn): SiLU()
            )
            (input_layer

After loading the model, you need to select specific datasets and language subsets for evaluation. 

In [7]:
eval_result = eval_mteb_dataset(
                                model=model,
                                dataset_name='ArguAna',
                                langs=['eng'],
                                )
print("Eval Result: ", eval_result)

Loading results from results/Lusifer/dev/ArguAna.json
Results for ArguAna: {'eng': 0.48909}
Eval Result:  {'eng': 0.48909}
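For retrieval tasks such as ArguAna, MTEB reports nDCG@10 as the main score, so the `0.48909` above is most likely that metric (the saved JSON file lists the full set of metrics). The plain-Python sketch below shows how nDCG@k is computed with binary relevance labels. The function names are hypothetical; MTEB uses its own evaluation code, not this helper.

```python
import math

def ndcg_at_k(ranked_doc_ids, relevant, k=10):
    """nDCG@k with binary relevance: relevant docs have gain 1, others 0."""
    dcg = sum(
        1.0 / math.log2(rank + 2)                 # rank is 0-based
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)            # best case: all relevant docs on top
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def mean_ndcg(run, qrels, k=10):
    """Corpus-level score: average nDCG@k over all queries with judgments."""
    return sum(ndcg_at_k(run.get(qid, []), rel, k) for qid, rel in qrels.items()) / len(qrels)

# One query whose single relevant document was retrieved at rank 2.
print(round(ndcg_at_k(["d7", "d3", "d9"], {"d3"}), 4))  # 0.6309
```

A relevant document at rank 1 scores 1.0; pushing it down the list discounts its gain logarithmically, and missing it entirely in the top 10 scores 0.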


### Fine-tune LLMs with ULLME

We support various training strategies, including contrastive loss, SFT, DPO, and GRL. The following snippet illustrates how to use ULLME to fine-tune an LLM for dense retrieval.

``` python
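from ullme.trainer import GradCacheTrainer

# `model` and `train_dataloader` are assumed to be defined beforehand.
trainer = GradCacheTrainer(
    con_loss_type='NTXentLoss',  # contrastive loss on the representations
    gen_loss_type='sigmoid',     # generative-side loss; 'sft' is an alternative
    use_kl_loss=True,            # enable the additional KL loss term
)
trainer.fit_epoch(
    model=model,
    train_loader=train_dataloader,
)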
```
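`con_loss_type='NTXentLoss'` selects a contrastive objective on the representations. As a rough sketch of what such a loss computes, we assume the common one-directional form with in-batch negatives: each query's paired document is its positive, and the other documents in the batch are negatives. ULLME's exact variant, temperature, and handling of hard negatives may differ, and `in_batch_contrastive_loss` is a hypothetical name.

```python
import numpy as np

def in_batch_contrastive_loss(queries, docs, temperature=0.05):
    """One-directional InfoNCE / NT-Xent-style loss with in-batch negatives.

    docs[i] is the positive for queries[i]; every other row of `docs`
    serves as a negative for that query.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature                  # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives lie on the diagonal

queries = np.eye(2)
aligned = in_batch_contrastive_loss(queries, np.eye(2), temperature=1.0)
swapped = in_batch_contrastive_loss(queries, np.eye(2)[::-1], temperature=1.0)
print(round(aligned, 4), round(swapped, 4))  # 0.3133 1.3133
```

Larger batches supply more negatives per query, which is why memory-saving techniques such as GradCache matter for this objective.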

ULLME also supports GradCache, cross-device contrastive loss, multi-GPU training, and other features that further improve the training process. Please refer to the documentation and the file `ullme/train.py` for details.
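GradCache makes large contrastive batches fit in memory by splitting the work into three steps: encode the batch in chunks without storing activations, compute the full-batch loss and its gradient with respect to the cached representations, then revisit each chunk and backpropagate that cached gradient into the encoder. The NumPy sketch below walks through those steps with a linear map standing in for the LLM and unnormalized dot-product similarity (both simplifying assumptions). It is not ULLME's `GradCacheTrainer`, and all names in it are hypothetical. A finite-difference check confirms that the chunked gradient matches the full-batch gradient.

```python
import numpy as np

def encode(x, w):
    """Stand-in encoder (a single linear map) in place of the LLM."""
    return x @ w

def loss_and_rep_grads(q_reps, d_reps, temperature=0.05):
    """In-batch contrastive loss and its gradient w.r.t. both sets of reps."""
    batch = q_reps.shape[0]
    logits = q_reps @ d_reps.T / temperature
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(np.diag(probs)))
    d_scores = (probs - np.eye(batch)) / (batch * temperature)  # dL/d(q_i . d_j)
    return loss, d_scores @ d_reps, d_scores.T @ q_reps         # dL/dQ, dL/dD

def grad_cache_step(xq, xd, w, chunk_size=2):
    chunks = [slice(i, i + chunk_size) for i in range(0, len(xq), chunk_size)]
    # 1) Encode each chunk without tracking gradients; cache only the reps.
    q_reps = np.concatenate([encode(xq[c], w) for c in chunks])
    d_reps = np.concatenate([encode(xd[c], w) for c in chunks])
    # 2) Full-batch loss (every in-batch negative) and dL/dreps.
    loss, q_grads, d_grads = loss_and_rep_grads(q_reps, d_reps)
    # 3) Push each chunk's cached rep gradient into the encoder weights.
    #    With a real network this means re-running the chunk's forward pass
    #    with autograd on and calling backward(rep_grad); for a linear map
    #    the result is simply x.T @ rep_grad.
    grad_w = np.zeros_like(w)
    for c in chunks:
        grad_w += xq[c].T @ q_grads[c] + xd[c].T @ d_grads[c]
    return loss, grad_w

rng = np.random.default_rng(0)
xq, xd = rng.standard_normal((4, 5)), rng.standard_normal((4, 5))
w = 0.1 * rng.standard_normal((5, 3))
loss, grad_w = grad_cache_step(xq, xd, w, chunk_size=2)

# Sanity check against central finite differences of the full-batch loss.
eps = 1e-6
numeric = np.zeros_like(w)
for idx in np.ndindex(*w.shape):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[idx] += eps
    w_minus[idx] -= eps
    l_plus = loss_and_rep_grads(encode(xq, w_plus), encode(xd, w_plus))[0]
    l_minus = loss_and_rep_grads(encode(xq, w_minus), encode(xd, w_minus))[0]
    numeric[idx] = (l_plus - l_minus) / (2 * eps)
max_err = np.abs(numeric - grad_w).max()
print(f"max |analytic - numeric| = {max_err:.2e}")
```

Because step 2 sees every representation at once, each query still gets all in-batch negatives, while peak activation memory depends only on `chunk_size`.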