Adapt the Falcon-7B model (or an equivalent) to add an "auto-critic" module. Concretely, the first step is to add an output to the last decoder block.

![LLM-Critic](LLM-AutoCritic.png)

The "auto-critic" part is then pretrained with the LLM frozen (transfer learning), after which the full model is fine-tuned. We start by studying the model without this module.
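The two-stage idea can be sketched on a toy model. This is only an illustration under stated assumptions: `AutoCriticHead`, `freeze`, the tiny stand-in backbone, and the choice of scoring the last token position are all hypothetical and not taken from the notebook. With Falcon-7B, the hidden states would presumably come from the final layer norm of the decoder stack, with a hidden size of 4544.

```python
import torch
from torch import nn


class AutoCriticHead(nn.Module):
    """Hypothetical critic head: maps final hidden states to one score per sequence."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); score the last position (assumption).
        return self.score(hidden_states[:, -1, :]).squeeze(-1)


def freeze(module: nn.Module) -> None:
    """Disable gradients for every parameter of the module."""
    for param in module.parameters():
        param.requires_grad = False


# Tiny stand-in for the decoder stack, used only to keep the sketch runnable.
hidden_size = 16
backbone = nn.Sequential(nn.Embedding(100, hidden_size), nn.Linear(hidden_size, hidden_size))
critic = AutoCriticHead(hidden_size)

freeze(backbone)  # stage 1: the LLM is frozen, only the critic head is trained

token_ids = torch.randint(0, 100, (2, 5))
scores = critic(backbone(token_ids))
trainable = [name for name, p in critic.named_parameters() if p.requires_grad]
```

For the second stage, the backbone parameters would be unfrozen again (or adapted with LoRA) so that the whole model is fine-tuned together with the head.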

## Import Model

In [None]:
import transformers

In [2]:
model = transformers.AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
model

RWForCausalLM(
  (transformer): RWModel(
    (word_embeddings): Embedding(65024, 4544)
    (h): ModuleList(
      (0-31): 32 x DecoderLayer(
        (input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
        (self_attention): Attention(
          (maybe_rotary): RotaryEmbedding()
          (query_key_value): Linear(in_features=4544, out_features=4672, bias=False)
          (dense): Linear(in_features=4544, out_features=4544, bias=False)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): MLP(
          (dense_h_to_4h): Linear(in_features=4544, out_features=18176, bias=False)
          (act): GELU(approximate='none')
          (dense_4h_to_h): Linear(in_features=18176, out_features=4544, bias=False)
        )
      )
    )
    (ln_f): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4544, out_features=65024, bias=False)
)

In [5]:
tokenizer = transformers.AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
input_ = tokenizer("Girafatron is obsessed.\nDaniel: Hello, Girafatron!\nGirafatron:", return_tensors="pt")

In [6]:
output_ids = model.generate(input_.input_ids, do_sample=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Input length of input_ids is 25, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
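The three warnings above point at arguments missing from the `generate` call: no attention mask, no pad token id, and a default `max_length` of 20 that is shorter than the 25-token prompt. A minimal sketch of how those arguments could be assembled is shown below. `build_generate_kwargs` is a hypothetical helper, and the dictionary standing in for the tokenizer output is made up so that the block runs without the model.

```python
def build_generate_kwargs(encoded, eos_token_id, max_new_tokens=20):
    """Collect the generate() arguments that the warnings ask for (hypothetical helper)."""
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],  # marks real tokens vs. padding
        "pad_token_id": eos_token_id,  # reuse EOS, as the warning does by default
        "max_new_tokens": max_new_tokens,  # budget counted after the prompt
        "do_sample": True,
    }


# Stand-in for the tokenizer output: a 25-token prompt, as reported in the warning.
encoded = {"input_ids": [list(range(25))], "attention_mask": [[1] * 25]}
kwargs = build_generate_kwargs(encoded, eos_token_id=11)

# With the notebook objects, the call would look like:
# output_ids = model.generate(**build_generate_kwargs(input_, tokenizer.eos_token_id))
```

Because `max_new_tokens` counts only generated tokens, the reply is no longer limited by the length of the prompt.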


In [16]:
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Girafatron is obsessed.
Daniel: Hello, Girafatron!
Girafatron: Hi


## Prepare Fine Tuning

In [31]:
import peft

In [19]:
model.gradient_checkpointing_enable()
model_kbit = peft.prepare_model_for_kbit_training(model)

In [20]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}")

In [32]:
config = peft.LoraConfig(r=8, lora_alpha=32, target_modules=["query_key_value"], lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

model_kbit = peft.get_peft_model(model_kbit, config)
print_trainable_parameters(model_kbit)

trainable params: 2359296 || all params: 6924080000 || trainable%: 0.03407378308742822
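The trainable count reported above can be checked by hand. LoRA adds two low-rank matrices to each adapted linear layer, one of shape `r x in_features` and one of shape `out_features x r`. The sketch below applies this to the `query_key_value` layers from the model printout (4544 inputs, 4672 outputs, 32 decoder layers); `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(in_features, out_features, rank, num_layers):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return num_layers * rank * (in_features + out_features)


lora_params = lora_param_count(in_features=4544, out_features=4672, rank=8, num_layers=32)
all_params = 6_924_080_000  # total reported above, LoRA weights included
trainable_pct = 100 * lora_params / all_params
print(lora_params, trainable_pct)
```

Each layer contributes 8 x (4544 + 4672) = 73,728 parameters, and 32 layers give the 2,359,296 trainable parameters printed by `print_trainable_parameters`.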


## Prepare Data

In [33]:
import datasets

In [34]:
data = datasets.load_dataset("Abirate/english_quotes")
data_tokenized = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

In [52]:
data.num_rows

{'train': 2508}

In [44]:
data_tokenized.data

{'train': MemoryMappedTable
 quote: string
 author: string
 tags: list<item: string>
   child 0, item: string
 input_ids: list<item: int32>
   child 0, item: int32
 token_type_ids: list<item: int8>
   child 0, item: int8
 attention_mask: list<item: int8>
   child 0, item: int8
 ----
 quote: [["“Be yourself; everyone else is already taken.”","“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”","“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”","“So many books, so little time.”","“A room without books is like a body without a soul.”",...,"“You'll stay with me?'Until the very end,' said James.”","“That’s part of what I like about the book in some ways. It portrays death truthfully. You die in the middle of your life, in the middle of a sentence”","“I read a book one day and my whole life

## Train

In [42]:
# needed for falcon
tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model_kbit,
    train_dataset=data_tokenized["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        #fp16=True, # if cuda device
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
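With `mlm=False`, the data collator prepares batches for causal language modeling: it pads the sequences and copies `input_ids` into `labels`, replacing pad positions with -100 so that the loss ignores them. The pure-Python sketch below mimics that behavior on plain lists. `collate_causal_lm` is a hypothetical stand-in, not the library implementation, and right-side padding is an assumption.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def collate_causal_lm(batch, pad_token_id):
    """Pad to the longest sequence and copy input_ids into labels, ignoring padding."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded = ids + [pad_token_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Every token equal to the pad id is masked, including a genuine EOS
        # when the pad token is set to the EOS token.
        labels.append([IGNORE_INDEX if tok == pad_token_id else tok for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=11)
```

With `per_device_train_batch_size=1`, each batch holds a single sequence, so no padding is actually added in this run. The shift between inputs and targets is applied inside the model, not in the collator.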

In [43]:
model_kbit.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
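To get a feel for how short this run is, the sketch below reproduces the learning-rate schedule and the amount of data seen with the arguments above. It assumes the default linear scheduler with warmup and a single device; `linear_schedule_lr` is a hypothetical helper written for illustration.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=2, max_steps=10):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


# Learning rate used for each of the 10 optimizer steps.
lrs = [linear_schedule_lr(step) for step in range(10)]

# per_device_train_batch_size x gradient_accumulation_steps x max_steps
sequences_seen = 1 * 4 * 10
print(lrs, sequences_seen)
```

Only 40 of the 2508 quotes are seen, so this configuration works as a smoke test of the training loop, not as a full fine-tuning run.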

## Save Model

In [46]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs")

## Tuning Import

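In [None]:
# Reload the base model and attach the LoRA weights saved in "outputs";
# peft.get_peft_model would create a new, untrained adapter instead.
base_model = transformers.AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
model_trained = peft.PeftModel.from_pretrained(base_model, "outputs")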