# Introduction

* Datasets:
    * https://huggingface.co/datasets/timdettmers/openassistant-guanaco/viewer/default/train?row=0
* Models:
    * https://huggingface.co/state-spaces/mamba-370m-hf

In [1]:
!pip install -U accelerate transformers trl datasets "causal-conv1d>=1.2.0" mamba-ssm

In [2]:
import os
import torch

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer

## Configuration

In [3]:
batch_size = 1
num_workers = os.cpu_count()
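# Set exactly one of these two to -1:
# max_steps = -1 for epoch-wise training,
# epochs = -1 for step-wise training.
max_steps = -1
epochs = 3
assert (max_steps == -1 and epochs > 0) or (max_steps > 0 and epochs == -1), \
    "Set exactly one of max_steps or epochs to -1 and the other to a positive value."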
bf16 = False
fp16 = True
gradient_accumulation_steps = 32
context_length = 1024
logging_steps = 50
save_steps = 50
learning_rate = 0.0001
model_name = 'state-spaces/mamba-370m-hf'
out_dir = 'outputs/mamba_370m_oasst_guanaco_sft'
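With `batch_size = 1` and `gradient_accumulation_steps = 32`, each optimizer update sees 32 sequences. Because packing fills every sequence to `context_length` tokens, the token budget per update can be worked out directly. A quick check, reusing the values from the configuration cell; the helper name `tokens_per_update` is only illustrative, and the single-GPU setting is an assumption:

```python
# Values copied from the configuration cell above.
batch_size = 1
gradient_accumulation_steps = 32
context_length = 1024
num_devices = 1  # assumption: single-GPU run


def tokens_per_update(batch_size: int, grad_accum: int, seq_len: int, devices: int = 1) -> int:
    """Tokens contributing to one optimizer step when every sequence is fully packed."""
    return batch_size * grad_accum * seq_len * devices


# Sequences per optimizer step across all devices.
effective_batch = batch_size * gradient_accumulation_steps * num_devices
print(effective_batch)
print(tokens_per_update(batch_size, gradient_accumulation_steps, context_length, num_devices))
```

Raising `batch_size` while lowering `gradient_accumulation_steps` by the same factor keeps the same number of tokens per update, trading memory for fewer accumulation passes.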

## Load Dataset

In [4]:
dataset = load_dataset('timdettmers/openassistant-guanaco')



In [5]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 9846
    })
    test: Dataset({
        features: ['text'],
        num_rows: 518
    })
})


In [6]:
print(dataset['train']['text'][0])

### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.### Assistant: "Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions.

Recent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens & Mishel, 2013). In these industries, workers often face low wages, limited benefits, and reduced bargaining power, leading
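Each row stores a whole conversation as a single string, with turns delimited by `### Human:` and `### Assistant:` and no newline before the marker, as the sample above shows. A minimal sketch of splitting such a row into turns, using a hypothetical `split_turns` helper; it assumes the markers never appear inside a message body:

```python
import re

# Matches the role markers used in openassistant-guanaco rows.
TURN_PATTERN = re.compile(r"### (Human|Assistant):")


def split_turns(text: str) -> list[tuple[str, str]]:
    """Split a guanaco-style row into (role, message) pairs."""
    # With a capturing group, re.split returns [prefix, role1, msg1, role2, msg2, ...].
    parts = TURN_PATTERN.split(text)
    return [(role, message.strip()) for role, message in zip(parts[1::2], parts[2::2])]


row = "### Human: What is 2 + 2?### Assistant: 4.### Human: And 3 + 3?### Assistant: 6."
for role, message in split_turns(row):
    print(f"{role}: {message}")
```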

## Model

In [7]:
if bf16:
    model = AutoModelForCausalLM.from_pretrained(model_name).to(dtype=torch.bfloat16)
else:
    model = AutoModelForCausalLM.from_pretrained(model_name)

In [8]:
print(model)
# Total parameters and trainable parameters.
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} total parameters.")
total_trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_trainable_params:,} training parameters.")

MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(50280, 1024)
    (layers): ModuleList(
      (0-47): 48 x MambaBlock(
        (norm): MambaRMSNorm()
        (mixer): MambaMixer(
          (conv1d): Conv1d(2048, 2048, kernel_size=(4,), stride=(1,), padding=(3,), groups=2048)
          (act): SiLU()
          (in_proj): Linear(in_features=1024, out_features=4096, bias=False)
          (x_proj): Linear(in_features=2048, out_features=96, bias=False)
          (dt_proj): Linear(in_features=64, out_features=2048, bias=True)
          (out_proj): Linear(in_features=2048, out_features=1024, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm()
  )
  (lm_head): Linear(in_features=1024, out_features=50280, bias=False)
)
371,516,416 total parameters.
371,516,416 training parameters.
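The reported total can be reproduced from the layer shapes printed above. Two details are not visible in the printed module tree and are assumptions based on the Hugging Face Mamba implementation: each `MambaMixer` also holds an `A_log` parameter of shape `(2048, 16)` and a `D` vector of length 2048, where the state size 16 follows from `x_proj` producing 96 = 64 + 2 × 16 features. The `lm_head` shares its weight with the embedding matrix, so `model.parameters()` counts it only once:

```python
# Shapes read off the printed MambaForCausalLM module tree.
vocab, d_model, d_inner = 50280, 1024, 2048
dt_rank, conv_kernel, n_layers = 64, 4, 48
d_state = 16  # inferred from x_proj: 96 = dt_rank + 2 * d_state

per_layer = (
    d_model                               # MambaRMSNorm weight
    + d_inner * conv_kernel + d_inner     # depthwise conv1d weight + bias
    + d_model * 2 * d_inner               # in_proj (no bias)
    + d_inner * (dt_rank + 2 * d_state)   # x_proj (no bias)
    + dt_rank * d_inner + d_inner         # dt_proj weight + bias
    + d_inner * d_model                   # out_proj (no bias)
    + d_inner * d_state                   # A_log (assumed shape)
    + d_inner                             # D (assumed shape)
)

embeddings = vocab * d_model
final_norm = d_model
# lm_head is tied to the embeddings, so it adds no extra parameters.
total = embeddings + n_layers * per_layer + final_norm

print(f"{per_layer:,} parameters per block")
print(f"{total:,} total parameters")
```

The count matching `model.parameters()` exactly is consistent with the tied `lm_head`; an untied head would add another 50280 × 1024 = 51,486,720 parameters.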


## Tokenizer

In [9]:
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Training

In [10]:
if max_steps == -1 and epochs > 0:
    training_args = TrainingArguments(
        output_dir=f"{out_dir}/logs",
        evaluation_strategy='epoch',
        weight_decay=0.01,
        load_best_model_at_end=True,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        logging_strategy='steps',
        save_strategy='epoch',
        logging_steps=logging_steps,
        num_train_epochs=epochs,
        save_total_limit=2,
        bf16=bf16,
        fp16=fp16,
        report_to='tensorboard',
        dataloader_num_workers=num_workers,
        gradient_accumulation_steps=gradient_accumulation_steps,
        learning_rate=learning_rate,
        lr_scheduler_type='linear',
    )

if max_steps > 0 and epochs == -1:
    training_args = TrainingArguments(
        output_dir=f"{out_dir}/logs",
        evaluation_strategy='steps',
        weight_decay=0.01,
        load_best_model_at_end=True,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        logging_strategy='steps',
        save_strategy='steps',
        logging_steps=logging_steps,
        save_steps=save_steps,
        save_total_limit=2,
        bf16=bf16,
        fp16=fp16,
        report_to='tensorboard',
        max_steps=max_steps,
        dataloader_num_workers=num_workers,
        gradient_accumulation_steps=gradient_accumulation_steps,
        learning_rate=learning_rate,
        lr_scheduler_type='constant',
    )

In [11]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    dataset_text_field='text',
    max_seq_length=context_length,
    tokenizer=tokenizer,
    args=training_args,
    packing=True
)



In [12]:
dataloader = trainer.get_train_dataloader()
for i, sample in enumerate(dataloader):
    print(tokenizer.decode(sample['input_ids'][0]))
    print('#'*50)
    if i == 5:
        break

 family medicine program.<|endoftext|>### Human: Explica la suma de vectores de manera simple explica las dos formas: la grafica y la numérica### Assistant: La suma de vectores es un concepto fundamental en matemáticas y física que describe cómo combinar dos o más vectores para formar un nuevo vector. Hay dos formas de representar la suma de vectores: gráfica y numérica.

1. Suma de vectores gráfica: La forma gráfica de la suma de vectores implica dibujar los vectores en un plano cartesiano y conectar el extremo del primer vector con el origen del segundo vector. El vector resultante es la línea que va desde el origen hasta el extremo del segundo vector.

2. Suma de vectores numérica: La forma numérica de la suma de vectores implica sumar las componentes de los vectores individuales. Por ejemplo, si tenemos dos vectores A = (a1, a2) y B = (b1, b2), entonces el vector resultante de su suma sería C = A + B = (a1 + b1, a2 + b2).

En ambas formas, la suma de vectores representa la combinac
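The first decoded sample begins mid-sentence (` family medicine program.<|endoftext|>`) because `packing=True` concatenates tokenized examples, separated by the EOS token, and slices the stream into fixed `context_length` chunks. Below is a simplified pure-Python sketch of that idea, not TRL's actual implementation; the helper `pack_sequences` and the token IDs are made up for illustration:

```python
from typing import Iterable


def pack_sequences(examples: Iterable[list[int]], eos_id: int, seq_len: int) -> list[list[int]]:
    """Concatenate token lists (each followed by EOS) and cut them into equal-length blocks.

    Leftover tokens that do not fill a final block are dropped.
    """
    buffer: list[int] = []
    for ids in examples:
        buffer.extend(ids + [eos_id])
    n_blocks = len(buffer) // seq_len
    return [buffer[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]


# Three toy "tokenized" examples; 0 plays the role of <|endoftext|>.
examples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
for block in pack_sequences(examples, eos_id=0, seq_len=4):
    print(block)
```

The third block starts at token 32, in the middle of the last example, which is the same effect seen in the decoded training sample.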

In [13]:
history = trainer.train()

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 0     | 1.7361        | 1.744936        |
| 1     | 1.514         | 1.742877        |
| 2     | 1.3873        | 1.766077        |
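Training loss keeps falling while validation loss bottoms out at epoch 1 and then rises, a typical early sign of overfitting. With `load_best_model_at_end=True` and no `metric_for_best_model` set, the `Trainer` compares checkpoints by evaluation loss (lower is better), so the weights restored at the end, and saved below as `best_model`, should come from that epoch. The selection amounts to an argmin over the logged values:

```python
# Validation losses copied from the training log above, keyed by the logged epoch.
val_loss = {0: 1.744936, 1: 1.742877, 2: 1.766077}

# Mirrors the Trainer default with load_best_model_at_end=True: lowest eval loss wins.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])
```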


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].


In [14]:
model.save_pretrained(f"{out_dir}/best_model")
tokenizer.save_pretrained(f"{out_dir}/best_model")

('outputs/mamba_370m_oasst_guanaco_sft/best_model/tokenizer_config.json',
 'outputs/mamba_370m_oasst_guanaco_sft/best_model/special_tokens_map.json',
 'outputs/mamba_370m_oasst_guanaco_sft/best_model/tokenizer.json')

## Inference

In [1]:
from transformers import (
    AutoModelForCausalLM,
    logging,
    pipeline,
    AutoTokenizer
)

import torch

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [3]:
model = AutoModelForCausalLM.from_pretrained('outputs/mamba_370m_oasst_guanaco_sft/best_model/')
tokenizer = AutoTokenizer.from_pretrained('outputs/mamba_370m_oasst_guanaco_sft/best_model/')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
pipe = pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    max_length=1024, # Prompt + new tokens to generate.
    device=device  # The model is already loaded, so place it with `device` rather than `device_map`.
)

In [5]:
logging.set_verbosity(logging.CRITICAL)

In [6]:
template = '### Human: {}### Assistant:'
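The template reproduces the training format exactly, including the missing newline before `### Assistant:`. Because the pipeline returns the prompt followed by the continuation, and the model may keep going with another `### Human:` turn, a small post-processing step is handy. A minimal sketch with hypothetical helpers `build_prompt` and `extract_reply`, which are not part of `transformers`:

```python
# Same template as in the cell above.
template = '### Human: {}### Assistant:'


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the guanaco-style template used for fine-tuning."""
    return template.format(user_message)


def extract_reply(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt and cut the reply at the next '### Human:' turn, if any."""
    reply = generated_text[len(prompt):] if generated_text.startswith(prompt) else generated_text
    return reply.split('### Human:', 1)[0].strip()


prompt = build_prompt('Name a primary color.')
# Stand-in for outputs[0]['generated_text'] so the sketch runs without a model.
fake_output = prompt + ' Red is a primary color.### Human: Thanks!'
print(extract_reply(fake_output, prompt))
```

Alternatively, passing `return_full_text=False` in the pipeline call returns only the continuation, leaving just the cut at the next `### Human:` turn to do.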

In [9]:
prompt = "Give ten tips for staying healthy."
prompt = template.format(prompt)

outputs = pipe(
    prompt,
)
print(outputs[0]['generated_text'])

### Human: Give ten tips for staying healthy.### Assistant: Here are ten tips for staying healthy:

1. Eat a balanced diet: Eat a variety of foods, including whole grains, fruits, vegetables, lean proteins, healthy fats, and healthy carbohydrates.

2. Get enough sleep: Get enough sleep each night to help your body recover from the day's activities.

3. Exercise regularly: Regular exercise can help improve your health and prevent chronic diseases.

4. Eat a balanced diet: Eat a variety of foods, including whole grains, fruits, vegetables, lean proteins, healthy fats, and healthy carbohydrates.

5. Stay hydrated: Drink plenty of water to prevent dehydration and maintain a healthy body weight.

6. Stay healthy by following a healthy diet: Eating a balanced diet that includes whole grains, fruits, vegetables, lean proteins, healthy fats, and healthy carbohydrates can help prevent chronic diseases.

7. Stay physically active: Regular physical activity can help improve your health and preven
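The answer repeats itself: tip 4 duplicates tip 1 word for word, and tip 6 restates the same advice. Standard `transformers` generation options such as `max_new_tokens`, `do_sample`, `temperature`, `top_p`, `repetition_penalty`, and `no_repeat_ngram_size` can be forwarded through the pipeline call. The values below are untested starting points, not tuned settings:

```python
# Assumed starting values; tune them for your own prompts.
generation_kwargs = {
    "max_new_tokens": 512,        # budget for the answer only, unlike max_length
    "do_sample": True,            # sample instead of greedy decoding
    "temperature": 0.7,           # sharpen the distribution slightly
    "top_p": 0.9,                 # nucleus sampling
    "repetition_penalty": 1.2,    # down-weight tokens that already appeared
    "no_repeat_ngram_size": 4,    # forbid repeating any 4-gram verbatim
}

# With `pipe` and `prompt` defined as in the cells above:
# outputs = pipe(prompt, **generation_kwargs)
# print(outputs[0]['generated_text'])
```

Since the pipeline was built with `max_length=1024`, passing `max_new_tokens` as well makes generation use `max_new_tokens` and log a warning about both being set.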