<a href="https://colab.research.google.com/github/hezarai/notebooks/blob/main/hezar/02_train_a_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install hezar

# Training a model in Hezar

In this notebook, we walk through training a model. Training a model in Hezar works much like it does in any other library, and is arguably simpler. As mentioned before, every model in Hezar is also a PyTorch module, so training a Hezar model is really just training a PyTorch model with some extra features on top. Let's dive in.

In [1]:
from hezar.models import BertTextClassification, BertTextClassificationConfig
from hezar.data import Dataset
from hezar.trainer import Trainer, TrainerConfig
from hezar.preprocessors import Preprocessor

In [2]:
DATASET_PATH = "hezarai/sentiment-dksf"  # dataset path on the Hub
BASE_MODEL_PATH = "hezarai/bert-base-fa"  # used as model backbone weights and tokenizer

### Build the datasets

Let's load a dataset from the Hub. For this example we use the Digikala/SnappFood comments dataset, which is used for sentiment analysis.

In [3]:
train_dataset = Dataset.load(DATASET_PATH, split="train", tokenizer_path=BASE_MODEL_PATH)
eval_dataset = Dataset.load(DATASET_PATH, split="test", tokenizer_path=BASE_MODEL_PATH)

### Build the model

Choose a model for this task and build it as you normally would in Hezar (see [models overview](01_models_overview.ipynb)).

In [4]:
model = BertTextClassification(BertTextClassificationConfig(id2label=train_dataset.config.id2label))
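The config above receives `id2label` from the dataset. As a rough illustration, assuming `id2label` is a plain mapping from integer class ids to label names (the label names below are hypothetical, not read from the dataset), the mapping fixes the number of classes and lets a predicted id be turned back into a readable label:

```python
# Hypothetical label set for illustration only; in the notebook the real
# mapping comes from train_dataset.config.id2label.
id2label = {0: "negative", 1: "positive"}

# The inverse mapping is handy when encoding string labels as class ids.
label2id = {label: idx for idx, label in id2label.items()}

# A classification head needs one output per label.
num_labels = len(id2label)

# Turning a predicted class id back into a readable label.
predicted_id = 1
predicted_label = id2label[predicted_id]
print(num_labels, label2id, predicted_label)
```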

We also need the tokenizer for our model.

In [5]:
tokenizer = Preprocessor.load(BASE_MODEL_PATH)

### Training

Hezar comes with a built-in `Trainer` so that model training is as easy and straightforward as possible. As you might have guessed, to use a `Trainer` we first need to set up its config.

In [6]:
train_config = TrainerConfig(
    output_dir="bert-sentiment-fa",
    task="text_classification",
    device="cuda",
    init_weights_from=BASE_MODEL_PATH,
    batch_size=8,
    num_epochs=5,
    checkpoints_dir="checkpoints/",
    metrics=["accuracy", "f1"],
    num_dataloader_workers=0,
    seed=42,
    optimizer="adam",
    learning_rate=2e-5,
    lr_scheduler="reduce_lr_on_plateau",
    save_freq=1,
)

Notice that our model is a BERT model with random weights, but we want to fine-tune it for a simple task, so we need to load the pretrained language model weights. To do this, simply provide the `init_weights_from` parameter, which takes the Hub ID of a model and loads its weights into ours. (The missing classification head is ignored automatically.)
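To make the "missing head is ignored" behaviour concrete, here is a conceptual sketch in which plain dicts stand in for PyTorch state dicts. It is not Hezar's loading code, which may work differently. The helper `load_matching_weights` and the backbone parameter name are hypothetical; only the two `classifier.*` keys come from the notebook's own output.

```python
# Conceptual sketch: plain dicts stand in for PyTorch state dicts.
# `load_matching_weights` and "bert.encoder.weight" are hypothetical names.

def load_matching_weights(model_state, pretrained_state):
    """Copy weights whose names exist in both dicts and report the rest."""
    missing = [name for name in model_state if name not in pretrained_state]
    unexpected = [name for name in pretrained_state if name not in model_state]
    for name, value in pretrained_state.items():
        if name in model_state:
            model_state[name] = value
    return missing, unexpected

# Freshly built model: every parameter is randomly initialised.
model_state = {
    "bert.encoder.weight": "random",  # hypothetical backbone parameter
    "classifier.weight": "random",
    "classifier.bias": "random",
}

# Pretrained language model: backbone only, no classification head.
pretrained_state = {"bert.encoder.weight": "pretrained"}

missing, unexpected = load_matching_weights(model_state, pretrained_state)
print("Missing keys:", missing)
print("Unexpected keys:", unexpected)
```

The backbone entry is overwritten with the pretrained value, while the classifier entries keep their random initialisation and are learned during fine-tuning.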

Now that we have our config, let's build the Trainer.

In [7]:
trainer = Trainer(
    config=train_config,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=train_dataset.data_collator,
    preprocessor=tokenizer,
)

Incompatible keys: []
Missing keys: ['classifier.weight', 'classifier.bias']



And now, let's train!

In [8]:
trainer.train()


******************** Training Info ********************

  Output Directory: `bert-sentiment-fa`
  Task: `text_classification`
  Model: `BertTextClassification`
  Init Weights: `hezarai/bert-base-fa`
  Device(s): `cuda`
  Training Dataset: `TextClassificationDataset(path=hezarai/sentiment-dksf['train'], size=28602)`
  Evaluation Dataset: `TextClassificationDataset(path=hezarai/sentiment-dksf['test'], size=2315)`
  Optimizer: `adam`
  Initial Learning Rate: `2e-05`
  Learning Rate Decay: `0.0`
  Epochs: `5`
  Batch Size: `8`
  Number of Parameters: `118299651`
  Number of Trainable Parameters: `118299651`
  Mixed Precision: `Full (fp32)`
  Metrics: `['accuracy', 'f1']`
  Checkpoints Path: `bert-sentiment-fa/checkpo

Epoch: 1/5      100%|######################################################################| 3575/3575 [06:43<00:00,  8.87batch/s, loss=0.611]
Evaluating...   100%|######################################################################| 289/289 [00:07<00:00, 37.84batch/s, accuracy=0.839, f1=0.744, loss=0.411]

Epoch: 2/5      100%|######################################################################| 3575/3575 [06:39<00:00,  8.95batch/s, loss=0.464]
Evaluating...   100%|######################################################################| 289/289 [00:07<00:00, 39.09batch/s, accuracy=0.761, f1=0.657, loss=0.566]

Epoch: 3/5      100%|######################################################################| 3575/3575 [06:42<00:00,  8.89batch/s, loss=0.335]
Evaluating...   100%|######################################################################| 289/289 [00:07<00:00, 37.68batch/s, accuracy=0.871, f1=0.806, loss=0.371]

Epoch: 4/5      100%|######################################################################| 3575/3575 [06:45<00:00,  8.81batch/s, loss=0.216]
Evaluating...   100%|######################################################################| 289/289 [00:07<00:00, 38.70batch/s, accuracy=0.875, f1=0.809, loss=0.388]

Epoch: 5/5      100%|######################################################################| 3575/3575 [06:43<00:00,  8.86batch/s, loss=0.148]
Evaluating...   100%|######################################################################| 289/289 [00:07<00:00, 37.69batch/s, accuracy=0.848, f1=0.77, loss=0.58]
Hezar (INFO): Training done!
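
The batch counts in the progress bars can be reproduced from the dataset sizes and batch size printed in the training info. This quick check assumes the counts come from integer division of dataset size by batch size. The log does not say whether the final partial batch is dropped or simply not counted.

```python
# Sizes and batch size as reported in the training info above.
train_size = 28602
eval_size = 2315
batch_size = 8

# Integer division reproduces the totals shown in the progress bars.
train_batches = train_size // batch_size
eval_batches = eval_size // batch_size

# Samples that do not fill a complete batch.
train_remainder = train_size % batch_size
eval_remainder = eval_size % batch_size

print(train_batches, eval_batches)      # 3575 289
print(train_remainder, eval_remainder)  # 2 3
```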


So we trained the model for 5 epochs. As you can see, progress is logged verbosely throughout the process. After each epoch, all metrics are logged and the weights are saved. TensorBoard logs are saved to a folder called `runs` (you can change this default), and you can inspect them as usual:

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs/

And the weights are saved to `checkpoints` (you can change this default).
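
Because weights are saved after every epoch, it is worth checking which epoch actually scored best on the evaluation set before deciding which weights to keep. The sketch below simply copies the evaluation numbers from the training log above into a list (the variable names are illustrative) and compares them. In this run the final epoch is not the best one.

```python
# Evaluation metrics copied from the training log, one entry per epoch.
eval_log = [
    {"epoch": 1, "accuracy": 0.839, "f1": 0.744, "loss": 0.411},
    {"epoch": 2, "accuracy": 0.761, "f1": 0.657, "loss": 0.566},
    {"epoch": 3, "accuracy": 0.871, "f1": 0.806, "loss": 0.371},
    {"epoch": 4, "accuracy": 0.875, "f1": 0.809, "loss": 0.388},
    {"epoch": 5, "accuracy": 0.848, "f1": 0.77, "loss": 0.58},
]

# Highest F1 and lowest evaluation loss can point at different epochs.
best_by_f1 = max(eval_log, key=lambda row: row["f1"])
best_by_loss = min(eval_log, key=lambda row: row["loss"])

print(best_by_f1["epoch"], best_by_loss["epoch"])  # 4 3
```

Training loss keeps falling through epoch 5 while evaluation loss rises after epoch 3, which is the usual sign of overfitting.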

### Push to Hub

Now we can push our model, along with some training-specific configs, to the Hub!

In [None]:
trainer.push_to_hub("hezarai/bert-fa-sentiment-digikala-snappfood")