# Multi-task Training with Hugging Face Transformers


## Library setup

First up, we will install the libraries. 

<font color='red'>**Note: After running the following cell, you will need to restart your runtime for the installation to work properly.**</font>

In [1]:
import numpy as np
import torch
import torch.nn as nn
import transformers
import datasets

## Fetching our data



In [2]:
dataset_dict = {
    "main": datasets.load_dataset('csv', data_files={"train": "../data/train_initial.csv", "validation": "../data/dev_data.csv"}),
    "cds": datasets.load_dataset('csv', data_files={"train": "https://huggingface.co/datasets/hungchiayu/cds-dataset2-depression/resolve/main/trainDepression.csv", "validation": "https://huggingface.co/datasets/hungchiayu/cds-dataset2-depression/resolve/main/testDepression.csv"}),
}

Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-f3c0c2a7e7025318/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


  0%|          | 0/2 [00:00<?, ?it/s]

Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-dc3caf837d284c81/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


  0%|          | 0/2 [00:00<?, ?it/s]

We can show one example from each task.

In [3]:
for task_name, dataset in dataset_dict.items():
    print(task_name)
    print(dataset_dict[task_name]["train"][0])
    print()

main
{'pid': 'train_pid_1', 'text data': 'Waiting for my mind to have a breakdown once the “New Year” feeling isn’t there anymore : I don’t know about anyone else, but I’m a little bit worried that I’ll go back to being depressed in a few days time or something. Last year, I tried not to have any breakdowns for the start of 2019. A mere 10 days later, I broke down crying. I wasn’t the same for that entire year. Up until December, where I was ok that month. Now I just wait... it’s a weird way to act and feel, but it feels a bit normal.', 'label': 'moderate'}

cds
{'author': 'bayouekko', 'created_utc': '2014-07-10 01:06:11', 'id': '2a94lf', 'kind': 't3', 'selftext': "My grandparents are 86 &amp; 88 years old. They live off of social security income, after having worked hard their entire lives. They loved to travel once retired, but with the cost of living and their low monthly income. My grandparents are well aware that they may not have another summer where their health is good enough t

In [4]:
# columns of input and label
#Fromat
# [text_column, label_column]
mapping_column_dict = {
    "main": ['text data', 'label'],
    "cds":  ['text', 'label_name']
}

num_classes_dict = {}
label_list_dict = {}

for dataset in dataset_dict:
    label_list = dataset_dict[dataset]['train'].unique(mapping_column_dict[dataset][1])
    label_list.sort()
    label_list_dict[dataset] = label_list
    num_labels = len(label_list)
    num_classes_dict[dataset] = num_labels
    
    #Converts str label to int label encoding
    for split in dataset_dict[dataset]:
        dataset_dict[dataset][split] = dataset_dict[dataset][split].map(lambda x: {'label_encoding': label_list.index(x[mapping_column_dict[dataset][1]])})
    mapping_column_dict[dataset][1] = 'label_encoding'
    
num_classes_dict

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-f3c0c2a7e7025318/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-be362fb5bdca4170.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-f3c0c2a7e7025318/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-6f007ab025c4c313.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-dc3caf837d284c81/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-7085b984e1da88d4.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-dc3caf837d284c81/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-adb0e9645f702d94.arrow


{'main': 3, 'cds': 6}

In [5]:
label_list_dict

{'main': ['moderate', 'not depression', 'severe'],
 'cds': ['ADHD', 'Anxiety', 'Bipolar', 'Depression', 'EDAnonymous', 'PTSD']}

In [6]:
mapping_column_dict

{'main': ['text data', 'label_encoding'], 'cds': ['text', 'label_encoding']}

## Creating a Multi-task Model

Next up, we are going to create a multi-task model. 

First, we define our `MultitaskModel` class:

In [7]:
class MultitaskModel(transformers.PreTrainedModel):
    def __init__(self, encoder, taskmodels_dict):
        """
        Setting MultitaskModel up as a PretrainedModel allows us
        to take better advantage of Trainer features
        """
        super().__init__(transformers.PretrainedConfig())

        self.encoder = encoder
        self.taskmodels_dict = nn.ModuleDict(taskmodels_dict)

    @classmethod
    def create(cls, model_name, model_type_dict, model_config_dict):
        """
        This creates a MultitaskModel using the model class and config objects
        from single-task models. 

        We do this by creating each single-task model, and having them share
        the same encoder transformer.
        """
        shared_encoder = None
        taskmodels_dict = {}
        for task_name, model_type in model_type_dict.items():
            model = model_type.from_pretrained(
                model_name, 
                config=model_config_dict[task_name],
            )
            if shared_encoder is None:
                shared_encoder = getattr(model, cls.get_encoder_attr_name(model))
            else:
                setattr(model, cls.get_encoder_attr_name(model), shared_encoder)
            taskmodels_dict[task_name] = model
        return cls(encoder=shared_encoder, taskmodels_dict=taskmodels_dict)

    @classmethod
    def get_encoder_attr_name(cls, model):
        """
        The encoder transformer is named differently in each model "architecture".
        This method lets us get the name of the encoder attribute
        """
        model_class_name = model.__class__.__name__
        if model_class_name.startswith("Bert"):
            return "bert"
        elif model_class_name.startswith("Roberta"):
            return "roberta"
        elif model_class_name.startswith("Albert"):
            return "albert"
        else:
            raise KeyError(f"Add support for new model {model_class_name}")

    def forward(self, task_name, **kwargs):
        return self.taskmodels_dict[task_name](**kwargs)

As described above, the `MultitaskModel` class consists of only two components - the shared "encoder", a dictionary to the individual task models. Now, we can simply create the corresponding task models by supplying the invidual model classes and model configs. We will use Transformers' AutoModels to further automate the choice of model class given a model architecture (in our case, let's use `roberta-base`).

In [8]:
model_name = "models/roberta-large-mental-health-v1"

model_type_dict = {}
for dataset in dataset_dict:
    model_type_dict[dataset] = transformers.AutoModelForSequenceClassification
    
model_config_dict = {}
for dataset in dataset_dict:
    model_config_dict[dataset] = transformers.AutoConfig.from_pretrained(model_name, num_labels=num_classes_dict[dataset])

multitask_model = MultitaskModel.create(
    model_name=model_name,
    model_type_dict=model_type_dict,
    model_config_dict=model_config_dict
)

Some weights of the model checkpoint at models/roberta-large-mental-health-v1 were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at models/roberta-large-mental-health-v1 and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.out_

To confirm that all three task-models use the same encoder, we can check the data pointers of the respective encoders. In this case, we'll check that the word embeddings in each model all point to the same memory location.

In [9]:
if 'roberta' in model_name:
    print('multitask', multitask_model.encoder.embeddings.word_embeddings.weight.data_ptr())
    for dataset in dataset_dict:
        print(dataset, multitask_model.taskmodels_dict[dataset].roberta.embeddings.word_embeddings.weight.data_ptr())
    #print(multitask_model.taskmodels_dict["external"].roberta.embeddings.word_embeddings.weight.data_ptr())
    # print(multitask_model.taskmodels_dict["commonsense_qa"].roberta.embeddings.word_embeddings.weight.data_ptr())
else:
    print("Exercise for the reader: add a check for other model architectures =)")

multitask 139719428857920
main 139719428857920
cds 139719428857920


## Processing our task data


In [10]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

In [11]:
max_length = 512

convert_func_dict = {}

def generate_convert_func(input_column, label_column):
    def convert_to_main_features(example_batch):
        inputs = example_batch[input_column]
        features = tokenizer.batch_encode_plus(
            inputs, max_length=max_length, pad_to_max_length=True
        )
        features["labels"] = example_batch[label_column]
        return features
    return convert_to_main_features

for dataset in dataset_dict:
    convert_func_dict[dataset] = generate_convert_func(mapping_column_dict[dataset][0], mapping_column_dict[dataset][1])

"""
def convert_to_main_features(example_batch):
    inputs = example_batch['text']
    features = tokenizer.batch_encode_plus(
        inputs, max_length=max_length, pad_to_max_length=True
    )
    features["labels"] = example_batch["label"]
    return features

def convert_to_external_features(example_batch):
    inputs = example_batch['text data']
    features = tokenizer.batch_encode_plus(
        inputs, max_length=max_length, pad_to_max_length=True
    )
    features["labels"] = example_batch["label"]
    return features


convert_func_dict = {
    "main": convert_to_main_features,
    "external": convert_to_external_features,
    # "commonsense_qa": convert_to_commonsense_qa_features,
}
"""

'\ndef convert_to_main_features(example_batch):\n    inputs = example_batch[\'text\']\n    features = tokenizer.batch_encode_plus(\n        inputs, max_length=max_length, pad_to_max_length=True\n    )\n    features["labels"] = example_batch["label"]\n    return features\n\ndef convert_to_external_features(example_batch):\n    inputs = example_batch[\'text data\']\n    features = tokenizer.batch_encode_plus(\n        inputs, max_length=max_length, pad_to_max_length=True\n    )\n    features["labels"] = example_batch["label"]\n    return features\n\n\nconvert_func_dict = {\n    "main": convert_to_main_features,\n    "external": convert_to_external_features,\n    # "commonsense_qa": convert_to_commonsense_qa_features,\n}\n'

Now that we have defined the above functions, we can use `dataset.map` to apply the functions over our entire datasets.

In [12]:
"""
columns_dict = {
    "main": ['input_ids', 'attention_mask', 'labels'],
    "external": ['input_ids', 'attention_mask', 'labels'],
    # "commonsense_qa": ['input_ids', 'attention_mask', 'labels'],
}
"""
columns_dict = {}
for dataset in dataset_dict:
    columns_dict[dataset] = ['input_ids', 'attention_mask', 'labels']

features_dict = {}
for task_name, dataset in dataset_dict.items():
    features_dict[task_name] = {}
    for phase, phase_dataset in dataset.items():
        features_dict[task_name][phase] = phase_dataset.map(
            convert_func_dict[task_name],
            batched=True,
            load_from_cache_file=False,
        )
        print(task_name, phase, len(phase_dataset), len(features_dict[task_name][phase]))
        features_dict[task_name][phase].set_format(
            type="torch", 
            columns=columns_dict[task_name],
        )
        print(task_name, phase, len(phase_dataset), len(features_dict[task_name][phase]))

Map:   0%|          | 0/7201 [00:00<?, ? examples/s]

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


main train 7201 7201
main train 7201 7201


Map:   0%|          | 0/3245 [00:00<?, ? examples/s]

main validation 3245 3245
main validation 3245 3245


Map:   0%|          | 0/14404 [00:00<?, ? examples/s]

cds train 14404 14404
cds train 14404 14404


Map:   0%|          | 0/3602 [00:00<?, ? examples/s]

cds validation 3602 3602
cds validation 3602 3602


## Preparing a multi-task data loader and Trainer

In [13]:
import dataclasses
from torch.utils.data.dataloader import DataLoader
from transformers.data.data_collator import DefaultDataCollator, InputDataClass
from torch.utils.data.sampler import RandomSampler
from typing import List, Union, Dict
from transformers import default_data_collator
from torch.utils.data import Sampler

class NLPDataCollator(DefaultDataCollator):
    """
    Extending the existing DataCollator to work with NLP dataset batches
    """
    def collate_batch(self, features: List[Union[InputDataClass, Dict]]) -> Dict[str, torch.Tensor]:
        first = features[0]
        if isinstance(first, dict):
          # NLP data sets current works presents features as lists of dictionary
          # (one per example), so we  will adapt the collate_batch logic for that
          if "labels" in first and first["labels"] is not None:
              if first["labels"].dtype == torch.int64:
                  labels = torch.tensor([f["labels"] for f in features], dtype=torch.long)
              else:
                  labels = torch.tensor([f["labels"] for f in features], dtype=torch.float)
              batch = {"labels": labels}
          for k, v in first.items():
              if k != "labels" and v is not None and not isinstance(v, str):
                  batch[k] = torch.stack([f[k] for f in features])
          return batch
        else:
          # otherwise, revert to using the default collate_batch
          return default_data_collator().collate_batch(features)


class StrIgnoreDevice(str):
    """
    This is a hack. The Trainer is going call .to(device) on every input
    value, but we need to pass in an additional `task_name` string.
    This prevents it from throwing an error
    """
    def to(self, device):
        return self


class DataLoaderWithTaskname:
    """
    Wrapper around a DataLoader to also yield a task name
    """
    def __init__(self, task_name, data_loader):
        self.task_name = task_name
        self.data_loader = data_loader

        self.batch_size = data_loader.batch_size
        self.dataset = data_loader.dataset

    def __len__(self):
        return len(self.data_loader)
    
    def __iter__(self):
        for batch in self.data_loader:
            batch["task_name"] = StrIgnoreDevice(self.task_name)
            yield batch


class MultitaskDataloader:
    """
    Data loader that combines and samples from multiple single-task
    data loaders.
    """
    def __init__(self, dataloader_dict, sample_type='size-proportional'):
        self.dataloader_dict = dataloader_dict
        self.num_batches_dict = {
            task_name: len(dataloader) 
            for task_name, dataloader in self.dataloader_dict.items()
        }
        self.task_name_list = list(self.dataloader_dict)
        self.dataset = [None] * sum(
            len(dataloader.dataset) 
            for dataloader in self.dataloader_dict.values()
        )
        
        self.sample_type = sample_type
        
    def _get_infinite_generator(self, dataloader):
        while True:
            for data in dataloader:
                yield data

    def __len__(self):
        if sample == 'size-proportional:
            return sum(self.num_batches_dict.values())
        elif sample == "main-uniform":
            return int(self.num_batches_dict['main']*len(self.num_batches_dict))
        
    def __iter__(self):
        """
        For each batch, sample a task, and yield a batch from the respective
        task Dataloader.

        We use size-proportional sampling, but you could easily modify this
        to sample from some-other distribution.
        """
        if self.sample_type == 'size-proportional':
            dataloader_iter_dict = {
                task_name: iter(dataloader) 
                for task_name, dataloader in self.dataloader_dict.items()
            }
            
            task_choice_list = []
            
            for i, task_name in enumerate(self.task_name_list):
                task_choice_list += [i] * self.num_batches_dict[task_name]
                
            task_choice_list = np.array(task_choice_list)
            np.random.shuffle(task_choice_list)
            
            for task_choice in task_choice_list:
                task_name = self.task_name_list[task_choice]
                yield next(dataloader_iter_dict[task_name]) 
        elif self.sample_type == "main-uniform":
            if not hasattr(self, 'dataloader_iter_dict'):
                self.dataloader_iter_dict = {
                    task_name: self._get_infinite_generator(dataloader) 
                    for task_name, dataloader in self.dataloader_dict.items()
                    if task_name != 'main'
                }
            self.dataloader_iter_dict['main'] = iter(dataloader)
            
            task_choice_list = []
            
            for i, task_name in enumerate(self.task_name_list):
                task_choice_list += [i] * self.num_batches_dict['main']
                
            task_choice_list = np.array(task_choice_list)
            np.random.shuffle(task_choice_list)
            
            for task_choice in task_choice_list:
                task_name = self.task_name_list[task_choice]
                yield next(self.dataloader_iter_dict[task_name])
        elif:
            raise Exception(f"Invalid sample_type {self.sample_type} for class MultitaskDataloader")

            
def get_single_train_dataloader(task_name, train_dataset, train_batch_size=32, data_collator=NLPDataCollator()):
        """
        Create a single-task data loader that also yields task names
        """
        if train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        
        train_sampler = (
            RandomSampler(train_dataset)
        )

        data_loader = DataLoaderWithTaskname(
            task_name=task_name,
            data_loader=DataLoader(
              train_dataset,
              batch_size=train_batch_size,
              sampler=train_sampler,
              collate_fn=data_collator.collate_batch,
            ),
        )

        return data_loader

In [14]:
import accelerate
from accelerate import Accelerator
from transformers import get_scheduler
import math 

#Hyperparameters
learning_rate = 2e-6
epochs = 8
train_batch_size = 8
gradient_accumulation_steps = 1
weight_decay = 0.0
eps=1e-08
num_warmup_steps=200
scheduler_type="linear"
max_grad_norm = 1.0
output_path = './multitask/multitask_large'
early_stopping_patience: 2
early_stopping_threshold: 0.025
sample_type="main-uniform"

In [15]:
optimizer_grouped_parameters = [
        {
            "params": [p for n, p in multitask_model.named_parameters()],
            "weight_decay": weight_decay,
        },

    ]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=learning_rate, eps=eps)

train_dataset = {
    task_name: dataset["train"] 
    for task_name, dataset in features_dict.items()
}
train_dataloader = MultitaskDataloader({
            task_name: get_single_train_dataloader(task_name, task_dataset, train_batch_size=train_batch_size)
            for task_name, task_dataset in train_dataset.items()
        }, sample_type=sample_type)

In [16]:
eval_dataset = {
    "main": features_dict["main"]["validation"] 
}
eval_dataloader = MultitaskDataloader({
            "main": get_single_train_dataloader("main", features_dict["main"]["validation"], train_batch_size=train_batch_size )
        })

In [17]:
num_training_steps = epochs * math.ceil(len(train_dataloader) / gradient_accumulation_steps)
lr_scheduler = get_scheduler(
    name=scheduler_type,
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

## Time to train!


In [18]:
from tqdm.auto import tqdm
import os
import sklearn

os.environ["CUDA_VISIBLE_DEVICES"]="2"

#device = torch.device("cuda:2") if torch.cuda.is_available() else torch.device("cpu")

accelerator = Accelerator(device_placement=True, gradient_accumulation_steps=gradient_accumulation_steps, log_with="wandb", mixed_precision="fp16")

accelerator.init_trackers(
    project_name="depsign_multitask", 
    config={
        "learning_rate": learning_rate,
        "epochs": epochs,
        "train_batch_size": train_batch_size,
        "gradient_accumulation_steps":  gradient_accumulation_steps,
        "weight_decay":  weight_decay,
        "eps": eps,
        "num_warmup_steps": num_warmup_steps,
        "scheduler_type": scheduler_type,
        "max_grad_norm": max_grad_norm,
        "output_path": output_path,
        "max_seq_length": max_length,
        "sample_type": sample_type
    },
    init_kwargs={
        "wandb": {
            "entity": "dlb-depsign",
            "notes": "testing accelerate pipeline",
            "tags": ["cds"]
        }
    }
)

model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
        multitask_model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
    )

progress_bar = tqdm(range(num_training_steps), disable=not accelerator.is_local_main_process)

#model = accelerator.prepare(multitask_model.train())
#multitask_model.train().to(accelerator.device)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[34m[1mwandb[0m: Currently logged in as: [33meduagarcia[0m ([33mdlb-depsign[0m). Use [1m`wandb login --relogin`[0m to force relogin


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


  0%|          | 0/21616 [00:00<?, ?it/s]

In [19]:
last_main_metric = 0

avg_loss = 0
global_step = 0

main_task_loss = 0
main_task_step = 0

other_tasks_loss = 0
other_tasks_step = 0

patience_count = 0

for epoch in range(epochs):
    model.train()
    for step, batch in enumerate(train_dataloader):
        with accelerator.accumulate(model):
            task_name = batch["task_name"]
            input_ids = batch["input_ids"].to(accelerator.device)
            attention_mask = batch["attention_mask"].to(accelerator.device)
            labels = batch["labels"].to(accelerator.device)
            outputs = model.taskmodels_dict[task_name](input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            #accelerator.print(loss)
            #accelerator.log({"train_loss": 1.12}, step=step)
            avg_loss += loss.item()
            if task_name == 'main':
                main_task_loss += loss.item()
                main_task_step += 1
            else:
                other_tasks_loss += loss.item()
                other_tasks_step += 1
            accelerator.backward(loss)
            if accelerator.sync_gradients:
                accelerator.clip_grad_norm_(model.parameters(), max_grad_norm)
            #if step % gradient_accumulation_steps == 0 or step == len(train_dataloader) - 1:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            progress_bar.update(1)
            
        if accelerator.sync_gradients and accelerator.is_main_process:
            global_step += 1
            current_loss = accelerator.gather(avg_loss / global_step)
            accelerator.log({
                "train_loss": current_loss,
                "main_task_step": accelerator.gather(main_task_step),
                "other_tasks_step": accelerator.gather(other_tasks_step),
                "main_task_train_loss": accelerator.gather(main_task_loss / main_task_step),
                "other_tasks_train_loss": accelerator.gather(other_tasks_loss / other_tasks_step),
                'current_learning_rate': accelerator.gather(lr_scheduler.get_last_lr())
            }, step=global_step)
            #accelerator.print("train_loss:", current_loss)
            
    model.eval() 
    all_predictions = {}
    all_references = {}
    eval_loss = 0
    accelerator.print(f'Evaluating... Epoch: {epoch}, Global Step: {global_step}')
    for step, batch in enumerate(eval_dataloader):
        task_name = batch["task_name"]
        if task_name != "main":
            continue
        # print(task_name)
        input_ids = batch["input_ids"].to(accelerator.device)
        attention_mask = batch["attention_mask"].to(accelerator.device)
        labels = batch["labels"].to(accelerator.device)
        # print(len(labels), len(input_ids))
        #outputs = multitask_model.taskmodels_dict[task_name](input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        with torch.no_grad():
            outputs = model.taskmodels_dict[task_name](input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            # print(outputs.logits)
        loss = outputs.loss
        eval_loss += loss.item()
        predictions = outputs.logits.argmax(dim=-1) 
        # predictions, references = accelerator.gather((predictions, batch["labels"]))
        # print(predictions)
        predictions = predictions.cpu().tolist()
        references = labels.cpu().tolist()

        if task_name in all_predictions:
            all_predictions[task_name].extend(predictions)
        else:
            all_predictions[task_name] = predictions

        if task_name in all_references:
            all_references[task_name].extend(references)
        else:
            all_references[task_name] = references
            
    metrics = sklearn.metrics.classification_report(all_references["main"], all_predictions["main"], digits=4)
    accelerator.print(metrics)
    
    main_metric = sklearn.metrics.f1_score(all_references["main"], all_predictions["main"], average='macro')
    accelerator.log({"eval_loss": accelerator.gather(eval_loss / len(eval_dataloader)), "eval_f1 (macro)": main_metric}, step=global_step)
    
    if main_metric - last_main_metric < early_stopping_threshold:
        patience_count += 1
    
    if main_metric > last_main_metric:
        accelerator.print(f'Saving new best model at epoch {epoch}')
        accelerator.wait_for_everyone()
        unwrap_model = accelerator.unwrap_model(model)
        
        unwrap_model.save_pretrained(output_path)
        accelerator.save(unwrap_model.state_dict(), os.path.join(output_path, "state_dict.bin"))
        last_main_metric = main_metric
        with open( os.path.join(output_path, "report.txt"), 'w') as f:
            f.write(metrics+'\n')
            f.write(f'Epoch: {epoch}, Global Step: {global_step}')
    
    if patience_count >= early_stopping_patience:
        accelerator.print(f'Early stopping paticience {patience_count} reached, stop training...')
        break
        
accelerator.end_training()

Evaluating...
              precision    recall  f1-score   support

           0     0.7251    0.8451    0.7805      2169
           1     0.5000    0.3632    0.4208       848
           2     0.4950    0.2193    0.3040       228

    accuracy                         0.6752      3245
   macro avg     0.5734    0.4759    0.5017      3245
weighted avg     0.6501    0.6752    0.6530      3245

Saving new best model at epoch 0




Evaluating...
              precision    recall  f1-score   support

           0     0.7570    0.7441    0.7505      2169
           1     0.4725    0.5165    0.4935       848
           2     0.4839    0.3947    0.4348       228

    accuracy                         0.6601      3245
   macro avg     0.5711    0.5518    0.5596      3245
weighted avg     0.6635    0.6601    0.6612      3245

Saving new best model at epoch 1




Evaluating...
              precision    recall  f1-score   support

           0     0.7742    0.7049    0.7379      2169
           1     0.4621    0.5967    0.5208       848
           2     0.5257    0.4035    0.4566       228

    accuracy                         0.6555      3245
   macro avg     0.5873    0.5684    0.5718      3245
weighted avg     0.6752    0.6555    0.6614      3245

Saving new best model at epoch 2




Evaluating...
              precision    recall  f1-score   support

           0     0.7937    0.6385    0.7077      2169
           1     0.4530    0.6415    0.5310       848
           2     0.4247    0.5570    0.4820       228

    accuracy                         0.6336      3245
   macro avg     0.5571    0.6124    0.5736      3245
weighted avg     0.6787    0.6336    0.6457      3245

Saving new best model at epoch 3




Evaluating...
              precision    recall  f1-score   support

           0     0.7604    0.7681    0.7642      2169
           1     0.4926    0.4729    0.4826       848
           2     0.4750    0.5000    0.4872       228

    accuracy                         0.6721      3245
   macro avg     0.5760    0.5803    0.5780      3245
weighted avg     0.6704    0.6721    0.6711      3245

Saving new best model at epoch 4




Evaluating...
              precision    recall  f1-score   support

           0     0.7614    0.7487    0.7550      2169
           1     0.4934    0.4858    0.4896       848
           2     0.4368    0.5307    0.4792       228

    accuracy                         0.6647      3245
   macro avg     0.5639    0.5884    0.5746      3245
weighted avg     0.6685    0.6647    0.6663      3245

Evaluating...
              precision    recall  f1-score   support

           0     0.7665    0.7294    0.7475      2169
           1     0.4767    0.5307    0.5022       848
           2     0.4684    0.4868    0.4774       228

    accuracy                         0.6604      3245
   macro avg     0.5705    0.5823    0.5757      3245
weighted avg     0.6698    0.6604    0.6644      3245




KeyboardInterrupt



In [None]:
#model.save_pretrained("./multitask/multitask_large")
#torch.save(model.state_dict(), "./multitask/multitask_large/state_dict.bin")