# Introduction to different transformer models




Let us first import the pipeline function from the transformers module.

In [1]:
from transformers import pipeline

With this function, pre-processing, model estimation and post-processing are all done in the background. We only need to decide for the method to use.

In a first step, let us just use the pre-trained transformer models which can help us with an array of tasks.

For instance, we could classify the sentiment of a text by calling *pipeline("sentiment-analysis")*.


In [2]:
classifier = pipeline("sentiment-analysis")
classifier(["The economy is running pretty badly.", "That's a fantastic growth rate.", "Austerity measures put our economy under pressure"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

[{'label': 'NEGATIVE', 'score': 0.9997888207435608},
 {'label': 'POSITIVE', 'score': 0.999885082244873},
 {'label': 'NEGATIVE', 'score': 0.975756824016571}]

Like in GPT, we can also use the pipeline function to generate some text.

In [4]:
generation = pipeline("text-generation")
generation("Roses are red, violets are")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Roses are red, violets are blue\n\nRed vs Blue\n\nWhat color do the red vs blue snakes have on their faces?\n\nRed vs Blue has a slight preference for red-blue\n\nBoth species have different bodies'}]

...well, let's better use a more recent version of GPT for that.

But the pipeline method has another very neat feature, it can help us with translation. How do we make a reservation in Italian again?

For translations, we need to pick a model from [HUGGING FACE](https://huggingface.co/models?pipeline_tag=translation&sort=trending&search=it-en), just pick someone translating from the source to the target language.

In [7]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-it")
translator("Can I make a reservation for two at 7pm?")

[{'translation_text': 'Sind Sie der Ansicht, dass die Debatte über die folgenden Politiken geregelt oder im Gange ist?'}]

All of this is pretty neat but let's move on to classification tasks we used before.

Let's begin with a method called "zero-shot-classification", this is basically what GPT does if we ask to classify our statement. We can use this to classify different statements into pre-defined categories. Be aware, zero-shot-classification takes some more time on a CPU.

In [None]:
classifier = pipeline("zero-shot-classification")
classifier(["We need further investments into green growth.", "Show some humanity, all refugees are welcome.", "Stop the Russian war on Ukraine."],
           candidate_labels=["international relations", "economy", "migration"])


No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'sequence': 'We need further investments into green growth.',
  'labels': ['economy', 'migration', 'international relations'],
  'scores': [0.71065354347229, 0.22171807289123535, 0.06762837618589401]},
 {'sequence': 'Show some humanity, all refugees are welcome.',
  'labels': ['migration', 'economy', 'international relations'],
  'scores': [0.9619788527488708, 0.020190801471471786, 0.017830340191721916]},
 {'sequence': 'Stop the Russian war on Ukraine.',
  'labels': ['international relations', 'migration', 'economy'],
  'scores': [0.8622368574142456, 0.07648172229528427, 0.061281390488147736]}]

Generally, the outcome is encouraging. But maybe, with a better model, we could improve our predictions.

What happens under the hood? Let us look at the different steps that happen when calling the pipeline.

Each model has a different architecture and allows for a different amount of tokens. To make our lives easier, we can simply use the AutoTokenizer which transforms our data automatically in the correct format. For this purpose, let's use the auto-tokenizer on the model which was automatically used for classification above.

padding=True makes sure that every sentence has the lengths of the longest input vector (it adds some zeros for non-existent features).

truncation=True ensures that no input sentence has more tokens than allowed by the model.

In [8]:
from transformers import AutoTokenizer
checkpoint = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

sentences = ["We need further investments into green growth.", "Show some humanity, all refugees are welcome.", "Stop the Russian war on Ukraine."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
print(inputs)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

{'input_ids': tensor([[    0,   170,   240,   617,  3227,    88,  2272,   434,     4,     2,
             1],
        [    0, 27477,   103,  9187,     6,    70,  4498,    32,  2814,     4,
             2],
        [    0, 22174,     5,  1083,   997,    15,  4174,     4,     2,     1,
             1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}


So, what we can see here is that each word gets assigned to a unique number. Same words get the same number. As the sequences are of different lengths, the shorter sequences get a 1 (empty) as filling token. The attention mask shows which tokens are considered - the filling tokens receive a 0 (not considered).

Another thing which happens automatically in the pipeline is that a pre-trained model is loaded. The model has a certain architecture which might look familiar if you think of our previous exercise in R with keras. For example, BART has several layers and activation functions. In addition, there's an encoder and decoder in this model.

In [10]:
from transformers import BartModel

model = BartModel.from_pretrained("facebook/bart-large-mnli")

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

In [None]:
print(model)

BartModel(
  (shared): Embedding(50265, 1024, padding_idx=1)
  (encoder): BartEncoder(
    (embed_tokens): Embedding(50265, 1024, padding_idx=1)
    (embed_positions): BartLearnedPositionalEmbedding(1026, 1024)
    (layers): ModuleList(
      (0-11): 12 x BartEncoderLayer(
        (self_attn): BartSdpaAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (activation_fn): GELUActivation()
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (lay

One short note of caution: Always be aware of bias in these models. For instance, let us look as a masking task. In these tasks, we mask a word and let the model return a value that is often used in such a sequence.

In [11]:
unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']


## Fine-tuning our dataset

Alright, working with pre-trained transformers worked pretty well. However, if you've got domain-specific knowledge, they may be too general. In this scenario, we want to train our own model as we've done before with quanteda and keras in R.

For this purpose, we need to load our own data. Moreover, fine-tuning a transformer model is computationally very intense. It makes sense to change the runtime to a GPU. Using a GPU is limited by the free version of Colab. You can buy credits to use it. It's billed by time, so make sure everything's running smoothly before you waste computing units.

To use our own data, we need to use one more module, called datasets.

In [1]:
!pip install datasets==2.6
!pip install -U accelerate
!pip install -U transformers

Collecting datasets==2.6
  Downloading datasets-2.6.0-py3-none-any.whl (441 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m441.5/441.5 kB[0m [31m6.2 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.6 (from datasets==2.6)
  Downloading dill-0.3.5.1-py2.py3-none-any.whl (95 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m95.8/95.8 kB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
Collecting xxhash (from datasets==2.6)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m194.1/194.1 kB[0m [31m25.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting multiprocess (from datasets==2.6)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m13.8 MB/s[0m eta [36m0:00:00[0m
Collecting responses<0.19 (from datasets==2.6)
  Downloading responses-0.18.0-py3-non

...for data wrangling in Python, we use pandas and numpy. Data table makes it easier to visualize data frames in Python.

In [2]:
## Load general packages
# some more specialised packages are loaded in each sub section
import pandas as pd
import numpy as np
from google.colab.data_table import DataTable

...to ensure that our code is reproducible, let's set a seed.

In [3]:
# set random seed for reproducibility
seed = 504
np.random.seed(seed)

### Prepare the data

First, download the data from my webpage.

In [4]:
df_train = pd.read_csv("https://mirko-wegemann.github.io/assets/data/training.csv")
df_test = pd.read_csv("https://mirko-wegemann.github.io/assets/data/test.csv")
print("Training data:", len(df_train), "Test data:", len(df_test))

Training data: 18796 Test data: 4699


As you will see, transformer models are quite slow - but they also do not need as much information as other models. So let's just take a subset of our training and test data.

In [None]:
sample_size = 2000
df_train = df_train.sample(n=min(sample_size, len(df_train)), random_state=seed).copy(deep=True)
df_test = df_test.sample(n=min(sample_size*4, len(df_test)), random_state=seed).copy(deep=True)

Usually, it's good to get a glimpse into the data, ensuring that the import went well.

In [None]:
print("Labels of training data:", df_train["labels_text"].value_counts(), "Labels of test data:", df_test["labels_text"].value_counts())

Labels of training data: labels_text
welfare      1797
migration     203
Name: count, dtype: int64 Labels of test data: labels_text
welfare      4253
migration     446
Name: count, dtype: int64


In [None]:
DataTable(df_train, num_rows_per_page=10)

Unnamed: 0.1,Unnamed: 0,sentence_context,labels,labels_text
6547,labels6548,Sinn Féin is proposing that a new short term t...,1,welfare
0,labels1,"Peter Costello, Chris McDiven, my parliamentar...",0,migration
6411,labels6412,– a scheme that has no mandate and is being fa...,1,welfare
8643,labels8644,Yet the signs are already there. The first pri...,1,welfare
10997,labels10998,"There are currently 287,000 children aged 13¬1...",1,welfare
...,...,...,...,...
16294,labels16295,We will reinstate the powers of the Secretary ...,1,welfare
11034,labels11035,and the environment in order to achieve this. ...,1,welfare
4428,labels4429,The Green Party Government will ensure that th...,1,welfare
13483,labels13484,Human resources The health care system depends...,1,welfare


...start with the pre-processing.
We first load the transformer model of our choice. AutoTokenizer and AutoModel does the rest.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import torch

# first, set the model and load the tokenizer for the respective model
checkpoint = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, num_labels=2)

# link the numeric labels to the label texts
#label_text = np.sort(df_test.label_text.unique()).tolist()
#label2id = dict(zip(np.sort(label_text), np.sort(pd.factorize(label_text, sort=True)[0]).tolist()))
#id2label = dict(zip(np.sort(pd.factorize(label_text, sort=True)[0]).tolist(), np.sort(label_text)))
#config = AutoConfig.from_pretrained(model_name, label2id=label2id, id2label=id2label, num_labels=len(label2id));
#print("\n", label2id, "\n")

# now load the configuration of a model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# use GPU (cuda) if available, otherwise use CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
model.to(device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device: cuda


DebertaV2ForSequenceClassification(
  (deberta): DebertaV2Model(
    (embeddings): DebertaV2Embeddings(
      (word_embeddings): Embedding(128100, 768, padding_idx=0)
      (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
      (dropout): StableDropout()
    )
    (encoder): DebertaV2Encoder(
      (layer): ModuleList(
        (0-11): 12 x DebertaV2Layer(
          (attention): DebertaV2Attention(
            (self): DisentangledSelfAttention(
              (query_proj): Linear(in_features=768, out_features=768, bias=True)
              (key_proj): Linear(in_features=768, out_features=768, bias=True)
              (value_proj): Linear(in_features=768, out_features=768, bias=True)
              (pos_dropout): StableDropout()
              (dropout): StableDropout()
            )
            (output): DebertaV2SelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine

In [None]:
# convert pandas dataframes to Hugging Face dataset object to facilitate pre-processing
import datasets

dataset = datasets.DatasetDict({
    "train": datasets.Dataset.from_pandas(df_train),
    "test": datasets.Dataset.from_pandas(df_test)
})

# tokenize
def tokenize(examples):
  return tokenizer(examples["sentence_context"], truncation=True, max_length=512)  # max_length can be reduced to e.g. 256 to increase speed, but long texts will be cut off

dataset["train"] = dataset["train"].map(tokenize, batched=True)
dataset["test"] = dataset["test"].map(tokenize, batched=True)

# remove unnecessary columns for model training
dataset = dataset.remove_columns(['labels_text'])

  0%|          | 0/2 [00:00<?, ?ba/s]

  0%|          | 0/5 [00:00<?, ?ba/s]

When training, we can set different hyperparameters.

In [None]:
from transformers import TrainingArguments, Trainer, logging

# set a temporary training directory
training_directory = "classifier_tutorial"

train_args = TrainingArguments(
    output_dir=f'./results/{training_directory}',
    logging_dir=f'./logs/{training_directory}',
    num_train_epochs=5,
    learning_rate=2e-5,
    per_device_train_batch_size=8,  # these can be adjusted if memory runs out (smaller value -> less memory consumption)
    per_device_eval_batch_size=16,
    weight_decay=0.1,
    seed=seed,
    metric_for_best_model="accuracy"
)


create metrics

In [None]:
from datasets import load_dataset, load_metric
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

def compute_metrics(eval_preds):
    labels = eval_preds.label_ids
    preds = eval_preds.predictions.argmax(-1)
    accuracy = accuracy_score(labels, preds)
    f1 = f1_score(labels, preds, average="macro")
    cm = confusion_matrix(labels, preds)
    return {"accuracy": accuracy, "f1": f1, "confusion_matrix": cm}

actually run the training

In [None]:
# training
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics
)

trainer.train()


Step,Training Loss
500,0.1523
1000,0.0402


TrainOutput(global_step=1250, training_loss=0.08189927253723145, metrics={'train_runtime': 333.7386, 'train_samples_per_second': 29.964, 'train_steps_per_second': 3.745, 'total_flos': 510329488315776.0, 'train_loss': 0.08189927253723145, 'epoch': 5.0})

In [None]:
#!pip install evaluate

In [None]:
# Evaluate the fine-tuned model on the held-out test set
results = trainer.evaluate()
print(results)

Trainer is attempting to log a value of "[[ 373   73]
 [  33 4220]]" of type <class 'numpy.ndarray'> for key "eval/confusion_matrix" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.1402156949043274, 'eval_accuracy': 0.977442008938072, 'eval_f1': 0.931591695425694, 'eval_confusion_matrix': array([[ 373,   73],
       [  33, 4220]]), 'eval_runtime': 38.0425, 'eval_samples_per_second': 123.52, 'eval_steps_per_second': 7.728, 'epoch': 5.0}


## Save and load your fine-tuned model

This segment provides code for saving the model to your hard-disk or for uploading it to the Hugging Face hub.

In [None]:
## first you need to connect to your google drive with your google account
from google.colab import drive
import os
drive.mount('/content/drive', force_remount=False)
#drive.flush_and_unmount()

# insert the path where you want to save the model
os.chdir("/content/drive/My Drive/")
print(os.getcwd())


Mounted at /content/drive
/content/drive/My Drive


In [None]:
### save best model to disk
directory_save_model = f"{training_directory}/"
model_name_custom = f"{model_name.split('/')[-1]}-custom"
mode_custom_path = directory_save_model + model_name_custom

# save the model to google drive
trainer.save_model(output_dir=mode_custom_path)

Saving model checkpoint to BERT-nli-demo/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7-custom
Configuration saved in BERT-nli-demo/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7-custom/config.json
Model weights saved in BERT-nli-demo/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7-custom/pytorch_model.bin
tokenizer config file saved in BERT-nli-demo/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7-custom/tokenizer_config.json
Special tokens file saved in BERT-nli-demo/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7-custom/special_tokens_map.json


In [None]:
### Push to Hugging Face hub
# install necessary dependencies
# you need to create an account on https://huggingface.co/ for this
!sudo apt-get install git-lfs
!huggingface-cli login

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git-lfs is already the newest version (2.9.2-1).
0 upgraded, 0 newly installed, 0 to remove and 28 not upgraded.

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log

In [None]:
!git config --global credential.helper store

In [None]:
# load your models and tokenizer saved before from disk
model = AutoModelForSequenceClassification.from_pretrained(mode_custom_path)
tokenizer = AutoTokenizer.from_pretrained(mode_custom_path, use_fast=True, model_max_length=512)  # we load the tokenizer from the original BERT-NLI model

NameError: ignored

In [None]:
# https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.push_to_hub
repo_id = 'wegemanm/mdeberta_gender'  # e.g. "JaneJones/DeBERTa-v3-nli-custom". note that the repo name is case-sensitive
model.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="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")
tokenizer.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="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")


Configuration saved in /tmp/tmps1kemcvz/config.json
Model weights saved in /tmp/tmps1kemcvz/pytorch_model.bin
Uploading the following files to wegemanm/mdeberta_gender: config.json,pytorch_model.bin


Upload 1 LFS files:   0%|          | 0/1 [00:00<?, ?it/s]

pytorch_model.bin:   0%|          | 0.00/1.12G [00:00<?, ?B/s]

tokenizer config file saved in /tmp/tmphb9zaxa_/tokenizer_config.json
Special tokens file saved in /tmp/tmphb9zaxa_/special_tokens_map.json
Uploading the following files to wegemanm/mdeberta_gender: tokenizer_config.json,spm.model,added_tokens.json,special_tokens_map.json,tokenizer.json


spm.model:   0%|          | 0.00/4.31M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/16.3M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/wegemanm/mdeberta_gender/commit/54a15a31880e97a6041c2be08013ad7d6d8a3e35', commit_message='Upload tokenizer', commit_description='', oid='54a15a31880e97a6041c2be08013ad7d6d8a3e35', pr_url=None, pr_revision=None, pr_num=None)