Downloading DistilBERT using the AutoModelForMaskedLM class:

In [1]:
from transformers import AutoModelForMaskedLM

model_checkpoint = "distilbert-base-uncased"

model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

We can see how many parameters this model has by calling the num_parameters() method:

In [2]:
distilbert_num_parameters = model.num_parameters() / 1_000_000
print(f"'>>> DistilBERT number of parameters: {round(distilbert_num_parameters)}M'")
print(f"'>>> BERT number of parameters: 110M'")

'>>> DistilBERT number of parameters: 67M'
'>>> BERT number of parameters: 110M'


Let's see which tokens this model predicts as the most likely completions of a small sample of text:

In [3]:
text = "This is a great [MASK]."

To predict the mask we need DistilBERT’s tokenizer to produce the inputs for the model, so let's download that from the Hub as well:

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

With a tokenizer and a model, we can now pass our text example to the model, extract the logits, and print out the top 5 candidates:

In [5]:
import torch

inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits
# Find the location of [MASK] and extract its logits
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]
# Pick the [MASK] candidates with the highest logits
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(f"'>>> {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}'")

'>>> This is a great deal.'
'>>> This is a great success.'
'>>> This is a great adventure.'
'>>> This is a great idea.'
'>>> This is a great feat.'


In [6]:
import torch

inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits

# Find the location of [MASK] and extract its logits
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]

# Calculate probabilities using softmax
mask_token_probs = torch.softmax(mask_token_logits, dim=1)

# Pick the [MASK] candidates with the highest logits
top_5 = torch.topk(mask_token_logits, 5, dim=1)

for i in range(5):
    token_index = top_5.indices[0, i].item()
    # Look up the probability by token ID, not by rank position i
    token_prob = mask_token_probs[0, token_index].item()
    predicted_token = tokenizer.decode([token_index])  # Use the decoded token directly

    print(f"Prediction {i + 1}: '{text.replace(tokenizer.mask_token, predicted_token)}' with probability: {token_prob:.4e}")


In [7]:
import torch
from transformers import pipeline

inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits

# Find the location of [MASK] and extract its logits
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]

# Pick the [MASK] candidates with the highest logits
top_10_result = torch.topk(mask_token_logits, 10, dim=1)

top_10_tokens = top_10_result.indices[0].tolist()
top_10_logits = top_10_result.values[0].tolist()

for token, logit in zip(top_10_tokens, top_10_logits):
    replaced_text = text.replace(tokenizer.mask_token, tokenizer.decode([token]))
    print(f"'>>> {replaced_text}' - Logit: {logit}")


'>>> This is a great deal.' - Logit: 7.07273530960083
'>>> This is a great success.' - Logit: 6.651430130004883
'>>> This is a great adventure.' - Logit: 6.642456531524658
'>>> This is a great idea.' - Logit: 6.252985000610352
'>>> This is a great feat.' - Logit: 5.861796855926514
'>>> This is a great mistake.' - Logit: 5.811717510223389
'>>> This is a great shame.' - Logit: 5.80202579498291
'>>> This is a great undertaking.' - Logit: 5.687473773956299
'>>> This is a great achievement.' - Logit: 5.625697612762451
'>>> This is a great coincidence.' - Logit: 5.591372489929199
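Logits are unnormalized scores. A softmax over the full vocabulary turns them into probabilities without changing their ranking. The following is a minimal pure-Python sketch of that relationship; the five-token vocabulary and its logits are made up for illustration and are not DistilBERT output:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability and leaves the result unchanged
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and logits (not real model output)
vocab = ["deal", "success", "idea", "the", "banana"]
logits = [7.1, 6.7, 6.3, 1.0, -2.0]

probs = softmax(logits)

# Rank token IDs by logit, then read each probability at the token's own ID
top_3_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:3]
for token_id in top_3_ids:
    print(f"{vocab[token_id]}: logit={logits[token_id]:.2f}, prob={probs[token_id]:.4f}")
```

Because softmax is monotonic, sorting by logit and sorting by probability give the same order.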


Getting the data from the Hugging Face Hub with the load_dataset() function from HF Datasets:

In [8]:
from datasets import load_dataset

imdb_dataset = load_dataset("imdb")
imdb_dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Chaining the Dataset.shuffle() and Dataset.select() functions to create a random sample:

In [9]:
sample = imdb_dataset["train"].shuffle(seed=42).select(range(3))

for row in sample:
    print(f"\n'>>> Review: {row['text']}'")
    print(f"'>>> Label: {row['label']}'")


'>>> Review: There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier's plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it's the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...'
'>>> Label: 1'

'>>> Review: This movie is a great. The plot is very true to the book which is a classic written by Mark Twain. The movie starts of with a scene where Hank sings a song with a bunch of kids called "when you stu

Preparing the data for masked language modeling. First we tokenize our corpus as usual, but without setting the truncation=True option in our tokenizer. We also grab the word IDs if they are available, as we will need them later on to do whole word masking. Finally, we wrap this in a simple function, and while we’re at it we’ll remove the text and label columns since we don’t need them any longer:

In [10]:
def tokenize_function(examples):
    result = tokenizer(examples["text"])
    if tokenizer.is_fast:
        result["word_ids"] = [result.word_ids(i) for i in range(len(result["input_ids"]))]
    return result


# Use batched=True to activate fast multithreading!
tokenized_datasets = imdb_dataset.map(
    tokenize_function, batched=True, remove_columns=["text", "label"]
)
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 50000
    })
})

Inspecting the model_max_length attribute of the tokenizer to find the model's maximum context size:

In [11]:
tokenizer.model_max_length

512

Picking a chunk size a bit smaller than that, so it can fit in memory:

In [12]:
chunk_size = 128

Taking a few reviews from our tokenized training set and printing out the number of tokens per review:

In [13]:
# Slicing produces a list of lists for each feature
tokenized_samples = tokenized_datasets["train"][:3]

for idx, sample in enumerate(tokenized_samples["input_ids"]):
    print(f"'>>> Review {idx} length: {len(sample)}'")

'>>> Review 0 length: 363'
'>>> Review 1 length: 304'
'>>> Review 2 length: 133'


Concatenating all these examples with a simple dictionary comprehension, as follows:

In [14]:
concatenated_examples = {
    k: sum(tokenized_samples[k], []) for k in tokenized_samples.keys()
}
total_length = len(concatenated_examples["input_ids"])
print(f"'>>> Concatenated reviews length: {total_length}'")

'>>> Concatenated reviews length: 800'


Splitting the concatenated reviews into chunks of the size given by chunk_size. To do so, we iterate over the features in concatenated_examples and use a list comprehension to create slices of each feature. The result is a dictionary of chunks for each feature:

In [15]:
chunks = {
    k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
    for k, t in concatenated_examples.items()
}

for chunk in chunks["input_ids"]:
    print(f"'>>> Chunk length: {len(chunk)}'")

'>>> Chunk length: 128'
'>>> Chunk length: 128'
'>>> Chunk length: 128'
'>>> Chunk length: 128'
'>>> Chunk length: 128'
'>>> Chunk length: 128'
'>>> Chunk length: 32'
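The chunk lengths above follow directly from the review lengths: 363 + 304 + 133 = 800 tokens, which is six full chunks of 128 plus 32 leftover tokens. A small sketch that reproduces the arithmetic with placeholder token IDs (only the lengths are taken from the output above):

```python
chunk_size = 128
review_lengths = [363, 304, 133]  # token counts printed earlier

# Stand-in token IDs: only the lengths matter for this illustration
reviews = [[0] * n for n in review_lengths]

# Concatenate, then slice into fixed-size windows
concatenated = [token for review in reviews for token in review]
chunks = [
    concatenated[i : i + chunk_size]
    for i in range(0, len(concatenated), chunk_size)
]

full, remainder = divmod(len(concatenated), chunk_size)
print(f"total={len(concatenated)}, full chunks={full}, leftover tokens={remainder}")
print([len(c) for c in chunks])
```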


Wrapping all of the above logic in a single function that we can apply to our tokenized datasets:

In [16]:
def group_texts(examples):
    # Concatenate all texts
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    # Compute length of concatenated texts
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the last chunk if it's smaller than chunk_size
    total_length = (total_length // chunk_size) * chunk_size
    # Split by chunks of max_len
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated_examples.items()
    }
    # Create a new labels column
    result["labels"] = result["input_ids"].copy()
    return result
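To see what the function does to a batch, here is a toy run. The helper `group_texts_toy` is a hypothetical copy of the same steps that takes chunk_size as an argument, and the two short "reviews" are invented for illustration:

```python
def group_texts_toy(examples, chunk_size):
    """Same steps as group_texts(), with chunk_size passed in explicitly."""
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the trailing tokens that do not fill a whole chunk
    total_length = (total_length // chunk_size) * chunk_size
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


# Hypothetical batch: two "reviews" of 5 and 6 tokens
batch = {
    "input_ids": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]],
    "attention_mask": [[1] * 5, [1] * 6],
}
grouped = group_texts_toy(batch, chunk_size=4)
print(grouped["input_ids"])
print(grouped["labels"])
```

With 11 tokens and a chunk size of 4, the last three tokens are dropped, and the second chunk spans the boundary between the two reviews.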

Applying group_texts() to our tokenized datasets using our trusty Dataset.map() function:

In [17]:
lm_datasets = tokenized_datasets.map(group_texts, batched=True)
lm_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids', 'labels'],
        num_rows: 61291
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids', 'labels'],
        num_rows: 59904
    })
    unsupervised: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids', 'labels'],
        num_rows: 122957
    })
})

In [18]:
tokenizer.decode(lm_datasets["train"][1]["input_ids"])

"as the vietnam war and race issues in the united states. in between asking politicians and ordinary denizens of stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men. < br / > < br / > what kills me about i am curious - yellow is that 40 years ago, this was considered pornographic. really, the sex and nudity scenes are few and far between, even then it's not shot like some cheaply made porno. while my countrymen mind find it shocking, in reality sex and nudity are a major staple in swedish cinema. even ingmar bergman,"

In [19]:
tokenizer.decode(lm_datasets["train"][1]["labels"])

"as the vietnam war and race issues in the united states. in between asking politicians and ordinary denizens of stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men. < br / > < br / > what kills me about i am curious - yellow is that 40 years ago, this was considered pornographic. really, the sex and nudity scenes are few and far between, even then it's not shot like some cheaply made porno. while my countrymen mind find it shocking, in reality sex and nudity are a major staple in swedish cinema. even ingmar bergman,"

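Passing the tokenizer to DataCollatorForLanguageModeling, together with an mlm_probability argument that specifies what fraction of the tokens to mask.

The collator expects a list of dicts, so we first iterate over the dataset to build that list before feeding the batch to the collator. We also remove the word_ids key, since this collator does not expect it: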

In [20]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

samples = [lm_datasets["train"][i] for i in range(2)]
for sample in samples:
    _ = sample.pop("word_ids")

for chunk in data_collator(samples)["input_ids"]:
    print(f"\n'>>> {tokenizer.decode(chunk)}'")

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



'>>> [CLS] i rented i am curious - yellow from my video store because of [MASK] the controversy that surrounded it when it [MASK] first released in [MASK]. i also heard that at first it was hammersmith by u. s. [MASK] if it ever tried to enter this [MASK], therefore being a fan of films considered " controversial " i really had [MASK] see this for myself. [MASK] br / > < br / > the plot is centered around a [MASK] swedish drama student named lena who [MASK] to [MASK] everything she can [MASK] life. in particular she wants to focus her attentions to making some sort of documentary on what [MASK] [MASK] swede thought about certain political issues such'

'>>> as the vietnam war and race issues in the united states. in between asking politicians and ordinary denizens of stockholm about their opinions on politics [MASK] she has sex with her drama teacher, classmates, and [MASK] men. < br / > < br / > what kills [MASK] about [MASK] [MASK] curious - yellow is that 40 years ago, this was con
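The selection rule behind this kind of random masking can be sketched in plain Python. This is a simplified illustration of BERT-style masking (select tokens with probability mlm_probability, then replace 80% of them with [MASK], 10% with a random token, and leave 10% unchanged), not the collator's actual implementation, which works on tensors and never selects special tokens. The constants are assumed values for the bert-base-uncased vocabulary, and `mask_tokens` is a hypothetical helper:

```python
import random

MASK_ID = 103        # assumed ID of [MASK] in the bert-base-uncased vocabulary
VOCAB_SIZE = 30522   # assumed vocabulary size
IGNORE_INDEX = -100  # label value that the loss function skips


def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """Return (masked_input_ids, labels) for one sequence of token IDs."""
    if rng is None:
        rng = random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, token_id in enumerate(input_ids):
        if rng.random() >= mlm_probability:
            continue  # not selected: input unchanged, label ignored
        labels[i] = token_id  # the model must predict the original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token as input
    return masked, labels


rng = random.Random(0)
ids = list(range(1000, 1128))  # placeholder token IDs for one 128-token chunk
masked, labels = mask_tokens(ids, rng=rng)
n_selected = sum(label != IGNORE_INDEX for label in labels)
print(f"{n_selected} of {len(ids)} tokens selected for prediction")
```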

To mask whole words together, we will need to build a data collator ourselves. A data collator is just a function that takes a list of samples and converts them into a batch, so we’ll use the word IDs computed earlier to make a map between word indices and the corresponding tokens, then randomly decide which words to mask and apply that mask on the inputs:

In [21]:
import collections
import numpy as np

from transformers import default_data_collator

wwm_probability = 0.2


def whole_word_masking_data_collator(features):
    for feature in features:
        word_ids = feature.pop("word_ids")

        # Create a map between words and corresponding token indices
        mapping = collections.defaultdict(list)
        current_word_index = -1
        current_word = None
        for idx, word_id in enumerate(word_ids):
            if word_id is not None:
                if word_id != current_word:
                    current_word = word_id
                    current_word_index += 1
                mapping[current_word_index].append(idx)

        # Randomly mask words
        mask = np.random.binomial(1, wwm_probability, (len(mapping),))
        input_ids = feature["input_ids"]
        labels = feature["labels"]
        new_labels = [-100] * len(labels)
        for word_id in np.where(mask)[0]:
            word_id = word_id.item()
            for idx in mapping[word_id]:
                new_labels[idx] = labels[idx]
                input_ids[idx] = tokenizer.mask_token_id
        feature["labels"] = new_labels

    return default_data_collator(features)

# Next, we can try it on the same samples as before:

samples = [lm_datasets["train"][i] for i in range(2)]
batch = whole_word_masking_data_collator(samples)

for chunk in batch["input_ids"]:
    print(f"\n'>>> {tokenizer.decode(chunk)}'")


'>>> [CLS] [MASK] rented [MASK] am curious - [MASK] from [MASK] video store [MASK] of all the controversy [MASK] surrounded it [MASK] it was first released in 1967 [MASK] [MASK] also heard that at first it was [MASK] by u [MASK] s [MASK] customs if it ever tried [MASK] enter this country [MASK] [MASK] being a fan [MASK] [MASK] considered " controversial " i really had to [MASK] this for myself [MASK] < br [MASK] [MASK] < br / > the [MASK] is centered around a young swedish drama [MASK] named [MASK] who wants [MASK] learn everything she [MASK] about life. in [MASK] she wants to focus her attentions to making some sort [MASK] [MASK] on what the [MASK] swede thought [MASK] certain political [MASK] [MASK]'

'>>> [MASK] [MASK] [MASK] war and race issues in the united states [MASK] in between asking politicians and ordinary denizens of stockholm about their opinions on politics, she has sex with her drama teacher, classmates, [MASK] married [MASK]. < br / > < br / > what kills me [MASK] [MA
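The word-to-token mapping is the core of the collator. Below is a small standalone run of the same loop on a made-up word_ids list, assuming a chunk that contains the end of one review and the start of another, so the word IDs restart at 0 partway through:

```python
import collections

# Hypothetical word_ids for one chunk: None marks special tokens, and the
# numbering restarts at 0 where a second review begins
word_ids = [None, 0, 0, 1, None, None, 0, 1, 1, None]

mapping = collections.defaultdict(list)
current_word_index = -1
current_word = None
for idx, word_id in enumerate(word_ids):
    if word_id is not None:
        if word_id != current_word:
            # A new word starts: advance the running counter
            current_word = word_id
            current_word_index += 1
        mapping[current_word_index].append(idx)

print(dict(mapping))
```

The running counter current_word_index, rather than the raw word ID, is used as the key, so words from different reviews in the same chunk do not collide.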

Downsampling the size of the training set to a few thousand examples:

In [22]:
train_size = 10_000
test_size = int(0.1 * train_size)

downsampled_dataset = lm_datasets["train"].train_test_split(
    train_size=train_size, test_size=test_size, seed=42
)
downsampled_dataset

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids', 'labels'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids', 'labels'],
        num_rows: 1000
    })
})

Logging in to the Hugging Face Hub:

In [23]:
from huggingface_hub import notebook_login

notebook_login()

Specifying the arguments for the Trainer:

In [24]:
from transformers import TrainingArguments

batch_size = 64
# Show the training loss with every epoch
logging_steps = len(downsampled_dataset["train"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

training_args = TrainingArguments(
    output_dir=f"{model_name}-finetuned-imdb",
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    push_to_hub=True,
    fp16=True,
    logging_steps=logging_steps,
)

Instantiating the Trainer:

In [25]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=downsampled_dataset["train"],
    eval_dataset=downsampled_dataset["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Calculating the perplexity of our pretrained model by using the Trainer.evaluate() function to compute the cross-entropy loss on the test set and then taking the exponential of the result:

In [26]:
import math

eval_results = trainer.evaluate()
print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

eval_results = trainer.evaluate()
entropy = eval_results['eval_loss']
print(f">>> Entropy: {entropy:.4f}")



>>> Perplexity: 22.75
>>> Entropy: 3.1004
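Perplexity is the exponential of the mean cross-entropy loss (the value printed as "Entropy" above), so each can be recovered from the other. A quick check using the two printed numbers follows. They do not match exactly because the cell calls trainer.evaluate() twice, and the data collator draws a fresh random mask on each call:

```python
import math

# Values copied from the output above
reported_perplexity = 22.75
reported_loss = 3.1004

# Converting in each direction shows the gap between the two evaluation runs
print(f"exp(loss)       = {math.exp(reported_loss):.2f}")
print(f"log(perplexity) = {math.log(reported_perplexity):.4f}")
```

Calling evaluate() once and reusing eval_results["eval_loss"] for both prints would make the two numbers consistent with each other.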


Running the training loop to see if we can lower the perplexity by fine-tuning:

In [27]:
trainer.train()

Epoch,Training Loss,Validation Loss
1,No log,2.517202
2,2.656100,2.465651
3,2.656100,2.449105


TrainOutput(global_step=237, training_loss=2.6243806991899064, metrics={'train_runtime': 114.6002, 'train_samples_per_second': 261.78, 'train_steps_per_second': 2.068, 'total_flos': 994208670720000.0, 'train_loss': 2.6243806991899064, 'epoch': 3.0})

Computing the resulting perplexity on the test set as before:

In [28]:
eval_results = trainer.evaluate()
print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")


eval_results = trainer.evaluate()
entropy = eval_results['eval_loss']
print(f">>> Entropy: {entropy:.4f}")

>>> Perplexity: 12.16
>>> Entropy: 2.4636


In [29]:
trainer.push_to_hub()

CommitInfo(commit_url='https://huggingface.co/shradha01/distilbert-base-uncased-finetuned-imdb/commit/1c9185b66050bf95808963014f65bddb01abae03', commit_message='End of training', commit_description='', oid='1c9185b66050bf95808963014f65bddb01abae03', pr_url=None, pr_revision=None, pr_num=None)

DataCollatorForLanguageModeling also applies random masking with each evaluation, so we’ll see some fluctuations in our perplexity scores with each training run. One way to eliminate this source of randomness is to apply the masking once on the whole test set, and then use the default data collator in HF Transformers to collect the batches during evaluation. To do this, we implement a simple function that applies the masking on a batch, similar to our first encounter with DataCollatorForLanguageModeling:

In [30]:
def insert_random_mask(batch):
    features = [dict(zip(batch, t)) for t in zip(*batch.values())]
    masked_inputs = data_collator(features)
    # Create a new "masked" column for each column in the dataset
    return {"masked_" + k: v.numpy() for k, v in masked_inputs.items()}
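The first line of insert_random_mask() converts the column-oriented batch that Dataset.map(batched=True) provides into the list of per-example dicts that a data collator expects. A standalone sketch of that conversion, using a made-up two-example batch:

```python
# Hypothetical batch in column-oriented form: one list per column
batch = {
    "input_ids": [[101, 2023, 102], [101, 2307, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}

# zip(*batch.values()) yields one tuple per example;
# zip(batch, t) pairs each column name with that example's value
features = [dict(zip(batch, t)) for t in zip(*batch.values())]
print(features)
```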

Applying this function to our test set and dropping the unmasked columns so we can replace them with the masked ones:

In [31]:
downsampled_dataset = downsampled_dataset.remove_columns(["word_ids"])
eval_dataset = downsampled_dataset["test"].map(
    insert_random_mask,
    batched=True,
    remove_columns=downsampled_dataset["test"].column_names,
)
eval_dataset = eval_dataset.rename_columns(
    {
        "masked_input_ids": "input_ids",
        "masked_attention_mask": "attention_mask",
        "masked_labels": "labels",
    }
)

Setting up the dataloaders as usual, but using the default_data_collator from HF Transformers for the evaluation set:

In [32]:
from torch.utils.data import DataLoader
from transformers import default_data_collator

batch_size = 64
train_dataloader = DataLoader(
    downsampled_dataset["train"],
    shuffle=True,
    batch_size=batch_size,
    collate_fn=data_collator,
)
eval_dataloader = DataLoader(
    eval_dataset, batch_size=batch_size, collate_fn=default_data_collator
)

Loading a fresh version of the pretrained model:

In [33]:
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

Specifying the optimizer; we’ll use the standard AdamW:

In [34]:
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5)

Preparing everything for training with the Accelerator object:

In [35]:
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

Now that our model, optimizer, and dataloaders are configured, we can specify the learning rate scheduler as follows:

In [36]:
from transformers import get_scheduler

num_train_epochs = 3
num_update_steps_per_epoch = len(train_dataloader)
num_training_steps = num_train_epochs * num_update_steps_per_epoch

lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
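With no warmup, a linear schedule simply scales the base learning rate from its full value at step 0 down to zero at the last step. The sketch below works through the numbers for this setup, assuming a single process (so the dataloader length is the number of batches) and using a hypothetical helper `linear_lr` rather than the library's scheduler:

```python
import math

train_size, batch_size, num_train_epochs = 10_000, 64, 3
base_lr = 5e-5
num_warmup_steps = 0

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(train_size / batch_size)
num_training_steps = num_train_epochs * steps_per_epoch


def linear_lr(step):
    """Learning rate after `step` optimizer updates under linear decay."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


print(f"steps per epoch: {steps_per_epoch}, total: {num_training_steps}")
for step in (0, num_training_steps // 2, num_training_steps):
    print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```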

In [37]:
from huggingface_hub import get_full_repo_name

model_name = "distilbert-base-uncased-finetuned-imdb"
repo_name = get_full_repo_name(model_name)
repo_name

'shradha01/distilbert-base-uncased-finetuned-imdb'

In [38]:
from huggingface_hub import Repository

output_dir = model_name
repo = Repository(output_dir, clone_from=repo_name)

For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
/home/user1-selab3/shradha_test/roberta/distilbert-base-uncased-finetuned-imdb is already a clone of https://huggingface.co/shradha01/distilbert-base-uncased-finetuned-imdb. Make sure you pull the latest changes with `repo.git_pull()`.


In [None]:
# from huggingface_hub import Repository

# model_name = "distilbert-base-uncased-finetuned-imdb-accelerate"
# output_dir = "/home/user1-selab3/shradha_test/roberta"

# # Clone the repository using the Repository class
# repo_name = f"shradha01/{model_name}"  # Replace 'username' with the actual username
# repo = Repository(output_dir, clone_from=repo_name)

Writing out the full training and evaluation loop:

In [39]:
from subprocess import run
from tqdm.auto import tqdm
import torch
import math
from huggingface_hub import Repository
from transformers import Trainer

progress_bar = tqdm(range(num_training_steps))

for epoch in range(num_train_epochs):
    # Training
    model.train()
    for batch in train_dataloader:
        with accelerator.autocast():
            outputs = model(**batch)
            loss = outputs.loss

        # Scale the loss before backward
        accelerator.backward(loss)

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

    # Evaluation
    model.eval()
    losses = []
    for step, batch in enumerate(eval_dataloader):
        with torch.no_grad():
            with accelerator.autocast():
                outputs = model(**batch)

            loss = outputs.loss
            losses.append(accelerator.gather(loss.repeat(batch_size)))

    losses = torch.cat(losses)
    losses = losses[: len(eval_dataset)]
    try:
        perplexity = math.exp(torch.mean(losses))
    except OverflowError:
        perplexity = float("inf")

    print(f">>> Epoch {epoch}: Perplexity: {perplexity}")

    # Save and upload
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    unwrapped_model.save_pretrained(output_dir, save_function=accelerator.save)
    if accelerator.is_main_process:
        tokenizer.save_pretrained(output_dir)
        # Commit and push to the Hugging Face Hub from the main process only
        repo = Repository(output_dir)

        # Commit changes
        repo.git_add(".")
        run(["git", "commit", "-m", f"Training in progress epoch {epoch}"], cwd=output_dir)

        # Push to the Hub
        repo.push_to_hub(blocking=False)

  0%|          | 0/471 [00:00<?, ?it/s]

>>> Epoch 0: Perplexity: 12.164272943094197


[main 96122a4] Training in progress epoch 0
 2 files changed, 2 insertions(+), 2 deletions(-)
>>> Epoch 1: Perplexity: 11.634498857155373


[main f1d5813] Training in progress epoch 1
 1 file changed, 1 insertion(+), 1 deletion(-)
>>> Epoch 2: Perplexity: 11.44816272077993


[main 18a1d34] Training in progress epoch 2
 1 file changed, 1 insertion(+), 1 deletion(-)


Using our fine-tuned model locally with the pipeline from HF Transformers:

In [40]:
from transformers import pipeline

mask_filler = pipeline(
    "fill-mask", model="shradha01/distilbert-base-uncased-finetuned-imdb"
)

Feeding the pipeline our sample text of “This is a great [MASK].” and seeing what the top 5 predictions are:

In [41]:
preds = mask_filler(text)

for pred in preds:
    sequence = pred['sequence']
    score = pred['score']
    print(f">>> {sequence} with probability: {score:.4f}")

>>> this is a great film. with probability: 0.0586
>>> this is a great movie. with probability: 0.0462
>>> this is a great idea. with probability: 0.0450
>>> this is a great deal. with probability: 0.0348
>>> this is a great one. with probability: 0.0259
