This script installs the Natural Language Toolkit (NLTK), a Python library used for natural language processing tasks such as tokenization, tagging, and text prediction.

In [111]:
!pip install nltk



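This part of the script imports the libraries needed for deep learning with PyTorch.

It also sets up text preprocessing with NLTK by importing the tokenizer and the stopword list and downloading the data they need. The stopword list is imported here but is not applied in the cells below.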

In [112]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from collections import Counter
from torch.utils.data import DataLoader, TensorDataset
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

The following document is used as the training data.

In [113]:
document="""Exploring Data Availability in LLM Development
When developing a large language model (LLM), it's crucial to consider the availability of
labeled data for the specific task you want the model to perform. An LLM is a complex AI
model trained to understand and generate human-like language based on patterns learned
from vast amounts of text data. However, general-purpose LLMs often need fine-tuning—
 additional, focused training on a smaller, task-specific dataset—to perform well on a
specialized task, like summarizing scientific articles or answering customer support queries.
Fine-tuning, or adapting a model to a new task, is particularly valuable when data is limited.
In such cases, fine-tuning the model with smaller, targeted datasets allows it to perform
specialized tasks effectively. When labeled data is minimal, methods like zero-shot, few-shot,
and multi-shot learning—referred to collectively as N-shot learning—become essential to
adapt the model.
Understanding Transfer Learning
Transfer learning is a powerful AI approach that enables models trained on one task to apply
the knowledge they gained to a related but different task. This is similar to how humans can
transfer knowledge across skills. For example, a musician trained in piano can transfer skills
like reading music and understanding rhythm to learning the guitar. In the context of LLMs, a
model trained on broad, general text (like news articles, books, and websites) can transfer its
language understanding to tasks that require specialized knowledge, like medical or legal
language processing. Transfer learning lets the model reuse its base knowledge of language
and semantics to perform well on a task for which it may not have specific training data.
Zero-shot Learning
Zero-shot learning is a technique that allows LLMs to perform tasks they haven’t explicitly
trained for. It works by leveraging the model's broad understanding of language and context
to apply this knowledge to new scenarios. Imagine a child who has never seen a zebra but
knows what a horse looks like. If someone tells the child that a zebra looks like a "striped
horse," the child can identify the zebra without any specific training. Similarly, an LLM
trained on a variety of text can use zero-shot learning to answer questions about topics it
hasn’t directly learned by making educated guesses based on its general language
understanding.
Example:
Suppose an LLM is asked to translate a sentence into a language it hasn’t been trained on
directly. If the model has learned similarities and patterns in other languages, it might
approximate the translation with some level of accuracy, even without having any data on
that specific language.
Few-shot Learning
Few-shot learning allows a model to learn a new task with only a few examples. This
approach relies on the model’s ability to generalize from previous tasks, making it more
adaptable to new ones even with limited examples. For instance, a student who has attended
lectures on a topic might answer an exam question based on what they learned in class
without much additional study. Few-shot learning similarly enables LLMs to perform a new
task effectively with just a small number of training examples.
Example:
If an LLM has been trained to understand language structure and is given only three or four
labeled examples of how to summarize news articles, it can still generalize well enough to
summarize new articles by using those few examples to infer the general rules of
summarization.
One-shot Learning as Part of Few-shot Learning
A specific case of few-shot learning, one-shot learning, requires only a single example to
teach the model a task. For example, suppose a student sees one example of how to solve a
math problem. They might then apply that single example to solve similar problems on their
own. For LLMs, one-shot learning is useful when training data is particularly scarce but the
model can generalize well enough from just one labeled example.
Example:
If you want the model to recognize a new product category (like "smart thermostats") and
you provide only one example of a product description in this category, the model may use
that single instance to identify other smart thermostat products based on similarities in
language and function.
Multi-shot Learning
Multi-shot learning is similar to few-shot learning but involves more examples, which
typically improves the model's accuracy and generalization. This approach requires a set of
labeled examples for the model to learn from, though it’s still smaller than the amount
required for traditional supervised learning. Multi-shot learning strikes a balance between
extensive training data and the adaptability of fewer examples.
Example:
Imagine training an LLM to recognize different dog breeds. By showing it several images of
a Golden Retriever, the model starts learning the features of this breed. With a few more
images of similar breeds, like Labradors, it can generalize its knowledge to recognize these as
well, enabling it to distinguish breeds without needing thousands of examples.
Task:
Question: You are part of a team working on an innovative project aiming to adapt a pre
trained language model to a new, related task without much data. To ensure the project's
success, you need to adapt and fine-tune the model. Which general approach leverages prior
knowledge from one task to help train a model on a new, related task?
Select one answer:
1. N-shot learning
2. Zero-shot learning
3. Few-shot learning
4. Transfer learning
5. One-shot learning
The correct answer is:
4. Transfer learning
Building Blocks to Train LLMs
In this section we focuses on two core techniques to pre-train large language models (LLMs)
— next word prediction and masked language modeling. These methods serve as
foundational steps in training many advanced language models, including those used in
natural language processing (NLP) tasks. Pre-training a model involves using a massive
dataset to give the model a general understanding of language before it’s fine-tuned for
specific tasks. Although pre-training from scratch can be costly and time-consuming, many
organizations fine-tune pre-existing pre-trained models instead, adapting them to their
particular needs.
Generative Pre-Training
Generative pre-training is a technique where the model is given sequences of words or text
tokens and learns to predict the next token in that sequence. Through repeated exposure to
different text sequences, the model learns to generate language that is coherent and
contextually relevant. This pre-training process lays the groundwork for the model’s ability to
understand and produce natural language. Two main types of generative pre-training
techniques are next word prediction and masked language modeling, both of which allow the
model to learn patterns, relationships, and the contextual meaning of words.
1. Next Word Prediction
Next word prediction is a supervised learning technique where the model is trained to predict
the next word in a sequence based on the words that come before it. In supervised learning,
the model learns from labeled data—in this case, sentences with a specific sequence of
words. As the model processes each word in a sentence, it builds a contextual understanding
of how words typically follow one another.
For example, in the sentence “The quick brown fox jumps over the lazy dog,” the model
might be given the input “The quick brown” and be trained to predict the word “fox” as the
most likely next word. After correctly predicting “fox,” this word is added to the input
sequence, creating “The quick brown fox,” and the model then tries to predict “jumps.” This
process continues, with each prediction added to the sequence, helping the model capture
dependencies between words and improve at generating coherent text. Suppose you give the
model a prompt, like "I like to drink coffee in the __." The model, having seen many similar
sentences during training, will likely predict "morning" as the next word based on the
common association between coffee and morning routines.
Training Data for Next Word Prediction
To train the model, large datasets are used to create numerous input-output pairs. Each output
is then added back into the sequence for the next input, helping the model learn longer
patterns and more complex word dependencies. Using a single sentence, like “The quick
brown fox jumps over the lazy dog,” training pairs might look like this:
 Input: “The quick brown” → Output: “fox”
 Input: “The quick brown fox” → Output: “jumps”
 Input: “The quick brown fox jumps” → Output: “over”
Through many such examples, the model begins to understand common word associations.
For instance, when prompted with “I like to eat pizza with __,” it might predict “cheese”
rather than words like “oregano” or “ketchup,” because it has learned that “cheese”
frequently appears with “pizza” in similar contexts. This type of learning lets the model
generate more accurate and realistic sentences.
2. Masked Language Modeling
Masked language modeling (MLM) is another popular technique for generative pre-training,
but instead of predicting the next word in a sequence, it involves predicting a word that has
been “masked” or hidden within a sentence. This approach challenges the model to infer
missing information from surrounding words, helping it learn contextual clues and develop a
nuanced understanding of language.
In MLM, a word within a sentence is randomly replaced with a “[MASK]” token. For
example, in the sentence “The quick brown fox jumps over the lazy dog,” the word “brown”
might be masked, so the input becomes “The quick [MASK] fox jumps over the lazy dog.”
The model is trained to predict the missing word (“brown”) by analyzing the context
provided by the rest of the sentence. Even though “brown” could theoretically be replaced by
many different adjectives, the model learns through training data that “brown” is the most
likely option here. Suppose the model encounters the sentence “I enjoy reading books on
[MASK] weekends.” Based on its prior training, the model will likely predict “the” as the
masked word, since “on the weekends” is a common phrase structure. This ability to predict
missing words based on context helps the model develop a better sense of language structure
and word relationships.
Task:
Question: As part of a sales company's AI development team, you have been asked to
explain how masked language modeling works to business stakeholders. You present a
sample of masked data to help illustrate this pre-training process:
Sample: "The [MASK] support [MASK] quickly resolved the [MASK]."
What words have been masked?
Possible Options:
1. office, manager, fight
2. work, dog, bone
3. customer, agent, issue
4. station, officer, feedback
Correct Answer:
3. customer, agent, issue
Question:
You have been working on training an LLM using next word prediction. You have provided
the model with the following training data to help it learn how to predict the next word:
 What is
 What is the
 What is the weather
 What is the weather like
 What is the weather like ……
Which would be the correct prediction for the next word(s)?
Possible Options:
1. "in the cupboard?"
2. "today?"
3. "I don't know."
4. "rainy?"
Correct Answer:
2. "today?"
Question:
You are a data scientist planning to develop large language models from scratch, which
involves building a large generic model for different applications the organization anticipates.
The organization also intends to build a customer service bot to address the high volume of
customer queries. To ensure optimal performance of their AI-driven chatbot, you are
expected to use a combination of techniques in a specific order.
Arrange the techniques in the order the company should use them for their language
model:
1. Tokenize, remove stop words, and lemmatize the raw text
2. Generate word embeddings to convert language to numbers
3. Train the model using masked language modeling
4. Fine-tune the model using task-specific data
Answer:
1. Tokenize, remove stop words, and lemmatize the raw text – Start by preprocessing the
text data to ensure that the raw text is clean and standardized.
2. Generate word embeddings to convert language to numbers – Convert the cleaned text
into numerical representations that the model can process.
3. Train the model using masked language modeling – Use masked language modeling
to pre-train the model on a large dataset, allowing it to understand language structure
and context.
4. Fine-tune the model using task-specific data – Finally, adapt the pre-trained model to
the specific customer service task by fine-tuning it on relevant labeled data. """

This script downloads the NLTK tokenizer and stopword resources again and adds punkt_tab, the tokenizer data that newer NLTK releases look for when word_tokenize is called.

In [114]:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

This script converts the input document to lowercase and tokenizes it with NLTK's word_tokenize, producing the flat token list from which the vocabulary is built.

In [115]:
tokens = word_tokenize(document.lower())

In [116]:
tokens

['exploring',
 'data',
 'availability',
 'in',
 'llm',
 'development',
 'when',
 'developing',
 'a',
 'large',
 'language',
 'model',
 '(',
 'llm',
 ')',
 ',',
 'it',
 "'s",
 'crucial',
 'to',
 'consider',
 'the',
 'availability',
 'of',
 'labeled',
 'data',
 'for',
 'the',
 'specific',
 'task',
 'you',
 'want',
 'the',
 'model',
 'to',
 'perform',
 '.',
 'an',
 'llm',
 'is',
 'a',
 'complex',
 'ai',
 'model',
 'trained',
 'to',
 'understand',
 'and',
 'generate',
 'human-like',
 'language',
 'based',
 'on',
 'patterns',
 'learned',
 'from',
 'vast',
 'amounts',
 'of',
 'text',
 'data',
 '.',
 'however',
 ',',
 'general-purpose',
 'llms',
 'often',
 'need',
 'fine-tuning—',
 'additional',
 ',',
 'focused',
 'training',
 'on',
 'a',
 'smaller',
 ',',
 'task-specific',
 'dataset—to',
 'perform',
 'well',
 'on',
 'a',
 'specialized',
 'task',
 ',',
 'like',
 'summarizing',
 'scientific',
 'articles',
 'or',
 'answering',
 'customer',
 'support',
 'queries',
 '.',
 'fine-tuning',
 ',',
 'o

In [117]:
len(tokens)

2363

This script builds a vocabulary for the next-word prediction model by assigning unique integer indices to tokens in the dataset, with an unknown token (<UNK>) initialized in the vocabulary.

In [118]:
vocab={'<UNK>':0}
# Assign each distinct token the next free index, in order of first appearance.
for token in Counter(tokens).keys():
  if token not in vocab:
    vocab[token]=len(vocab)

vocab

{'<UNK>': 0,
 'exploring': 1,
 'data': 2,
 'availability': 3,
 'in': 4,
 'llm': 5,
 'development': 6,
 'when': 7,
 'developing': 8,
 'a': 9,
 'large': 10,
 'language': 11,
 'model': 12,
 '(': 13,
 ')': 14,
 ',': 15,
 'it': 16,
 "'s": 17,
 'crucial': 18,
 'to': 19,
 'consider': 20,
 'the': 21,
 'of': 22,
 'labeled': 23,
 'for': 24,
 'specific': 25,
 'task': 26,
 'you': 27,
 'want': 28,
 'perform': 29,
 '.': 30,
 'an': 31,
 'is': 32,
 'complex': 33,
 'ai': 34,
 'trained': 35,
 'understand': 36,
 'and': 37,
 'generate': 38,
 'human-like': 39,
 'based': 40,
 'on': 41,
 'patterns': 42,
 'learned': 43,
 'from': 44,
 'vast': 45,
 'amounts': 46,
 'text': 47,
 'however': 48,
 'general-purpose': 49,
 'llms': 50,
 'often': 51,
 'need': 52,
 'fine-tuning—': 53,
 'additional': 54,
 'focused': 55,
 'training': 56,
 'smaller': 57,
 'task-specific': 58,
 'dataset—to': 59,
 'well': 60,
 'specialized': 61,
 'like': 62,
 'summarizing': 63,
 'scientific': 64,
 'articles': 65,
 'or': 66,
 'answering': 67,


In [119]:
len(vocab)

574
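As a small illustration of the indexing scheme, the sketch below builds a vocabulary from a hand-made token list. The toy tokens and the names toy_tokens and toy_vocab are assumptions made for the example; only the loop mirrors the cell above.

```python
# Toy token list (hypothetical, not taken from the document above).
toy_tokens = ["the", "model", "predicts", "the", "next", "word", "."]

# Index 0 is reserved for unknown tokens, as in the notebook.
toy_vocab = {"<UNK>": 0}
for token in toy_tokens:
    if token not in toy_vocab:
        # The next free index equals the current vocabulary size.
        toy_vocab[token] = len(toy_vocab)

print(toy_vocab)
```

The repeated token "the" keeps the index from its first appearance, so the vocabulary size equals the number of distinct tokens plus one for <UNK>.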

This script splits the input document on newline characters. Each resulting line is treated as one "sentence" when building training sequences, even though a line may hold only part of a grammatical sentence.

In [120]:
input_sentences=document.split('\n')

In [121]:
input_sentences

['Exploring Data Availability in LLM Development ',
 "When developing a large language model (LLM), it's crucial to consider the availability of ",
 'labeled data for the specific task you want the model to perform. An LLM is a complex AI ',
 'model trained to understand and generate human-like language based on patterns learned ',
 'from vast amounts of text data. However, general-purpose LLMs often need fine-tuning—',
 ' additional, focused training on a smaller, task-specific dataset—to perform well on a ',
 'specialized task, like summarizing scientific articles or answering customer support queries. ',
 'Fine-tuning, or adapting a model to a new task, is particularly valuable when data is limited. ',
 'In such cases, fine-tuning the model with smaller, targeted datasets allows it to perform ',
 'specialized tasks effectively. When labeled data is minimal, methods like zero-shot, few-shot, ',
 'and multi-shot learning—referred to collectively as N-shot learning—become essential to 

This function converts a sentence into a sequence of numerical indices based on the vocabulary, using the <UNK> token for unknown words not found in the vocabulary.

In [122]:
def text_indices(sentence, vocab):
  numerical_sentence=[]
  for token in sentence:
    if token not in vocab:
      numerical_sentence.append(vocab['<UNK>'])
    else:
      numerical_sentence.append(vocab[token])
  return numerical_sentence
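To make the fallback behaviour concrete, here is a minimal sketch that applies the same mapping to a toy vocabulary. It is an equivalent rewrite using dict.get rather than the function above, and the vocabulary and input tokens are hypothetical.

```python
def text_indices(sentence, vocab):
    # Same mapping as the notebook function, written with dict.get.
    return [vocab.get(token, vocab["<UNK>"]) for token in sentence]

# Hypothetical toy vocabulary for illustration only.
toy_vocab = {"<UNK>": 0, "transfer": 1, "learning": 2, "is": 3}

indices = text_indices(["transfer", "learning", "is", "fun"], toy_vocab)
print(indices)  # "fun" is not in the vocabulary, so it maps to 0
```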

This script converts the list of input sentences into sequences of numerical indices by tokenizing each sentence, converting to lowercase, and mapping tokens to their corresponding indices in the vocabulary.

In [123]:
input_numerical_sentences = []

for sentence in input_sentences:
  input_numerical_sentences.append(text_indices(word_tokenize(sentence.lower()), vocab))


In [124]:
len(input_numerical_sentences)

195

This script generates training sequences for the next-word prediction model by creating subsequences from each sentence, where each sequence includes progressively more tokens to predict the next word.

In [125]:
training_sequences = []
for sentence in input_numerical_sentences:
  for i in range(1,len(sentence)):
    training_sequences.append(sentence[:i+1])


In [126]:
len(training_sequences)

2169

In [127]:
training_sequences[:9]

[[1, 2],
 [1, 2, 3],
 [1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5, 6],
 [7, 8],
 [7, 8, 9],
 [7, 8, 9, 10],
 [7, 8, 9, 10, 11]]
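The prefix construction can be sketched on a single short sentence. The indexed sentence below is hypothetical; it stands in for one entry of input_numerical_sentences.

```python
# Hypothetical indexed sentence with four tokens.
sentence = [1, 2, 3, 4]

# Every prefix of length >= 2 becomes one training sequence.
prefixes = [sentence[: i + 1] for i in range(1, len(sentence))]

# The last token of each prefix is the target; the rest is the context.
pairs = [(prefix[:-1], prefix[-1]) for prefix in prefixes]
for context, target in pairs:
    print(context, "->", target)
```

A sentence with n tokens therefore contributes n - 1 training sequences, and lines with fewer than two tokens contribute none.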

This script calculates the lengths of all training sequences and finds the maximum sequence length, which can be useful for padding or defining input size for the model.

In [128]:
len_list=[]
for sequence in training_sequences:
  len_list.append(len(sequence))

max(len_list)

23

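This script left-pads the training sequences with zeros up to the maximum sequence length so that they all have the same length, preparing the data for input into the model. Note that 0 is also the index of <UNK>, so padding positions and unknown tokens share the same embedding.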

In [129]:
padded_training_sequence=[]
for sequence in training_sequences:
 padded_training_sequence.append([0]*(max(len_list)-len(sequence))+sequence)

In [130]:
len(padded_training_sequence[100])

23
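A minimal sketch of the left-padding step on three hypothetical sequences is shown below; the values are made up for illustration.

```python
# Hypothetical sequences of different lengths.
sequences = [[1, 2], [1, 2, 3], [7, 8, 9, 10]]
max_len = max(len(seq) for seq in sequences)

# Left-pad with 0 so that the most recent tokens stay at the end.
padded = [[0] * (max_len - len(seq)) + seq for seq in sequences]
print(padded)
```

Padding on the left keeps the target word in the final column of every row, which is what allows a single slice over the last column to extract all targets later on.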

This script converts the padded training sequences into a PyTorch tensor of type long, making it ready for training in a deep learning model.

In [131]:
padded_training_sequence=torch.tensor(padded_training_sequence, dtype=torch.long)

In [132]:
padded_training_sequence.shape

torch.Size([2169, 23])

This script splits the padded training sequences into input (x) and target (y) tensors, where x contains all tokens except the last one (input sequence), and y contains the last token (target word to predict).

In [133]:
x=padded_training_sequence[:, :-1]
y=padded_training_sequence[:, -1]

In [134]:
x.shape

torch.Size([2169, 22])

In [135]:
from torch.utils.data import Dataset

This script defines a custom PyTorch dataset class that stores the input (x) and target (y) sequences, enabling easy batching and access to training data during model training.

In [136]:
class CustomDataset(Dataset):
  def __init__(self, x, y):
    self.x=x
    self.y=y
  def __len__(self):
    return self.x.shape[0]
  def __getitem__(self, idx):
    return self.x[idx], self.y[idx]

This script creates a custom dataset instance, dataset, using the input (x) and target (y) tensors, ready for use in data loading and model training.

In [137]:
dataset=CustomDataset(x,y)

In [138]:
len(dataset)

2169

This script creates a DataLoader for the custom dataset, enabling efficient batch processing and shuffling of the training data with a batch size of 32.

In [139]:
dataloader=DataLoader(dataset=dataset, batch_size=32, shuffle=True)
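The effect of shuffling and batching can be sketched in plain Python. This is an illustration of the idea under the sizes reported above, not a description of how DataLoader is implemented internally.

```python
import math
import random

num_examples = 2169   # dataset size reported above
batch_size = 32       # batch size passed to DataLoader

# Shuffle the example indices, then cut them into consecutive chunks.
indices = list(range(num_examples))
random.shuffle(indices)
batches = [indices[i:i + batch_size] for i in range(0, num_examples, batch_size)]

print(len(batches))      # number of batches per epoch
print(len(batches[-1]))  # size of the final, smaller batch
```

Because 2169 is not a multiple of 32, the last batch holds the remaining 25 examples; DataLoader keeps such a partial batch unless drop_last=True is passed.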

In [140]:
dataset[1]

(tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2]),
 tensor(3))

This script defines an LSTM-based model for next-word prediction, using an embedding layer for token representation, an LSTM layer for sequence learning, and a fully connected layer for generating predictions based on the final hidden state.

In [141]:
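class LSTMmodel(nn.Module):
  def __init__(self, vocab_size):
    super().__init__()
    # Map each token index to a 100-dimensional vector.
    self.emmbedding=nn.Embedding(vocab_size, 100)
    # Single-layer LSTM with a 150-dimensional hidden state.
    self.lstm=nn.LSTM(100, 150, batch_first=True)
    # Project the final hidden state to one score per vocabulary entry.
    self.fc=nn.Linear(150, vocab_size)


  def forward(self,x):
    embedded=self.emmbedding(x)
    intermediate_hidden_state, (final_hidden_state, final_cell_state)=self.lstm(embedded)
    # final_hidden_state has shape (1, batch, 150); drop the leading layer dimension.
    output= self.fc(final_hidden_state.squeeze(0))
    return output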

This script creates an instance of the LSTM model, initializing it with the vocabulary size to ensure the model can handle the input data and output the correct word predictions.

In [142]:
model=LSTMmodel(vocab_size=len(vocab))

This script checks if a GPU is available and sets the device to CUDA for faster training, otherwise defaults to using the CPU, and prints the selected device.

In [143]:
device= torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


This script moves the model to the selected device (GPU or CPU) so that training runs on the appropriate hardware.

In [144]:
model.to(device)

LSTMmodel(
  (emmbedding): Embedding(574, 100)
  (lstm): LSTM(100, 150, batch_first=True)
  (fc): Linear(in_features=150, out_features=574, bias=True)
)
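The layer sizes printed above determine the number of trainable parameters. The arithmetic below is a worked sketch that assumes PyTorch's standard single-layer LSTM layout, in which each of the four gates has input weights, recurrent weights, and two bias vectors.

```python
vocab_size, embed_dim, hidden_dim = 574, 100, 150

# Embedding table: one vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and two biases.
lstm_params = 4 * (embed_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

# Linear layer: weight matrix plus one bias per output.
fc_params = hidden_dim * vocab_size + vocab_size

total = embedding_params + lstm_params + fc_params
print(embedding_params, lstm_params, fc_params, total)
```

Under these assumptions the LSTM layer holds roughly half of the model's parameters, with the output layer second.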

This script sets the number of training epochs to 50 and the learning rate to 0.001. The learning rate controls the step size of each optimizer update.

In [145]:
epochs=50
learning_rate=0.001

This script defines the loss function as CrossEntropyLoss for multi-class classification and initializes the Adam optimizer with the model parameters and a learning rate of 0.001.

In [146]:
criterion=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(), lr=learning_rate)
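For intuition about what the loss measures, the sketch below computes cross-entropy for one example in plain Python. It is a hand-written illustration of the formula on hypothetical scores, not the PyTorch implementation; nn.CrossEntropyLoss applies the same idea to raw model outputs and, by default, averages over the batch.

```python
import math

def cross_entropy(logits, target):
    # Log-sum-exp with the maximum subtracted for numerical stability.
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    # Negative log-probability assigned to the target class.
    return log_sum_exp - logits[target]

# Hypothetical scores over a 4-word vocabulary; the true next word is index 2.
loss = cross_entropy([1.0, 0.5, 3.0, -1.0], target=2)
print(round(loss, 4))
```

The loss is small when the target word receives the highest score and grows as the model puts more weight on other words.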

This script trains the LSTM model for the specified number of epochs, using the data from the DataLoader, calculating the loss, performing backpropagation, and updating the model weights using the Adam optimizer. After each epoch it prints the loss summed over all batches rather than the per-batch average.

In [147]:
for epoch in range(epochs):
  total_loss=0
  for batch_x, batch_y in dataloader:
    batch_x, batch_y=batch_x.to(device), batch_y.to(device)
    optimizer.zero_grad()
    output=model(batch_x)
    loss=criterion(output, batch_y)
    loss.backward()
    optimizer.step()
    total_loss+=loss.item()
  print(f"Epoch: {epoch+1}, loss {total_loss:.4f}")

Epoch: 1, loss 398.0926
Epoch: 2, loss 349.6567
Epoch: 3, loss 320.6432
Epoch: 4, loss 289.9704
Epoch: 5, loss 260.8035
Epoch: 6, loss 232.9226
Epoch: 7, loss 207.2139
Epoch: 8, loss 183.7036
Epoch: 9, loss 161.3597
Epoch: 10, loss 141.0205
Epoch: 11, loss 122.4925
Epoch: 12, loss 105.7705
Epoch: 13, loss 91.2682
Epoch: 14, loss 78.1486
Epoch: 15, loss 66.8981
Epoch: 16, loss 57.3497
Epoch: 17, loss 49.2239
Epoch: 18, loss 42.1559
Epoch: 19, loss 36.7549
Epoch: 20, loss 31.8738
Epoch: 21, loss 28.0127
Epoch: 22, loss 24.7704
Epoch: 23, loss 22.0688
Epoch: 24, loss 19.8429
Epoch: 25, loss 17.9109
Epoch: 26, loss 16.3299
Epoch: 27, loss 15.1393
Epoch: 28, loss 14.0044
Epoch: 29, loss 12.8681
Epoch: 30, loss 12.0303
Epoch: 31, loss 11.2284
Epoch: 32, loss 10.6628
Epoch: 33, loss 10.1317
Epoch: 34, loss 9.6728
Epoch: 35, loss 9.3557
Epoch: 36, loss 9.0651
Epoch: 37, loss 8.7826
Epoch: 38, loss 8.2612
Epoch: 39, loss 8.0548
Epoch: 40, loss 7.7418
Epoch: 41, loss 7.5077
Epoch: 42, loss 7.344
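Because the loop adds up one loss value per batch, the printed numbers are easier to interpret after dividing by the number of batches. The sketch below does this for two of the values printed above. The result is only an approximate per-example average, since the final batch is smaller than the others.

```python
import math

num_batches = math.ceil(2169 / 32)  # batches per epoch for 2169 examples
epoch_1_total = 398.0926            # value printed for epoch 1 above
epoch_41_total = 7.5077             # value printed for epoch 41 above

avg_epoch_1 = epoch_1_total / num_batches
avg_epoch_41 = epoch_41_total / num_batches
uniform_baseline = math.log(574)    # loss of a uniform guess over 574 words

print(avg_epoch_1, avg_epoch_41, uniform_baseline)
```

The first-epoch average sits a little below the uniform-guess baseline of about 6.35, while the later value is close to zero, which suggests the model has largely memorized this small training set.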

This script defines a function for predicting the next word given a text input. It tokenizes the text, converts it to numerical indices, left-pads the sequence, and returns the input followed by the vocabulary entry with the highest output score from the trained model.

In [148]:
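def prediction(model, vocab, text):
  # Tokenize and index the prompt the same way the training data was prepared.
  tokenized_text=word_tokenize(text.lower())
  numerical_text=text_indices(tokenized_text, vocab)
  # Left-pad with zeros to length 23, the padded sequence length computed earlier.
  padded_text=torch.tensor([0]*(23-len(numerical_text))+numerical_text, dtype=torch.long).unsqueeze(0).to(device)
  # Inference only, so gradient tracking is switched off.
  with torch.no_grad():
    output=model(padded_text)
  # Index of the highest-scoring vocabulary entry.
  value, index=torch.max(output, dim=1)
  return text +" "+ list(vocab.keys())[index.item()]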





This script predicts the next word after "Transfer learning is a powerful AI" by processing the input text through the trained model and returning the predicted next word from the vocabulary.

In [149]:
prediction(model, vocab, "Transfer learning is a powerful AI")

'Transfer learning is a powerful AI approach'
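The final lookup step, from output scores to a word, can be sketched without the model. The scores and the toy vocabulary below are hypothetical. The reverse dictionary index_to_word is an alternative to calling list(vocab.keys()) on every prediction, not what the function above does.

```python
# Hypothetical toy vocabulary and output scores (one score per vocabulary entry).
toy_vocab = {"<UNK>": 0, "powerful": 1, "ai": 2, "approach": 3}
scores = [0.1, -0.4, 0.3, 2.2]

# Build the reverse mapping once so each lookup is a single dictionary access.
index_to_word = {index: word for word, index in toy_vocab.items()}

# Greedy decoding: pick the index with the highest score.
best_index = max(range(len(scores)), key=lambda i: scores[i])
print(index_to_word[best_index])
```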

This script generates a sequence of 30 predicted words starting from the input text "Zero-shot learning", using the model to predict the next word iteratively and appending it to the input text, with a 0.3-second delay between each prediction.

In [150]:
import time
num_token=30
input_text="Zero-shot learning"
for token in range(num_token):
  output=prediction(model, vocab, input_text)
  print(output)
  input_text=output
  time.sleep(0.3)


Zero-shot learning is
Zero-shot learning is a
Zero-shot learning is a technique
Zero-shot learning is a technique that
Zero-shot learning is a technique that allows
Zero-shot learning is a technique that allows llms
Zero-shot learning is a technique that allows llms to
Zero-shot learning is a technique that allows llms to perform
Zero-shot learning is a technique that allows llms to perform tasks
Zero-shot learning is a technique that allows llms to perform tasks they
Zero-shot learning is a technique that allows llms to perform tasks they haven
Zero-shot learning is a technique that allows llms to perform tasks they haven ’
Zero-shot learning is a technique that allows llms to perform tasks they haven ’ t
Zero-shot learning is a technique that allows llms to perform tasks they haven ’ t explicitly
Zero-shot learning is a technique that allows llms to perform tasks they haven ’ t explicitly explicitly
Zero-shot learning is a technique that allows llms to perform tasks they haven ’ t ex
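The generation loop itself is independent of the model. The sketch below shows the same feed-the-output-back-in pattern with a hypothetical lookup table, toy_table, standing in for the trained LSTM. Unlike the real model, this stand-in looks only at the last word.

```python
# Hypothetical stand-in for the trained model: maps the last word to a next word.
toy_table = {"zero-shot": "learning", "learning": "is", "is": "a", "a": "technique"}

def toy_next_word(text):
    last_word = text.lower().split()[-1]
    # Fall back to "<UNK>" when the last word was never seen.
    return toy_table.get(last_word, "<UNK>")

text = "Zero-shot"
steps = []
for _ in range(4):
    # Append the predicted word and use the longer text as the next input.
    text = text + " " + toy_next_word(text)
    steps.append(text)
    print(text)
```

Since each step always takes the single most likely word, a wrong or repeated prediction is fed straight back in, which is one plausible reason for the repeated word near the end of the output above.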