# Generating Unigram, Bigram, Trigram and Ngrams in NLTK

# What is an n-gram Model?

In natural language processing, an n-gram is a contiguous sequence of n items taken from a given sample of text. The items can be characters or words, and n can be any positive integer (1, 2, 3, and so on).

For example, consider the line “Either my way or no way”. The n-grams that can be generated from it are shown below:

![Screenshot%20from%202025-07-23%2000-29-27.jpeg](attachment:Screenshot%20from%202025-07-23%2000-29-27.jpeg)

Because they are used so often, the n-gram models for n = 1, 2, and 3 have their own names: unigram, bigram, and trigram models, respectively.
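To make the sliding-window idea concrete, here is a minimal plain-Python sketch that lists the word-level unigrams, bigrams, and trigrams of the example sentence. The helper name `simple_ngrams` is hypothetical and is not part of NLTK; it only illustrates the definition.

```python
def simple_ngrams(tokens, n):
    """Return all contiguous n-item tuples from tokens (hypothetical helper, not NLTK)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "Either my way or no way".split()

for n, name in [(1, "Unigrams"), (2, "Bigrams"), (3, "Trigrams")]:
    print(name, simple_ngrams(words, n))

# The same helper works on characters, since a string is also a sequence.
print(simple_ngrams("way", 2))
```

Passing a string instead of a list of words gives character-level n-grams, which matches the point that the items can be either characters or words.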

# Use of n-grams in NLP

1) N-grams are useful for creating features from a text corpus for machine learning algorithms such as SVM, Naive Bayes, etc.

2) N-grams are useful for building capabilities such as autocorrect, sentence autocompletion, text summarization, speech recognition, etc.
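As a rough illustration of the first point, the sketch below counts the bigrams in each document of a tiny made-up corpus, so that every distinct bigram becomes a feature and its count becomes the feature value. The helper `ngram_counts` and the corpus are assumptions for illustration only.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count word-level n-grams in text (hypothetical helper for illustration)."""
    tokens = text.lower().split()
    # zip over n shifted copies of the token list yields the contiguous n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)

# Toy corpus, made up for this example.
corpus = ["the cat sat on the mat", "the cat ate the fish"]

# One Counter per document: each distinct bigram is a feature, its count is the value.
features = [ngram_counts(doc, 2) for doc in corpus]
print(features[0][("the", "cat")])
print(features[1][("the", "mat")])
```

A bigram that never occurs in a document simply has a count of zero, which is how a `Counter` behaves for missing keys.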

# Unigrams or 1-grams

In [1]:
from nltk.util import ngrams

In [2]:
n = 1
sentence = 'You will face many defeats in life, but never let yourself be defeated.'

In [3]:
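# ngrams returns a lazy iterator, so it can be looped over only once;
# wrap it in list(...) if the n-grams are needed again.
unigrams = ngrams(sentence.split(), n)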

In [4]:
for item in unigrams:
    print(item)

('You',)
('will',)
('face',)
('many',)
('defeats',)
('in',)
('life,',)
('but',)
('never',)
('let',)
('yourself',)
('be',)
('defeated.',)
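Note that `'life,'` and `'defeated.'` keep their punctuation in the output above, because `str.split()` only splits on whitespace. The following is a minimal sketch of one way to avoid this, assuming a simple regular expression is good enough for the text at hand. NLTK also ships dedicated tokenizers such as `word_tokenize`, which need extra data to be downloaded and are not used here.

```python
import re

sentence = 'You will face many defeats in life, but never let yourself be defeated.'

# \w+ matches runs of letters, digits, and underscores, so punctuation is dropped.
tokens = re.findall(r"\w+", sentence)
print(tokens)
```

The resulting `tokens` list can be passed to `ngrams` in place of `sentence.split()`.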


# Bigrams or 2-grams

In [5]:
from nltk.util import ngrams
n = 2  # bigrams

In [6]:
sentence = 'The purpose of our life is to be happy'

In [7]:
bigrams = ngrams(sentence.split(), n)

In [8]:
for item in bigrams:
    print(item)

('The', 'purpose')
('purpose', 'of')
('of', 'our')
('our', 'life')
('life', 'is')
('is', 'to')
('to', 'be')
('be', 'happy')


# Trigrams or 3-grams 

In [9]:
from nltk.util import ngrams
n = 3  # trigrams

In [10]:
sentence = 'The cat sat on the mat.'

In [11]:
trigrams = ngrams(sentence.split(), n)

In [12]:
for item in trigrams:
    print(item)

('The', 'cat', 'sat')
('cat', 'sat', 'on')
('sat', 'on', 'the')
('on', 'the', 'mat.')
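The six-token sentence above yields four trigrams because, without padding, a sequence of k tokens contains k - n + 1 contiguous n-grams (and none when n is larger than k). A small sketch of that arithmetic, using a hypothetical helper name:

```python
def expected_ngram_count(num_tokens, n):
    """Number of contiguous n-grams in num_tokens items, assuming no padding (hypothetical helper)."""
    return max(num_tokens - n + 1, 0)

tokens = 'The cat sat on the mat.'.split()

# Print n alongside the number of n-grams the sentence contains.
for n in (1, 2, 3, 7):
    print(n, expected_ngram_count(len(tokens), n))
```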


# NLTK Everygrams

NLTK provides another function, everygrams, that converts a sentence into its unigrams, bigrams, trigrams, and so on, up to the n-grams where n is the number of words in the sentence. In short, by default this function generates n-grams for all possible values of n.

In [13]:
from nltk.util import everygrams

In [14]:
message = "who let the dogs out"

In [15]:
msg_split = message.split()

list(everygrams(msg_split))

[('who',),
 ('who', 'let'),
 ('who', 'let', 'the'),
 ('who', 'let', 'the', 'dogs'),
 ('who', 'let', 'the', 'dogs', 'out'),
 ('let',),
 ('let', 'the'),
 ('let', 'the', 'dogs'),
 ('let', 'the', 'dogs', 'out'),
 ('the',),
 ('the', 'dogs'),
 ('the', 'dogs', 'out'),
 ('dogs',),
 ('dogs', 'out'),
 ('out',)]
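The full list grows quickly: a sentence of k words has k(k + 1)/2 n-grams in total, which is 15 for the five-word message above. NLTK's `everygrams` accepts `min_len` and `max_len` arguments to restrict the range of n. The sketch below mimics that behaviour in plain Python so the logic is visible; `simple_everygrams` is a hypothetical stand-in, not NLTK's implementation, and its `max_len=None` default is an assumption of this sketch.

```python
def simple_everygrams(tokens, min_len=1, max_len=None):
    """Yield every n-gram with min_len <= n <= max_len, grouped by start position.

    Hypothetical plain-Python helper; max_len=None means "up to len(tokens)".
    """
    if max_len is None:
        max_len = len(tokens)
    for start in range(len(tokens)):
        for n in range(min_len, max_len + 1):
            # Skip windows that would run past the end of the sequence.
            if start + n <= len(tokens):
                yield tuple(tokens[start:start + n])

msg_split = "who let the dogs out".split()

# Only unigrams and bigrams.
print(list(simple_everygrams(msg_split, max_len=2)))

# Total number of n-grams when no limit is given.
print(len(list(simple_everygrams(msg_split))))
```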

# Ngrams in TextBlob

TextBlob is another NLP library for Python that is quite beginner-friendly. Below is an example of how to generate n-grams with TextBlob.

In [16]:
from textblob import TextBlob

data = 'Who let the dog out'
num = 3

n_grams = TextBlob(data).ngrams(num)
for grams in n_grams:
    print(grams)

['Who', 'let', 'the']
['let', 'the', 'dog']
['the', 'dog', 'out']
