<a href="https://colab.research.google.com/github/pranalibose/LLM_Workshop/blob/main/LLM_Applications.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import warnings

warnings.filterwarnings('ignore')

In [2]:
%%capture
!pip install transformers[sentencepiece]

In [3]:
from transformers import pipeline
import textwrap
wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)
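
The wrapper above turns every newline into a space, so a triple-quoted string that starts with a newline, or has a trailing space before a line break, prints with a leading space and occasional double spaces. The sketch below shows one way to avoid that by collapsing whitespace before wrapping. The sample string and the variable names are illustrative assumptions, not part of the workshop code.

```python
import textwrap

wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)

# Illustrative text: starts with a newline and has a trailing space before a line break.
raw = "\nHere comes the part. \nOn our way back.\n"

# Wrapping directly keeps the leading space and the double space.
wrapped_raw = wrapper.fill(raw)

# Collapsing all runs of whitespace first gives clean output.
wrapped_clean = wrapper.fill(" ".join(raw.split()))

print(repr(wrapped_raw))
print(repr(wrapped_clean))
```

This only affects how the text is displayed; the pipelines receive the original string either way.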

# Classifying whole sentences

In [4]:
sentence = '''
Here comes the part which brought a smile to my face. On our way back we took Vistara flight and
during check-in they took a written note of my damaged bag. However, when I received the bag at the conveyor
belt on arrival, I realised they had fixed it with another wheel, assuming the damage might have happened at
their end. This Gesture of Vistara will always stay with me.
'''
classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")


Sentence:
 Here comes the part which brought a smile to my face. On our way back we took
Vistara flight and  during check-in they took a written note of my damaged bag.
However, when I received the bag at the conveyor  belt on arrival, I realised
they had fixed it with another wheel, assuming the damage might have happened at
their end. This Gesture of Vistara will always stay with me.

This sentence is classified with a POSITIVE sentiment
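
Besides `label`, each result dictionary from the text-classification pipeline also carries a `score`. As a rough illustration of where those two values come from, the sketch below converts a pair of raw model outputs (logits) into a label and a probability with a softmax. The logits are made up, and the label order (`0` = NEGATIVE, `1` = POSITIVE) is an assumption about this model's configuration.

```python
import math

# Assumed label order for the SST-2 sentiment model; check the model config to confirm.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits):
    """Pick the most probable class and report it like a pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

# Hypothetical logits, not real model output.
prediction = to_prediction([-3.2, 3.5])
print(prediction)
```

With a real result, `c[0]['score']` can be printed next to `c[0]['label']` to show how confident the model is.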


# Classifying a longer passage

In [5]:
sentence = '''
In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to fulfil a
long-cherished shared dream to bring forth a distinguished flying experience to air travellers in India.
With its strong historical ties with aviation, the Tata group had long wished to re-enter the aviation sector,
after Tata Airlines was renamed Air India and eventually, nationalised. Both, Tata group and Singapore
Airlines were also firm believers in the growth potential of the Indian aviation sector and hence tried to
enter the market in the past; first, in 1994 by setting up a joint venture to start an airline in India and
then in 2000, teaming up to purchase stakes in Air India. However, after the lifting of foreign investment
restrictions in 2012, the partners once again sought approval for a tie-up, which it obtained in October 2013.
On November 5, 2013, Vistara’s holding company, TATA SIA Airlines Limited, was incorporated.
'''

classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")


Sentence:
 In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to
fulfil a  long-cherished shared dream to bring forth a distinguished flying
experience to air travellers in India.  With its strong historical ties with
aviation, the Tata group had long wished to re-enter the aviation sector,  after
Tata Airlines was renamed Air India and eventually, nationalised. Both, Tata
group and Singapore  Airlines were also firm believers in the growth potential
of the Indian aviation sector and hence tried to  enter the market in the past;
first, in 1994 by setting up a joint venture to start an airline in India and
then in 2000, teaming up to purchase stakes in Air India. However, after the
lifting of foreign investment  restrictions in 2012, the partners once again
sought approval for a tie-up, which it obtained in October 2013.  On November 5,
2013, Vistara’s holding company, TATA SIA Airlines Limited, was incorporated.

This sentence is classified with a POSITIVE sentiment

# Classifying each word in a sentence (Named Entity Recognition)

In [6]:
sentence = '''
In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to fulfil a long-cherished shared
dream to bring forth a distinguished flying experience to air travellers in India.
'''

ner = pipeline('token-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy='simple')
ners = ner(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')
for n in ners:
    print(f"{n['word']} -> {n['entity_group']}")

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



Sentence:
 In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to
fulfil a long-cherished shared  dream to bring forth a distinguished flying
experience to air travellers in India.


Tata Sons -> ORG
Singapore Airlines -> ORG
India -> LOC
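
The model labels every token individually; the pipeline then merges neighbouring tokens of the same type into entities such as "Tata Sons". The sketch below is a simplified illustration of that merging idea using `B-`/`I-` style tags. It is not the library's implementation, and the token/tag pairs are hand-written assumptions rather than real model output.

```python
def group_entities(tagged_tokens):
    """Merge consecutive B-/I- tagged tokens into (text, type) entities."""
    entities = []
    current_words, current_type = [], None
    for word, tag in tagged_tokens:
        if tag == "O":
            prefix, ent_type = "O", None
        else:
            prefix, _, ent_type = tag.partition("-")
        # A new entity starts on a B- tag or when the entity type changes.
        starts_new = ent_type is not None and (prefix == "B" or ent_type != current_type)
        if current_words and (ent_type is None or starts_new):
            entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        if ent_type is not None:
            current_words.append(word)
            current_type = ent_type
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

# Hypothetical per-token tags for part of the sentence above.
tagged = [
    ("In", "O"), ("2013", "O"), ("Tata", "B-ORG"), ("Sons", "I-ORG"),
    ("and", "O"), ("Singapore", "B-ORG"), ("Airlines", "I-ORG"),
    ("in", "O"), ("India", "B-LOC"),
]
entities = group_entities(tagged)
for text, ent_type in entities:
    print(f"{text} -> {ent_type}")
```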


# Answering a question given a context

In [7]:
context = '''
In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to fulfil a long-cherished shared
dream to bring forth a distinguished flying experience to air travellers in India.
With its strong historical ties with aviation, the Tata group had long wished to re-enter the aviation sector,
after Tata Airlines was renamed Air India and eventually, nationalised. Both, Tata group and
Singapore Airlines were also firm believers in the growth potential of the Indian aviation sector and hence
tried to enter the market in the past; first, in 1994 by setting up a joint venture to start an airline in
India and then in 2000, teaming up to purchase stakes in Air India. However, after the lifting of foreign
investment restrictions in 2012, the partners once again sought approval for a tie-up, which it obtained in
October 2013. On November 5, 2013, Vistara’s holding company, TATA SIA Airlines Limited, was incorporated.'''


question = 'When was Vistara established?'

print('Text:')
print(wrapper.fill(context))
print('\nQuestion:')
print(question)

Text:
 In 2013, two legendary brands, Tata Sons and Singapore Airlines, decided to
fulfil a long-cherished shared  dream to bring forth a distinguished flying
experience to air travellers in India.  With its strong historical ties with
aviation, the Tata group had long wished to re-enter the aviation sector,  after
Tata Airlines was renamed Air India and eventually, nationalised. Both, Tata
group and  Singapore Airlines were also firm believers in the growth potential
of the Indian aviation sector and hence  tried to enter the market in the past;
first, in 1994 by setting up a joint venture to start an airline in  India and
then in 2000, teaming up to purchase stakes in Air India. However, after the
lifting of foreign  investment restrictions in 2012, the partners once again
sought approval for a tie-up, which it obtained in  October 2013. On November 5,
2013, Vistara’s holding company, TATA SIA Airlines Limited, was incorporated.

Question:
When was Vistara established?


In [8]:
from transformers import pipeline

qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

print('\nQuestion:')
print(question + '\n')
print('Answer:')
a = qa(context=context, question=question)
a['answer']


Question:
When was Vistara established?

Answer:


'November 5, 2013'
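
The answer is a span copied out of the context rather than newly generated text. Extractive question-answering models of this kind typically score every token as a possible start and as a possible end of the answer, and the best-scoring valid span is returned. The sketch below illustrates that selection step only; the tokens, the scores, and the `max_answer_len` parameter name are all illustrative assumptions.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) maximising start score + end score with start <= end."""
    best = (float("-inf"), 0, 0)
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best[0]:
                best = (score, s, e)
    return best[1], best[2]

# Hypothetical tokens and scores, not real model output.
tokens = ["On", "November", "5", ",", "2013", ",", "Vistara", "'s", "holding", "company"]
start_scores = [0.1, 4.0, 0.5, 0.0, 1.0, 0.0, 0.3, 0.0, 0.0, 0.0]
end_scores = [0.0, 0.2, 1.5, 0.0, 4.5, 0.1, 0.2, 0.0, 0.0, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

The dictionary returned by the pipeline also includes `score`, `start`, and `end` alongside `answer`, which is useful for checking where in the context the answer was found.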

# Text summarization

In [9]:
review = '''
I've never felt so utterly helpless and let down as I did with Vistara today. No staff present, no proper
guidance provided, and an obscene amount of time wasted. After repeatedly trying to contact the staff,
I was forced to wait an eternity, and they were completely useless. Because no staff was present,
I couldn't do web check-in myself. Later, they had the audacity to blame me and refused to accept any
responsibility for their failures. Despite not doing web check-in, nothing was refunded. Instead, they made
me wait endlessly for staff to address my concerns. If the management had any competence, I wouldn't have
missed my flight despite arriving 50 minutes early. This pathetic behavior makes me question if Vistara even
cares about its passengers. I would strongly urge everyone to avoid this airline and choose one where you at
least get proper guidance and don't have to wait endlessly for staff to appear. This was an absolutely awful
experience, and I feel utterly devastated by how I've been treated.'''

print('\nOriginal text:\n')
print(wrapper.fill(review))
summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']
print('\nSummarized text:')
print(wrapper.fill(summarized_text))


Original text:

 I've never felt so utterly helpless and let down as I did with Vistara today.
No staff present, no proper  guidance provided, and an obscene amount of time
wasted. After repeatedly trying to contact the staff,  I was forced to wait an
eternity, and they were completely useless. Because no staff was present,  I
couldn't do web check-in myself. Later, they had the audacity to blame me and
refused to accept any  responsibility for their failures. Despite not doing web
check-in, nothing was refunded. Instead, they made  me wait endlessly for staff
to address my concerns. If the management had any competence, I wouldn't have
missed my flight despite arriving 50 minutes early. This pathetic behavior makes
me question if Vistara even  cares about its passengers. I would strongly urge
everyone to avoid this airline and choose one where you at  least get proper
guidance and don't have to wait endlessly for staff to appear. This was an
absolutely awful  experience, and I feel utterly devastated by how I've been
treated.


Summarized text:
 No staff present, no proper guidance provided, and an obscene amount of time
wasted . After repeatedly trying to contact the staff, I was forced to wait an
eternity, and they were useless . Because no staff was present,  I couldn't do
web check-in myself . Later, they had the audacity to blame me and refused to
accept any  responsibilities for their failures .


# Fill in the blanks

In [10]:
sentence = 'Singapore Airlines and Tata Group are the <mask> of Vistara'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
    print(m['sequence'])

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Singapore Airlines and Tata Group are the owners of Vistara
Singapore Airlines and Tata Group are the shareholders of Vistara
Singapore Airlines and Tata Group are the founders of Vistara
Singapore Airlines and Tata Group are the subsidiaries of Vistara
Singapore Airlines and Tata Group are the sponsors of Vistara
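
The five sentences are the model's five most probable fillers for `<mask>`, in descending order. Conceptually, the model scores every word in its vocabulary for the masked position and the pipeline keeps the top few. The sketch below mimics that ranking over a tiny hand-made candidate list; the logits are invented for illustration, and a real vocabulary has tens of thousands of entries.

```python
import heapq
import math

def top_k_fills(template, candidate_logits, k=5, mask_token="<mask>"):
    """Rank candidate words by softmax probability and fill them into the template."""
    m = max(candidate_logits.values())
    exps = {word: math.exp(logit - m) for word, logit in candidate_logits.items()}
    total = sum(exps.values())
    ranked = heapq.nlargest(k, exps.items(), key=lambda item: item[1])
    return [
        {"token_str": word, "score": value / total, "sequence": template.replace(mask_token, word)}
        for word, value in ranked
    ]

sentence = "Singapore Airlines and Tata Group are the <mask> of Vistara"
# Hypothetical logits, not real model output.
logits = {"owners": 5.0, "shareholders": 4.2, "founders": 3.9,
          "subsidiaries": 3.1, "sponsors": 2.8, "pilots": 0.5}

fills = top_k_fills(sentence, logits, k=3)
for f in fills:
    print(f"{f['score']:.3f}  {f['sequence']}")
```

In the real output, each item in `masks` likewise has a `score` and a `token_str` in addition to `sequence`, so the probabilities can be printed next to each completed sentence.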


# Translation (English to German)

In [11]:
english = '''Vistara is a joint venture of 2 individual entities
Tata Sons Private Limited and Singapore Airlines Limited'''

translator = pipeline('translation_en_to_de', model='t5-base')
german = translator(english)
print('\nEnglish:')
print(english)
print('\nGerman:')
print(german[0]['translation_text'])


English:
Vistara is a joint venture of 2 individual entities 
Tata Sons Private Limited and Singapore Airlines Limited

German:
Vistara ist ein Joint Venture aus 2 Einzelunternehmen Tata Sons Private Limited und Singapore Airlines Limited.
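
T5 is a text-to-text model: the task is selected by a short text prefix placed in front of the input, and for English to German the prefix is `translate English to German: `. The pipeline is expected to add this prefix on its own, so the sketch below is only meant to show what the model sees. The helper name `build_t5_input` is hypothetical, and collapsing the line break into a space is a choice made here for readability.

```python
# Task prefix used by T5 for English-to-German translation.
T5_PREFIX = "translate English to German: "

def build_t5_input(text, prefix=T5_PREFIX):
    """Prepend the task prefix and collapse whitespace (hypothetical helper)."""
    return prefix + " ".join(text.split())

english = '''Vistara is a joint venture of 2 individual entities
Tata Sons Private Limited and Singapore Airlines Limited'''

model_input = build_t5_input(english)
print(model_input)
```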


**Note: All of the texts used in the examples are for reference and demonstration purposes only.**