Hindi Conversation Analysis: Summarization, Translation, and Sentiment Analysis Using the mBART Pre-trained Transformer Model


Name : Siddharth Sonkavade

**Step 1 : Importing Libraries**

In [1]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast, BartForConditionalGeneration, BartTokenizer, pipeline, AutoModelForTokenClassification, AutoTokenizer
import torch

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

**Step 2 : Load the pre-trained translation, summarization, and NER models and tokenizers**

In [3]:
mbart_model = MBartForConditionalGeneration.from_pretrained('facebook/mbart-large-50-many-to-many-mmt').to(device)
mbart_tokenizer = MBart50TokenizerFast.from_pretrained('facebook/mbart-large-50-many-to-many-mmt')
bart_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn').to(device)
bart_tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
ner_model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english").to(device)
ner_tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


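The notebook loads the mBART-50 many-to-many checkpoint but then works from a transcript that is already in English. As a minimal sketch of how the Hindi-to-English step could look with the `mbart_model` and `mbart_tokenizer` objects loaded above: the helper name `translate_hi_to_en` and the `max_new_tokens` default are assumptions made for illustration, while `hi_IN` and `en_XX` are the mBART-50 language codes for Hindi and English.

```python
def translate_hi_to_en(text, model, tokenizer, device="cpu", max_new_tokens=256):
    """Translate Hindi text to English with an mBART-50 many-to-many model.

    Hypothetical helper: the name and the max_new_tokens default are
    illustrative and do not come from the notebook.
    """
    # Tell the tokenizer which language the source text is written in.
    tokenizer.src_lang = "hi_IN"
    encoded = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    # Force the decoder to begin with the English language token.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("en_XX"),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call would look like `translate_hi_to_en(hindi_text, mbart_model, mbart_tokenizer, device)`, where `hindi_text` stands for the original Hindi transcript, which this notebook does not include.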

Step 3 : Sentiment analysis pipeline

In [4]:
sentiment_pipeline = pipeline('sentiment-analysis', device=0 if torch.cuda.is_available() else -1)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



Step 4 : The conversation transcript

In [5]:
translated_conversation = """
Rajesh from CFI Finance Company called Suresh to discuss overdue EMI payments. Rajesh informed Suresh that his EMI has been overdue for the past three months. Suresh acknowledged the delay and explained that he is under financial stress due to his mother's illness and hospital expenses. Rajesh insisted on immediate payment and mentioned the total overdue amount including late fees and interest. The conversation escalated with both parties exchanging threats and harsh words. Rajesh warned of legal action and personal visits to Suresh's home or office if the payment was not made, while Suresh expressed frustration and anger over the constant harassment and lack of empathy from the finance company.
"""

Step 5 : Extract key action items from the summary

In [6]:
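def extract_key_action_items(summary):
    """Return the sentences of `summary` that mention a payment-related keyword."""
    keywords = ('pay', 'amount', 'due')
    action_items = []
    for sentence in summary.split('.'):
        # Case-insensitive substring match, so 'due' also matches 'overdue'.
        if any(keyword in sentence.lower() for keyword in keywords):
            action_items.append(sentence.strip())
    return action_items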
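Because the function above matches substrings, a sentence such as "under financial stress due to his mother's illness" is flagged even though it contains no action. A minimal sketch of a stricter variant follows, assuming whole-word regular-expression matching and skipping the phrase "due to"; the function name `extract_action_items_strict` and the pattern list are hypothetical and would need tuning for real transcripts.

```python
import re

# Hypothetical patterns: whole-word payment terms, with "due to" excluded.
ACTION_PATTERN = re.compile(
    r"\bpay(?:s|ing|ment|ments)?\b"      # pay, pays, paying, payment(s)
    r"|\bamount\b"                        # amount
    r"|\b(?:over)?due\b(?!\s+to\b)",      # due / overdue, but not "due to"
    re.IGNORECASE,
)

def extract_action_items_strict(summary):
    """Return sentences that contain a payment-related keyword as a whole word."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [s.rstrip(".") for s in sentences if ACTION_PATTERN.search(s)]
```

On the transcript used in this notebook, this variant would drop the sentence about the mother's illness while keeping the sentences about overdue payments.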

Step 6 : Define a function to perform NER on the translated conversation

In [7]:
def perform_ner(text):
    """Return (token, label) pairs for every WordPiece token in `text`."""
    inputs = ner_tokenizer(text, return_tensors="pt").to(device)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        outputs = ner_model(**inputs)
    # Pick the highest-scoring label id for each token.
    predictions = torch.argmax(outputs.logits, dim=2)
    token_ids = inputs.input_ids[0].tolist()
    token_predictions = predictions[0].tolist()
    decoded_predictions = [ner_model.config.id2label[pred] for pred in token_predictions]
    decoded_tokens = [ner_tokenizer.decode([token_id]) for token_id in token_ids]
    return list(zip(decoded_tokens, decoded_predictions))
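`perform_ner` returns one label per WordPiece token, so a name like "Rajesh" arrives as the pieces `Raj` and `##esh`. A minimal sketch of a post-processing step that glues the pieces back together and groups adjacent tokens of the same type into one entity is shown below; the helper name `merge_wordpiece_entities` is an assumption, and the grouping rule is a simplification rather than the model's official decoding scheme.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}

def merge_wordpiece_entities(token_label_pairs):
    """Group (token, label) pairs from a WordPiece NER model into entities.

    Returns a list of (entity_text, entity_type) tuples.
    """
    entities = []
    text, etype = "", None  # entity currently being built, if any
    for token, label in token_label_pairs:
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##"):
            # Continuation piece: attach to the open entity, else ignore.
            if etype is not None:
                text += token[2:]
            continue
        # "I-PER" -> ("I", "-", "PER"); "O" -> ("O", "", "")
        prefix, _, new_type = label.partition("-")
        if new_type and new_type == etype and prefix != "B":
            # Same entity type continues with a new word.
            text += " " + token
            continue
        if etype is not None:
            entities.append((text, etype))
        text, etype = (token, new_type) if new_type else ("", None)
    if etype is not None:
        entities.append((text, etype))
    return entities
```

Calling `merge_wordpiece_entities(perform_ner(translated_conversation))` would yield entries such as `("Rajesh", "PER")` and `("CFI Finance Company", "ORG")`. The Transformers `pipeline("ner", ...)` with its `aggregation_strategy` argument offers comparable built-in grouping.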

Step 7 : Identify non-compliances based on predefined scenarios

In [8]:
def identify_non_compliances(text):
    """Flag predefined non-compliance scenarios by case-insensitive phrase matching."""
    text = text.lower()
    non_compliances = []
    if "threat" in text or "legal action" in text:
        non_compliances.append("Using Threatening Language")
    if "third party" in text or "colleague" in text:
        non_compliances.append("Disclosing Debt Information to Third Parties")
    # The phrase must be lowercase because `text` has been lowercased.
    if "10 pm" in text:
        non_compliances.append("Calling Outside Permissible Hours")
    if "arrest" in text:
        non_compliances.append("Providing False or Misleading Information")
    if "no need for proof" in text:
        non_compliances.append("Failing to Validate the Debt")
    return non_compliances
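The same checks can also be expressed as data rather than a chain of `if` statements, which makes it easier to add scenarios and to report which phrase triggered each flag. The sketch below assumes a table layout and a function name (`COMPLIANCE_RULES`, `find_non_compliances`) that are illustrative only; the labels and trigger phrases are the ones used in the function above.

```python
# Illustrative rule table: (violation label, lowercase trigger phrases).
COMPLIANCE_RULES = [
    ("Using Threatening Language", ("threat", "legal action")),
    ("Disclosing Debt Information to Third Parties", ("third party", "colleague")),
    ("Calling Outside Permissible Hours", ("10 pm",)),
    ("Providing False or Misleading Information", ("arrest",)),
    ("Failing to Validate the Debt", ("no need for proof",)),
]

def find_non_compliances(text, rules=COMPLIANCE_RULES):
    """Map each triggered violation label to the phrases that triggered it."""
    lowered = text.lower()
    hits = {}
    for label, phrases in rules:
        matched = [phrase for phrase in phrases if phrase in lowered]
        if matched:
            hits[label] = matched
    return hits
```

Returning the matched phrases alongside each label gives a reviewer something to audit, which matters because plain phrase matching cannot tell a threat from a sentence that merely mentions one.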

Step 8 : Summarize the translated conversation

In [9]:
# No summarization model is run in this step; the translated transcript is reused as the summary.
summary = translated_conversation
print("Summary of the conversation:\n", summary)

Summary of the conversation:
 
Rajesh from CFI Finance Company called Suresh to discuss overdue EMI payments. Rajesh informed Suresh that his EMI has been overdue for the past three months. Suresh acknowledged the delay and explained that he is under financial stress due to his mother's illness and hospital expenses. Rajesh insisted on immediate payment and mentioned the total overdue amount including late fees and interest. The conversation escalated with both parties exchanging threats and harsh words. Rajesh warned of legal action and personal visits to Suresh's home or office if the payment was not made, while Suresh expressed frustration and anger over the constant harassment and lack of empathy from the finance company.
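In the cell above, the summary is simply the transcript itself; the `bart_model` and `bart_tokenizer` loaded in Step 2 are not called. A minimal sketch of how an abstractive summary could be generated with them is given below. The helper name `summarize` and the generation settings (`num_beams`, `max_length`, `min_length`) are assumptions chosen for illustration, not values taken from the notebook.

```python
def summarize(text, model, tokenizer, device="cpu", max_length=130, min_length=30):
    """Generate an abstractive summary with a BART-style seq2seq model.

    Hypothetical helper: the generation settings are illustrative defaults.
    """
    # BART accepts at most 1024 input tokens, so longer inputs are truncated.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=1024
    ).to(device)
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=max_length,
        min_length=min_length,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With the objects from Step 2, the call would be `summarize(translated_conversation, bart_model, bart_tokenizer, device)`.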



Step 9 : Extract key action items

In [10]:
action_items = extract_key_action_items(summary)
print("\nKey Action Items:\n", action_items)



Key Action Items:
 ['Rajesh from CFI Finance Company called Suresh to discuss overdue EMI payments', 'Rajesh informed Suresh that his EMI has been overdue for the past three months', "Suresh acknowledged the delay and explained that he is under financial stress due to his mother's illness and hospital expenses", 'Rajesh insisted on immediate payment and mentioned the total overdue amount including late fees and interest', "Rajesh warned of legal action and personal visits to Suresh's home or office if the payment was not made, while Suresh expressed frustration and anger over the constant harassment and lack of empathy from the finance company"]


Step 10 : Analyze sentiment of the translated conversation

In [11]:
sentiment = sentiment_pipeline(translated_conversation)
print("\nSentiment Analysis:\n", sentiment)


Sentiment Analysis:
 [{'label': 'NEGATIVE', 'score': 0.9924310445785522}]
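A single label for the whole transcript hides where the tone shifts. As a rough sketch, the same classifier can be applied sentence by sentence; the helper name `sentiment_by_sentence` and the regex-based sentence splitter are assumptions, and the `classifier` argument is expected to behave like `sentiment_pipeline`, which returns one `{'label', 'score'}` dict per input string when given a list.

```python
import re

def sentiment_by_sentence(text, classifier):
    """Return (sentence, label, score) for each sentence in `text`."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    results = classifier(sentences)
    return [(s, r["label"], r["score"]) for s, r in zip(sentences, results)]
```

For this notebook the call would be `sentiment_by_sentence(translated_conversation, sentiment_pipeline)`.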


Step 11 : Perform NER on the translated conversation

In [12]:
ner_results = perform_ner(translated_conversation)
print("\nNamed Entities:\n", ner_results)


Named Entities:
 [('[CLS]', 'O'), ('Raj', 'I-PER'), ('##esh', 'I-PER'), ('from', 'O'), ('CF', 'I-ORG'), ('##I', 'I-ORG'), ('Finance', 'I-ORG'), ('Company', 'I-ORG'), ('called', 'O'), ('Sure', 'I-PER'), ('##sh', 'I-PER'), ('to', 'O'), ('discuss', 'O'), ('over', 'O'), ('##due', 'O'), ('EMI', 'I-ORG'), ('payments', 'O'), ('.', 'O'), ('Raj', 'I-PER'), ('##esh', 'I-PER'), ('informed', 'O'), ('Sure', 'I-PER'), ('##sh', 'I-PER'), ('that', 'O'), ('his', 'O'), ('EMI', 'I-ORG'), ('has', 'O'), ('been', 'O'), ('over', 'O'), ('##due', 'O'), ('for', 'O'), ('the', 'O'), ('past', 'O'), ('three', 'O'), ('months', 'O'), ('.', 'O'), ('Sure', 'I-PER'), ('##sh', 'I-PER'), ('acknowledged', 'O'), ('the', 'O'), ('delay', 'O'), ('and', 'O'), ('explained', 'O'), ('that', 'O'), ('he', 'O'), ('is', 'O'), ('under', 'O'), ('financial', 'O'), ('stress', 'O'), ('due', 'O'), ('to', 'O'), ('his', 'O'), ('mother', 'O'), ("'", 'O'), ('s', 'O'), ('illness', 'O'), ('and', 'O'), ('hospital', 'O'), ('expenses', 'O'), ('.', 


Step 12 : Identify non-compliances

In [13]:
non_compliances = identify_non_compliances(translated_conversation)
print("\nNon-Compliances:\n", non_compliances)


Non-Compliances:
 ['Using Threatening Language']
