**Hindi Conversation Analysis: Translation, Summarization, and Sentiment Analysis Using the mBART Pre-trained Transformer Model**


Name: Siddharth Sonkavade



**Step : 1 - Importing Libraries**

In [None]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast, BartForConditionalGeneration, BartTokenizer

**Step : 2 - Load the pre-trained translation model and tokenizer**


In [None]:
translation_model_name = "facebook/mbart-large-50-many-to-many-mmt"
translation_model = MBartForConditionalGeneration.from_pretrained(translation_model_name)
translation_tokenizer = MBart50TokenizerFast.from_pretrained(translation_model_name)




**Step : 3 - Load the pre-trained summarization model and tokenizer**

In [None]:
summarization_model_name = "facebook/bart-large-cnn"
summarization_model = BartForConditionalGeneration.from_pretrained(summarization_model_name)
summarization_tokenizer = BartTokenizer.from_pretrained(summarization_model_name)


**Step : 4 - Set the source and target language for translation**

In [None]:
translation_tokenizer.src_lang = "hi_IN"
target_lang = "en_XX"


**Step : 5 - Text Translation Function Using Transformers**

In [None]:
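def translate_text(text):
    # Tokenize the Hindi input (src_lang was set to "hi_IN" in Step 4)
    encoded_text = translation_tokenizer(text, return_tensors="pt")

    # Force the first generated token to be the target-language code so the model decodes into English.
    # No max_length is passed, so the model's default generation limit applies and long inputs can be cut off.
    generated_tokens = translation_model.generate(**encoded_text, forced_bos_token_id=translation_tokenizer.lang_code_to_id[target_lang])

    # Decode the generated token IDs back into a plain string
    translated_text = translation_tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    return translated_text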


**Step : 6 - Text Summarization Function Using Transformers**

In [None]:
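def summarize_text(text):
    # Encode the input, truncating to at most 1024 tokens.
    # Note: the "summarize: " prefix is a T5-style convention; bart-large-cnn was fine-tuned
    # without task prefixes, so here it is simply treated as part of the input text.
    inputs = summarization_tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True)

    # Beam search with 4 beams; the summary is constrained to 40-150 tokens
    summary_ids = summarization_model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)

    # Decode the best beam back into text
    summary = summarization_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary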


The Hindi conversation

In [None]:
hindi_conversation = """
Recovery Agent (RA): नमस्ते श्री कुमार, मैं एक्स वाई जेड फाइनेंस से बोल रहा हूं। आपके लोन के बारे में बात करनी थी।
Borrower (B): हां, बोलिए। क्या बात है?
RA: सर, आपका पिछले महीने का EMI अभी तक नहीं आया है। क्या कोई समस्या है?
B: हां, थोड़ी दिक्कत है। मेरी नौकरी चली गई है और मैं नया काम ढूंढ रहा हूं।
RA: ओह, यह तो बुरा हुआ। लेकिन सर, आपको समझना होगा कि लोन का भुगतान समय पर करना बहुत जरूरी है।
B: मैं समझता हूं, लेकिन अभी मेरे पास पैसे नहीं हैं। क्या कुछ समय मिल सकता है?
RA: हम समझते हैं आपकी स्थिति। क्या आप अगले हफ्ते तक कुछ भुगतान कर सकते हैं?
B: मैं कोशिश करूंगा, लेकिन पूरा EMI नहीं दे पाऊंगा। क्या आधा भुगतान चलेगा?
RA: ठीक है, आधा भुगतान अगले हफ्ते तक कर दीजिए। बाकी का क्या प्लान है आपका?
B: मुझे उम्मीद है कि अगले महीने तक मुझे नया काम मिल जाएगा। तब मैं बाकी बकाया चुका दूंगा।
RA: ठीक है। तो हम ऐसा करते हैं - आप अगले हफ्ते तक आधा EMI जमा कर दीजिए, और अगले महीने के 15 तारीख तक बाकी का भुगतान कर दीजिए। क्या यह आपको स्वीकार है?
B: हां, यह ठीक रहेगा। मैं इस प्लान का पालन करने की पूरी कोशिश करूंगा।
RA: बहुत अच्छा। मैं आपको एक SMS भेज रहा हूं जिसमें भुगतान की डिटेल्स होंगी। कृपया इसका पालन करें और समय पर भुगतान करें।
B: ठीक है, धन्यवाद आपके समझने के लिए।
RA: आपका स्वागत है। अगर कोई और सवाल हो तो मुझे बताइएगा। अलविदा।
B: अलविदा।
"""


Translate the conversation

In [None]:
english_translation = translate_text(hindi_conversation)
print("Translated Conversation:\n", english_translation)


Translated Conversation:
 Recovery Agent (RA): Hello, Mr. Kumar, I'm talking to X Y Z Finance. We had to talk about your loan. Borrower (B): Yes, talk. What's the matter? RA: Sir, your last month EMI hasn' t come yet. Is there a problem? B: Yes, it's a little late. My job has left and I' m looking for a new job. RA: Oh, that's bad. But sir, you have to understand that it' s very important to pay the loan on time. B: I understand, but I don 't have the money yet. Can I get some time? RA: We understand your situation. Can you make some payments by next week? B: I' ll try, but I won 't be able to pay the full EMI. Can half the payment go? RA: Okay, make half the payments by next
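The translated output above stops mid-sentence ("make half the payments by next") and no longer has one turn per line. One possible workaround is to translate turn by turn. The following is a minimal sketch, assuming each turn sits on its own line and starts with one of the speaker tags used in this notebook. The names `translate_by_turn` and `translate_fn` are hypothetical; `translate_fn` stands in for `translate_text`, and a placeholder is used here so the sketch runs without loading the model.

```python
import re

# Speaker tags as they appear in this notebook's conversation (assumed fixed).
TURN_PATTERN = re.compile(r"^(Recovery Agent \(RA\)|Borrower \(B\)|RA|B):\s*(.*)$")


def translate_by_turn(conversation, translate_fn):
    """Translate each non-empty line on its own, keeping speaker tags and line breaks."""
    translated_lines = []
    for line in conversation.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as those around the triple quotes
        match = TURN_PATTERN.match(line)
        if match:
            tag, utterance = match.groups()
            # Translate only the utterance so the speaker tag survives unchanged.
            translated_lines.append(f"{tag}: {translate_fn(utterance)}")
        else:
            translated_lines.append(translate_fn(line))
    return "\n".join(translated_lines)


# Placeholder translator for illustration; in the notebook this would be translate_text.
demo = translate_by_turn("RA: first turn\n\nB: second turn", str.upper)
print(demo)
```

Each call then handles a single short turn, which keeps every input well inside the generation length limit, at the cost of translating turns without their surrounding context.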


Summarize the translated conversation

In [None]:
conversation_summary = summarize_text(english_translation)
print("\nSummarized Conversation:\n", conversation_summary)



Summarized Conversation:
 Recovery Agent: Sir, your last month EMI hasn' t come yet. Borrower: Yes, it's a little late. My job has left and I' m looking for a new job.


Import Libraries, Download the VADER Lexicon, and Initialize the Sentiment Analyzer

In [None]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import re

nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Conversation Splitting Function for Translated Text

In [None]:
def split_conversation(english_translation):
    # Match either the full or the short speaker tag,
    # e.g. "Recovery Agent (RA):" or "RA:", "Borrower (B):" or "B:".
    # The full tags are listed first so they win over the short ones.
    tag_pattern = re.compile(r"(Recovery Agent \(RA\):|Borrower \(B\):|\bRA:|\bB:)")

    # Splitting on a capturing group keeps the tags in the result:
    # [text before first tag, tag, turn, tag, turn, ...]
    parts = tag_pattern.split(english_translation)

    ra_text = []
    b_text = []

    # Walk the (tag, turn) pairs and route each turn to its speaker
    for tag, turn in zip(parts[1::2], parts[2::2]):
        turn = turn.strip()
        if not turn:
            continue
        if "RA" in tag:
            ra_text.append(turn)
        else:
            b_text.append(turn)

    return " ".join(ra_text), " ".join(b_text)


In [None]:
ra_text, b_text = split_conversation(english_translation)

Print Separated Recovery Agent and Borrower Texts

In [None]:
print(ra_text)
print(b_text)

Recovery Agent (RA): Hello, Mr. Kumar, I'm talking to X Y Z Finance. We had to talk about your loan. Borrower (B): Yes, talk. What's the matter?  Sir, your last month EMI hasn' t come yet. Is there a problem? Oh, that's bad. But sir, you have to understand that it' s very important to pay the loan on time. We understand your situation. Can you make some payments by next week? Okay, make half the payments by next
Yes, it's a little late. My job has left and I' m looking for a new job. I understand, but I don 't have the money yet. Can I get some time? I' ll try, but I won 't be able to pay the full EMI. Can half the payment go?


In [None]:
def analyze_sentiment(text):
    # Returns VADER's 'neg', 'neu', 'pos' and 'compound' scores for the text
    scores = sid.polarity_scores(text)
    return scores

Analyzing and Printing Sentiment Scores for Conversations

In [None]:
ra_sentiment = analyze_sentiment(ra_text)
b_sentiment = analyze_sentiment(b_text)

print("Recovery Agent Sentiment Scores:", ra_sentiment)
print("Borrower Sentiment Scores:", b_sentiment)

Recovery Agent Sentiment Scores: {'neg': 0.076, 'neu': 0.82, 'pos': 0.105, 'compound': 0.3256}
Borrower Sentiment Scores: {'neg': 0.032, 'neu': 0.822, 'pos': 0.146, 'compound': 0.7691}
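
The compound values above are easier to compare once they are mapped to labels. The following is a minimal sketch, assuming the thresholds commonly suggested for VADER (compound >= 0.05 is positive, compound <= -0.05 is negative, anything in between is neutral). The helper name `label_sentiment` is hypothetical, and the threshold can be tuned.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the printed output above.
ra_label = label_sentiment(0.3256)
b_label = label_sentiment(0.7691)
print("Recovery Agent:", ra_label)
print("Borrower:", b_label)
```

Under these thresholds both speakers fall in the positive range. The scores were computed on a translation that was cut off mid-conversation, so they reflect only the first part of the call.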
