# Text Translation Using a Pretrained Transformer Model

In [1]:
# Uncomment the lines below to install the dependencies used in this notebook
# !pip install torch
# !pip install sentencepiece
# !pip install transformers
# !pip install nltk

In [2]:
from transformers import MarianMTModel, MarianTokenizer
import warnings
warnings.filterwarnings("ignore")

# Load the pre-trained English to Hindi translation model and tokenizer
model_name = "Helsinki-NLP/opus-mt-en-hi"
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)

In [3]:
# Define a function for translation
def translate_text(input_text):
    # Tokenize the input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt")

    # Generate translation
    output_ids = model.generate(input_ids, max_length=50, num_beams=5, length_penalty=0.6, no_repeat_ngram_size=2)

    # Decode and return the translated text
    translated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return translated_text
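
Because `translate_text` passes `max_length=50` to `model.generate`, the output is capped at 50 tokens, so a long input may come back cut off. One possible workaround, sketched below, is to split the input into sentences and translate them one at a time. The helpers `split_sentences` and `translate_long_text` are hypothetical names introduced for this illustration, the regex is a rough heuristic rather than a real sentence segmenter, and `str.upper` stands in for the translation function so the sketch runs without downloading a model.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text, translate_fn):
    """Apply translate_fn to each sentence and join the results with spaces."""
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))


# Stand-in for translate_text so the example runs without the model.
demo = translate_long_text("Hello. How are you?", str.upper)
print(demo)
```

In the notebook itself, the call would be `translate_long_text(long_text, translate_text)`, assuming the model and tokenizer cell has already been run.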

In [4]:
# Example translation
input_text = "Hello, how are you?"
translated_text = translate_text(input_text)

In [5]:
# Print the results
print("Input Text: ", input_text)
print("Translated Text: ", translated_text)


Input Text:  Hello, how are you?
Translated Text:  हैलो, तुम कैसे हो?


In [6]:
from nltk.translate.bleu_score import sentence_bleu

# Reference translation
reference_text = "नमस्ते, आप कैसे हैं?"

# Translated text (the model's output from the translation cell above)
translated_text = "हैलो, तुम कैसे हो?"

# Tokenize the reference and translated text
reference_tokens = [token.lower() for token in reference_text.split()]
translated_tokens = [token.lower() for token in translated_text.split()]

# Calculate BLEU score
bleu_score = sentence_bleu([reference_tokens], translated_tokens)
print(f'BLEU Score: {bleu_score}')

BLEU Score: 1.2882297539194154e-231
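
The near-zero score above does not mean that every word of the translation is wrong. By default, `sentence_bleu` combines the 1-gram to 4-gram precisions with a geometric mean, so a single n-gram order with no matches pulls the whole score toward zero. NLTK normally warns about this, but the earlier `warnings.filterwarnings("ignore")` call hides the warning. The sketch below counts the clipped n-gram matches by hand to show where the score collapses; `ngram_precisions` is a hypothetical helper written for this illustration, not part of NLTK.

```python
from collections import Counter


def ngram_precisions(reference_tokens, candidate_tokens, max_n=4):
    """Return (matches, total) for the clipped n-gram precision, n = 1..max_n."""
    results = []
    for n in range(1, max_n + 1):
        candidate = Counter(
            tuple(candidate_tokens[i:i + n])
            for i in range(len(candidate_tokens) - n + 1)
        )
        reference = Counter(
            tuple(reference_tokens[i:i + n])
            for i in range(len(reference_tokens) - n + 1)
        )
        # Each candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, reference[gram]) for gram, count in candidate.items())
        results.append((matches, sum(candidate.values())))
    return results


# Same whitespace tokenization as the cell above, so punctuation stays attached to words.
reference_tokens = "नमस्ते, आप कैसे हैं?".split()
translated_tokens = "हैलो, तुम कैसे हो?".split()

precisions = ngram_precisions(reference_tokens, translated_tokens)
for n, (matches, total) in enumerate(precisions, start=1):
    print(f"{n}-gram precision: {matches}/{total}")
```

Only one of the four unigrams matches and no longer n-gram does, which is why the combined score is effectively zero. For single short sentences like this one, a smoothed BLEU (for example via NLTK's `SmoothingFunction`) or a corpus-level score over many sentences is usually more informative.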


# Text Summarization Using a Pretrained Transformer Model

In [7]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load pre-trained T5 model and tokenizer for summarization
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [8]:
# Sample input text for summarization
input_text = """
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.
"""

In [9]:
# Tokenize and generate summary
inputs = tokenizer.encode("summarize: " + input_text, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=50, length_penalty=2.0, num_beams=4, early_stopping=True)

In [10]:
# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Input Text:\n", input_text)
print("\nGenerated Summary:\n", summary)

Input Text:
 
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.


Generated Summary:
 writer has no idea what topic the random paragraph will be about when it appears. writer can use the paragraph as the first one of a short story and build upon it. writer can also use the random paragraph somewhere in a short story they create.
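
A quick way to sanity-check a summary is to compare its length with the length of the source. The sketch below does this with a simple whitespace word count; `compression_ratio` is a hypothetical helper introduced here, and a word count is only a rough proxy that says nothing about whether the summary is accurate.

```python
def compression_ratio(source, summary):
    """Return summary length divided by source length, counted in whitespace-separated words."""
    return len(summary.split()) / len(source.split())


# Input text and generated summary copied from the cells above.
source_text = """
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.
"""
generated_summary = (
    "writer has no idea what topic the random paragraph will be about when it appears. "
    "writer can use the paragraph as the first one of a short story and build upon it. "
    "writer can also use the random paragraph somewhere in a short story they create."
)

ratio = compression_ratio(source_text, generated_summary)
print(f"Source words:  {len(source_text.split())}")
print(f"Summary words: {len(generated_summary.split())}")
print(f"Compression ratio: {ratio:.2f}")
```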


# Comparison with other models
Transformers are neural network models built for working with language. Their self-attention mechanism relates every word in a sentence to every other word directly, so they capture relationships between words even when those words are far apart. Recurrent models such as RNNs and LSTMs read a sentence one word at a time, which makes such long-range relationships harder for them to learn. Transformers are also usually pretrained on large amounts of text and then fine-tuned for a specific job. This combination makes them especially well suited to tasks like translation, where the context of each word is crucial.

The trade-off is cost: transformers need a lot of compute and memory. For simpler tasks, or when little training data is available, a transformer may be more than the job requires. In the end, the right choice depends on the task, the available resources, and the amount of data, but transformers work well across many language-related tasks.
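
To make the idea of relating far-apart words concrete, here is a minimal sketch of scaled dot-product self-attention, the mechanism transformers use for this. It is a toy under stated assumptions: the two-dimensional "embeddings" are made up, and queries, keys, and values are all set to the raw input vectors, whereas real models learn separate projections and use many attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Score the query against every position, regardless of distance.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(w * vec[i] for w, vec in zip(row, vectors)) for i in range(dim)]
        )
    return weights, outputs


# Toy embeddings (made up): the first and last tokens are similar.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
weights, outputs = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy example the first token gives the distant last token the same weight as itself and more weight than its immediate neighbours, because the score depends on similarity rather than position. The weight matrix has one entry per pair of tokens, which also hints at why attention becomes expensive for long inputs.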