# Transformers: A review and use in Text Analytics, Topic Modelling and Summarization.

In [10]:
import pandas as pd
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer, SimilarityFunction
import torch

In [11]:
from transformers import pipeline

# BART trained on the MultiNLI (MNLI) dataset

# Topic Modeling with Zero-Shot Classification


1. **Import Libraries:** Necessary libraries like pandas, matplotlib, sentence transformers, torch, and transformers are imported.

2. **Zero-Shot Classifier:** A zero-shot classification pipeline is initialized with the `facebook/bart-large-mnli` model (BART fine-tuned on the MultiNLI dataset). It can classify text into categories supplied at inference time, without having been trained on those categories.

3. **Candidate Labels:** A list of potential topics (candidate labels) is defined: "Politics", "Sport", "Technology", "Entertainment", and "Business".

4. **Input Sequence:**  The text to be classified is provided.  This text discusses the Samsung Galaxy S25 Ultra and its connection to Galaxy AI.

5. **Classification:** The zero-shot classifier scores the input text against each candidate label. With `multi_label=False`, as used here, the scores are normalized across the labels so that they sum to 1, and each score can be read as the relative likelihood of the text belonging to that topic.

6. **Output:** The results list each candidate label with its score, sorted from highest to lowest. The label with the highest score is the predicted topic; for the example text this is "Technology" (0.51), which matches its subject matter.

7. **Key Concept: Zero-Shot Learning:** The approach rests on zero-shot learning. `facebook/bart-large-mnli` was fine-tuned for natural language inference (NLI), so the pipeline turns each candidate label into a hypothesis (by default `This example is {}.`, with the label filled in) and asks the model whether the input text entails it. The entailment scores become the label scores.

8. **No Training Data:** No labeled examples for these five topics are needed. The label list can be changed at inference time without retraining, because the model relies on its general language understanding rather than on topic-specific training.
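
As a rough illustration of how the scores in step 5 arise: in single-label mode the pipeline takes one entailment logit per candidate label and normalizes those logits with a softmax across the labels. The sketch below reproduces only that final normalization step in plain Python. The logit values are invented for illustration and are not outputs of `facebook/bart-large-mnli`.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical entailment logits, one per candidate label (made-up numbers).
labels = ["Politics", "Sport", "Technology", "Entertainment", "Business"]
entailment_logits = [-1.2, -0.7, 0.9, -1.1, 0.0]

scores = softmax(entailment_logits)

# Sort labels by score, highest first, mirroring the order the pipeline returns.
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

Because the softmax is taken across labels, the scores sum to 1 and depend on which labels are in the list. With `multi_label=True`, each label is instead scored independently of the others.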

In [12]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

Device set to use cpu


In [13]:
candidate_labels = ["Politics", "Sport", "Technology", "Entertainment", "Business"]

In [14]:
sequence_to_classify  = """The Samsung Galaxy S25 Ultra will be the flagship handset for the company's Galaxy AI software. Following the launch at the upcoming Galaxy Unpacked event, the S25 family, including the powerful Galaxy S25 Ultra, will be the basis for the development and growth of Galaxy AI through 2025 and beyond.
It’s an opportunity for Samsung to take the initiative and determine the future of mobile artificial intelligence.
Although Google used the launch of the Pixel 8 and Pixel 8 Pro in October 2023 as the signal to start the AI-powered smartphone revolution, Samsung’s simultaneous launch of the Galaxy S24 family and the introduction of Galaxy AI brought AI to the mainstream consumer. Kantar Research cited AI as a driving force behind sales in 2024, with nearly 1 in 4 consumers purchasing a Galaxy S24 handset on the strength of AI.
"""

In [15]:
result = classifier(sequence_to_classify, candidate_labels, multi_label=False)

# Print the result in the desired format
for label, score in zip(result['labels'], result['scores']):
  print(f"{label}: {score:.2f}")

Technology: 0.51
Business: 0.20
Sport: 0.13
Entertainment: 0.08
Politics: 0.08
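
A small helper can turn a result like the one above into a single decision. The sketch below assumes the dictionary layout the pipeline returns for one input (parallel `labels` and `scores` lists). The `predicted_topic` function and its `min_score` threshold are hypothetical, chosen only for illustration, and the dictionary is typed in by hand from the rounded scores printed above rather than produced by running the model.

```python
def predicted_topic(result, min_score=0.4):
    """Return the top label, or None if its score is below min_score."""
    # Pick the highest-scoring pair explicitly instead of relying on sort order.
    label, score = max(zip(result["labels"], result["scores"]), key=lambda p: p[1])
    return label if score >= min_score else None

# Hand-copied from the rounded output above (hypothetical stand-in for `result`).
example_result = {
    "labels": ["Technology", "Business", "Sport", "Entertainment", "Politics"],
    "scores": [0.51, 0.20, 0.13, 0.08, 0.08],
}

print(predicted_topic(example_result))       # default threshold of 0.4
print(predicted_topic(example_result, 0.6))  # stricter threshold
```

A threshold is worth considering here because the top score of 0.51 is only a moderate margin over the other labels.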


# Text Summarization

1. **Utilizes BART Model:** Employs the BART (Bidirectional and Auto-Regressive Transformers) model, specifically `facebook/bart-large-xsum`, which is fine-tuned on the XSum dataset for abstractive summarization. Rather than extracting sentences from the input, it generates a new, short summary in its own words. Because the text is generated, the summary can include details that do not appear in the source, so outputs should be checked against the input.

2. **Generates Concise Summaries:** The code passes the input text (`sequence_to_classify`) to the `summarizer` pipeline with `max_length=80` and `min_length=70`, which bound the length of the generated summary in tokens. The `do_sample=False` argument disables sampling, so the same input yields the same summary, which is useful for reproducibility.
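
A generated summary is not guaranteed to stay faithful to its input. The summary this notebook produced (shown at the end of this section) mentions October 2018 and January 2019, neither of which appears in the input text. One cheap sanity check is to list the four-digit years in a summary that are missing from the source. The sketch below is only a heuristic, not a general faithfulness metric. `unsupported_years` is a hypothetical helper, and the two strings are excerpts stitched together from the notebook's input and output.

```python
import re

def unsupported_years(source, summary):
    """Return four-digit years found in the summary but not in the source."""
    year_pattern = re.compile(r"\b(?:19|20)\d{2}\b")
    source_years = set(year_pattern.findall(source))
    # Preserve first-seen order while dropping duplicates.
    missing = []
    for year in year_pattern.findall(summary):
        if year not in source_years and year not in missing:
            missing.append(year)
    return missing

# Excerpts of the input text (the years it mentions are 2025, 2023 and 2024).
source_excerpt = (
    "the development and growth of Galaxy AI through 2025 and beyond. "
    "the launch of the Pixel 8 and Pixel 8 Pro in October 2023 "
    "Kantar Research cited AI as a driving force behind sales in 2024"
)
# Excerpt of the generated summary.
summary_excerpt = (
    "in October 2018 and will be available in the US in January 2019, "
    "the South Korean firm has said in a press release"
)

print(unsupported_years(source_excerpt, summary_excerpt))
```

The same idea could be extended to other easily matched tokens such as product names, but it will not catch unsupported claims that use only words already present in the source.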

In [17]:
summarizer = pipeline("summarization", model="facebook/bart-large-xsum")

summary = summarizer(sequence_to_classify, max_length=80, min_length=70, do_sample=False)

Device set to use cpu


In [18]:
res = summary[0]['summary_text']  # the pipeline returns a list with one dict per input

In [19]:
res

"Samsung has announced the launch of the Galaxy S25 Ultra, the first smartphone to be powered by the company's Galaxy AI artificial intelligence (AI) operating system, in October 2018 and will be available in the US in January 2019, the South Korean firm has said in a press release, citing a report by research firm Kantar Research, which cited a study by Samsung."