<a href="https://colab.research.google.com/github/meetAmarAtGithub/Research-Papers/blob/main/Text_Summary%5CText_Summary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This work proposes an approach to opinion summarization that uses both abstractive and extractive techniques. The abstractive approach constructs graphs from the text and generates candidate summaries by exploring graph properties and fusing sentiment. The extractive approach reduces dimensionality and selects summary sentences based on thematic words and ranking.

The abstractive approach proceeds in several steps. It first constructs a graph in which nodes represent tokens in the text and edges represent word adjacency. Candidate sentences built from the graph are checked against a set of rules to keep them accurate. The sentences are then scored, their sentiments are merged, and redundant sentences are removed. Finally, the remaining sentences are ranked to form the summary.
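
As a rough illustration of the graph-construction step only, the sketch below builds a word-adjacency graph with the standard library. The function name `build_word_graph`, the regex tokenization, the lowercasing, and the choice of directed, count-weighted edges are all assumptions made for this example. The validity rules, scoring, sentiment merging, and redundancy removal are not shown.

```python
import re
from collections import defaultdict

def build_word_graph(sentences):
    """Map each token to the tokens that directly follow it, with counts.

    Hypothetical helper: nodes are lowercased word tokens and a directed
    edge a -> b records how often b appears immediately after a.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        # Pair every token with its right-hand neighbour.
        for current, following in zip(tokens, tokens[1:]):
            graph[current][following] += 1
    return graph

reviews = [
    "The battery life is great",
    "The battery life is too short",
]
word_graph = build_word_graph(reviews)
print(dict(word_graph["is"]))         # {'great': 1, 'too': 1}
print(word_graph["battery"]["life"])  # 2
```

Paths through such a graph that are shared by several reviews (here `the -> battery -> life -> is`) are natural starting points for candidate summary sentences.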


The code below implements text summarization using **graph-based ranking** over sentences.
It generates a summary as follows:

1. It tokenizes the input text into sentences using the `sent_tokenize` function from NLTK.
2. It builds a graph representation of the sentences, where each node represents a sentence and edges represent the similarity between sentences.
3. The similarity between two sentences is calculated using cosine similarity based on the bag-of-words representation of the sentences.
4. The graph is then ranked using the PageRank algorithm, assigning scores to each sentence.
5. The sentences are sorted based on their scores, and the top-ranked sentences are selected to form the summary.
6. Finally, the selected sentences are concatenated to generate the summary.
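
To make steps 3 and 4 concrete, here is a minimal standard-library sketch of bag-of-words cosine similarity and a weighted PageRank computed by power iteration. It is not the scikit-learn or NetworkX implementation: `bow_cosine` and `weighted_pagerank` are hypothetical names, the token pattern only approximates `CountVectorizer`'s default, and the damping factor of 0.85 and the fixed iteration count are assumptions.

```python
import math
import re
from collections import Counter

def bow_cosine(s1, s2):
    """Cosine similarity between the word-count vectors of two sentences."""
    # Lowercase and keep tokens of two or more word characters (assumed pattern).
    a = Counter(re.findall(r"\b\w\w+\b", s1.lower()))
    b = Counter(re.findall(r"\b\w\w+\b", s2.lower()))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a symmetric weight matrix (list of lists)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out = [sum(row) for row in weights]  # total edge weight leaving each node
    for _ in range(iterations):
        # Nodes without edges spread their score evenly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0) / n
        scores = [
            (1 - damping) / n
            + damping * (
                sum(scores[j] * weights[j][i] / out[j] for j in range(n) if out[j] > 0)
                + dangling
            )
            for i in range(n)
        ]
    return scores

sents = [
    "camera quality impressed me",
    "camera and screen are great",
    "screen brightness disappointed everyone",
]
weights = [
    [bow_cosine(a, b) if i != j else 0.0 for j, b in enumerate(sents)]
    for i, a in enumerate(sents)
]
scores = weighted_pagerank(weights)
best = max(range(len(sents)), key=scores.__getitem__)
print(best)  # 1
```

The middle sentence shares a word with each of the other two, which share nothing with each other, so it sits at the centre of the graph and receives the highest score.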

In [20]:
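import networkx as nx
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer  # imported but not used in this cell
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tokenizer model for sent_tokenize; newer NLTK releases may also need 'punkt_tab'.
nltk.download('punkt')
# Lexicon for SentimentIntensityAnalyzer.
nltk.download('vader_lexicon')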

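def build_graph(sentences):
    """Build an undirected graph with one node per sentence index."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))

    # Compare every unordered pair of sentences once.
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            similarity_score = calculate_similarity(sentences[i], sentences[j])

            # Connect only sentences that share at least one word.
            if similarity_score > 0:
                graph.add_edge(i, j, weight=similarity_score)

    return graph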

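def calculate_similarity(sentence1, sentence2):
    """Return the cosine similarity of the two sentences' word-count vectors."""
    try:
        counts = CountVectorizer().fit_transform([sentence1, sentence2])
    except ValueError:
        # Raised when neither sentence yields a token (empty vocabulary).
        return 0.0
    similarity = cosine_similarity(counts.toarray())

    # Entry [0, 1] compares sentence1 with sentence2.
    return similarity[0][1]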

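def generate_summary(text, num_sentences):
    """Return the num_sentences highest-ranked sentences of text, joined by spaces."""
    sentences = sent_tokenize(text)

    # Nothing to cut if the text is already short enough.
    if num_sentences >= len(sentences):
        return text

    graph = build_graph(sentences)
    # nx.pagerank uses the 'weight' edge attribute by default.
    scores = nx.pagerank(graph)

    # Sort (score, index) pairs from highest to lowest score.
    ranked_sentences = sorted(((scores[i], i) for i in graph.nodes()), reverse=True)
    # Selected sentences stay in rank order, not in document order.
    summary_sentences = [sentences[idx] for _, idx in ranked_sentences[:num_sentences]]

    summary = " ".join(summary_sentences)
    return summary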

# Example usage
text = "Cerussite is a mineral consisting of lead carbonate (PbCO3), and is an important ore of lead. The name is from the Latin cerussa, white lead. Cerussa nativa was mentioned by Conrad Gessner in 1565, and in 1832 François Sulpice Beudant applied the name céruse to the mineral, while the present form, cerussite, is due to Wilhelm Karl Ritter von Haidinger in 1845. Miners' names for cerussite in early use were lead-spar and white-lead-ore. In a hydrate form known as white lead, the mineral is a key ingredient in lead paints and has also been used in cosmetics, but both uses are now discontinued in many places as a result of lead poisoning. These cerussite crystals, measuring approximately 4.0 cm × 3.0 cm × 2.0 cm (1.57 in × 1.18 in × 0.79 in), were found in a mine in Madan-e Nakhlak, Iran."
summary = generate_summary(text, num_sentences=1)
print(summary)


In a hydrate form known as white lead, the mineral is a key ingredient in lead paints and has also been used in cosmetics, but both uses are now discontinued in many places as a result of lead poisoning.


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
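
`generate_summary` joins the selected sentences in rank order. For summaries longer than one sentence, it may read better to restore the order in which the sentences appear in the text. The sketch below shows one way to do that. `top_sentences_in_order` is a hypothetical helper and the scores are made-up values standing in for the dictionary returned by `nx.pagerank`.

```python
def top_sentences_in_order(sentences, scores, num_sentences):
    """Pick the highest-scoring sentences, then emit them in document order."""
    # Indices of the best-scoring sentences, highest score first.
    top = sorted(scores, key=scores.get, reverse=True)[:num_sentences]
    # Sorting the indices restores the original reading order.
    return " ".join(sentences[i] for i in sorted(top))

sentences = ["First point.", "Minor aside.", "Second point."]
scores = {0: 0.30, 1: 0.10, 2: 0.60}  # illustrative values, not real PageRank output
print(top_sentences_in_order(sentences, scores, 2))  # First point. Second point.
```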
