# Extractive Summarization

In [1]:
import spacy
import pytextrank


In [2]:
# Download the large English spaCy model from the official release
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.6.0/en_core_web_lg-3.6.0-py3-none-any.whl (587.7 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [3]:
# Load the spaCy model and add pytextrank's TextRank component to its pipeline
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")

<pytextrank.base.BaseTextRankFactory at 0x16824d7b0>

In [4]:
# Adjacent string literals are concatenated, so the text stays a single string
example_text = (
    "Climate change refers to significant, long-term changes in the global climate. The global climate is a connected system that is always in motion, and it is being affected by human activities. One of the most noticeable effects of climate change in the past century has been the increase in temperature around the world. The average global temperature has increased by about 1.1 to 1.2 degrees Celsius since 1900. This change has led to a wide range of impacts on the environment, ecosystems, and human societies. "
    "One of the primary causes of climate change is the release of greenhouse gases into the Earth's atmosphere. These gases, such as carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O), trap heat from the sun, leading to a warming effect known as the greenhouse effect. The majority of these emissions come from human activities, including the burning of fossil fuels for energy, deforestation, and industrial processes. "
    "The consequences of climate change are far-reaching and diverse. One of the most critical impacts is the rise in sea levels caused by the melting of polar ice caps and glaciers, as well as the expansion of seawater as it warms. This rise in sea levels poses a significant threat to coastal communities and islands. Additionally, climate change has been linked to more frequent and severe weather events, such as hurricanes, droughts, heatwaves, and heavy rainfall. "
    "Ecosystems are also being affected by climate change. Shifts in temperature and weather patterns can disrupt the natural habitats of many species, leading to changes in biodiversity. Some species may become extinct if they cannot adapt quickly enough to these changes. Furthermore, climate change can exacerbate existing environmental problems, such as habitat destruction and pollution, making it even harder for ecosystems to maintain their balance. "
    "The impacts of climate change extend to human societies as well. These impacts include threats to food and water supplies, increased risks to health, economic consequences, and potential displacement of populations. For instance, changes in precipitation patterns and temperature can affect crop yields, leading to food shortages and increased prices. Warmer temperatures can also contribute to the spread of diseases. "
    "Addressing climate change requires coordinated global action. This includes reducing greenhouse gas emissions, transitioning to renewable energy sources, and protecting and restoring forests. Additionally, societies need to adapt to the changes that are already underway. This involves building resilient infrastructure, developing sustainable agricultural practices, and planning for potential climate-related disasters. "
    "In conclusion, climate change is a complex and urgent issue that impacts the entire planet. It demands immediate and sustained action to mitigate its effects and safeguard the future of the environment and human societies."
)

In [5]:
doc = nlp(example_text)

In [6]:
# Score sentences against the 2 top-ranked phrases; limit_sentences keeps its default of 4
for sent in doc._.textrank.summary(limit_phrases=2):
    print(sent)

Climate change refers to significant, long-term changes in the global climate.
One of the most noticeable effects of climate change in the past century has been the increase in temperature around the world.
One of the primary causes of climate change is the release of greenhouse gases into the Earth's atmosphere.
The consequences of climate change are far-reaching and diverse.
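
To give a feel for what happens behind the summary above, here is a minimal sketch of the TextRank idea: build a word co-occurrence graph, run PageRank on it, and score each sentence by the ranks of its words. This is a toy illustration and not pytextrank's actual implementation. The names `STOP_WORDS`, `tokenize`, `word_ranks`, and `top_sentences` are hypothetical, and the regex tokenizer, the tiny stop-word list, and the sentence scoring rule are simplifying assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; spaCy's own list is far larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that", "by"}


def tokenize(text):
    """Lowercased word tokens with stop words removed (a crude stand-in for spaCy)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def word_ranks(tokens, window=3, damping=0.85, iterations=50):
    """PageRank over an undirected graph linking each word to the next window-1 words."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start from a uniform distribution and apply the PageRank update repeatedly.
    ranks = {word: 1.0 / len(neighbors) for word in neighbors}
    for _ in range(iterations):
        ranks = {
            word: (1 - damping) / len(neighbors)
            + damping * sum(ranks[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return ranks


def top_sentences(text, limit=2):
    """Return the `limit` highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranks = word_ranks(tokenize(text))

    def score(sentence):
        # Assumption: a sentence's score is the summed rank of its distinct words.
        return sum(ranks.get(word, 0.0) for word in set(tokenize(sentence)))

    best = set(sorted(sentences, key=score, reverse=True)[:limit])
    return [s for s in sentences if s in best]
```

Calling `top_sentences(example_text, limit=4)` would return four sentences from the example text, although not necessarily the same four that pytextrank selects, since the scoring here is much cruder.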


In [7]:
# Top-ranked phrases with their TextRank scores
phrases_and_ranks = [
    (phrase.chunks[0], phrase.rank) for phrase in doc._.phrases
]
phrases_and_ranks[:10]

[(Climate change, 0.10869038225661731),
 (climate change, 0.10869038225661731),
 (changes, 0.09387385362416058),
 (human societies, 0.08751819660811201),
 (human activities, 0.07806247408198873),
 (industrial processes, 0.06757004693261125),
 (greenhouse gas emissions, 0.06711841921905067),
 (greenhouse gases, 0.0651740756208189),
 (societies, 0.06437598245715484),
 (increased prices, 0.06271670953238262)]
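
The list above contains "Climate change" and "climate change" as separate entries with the same rank. If such case variants are unwanted, one option is to merge them after the fact. The helper below is a small sketch with a hypothetical name, `merge_case_variants`; it works on plain `(text, rank)` pairs and calls `str()` on the first element, so the spaCy spans in `phrases_and_ranks` should also be accepted.

```python
def merge_case_variants(phrases_and_ranks):
    """Keep one entry per case-insensitive phrase text, preferring the highest rank."""
    best = {}
    for text, rank in phrases_and_ranks:
        key = str(text).lower()
        # The first variant seen wins ties, so original casing is preserved.
        if key not in best or rank > best[key][1]:
            best[key] = (str(text), rank)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)
```

For example, `merge_case_variants(phrases_and_ranks)[:10]` would list the top ten phrases with each case variant counted once.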

In [8]:
import pandas as pd
data = pd.read_csv('Summarizer_Data-Final.csv')
data.head()

| Unnamed: 0 | title | content | summary |
|---|---|---|---|
| 0 | TnT - A Statistical Part-Of-Speech Tagger | Trigrams'n'Tags (TnT) is an efficient statisti... | Trigrams'n'Tags (TnT) is an efficient statisti... |
| 1 | Sentence Reduction For Automatic Text Summariz... | Figure 2: Sample sentence and parse tree we ha... | We present a novel sentence reduction system f... |
| 2 | Advances In Domain Independent Linear Text Seg... | This paper describes a method for linear text ... | This paper describes a method for linear text ... |
| 3 | A Simple Approach To Building Ensembles Of Nai... | This paper presents a corpus-based approach to... | This paper presents a corpus-based approach to... |
| 4 | A Maximum-Entropy-Inspired Parser | We present a new parser for parsing down to Pe... | We present a new parser for parsing down to Pe... |


In [9]:
# !pip install pandas summa rouge-score spacy
import pandas as pd
from summa import summarizer
from rouge_score import rouge_scorer
import spacy

# Reload the spaCy pipeline with pytextrank's TextRank component.
# Note: the evaluation loop below summarizes with summa's TextRank
# implementation, so this pipeline is not used in this cell.
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")

df = data

# Initialize the ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Lists to store reference and hypothesis summaries for ROUGE evaluation
reference_summaries = []
hypothesis_summaries = []

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Extract the paper text and reference summary; str() also guards against non-string cells such as NaN
    paper_text = str(row['content'])
    gold_summary = str(row['summary'])

    # Apply summa's TextRank summarizer (its default ratio keeps about 20% of the sentences)
    summarized_text = summarizer.summarize(paper_text)

    # Append the reference and hypothesis summaries for ROUGE evaluation
    reference_summaries.append(gold_summary)
    hypothesis_summaries.append(summarized_text)

# Calculate ROUGE scores
total_scores = {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0}
num_samples = len(df)

for ref_summary, hyp_summary in zip(reference_summaries, hypothesis_summaries):
    scores = scorer.score(ref_summary, hyp_summary)
    total_scores['rouge1'] += scores['rouge1'].fmeasure
    total_scores['rouge2'] += scores['rouge2'].fmeasure
    total_scores['rougeL'] += scores['rougeL'].fmeasure

# Calculate average ROUGE scores
avg_scores = {metric: score / num_samples for metric, score in total_scores.items()}

# Print average ROUGE scores
print("Average ROUGE Scores:")
print(avg_scores)


Average ROUGE Scores:
{'rouge1': 0.18292068971825537, 'rouge2': 0.10993173711589825, 'rougeL': 0.12635901084968618}
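
To make the averaged numbers above easier to interpret, here is a simplified sketch of how the ROUGE F-measures are defined: ROUGE-N counts overlapping n-grams between reference and candidate, and ROUGE-L uses the length of their longest common subsequence. This is not the `rouge_score` implementation. The function names are hypothetical, tokenization is a plain regex, and no stemming is applied, so the values will differ somewhat from what `RougeScorer(..., use_stemmer=True)` reports.

```python
import re
from collections import Counter


def _tokens(text):
    """Lowercased alphanumeric tokens (assumption: no stemming, unlike use_stemmer=True)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _f_scores(overlap, candidate_total, reference_total):
    """Precision, recall and F1 from an overlap count and the two totals."""
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = overlap / candidate_total
    recall = overlap / reference_total
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


def rouge_n(reference, candidate, n=1):
    """ROUGE-N: clipped n-gram overlap between reference and candidate."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(_tokens(reference)), ngrams(_tokens(candidate))
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum count per n-gram
    return _f_scores(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L: precision and recall based on the LCS of the two token sequences."""
    ref, cand = _tokens(reference), _tokens(candidate)
    return _f_scores(lcs_length(ref, cand), len(cand), len(ref))
```

As in the evaluation loop, the reference comes first and the candidate second. Low precision against short gold summaries is one plausible reason for modest F-measures when the extracted summary is much longer than the reference.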
