# **Automated Research Paper Analysis Tool**

In [None]:
!pip install datasets


Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl (1

In [None]:
import nltk

# Download the required 'punkt' tokenizer data
nltk.download('punkt')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
import nltk

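# Add an extra directory to NLTK's data search path.
# Note: this only affects where NLTK looks for data; it does not change the download location.
nltk.data.path.append('/usr/nltk_data')

# Download the 'punkt' tokenizer (a no-op if it is already up to date)
nltk.download('punkt')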


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
nltk.download('punkt_tab')



[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
!pip install rouge_score


Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=da379bccdc9331b233f7b9d222de5794668e4f242d503c693526788877db8733
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2



In [None]:
!pip install evaluate


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.3


**Import necessary libraries**

In [None]:
# Import necessary libraries
from transformers import pipeline
import nltk
from nltk.translate.bleu_score import sentence_bleu
from evaluate import load

# Download NLTK resources (Ensure 'punkt' and any other required resources are available)
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True
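For intuition about what `sentence_bleu` (imported above) measures: BLEU combines modified n-gram precisions for n = 1 to 4, so a candidate that shares no n-grams with its reference scores at or near zero. The sketch below computes only those precisions with the standard library. It is not NLTK's implementation; `ngram_precisions` is a hypothetical helper, the brevity penalty and smoothing are left out, and tokens are assumed to come from a plain whitespace split as in the evaluation code later in this notebook.

```python
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precisions(reference, candidate, max_n=4):
    """Modified (clipped) n-gram precisions for n = 1..max_n.

    Hypothetical helper for illustration; tokens come from a plain
    whitespace split, so matching is case- and punctuation-sensitive.
    """
    ref_tokens, cand_tokens = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand_tokens, n))
        ref_counts = Counter(ngrams(ref_tokens, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        matched = sum((cand_counts & ref_counts).values())
        precisions.append(Fraction(matched, total) if total else Fraction(0))
    return precisions


# Question pairs that appear later in this notebook
no_overlap = ngram_precisions(
    "How does AI help improve cybersecurity?",
    "What is AI's role in Threat Detection and Response?",
)
case_mismatch = ngram_precisions(
    "What is AI's role in threat detection?",
    "What is AI's role in Threat Detection?",
)
print(no_overlap)     # every precision is 0
print(case_mismatch)  # 5/7, 2/3, 3/5, 1/2
```

The second pair differs only in capitalization, yet it loses matches at every n-gram order. Lowercasing both sides before splitting would be one way to avoid that, if case is not meant to count.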

**Define a simple research paper text (example)**

In [None]:
# Define a simple research paper text (example)
research_paper_text = """
AI in Cybersecurity – Enhancing Threat Detection and Response
Introduction
As digital technologies advance, cybersecurity threats have grown in complexity, creating significant challenges for individuals, organizations, and governments.
Cybercriminals now exploit sophisticated techniques, making traditional security solutions inadequate. Artificial Intelligence (AI) offers new avenues for addressing these
challenges by enabling advanced threat detection, predictive analytics, and automated incident response.
AI’s Role in Threat Detection
AI has redefined threat detection by surpassing the limitations of traditional rule-based security systems. Conventional systems rely heavily on signature-based methods,
which are ineffective against zero-day attacks and new, unknown malware. AI, through machine learning (ML) and deep learning (DL), learns from historical data and detects anomalies,
even in complex attack scenarios.
AI-Driven Incident Response
AI significantly enhances incident response capabilities, enabling security teams to act quickly and effectively. Automated response systems powered by AI can isolate affected systems,
neutralize threats, and restore operations with minimal human intervention. AI facilitates real-time threat intelligence by processing global cybersecurity data from diverse sources, such as threat feeds, darknet forums, and malware analysis repositories. This information is distilled into actionable insights that inform security teams of emerging attack trends, newly discovered vulnerabilities, and potential adversary tactics.
By leveraging this intelligence, organizations can adjust their defenses dynamically, staying ahead of cybercriminals. The future of AI in cybersecurity lies in the convergence of advanced technologies such as blockchain, quantum computing, and edge computing. Blockchain can enhance data integrity and transparency, while quantum-resistant algorithms will prepare defenses against future quantum-based attacks. Edge AI, deployed closer to data sources, enables real-time threat detection and response in distributed environments, such as IoT networks.

Additionally, the emergence of explainable AI (XAI) is set to address the "black-box" nature of current models, providing transparency into
 AI decision-making processes. This not only improves trust but also ensures regulatory compliance.
"""

**Function to analyze the research paper (generate summary and questions)**

In [None]:
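# Define preprocessing function (simple sentence tokenization)
def preprocess_text(text):
    """Split the text into sentences and rejoin them with single spaces."""
    sentences = nltk.sent_tokenize(text)  # Tokenize into sentences
    return ' '.join(sentences)  # Return as a single string for summarization

# Define the function to analyze the research paper (generate summary and questions)
def analyze_research_paper(text):
    """Return a (summary, questions) tuple for the given text."""
    # Note: both pipelines are created, and their models loaded, on every call
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    # Cap the summary length; the word count is only a rough proxy for tokens
    max_length = min(130, len(text.split()))
    # Keep min_length <= max_length so very short inputs do not conflict
    min_length = min(30, max_length)
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,  # Ensure the summary isn't too short
        do_sample=False  # Deterministic decoding
    )[0]['summary_text']

    # The input prefix is model-specific; check the model card for the format
    # this checkpoint expects
    question_generator = pipeline("text2text-generation", model="valhalla/t5-small-qg-prepend")
    questions = question_generator(
        "generate questions: " + text,
        max_length=100,
        num_return_sequences=3,  # Must not exceed num_beams
        num_beams=5
    )

    generated_questions = [q['generated_text'] for q in questions]
    return summary, generated_questions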
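
`analyze_research_paper` builds both pipelines each time it is called, which reloads the models when several papers are analyzed in one session. One possible way to avoid that is to cache the loader. The sketch below is only an illustration: `make_cached_loader`, `fake_pipeline`, and `build_log` are hypothetical names, and a stand-in factory replaces `transformers.pipeline` so the example runs without downloading any model.

```python
from functools import lru_cache


def make_cached_loader(factory):
    """Wrap a pipeline factory so each (task, model) pair is built only once."""
    @lru_cache(maxsize=None)
    def load(task, model):
        return factory(task, model=model)
    return load


# Stand-in factory so the sketch runs without downloading any model;
# in the notebook, `transformers.pipeline` would be passed instead.
build_log = []


def fake_pipeline(task, model):
    build_log.append((task, model))
    return lambda text: f"[{task}:{model}] {text[:20]}"


get_pipeline = make_cached_loader(fake_pipeline)
first = get_pipeline("summarization", "facebook/bart-large-cnn")
second = get_pipeline("summarization", "facebook/bart-large-cnn")
print(first is second, len(build_log))  # prints: True 1
```

With the real library, `get_pipeline = make_cached_loader(pipeline)` would let the function body request `get_pipeline("summarization", "facebook/bart-large-cnn")` without reloading the model on later calls.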



**Model evaluation**
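
To make the `rouge1` figure reported by the evaluation cell easier to interpret, here is a minimal standard-library sketch of unigram-overlap F1. It is not the `evaluate` implementation: `rouge1_f1` is a hypothetical helper, and the tokenization rule (lowercase, runs of letters and digits, no stemming) is an assumption. The inputs are the generated summary and reference summary that appear in this notebook.

```python
import re
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two strings (illustrative helper)."""
    # Assumed tokenization: lowercase, keep runs of letters/digits, no stemming
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it occurs in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


generated = (
    "Artificial Intelligence (AI) enables advanced threat detection, "
    "predictive analytics, and automated incident response. The future of AI "
    "in cybersecurity lies in the convergence of advanced technologies such "
    "as blockchain, quantum computing, and edge computing."
)
reference = (
    "AI enhances cybersecurity by improving threat detection and enabling "
    "automated incident response."
)
print(f"ROUGE-1 F1: {rouge1_f1(generated, reference):.4f}")  # prints: ROUGE-1 F1: 0.3478
```

Eight of the 34 summary tokens match the 12 reference tokens, so precision is 8/34, recall is 8/12, and F1 is 16/46, about 0.3478. The low precision mostly reflects the single-sentence reference being much shorter than the generated summary.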

In [None]:
# Define model evaluation function (ROUGE for Summarization, BLEU for Question Generation)
def evaluate_model(generated_summary, reference_summary, generated_questions, reference_questions):
    # ROUGE Evaluation for Summarization
    rouge_metric = load("rouge")
    rouge_results = rouge_metric.compute(predictions=[generated_summary], references=[reference_summary])
    print(f"ROUGE Score: {rouge_results}")

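    # BLEU Evaluation for Question Generation
    # Note: questions are paired with references by position, tokens come from a
    # case-sensitive whitespace split, and no smoothing is applied, so short
    # questions with few matching n-grams score at or near 0.
    bleu_scores = []
    for generated, reference in zip(generated_questions, reference_questions):
        bleu_score = sentence_bleu([reference.split()], generated.split())
        bleu_scores.append(bleu_score)
        print(f"BLEU Score for generated question: '{generated}' and reference: '{reference}' = {bleu_score:.4f}")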

    avg_bleu_score = sum(bleu_scores) / len(bleu_scores) if bleu_scores else 0
    print(f"Average BLEU Score: {avg_bleu_score:.4f}")

# Preprocess the research paper text
preprocessed_text = preprocess_text(research_paper_text)

# Analyze the text (generate summary and questions)
generated_summary, generated_questions = analyze_research_paper(preprocessed_text)

# Define reference summary and questions for evaluation
reference_summary = "AI enhances cybersecurity by improving threat detection and enabling automated incident response."
reference_questions = [
    "How does AI help improve cybersecurity?",
    "What is AI's role in threat detection?",
    "How does AI contribute to incident response?"
]

# Print generated summary and questions
print("\033[1m📘 MAIN OUTPUT\033[0m")
print("\033[1m📄 Generated Summary:\033[0m")
print(generated_summary)
print("\n\033[1m❓ Generated Questions:\033[0m")
for question in generated_questions:
    print("-", question)


print("\n\033[1m📊 EVALUATION METRICS\033[0m")
print("\033[1m🔍 ROUGE Scores (Summarization):\033[0m")
# Evaluate the model using ROUGE and BLEU
evaluate_model(generated_summary, reference_summary, generated_questions, reference_questions)


📘 MAIN OUTPUT
📄 Generated Summary:
Artificial Intelligence (AI) enables advanced threat detection, predictive analytics, and automated incident response. The future of AI in cybersecurity lies in the convergence of advanced technologies such as blockchain, quantum computing, and edge computing.

❓ Generated Questions:
- What is AI's role in Threat Detection and Response?
- What is AI's role in Threat Detection?
- AI in Cybersecurity – Enhancing Threat Detection and Response Introduction

📊 EVALUATION METRICS
🔍 ROUGE Scores (Summarization):
ROUGE Score: {'rouge1': 0.3478260869565218, 'rouge2': 0.13636363636363635, 'rougeL': 0.3043478260869565, 'rougeLsum': 0.3043478260869565}
BLEU Score for generated question: 'What is AI's role in Threat Detection and Response?' and reference: 'How does AI help improve cybersecurity?' = 0.0000
BLEU Score for generated question: 'What is AI's role in Threat Detection?' and reference: 'What is AI's role in threat d