<a href="https://colab.research.google.com/github/rabeahmed2002/Summaric/blob/main/Sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
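# Note: the variable is named t5_*, but the directory name suggests a BART checkpoint (used for summarization)
t5_model_path = '/content/drive/My Drive/NLP/BART'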
roberta_model_path = '/content/drive/My Drive/NLP/Roberta'

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForSequenceClassification, pipeline

# for summarization
t5_tokenizer = AutoTokenizer.from_pretrained(t5_model_path)
t5_model = AutoModelForSeq2SeqLM.from_pretrained(t5_model_path)

# for sentiment analysis
roberta_tokenizer = AutoTokenizer.from_pretrained(roberta_model_path)
roberta_model = AutoModelForSequenceClassification.from_pretrained(roberta_model_path)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /content/drive/My Drive/NLP/Roberta and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
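
The warning above reports that the classifier head weights (`classifier.dense.*`, `classifier.out_proj.*`) were newly initialized, so predictions from this checkpoint are not those of a trained sentiment classifier until it is fine-tuned. It is also worth confirming how many labels the loaded config exposes before hard-coding a label map. The sketch below is a hypothetical helper (`build_label_map` is not part of the notebook or of `transformers`) that takes a dict shaped like `model.config.id2label` and refuses to build a map when the class count does not match:

```python
def build_label_map(id2label, class_names):
    """Map raw model labels (e.g. "LABEL_0") to readable names by class index.

    id2label:    dict shaped like model.config.id2label, e.g. {0: "LABEL_0", 1: "LABEL_1"}
    class_names: readable names in class-index order (an assumption the caller must verify)
    """
    if len(id2label) != len(class_names):
        raise ValueError(
            f"model exposes {len(id2label)} labels, but {len(class_names)} names were given"
        )
    # Pair labels with names in ascending class-index order
    return {id2label[idx]: name for idx, name in zip(sorted(id2label), class_names)}


# Example with a three-class config (illustrative values only)
example_map = build_label_map(
    {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"},
    ["Negative", "Neutral", "Positive"],
)
print(example_map)
```

In the notebook this could be called as `build_label_map(roberta_model.config.id2label, ["Negative", "Neutral", "Positive"])`, assuming that ordering matches the one used when the model is fine-tuned.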


In [None]:
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model=t5_model,
    tokenizer=t5_tokenizer
)

sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model=roberta_model,
    tokenizer=roberta_tokenizer
)


Device set to use cuda:0
Device set to use cuda:0


In [None]:

def summarize_text(text, max_words=100):
    # Split long texts into word-based chunks so no chunk cuts a word in half
    words = text.split()
    if len(words) > max_words:
        print("Input text is long; splitting into smaller chunks.")
        text_chunks = [
            " ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)
        ]
        summaries = [
            summarizer(chunk, max_length=100, min_length=50, do_sample=False)[0]['summary_text']
            for chunk in text_chunks
        ]
        return " ".join(summaries)
    else:
        # Summarize directly for shorter texts
        summary = summarizer(text, max_length=100, min_length=50, do_sample=False)
        return summary[0]['summary_text']


# Sentiment analysis function
def analyze_sentiment(text):
    # Analyze sentiment
    sentiment = sentiment_analyzer(text)

    # Map model labels to human-readable labels
    label_map = {
        "LABEL_0": "Negative",
        "LABEL_1": "Neutral",
        "LABEL_2": "Positive"
    }

    model_label = sentiment[0]['label']
    confidence = sentiment[0]['score']

    human_readable_label = label_map.get(model_label, "Unknown")
    return human_readable_label, confidence

# Combined function for summarization and sentiment analysis
def summarize_and_analyze(text):
    print("Original Text:\n", text)

    # Summarize the text
    summarized_text = summarize_text(text)
    print("\nSummarized Text:\n", summarized_text)

    # Perform sentiment analysis on the summarized text
    sentiment_label, sentiment_score = analyze_sentiment(summarized_text)
    print("\nSentiment Analysis:")
    print(f"Sentiment: {sentiment_label} (Confidence: {sentiment_score:.2f})")
    return summarized_text, sentiment_label, sentiment_score

# Example usage
text = """
The hotel stay was a mix of good and bad experiences. The rooms were spacious and clean, and the staff was polite. However, the food quality was subpar, and the pool area was overcrowded. Additionally, there was constant noise from nearby construction, which made relaxing difficult. Despite these issues, the location was convenient for accessing tourist attractions."""
summarized_text, sentiment_label, sentiment_score = summarize_and_analyze(text)

Your max_length is set to 100, but your input_length is only 74. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)
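
The warning above appears because `max_length=100` exceeds the 74-token input. One way to avoid it is to derive the length bounds from the input's token count instead of fixing them. The helper below is a minimal sketch; `summary_length_bounds` and its ratio defaults are assumptions chosen for illustration, not values from the notebook or the library:

```python
def summary_length_bounds(n_input_tokens, max_ratio=0.5, min_ratio=0.5, cap=100):
    """Return (min_length, max_length) scaled to the input size.

    max_length is at most `cap` and at most `max_ratio` of the input length;
    min_length is `min_ratio` of max_length. Both are kept at sane minimums.
    """
    max_length = max(2, min(cap, int(n_input_tokens * max_ratio)))
    min_length = max(1, int(max_length * min_ratio))
    return min_length, max_length


# For the 74-token input mentioned in the warning
print(summary_length_bounds(74))
```

The token count could be obtained with `len(t5_tokenizer(text)["input_ids"])`, and the resulting pair passed to the pipeline as `min_length` and `max_length`.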


Original Text:
 
The hotel stay was a mix of good and bad experiences. The rooms were spacious and clean, and the staff was polite. However, the food quality was subpar, and the pool area was overcrowded. Additionally, there was constant noise from nearby construction, which made relaxing difficult. Despite these issues, the location was convenient for accessing tourist attractions.

Summarized Text:
 The hotel stay was a mix of good and bad experiences. The rooms were spacious and clean, and the staff was polite. However, the food quality was subpar and the pool area was overcrowded. Despite these issues, the location was convenient for accessing tourist attractions.

Sentiment Analysis:
Sentiment: Negative (Confidence: 0.53)
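
Long inputs are split into chunks before summarization. The sketch below shows one way to chunk on word boundaries, with an optional overlap so that a sentence straddling a boundary is seen by both neighbouring chunks. `chunk_words` and its parameters are hypothetical names introduced here, and the default sizes are arbitrary:

```python
def chunk_words(text, max_words=100, overlap=0):
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words; overlap must be smaller than max_words.
    """
    if max_words < 1 or not 0 <= overlap < max_words:
        raise ValueError("require max_words >= 1 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


print(chunk_words("a b c d e", max_words=3, overlap=1))
```

Each chunk can then be passed to `summarizer` on its own and the partial summaries joined, as the notebook does.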


In [None]:
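# Install necessary libraries (the PyPI package is scikit-learn; "sklearn" is a deprecated placeholder that fails to install)
!pip install transformers torch scikit-learn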

from transformers import pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Initialize Sentiment Analysis Pipeline
def initialize_sentiment_model(roberta_model_path):
    return pipeline("sentiment-analysis", model=roberta_model_path)

# Initialize Summarization Model
def initialize_summarization_model(t5_model_path):
    return pipeline("summarization", model=t5_model_path)

# Perform Sentiment Analysis
def analyze_sentiment(sentiment_model, text):
    sentiment = sentiment_model(text)
    label = sentiment[0]['label']
    score = sentiment[0]['score']

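    # Convert labels to Positive, Negative, Neutral for consistency.
    # Handles generic labels (LABEL_0, ...) as well as named ones (positive, ...).
    label_map = {
        "label_0": "Negative", "label_1": "Neutral", "label_2": "Positive",
        "negative": "Negative", "neutral": "Neutral", "positive": "Positive",
    }
    label = label_map.get(label.lower(), "Unknown")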

    return label, score

# Perform Text Summarization
def summarize_text(summarization_model, text):
    summarized = summarization_model(text, max_length=100, min_length=50, do_sample=False)
    return summarized[0]['summary_text']

# Evaluate Model Metrics
def evaluate_model(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
    return accuracy, precision, recall

# Main Function
def main():
    # Define model paths
    t5_model_path = '/content/drive/My Drive/NLP/BART'
    roberta_model_path = '/content/drive/My Drive/NLP/Roberta'

    # Initialize models
    sentiment_model = initialize_sentiment_model(roberta_model_path)
    summarization_model = initialize_summarization_model(t5_model_path)

    # Input text for testing
    text = """The customer service at the restaurant was absolutely terrible. The staff were rude and inattentive, and it took ages for our order to arrive. When the food finally came, it was cold and poorly prepared, completely ruining the experience. The ambiance was also disappointing, with loud noises and a chaotic environment. Overall, it felt like a waste of money and time, and I would not recommend this place to anyone."""

    # Perform summarization
    summarized_text = summarize_text(summarization_model, text)

    # Perform sentiment analysis
    sentiment_label, sentiment_score = analyze_sentiment(sentiment_model, text)

    # Simulated ground truth for evaluation (for demonstration purposes)
    ground_truth_sentiment = ["Negative"]  # Replace with actual labels if available
    predicted_sentiment = [sentiment_label]

    # Evaluate model metrics
    accuracy, precision, recall = evaluate_model(ground_truth_sentiment, predicted_sentiment)

    # Output results
    print("Original Text:\n", text)
    print("\nSummarized Text:\n", summarized_text)
    print("\nSentiment Analysis:\n", f"Sentiment: {sentiment_label} (Confidence: {sentiment_score:.2f})")
    print("\nModel Evaluation Metrics:")
    print(f"Accuracy: {accuracy:.2f}")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")

if __name__ == "__main__":
    main()


Collecting sklearn
  Downloading sklearn-0.0.post12.tar.gz (2.6 kB)
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /content/drive/My Drive/NLP/Roberta and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0
Device set to use cuda:0
Your max_length is set to 100, but your input_length is only 87. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)


Original Text:
 The customer service at the restaurant was absolutely terrible. The staff were rude and inattentive, and it took ages for our order to arrive. When the food finally came, it was cold and poorly prepared, completely ruining the experience. The ambiance was also disappointing, with loud noises and a chaotic environment. Overall, it felt like a waste of money and time, and I would not recommend this place to anyone.

Summarized Text:
 The customer service at the restaurant was absolutely terrible. The staff were rude and inattentive, and it took ages for our order to arrive. The ambiance was also disappointing, with loud noises and a chaotic environment. Overall, it felt like a waste of money and time. I would not recommend this place to anyone.

Sentiment Analysis:
 Sentiment: Neutral (Confidence: 0.51)

Model Evaluation Metrics:
Accuracy: 0.00
Precision: 0.00
Recall: 0.00
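
With a single example, accuracy, precision and recall can only be 0 or 1, so they say little about the model. The sketch below computes the same three numbers over a small list of labels in plain Python, following the definition of support-weighted averaging with zero-division treated as 0. `weighted_metrics` is a hypothetical helper and the labels are made up for illustration, not results from this notebook:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall).

    Per-class scores are averaged with weights equal to each class's share
    of y_true; a class that is never predicted contributes precision 0.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = sum(
        (support[c] / n) * (true_pos[c] / predicted[c] if predicted[c] else 0.0)
        for c in support
    )
    recall = sum((support[c] / n) * (true_pos[c] / support[c]) for c in support)
    return accuracy, precision, recall


# Illustrative labels only
y_true = ["Negative", "Positive", "Neutral", "Negative"]
y_pred = ["Negative", "Positive", "Negative", "Neutral"]
print(weighted_metrics(y_true, y_pred))
```

Note that support-weighted recall always equals accuracy, so reporting both adds no information; per-class or macro-averaged scores may be more informative on a real test set.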
