# WebPage AI Chatbot Extension - Project Demonstration

## Overview
This notebook demonstrates the NLP techniques and algorithms used in the WebPage AI Chatbot Chrome Extension project.

**Course**: Natural Language Processing (SE-3213)  
**Project**: Semester End Project  
**Student**: [Your Name]  
**University**: University of Azad Jammu & Kashmir

## 1. Project Architecture

The Chrome extension consists of several components that work together:

1. **Content Script**: Extracts and analyzes webpage content
2. **Popup Interface**: User interaction and chat interface
3. **Background Service**: Manages extension lifecycle and advanced processing
4. **NLP Pipeline**: Text processing and analysis algorithms

In [None]:
# Import necessary libraries for demonstration
import re
import nltk
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud

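# Download required NLTK data.
# Newer NLTK releases look up 'punkt_tab' instead of 'punkt', so request both.
for resource_path, package in [
    ('tokenizers/punkt', 'punkt'),
    ('tokenizers/punkt_tab', 'punkt_tab'),
    ('corpora/stopwords', 'stopwords'),
]:
    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package, quiet=True)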

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

print("Libraries imported successfully!")

## 2. Text Preprocessing Pipeline

The first step in our NLP pipeline is to clean and preprocess the extracted webpage content.
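
As a rough, dependency-free illustration of this step, the sketch below normalizes whitespace, tokenizes, drops stop words, and counts term frequencies. The `STOP_WORDS` set and the `preprocess` helper are hypothetical stand-ins chosen for this example; the notebook's own implementation relies on NLTK's tokenizers and English stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch);
# the notebook itself uses NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "for"}

def preprocess(text):
    """Collapse whitespace, tokenize on letters/digits, and drop stop words."""
    text = re.sub(r"\s+", " ", text).strip()
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Keep tokens longer than two characters that are not stop words
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

tokens = preprocess("  The chatbot reads   the page,\n and the chatbot answers. ")
top_terms = Counter(tokens).most_common(2)

print(tokens)        # ['chatbot', 'reads', 'page', 'chatbot', 'answers']
print(top_terms[0])  # ('chatbot', 2)
```

The regex tokenizer is far cruder than `word_tokenize` (it splits contractions and ignores sentence boundaries), but it shows the order of operations: clean first, then tokenize, then filter, then count.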

In [None]:
class TextPreprocessor:
    def __init__(self):
        self.stop_words = set(stopwords.words('english'))
    
    def clean_text(self, text):
        """Clean and normalize text content"""
        # Remove extra whitespace and normalize
        text = re.sub(r'\s+', ' ', text)
        text = text.strip()
        
        # Remove special characters but keep basic punctuation
        text = re.sub(r'[^a-zA-Z0-9\s.,!?;:\'\"-]', '', text)
        
        return text
    
    def tokenize_text(self, text):
        """Tokenize text into sentences and words"""
        sentences = sent_tokenize(text)
        words = word_tokenize(text.lower())
        
        # Filter out stop words and punctuation
        filtered_words = [
            word for word in words 
            if word.isalnum() and word not in self.stop_words and len(word) > 2
        ]
        
        return sentences, filtered_words
    
    def extract_key_phrases(self, words, top_n=10):
        """Return the top_n most frequent words as (word, count) pairs"""
        word_freq = Counter(words)
        return word_freq.most_common(top_n)

# Example usage
preprocessor = TextPreprocessor()

# Sample webpage content
sample_text = """
Natural Language Processing (NLP) is a fascinating field that combines computer science, 
artificial intelligence, and linguistics to help computers understand human language. 
This technology powers many applications we use daily, including chatbots, search engines, 
and language translation services. Machine learning algorithms are essential for modern NLP, 
enabling systems to learn patterns from large datasets of text. Deep learning has revolutionized 
the field with transformer models like BERT and GPT, which can generate human-like text and 
understand context better than ever before.
"""

# Preprocess the sample text
clean_text = preprocessor.clean_text(sample_text)
sentences, words = preprocessor.tokenize_text(clean_text)
key_phrases = preprocessor.extract_key_phrases(words)

print(f"Original text length: {len(sample_text)} characters")
print(f"Clean text length: {len(clean_text)} characters")
print(f"Number of sentences: {len(sentences)}")
print(f"Number of unique words: {len(set(words))}")
print(f"\nTop key phrases: {key_phrases[:5]}")

## 3. Sentiment Analysis Implementation

Our extension includes sentiment analysis to understand the tone of webpage content.
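
The approach is lexicon-based: count positive and negative words, then classify by the share of positive hits. The sketch below is a minimal stand-alone version of that idea; it reuses the notebook's 0.6 and 0.4 thresholds, while the three-word lexicons and the `lexicon_sentiment` name are assumptions made for illustration only.

```python
import re

# Deliberately tiny lexicons (assumptions for this sketch)
POSITIVE = {"great", "helpful", "love"}
NEGATIVE = {"bad", "broken", "useless"}

def lexicon_sentiment(text):
    """Classify text by the ratio of positive to all sentiment-bearing words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return "neutral", 0.0          # no sentiment words at all
    ratio = pos / (pos + neg)
    if ratio > 0.6:
        return "positive", ratio
    if ratio < 0.4:
        return "negative", 1 - ratio
    return "neutral", 0.5              # mixed signal

# 2 positive hits, 1 negative hit: ratio 2/3 is above 0.6
label, confidence = lexicon_sentiment("Great docs, helpful examples, but the search is broken.")
print(label, round(confidence, 2))     # positive 0.67

# 1 positive hit, 1 negative hit: ratio 0.5 falls in the neutral band
print(lexicon_sentiment("Great idea, bad execution."))  # ('neutral', 0.5)
```

A word-counting scheme like this ignores negation ("not great" still counts as positive), so the reported confidence is better read as a hit ratio than as a calibrated probability.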

In [None]:
class SentimentAnalyzer:
    def __init__(self):
        # Define sentiment word lists
        self.positive_words = {
            'excellent', 'amazing', 'wonderful', 'fantastic', 'great', 'good', 'awesome',
            'perfect', 'outstanding', 'brilliant', 'superb', 'magnificent', 'marvelous',
            'love', 'like', 'enjoy', 'pleased', 'happy', 'delighted', 'satisfied',
            'impressive', 'remarkable', 'exceptional', 'valuable', 'useful', 'helpful'
        }
        
        self.negative_words = {
            'terrible', 'awful', 'horrible', 'bad', 'worst', 'disgusting', 'disappointing',
            'poor', 'fail', 'failed', 'wrong', 'error', 'problem', 'issue', 'difficult',
            'hate', 'dislike', 'annoying', 'frustrating', 'useless', 'worthless',
            'inadequate', 'insufficient', 'unacceptable', 'unsatisfactory', 'flawed'
        }
    
    def analyze_sentiment(self, text):
        """Analyze sentiment of text using word-based approach"""
        words = word_tokenize(text.lower())
        
        positive_score = sum(1 for word in words if word in self.positive_words)
        negative_score = sum(1 for word in words if word in self.negative_words)
        
        total_sentiment_words = positive_score + negative_score
        
        if total_sentiment_words == 0:
            return 'neutral', 0.0
        
        positive_ratio = positive_score / total_sentiment_words
        
        if positive_ratio > 0.6:
            return 'positive', positive_ratio
        elif positive_ratio < 0.4:
            return 'negative', 1 - positive_ratio
        else:
            return 'neutral', 0.5
    
    def get_sentiment_words(self, text):
        """Extract sentiment-bearing words from text"""
        words = word_tokenize(text.lower())
        
        found_positive = [word for word in words if word in self.positive_words]
        found_negative = [word for word in words if word in self.negative_words]
        
        return found_positive, found_negative

# Test sentiment analysis
sentiment_analyzer = SentimentAnalyzer()

test_texts = [
    "This is an excellent tutorial that explains the concepts very well!",
    "The interface is terrible and the features don't work properly.",
    "The article provides information about machine learning algorithms.",
    "I love how this amazing tool makes complex tasks so easy and wonderful!"
]

print("Sentiment Analysis Results:")
for i, text in enumerate(test_texts, 1):
    sentiment, confidence = sentiment_analyzer.analyze_sentiment(text)
    pos_words, neg_words = sentiment_analyzer.get_sentiment_words(text)
    
    print(f"\nText {i}: {text[:50]}...")
    print(f"Sentiment: {sentiment} (confidence: {confidence:.2f})")
    if pos_words:
        print(f"Positive words: {pos_words}")
    if neg_words:
        print(f"Negative words: {neg_words}")

## 4. Text Summarization Algorithm

The extension uses extractive summarization to provide concise summaries of webpage content.
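
Extractive summarization selects existing sentences instead of generating new text. As a simplified sketch (it omits the stop-word filtering, position bonus, and length adjustment used in the notebook's scorer), the hypothetical `summarize` helper below ranks sentences by average word frequency and returns the winners in their original order.

```python
import re
from collections import Counter

def summarize(text, num_sentences=1):
    """Pick the num_sentences highest-scoring sentences, kept in document order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average document-level frequency of the sentence's words
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by index so identical sentences do not overwrite each other
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)

demo = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(demo, 1))  # Cats chase mice and cats climb trees.
print(summarize(demo, 2))  # Cats sleep a lot. Cats chase mice and cats climb trees.
```

Tracking sentences by index instead of by their text is a small design choice: a page that repeats a sentence verbatim would otherwise collapse both copies into one score entry.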

In [None]:
class TextSummarizer:
    def __init__(self):
        self.stop_words = set(stopwords.words('english'))
    
    def score_sentences(self, sentences, word_freq):
        """Score sentences based on word frequency and position"""
        sentence_scores = {}
        
        for i, sentence in enumerate(sentences):
            words = word_tokenize(sentence.lower())
            words = [word for word in words if word.isalnum() and word not in self.stop_words]
            
            if len(words) == 0:
                continue
            
            # Calculate frequency score
            freq_score = sum(word_freq.get(word, 0) for word in words) / len(words)
            
            # Position bonus (first and last sentences are often important)
            position_score = 0
            if i == 0:  # First sentence
                position_score = 2
            elif i == len(sentences) - 1:  # Last sentence
                position_score = 1
            
            # Length penalty for very short or very long sentences
            length_score = 0
            word_count = len(words)
            if 10 <= word_count <= 25:  # Ideal length
                length_score = 1
            elif word_count < 5:  # Too short
                length_score = -1
            
            total_score = freq_score + position_score + length_score
            sentence_scores[sentence] = total_score
        
        return sentence_scores
    
    def summarize(self, text, num_sentences=2):
        """Generate extractive summary"""
        sentences = sent_tokenize(text)
        
        if len(sentences) <= num_sentences:
            return text
        
        # Get word frequency
        words = word_tokenize(text.lower())
        words = [word for word in words if word.isalnum() and word not in self.stop_words]
        word_freq = Counter(words)
        
        # Score sentences
        sentence_scores = self.score_sentences(sentences, word_freq)
        
        # Select top sentences
        top_sentences = sorted(sentence_scores.items(), key=lambda x: x[1], reverse=True)[:num_sentences]
        
        # Sort by original order
        summary_sentences = []
        for sentence in sentences:
            if any(sentence == s[0] for s in top_sentences):
                summary_sentences.append(sentence)
        
        return ' '.join(summary_sentences)

# Test summarization
summarizer = TextSummarizer()

long_text = """
Artificial Intelligence (AI) has become one of the most transformative technologies of our time. 
It encompasses various techniques including machine learning, deep learning, and neural networks. 
Machine learning algorithms can learn patterns from data without being explicitly programmed. 
Deep learning, a subset of machine learning, uses artificial neural networks with multiple layers. 
These networks can process complex data like images, text, and audio with remarkable accuracy. 
Natural Language Processing is a crucial branch of AI that focuses on language understanding. 
Computer vision enables machines to interpret and understand visual information from the world. 
AI applications are everywhere, from recommendation systems to autonomous vehicles. 
The future of AI holds even more promising developments in various fields. 
However, ethical considerations and responsible AI development remain important challenges.
"""

# Generate summary
summary = summarizer.summarize(long_text, num_sentences=3)

print("Original text:")
print(long_text.strip())
print(f"\nOriginal length: {len(long_text.split())} words")

print("\n" + "="*50)
print("Summary:")
print(summary)
print(f"\nSummary length: {len(summary.split())} words")
print(f"Compression ratio: {len(summary.split())/len(long_text.split()):.2%}")

## 5. Named Entity Recognition

The extension identifies various types of entities in webpage content.
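
The recognizer is rule-based: each entity type is a regular expression applied to the text. The sketch below shows the pattern-per-type idea on two entity types; the `PATTERNS` table and `extract` function are illustrative names, not part of the extension. It uses `re.finditer` with `group(0)` so that every hit is returned as the full matched string even if a pattern contains capture groups (`re.findall` would return only the groups in that case).

```python
import re

# Two example patterns (assumptions for this sketch)
PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "percentage": r"\d+(?:\.\d+)?%",
}

def extract(text):
    """Return {entity_type: [unique matches in order of appearance]}."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = [m.group(0) for m in re.finditer(pattern, text)]
        if matches:
            # dict.fromkeys removes duplicates while preserving order
            found[label] = list(dict.fromkeys(matches))
    return found

result = extract("Mail ada@example.org or ada@example.org; uptime was 99.5%.")
print(result)  # {'email': ['ada@example.org'], 'percentage': ['99.5%']}
```

Pattern matching of this kind is fast and transparent, but it only finds entities with a predictable surface form; names of people or places generally need a statistical or dictionary-based recognizer.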

In [None]:
import re

class EntityRecognizer:
    def __init__(self):
        # Define regex patterns for different entity types
        self.patterns = {
            'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
            'url': r'https?://[^\s]+',
            # Non-capturing groups keep matches as plain strings; whitespace is allowed as a separator
            'phone': r'(?<!\d)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b',
            'date': r'\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}[/-]\d{1,2}[/-]\d{1,2})\b',
            'time': r'\b\d{1,2}:\d{2}(?::\d{2})?\s?(?:AM|PM|am|pm)?\b',
            'currency': r'\$\d+(?:,\d{3})*(?:\.\d{2})?|\d+(?:,\d{3})*(?:\.\d{2})?\s?(?:USD|EUR|GBP|dollars?|euros?|pounds?)\b',
            'percentage': r'\d+(?:\.\d+)?%',
            'number': r'\b\d+(?:,\d{3})*(?:\.\d+)?\b'
        }
    
    def extract_entities(self, text):
        """Extract various types of entities from text"""
        entities = {}
        
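        for entity_type, pattern in self.patterns.items():
            # finditer + group(0) returns the full match even when a pattern has capture groups
            matches = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
            if matches:
                entities[entity_type] = list(dict.fromkeys(matches))  # Remove duplicates, keep order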
        
        return entities
    
    def extract_organizations(self, text):
        """Simple organization detection using common patterns"""
        # Look for company-like patterns
        org_patterns = [
            r'\b[A-Z][a-zA-Z]+\s+(?:Inc|Corp|LLC|Ltd|Company|Corporation|Technologies|Systems|Solutions)\b',
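            # Match only capitalized words after "of" so the pattern stops at the name
            r'\b(?:University|College|Institute)\s+of\s+[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b',
            # Skip "<Word> University of ..." so "The University" is not reported on its own
            r'\b[A-Z][a-zA-Z]+\s+(?:University|College|Institute)\b(?!\s+of\b)'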
        ]
        
        organizations = []
        for pattern in org_patterns:
            matches = re.findall(pattern, text)
            organizations.extend(matches)
        
        return list(set(organizations))
    
    def extract_locations(self, text):
        """Simple location detection"""
        # Common location indicators
        location_patterns = [
            r'\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*,\s+[A-Z]{2}\b',  # City, State (multi-word cities allowed)
            r'\b(?:in|at|from|to)\s+([A-Z][a-zA-Z\s]+?)(?:\s+(?:city|state|country|region))\b'
        ]
        
        locations = []
        for pattern in location_patterns:
            matches = re.findall(pattern, text)
            locations.extend(matches)
        
        return list(set(locations))

# Test entity recognition
entity_recognizer = EntityRecognizer()

test_text = """
Contact us at support@example.com or visit our website https://www.example.com for more information. 
You can also call us at (555) 123-4567 during business hours from 9:00 AM to 5:00 PM. 
Our office is located in New York, NY and we've been serving customers since 01/15/2020. 
The project budget is $50,000 and we expect 95% completion by the deadline. 
Microsoft Corporation and Google Inc. are major players in the technology industry. 
The University of California offers excellent computer science programs.
"""

# Extract entities
entities = entity_recognizer.extract_entities(test_text)
organizations = entity_recognizer.extract_organizations(test_text)
locations = entity_recognizer.extract_locations(test_text)

print("Named Entity Recognition Results:")
print("=" * 40)

for entity_type, entity_list in entities.items():
    if entity_list:
        print(f"\n{entity_type.upper()}:")
        for entity in entity_list:
            print(f"  - {entity}")

if organizations:
    print(f"\nORGANIZATIONS:")
    for org in organizations:
        print(f"  - {org}")

if locations:
    print(f"\nLOCATIONS:")
    for loc in locations:
        print(f"  - {loc}")

## 6. Readability Analysis

The extension calculates readability scores to help users understand content complexity.
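
The Flesch Reading Ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words); higher scores indicate easier text. The short sketch below evaluates that formula from raw counts. The function name and the counts are illustrative; the counts for the first call were tallied by hand for the sentence "The cat sat on the mat. It was a nice day. The sun was bright."

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Evaluate the Flesch Reading Ease formula from raw counts."""
    if total_words == 0 or total_sentences == 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = total_words / total_sentences
    avg_syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

# 15 one-syllable words in 3 sentences:
# 206.835 - 1.015 * 5 - 84.6 * 1 = 117.16
simple_score = flesch_reading_ease(total_words=15, total_sentences=3, total_syllables=15)
print(round(simple_score, 2))  # 117.16

# One 20-word sentence averaging 2 syllables per word:
# 206.835 - 1.015 * 20 - 84.6 * 2 = 17.335
dense_score = flesch_reading_ease(total_words=20, total_sentences=1, total_syllables=40)
print(round(dense_score, 3))   # 17.335
```

The scale is not clamped to 0–100: very short sentences of one-syllable words can score above 100, and dense technical prose can go negative.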

In [None]:
class ReadabilityAnalyzer:
    def __init__(self):
        pass
    
    def count_syllables(self, word):
        """Count syllables in a word using a heuristic approach"""
        word = word.lower()
        if len(word) <= 3:
            return 1
        
        # Remove common endings that don't add syllables
        word = re.sub(r'(?:[^laeiouy]es|ed|[^laeiouy]e)$', '', word)
        word = re.sub(r'^y', '', word)
        
        # Count vowel groups
        matches = re.findall(r'[aeiouy]{1,2}', word)
        syllable_count = len(matches) if matches else 1
        
        return max(1, syllable_count)
    
    def flesch_reading_ease(self, text):
        """Calculate Flesch Reading Ease score"""
        sentences = sent_tokenize(text)
        words = word_tokenize(text)
        words = [word for word in words if word.isalpha()]
        
        if len(sentences) == 0 or len(words) == 0:
            return 0, 'unknown'
        
        # Count syllables
        total_syllables = sum(self.count_syllables(word) for word in words)
        
        # Calculate metrics
        avg_sentence_length = len(words) / len(sentences)
        avg_syllables_per_word = total_syllables / len(words)
        
        # Flesch Reading Ease formula
        score = 206.835 - (1.015 * avg_sentence_length) - (84.6 * avg_syllables_per_word)
        
        # Interpret score
        if score >= 90:
            level = 'very easy'
        elif score >= 80:
            level = 'easy'
        elif score >= 70:
            level = 'fairly easy'
        elif score >= 60:
            level = 'standard'
        elif score >= 50:
            level = 'fairly difficult'
        elif score >= 30:
            level = 'difficult'
        else:
            level = 'very difficult'
        
        return score, level
    
    def analyze_complexity(self, text):
        """Comprehensive text complexity analysis"""
        sentences = sent_tokenize(text)
        words = word_tokenize(text)
        words = [word for word in words if word.isalpha()]
        
        if len(sentences) == 0 or len(words) == 0:
            return {}
        
        # Basic metrics
        avg_sentence_length = len(words) / len(sentences)
        avg_word_length = sum(len(word) for word in words) / len(words)
        
        # Vocabulary diversity (unique words / total words)
        unique_words = len(set(word.lower() for word in words))
        lexical_diversity = unique_words / len(words)
        
        # Complex word count (words with 3+ syllables)
        complex_words = sum(1 for word in words if self.count_syllables(word) >= 3)
        complex_word_ratio = complex_words / len(words)
        
        # Flesch Reading Ease
        flesch_score, flesch_level = self.flesch_reading_ease(text)
        
        return {
            'total_sentences': len(sentences),
            'total_words': len(words),
            'unique_words': unique_words,
            'avg_sentence_length': round(avg_sentence_length, 2),
            'avg_word_length': round(avg_word_length, 2),
            'lexical_diversity': round(lexical_diversity, 3),
            'complex_words': complex_words,
            'complex_word_ratio': round(complex_word_ratio, 3),
            'flesch_score': round(flesch_score, 2),
            'flesch_level': flesch_level
        }

# Test readability analysis
readability_analyzer = ReadabilityAnalyzer()

# Test texts with different complexity levels
test_texts = {
    'Simple': "The cat sat on the mat. It was a nice day. The sun was bright.",
    'Medium': "Natural language processing enables computers to understand human language through various algorithms and techniques.",
    'Complex': "The implementation of sophisticated machine learning architectures necessitates comprehensive understanding of mathematical optimization techniques and computational linguistics principles."
}

print("Readability Analysis Results:")
print("=" * 50)

for difficulty, text in test_texts.items():
    analysis = readability_analyzer.analyze_complexity(text)
    
    print(f"\n{difficulty.upper()} TEXT:")
    print(f"Text: {text[:60]}...")
    print(f"Words: {analysis['total_words']}, Sentences: {analysis['total_sentences']}")
    print(f"Avg sentence length: {analysis['avg_sentence_length']} words")
    print(f"Avg word length: {analysis['avg_word_length']} characters")
    print(f"Lexical diversity: {analysis['lexical_diversity']}")
    print(f"Complex words: {analysis['complex_word_ratio']:.1%}")
    print(f"Flesch score: {analysis['flesch_score']} ({analysis['flesch_level']})")

## 7. Complete NLP Pipeline Integration

Let's combine all the components into a single pipeline that mimics the extension's functionality.
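
Once the analysis results are available, the chatbot answers a question by matching keywords in the query and filling a response template. The sketch below isolates that routing step as a lookup table. The keyword pairs mirror the ones used in this notebook's `generate_response`; the `ROUTES` table, the intent labels, and `route_query` are hypothetical names introduced for illustration.

```python
# (keywords, intent) pairs, checked in order; the first match wins
ROUTES = [
    (("summary", "about"), "summary"),
    (("sentiment", "tone"), "sentiment"),
    (("topic", "theme"), "topics"),
    (("difficult", "readability"), "readability"),
    (("entities", "contact"), "entities"),
]

def route_query(query):
    """Map a free-text question to an intent via substring keyword matching."""
    q = query.lower()
    for keywords, intent in ROUTES:
        if any(keyword in q for keyword in keywords):
            return intent
    return "help"  # fallback when no keyword matches

print(route_query("What is this page about?"))        # summary
print(route_query("How difficult is this to read?"))  # readability
print(route_query("Hello there"))                     # help
# Substring matching can misfire: "keystone" contains "tone"
print(route_query("What is the keystone idea?"))      # sentiment
```

Keeping the routes in a table makes the priority order explicit and easy to extend; matching on whole words instead of substrings would avoid false hits like the last example.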

In [None]:
class WebPageAnalyzer:
    def __init__(self):
        self.preprocessor = TextPreprocessor()
        self.sentiment_analyzer = SentimentAnalyzer()
        self.summarizer = TextSummarizer()
        self.entity_recognizer = EntityRecognizer()
        self.readability_analyzer = ReadabilityAnalyzer()
    
    def analyze_webpage_content(self, webpage_content):
        """Complete analysis of webpage content"""
        # Preprocess text
        clean_text = self.preprocessor.clean_text(webpage_content)
        sentences, words = self.preprocessor.tokenize_text(clean_text)
        key_phrases = self.preprocessor.extract_key_phrases(words)
        
        # Analyze sentiment
        sentiment, confidence = self.sentiment_analyzer.analyze_sentiment(clean_text)
        
        # Generate summary
        summary = self.summarizer.summarize(clean_text, num_sentences=2)
        
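        # Extract entities from the raw content: clean_text() strips characters
        # such as '@', '/', '$', '%' and parentheses that the entity patterns rely on
        entities = self.entity_recognizer.extract_entities(webpage_content)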
        
        # Analyze readability
        readability = self.readability_analyzer.analyze_complexity(clean_text)
        
        return {
            'text_stats': {
                'total_words': len(words),
                'total_sentences': len(sentences),
                'unique_words': len(set(words))
            },
            'key_phrases': key_phrases[:10],
            'sentiment': {
                'polarity': sentiment,
                'confidence': confidence
            },
            'summary': summary,
            'entities': entities,
            'readability': readability
        }
    
    def generate_response(self, query, analysis):
        """Generate intelligent response based on analysis"""
        query_lower = query.lower()
        
        if 'summary' in query_lower or 'about' in query_lower:
            return f"This content is about: {analysis['summary']}"
        
        elif 'sentiment' in query_lower or 'tone' in query_lower:
            sentiment = analysis['sentiment']
            return f"The overall tone is {sentiment['polarity']} with {sentiment['confidence']:.1%} confidence."
        
        elif 'topic' in query_lower or 'theme' in query_lower:
            topics = [phrase[0] for phrase in analysis['key_phrases'][:5]]
            return f"Main topics include: {', '.join(topics)}"
        
        elif 'difficult' in query_lower or 'readability' in query_lower:
            level = analysis['readability']['flesch_level']
            return f"The content is {level} to read (Flesch score: {analysis['readability']['flesch_score']})."
        
        elif 'entities' in query_lower or 'contact' in query_lower:
            entities = analysis['entities']
            response_parts = []
            if 'email' in entities:
                response_parts.append(f"Emails: {', '.join(entities['email'])}")
            if 'phone' in entities:
                response_parts.append(f"Phone numbers: {', '.join(entities['phone'])}")
            if 'url' in entities:
                response_parts.append(f"URLs: {', '.join(entities['url'][:3])}")
            
            return '; '.join(response_parts) if response_parts else "No contact entities found."
        
        else:
            return f"I can help with questions about this content. Try asking about the summary, topics, sentiment, readability, or entities."

# Demonstrate the complete pipeline
analyzer = WebPageAnalyzer()

sample_webpage = """
Welcome to TechCorp Solutions - Your Premier Technology Partner

At TechCorp Solutions, we provide exceptional software development services that transform businesses. 
Our team of expert developers creates innovative solutions using cutting-edge technologies like artificial intelligence, 
machine learning, and cloud computing. We have successfully delivered over 500 projects with 98% client satisfaction.

Contact us today at info@techcorp.com or call (555) 123-4567 to discuss your project requirements. 
Visit our website at https://www.techcorp.com for more information about our services.

Our comprehensive services include:
- Custom software development
- Mobile application development
- Web development and design
- Cloud migration services
- AI and machine learning solutions

Founded in 2015, TechCorp Solutions has been at the forefront of technological innovation. 
We believe in delivering outstanding results that exceed client expectations. 
Our agile development methodology ensures rapid delivery without compromising quality.
"""

# Analyze the sample webpage
analysis = analyzer.analyze_webpage_content(sample_webpage)

print("=" * 60)
print("COMPLETE WEBPAGE ANALYSIS RESULTS")
print("=" * 60)

print(f"\n📊 TEXT STATISTICS:")
stats = analysis['text_stats']
print(f"   Words: {stats['total_words']}, Sentences: {stats['total_sentences']}, Unique words: {stats['unique_words']}")

print(f"\n🔑 KEY PHRASES:")
for phrase, freq in analysis['key_phrases'][:5]:
    print(f"   {phrase} ({freq} times)")

print(f"\n😊 SENTIMENT ANALYSIS:")
sentiment = analysis['sentiment']
print(f"   Polarity: {sentiment['polarity']} (confidence: {sentiment['confidence']:.1%})")

print(f"\n📝 SUMMARY:")
print(f"   {analysis['summary']}")

print(f"\n🏷️ ENTITIES:")
for entity_type, entity_list in analysis['entities'].items():
    if entity_list:
        print(f"   {entity_type}: {', '.join(entity_list[:3])}")

print(f"\n📖 READABILITY:")
readability = analysis['readability']
print(f"   Level: {readability['flesch_level']} (score: {readability['flesch_score']})")
print(f"   Avg sentence length: {readability['avg_sentence_length']} words")
print(f"   Complex words: {readability['complex_word_ratio']:.1%}")

# Test the response generation
print(f"\n" + "=" * 60)
print("CHATBOT RESPONSE EXAMPLES")
print("=" * 60)

test_queries = [
    "What is this page about?",
    "What's the sentiment of this content?",
    "What are the main topics?",
    "How difficult is this to read?",
    "What contact information is available?"
]

for query in test_queries:
    response = analyzer.generate_response(query, analysis)
    print(f"\nQ: {query}")
    print(f"A: {response}")

## 8. Visualization of Analysis Results

Let's create some visualizations to better understand the analysis results.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud

# Set up the plotting style
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

# Create visualizations
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('WebPage AI Chatbot - NLP Analysis Visualization', fontsize=16, fontweight='bold')

# 1. Key Phrases Bar Chart
phrases = [phrase[0] for phrase in analysis['key_phrases'][:8]]
frequencies = [phrase[1] for phrase in analysis['key_phrases'][:8]]

bars = ax1.barh(phrases, frequencies, color='skyblue', edgecolor='navy', alpha=0.7)
ax1.set_xlabel('Frequency')
ax1.set_title('Top Key Phrases', fontweight='bold')
ax1.grid(axis='x', alpha=0.3)

# Add value labels on bars
for bar in bars:
    width = bar.get_width()
    ax1.text(width + 0.1, bar.get_y() + bar.get_height()/2, f'{int(width)}', 
             ha='left', va='center', fontweight='bold')

# 2. Readability Metrics
readability_metrics = {
    'Avg Sentence\nLength': analysis['readability']['avg_sentence_length'],
    'Avg Word\nLength': analysis['readability']['avg_word_length'],
    'Lexical\nDiversity': analysis['readability']['lexical_diversity'] * 100,  # Convert to percentage
    'Complex Words\n(%)': analysis['readability']['complex_word_ratio'] * 100
}

metrics = list(readability_metrics.keys())
values = list(readability_metrics.values())
colors = ['lightcoral', 'lightsalmon', 'lightgreen', 'lightblue']

bars = ax2.bar(metrics, values, color=colors, edgecolor='black', alpha=0.7)
ax2.set_ylabel('Value')
ax2.set_title('Readability Metrics', fontweight='bold')
ax2.grid(axis='y', alpha=0.3)

# Add value labels on bars
for bar, value in zip(bars, values):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2, height + max(values)*0.01, f'{value:.1f}', 
             ha='center', va='bottom', fontweight='bold')

# 3. Entity Distribution Pie Chart
entity_counts = {}
for entity_type, entity_list in analysis['entities'].items():
    if entity_list:
        entity_counts[entity_type.title()] = len(entity_list)

if entity_counts:
    # Pass lists (not dict views) to pie(); show wedge shares with a percent sign
    wedges, texts, autotexts = ax3.pie(list(entity_counts.values()), labels=list(entity_counts.keys()),
                                       autopct='%1.0f%%', startangle=90,
                                       colors=['gold', 'lightcoral', 'lightskyblue', 'lightgreen'])
    ax3.set_title('Entity Distribution', fontweight='bold')
    
    # Enhance the pie chart appearance
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
else:
    ax3.text(0.5, 0.5, 'No entities found', ha='center', va='center', 
             transform=ax3.transAxes, fontsize=12)
    ax3.set_title('Entity Distribution', fontweight='bold')

# 4. Sentiment and Text Statistics
stats_data = {
    'Total Words': analysis['text_stats']['total_words'],
    'Total Sentences': analysis['text_stats']['total_sentences'],
    'Unique Words': analysis['text_stats']['unique_words'],
    'Flesch Score': analysis['readability']['flesch_score']
}

# Create a more informative display
ax4.axis('off')
ax4.set_title('Text Statistics & Sentiment', fontweight='bold', pad=20)

# Display statistics
y_positions = [0.8, 0.65, 0.5, 0.35, 0.2]
labels = list(stats_data.keys()) + ['Sentiment']
values = list(stats_data.values()) + [f"{analysis['sentiment']['polarity']} ({analysis['sentiment']['confidence']:.1%})"]

for i, (label, value) in enumerate(zip(labels, values)):
    ax4.text(0.1, y_positions[i], f'{label}:', fontweight='bold', fontsize=12, 
             transform=ax4.transAxes)
    ax4.text(0.6, y_positions[i], str(value), fontsize=12, 
             transform=ax4.transAxes)

# Add a background box
from matplotlib.patches import Rectangle
rect = Rectangle((0.05, 0.1), 0.9, 0.8, linewidth=2, edgecolor='gray', 
                facecolor='lightgray', alpha=0.1, transform=ax4.transAxes)
ax4.add_patch(rect)

plt.tight_layout()
plt.show()

# Create a word cloud
if analysis['key_phrases']:
    # Prepare text for word cloud
    word_freq_dict = dict(analysis['key_phrases'])
    
    # Create word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white', 
                         colormap='viridis', max_words=50).generate_from_frequencies(word_freq_dict)
    
    plt.figure(figsize=(12, 6))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title('Key Terms Word Cloud', fontsize=16, fontweight='bold', pad=20)
    plt.tight_layout()
    plt.show()

print("\n✅ Visualizations generated successfully!")
print("\nThe Chrome extension uses these same NLP techniques to analyze webpage content in real-time.")

## 9. Project Summary and Conclusions

### Key Features Implemented:

1. **Text Preprocessing**: Clean and normalize webpage content
2. **Sentiment Analysis**: Determine the emotional tone of content
3. **Text Summarization**: Generate concise summaries using extractive methods
4. **Named Entity Recognition**: Identify emails, URLs, phone numbers, etc.
5. **Readability Analysis**: Calculate content complexity using Flesch scores
6. **Keyword Extraction**: Identify important terms and phrases
7. **Interactive Chat Interface**: Natural language query processing

### Technical Achievements:

- **Chrome Extension Architecture**: Complete browser extension with popup, content scripts, and background service worker
- **Real-time Processing**: Instant analysis of webpage content as users navigate
- **Context-Aware Responses**: Tailored answers based on specific webpage content
- **Multiple Interfaces**: Both popup and embedded widget options
- **Persistent Storage**: Chat history and user preferences

### NLP Techniques Used:

- **Statistical Text Analysis**: Frequency-based keyword extraction
- **Rule-based Processing**: Pattern matching for entity recognition
- **Heuristic Algorithms**: Sentence scoring for summarization
- **Lexical Analysis**: Syllable counting for readability metrics
- **Template-based Generation**: Structured response creation

### Educational Value:

This project applies NLP concepts from the course to a working tool. It shows how preprocessing, sentiment analysis, summarization, entity recognition, and readability scoring can be combined for content analysis and user interaction.

### Future Enhancements:

- Integration with advanced AI APIs (OpenAI, Google Gemini)
- Deep learning models for better understanding
- Multi-language support
- Voice interaction capabilities
- Advanced topic modeling with LDA

---

**This completes the demonstration of the WebPage AI Chatbot Chrome Extension project for the Natural Language Processing course.**