# 🤖 GenAI Detection Chatbot - Google Colab Version

This notebook implements the **AI-Generated Scholarly Paper Detection System** with an **Explainable AI Chatbot**.

**Features:**
- GenAI Feature Extraction (GPT, Gemini, Claude pattern detection)
- Perplexity & Burstiness Analysis
- Citation Hallucination Detection
- Interactive Explainer Chatbot

---
**Author:** AI-Generated Scholarly Paper Detection System  
**For Academic Use Only**
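
The feature list above names perplexity without defining it. As a rough illustration only, the sketch below scores how evenly a text spreads its words, using normalized Shannon entropy as a stand-in for predictability. The function name `normalized_word_entropy` and the sample strings are assumptions made for this example; the extractor class in section 2 combines a similar entropy term with a bigram-variety ratio, and neither is a true language-model perplexity.

```python
import math
from collections import Counter


def normalized_word_entropy(text: str) -> float:
    """Shannon entropy of the word distribution, scaled to the range 0..1."""
    words = text.lower().split()
    counts = Counter(words)
    if len(counts) < 2:
        return 0.0  # zero or one distinct word carries no uncertainty
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for this vocabulary size.
    return entropy / math.log2(len(counts))


# Hypothetical inputs: one dominated by a single word, one with no repeats.
skewed = "the the the the the the the cat"
diverse = "each word in this sentence appears exactly once"

skewed_score = normalized_word_entropy(skewed)
diverse_score = normalized_word_entropy(diverse)
print(f"skewed:  {skewed_score:.3f}")
print(f"diverse: {diverse_score:.3f}")
```

A lower value means a few words dominate the text, which this notebook treats as a sign of more predictable writing. Note that a text alternating evenly between two words would also score high here, which is one reason an entropy term on its own is a weak signal.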

## 1. Install Dependencies

In [None]:
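# No installation step is needed: the feature extractor and chatbot use only
# the Python standard library. The interactive demo in section 4 also imports
# IPython and ipywidgets, which are normally available in Google Colab.
print("✅ All dependencies ready (core classes use the Python standard library)")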

## 2. GenAI Feature Extractor Class

In [None]:
import re
import math
from collections import Counter
from typing import Dict, List, Tuple, Any


class GenAIFeatureExtractor:
    """
    Extracts GenAI-specific linguistic features from scholarly text.
    
    This class implements detection algorithms for identifying patterns
    that are characteristic of AI-generated content from various LLMs.
    """
    
    def __init__(self):
        """Initialize the feature extractor with pattern definitions."""
        
        # GPT-style repetitive phrases
        self.gpt_repetitive_patterns = [
            r'\b(in conclusion|to summarize|it is important to note)\b',
            r'\b(as mentioned (earlier|above|previously))\b',
            r'\b(this (demonstrates|shows|indicates|suggests) that)\b',
            r'\b(it (is|can be) (argued|said|noted) that)\b',
            r'\b(the (fact|idea|concept|notion) that)\b',
            r'\b(in (this|the) (context|regard|respect))\b',
            r'\b(plays a (crucial|vital|important|significant|key) role)\b',
            r'\b(it is worth (noting|mentioning|pointing out))\b',
            r'\b(one (can|could|might) argue that)\b',
            r'\b(this (paper|study|research|article) (aims|seeks|attempts))\b',
        ]
        
        # Claude uncertainty hedging phrases
        self.claude_hedging_patterns = [
            r'\b(I think|I believe|I would say|I\'d suggest)\b',
            r'\b(perhaps|possibly|potentially|presumably)\b',
            r'\b(it (seems|appears|looks) (like|as if|that))\b',
            r'\b(may or may not)\b',
            r'\b(to some (extent|degree))\b',
            r'\b(it (could|might|may) be (the case|argued|that))\b',
            r'\b(there is a possibility that)\b',
            r'\b(one (possible|potential) (explanation|interpretation))\b',
            r'\b(this (could|might|may) (suggest|indicate|imply))\b',
            r'\b(it is (possible|plausible|conceivable) that)\b',
        ]
        
        # Gemini explanatory overflow patterns
        self.gemini_overflow_patterns = [
            r'\b(let me explain|let me clarify|to be more specific)\b',
            r'\b(in other words|put (simply|differently|another way))\b',
            r'\b(to (elaborate|expand) (on|further))\b',
            r'\b(what (this|I) mean(s)? (is|by this))\b',
            r'\b(essentially|fundamentally|basically)\b',
            r'\b(for (example|instance)|such as|namely)\b',
            r'\b(this (is|means|refers to))\b',
            r'\b(to (understand|grasp|comprehend) this)\b',
            r'\b((first|second|third)(ly)?[,:]?\s*(we|one|you))\b',
            r'\b(it\'s (important|crucial|essential) to (understand|note|realize))\b',
        ]
        
        # Suspicious citation patterns (potential hallucinations)
        self.suspicious_citation_patterns = [
            r'\((?:Smith|Johnson|Williams|Brown|Jones|Davis|Miller)\s+et\s+al\.\s*,?\s*\d{4}\)',
            r'\(\w+\s+et\s+al\.\s*,?\s*(2025|2026|2027|2028|2029|2030)\)',
            r'\((?:Study|Research|Survey|Analysis)\s+\d{4}\)',
            r'\[?\d+\]?\s*(?=\.|,|;|\s*$)',
            r'\((?:University|Institute|Organization)\s+\d{4}\)',
        ]
        
    def extract_all_features(self, text: str) -> Dict[str, Any]:
        """
        Extract all GenAI features from the given text.
        
        Args:
            text: The scholarly paper text to analyze
            
        Returns:
            Dictionary containing all extracted features and scores
        """
        if not text or not text.strip():
            return self._empty_features()
        
        # Extract individual features
        gpt_score, gpt_details = self.detect_gpt_repetition(text)
        gemini_score, gemini_details = self.detect_gemini_overflow(text)
        claude_score, claude_details = self.detect_claude_hedging(text)
        burstiness_score, burstiness_details = self.calculate_burstiness(text)
        citation_score, citation_details = self.detect_citation_hallucination(text)
        perplexity_score, perplexity_details = self.estimate_perplexity(text)
        
        # Calculate composite GenAI score (weighted average)
        composite_score = self._calculate_composite_score(
            gpt_score, gemini_score, claude_score, 
            burstiness_score, citation_score, perplexity_score
        )
        
        return {
            'composite_score': round(composite_score, 3),
            'features': {
                'gpt_repetition': {
                    'score': round(gpt_score, 3),
                    'details': gpt_details
                },
                'gemini_overflow': {
                    'score': round(gemini_score, 3),
                    'details': gemini_details
                },
                'claude_hedging': {
                    'score': round(claude_score, 3),
                    'details': claude_details
                },
                'burstiness': {
                    'score': round(burstiness_score, 3),
                    'details': burstiness_details
                },
                'citation_hallucination': {
                    'score': round(citation_score, 3),
                    'details': citation_details
                },
                'perplexity': {
                    'score': round(perplexity_score, 3),
                    'details': perplexity_details
                }
            },
            'interpretation': self._generate_interpretation(
                gpt_score, gemini_score, claude_score,
                burstiness_score, citation_score, perplexity_score
            )
        }
    
    def detect_gpt_repetition(self, text: str) -> Tuple[float, Dict]:
        """Detect GPT-style repetitive phrase patterns."""
        text_lower = text.lower()
        word_count = len(text.split())
        
        matches = []
        total_matches = 0
        
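        for pattern in self.gpt_repetitive_patterns:
            # finditer + group(0) yields the full matched phrase; findall would
            # return tuples of capture groups for patterns with nested groups.
            found = [m.group(0) for m in re.finditer(pattern, text_lower, re.IGNORECASE)]
            if found:
                total_matches += len(found)
                matches.extend(found[:3])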
        
        normalized_frequency = (total_matches / max(word_count, 1)) * 1000
        score = min(1.0, normalized_frequency / 15)
        
        return score, {
            'matches_found': total_matches,
            'examples': matches[:5],
            'frequency_per_1000': round(normalized_frequency, 2),
            'description': 'GPT-style repetitive phrases detected'
        }
    
    def detect_gemini_overflow(self, text: str) -> Tuple[float, Dict]:
        """Detect Gemini-style explanatory overflow patterns."""
        text_lower = text.lower()
        word_count = len(text.split())
        
        matches = []
        total_matches = 0
        
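        for pattern in self.gemini_overflow_patterns:
            # finditer + group(0) yields the full matched phrase; findall would
            # return tuples of capture groups for patterns with nested groups.
            found = [m.group(0) for m in re.finditer(pattern, text_lower, re.IGNORECASE)]
            if found:
                total_matches += len(found)
                matches.extend(found[:3])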
        
        normalized_frequency = (total_matches / max(word_count, 1)) * 1000
        score = min(1.0, normalized_frequency / 12)
        
        return score, {
            'matches_found': total_matches,
            'examples': matches[:5],
            'frequency_per_1000': round(normalized_frequency, 2),
            'description': 'Over-explanation patterns typical of Gemini'
        }
    
    def detect_claude_hedging(self, text: str) -> Tuple[float, Dict]:
        """Detect Claude-style uncertainty hedging patterns."""
        text_lower = text.lower()
        word_count = len(text.split())
        
        matches = []
        total_matches = 0
        
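        for pattern in self.claude_hedging_patterns:
            # finditer + group(0) yields the full matched phrase; findall would
            # return tuples of capture groups for patterns with nested groups.
            found = [m.group(0) for m in re.finditer(pattern, text_lower, re.IGNORECASE)]
            if found:
                total_matches += len(found)
                matches.extend(found[:3])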
        
        normalized_frequency = (total_matches / max(word_count, 1)) * 1000
        score = min(1.0, normalized_frequency / 10)
        
        return score, {
            'matches_found': total_matches,
            'examples': matches[:5],
            'frequency_per_1000': round(normalized_frequency, 2),
            'description': 'Uncertainty hedging typical of Claude'
        }
    
    def calculate_burstiness(self, text: str) -> Tuple[float, Dict]:
        """Calculate burstiness (sentence length variance)."""
        sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
        
        if len(sentences) < 3:
            return 0.0, {'variance': 0, 'mean_length': 0, 'description': 'Insufficient sentences'}
        
        lengths = [len(s.split()) for s in sentences]
        mean_len = sum(lengths) / len(lengths)
        variance = sum((l - mean_len) ** 2 for l in lengths) / len(lengths)
        std_dev = math.sqrt(variance)
        cv = std_dev / max(mean_len, 1)
        ai_score = max(0, 1 - (cv / 0.6))
        
        return ai_score, {
            'variance': round(variance, 2),
            'std_deviation': round(std_dev, 2),
            'coefficient_of_variation': round(cv, 3),
            'mean_sentence_length': round(mean_len, 1),
            'sentence_count': len(sentences),
            'description': 'Low burstiness indicates uniform AI-generated patterns'
        }
    
    def detect_citation_hallucination(self, text: str) -> Tuple[float, Dict]:
        """Detect potentially hallucinated citations."""
        suspicious_matches = []
        
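        for pattern in self.suspicious_citation_patterns:
            # Use group(0) so the full citation text is recorded; findall would
            # return only the captured year for patterns that contain a group.
            found = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
            suspicious_matches.extend(found)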
        
        all_citations = re.findall(r'\([A-Z][a-z]+.*?\d{4}\)', text)
        all_citations += re.findall(r'\[\d+\]', text)
        
        total_citations = len(all_citations)
        suspicious_count = len(suspicious_matches)
        
        if total_citations == 0:
            return 0.5, {
                'suspicious_count': 0,
                'total_citations': 0,
                'examples': [],
                'description': 'No citations found - unusual for scholarly paper'
            }
        
        ratio = suspicious_count / max(total_citations, 1)
        score = min(1.0, ratio * 2)
        
        return score, {
            'suspicious_count': suspicious_count,
            'total_citations': total_citations,
            'suspicious_ratio': round(ratio, 3),
            'examples': suspicious_matches[:5],
            'description': 'Potentially fabricated or hallucinated citations'
        }
    
    def estimate_perplexity(self, text: str) -> Tuple[float, Dict]:
        """Estimate text perplexity using n-gram frequency analysis."""
        words = text.lower().split()
        
        if len(words) < 10:
            return 0.0, {'estimated_perplexity': 0, 'description': 'Insufficient text'}
        
        # Calculate word frequency entropy as proxy for perplexity
        word_freq = Counter(words)
        total = len(words)
        entropy = -sum((count/total) * math.log2(count/total) for count in word_freq.values())
        
        # Bigram repetition (low variety = AI-like)
        bigrams = [f"{words[i]} {words[i+1]}" for i in range(len(words)-1)]
        bigram_freq = Counter(bigrams)
        unique_bigram_ratio = len(bigram_freq) / max(len(bigrams), 1)
        
        # Low entropy + low bigram variety = AI-like
        max_entropy = math.log2(len(word_freq)) if len(word_freq) > 1 else 1
        normalized_entropy = entropy / max(max_entropy, 1)
        
        # Combined score (inverse - low perplexity = high AI score)
        perplexity_proxy = (normalized_entropy + unique_bigram_ratio) / 2
        ai_score = max(0, 1 - perplexity_proxy)
        
        return ai_score, {
            'word_entropy': round(entropy, 3),
            'normalized_entropy': round(normalized_entropy, 3),
            'unique_bigram_ratio': round(unique_bigram_ratio, 3),
            'vocabulary_size': len(word_freq),
            'estimated_perplexity': round(2 ** entropy, 2),
            'description': 'Lower perplexity suggests more predictable AI text'
        }
    
    def _calculate_composite_score(self, gpt, gemini, claude, burstiness, citation, perplexity):
        """Calculate weighted composite GenAI score."""
        weights = {
            'gpt': 0.2,
            'gemini': 0.15,
            'claude': 0.15,
            'burstiness': 0.2,
            'citation': 0.1,
            'perplexity': 0.2
        }
        
        composite = (
            gpt * weights['gpt'] +
            gemini * weights['gemini'] +
            claude * weights['claude'] +
            burstiness * weights['burstiness'] +
            citation * weights['citation'] +
            perplexity * weights['perplexity']
        )
        
        return min(1.0, composite)
    
    def _generate_interpretation(self, gpt, gemini, claude, burstiness, citation, perplexity):
        """Generate human-readable interpretation of scores."""
        interpretations = []
        
        if gpt > 0.5:
            interpretations.append("High frequency of GPT-style formulaic phrases detected")
        if gemini > 0.5:
            interpretations.append("Over-explanation patterns suggest Gemini-style generation")
        if claude > 0.5:
            interpretations.append("Significant hedging language indicates possible Claude generation")
        if burstiness > 0.5:
            interpretations.append("Uniform sentence structure suggests AI-generated content")
        if citation > 0.5:
            interpretations.append("Some citations appear potentially fabricated")
        if perplexity > 0.5:
            interpretations.append("Predictable text patterns indicate possible AI generation")
        
        if not interpretations:
            interpretations.append("Text shows natural human writing characteristics")
        
        return interpretations
    
    def _empty_features(self):
        """Return empty feature dict for invalid input."""
        return {
            'composite_score': 0.0,
            'features': {},
            'interpretation': ['No text provided for analysis']
        }


# Create singleton instance
extractor = GenAIFeatureExtractor()

def extract_genai_features(text: str) -> Dict[str, Any]:
    """Convenience function to extract GenAI features."""
    return extractor.extract_all_features(text)

print("✅ GenAI Feature Extractor loaded!")
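
To make the burstiness feature concrete, the following standalone sketch repeats the same arithmetic as `calculate_burstiness` on two tiny inputs. It is an illustration under stated assumptions, not part of the notebook's pipeline: the function name `burstiness_ai_score`, the `cv_cutoff` parameter, and both sample texts are made up for this example, and the 0.6 cutoff is simply the constant used in the class above.

```python
import math
import re


def burstiness_ai_score(text: str, cv_cutoff: float = 0.6) -> float:
    """Map sentence-length variation to a 0..1 uniformity score (1 = uniform)."""
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    if len(sentences) < 3:
        return 0.0  # too few sentences to measure variation
    lengths = [len(s.split()) for s in sentences]
    mean_len = sum(lengths) / len(lengths)
    variance = sum((n - mean_len) ** 2 for n in lengths) / len(lengths)
    # Coefficient of variation: standard deviation relative to the mean.
    cv = math.sqrt(variance) / max(mean_len, 1)
    return max(0.0, 1 - cv / cv_cutoff)


# Hypothetical samples: three 4-word sentences vs. lengths of 1, 13 and 1 words.
uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The committee met for six long hours before reaching "
          "any decision at all. Why?")

uniform_score = burstiness_ai_score(uniform)
varied_score = burstiness_ai_score(varied)
print(f"uniform text: {uniform_score:.2f}")
print(f"varied text:  {varied_score:.2f}")
```

Identical sentence lengths give a coefficient of variation of zero and therefore the maximum score, while strongly mixed lengths push the coefficient past the cutoff and clamp the score to zero. Short texts can swing between these extremes easily, so the score is more meaningful on longer passages.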

## 3. Explainer Chatbot Class

In [None]:
from datetime import datetime
from typing import Optional


class ExplainerChatbot:
    """
    An explainable AI chatbot for interpreting detection results.
    
    ETHICAL GUIDELINES:
    - Only explains detection results
    - Does not generate academic content
    - Encourages academic integrity
    """
    
    def __init__(self):
        """Initialize the chatbot with response templates."""
        
        self.ethical_disclaimer = (
            "I'm an assistant designed to explain AI detection results. "
            "I cannot help generate academic content or assist in bypassing detection."
        )
        
        # Intent patterns
        self.intent_patterns = {
            'explain_score': [
                r'(what|explain|tell me about).*(score|result|analysis)',
                r'(why|how).*(score|detected|flagged)',
                r'(mean|meaning|interpret).*(score|result|number)',
            ],
            'explain_feature': [
                r'(what is|explain|tell me about).*(perplexity|burstiness)',
                r'(what is|explain).*(gpt|gemini|claude).*(pattern|detection)',
                r'(what|explain).*(citation|hallucination)',
                r'(what|explain).*(repetition|hedging|overflow)',
            ],
            'improve_writing': [
                r'(how|can I|should I).*(improve|fix|change|rewrite)',
                r'(make|write).*(more human|less ai|better)',
                r'(tips|advice|suggestions).*(writing|improve)',
            ],
            'methodology': [
                r'(how|what).*(detect|work|algorithm|method)',
                r'(explain|tell me about).*(system|detection|process)',
            ],
            'decision': [
                r'(why|explain).*(accept|reject|review)',
                r'(what|mean).*(decision|recommendation)',
            ],
            'greeting': [r'^(hi|hello|hey|greetings)\b', r'(how are you)'],
            'thanks': [r'(thank|thanks|appreciate)'],
            'help': [r'(help|assist|support)', r'(what can you|can you help)'],
            'unethical_request': [
                r'(generate|write|create).*(paper|essay|content)',
                r'(bypass|avoid|trick|fool).*(detection|system)',
                r'(make|help).*(undetectable|pass)',
            ],
        }
        
        # Feature explanations
        self.feature_explanations = {
            'gpt_repetition': {
                'name': 'GPT-Style Repetition',
                'description': (
                    "GPT models often use formulaic academic phrases like 'In conclusion', "
                    "'It is important to note', or 'As mentioned earlier'. High repetition "
                    "of these patterns suggests AI generation."
                ),
            },
            'gemini_overflow': {
                'name': 'Gemini Explanatory Overflow',
                'description': (
                    "Gemini-style AI often over-explains concepts using phrases like "
                    "'Let me explain', 'In other words', or 'To elaborate further'."
                ),
            },
            'claude_hedging': {
                'name': 'Claude Uncertainty Hedging',
                'description': (
                    "Claude-style AI frequently uses hedging language like 'perhaps', "
                    "'possibly', 'it seems', or 'I think'."
                ),
            },
            'burstiness': {
                'name': 'Burstiness (Sentence Variation)',
                'description': (
                    "Burstiness measures variation in sentence length. Human writing typically "
                    "has high burstiness (varied sentence lengths), while AI text is more uniform."
                ),
            },
            'citation_hallucination': {
                'name': 'Citation Hallucination Detection',
                'description': (
                    "AI models sometimes generate fake citations with generic author names "
                    "or implausible publication dates."
                ),
            },
            'perplexity': {
                'name': 'Perplexity (Text Predictability)',
                'description': (
                    "Perplexity measures how predictable the text is. AI-generated text "
                    "typically has lower perplexity (more predictable word choices)."
                ),
            },
        }
        
        self.decision_explanations = {
            'Accept': "The paper appears predominantly human-written with minimal AI markers.",
            'Review Needed': "Mixed signals detected - requires human review.",
            'Reject': "Strong indicators of AI generation across multiple metrics.",
        }
        
        self.context = {'last_analysis': None, 'conversation_history': []}
    
    def set_analysis_context(self, analysis_result: Dict[str, Any]) -> None:
        """Set the current analysis result for context-aware responses."""
        self.context['last_analysis'] = analysis_result
    
    def get_response(self, user_message: str, analysis_result: Optional[Dict] = None) -> Dict[str, Any]:
        """Generate a response to the user's message."""
        if analysis_result:
            self.set_analysis_context(analysis_result)
        
        intent = self._detect_intent(user_message)
        response = self._generate_response(intent, user_message)
        
        self.context['conversation_history'].append({
            'timestamp': datetime.now().isoformat(),
            'user_message': user_message,
            'intent': intent,
            'response': response['message']
        })
        
        return response
    
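    def _detect_intent(self, message: str) -> str:
        """Detect the user's intent from their message."""
        message_lower = message.lower().strip()
        
        # Intents are tried in dict insertion order and the first match wins,
        # so 'unethical_request' is reached only if no earlier intent matches.
        for intent, patterns in self.intent_patterns.items():
            for pattern in patterns:
                # IGNORECASE is needed because some patterns (e.g. 'can I')
                # contain uppercase letters while the message is lowercased.
                if re.search(pattern, message_lower, re.IGNORECASE):
                    return intent
        
        return 'general_query'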
    
    def _generate_response(self, intent: str, message: str) -> Dict[str, Any]:
        """Generate response based on detected intent."""
        
        handlers = {
            'unethical_request': self._respond_to_unethical_request,
            'greeting': self._respond_to_greeting,
            'thanks': self._respond_to_thanks,
            'help': self._respond_to_help,
            'explain_score': self._explain_overall_score,
            'explain_feature': lambda: self._explain_specific_feature(message),
            'improve_writing': self._provide_writing_tips,
            'methodology': self._explain_methodology,
            'decision': self._explain_decision,
        }
        
        handler = handlers.get(intent, self._respond_to_general_query)
        return handler() if callable(handler) else handler
    
    def _respond_to_unethical_request(self) -> Dict[str, Any]:
        return {
            'message': (
                "❌ I cannot assist with that request. My purpose is to explain AI detection "
                "results, not to help bypass detection or generate academic content.\n\n"
                "Academic integrity is important for developing your critical thinking skills "
                "and earning credentials that reflect your abilities."
            ),
            'type': 'ethical_warning',
            'intent': 'unethical_request'
        }
    
    def _respond_to_greeting(self) -> Dict[str, Any]:
        return {
            'message': (
                "👋 Hello! I'm your AI Detection Explainer Assistant. I can help you understand:\n\n"
                "• Your paper's detection scores and what they mean\n"
                "• Specific features like perplexity, burstiness, and pattern detection\n"
                "• Why your paper received a particular decision\n"
                "• How our detection methodology works\n\n"
                "What would you like to know?"
            ),
            'type': 'greeting',
            'intent': 'greeting'
        }
    
    def _respond_to_thanks(self) -> Dict[str, Any]:
        return {
            'message': "You're welcome! Feel free to ask if you have more questions.",
            'type': 'acknowledgment',
            'intent': 'thanks'
        }
    
    def _respond_to_help(self) -> Dict[str, Any]:
        return {
            'message': (
                "🆘 I can help you understand your AI detection analysis:\n\n"
                "**About Scores:**\n"
                "• 'Explain my scores'\n"
                "• 'Why was my paper flagged?'\n\n"
                "**About Features:**\n"
                "• 'What is perplexity?'\n"
                "• 'Explain burstiness'\n"
                "• 'What is GPT-style repetition?'\n\n"
                "**About Improvement:**\n"
                "• 'How can I improve my writing?'"
            ),
            'type': 'help',
            'intent': 'help'
        }
    
    def _explain_overall_score(self) -> Dict[str, Any]:
        analysis = self.context.get('last_analysis')
        
        if not analysis:
            return {
                'message': "I don't have an analysis result. Please analyze a text first!",
                'type': 'no_context',
                'intent': 'explain_score'
            }
        
        genai = analysis.get('genai_features', {})
        composite = genai.get('composite_score', 0)
        features = genai.get('features', {})
        interpretation = genai.get('interpretation', [])
        
        message = "📊 **Analysis Summary**\n\n"
        message += f"**Composite AI Score:** {composite:.1%}\n\n"
        message += "**Feature Breakdown:**\n"
        
        for name, data in features.items():
            score = data.get('score', 0)
            level = "High" if score > 0.6 else "Moderate" if score > 0.3 else "Low"
            readable_name = name.replace('_', ' ').title()
            message += f"• {readable_name}: {level} ({score:.1%})\n"
        
        if interpretation:
            message += "\n**Key Findings:**\n"
            for interp in interpretation[:3]:
                message += f"• {interp}\n"
        
        return {
            'message': message,
            'type': 'score_explanation',
            'intent': 'explain_score'
        }
    
    def _explain_specific_feature(self, message: str) -> Dict[str, Any]:
        message_lower = message.lower()
        
        feature_key = None
        if 'perplexity' in message_lower:
            feature_key = 'perplexity'
        elif 'burstiness' in message_lower or 'burst' in message_lower:
            feature_key = 'burstiness'
        elif 'gpt' in message_lower or 'repetition' in message_lower:
            feature_key = 'gpt_repetition'
        elif 'gemini' in message_lower or 'overflow' in message_lower:
            feature_key = 'gemini_overflow'
        elif 'claude' in message_lower or 'hedging' in message_lower:
            feature_key = 'claude_hedging'
        elif 'citation' in message_lower or 'hallucination' in message_lower:
            feature_key = 'citation_hallucination'
        
        if feature_key and feature_key in self.feature_explanations:
            feature = self.feature_explanations[feature_key]
            return {
                'message': f"📖 **{feature['name']}**\n\n{feature['description']}",
                'type': 'feature_explanation',
                'intent': 'explain_feature'
            }
        
        return {
            'message': (
                "I can explain these features:\n"
                "• Perplexity\n• Burstiness\n• GPT Repetition\n"
                "• Gemini Overflow\n• Claude Hedging\n• Citation Hallucination"
            ),
            'type': 'clarification',
            'intent': 'explain_feature'
        }
    
    def _provide_writing_tips(self) -> Dict[str, Any]:
        return {
            'message': (
                "✍️ **Tips for More Natural Writing:**\n\n"
                "1. **Vary sentence length** - Mix short, punchy sentences with longer ones\n"
                "2. **Use personal voice** - Develop your unique writing style\n"
                "3. **Avoid formulaic phrases** - Skip 'In conclusion', 'It is important to note'\n"
                "4. **Be direct** - Don't over-explain or hedge excessively\n"
                "5. **Verify citations** - Ensure all references are real and accurate\n"
                "6. **Read aloud** - Natural writing flows when spoken\n\n"
                "Remember: The goal is authentic expression, not detection avoidance!"
            ),
            'type': 'writing_tips',
            'intent': 'improve_writing'
        }
    
    def _explain_methodology(self) -> Dict[str, Any]:
        return {
            'message': (
                "🔬 **How Detection Works:**\n\n"
                "Our system analyzes text using multiple methods:\n\n"
                "1. **Pattern Matching** - Detects phrases typical of GPT, Gemini, and Claude\n"
                "2. **Burstiness Analysis** - Measures sentence length variation\n"
                "3. **Perplexity Estimation** - Assesses text predictability\n"
                "4. **Citation Analysis** - Checks for potentially hallucinated references\n\n"
                "All features are weighted and combined into a composite score."
            ),
            'type': 'methodology',
            'intent': 'methodology'
        }
    
    def _explain_decision(self) -> Dict[str, Any]:
        analysis = self.context.get('last_analysis')
        
        if not analysis:
            return {
                'message': "Please analyze a text first to receive a decision.",
                'type': 'no_context',
                'intent': 'decision'
            }
        
        composite = analysis.get('genai_features', {}).get('composite_score', 0)
        
        if composite < 0.3:
            decision, explanation = 'Accept', self.decision_explanations['Accept']
        elif composite < 0.6:
            decision, explanation = 'Review Needed', self.decision_explanations['Review Needed']
        else:
            decision, explanation = 'Reject', self.decision_explanations['Reject']
        
        return {
            'message': f"📋 **Decision: {decision}**\n\n{explanation}\n\nComposite Score: {composite:.1%}",
            'type': 'decision_explanation',
            'intent': 'decision'
        }
    
    def _respond_to_general_query(self) -> Dict[str, Any]:
        return {
            'message': (
                "I can help you with:\n"
                "• Explaining scores - 'Explain my results'\n"
                "• Understanding features - 'What is perplexity?'\n"
                "• Decision explanation - 'Why was my paper flagged?'\n"
                "• Writing improvement - 'How can I improve my writing?'\n\n"
                "Could you rephrase your question?"
            ),
            'type': 'clarification',
            'intent': 'general_query'
        }
    
    def generate_automatic_explanation(self, analysis_result: Dict[str, Any]) -> str:
        """Generate an automatic explanation when analysis completes."""
        self.set_analysis_context(analysis_result)
        
        genai = analysis_result.get('genai_features', {})
        composite = genai.get('composite_score', 0)
        
        if composite < 0.3:
            intro = "✅ Great news! Your text appears predominantly human-written."
        elif composite < 0.6:
            intro = "⚠️ Your text shows some AI-like patterns and may need review."
        else:
            intro = "🚨 This text shows significant AI-generated characteristics."
        
        explanation = f"{intro}\n\n"
        explanation += f"**AI Score:** {composite:.1%}\n\n"
        
        if genai.get('interpretation'):
            explanation += "**Key Findings:**\n"
            for interp in genai.get('interpretation', [])[:3]:
                explanation += f"• {interp}\n"
        
        explanation += "\n💬 Ask me anything about these results!"
        
        return explanation


# Singleton instance
_chatbot_instance = None

def get_chatbot() -> ExplainerChatbot:
    global _chatbot_instance
    if _chatbot_instance is None:
        _chatbot_instance = ExplainerChatbot()
    return _chatbot_instance

def chat(message: str, analysis_result: Optional[Dict] = None) -> Dict[str, Any]:
    """Convenience function for chatbot interaction."""
    return get_chatbot().get_response(message, analysis_result)

def generate_explanation(analysis_result: Dict[str, Any]) -> str:
    """Generate automatic explanation for analysis result."""
    return get_chatbot().generate_automatic_explanation(analysis_result)

print("✅ Explainer Chatbot loaded!")

## 4. Interactive Demo Functions

In [None]:
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets


def analyze_text(text: str) -> Dict[str, Any]:
    """
    Analyze text and return combined features with chatbot explanation.
    """
    # Extract GenAI features
    genai_features = extract_genai_features(text)
    
    # Build analysis result
    analysis_result = {
        'genai_features': genai_features,
        'text_length': len(text),
        'word_count': len(text.split())
    }
    
    # Generate chatbot explanation
    explanation = generate_explanation(analysis_result)
    analysis_result['chatbot_explanation'] = explanation
    
    return analysis_result


def display_analysis_results(results: Dict[str, Any]):
    """Display analysis results in a formatted way."""
    genai = results.get('genai_features', {})
    
    print("=" * 60)
    print("📊 GENAI DETECTION ANALYSIS RESULTS")
    print("=" * 60)
    
    composite = genai.get('composite_score', 0)
    print(f"\n🎯 Composite AI Score: {composite:.1%}")
    
    # Decision
    if composite < 0.3:
        print("✅ Decision: ACCEPT (Likely Human-Written)")
    elif composite < 0.6:
        print("⚠️ Decision: REVIEW NEEDED (Mixed Signals)")
    else:
        print("🚨 Decision: REJECT (Likely AI-Generated)")
    
    print("\n" + "-" * 60)
    print("📈 FEATURE BREAKDOWN")
    print("-" * 60)
    
    features = genai.get('features', {})
    for name, data in features.items():
        score = data.get('score', 0)
        level = "🔴 High" if score > 0.6 else "🟡 Moderate" if score > 0.3 else "🟢 Low"
        readable_name = name.replace('_', ' ').title()
        print(f"  {readable_name:25s}: {score:.1%} ({level})")
    
    print("\n" + "-" * 60)
    print("💡 KEY INTERPRETATIONS")
    print("-" * 60)
    for interp in genai.get('interpretation', []):
        print(f"  • {interp}")
    
    print("\n" + "=" * 60)
    print("🤖 CHATBOT EXPLANATION")
    print("=" * 60)
    print(results.get('chatbot_explanation', 'No explanation available.'))
    print("=" * 60)


print("✅ Demo functions loaded!")

## 5. Test with Sample Texts

In [None]:
# Sample AI-Generated Text (GPT-style)
ai_sample = """
In conclusion, it is important to note that artificial intelligence has revolutionized 
the way we approach complex problems. As mentioned earlier, this demonstrates that 
machine learning plays a crucial role in modern data analysis. It can be argued that 
these advancements have significant implications for various industries.

The fact that deep learning models can process vast amounts of data shows that 
computational power has increased dramatically. In this context, it is worth noting 
that neural networks have become increasingly sophisticated. This paper aims to 
explore these developments in detail.

According to Smith et al. (2027), the future of AI looks promising. Research by 
Johnson et al. (2026) suggests that automation will continue to expand. The study 
by Williams et al. (2028) indicates significant growth in this sector.
"""

print("\n" + "#" * 60)
print("# ANALYZING AI-GENERATED SAMPLE TEXT")
print("#" * 60)

results = analyze_text(ai_sample)
display_analysis_results(results)

In [None]:
# Sample Human-Written Text (more natural variation)
human_sample = """
Machine learning has changed everything. Or at least, that's what the headlines say.

But what does it actually mean for researchers? The algorithms that power recommendation 
systems on Netflix are fundamentally different from those analyzing medical images. 
Some work brilliantly. Others fail spectacularly.

I've spent three years studying computer vision applications in radiology departments 
across five hospitals. The results surprised me. Despite all the hype, only 23% of 
radiologists reported daily AI tool usage. Cost was a factor, sure. But trust was bigger.

Dr. Sarah Chen at Mass General put it best: "These systems work great in the lab. 
Real patients are messier." She's not wrong.

The data from our study (n=847) shows clear patterns. Urban hospitals adopted faster. 
Rural facilities lagged by 18 months on average. Money wasn't always the problem.
"""

print("\n" + "#" * 60)
print("# ANALYZING HUMAN-WRITTEN SAMPLE TEXT")
print("#" * 60)

results = analyze_text(human_sample)
display_analysis_results(results)

## 6. Interactive Chatbot Demo

In [None]:
# Interactive Chatbot
print("=" * 60)
print("🤖 INTERACTIVE CHATBOT DEMO")
print("=" * 60)
print("Type your questions about AI detection!")
print("Example queries:")
print("  • 'Hello'")
print("  • 'Explain my scores'")
print("  • 'What is perplexity?'")
print("  • 'How does detection work?'")
print("  • 'Tips for better writing'")
print("=" * 60)

# Test various chatbot interactions
test_messages = [
    "Hello!",
    "What is perplexity?",
    "Explain my scores",
    "How can I improve my writing?"
]

for msg in test_messages:
    print(f"\n👤 User: {msg}")
    response = chat(msg, results)  # Pass the last analysis for context
    print(f"\n🤖 Bot: {response['message']}")
    print("-" * 40)

## 7. Analyze Your Own Text

In [None]:
# ✏️ PASTE YOUR TEXT HERE!
your_text = """
Paste your text here to analyze it for AI-generated content.
The system will check for patterns typical of GPT, Gemini, and Claude,
as well as analyze burstiness, perplexity, and citation quality.
"""

# Analyze
print("\n" + "#" * 60)
print("# ANALYZING YOUR TEXT")
print("#" * 60)

my_results = analyze_text(your_text)
display_analysis_results(my_results)

In [None]:
# Interactive chat about your results
print("\n💬 Ask questions about your analysis:")

# Ask the chatbot about your specific results
your_question = "Explain my scores"  # ✏️ Change this to your question!

response = chat(your_question, my_results)
print(f"\n👤 You: {your_question}")
print(f"\n🤖 Bot: {response['message']}")

## 8. Interactive Chat Loop (Run Multiple Times)

In [None]:
# Run this cell multiple times to have a conversation!
# Change the question each time.

question = "What is burstiness?"  # ✏️ Edit this question!

response = chat(question, my_results)
print(f"👤 You: {question}")
print(f"\n🤖 Bot: {response['message']}")
print(f"\n[Intent detected: {response['intent']}]")

## 9. Feature Detail Viewer
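
The viewer in this section reads a nested dictionary: `genai_features` holds `composite_score`, `interpretation`, and a `features` mapping whose entries each carry a `score` and a `details` dict. The sketch below shows that shape with a mock result. Only the key names come from the notebook's code; the numbers, the `sentence_count` detail, and the `feature_scores` helper are hypothetical and exist only for illustration.

```python
from typing import Any, Dict

# Hypothetical values; only the key names follow the notebook's code.
mock_results: Dict[str, Any] = {
    "genai_features": {
        "composite_score": 0.42,
        "features": {
            "burstiness": {"score": 0.55, "details": {"sentence_count": 12}},
            "perplexity": {"score": 0.30, "details": {}},
        },
        "interpretation": ["Sentence lengths are fairly uniform."],
    },
    "text_length": 640,
    "word_count": 110,
}


def feature_scores(results: Dict[str, Any]) -> Dict[str, float]:
    """Return {feature name: score}, defaulting to 0 when a score is missing."""
    features = results.get("genai_features", {}).get("features", {})
    return {name: data.get("score", 0) for name, data in features.items()}


print(feature_scores(mock_results))
```

Using `.get` with defaults at every level means an empty or partial result yields an empty mapping rather than a `KeyError`.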

In [None]:
def show_feature_details(feature_name: str, results: Dict[str, Any]):
    """Show detailed information about a specific feature."""
    features = results.get('genai_features', {}).get('features', {})
    
    if feature_name not in features:
        print(f"Feature '{feature_name}' not found.")
        print(f"Available: {list(features.keys())}")
        return
    
    feature = features[feature_name]
    details = feature.get('details', {})
    
    print(f"\n{'='*50}")
    print(f"📋 {feature_name.replace('_', ' ').upper()} DETAILS")
    print(f"{'='*50}")
    # Use .get so a feature without a score prints 0.0% instead of raising KeyError
    print(f"\nScore: {feature.get('score', 0):.1%}")
    print("\nDetails:")
    for key, value in details.items():
        print(f"  • {key}: {value}")


# Show details for each feature
for feature_name in ['gpt_repetition', 'gemini_overflow', 'claude_hedging', 
                      'burstiness', 'perplexity', 'citation_hallucination']:
    show_feature_details(feature_name, my_results)

---

## 📚 Documentation

### Features Detected:

| Feature | Description | AI-Like Score |
|---------|-------------|---------------|
| GPT Repetition | Formulaic phrases like "In conclusion", "It is important to note" | High = More AI |
| Gemini Overflow | Over-explanation patterns like "Let me explain", "In other words" | High = More AI |
| Claude Hedging | Uncertainty language like "perhaps", "possibly", "it seems" | High = More AI |
| Burstiness | Uniformity of sentence length (humans vary more, so low variation scores high) | High = More AI |
| Perplexity | Text predictability (AI is more predictable, so predictable text scores high) | High = More AI |
| Citation Hallucination | Potentially fabricated references | High = Suspicious |
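
As a rough illustration of the burstiness row, sentence-length variation can be measured with the coefficient of variation (standard deviation divided by mean) of per-sentence word counts. This is a minimal sketch under that assumption, not necessarily the formula the notebook's extractor uses; `sentence_lengths` and `length_variation` are hypothetical names, and the sentence splitter is deliberately naive.

```python
import re
import statistics
from typing import List


def sentence_lengths(text: str) -> List[int]:
    """Split on ., ! or ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation (population std / mean) of sentence lengths.

    Hypothetical helper: higher values mean more varied sentence lengths.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


uniform = "The model works well. The data looks clean. The test runs fast."
varied = (
    "It failed. Nobody on the team expected the model to collapse "
    "on such a small shift in the input data. Why?"
)

print(f"uniform: {length_variation(uniform):.2f}")
print(f"varied:  {length_variation(varied):.2f}")
```

A detector that treats uniform text as more AI-like would then map low variation to a high feature score, for example by inverting and clipping this value.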

### Decision Thresholds:

| Composite Score | Decision |
|-----------------|----------|
| < 30% | ✅ Accept (Likely Human) |
| 30% to < 60% | ⚠️ Review Needed |
| ≥ 60% | 🚨 Reject (Likely AI) |
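
The same thresholds can be written as a small helper. The comparisons mirror those in `display_analysis_results`, so a score of exactly 30% falls into review and a score of exactly 60% is rejected. The function name `decide` and its return labels are hypothetical, chosen only for this sketch.

```python
def decide(composite_score: float) -> str:
    """Map a composite AI score in [0, 1] to a decision label.

    Hypothetical helper; the cut-offs mirror display_analysis_results.
    """
    if composite_score < 0.3:
        return "ACCEPT"
    if composite_score < 0.6:
        return "REVIEW NEEDED"
    return "REJECT"


# Boundary values show which side of each threshold is inclusive.
for score in (0.12, 0.30, 0.59, 0.60):
    print(f"{score:.0%} -> {decide(score)}")
```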

---

**Author:** AI-Generated Scholarly Paper Detection System  
**For Academic Use Only - IEEE/College-level Project**