# Reconnaissance & Footprinting - Hands-On Lab

**Part of HackLearn Pro - Module #11**

This interactive notebook demonstrates professional reconnaissance and footprinting techniques used in penetration testing.

## Learning Objectives

1. Master passive reconnaissance techniques (OSINT, WHOIS, DNS enumeration)
2. Perform active reconnaissance (port scanning, banner grabbing)
3. Automate information gathering with Python scripts
4. Analyze and consolidate reconnaissance data
5. Understand detection mechanisms and defensive strategies

## Legal Disclaimer

**CRITICAL:** Only perform reconnaissance on systems you own or have explicit written authorization to test. Unauthorized reconnaissance may violate:
- **US:** Computer Fraud and Abuse Act (CFAA) 18 U.S.C. § 1030
- **EU:** Directive 2013/40/EU on attacks against information systems
- **UK:** Computer Misuse Act 1990

Use authorized targets:
- Your own infrastructure
- Practice platforms: HackTheBox, TryHackMe, Offensive Security Labs
- Explicitly authorized penetration testing engagements

## Setup

Install required packages for this lab:

In [None]:
# Install dependencies
!pip install -q requests dnspython python-whois ipwhois

In [None]:
import requests
import socket
import dns.resolver
import json
import re
from typing import List, Dict, Any, Optional
from datetime import datetime
import subprocess
from urllib.parse import urlparse

## Exercise 1: Passive DNS Reconnaissance

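DNS enumeration reveals subdomains, mail servers, and infrastructure. Queries go through a recursive resolver rather than to the target's hosts, although the target's authoritative name servers may still see and log them.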
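
TXT records are a good illustration of how much infrastructure DNS can expose. The sketch below splits an SPF record into the hosts and networks it references. The function name `parse_spf` and the sample record are hypothetical, and only the `include`, `ip4`, `ip6`, and `all` mechanisms are handled, so treat it as a starting point rather than a complete SPF parser.

```python
from typing import Dict, List


def parse_spf(txt_record: str) -> Dict[str, List[str]]:
    """Split an SPF TXT record into the hosts and networks it references."""
    # dnspython renders TXT data wrapped in double quotes, so strip them first
    record = txt_record.strip().strip('"')
    summary: Dict[str, List[str]] = {"include": [], "ip4": [], "ip6": [], "all": []}
    if not record.lower().startswith("v=spf1"):
        return summary  # Not an SPF record

    for term in record.split()[1:]:
        # An optional qualifier (+, -, ~, ?) may precede the mechanism
        qualifier = term[0] if term[0] in "+-~?" else ""
        mechanism = term[len(qualifier):]
        name, _, value = mechanism.partition(":")
        name = name.lower()
        if name in ("include", "ip4", "ip6") and value:
            summary[name].append(value)
        elif name == "all":
            summary["all"].append((qualifier or "+") + "all")
    return summary


# Hypothetical record built from documentation-only names and address ranges
sample = '"v=spf1 include:_spf.mailhost.example ip4:192.0.2.0/24 -all"'
print(parse_spf(sample))
```

Each `include` value names a third-party mail provider, and each `ip4`/`ip6` value is a network the organization sends mail from, which is why TXT records are worth reading line by line.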

In [None]:
class DNSRecon:
    """
    Passive DNS reconnaissance tool.
    Gathers DNS records through the configured resolver, without touching the target's hosts.
    """
    
    def __init__(self, domain: str):
        self.domain = domain
        self.resolver = dns.resolver.Resolver()
        self.results = {}
    
    def query_record(self, record_type: str) -> List[str]:
        """Query specific DNS record type."""
        try:
            answers = self.resolver.resolve(self.domain, record_type)
            return [str(rdata) for rdata in answers]
        except Exception as e:
            return [f"Error: {str(e)}"]
    
    def comprehensive_scan(self) -> Dict[str, List[str]]:
        """Gather all common DNS record types."""
        record_types = ['A', 'AAAA', 'MX', 'NS', 'TXT', 'SOA', 'CNAME']
        
        print(f"[*] Starting DNS reconnaissance for {self.domain}")
        print("=" * 60)
        
        for record_type in record_types:
            print(f"\n[+] Querying {record_type} records...")
            records = self.query_record(record_type)
            self.results[record_type] = records
            
            for record in records:
                print(f"    {record}")
        
        return self.results
    
    def analyze_results(self):
        """Analyze DNS results for security insights."""
        print("\n" + "=" * 60)
        print("[*] Analysis Summary")
        print("=" * 60)
        
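        # Check for SPF records (a TXT record at the domain starting with "v=spf1")
        txt_records = self.results.get('TXT', [])
        spf_found = any('v=spf1' in record.lower() for record in txt_records)
        print(f"\n[{'✓' if spf_found else '✗'}] SPF Record: {'Present' if spf_found else 'Missing (Email spoofing risk)'}")
        
        # Check for DMARC (published as a TXT record at _dmarc.<domain>, not at the domain itself)
        try:
            dmarc_answers = self.resolver.resolve(f'_dmarc.{self.domain}', 'TXT')
            dmarc_found = any('v=dmarc1' in str(rdata).lower() for rdata in dmarc_answers)
        except Exception:
            dmarc_found = False
        print(f"[{'✓' if dmarc_found else '✗'}] DMARC Record: {'Present' if dmarc_found else 'Missing (Email spoofing risk)'}")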
        
        # Count mail servers
        mx_count = len([r for r in self.results.get('MX', []) if not r.startswith('Error')])
        print(f"\n[*] Mail Servers Found: {mx_count}")
        
        # Count name servers
        ns_count = len([r for r in self.results.get('NS', []) if not r.startswith('Error')])
        print(f"[*] Name Servers Found: {ns_count}")
        
        # IPv6 support
        ipv6_supported = len([r for r in self.results.get('AAAA', []) if not r.startswith('Error')]) > 0
        print(f"[*] IPv6 Support: {'Yes' if ipv6_supported else 'No'}")

# Example usage with a safe target (example.com is reserved for documentation and examples)
recon = DNSRecon('example.com')
results = recon.comprehensive_scan()
recon.analyze_results()

## Exercise 2: Certificate Transparency Log Search

Certificate transparency logs reveal subdomains through SSL/TLS certificates. The lookup goes to crt.sh, a third-party log search service, so no traffic reaches the target.
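
Raw CT results need cleaning before they are useful: entries may be wildcards, mixed case, or unrelated names that merely share a suffix with the target. The sketch below shows one way to normalize them. It assumes each certificate is a dict with a newline-separated `name_value` field, as the exercise code does; the helper name `normalize_ct_names` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Set


def normalize_ct_names(certs: List[Dict[str, Any]], domain: str) -> List[str]:
    """Return the sorted, de-duplicated hostnames that belong to `domain`."""
    domain = domain.lower().rstrip(".")
    names: Set[str] = set()
    for cert in certs:
        for raw in str(cert.get("name_value", "")).splitlines():
            name = raw.strip().lower().rstrip(".")
            # "*.example.com" tells us the parent name exists; drop the wildcard label
            if name.startswith("*."):
                name = name[2:]
            # Require an exact match or a dot-delimited suffix so that
            # "notexample.com" is not mistaken for a subdomain
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hypothetical sample shaped like the JSON entries the exercise code consumes
sample_certs = [
    {"name_value": "www.example.com\n*.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "notexample.com"},
]
print(normalize_ct_names(sample_certs, "example.com"))
```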

In [None]:
class CertificateRecon:
    """
    Search certificate transparency logs for subdomains.
    Uses crt.sh public API (completely passive).
    """
    
    def __init__(self, domain: str):
        self.domain = domain
        self.api_url = "https://crt.sh/"
    
    def search_certificates(self) -> List[Dict[str, Any]]:
        """Query crt.sh for certificates issued to domain."""
        print(f"[*] Searching certificate transparency logs for {self.domain}")
        print("=" * 60)
        
        try:
            # Query crt.sh API
            params = {'q': f'%.{self.domain}', 'output': 'json'}
            response = requests.get(self.api_url, params=params, timeout=30)
            
            if response.status_code == 200:
                certs = response.json()
                print(f"[+] Found {len(certs)} certificates")
                return certs
            else:
                print(f"[✗] API request failed: {response.status_code}")
                return []
        except Exception as e:
            print(f"[✗] Error: {e}")
            return []
    
    def extract_subdomains(self, certs: List[Dict[str, Any]]) -> List[str]:
        """Extract unique subdomains from certificates."""
        subdomains = set()
        
        for cert in certs:
            name_value = cert.get('name_value', '')
            # Split on newlines (crt.sh returns multiple domains per cert)
            domains = name_value.split('\n')
            subdomains.update(domains)
        
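        # Keep only the domain itself and true subdomains (a bare suffix match
        # would also accept unrelated names such as "notexample.com"), then sort
        subdomains = sorted(d for d in subdomains
                            if d == self.domain or d.endswith('.' + self.domain))
        return subdomains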
    
    def display_results(self, subdomains: List[str]):
        """Display discovered subdomains with categorization."""
        print(f"\n[*] Discovered {len(subdomains)} unique subdomains:")
        print("=" * 60)
        
        # Categorize subdomains
        categories = {
            'Production': ['www', 'api', 'app', 'web'],
            'Development': ['dev', 'staging', 'test', 'qa', 'uat'],
            'Infrastructure': ['mail', 'smtp', 'vpn', 'ftp', 'ssh'],
            'Administration': ['admin', 'cpanel', 'portal', 'dashboard'],
            'Other': []
        }
        
        categorized = {cat: [] for cat in categories.keys()}
        
        for subdomain in subdomains:
            categorized_flag = False
            for category, keywords in categories.items():
                if category == 'Other':
                    continue
                if any(keyword in subdomain.lower() for keyword in keywords):
                    categorized[category].append(subdomain)
                    categorized_flag = True
                    break
            if not categorized_flag:
                categorized['Other'].append(subdomain)
        
        # Display by category
        for category, domains in categorized.items():
            if domains:
                print(f"\n[{category}] ({len(domains)} domains)")
                for domain in domains[:10]:  # Limit to 10 per category
                    print(f"  - {domain}")
                if len(domains) > 10:
                    print(f"  ... and {len(domains) - 10} more")
        
        # Security insights
        print("\n" + "=" * 60)
        print("[*] Security Insights")
        print("=" * 60)
        
        dev_count = len(categorized['Development'])
        if dev_count > 0:
            print(f"\n[!] {dev_count} development/staging subdomains exposed")
            print("    Risk: These often have weaker security controls")
        
        admin_count = len(categorized['Administration'])
        if admin_count > 0:
            print(f"\n[!] {admin_count} administrative subdomains found")
            print("    Risk: High-value targets for attackers")

# Example usage
cert_recon = CertificateRecon('example.com')
certificates = cert_recon.search_certificates()
if certificates:
    subdomains = cert_recon.extract_subdomains(certificates)
    cert_recon.display_results(subdomains)

## Exercise 3: Web Reconnaissance

Gather information from HTTP headers, robots.txt, and technology fingerprinting.
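
Technology fingerprinting often starts with splitting a `Server` or `X-Powered-By` value into product and version pairs that can be checked against vulnerability databases. The sketch below is one way to do that with a regular expression. The helper name `parse_product_tokens` is hypothetical, and the pattern is a simplification of the `product/version (comment)` header grammar, not a full implementation of it.

```python
import re
from typing import List, Tuple

# A product token, optionally followed by "/version"
_PRODUCT_RE = re.compile(r"([A-Za-z0-9_.!#$%&'*+^`|~-]+)(?:/([^\s()]+))?")


def parse_product_tokens(header_value: str) -> List[Tuple[str, str]]:
    """Return (product, version) pairs; version is '' when the header omits it."""
    # Parenthesized text such as "(Ubuntu)" is a comment, not a product
    without_comments = re.sub(r"\([^)]*\)", " ", header_value)
    return [(m.group(1), m.group(2) or "")
            for m in _PRODUCT_RE.finditer(without_comments)]


# Illustrative header values, not taken from a real scan
for value in ["Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f", "Microsoft-IIS/10.0", "nginx"]:
    print(value, "->", parse_product_tokens(value))
```

A header with no version, such as a bare `nginx`, is itself a finding: it suggests the operator has configured the server to suppress version details.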

In [None]:
class WebRecon:
    """
    Web application reconnaissance tool.
    Analyzes HTTP headers, server information, and accessible files.
    """
    
    def __init__(self, url: str):
        self.url = url if url.startswith(('http://', 'https://')) else f'http://{url}'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Educational Security Research)'
        }
    
    def fetch_headers(self) -> Dict[str, str]:
        """Fetch and analyze HTTP response headers."""
        print(f"[*] Fetching HTTP headers from {self.url}")
        print("=" * 60)
        
        try:
            response = requests.get(self.url, headers=self.headers, timeout=10, allow_redirects=True)
            
            print(f"\n[+] Status Code: {response.status_code}")
            print(f"[+] Final URL: {response.url}")
            print("\n[*] Response Headers:")
            
            for header, value in response.headers.items():
                print(f"  {header}: {value}")
            
            return dict(response.headers)
        except Exception as e:
            print(f"[✗] Error: {e}")
            return {}
    
    def identify_technologies(self, headers: Dict[str, str]):
        """Identify web technologies from headers."""
        print("\n" + "=" * 60)
        print("[*] Technology Fingerprinting")
        print("=" * 60)
        
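        # HTTP header names are case-insensitive; dict() in fetch_headers drops that,
        # so restore it before looking up 'Server' and the other headers
        headers = requests.structures.CaseInsensitiveDict(headers)
        findings = []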
        
        # Server identification
        if 'Server' in headers:
            server = headers['Server']
            print(f"\n[+] Web Server: {server}")
            findings.append(('Server', server))
        
        # Application framework
        if 'X-Powered-By' in headers:
            framework = headers['X-Powered-By']
            print(f"[+] Application Framework: {framework}")
            findings.append(('Framework', framework))
        
        # Security headers analysis
        security_headers = {
            'Strict-Transport-Security': 'HSTS',
            'X-Content-Type-Options': 'Content Type Protection',
            'X-Frame-Options': 'Clickjacking Protection',
            'Content-Security-Policy': 'CSP',
            'X-XSS-Protection': 'XSS Protection (legacy; deprecated in modern browsers)'
        }
        
        print("\n[*] Security Headers:")
        for header, description in security_headers.items():
            present = header in headers
            status = "✓ Present" if present else "✗ Missing"
            print(f"  [{status}] {description} ({header})")
        
        return findings
    
    def check_robots_txt(self):
        """Check robots.txt for interesting paths."""
        print("\n" + "=" * 60)
        print("[*] Checking robots.txt")
        print("=" * 60)
        
        parsed_url = urlparse(self.url)
        robots_url = f"{parsed_url.scheme}://{parsed_url.netloc}/robots.txt"
        
        try:
            response = requests.get(robots_url, headers=self.headers, timeout=10)
            
            if response.status_code == 200:
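                print("\n[+] robots.txt found:")
                print(response.text[:500])  # First 500 characters
                
                # Extract disallowed paths; [ \t]* stays on one line, so an empty
                # "Disallow:" cannot swallow the line that follows it
                disallowed = re.findall(r'(?im)^[ \t]*Disallow:[ \t]*(\S.*)$', response.text)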
                if disallowed:
                    print(f"\n[*] Found {len(disallowed)} disallowed paths (potential targets):")
                    for path in disallowed[:10]:
                        print(f"  - {path.strip()}")
            else:
                print(f"\n[✗] robots.txt not found (Status: {response.status_code})")
        except Exception as e:
            print(f"[✗] Error: {e}")

# Example usage
web_recon = WebRecon('https://example.com')
headers = web_recon.fetch_headers()
if headers:
    web_recon.identify_technologies(headers)
    web_recon.check_robots_txt()

## Exercise 4: Port Scanning Simulation

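Run a basic TCP connect scan against localhost. Unlike the passive exercises, this sends packets directly to the target, so use it only on systems you own or are authorized to test.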
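
Checking ports one at a time is slow when closed or filtered ports each cost a full timeout. A common alternative is to run the connection attempts concurrently. The sketch below uses a thread pool for that; the function names `is_open` and `scan_ports` and the default values are hypothetical choices, and the usage line targets localhost only.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a full TCP handshake with host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


def scan_ports(host: str, ports: Iterable[int], workers: int = 20) -> List[int]:
    """Probe the given ports concurrently and return the open ones, sorted."""
    port_list = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with port_list
        results = pool.map(lambda p: is_open(host, p), port_list)
        return sorted(p for p, open_ in zip(port_list, results) if open_)


# Localhost only: scan systems you own or are authorized to test
print(scan_ports("127.0.0.1", [22, 80, 443]))
```

Concurrency shortens the scan but also makes it noisier, since many connection attempts arrive at the target within a short interval.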

In [None]:
class PortScanner:
    """
    Simple TCP port scanner for educational purposes.
    ONLY use on systems you own or have authorization to test.
    """
    
    def __init__(self, target: str):
        self.target = target
        # Common ports to scan
        self.common_ports = {
            21: 'FTP',
            22: 'SSH',
            23: 'Telnet',
            25: 'SMTP',
            53: 'DNS',
            80: 'HTTP',
            110: 'POP3',
            143: 'IMAP',
            443: 'HTTPS',
            445: 'SMB',
            3306: 'MySQL',
            3389: 'RDP',
            5432: 'PostgreSQL',
            8080: 'HTTP-Proxy'
        }
    
    def scan_port(self, port: int, timeout: float = 1.0) -> bool:
        """Check if a single port is open."""
        try:
            # The context manager closes the socket even if connect_ex raises
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                return sock.connect_ex((self.target, port)) == 0
        except Exception:
            return False
    
    def scan_common_ports(self):
        """Scan commonly used ports."""
        print(f"[*] Scanning common ports on {self.target}")
        print("=" * 60)
        print("[!] WARNING: Only scan systems you own or have authorization to test")
        print("=" * 60)
        
        open_ports = []
        
        total = len(self.common_ports)
        for index, (port, service) in enumerate(self.common_ports.items(), start=1):
            print(f"\r[*] Scanning port {port} ({index}/{total})...", end='', flush=True)
            if self.scan_port(port):
                open_ports.append((port, service))
        
        print("\r" + " " * 50)  # Clear progress line
        
        if open_ports:
            print(f"\n[+] Found {len(open_ports)} open ports:")
            print("=" * 60)
            for port, service in open_ports:
                print(f"  Port {port:5d}/tcp  {service:15s}  OPEN")
        else:
            print("\n[*] No open ports found in common port range")
        
        return open_ports
    
    def banner_grab(self, port: int) -> Optional[str]:
        """Attempt to grab service banner."""
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(2)
                sock.connect((self.target, port))
                
                # Plaintext HTTP ports need a request before they answer.
                # 443 is excluded: it expects a TLS handshake, not raw HTTP.
                if port in [80, 8080]:
                    sock.sendall(b'HEAD / HTTP/1.0\r\n\r\n')
                
                banner = sock.recv(1024).decode('utf-8', errors='ignore')
            return banner.strip()
        except Exception:
            return None

# Example: Scan localhost (safe target)
print("[*] Educational port scanning demonstration")
print("[*] Target: localhost (127.0.0.1) - safe for testing\n")

scanner = PortScanner('127.0.0.1')
open_ports = scanner.scan_common_ports()

if open_ports:
    print("\n[*] Attempting banner grabbing on open ports...")
    print("=" * 60)
    for port, service in open_ports[:3]:  # Limit to first 3
        banner = scanner.banner_grab(port)
        if banner:
            print(f"\n[+] Port {port} ({service}):")
            print(f"  {banner[:200]}...")  # First 200 chars

## Exercise 5: Reconnaissance Data Consolidation

Combine multiple reconnaissance sources into a comprehensive target profile.
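
When several sources report hostnames, it helps to record which source reported each one, since a name confirmed by two independent sources is more reliable than a single sighting. The sketch below merges hostname lists while keeping that provenance. The function name `merge_hostnames`, the source labels, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def merge_hostnames(sources: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each normalized hostname to the sorted list of sources that reported it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source, hostnames in sources.items():
        for hostname in hostnames:
            # Normalize case and trailing dots so duplicates collapse
            key = hostname.strip().lower().rstrip(".")
            if key:
                seen[key].add(source)
    return {host: sorted(srcs) for host, srcs in sorted(seen.items())}


# Hypothetical findings from two of the earlier exercises
merged = merge_hostnames({
    "ct_logs": ["www.example.com", "dev.example.com"],
    "dns": ["WWW.example.com.", "mail.example.com"],
})
for host, found_by in merged.items():
    print(f"{host}: {', '.join(found_by)}")
```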

In [None]:
class ReconConsolidator:
    """
    Consolidate reconnaissance data from multiple sources.
    Generate actionable intelligence report.
    """
    
    def __init__(self, target: str):
        self.target = target
        self.data = {
            'dns_records': {},
            'subdomains': [],
            'web_technologies': [],
            'open_ports': [],
            'security_issues': []
        }
    
    def add_dns_data(self, dns_results: Dict[str, List[str]]):
        """Add DNS reconnaissance results."""
        self.data['dns_records'] = dns_results
    
    def add_subdomains(self, subdomains: List[str]):
        """Add discovered subdomains."""
        self.data['subdomains'] = subdomains
    
    def add_web_tech(self, technologies: List[tuple]):
        """Add identified web technologies."""
        self.data['web_technologies'] = technologies
    
    def add_ports(self, ports: List[tuple]):
        """Add discovered open ports."""
        self.data['open_ports'] = ports
    
    def analyze_security_posture(self):
        """Analyze collected data for security insights."""
        issues = []
        
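        # Check for missing SPF/DMARC, but only when the relevant TXT data was
        # actually collected (an absent key means "not queried", not "missing")
        txt_records = self.data['dns_records'].get('TXT')
        if txt_records is not None and not any('v=spf1' in r.lower() for r in txt_records):
            issues.append({
                'severity': 'Medium',
                'category': 'Email Security',
                'issue': 'Missing SPF record',
                'impact': 'Domain vulnerable to email spoofing'
            })
        
        # DMARC is published at _dmarc.<domain>; this assumes the caller stores
        # those TXT values under a 'DMARC' key in dns_records
        dmarc_records = self.data['dns_records'].get('DMARC')
        if dmarc_records is not None and not any('v=dmarc1' in r.lower() for r in dmarc_records):
            issues.append({
                'severity': 'Medium',
                'category': 'Email Security',
                'issue': 'Missing DMARC record',
                'impact': 'No email authentication policy enforcement'
            })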
        
        # Check for development subdomains
        dev_keywords = ['dev', 'staging', 'test', 'qa', 'uat']
        dev_subdomains = [s for s in self.data['subdomains'] 
                          if any(keyword in s.lower() for keyword in dev_keywords)]
        if dev_subdomains:
            issues.append({
                'severity': 'High',
                'category': 'Attack Surface',
                'issue': f'{len(dev_subdomains)} development environments exposed',
                'impact': 'Development systems often have weaker security controls'
            })
        
        # Check for administrative subdomains
        admin_keywords = ['admin', 'cpanel', 'portal', 'dashboard']
        admin_subdomains = [s for s in self.data['subdomains'] 
                            if any(keyword in s.lower() for keyword in admin_keywords)]
        if admin_subdomains:
            issues.append({
                'severity': 'High',
                'category': 'Attack Surface',
                'issue': f'{len(admin_subdomains)} administrative interfaces exposed',
                'impact': 'High-value targets for credential attacks'
            })
        
        # Check for exposed databases
        db_ports = [3306, 5432, 27017, 6379, 1433]
        exposed_dbs = [p for p, s in self.data['open_ports'] if p in db_ports]
        if exposed_dbs:
            issues.append({
                'severity': 'Critical',
                'category': 'Network Security',
                'issue': f'Database ports exposed: {exposed_dbs}',
                'impact': 'Direct database access may be possible'
            })
        
        self.data['security_issues'] = issues
        return issues
    
    def generate_report(self):
        """Generate comprehensive reconnaissance report."""
        print("\n" + "=" * 70)
        print(f"RECONNAISSANCE REPORT: {self.target}")
        print("=" * 70)
        print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        
        # Executive Summary
        print("\n" + "=" * 70)
        print("EXECUTIVE SUMMARY")
        print("=" * 70)
        print(f"Target Domain: {self.target}")
        print(f"Subdomains Discovered: {len(self.data['subdomains'])}")
        print(f"Open Ports Found: {len(self.data['open_ports'])}")
        print(f"Security Issues Identified: {len(self.data['security_issues'])}")
        
        # DNS Records
        print("\n" + "=" * 70)
        print("DNS RECORDS")
        print("=" * 70)
        for record_type, values in self.data['dns_records'].items():
            if values and not values[0].startswith('Error'):
                print(f"\n[{record_type}]")
                for value in values[:5]:  # Limit to 5
                    print(f"  - {value}")
        
        # Subdomains
        if self.data['subdomains']:
            print("\n" + "=" * 70)
            print("SUBDOMAINS")
            print("=" * 70)
            for subdomain in self.data['subdomains'][:20]:  # Limit to 20
                print(f"  - {subdomain}")
            if len(self.data['subdomains']) > 20:
                print(f"  ... and {len(self.data['subdomains']) - 20} more")
        
        # Open Ports
        if self.data['open_ports']:
            print("\n" + "=" * 70)
            print("OPEN PORTS")
            print("=" * 70)
            for port, service in self.data['open_ports']:
                print(f"  {port:5d}/tcp  {service:15s}  OPEN")
        
        # Security Issues
        if self.data['security_issues']:
            print("\n" + "=" * 70)
            print("SECURITY FINDINGS")
            print("=" * 70)
            
            severity_order = {'Critical': 0, 'High': 1, 'Medium': 2, 'Low': 3}
            sorted_issues = sorted(self.data['security_issues'], 
                                   key=lambda x: severity_order.get(x['severity'], 99))
            
            for issue in sorted_issues:
                print(f"\n[{issue['severity']}] {issue['category']}")
                print(f"  Issue: {issue['issue']}")
                print(f"  Impact: {issue['impact']}")
        
        print("\n" + "=" * 70)
        print("END OF REPORT")
        print("=" * 70)

# Example: Generate consolidated report
print("[*] Generating consolidated reconnaissance report...\n")

consolidator = ReconConsolidator('example.com')

# Simulate adding data from previous exercises
consolidator.add_dns_data({
    'A': ['93.184.216.34'],
    'MX': ['mail.example.com'],
    'NS': ['ns1.example.com', 'ns2.example.com']
})

consolidator.add_subdomains([
    'www.example.com',
    'api.example.com',
    'dev.example.com',
    'admin.example.com'
])

consolidator.add_ports([
    (80, 'HTTP'),
    (443, 'HTTPS')
])

# Analyze and generate report
consolidator.analyze_security_posture()
consolidator.generate_report()

## Key Takeaways

1. **Passive reconnaissance is hard to detect** - Certificate transparency and WHOIS lookups go to third parties and never touch the target; DNS queries can still reach the target's authoritative name servers, where they may be logged
2. **Certificate transparency logs reveal subdomains** - Hostnames on publicly trusted certificates end up in public, searchable logs
3. **Active reconnaissance can trigger alerts** - Port scanning and banner grabbing generate traffic that IDS/IPS are built to detect
4. **Consolidation reveals patterns** - Combining data sources provides comprehensive attack surface view
5. **Development environments are high-risk** - Exposed dev/staging systems often have weaker security
6. **Authorization is mandatory** - Only scan systems you own or have written permission to test

## Defense Strategies

1. **Attack Surface Management (ASM)** - Continuously discover and monitor internet-facing assets
2. **Certificate Transparency Monitoring** - Alert on new certificates for your domains
3. **IDS/IPS Deployment** - Detect and block active scanning attempts
4. **Honeypots & Canaries** - Deploy decoy systems to detect reconnaissance
5. **Minimize Public Information** - Limit data in WHOIS, social media, job postings
6. **SPF/DMARC/DKIM** - Prevent email spoofing through proper DNS configuration
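
To make the IDS/IPS item concrete, the sketch below shows one simple heuristic for spotting a port scan in connection logs: flag any source that touches many distinct ports within a short time window. The event format, the function name `find_scanners`, and the default thresholds are assumptions chosen for illustration; production systems use more signals than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (timestamp in seconds, source IP, destination port); a hypothetical log format
Event = Tuple[float, str, int]


def find_scanners(events: Iterable[Event], window: float = 10.0,
                  port_threshold: int = 10) -> Dict[str, int]:
    """Return sources that hit at least port_threshold distinct ports within one window."""
    by_source: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for timestamp, source, port in events:
        by_source[source].append((timestamp, port))

    flagged: Dict[str, int] = {}
    for source, hits in by_source.items():
        hits.sort()
        start = 0
        best = 0
        for end in range(len(hits)):
            # Slide the window start forward until it spans at most `window` seconds
            while hits[end][0] - hits[start][0] > window:
                start += 1
            distinct = len({port for _, port in hits[start:end + 1]})
            best = max(best, distinct)
        if best >= port_threshold:
            flagged[source] = best
    return flagged


# Synthetic events using documentation-only addresses
events = [(i * 0.1, "198.51.100.7", 20 + i) for i in range(14)]   # fast sweep
events += [(i * 1.0, "203.0.113.5", 443) for i in range(20)]      # ordinary client
print(find_scanners(events))
```

The same heuristic explains why attackers slow their scans down: spreading probes over hours keeps the per-window port count under the threshold.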

## Real-World Impact

- **Target (2013)**: Third-party vendor reconnaissance → $292M breach
- **SolarWinds (2020)**: Multi-year reconnaissance → trojanized Orion update delivered to roughly 18,000 organizations
- **OPM (2015)**: Network topology mapping → 21.5M background investigation records stolen
- **Colonial Pipeline (2021)**: Exposed credentials → $4.4M ransom + $2.3B economic impact

## Next Steps

1. Practice on authorized platforms: HackTheBox, TryHackMe, Offensive Security Labs
2. Learn advanced tools: Nmap, Amass, Maltego, SpiderFoot
3. Study MITRE ATT&CK TA0043 (Reconnaissance tactics)
4. Implement attack surface monitoring for your organization
5. Deploy honeypots to detect reconnaissance attempts

## Resources

- MITRE ATT&CK TA0043: https://attack.mitre.org/tactics/TA0043/
- PTES: http://www.pentest-standard.org/
- NIST SP 800-115: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-115.pdf
- OWASP Testing Guide: https://owasp.org/www-project-web-security-testing-guide/
- Shodan: https://www.shodan.io/
- Censys: https://censys.io/
- crt.sh: https://crt.sh/