# 🍺🥸 Moe vs Barney: Modern Web Scraping Anti-Bot Evasion

> *"Writing custom headers is so 2015. If you want your scraper to blend in today, you don't need to build the disguise from scratch. You need a full profile."*


This notebook compares two approaches to web scraping in 2024: a basic `requests` session with hand-written headers, and `curl_cffi` with full browser fingerprint impersonation.


In [6]:
pip install curl_cffi requests fake-useragent rich

Note: you may need to restart the kernel to use updated packages.


In [9]:
# Install required packages (uncomment if needed)
# !pip install curl_cffi requests fake-useragent rich

# Import all necessary libraries
import requests
from curl_cffi import requests as curl_requests
import time
import json
from fake_useragent import UserAgent
from IPython.display import display, HTML, Markdown
import warnings
warnings.filterwarnings('ignore')

print("🔧 Libraries imported successfully!")
print("Ready to demonstrate the difference between basic and advanced scraping...")


🔧 Libraries imported successfully!
Ready to demonstrate the difference between basic and advanced scraping...


## 🍺 Round 1: Basic Barney (The Old Way)

Let's see how Barney approaches web scraping with basic static headers - the way most people did it in 2015...


In [11]:
class BarneyBasicScraper:
    """Basic scraper that Moe easily detects and kicks out"""
    
    def __init__(self):
        # Old school: just slap on some headers and hope for the best
        ua = UserAgent()
        self.headers = {
            'User-Agent': ua.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        }
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        
        print("🍺 Basic Barney initialized with static headers")
        print(f"User-Agent: {self.headers['User-Agent'][:50]}...")
    
    def test_fingerprint_detection(self):
        """Test what servers can detect about our scraper"""
        print("\n🔍 Testing fingerprint detection...")
        
        try:
            # Test TLS fingerprint
            response = self.session.get("https://tls.browserleaks.com/json", timeout=10)
            
            if response.status_code == 200:
                data = response.json()
                print(f"✅ Request successful!")
                print(f"📊 Server detected:")
                print(f"   JA3 Hash: {data.get('ja3_hash', 'Unknown')}")
                print(f"   User Agent: {data.get('user_agent', 'Unknown')[:60]}...")
                
                return data
            else:
                print(f"❌ Request failed: {response.status_code}")
                return None
                
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            return None
    
    def test_headers(self):
        """See what headers the server actually receives"""
        print("\n🔍 Testing header detection...")
        
        try:
            response = self.session.get("https://httpbin.org/headers", timeout=10)
            data = response.json()
            
            print("📋 Headers as seen by server:")
            for key, value in data.get('headers', {}).items():
                if key != 'X-Amzn-Trace-Id':  # Skip AWS trace ID
                    print(f"   {key}: {value}")
                    
            return data
            
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            return None

# Create and test basic scraper
print("=" * 60)
print("🍺 BASIC BARNEY DEMONSTRATION")
print("=" * 60)

basic_scraper = BarneyBasicScraper()
basic_fp_result = basic_scraper.test_fingerprint_detection()
basic_header_result = basic_scraper.test_headers()


🍺 BASIC BARNEY DEMONSTRATION
🍺 Basic Barney initialized with static headers
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Ap...

🔍 Testing fingerprint detection...
✅ Request successful!
📊 Server detected:
   JA3 Hash: 7291ea5e449f2c7b17582541703e549d
   User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/...

🔍 Testing header detection...
📋 Headers as seen by server:
   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
   Accept-Encoding: gzip, deflate
   Accept-Language: en-US,en;q=0.5
   Host: httpbin.org
   Upgrade-Insecure-Requests: 1
   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3.1 Safari/605.1.15


### 🚫 Why Basic Barney Gets Caught

Look at the results above! Here's what gives Barney away:

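1. **TLS Fingerprint Mismatch**: The JA3 hash comes from Python's TLS stack, so it doesn't match what the browser named in the User-Agent would send
2. **Static Headers**: Headers are hardcoded once and don't vary by request type the way a real browser's do
3. **Missing Browser Signals**: No client hint or fetch metadata headers such as `Sec-Ch-Ua` and `Sec-Fetch-Mode`, which Chromium-based browsers send by default
4. **Predictable Patterns**: Every request carries the same headers in the same order, and that order is the HTTP library's, not a browser's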

Modern anti-bot systems can spot this from a mile away! 🕵️‍♂️


## 🥸 Round 2: Disguised Barney (The Modern Way)

Now watch Barney come back with a full disguise using `curl_cffi` - complete browser fingerprint impersonation!


In [14]:
class BarneyDisguisedScraper:
    """Advanced scraper with full browser fingerprint impersonation"""
    
    def __init__(self, browser_version="chrome120"):
        self.browser_version = browser_version
        self.session = curl_requests.Session()
        
        print(f"🥸 Disguised Barney initialized")
        print(f"🎭 Impersonating: {browser_version}")
        print("✨ curl_cffi automatically handles:")
        print("   • TLS fingerprint matching")
        print("   • HTTP/2 settings")
        print("   • Header order and casing")
        print("   • Cipher suites")
    
    def test_fingerprint_detection(self):
        """Test fingerprint detection with proper disguise"""
        print("\n🔍 Testing disguised fingerprint...")
        
        try:
            # The magic: impersonate parameter
            response = self.session.get(
                "https://tls.browserleaks.com/json", 
                impersonate=self.browser_version,
                timeout=10
            )
            
            if response.status_code == 200:
                data = response.json()
                print(f"✅ Request successful with {self.browser_version} impersonation!")
                print(f"📊 Server detected:")
                print(f"   JA3 Hash: {data.get('ja3_hash', 'Unknown')}")
                print(f"   User Agent: {data.get('user_agent', 'Unknown')[:60]}...")
                print(f"   TLS Version: {data.get('tls_version', 'Unknown')}")
                
                print(f"\n✨ Why this works:")
                print(f"   • TLS fingerprint matches real {self.browser_version}")
                print(f"   • All browser signals are consistent")
                print(f"   • No suspicious patterns detected")
                
                return data
            else:
                print(f"❌ Request failed: {response.status_code}")
                return None
                
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            return None
    
    def test_headers(self):
        """See what headers the server receives with impersonation"""
        print("\n🔍 Testing disguised headers...")
        
        try:
            response = self.session.get(
                "https://httpbin.org/headers",
                impersonate=self.browser_version,
                timeout=10
            )
            data = response.json()
            
            print("📋 Headers as seen by server:")
            headers = data.get('headers', {})
            
            # Show important browser headers
            important_headers = [
                'User-Agent', 'Accept', 'Accept-Encoding', 
                'Accept-Language', 'Sec-Ch-Ua', 'Sec-Ch-Ua-Mobile',
                'Sec-Ch-Ua-Platform', 'Sec-Fetch-Dest', 'Sec-Fetch-Mode'
            ]
            
            for header in important_headers:
                if header in headers:
                    print(f"   {header}: {headers[header]}")
            
            print(f"\n🎭 Notice the realistic browser headers!")
            print(f"   • Sec-Ch-Ua headers indicate real browser")
            print(f"   • Accept-Encoding includes 'br' (Brotli)")
            print(f"   • Proper header ordering")
                    
            return data
            
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            return None
    
    def test_browser_switching(self):
        """Demonstrate switching between different browser profiles"""
        print("\n🎭 Browser Profile Switching Demo")
        
        browsers = ["chrome120", "chrome119", "chrome116"]
        
        for browser in browsers:
            try:
                print(f"\n🔄 Testing as {browser}...")
                
                response = self.session.get(
                    "https://httpbin.org/user-agent",
                    impersonate=browser,
                    timeout=5
                )
                
                if response.status_code == 200:
                    data = response.json()
                    user_agent = data.get('user-agent', 'Unknown')
                    print(f"   ✅ Success: {user_agent[:70]}...")
                else:
                    print(f"   ❌ Failed with status {response.status_code}")
                    
            except Exception as e:
                # Unsupported profiles and network errors both land here, so show the cause
                print(f"   ⚠️ {browser} failed: {e}")
            
            time.sleep(0.5)

# Create and test disguised scraper
print("=" * 60)
print("🥸 DISGUISED BARNEY DEMONSTRATION")
print("=" * 60)

disguised_scraper = BarneyDisguisedScraper("chrome120")
disguised_fp_result = disguised_scraper.test_fingerprint_detection()
disguised_header_result = disguised_scraper.test_headers()
disguised_scraper.test_browser_switching()


🥸 DISGUISED BARNEY DEMONSTRATION
🥸 Disguised Barney initialized
🎭 Impersonating: chrome120
✨ curl_cffi automatically handles:
   • TLS fingerprint matching
   • HTTP/2 settings
   • Header order and casing
   • Cipher suites

🔍 Testing disguised fingerprint...
✅ Request successful with chrome120 impersonation!
📊 Server detected:
   JA3 Hash: bc297e7e8c6e9fbf60765007239cfedf
   User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/...
   TLS Version: Unknown

✨ Why this works:
   • TLS fingerprint matches real chrome120
   • All browser signals are consistent
   • No suspicious patterns detected

🔍 Testing disguised headers...
📋 Headers as seen by server:
   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
   Accept-Encoding: gzip, deflate, br
 

## 📊 Side-by-Side Comparison

Let's compare what we learned from both approaches:


In [15]:
# Create comparison table
print("=" * 80)
print("📊 DETAILED COMPARISON: BASIC vs DISGUISED")
print("=" * 80)

print("\n🍺 BASIC BARNEY (Old School)")
print("-" * 40)
if basic_fp_result:
    print(f"JA3 Hash: {basic_fp_result.get('ja3_hash', 'Unknown')}")
    print(f"User Agent: {basic_fp_result.get('user_agent', 'Unknown')[:50]}...")
else:
    print("❌ Fingerprint test failed")

if basic_header_result:
    headers = basic_header_result.get('headers', {})
    print(f"Headers sent: {len(headers)} headers")
    print(f"Has Sec-Ch-Ua: {'Sec-Ch-Ua' in headers}")
    print(f"Has modern compression: {'br' in headers.get('Accept-Encoding', '')}")
else:
    print("❌ Header test failed")

print(f"Detection Risk: 🚫 HIGH - Easily spotted by anti-bot systems")

print("\n🥸 DISGUISED BARNEY (Modern)")
print("-" * 40)
if disguised_fp_result:
    print(f"JA3 Hash: {disguised_fp_result.get('ja3_hash', 'Unknown')}")
    print(f"User Agent: {disguised_fp_result.get('user_agent', 'Unknown')[:50]}...")
else:
    print("❌ Fingerprint test failed")

if disguised_header_result:
    headers = disguised_header_result.get('headers', {})
    print(f"Headers sent: {len(headers)} headers")
    print(f"Has Sec-Ch-Ua: {'Sec-Ch-Ua' in headers}")
    print(f"Has modern compression: {'br' in headers.get('Accept-Encoding', '')}")
else:
    print("❌ Header test failed")

print(f"Detection Risk: ✅ LOW - Indistinguishable from real browser")

print("\n" + "=" * 80)
print("🎯 THE VERDICT")
print("=" * 80)
print("🍺 Basic Barney: Gets kicked out by Moe (anti-bot systems)")
print("🥸 Disguised Barney: Successfully fools Moe!")
print("\n💡 The key: Modern anti-bot evasion requires complete browser impersonation,")
print("   not just custom headers. Tools like curl_cffi make this possible.")


📊 DETAILED COMPARISON: BASIC vs DISGUISED

🍺 BASIC BARNEY (Old School)
----------------------------------------
JA3 Hash: 7291ea5e449f2c7b17582541703e549d
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Ap...
Headers sent: 7 headers
Has Sec-Ch-Ua: False
Has modern compression: False
Detection Risk: 🚫 HIGH - Easily spotted by anti-bot systems

🥸 DISGUISED BARNEY (Modern)
----------------------------------------
JA3 Hash: bc297e7e8c6e9fbf60765007239cfedf
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Ap...
Headers sent: 14 headers
Has Sec-Ch-Ua: True
Has modern compression: True
Detection Risk: ✅ LOW - Indistinguishable from real browser

🎯 THE VERDICT
🍺 Basic Barney: Gets kicked out by Moe (anti-bot systems)
🥸 Disguised Barney: Successfully fools Moe!

💡 The key: Modern anti-bot evasion requires complete browser impersonation,
   not just custom headers. Tools like curl_cffi make this possible.
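
The comparison above only counts headers. A small helper can also list which header names differ between the two runs. This is a minimal sketch: `compare_headers` is a hypothetical helper, and the two sample dicts are trimmed, illustrative stand-ins (the `Sec-Ch-Ua` value is a placeholder, not real output).

```python
def compare_headers(basic, disguised):
    """Summarise how two header dicts differ, ignoring header-name casing."""
    basic_lower = {name.lower(): value for name, value in basic.items()}
    disguised_lower = {name.lower(): value for name, value in disguised.items()}
    shared = basic_lower.keys() & disguised_lower.keys()
    return {
        "only_in_basic": sorted(basic_lower.keys() - disguised_lower.keys()),
        "only_in_disguised": sorted(disguised_lower.keys() - basic_lower.keys()),
        "changed": sorted(n for n in shared if basic_lower[n] != disguised_lower[n]),
    }


# Illustrative stand-ins for the header dicts returned by httpbin
basic_sample = {
    "Accept-Encoding": "gzip, deflate",
    "Upgrade-Insecure-Requests": "1",
}
disguised_sample = {
    "Accept-Encoding": "gzip, deflate, br",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Ch-Ua": "<placeholder value>",
}

diff = compare_headers(basic_sample, disguised_sample)
for label, names in diff.items():
    print(f"{label}: {names}")
```

In the notebook itself, the same call should work on `basic_header_result["headers"]` and `disguised_header_result["headers"]`, provided both requests succeeded.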


## 🔬 Technical Deep Dive

### What makes curl_cffi so effective?

**TLS Fingerprinting**: curl_cffi reproduces the TLS ClientHello of a real browser, including:
- Cipher suite list and order
- TLS extensions and their order
- Supported groups (elliptic curves) and signature algorithms
- ALPN protocol list
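
To make the JA3 hashes printed earlier less mysterious, here is a rough sketch of how a JA3 value is derived: five ClientHello fields are joined into one string, which is then MD5-hashed. The numeric values below are hypothetical, picked only for illustration, and the sketch skips the GREASE filtering that real JA3 implementations apply.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Join the five ClientHello fields into a JA3 string.

    Fields are separated by commas; values inside a field by dashes.
    """
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(g) for g in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(ja3):
    """MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical ClientHello values: same ciphers, different order
hello_a = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
hello_b = ja3_string(771, [4866, 4865, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(hello_a)
print(ja3_hash(hello_a))
print(ja3_hash(hello_a) == ja3_hash(hello_b))  # False: order is part of the fingerprint
```

Because order is part of the string, a client that offers the same ciphers in a different order gets a different hash. That is why changing the User-Agent alone never changes the JA3 value.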

**HTTP/2 Settings**: Real browsers use HTTP/2 with browser-specific parameters:
- Window sizes (SETTINGS values and the initial WINDOW_UPDATE)
- Priority frames
- Header compression (HPACK) table size
- Server push setting (`SETTINGS_ENABLE_PUSH`)
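
These values can be flattened into a single comparable string, much like JA3 does for TLS. The sketch below follows the `SETTINGS|WINDOW_UPDATE|PRIORITY|pseudo-header order` layout commonly used for passive HTTP/2 fingerprinting. Treat both the layout and the numbers as assumptions for illustration: `http2_fingerprint` is a hypothetical helper, not part of curl_cffi.

```python
def http2_fingerprint(settings, window_update, priorities, pseudo_header_order):
    """Build a pipe-separated HTTP/2 fingerprint string.

    settings: list of (setting_id, value) pairs in the order the client sent them
    window_update: connection-level WINDOW_UPDATE increment
    priorities: list of already-formatted PRIORITY frame strings (may be empty)
    pseudo_header_order: e.g. [":method", ":authority", ":scheme", ":path"]
    """
    settings_part = ";".join(f"{setting_id}:{value}" for setting_id, value in settings)
    priority_part = ",".join(priorities) if priorities else "0"
    # Each pseudo-header is abbreviated to its first letter: m, a, s, p
    pseudo_part = ",".join(name.lstrip(":")[0] for name in pseudo_header_order)
    return "|".join([settings_part, str(window_update), priority_part, pseudo_part])


# Illustrative values only, not captured from a real browser
fingerprint = http2_fingerprint(
    settings=[(1, 65536), (2, 0), (4, 6291456), (6, 262144)],
    window_update=15663105,
    priorities=[],
    pseudo_header_order=[":method", ":authority", ":scheme", ":path"],
)
print(fingerprint)
```

Two clients that send the same headers but different SETTINGS values or pseudo-header order produce different strings, which is the signal an anti-bot system can compare against the claimed browser.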

**Header Consistency**: Not just the content, but the **order** and **casing** of headers matter:
- Real browsers send headers in predictable orders
- Header values include browser-specific details
- Client hint (`Sec-Ch-Ua*`) and fetch metadata (`Sec-Fetch-*`) headers are automatically included
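
A header-order check can be very simple. The sketch below tests whether the headers a client sent appear in the same relative order as a reference profile. The reference order here is hypothetical. A real system would keep one per browser version.

```python
def order_matches(received, expected):
    """True if headers present in both lists appear in the same relative order.

    Comparison is case-insensitive; headers missing from `expected` are ignored.
    """
    expected_lower = [name.lower() for name in expected]
    seen = [name.lower() for name in received if name.lower() in expected_lower]
    ranks = [expected_lower.index(name) for name in seen]
    return ranks == sorted(ranks)


# Hypothetical reference order for one browser profile
reference = ["User-Agent", "Accept", "Accept-Encoding", "Accept-Language"]

browser_like = ["Host", "User-Agent", "Accept", "Accept-Encoding", "Accept-Language"]
reordered = ["Accept", "Accept-Encoding", "Accept-Language", "Host", "User-Agent"]

print(order_matches(browser_like, reference))  # True
print(order_matches(reordered, reference))     # False
```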

### Why static headers fail in 2024

1. **Fingerprint Mismatch**: Your claimed User-Agent doesn't match your TLS fingerprint
2. **Missing Signals**: No client hint or fetch metadata headers (`Sec-Ch-Ua`, `Sec-Fetch-*`)
3. **Wrong Protocol**: Using HTTP/1.1 when browsers prefer HTTP/2
4. **Suspicious Patterns**: Header order that matches an HTTP library rather than a browser
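
Put together, Moe's side of the bar might look something like the sketch below. It is a toy heuristic, assuming the server has already reduced each request to a few observed attributes. The `ObservedRequest` fields and the family labels are hypothetical. Real anti-bot systems weigh many more signals.

```python
from dataclasses import dataclass


@dataclass
class ObservedRequest:
    claimed_browser: str         # family parsed from the User-Agent, e.g. "chrome"
    tls_fingerprint_family: str  # family the TLS fingerprint maps to
    http_version: str            # "1.1" or "2"
    has_client_hints: bool       # whether Sec-Ch-Ua was present


def suspicious_signals(req):
    """Return the reasons a request looks automated (empty list if none)."""
    reasons = []
    if req.claimed_browser != req.tls_fingerprint_family:
        reasons.append("fingerprint mismatch")
    if req.claimed_browser == "chrome" and not req.has_client_hints:
        reasons.append("missing client hints")
    if req.http_version == "1.1":
        reasons.append("HTTP/1.1 instead of HTTP/2")
    return reasons


basic = ObservedRequest("chrome", "python-requests", "1.1", False)
disguised = ObservedRequest("chrome", "chrome", "2", True)

print(suspicious_signals(basic))      # three reasons
print(suspicious_signals(disguised))  # []
```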


## 🛠️ Practical Examples

Here are some real-world examples you can use:


In [18]:
print("🔧 PRACTICAL CURL_CFFI EXAMPLES")
print("=" * 50)

# Example 1: Basic GET with impersonation
print("\n📌 Example 1: Basic GET Request")
print("Old way:")
print("  requests.get('https://api.example.com')")
print("New way:")
print("  curl_requests.get('https://api.example.com', impersonate='chrome120')")

# Example 2: POST with JSON data  
print("\n📌 Example 2: POST with JSON")
print("curl_requests.post(")
print("    'https://api.example.com/data',")
print("    json={'key': 'value'},") 
print("    impersonate='chrome120'")
print(")")

# Example 3: Custom headers + impersonation
print("\n📌 Example 3: Custom Headers + Impersonation")
print("curl_requests.get(")
print("    'https://api.example.com',")
print("    headers={'Authorization': 'Bearer token'},")
print("    impersonate='chrome120'")
print(")")

# Example 4: Session with cookies
print("\n📌 Example 4: Session Management")
print("session = curl_requests.Session()")
print("response = session.get(url, impersonate='chrome120')")
print("# Cookies automatically maintained!")

# Quick live demo
print("\n🚀 LIVE DEMO: Quick Test")
try:
    response = curl_requests.get(
        'https://httpbin.org/user-agent',
        impersonate='chrome120',
        timeout=5
    )
    
    if response.status_code == 200:
        data = response.json()
        print(f"✅ Success! Server sees: {data.get('user-agent', 'Unknown')[:60]}...")
    else:
        print(f"❌ Failed: {response.status_code}")
        
except Exception as e:
    print(f"❌ Error: {e}")

print("\n💡 Remember: Just add impersonate='chrome120' to any request!")


🔧 PRACTICAL CURL_CFFI EXAMPLES

📌 Example 1: Basic GET Request
Old way:
  requests.get('https://api.example.com')
New way:
  curl_requests.get('https://api.example.com', impersonate='chrome120')

📌 Example 2: POST with JSON
curl_requests.post(
    'https://api.example.com/data',
    json={'key': 'value'},
    impersonate='chrome120'
)

📌 Example 3: Custom Headers + Impersonation
curl_requests.get(
    'https://api.example.com',
    headers={'Authorization': 'Bearer token'},
    impersonate='chrome120'
)

📌 Example 4: Session Management
session = curl_requests.Session()
response = session.get(url, impersonate='chrome120')
# Cookies automatically maintained!

🚀 LIVE DEMO: Quick Test
✅ Success! Server sees: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/...

💡 Remember: Just add impersonate='chrome120' to any request!
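
Not every `curl_cffi` release supports every profile name, so it can help to fall back through a list, as the browser switching demo does. Here is a minimal sketch. `first_working_profile` is a hypothetical helper, and `fake_fetch` is a stand-in so the example runs without network access.

```python
def first_working_profile(fetch, url, profiles):
    """Try each profile in order and return (profile, response) for the first success.

    `fetch` is any callable accepting (url, impersonate=...).
    """
    errors = {}
    for profile in profiles:
        try:
            return profile, fetch(url, impersonate=profile)
        except Exception as exc:  # broad on purpose: unsupported profile or network error
            errors[profile] = exc
    raise RuntimeError(f"No profile worked: {errors}")


def fake_fetch(url, impersonate):
    """Stand-in for a real request function; only accepts one profile."""
    if impersonate != "chrome116":
        raise ValueError(f"unsupported profile: {impersonate}")
    return f"fetched {url} as {impersonate}"


profile, result = first_working_profile(
    fake_fetch, "https://example.com", ["chrome120", "chrome119", "chrome116"]
)
print(profile)
print(result)
```

In the notebook, passing `curl_requests.get` in place of `fake_fetch` should work the same way, since it accepts the same `impersonate` keyword used throughout the examples above.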


## 🎉 Conclusion

### The Key Takeaway

**"Writing custom headers is so 2015."**

Modern anti-bot systems are sophisticated. They don't just look at headers - they analyze:
- TLS fingerprints
- HTTP/2 behavior  
- Header consistency
- Request patterns
- Browser-specific signals

### The Solution

Tools like `curl_cffi` solve this by impersonating a browser at the network level (TLS handshake, HTTP/2 settings, and headers), not just spoofing headers.

### What's Next?

- Try the examples above with your own projects
- Experiment with different browser profiles  
- Remember: the goal isn't to outsmart Moe, but to become someone Moe doesn't recognize

---

*"The key isn't to outsmart Moe – it's to become someone Moe doesn't recognize."*

**🔗 Want to learn more?** Check out the other files in this repository for more detailed examples and explanations!
