# Shopify Subdomain Takeover Scanner

**Total time:** ~5-6 hours for ~10,000 domains

**What this does:**
- Scans domains for Shopify CNAME records
- Detects HTTP 403/404 status (potential takeover indicators)
- Real-time progress display
- Exports results to CSV
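
The first two checks in the list above can be combined into a single predicate. This is a minimal sketch, not the logic `scan.py` actually uses: the function name `is_takeover_candidate`, the `SHOPIFY_SUFFIXES` tuple, and the input shape (a list of CNAME targets plus a status code) are assumptions made for illustration.

```python
# Hypothetical helper; scan.py may apply different or additional rules.
SHOPIFY_SUFFIXES = ("myshopify.com",)  # assumed CNAME target suffix
SUSPICIOUS_STATUSES = {403, 404}       # the statuses named in the list above


def is_takeover_candidate(cname_chain, http_status):
    """Return True if any CNAME target is under a Shopify suffix
    and the HTTP status is 403 or 404."""
    def points_to_shopify(target):
        # Normalize a trailing dot and case before comparing.
        name = target.rstrip(".").lower()
        return any(name == s or name.endswith("." + s) for s in SHOPIFY_SUFFIXES)

    has_shopify_cname = any(points_to_shopify(t) for t in cname_chain)
    return has_shopify_cname and http_status in SUSPICIOUS_STATUSES


print(is_takeover_candidate(["shops.myshopify.com."], 404))  # True
print(is_takeover_candidate(["shops.myshopify.com."], 200))  # False
print(is_takeover_candidate(["example.net"], 404))           # False
```

A match only flags a domain for manual review; a 403 or 404 behind a Shopify CNAME is an indicator, not proof of a takeover.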

**Instructions:** Run cells 1-12 in order

## Cell 1: Clone Project from GitHub

Clones the project repository into the Kaggle workspace.

In [None]:
%%bash
cd /kaggle/working

echo "=========================================="
echo "Cloning Project from GitHub"
echo "=========================================="

# Remove old clone if exists
rm -rf subdomain-playground 2>/dev/null || true

# Clone fresh copy
git clone https://github.com/sayihhamza/subdomain-playground.git

cd subdomain-playground

echo ""
echo "✓ Project cloned successfully!"
echo ""
echo "Project structure:"
ls -lh | head -20

## Cell 2: Verify Project Files

In [None]:
%%bash
cd /kaggle/working/subdomain-playground

echo "=========================================="
echo "Verifying Project Files"
echo "=========================================="

echo ""
echo "Essential files:"
ls -lh scan.py requirements.txt 2>/dev/null

echo ""
echo "Directories:"
ls -d */ 2>/dev/null

echo ""
echo "CSV files:"
if [ -d "data/domain_sources/myleadfox" ]; then
    CSV_COUNT=$(ls data/domain_sources/myleadfox/*.csv 2>/dev/null | wc -l)
    echo "✓ Found $CSV_COUNT CSV files in data/domain_sources/myleadfox/"
    ls -lh data/domain_sources/myleadfox/*.csv 2>/dev/null | head -5
else
    echo "✗ CSV directory not found"
fi

echo ""
echo "✓ Project structure verified!"

## Cell 3: Install Go 1.22

Kaggle has Go 1.18, but we need Go 1.22 to compile the security tools.

In [None]:
%%bash
set -e

echo "=========================================="
echo "Installing Go 1.22"
echo "=========================================="

# Check current Go version
echo "Current Go version:"
go version 2>/dev/null || echo "Go not found"

echo ""
echo "Installing Go 1.22.3..."

# Remove old Go installations
sudo rm -rf /usr/lib/go* 2>/dev/null || true
sudo rm -rf /usr/local/go 2>/dev/null || true

# Download Go 1.22.3
echo "Downloading Go 1.22.3 for Linux AMD64..."
wget -q https://go.dev/dl/go1.22.3.linux-amd64.tar.gz -O /tmp/go.tar.gz

# Install Go
echo "Installing to /usr/local/go..."
sudo tar -C /usr/local -xzf /tmp/go.tar.gz

# Cleanup
rm /tmp/go.tar.gz

echo ""
echo "✓ Go 1.22.3 installed successfully!"
echo ""
echo "New Go version:"
/usr/local/go/bin/go version

## Cell 4: Build Security Tools from Source

Compiles httpx, dnsx, and subzy from source. This takes 2-3 minutes.

These tools are required for:
- **httpx**: HTTP probing and status checking
- **dnsx**: DNS resolution and CNAME chain tracking
- **subzy**: Subdomain takeover detection
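
To illustrate what "CNAME chain tracking" means, the sketch below walks a chain over an in-memory mapping instead of issuing real DNS queries. It is not how dnsx is implemented; `follow_cname_chain`, the `cname_records` mapping, and the `max_depth` limit are hypothetical names chosen for this example.

```python
def follow_cname_chain(name, cname_records, max_depth=10):
    """Follow CNAME records starting at `name` and return the list of targets.

    `cname_records` is a hypothetical mapping of name -> CNAME target that
    stands in for real DNS lookups.
    """
    chain = []
    seen = {name}
    current = name
    while current in cname_records and len(chain) < max_depth:
        current = cname_records[current]
        chain.append(current)
        if current in seen:  # the chain loops back on itself; stop here
            break
        seen.add(current)
    return chain


records = {
    "shop.example.com": "store.example.net",
    "store.example.net": "shops.myshopify.com",
}
print(follow_cname_chain("shop.example.com", records))
# ['store.example.net', 'shops.myshopify.com']
```

The last element of the chain is the name that ultimately serves the site, which is the one that matters when checking for a Shopify target.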

In [None]:
%%bash
export PATH=/usr/local/go/bin:$PATH
cd /kaggle/working/subdomain-playground

echo "=========================================="
echo "Building Security Tools"
echo "=========================================="
echo "This takes 2-3 minutes..."
echo ""

# Create bin directory
mkdir -p bin

echo "[1/3] Building httpx..."
GOBIN=$(pwd)/bin /usr/local/go/bin/go install -v github.com/projectdiscovery/httpx/cmd/httpx@latest

echo ""
echo "[2/3] Building dnsx..."
GOBIN=$(pwd)/bin /usr/local/go/bin/go install -v github.com/projectdiscovery/dnsx/cmd/dnsx@latest

echo ""
echo "[3/3] Building subzy..."
GOBIN=$(pwd)/bin /usr/local/go/bin/go install -v github.com/LukaSikic/subzy@latest

echo ""
echo "=========================================="
echo "Verification"
echo "=========================================="

echo ""
echo "Tool versions:"
./bin/httpx -version 2>&1 | head -1
./bin/dnsx -version 2>&1 | head -1
echo "subzy: $(./bin/subzy --help 2>&1 | head -1 || echo 'installed')"

echo ""
echo "Binary details:"
file bin/httpx
file bin/dnsx
file bin/subzy

echo ""
echo "Tool sizes:"
ls -lh bin/ | grep -E "(httpx|dnsx|subzy)"

echo ""
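# Report success only if every binary was actually produced
for tool in httpx dnsx subzy; do
    if [ ! -x "bin/$tool" ]; then
        echo "✗ bin/$tool was not built - check the errors above"
        exit 1
    fi
done
echo "✓ All tools built successfully!"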

## Cell 5: Configure Environment

Sets up environment variables for tool paths.
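
How `scan.py` consumes the `.env` file is not shown in this notebook. As a rough sketch of one way such a file could be read back and sanity-checked from Python, the helpers below parse `KEY=VALUE` lines and report entries that do not point at an executable file. The names `load_env_file` and `missing_tools` are hypothetical.

```python
import os


def load_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def missing_tools(env):
    """Return the keys whose value is not an existing executable file."""
    return [
        key for key, path in env.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]


# Only runs once the .env file from this cell exists.
if os.path.exists(".env"):
    print("Missing tools:", missing_tools(load_env_file(".env")))
```

An empty list means every configured path resolves to an executable.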

In [None]:
%%bash
cd /kaggle/working/subdomain-playground

echo "Creating .env file..."

cat > .env << 'EOF'
DNSX_PATH=/kaggle/working/subdomain-playground/bin/dnsx
HTTPX_PATH=/kaggle/working/subdomain-playground/bin/httpx
SUBZY_PATH=/kaggle/working/subdomain-playground/bin/subzy
EOF

echo "✓ Environment configured"
echo ""
echo "Contents:"
cat .env

echo ""
echo "Verifying tool paths:"
for tool in dnsx httpx subzy; do
    if [ -f "bin/$tool" ]; then
        echo "✓ bin/$tool exists"
    else
        echo "✗ bin/$tool NOT FOUND"
    fi
done

## Cell 6: Extract Domains from CSV Files

Extracts unique domains from the first column of the CSV files in `data/domain_sources/myleadfox/`.
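
For readers who prefer Python to a shell pipeline, the same extraction can be sketched with the standard `csv` module. This is an illustrative equivalent, not part of the project: `extract_domains` is a hypothetical name, and like the cell below it assumes the domain sits in the first column and each file starts with a header row.

```python
import csv
import io
import re

# Same pattern the shell pipeline passes to grep -E.
DOMAIN_RE = re.compile(r"^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def extract_domains(csv_texts):
    """Collect unique, valid-looking domains from the first column of each CSV."""
    domains = set()
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        next(reader, None)  # skip this file's header row
        for row in reader:
            if row and DOMAIN_RE.match(row[0].strip()):
                domains.add(row[0].strip())
    return sorted(domains)


sample_a = 'domain,name\n"shop.example.com",A\nnot a domain,B\n'
sample_b = "domain,name\nstore.example.org,C\nshop.example.com,D\n"
print(extract_domains([sample_a, sample_b]))
# ['shop.example.com', 'store.example.org']
```

Using `csv.reader` also handles quoted fields that contain commas, which a plain `cut -d','` would split incorrectly.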

In [None]:
%%bash
cd /kaggle/working/subdomain-playground

echo "=========================================="
echo "Extracting Domains from CSV Files"
echo "=========================================="

if [ -d "data/domain_sources/myleadfox" ]; then
    CSV_COUNT=$(ls data/domain_sources/myleadfox/*.csv 2>/dev/null | wc -l)
    echo "Found $CSV_COUNT CSV files"
    echo ""

    # Extract unique domains from all CSV files
    echo "Extracting domains..."
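    # tail -q -n +2 skips the header row of every file, not just the first;
    # tr strips quotes and Windows carriage returns before validation
    tail -q -n +2 data/domain_sources/myleadfox/*.csv | \
      cut -d',' -f1 | \
      tr -d '"\r' | \
      grep -E '^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' | \
      sort -u > data/all_sources.txt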

    DOMAIN_COUNT=$(wc -l < data/all_sources.txt | tr -d ' ')
    echo "✓ Extracted $DOMAIN_COUNT unique domains"
    echo ""
    echo "Saved to: data/all_sources.txt"
    echo ""
    echo "First 10 domains:"
    head -10 data/all_sources.txt
else
    echo "✗ CSV directory not found: data/domain_sources/myleadfox/"
    echo "Please add your CSV files to this directory"
    exit 1
fi

## Cell 7: Quick Test (5 domains)

**⚠️ IMPORTANT: Watch for real-time output!**

You should see:
- Domains streaming with DNS/HTTP info
- Live progress updates
- CNAME chains and provider detection

If this test produces that output, proceed to Cell 8 for the full scan.

In [None]:
%%bash
cd /kaggle/working/subdomain-playground

echo "=========================================="
echo "Quick Test - 5 Domains"
echo "=========================================="

# Create test file with first 5 domains
head -5 data/all_sources.txt > data/test_5.txt

echo "Testing with:"
cat data/test_5.txt
echo ""
echo "=========================================="
echo "STARTING TEST SCAN"
echo "=========================================="
echo ""

# Run quick test
python scan.py -l data/test_5.txt --shopify-takeover-only --workers 2 --mode quick

echo ""
echo "=========================================="
echo "✓ Test Complete"
echo "=========================================="
echo ""
echo "If you saw real-time output above with DNS/HTTP info, proceed to Cell 8!"
echo "If you saw '0 results in 17 seconds', something is wrong - check errors above."

## Cell 8: FULL SCAN - ALL DOMAINS

⚠️ **WARNING: This takes 5-6 hours for ~10,000 domains!**

**What you'll see:**
- Real-time progress streaming
- Live DNS/HTTP information
- Progress updates every 10 domains
- ETA (estimated time to completion)

**Example output:**
```
[10/10247] 0.1% complete | Rate: 0.52 domains/sec | ETA: 5h 23m 15s
```

Kaggle sessions time out after 12 hours, so a 5-6 hour scan fits within a single session.

**Note:** You can safely let this run. Kaggle will keep the session alive.
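
The rate and ETA in a progress line of this kind follow from simple arithmetic: rate is domains completed divided by elapsed seconds, and ETA is the remaining domains divided by that rate. The sketch below shows the calculation; `format_progress` is a hypothetical helper, and the exact formula used by `scan.py` may differ (for example, it might smooth the rate over a window).

```python
def format_progress(done, total, elapsed_seconds):
    """Build a progress line from the counts and the elapsed time."""
    rate = done / elapsed_seconds if elapsed_seconds > 0 else 0.0
    remaining = total - done
    eta = int(remaining / rate) if rate > 0 else 0
    hours, rest = divmod(eta, 3600)
    minutes, seconds = divmod(rest, 60)
    percent = 100 * done / total if total else 0.0
    return (
        f"[{done}/{total}] {percent:.1f}% complete | "
        f"Rate: {rate:.2f} domains/sec | ETA: {hours}h {minutes}m {seconds}s"
    )


print(format_progress(10, 110, 20.0))
# [10/110] 9.1% complete | Rate: 0.50 domains/sec | ETA: 0h 3m 20s
```

Because the ETA is extrapolated from the average rate so far, expect it to fluctuate early in the scan and settle as more domains complete.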

In [None]:
%%bash
cd /kaggle/working/subdomain-playground

echo "=========================================="
echo "STARTING FULL SCAN"
echo "=========================================="
echo ""
echo "Total domains: $(wc -l < data/all_sources.txt)"
echo "Workers: 4"
echo "Mode: quick (passive enumeration only)"
echo "Filter: Shopify takeover candidates only"
echo ""
echo "Estimated time: 5-6 hours"
echo ""
echo "=========================================="
echo ""

# Run full scan with Shopify filter
python scan.py -l data/all_sources.txt \
    --shopify-takeover-only \
    --workers 4 \
    --mode quick

echo ""
echo "=========================================="
echo "✅ SCAN COMPLETE!"
echo "=========================================="
echo ""
echo "Results saved to:"
echo "  - data/scans/shopify_takeover_candidates.json"
echo ""
ls -lh data/scans/*.json 2>/dev/null || echo "No results file found"

## Cell 9: View Results Summary

Displays scan results with risk level breakdown and top findings.

In [None]:
import json
import os

os.chdir('/kaggle/working/subdomain-playground')

results_file = 'data/scans/shopify_takeover_candidates.json'

if os.path.exists(results_file):
    with open(results_file, 'r') as f:
        results = json.load(f)

    print("=" * 80)
    print("SHOPIFY TAKEOVER SCAN RESULTS")
    print("=" * 80)
    print(f"\nTotal candidates found: {len(results)}")

    # Count by risk level
    risk_counts = {}
    for r in results:
        risk = r.get('risk_level', 'unknown')
        risk_counts[risk] = risk_counts.get(risk, 0) + 1

    print("\nBreakdown by risk level:")
    for risk, count in sorted(risk_counts.items()):
        print(f"  {risk.upper()}: {count}")

    print("\n" + "=" * 80)
    print("TOP 10 FINDINGS (by confidence score)")
    print("=" * 80)

    sorted_results = sorted(results, key=lambda x: x.get('confidence_score', 0), reverse=True)

    for i, r in enumerate(sorted_results[:10], 1):
        print(f"\n{i}. {r.get('subdomain', 'N/A')}")
        print(f"   CNAME: {r.get('cname', 'N/A')}")
        print(f"   HTTP Status: {r.get('http_status', 'N/A')}")
        print(f"   Risk Level: {r.get('risk_level', 'N/A')}")
        print(f"   Confidence Score: {r.get('confidence_score', 0)}")
        if r.get('cname_chain'):
            print(f"   CNAME Chain: {' → '.join(r['cname_chain'][:3])}")
else:
    print(f"✗ Results file not found: {results_file}")
    print("\nMake sure Cell 8 completed successfully.")

## Cell 10: Export to CSV

Exports results to `shopify_results.csv` for easy analysis.

In [None]:
import json
import pandas as pd
import os

os.chdir('/kaggle/working/subdomain-playground')

with open('data/scans/shopify_takeover_candidates.json', 'r') as f:
    results = json.load(f)

df = pd.DataFrame(results)

# Select key columns
columns = [
    'subdomain', 'cname', 'http_status', 'risk_level', 'confidence_score',
    'cname_chain_count', 'final_cname_target', 'a_records', 'provider'
]
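df_export = df[[col for col in columns if col in df.columns]]
# Sort only if the column is present (it is absent when there are no results)
if 'confidence_score' in df_export.columns:
    df_export = df_export.sort_values('confidence_score', ascending=False)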

# Save to CSV
df_export.to_csv('shopify_results.csv', index=False)

print(f"✓ Exported {len(df_export)} results to shopify_results.csv")
print("\nPreview (top 10):")
display(df_export.head(10))

print("\nColumn descriptions:")
print("  - subdomain: Domain scanned")
print("  - cname: CNAME record pointing to Shopify")
print("  - http_status: HTTP response code (403/404 = potential takeover)")
print("  - risk_level: low, medium, high, or critical")
print("  - confidence_score: 0-100 (higher = more confident)")

## Cell 11: Filter High-Risk Only

Creates a separate CSV with only critical and high-risk findings.

In [None]:
import pandas as pd
import os

os.chdir('/kaggle/working/subdomain-playground')

df = pd.read_csv('shopify_results.csv')
df_high = df[df['risk_level'].isin(['critical', 'high'])]

print(f"High-risk findings: {len(df_high)} out of {len(df)} total")
print("")

if len(df_high) > 0:
    df_high.to_csv('shopify_high_risk.csv', index=False)
    print("✓ Saved to shopify_high_risk.csv")
    print("\nHigh-risk results:")
    display(df_high)
    
    print("\n⚠️ PRIORITY ACTIONS:")
    print("  1. Verify these findings manually")
    print("  2. Check if you own these domains")
    print("  3. Claim Shopify stores if authorized")
    print("  4. Report findings to domain owners")
else:
    print("✓ No high-risk findings detected.")
    print("\nNo findings were rated critical or high.")
    print("Any remaining findings are lower risk and can be reviewed in shopify_results.csv.")

## Cell 12: Download Results

Provides download links for all result files.

In [None]:
from IPython.display import FileLink, display
import os

os.chdir('/kaggle/working/subdomain-playground')

print("Download your results:")
print("=" * 80)
print("")

files = [
    ('shopify_results.csv', 'All Shopify takeover candidates (CSV)'),
    ('shopify_high_risk.csv', 'High-risk findings only (CSV)'),
    ('data/scans/shopify_takeover_candidates.json', 'Full results with metadata (JSON)')
]

for file_path, description in files:
    if os.path.exists(file_path):
        file_size = os.path.getsize(file_path)
        size_kb = file_size / 1024
        print(f"✓ {description}")
        print(f"  Size: {size_kb:.1f} KB")
        display(FileLink(file_path))
        print("")
    else:
        print(f"- {description} (not found)")
        print("")

print("=" * 80)
print("\n✅ SCAN COMPLETE!")
print("\nNext steps:")
print("  1. Download the CSV files above")
print("  2. Review high-risk findings first")
print("  3. Manually verify critical findings")
print("  4. Take appropriate action on confirmed vulnerabilities")
print("\n⚠️ Legal reminder: Only act on domains you own or have authorization to test.")