Transform raw server logs into AI-powered business insights, security intelligence, and SEO analysis.
Web Log Analyzer is a comprehensive Python tool that reads website server logs to generate easy-to-understand reports. It features AI-powered narratives, content classification, and sitemap analysis to provide deep insights into visitor behavior, security threats, and SEO health. It's perfect for small business owners who want professional-grade analytics without the complexity or cost of enterprise solutions.
⚡ Essential 5-Minute Setup:
1. Copy the configuration file:

   ```bash
   cp config_template.py config.py  # Create your own config from the template
   ```

2. Edit these MUST-CHANGE settings in `config.py`:

   ```python
   # Your website's developer/admin IPs (REQUIRED - replace with your IP)
   DEVELOPER_IPS = ['YOUR.IP.ADDRESS.HERE']  # Find your IP at whatismyip.com

   # Your website domain (Recommended for sitemap & SEO analysis)
   SITEMAP_DOMAIN = "yourdomain.com"

   # Enrichment API Keys (Optional but recommended for rich insights)
   IPINFO_TOKEN = "your_token_here"  # Free at ipinfo.io
   ABUSEIPDB_KEY = "your_key_here"   # Free at abuseipdb.com

   # AI Narrative Generation (Optional - requires API key)
   AI_PROVIDER = "gemini"  # or "claude"
   GOOGLE_AI_API_KEY = "your_gemini_api_key"
   # ANTHROPIC_API_KEY = "your_claude_api_key"
   ```

3. Put your log files in the `input/` directory.

4. Run the analysis:

   ```bash
   # Generate standard unified reports (recommended)
   python weblog_analyzer.py

   # For more options, like forcing a refresh of all data:
   python weblog_analyzer.py --force
   python weblog_analyzer.py --help
   ```

5. View your reports in the `output/` directory:
   - `output/website_analytics_report_business.md` - For business insights
   - `output/website_analytics_report_security.md` - For security analysis
   - `output/index.html` - A dashboard linking to all reports
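A quick way to confirm that the placeholder in `DEVELOPER_IPS` was actually replaced is a small pre-flight check. This is a minimal sketch, not part of the tool: the helper name `check_developer_ips` is hypothetical, and it assumes `DEVELOPER_IPS` holds plain IP address strings as in the template above. It uses only Python's standard `ipaddress` module.

```python
import ipaddress

PLACEHOLDER = "YOUR.IP.ADDRESS.HERE"  # value shipped in config_template.py


def check_developer_ips(developer_ips):
    """Return a list of problems found in a DEVELOPER_IPS setting (hypothetical helper)."""
    problems = []
    if not developer_ips:
        problems.append("DEVELOPER_IPS is empty; your own visits will be counted")
    for entry in developer_ips:
        if entry == PLACEHOLDER:
            problems.append("DEVELOPER_IPS still contains the template placeholder")
            continue
        try:
            ipaddress.ip_address(entry)  # raises ValueError for anything that isn't an IPv4/IPv6 address
        except ValueError:
            problems.append(f"{entry!r} is not a valid IPv4/IPv6 address")
    return problems


if __name__ == "__main__":
    # Example: one real address, one left-over placeholder
    for problem in check_developer_ips(["203.0.113.7", "YOUR.IP.ADDRESS.HERE"]):
        print(problem)
```

From the project directory, `from config import DEVELOPER_IPS` followed by `check_developer_ips(DEVELOPER_IPS)` should return an empty list when the setting looks sane.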
🎯 Critical Settings to Change:
- `DEVELOPER_IPS`: Replace `'YOUR.IP.ADDRESS.HERE'` with your actual IP address to exclude your own visits from analytics.
- `SITEMAP_DOMAIN`: Add your domain for sitemap and SEO analysis.
- API Keys: Add API keys for AI summaries, geolocation, and threat intelligence (the enrichment services offer free tiers).
- IP Enrichment: On first run, the tool automatically downloads a local IP database (`.mmdb`) to `input/IPinfo/` to reduce API calls.
That's it! Everything else has sensible defaults.
- See how many real people (not bots) visit your site
- Discover which pages are most popular with actual customers
- Understand where your visitors come from geographically
- Track growth trends over time
- Get insights typically only available with expensive analytics tools
- Identify security threats and attack patterns before they become problems
- Analyze bot traffic and distinguish between helpful and harmful bots
- Get specific recommendations for improving site security
- Monitor site health and technical performance
- Understand visitor device preferences (mobile vs desktop)
- See which content drives the most engagement
- Understand visitor behavior patterns
- Identify your most loyal visitors (return customers)
- Track referral sources and marketing campaign effectiveness
Unlike basic log analyzers, this tool enriches your data with:
- Geographic insights - See where your visitors are located (approximate, based on IP address)
- Threat intelligence - Know which IPs are potentially dangerous
- Network analysis - Understand if visitors are on residential, business, or hosting networks
- Device breakdown - Mobile vs desktop usage patterns
Two separate reports tailored for different audiences:
- Business Report: Easy-to-read insights for owners and marketers
- Security Report: Technical details for developers and IT teams
Sophisticated bot classification that separates:
- Beneficial bots (Google, Bing search crawlers)
- Neutral bots (SEO tools, monitoring services)
- Suspicious bots (potential scrapers or attackers)
- Malicious tools (known attack frameworks)
- Compare this month to previous months automatically
- Identify growth patterns and seasonal trends
- Get warnings when unusual activity occurs
- Build a picture of your site's growth over time
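To make the month-over-month idea concrete, here is a minimal sketch of how such a comparison could work. It is not the tool's `trend_manager.py`: it assumes monthly human-visitor totals keyed by `"YYYY-MM"` strings, and the 1.5x "unusual activity" threshold and 6-month window (mirroring `UNIFIED_HISTORICAL_MONTHS = 6`) are illustrative choices.

```python
def month_over_month(counts):
    """Percent change of each month versus the previous month (hypothetical helper).

    `counts` maps "YYYY-MM" strings to visitor totals; ISO-style keys sort chronologically.
    """
    months = sorted(counts)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        if counts[prev] == 0:
            changes[cur] = None  # growth from zero is undefined
        else:
            changes[cur] = round(100.0 * (counts[cur] - counts[prev]) / counts[prev], 1)
    return changes


def is_unusual(counts, month, history=6, threshold=1.5):
    """Flag `month` if it exceeds `threshold` x the average of up to `history` prior months."""
    prior = sorted(m for m in counts if m < month)[-history:]
    if not prior:
        return False  # nothing to compare against yet
    baseline = sum(counts[m] for m in prior) / len(prior)
    return counts[month] > threshold * baseline


if __name__ == "__main__":
    visitors = {"2024-09": 1000, "2024-10": 1100, "2024-11": 990}
    print(month_over_month(visitors))
    print(is_unusual(visitors, "2024-11"))
```

Comparing against an average of several months, rather than only the previous one, keeps a single quiet month from making normal traffic look like a spike.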
| Feature | What It Does | Business Value |
|---|---|---|
| Visitor Separation | Filters out bots to show real human traffic | Know your actual customer count |
| Geographic Insights | Shows where visitors come from | Target marketing by location |
| Device Analysis | Mobile vs desktop breakdown | Optimize for your visitors' preferences |
| Popular Content | Which pages get the most real visitors | Focus on content that works |
| Security Monitoring | Detects attacks and threats automatically | Protect your business reputation |
| Return Visitor Tracking | Identifies loyal customers | Understand customer loyalty |
| Historical Trends | Compares current to past performance | Track business growth |
| API Quota Management | Uses free tiers efficiently | Get enterprise insights at no cost |
| Page-Level Trends | Tracks performance of individual pages over time | Optimize high-performing content |
- Python 3.8 or higher
- Access to your website's server logs
- 10 minutes of setup time
1. Download and set up:

   ```bash
   git clone https://github.com/focused-hunts/weblog-analyzer.git
   cd weblog-analyzer
   pip install -r requirements.txt
   ```

2. Configure the tool (critical step):

   ```bash
   cp config_template.py config.py
   # Edit config.py with your settings (see the Essential 5-Minute Setup above)
   ```

3. Add your log files:
   - Place your server log files in the `input/` directory
   - Both `.log` and `.log.gz` (compressed) files are supported
   - Apache and Nginx Combined Log Format work by default

4. Run your first analysis:

   ```bash
   # Default unified report generation
   python weblog_analyzer.py

   # Explicitly run with switches
   # (Note: 'unified' is the default command and can be omitted)
   python weblog_analyzer.py unified --cache-only
   python weblog_analyzer.py unified --force
   ```
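If you are unsure whether your logs use the Combined Log Format, each line should look like the `SAMPLE` string below (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"`). The sketch parses one line with a regular expression; it is an illustration, not the tool's `log_parser.py`, and it assumes well-formed lines with no escaped quotes inside the request or User-Agent fields.

```python
import re
from datetime import datetime

# Apache/Nginx "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one combined-format line into a dict, or return None if it doesn't match."""
    m = COMBINED_RE.match(line)
    if not m:
        return None
    entry = m.groupdict()
    # Timestamps look like 10/Oct/2024:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request is usually "METHOD /path HTTP/x.y"; malformed probes may not be
    parts = entry["request"].split()
    entry["method"], entry["path"] = (parts[0], parts[1]) if len(parts) >= 2 else (None, None)
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])  # "-" means no body
    return entry


SAMPLE = ('203.0.113.7 - - [10/Oct/2024:13:55:36 -0700] "GET /blog/how-to HTTP/1.1" '
          '200 5123 "https://www.google.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"')

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Lines that return `None` here are a good hint that your server uses a custom log format.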
After running, check the `output/` directory for:
- `website_analytics_report_business.md` - Your business intelligence report
- `website_analytics_report_security.md` - Security analysis and threats
- `index.html` - Quick overview dashboard
- `website_analytics_report.json` - Raw data for debugging or other tools
While the tool works without API keys, adding them provides much richer insights:
- What it adds: Visitor locations, ISP information, network details
- Free tier: 50,000 lookups/month
- Get key: ipinfo.io
- Configuration: `IPINFO_TOKEN` in `config.py`
- Business value: Understand your market geography, optimize for local customers
- What it adds: Security threat scores, IP reputation data
- Free tier: 1,000 lookups/day
- Get key: abuseipdb.com
- Business value: Identify high-risk visitors, protect against fraud
- Configuration: `ABUSEIPDB_KEY` in `config.py`
- What it adds: Identifies automated scanners vs real visitors
- Free tier: Community access
- Get key: greynoise.io
- Business value: Better bot detection, cleaner visitor metrics
- What it adds: Generates executive summaries and insights in plain English.
- Get key: Google AI Studio (for Gemini) or Anthropic (for Claude).
- Business value: Turns complex data into easy-to-understand reports.
- Configuration: Set `AI_PROVIDER` to `"gemini"` or `"claude"` in `config.py` and add the corresponding API key (`GOOGLE_AI_API_KEY` or `ANTHROPIC_API_KEY`).
- What it adds: Validates location data for accuracy
- Free tier: 1,000 lookups/month
- Get key: ipgeolocation.io
- Configuration: `IPGEOLOCATION_API_KEY` in `config.py`
- Business value: More accurate geographic insights
💡 Tip: Start with just IPinfo and AbuseIPDB - they provide 90% of the value!
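Free tiers are counted per day (AbuseIPDB, 1,000 lookups/day) or per month (IPinfo, 50,000 lookups/month). A minimal sketch of how a client could stay inside such limits and warn before hitting them follows; the `QuotaTracker` class and its 90% warning threshold are hypothetical, and it keeps counts only in memory, whereas a real tracker would need to persist them between runs.

```python
from datetime import date


class QuotaTracker:
    """Count API calls per period and refuse calls past a free-tier limit (hypothetical)."""

    def __init__(self, limit, period="day", warn_at=0.9):
        self.limit = limit
        self.period = period    # "day" (e.g. AbuseIPDB) or "month" (e.g. IPinfo)
        self.warn_at = warn_at  # fraction of the limit that triggers a warning
        self.counts = {}        # period key -> calls used

    def _key(self, today):
        return today.isoformat() if self.period == "day" else today.strftime("%Y-%m")

    def try_consume(self, today=None):
        """Record one call if quota remains; return (allowed, warning_or_None)."""
        key = self._key(today or date.today())
        used = self.counts.get(key, 0)
        if used >= self.limit:
            return False, f"quota exhausted for {key}"
        used += 1
        self.counts[key] = used
        warning = None
        if used >= self.warn_at * self.limit:
            warning = f"{used}/{self.limit} calls used for {key}"
        return True, warning


if __name__ == "__main__":
    abuseipdb = QuotaTracker(limit=1000, period="day")
    ipinfo = QuotaTracker(limit=50000, period="month")
    print(abuseipdb.try_consume(), ipinfo.try_consume())
```

When `try_consume` returns `False`, the natural fallback is cached data or simply skipping enrichment for that IP, which matches the "partial enrichment" behaviour described in troubleshooting.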
Visitor Overview
Real Visitors This Month: 1,247 people
Return Customers: 23% (287 visitors came back)
Top Countries: United States (45%), Canada (12%), UK (8%)
📱 Mobile Users: 67% of your visitors prefer mobile
Content Performance
- Which pages get the most real visitor attention
- How long people spend on different sections
- Which content drives return visits
Growth Trends
- Month-over-month visitor growth
- Seasonal patterns in your traffic
- Comparison to your historical performance
Threat Level Assessment
🟢 Security Status: NORMAL
🚨 Attack Attempts: 12 blocked (0.3% of traffic)
🤖 Bot Traffic: 52% (mostly search engines)
⚠️ High-Risk IPs: 3 identified and flagged
Attack Analysis
- Types of attacks attempted (SQL injection, etc.)
- Geographic sources of threats
- Recommendations for additional protection
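Attack types such as SQL injection usually show up as recognisable strings in the requested URL. The sketch below is a deliberately small, signature-based illustration of that idea; the pattern names and regular expressions are hypothetical and far from complete, so expect both misses and false positives compared with the tool's own detection.

```python
import re
from urllib.parse import unquote_plus

# Illustrative signatures only; real detection needs far broader rules.
ATTACK_PATTERNS = {
    "sql_injection": re.compile(
        r"union\s+(all\s+)?select|'\s*or\s+'?1'?\s*=\s*'?1|sleep\s*\(\s*\d+\s*\)", re.I
    ),
    "path_traversal": re.compile(r"\.\./|/etc/passwd"),
    "xss": re.compile(r"<script\b", re.I),
    "env_probe": re.compile(r"/\.env\b"),  # attempts to read leaked .env secrets
}


def detect_attacks(path):
    """Return the sorted names of patterns that match a URL-decoded request path."""
    # Decode twice to catch double-encoding such as %2527 -> %27 -> '
    decoded = unquote_plus(unquote_plus(path))
    return sorted(name for name, rx in ATTACK_PATTERNS.items() if rx.search(decoded))


if __name__ == "__main__":
    for p in ("/products?id=1%20UNION%20SELECT%20password%20FROM%20users",
              "/download?file=../../etc/passwd",
              "/blog/how-to"):
        print(p, detect_attacks(p))
```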
See `config_template.py` for a full list of options. The most critical ones to change are `DEVELOPER_IPS` and `SITEMAP_DOMAIN`.

To enable AI-powered narratives and SEO analysis, set the following in `config.py`:

```python
# AI provider for generating report narratives ('gemini' or 'claude')
AI_PROVIDER = "gemini"

# Corresponding API keys for the selected provider
GOOGLE_AI_API_KEY = "your_gemini_key_here"
ANTHROPIC_API_KEY = "your_claude_key_here"

# What reports to generate
GENERATE_BUSINESS_REPORT = True  # Always recommended
GENERATE_SECURITY_REPORT = True  # Recommended for all sites
GENERATE_HTML = True             # Nice overview page

# Historical analysis (how many months back to compare)
UNIFIED_HISTORICAL_MONTHS = 6  # 6 months gives good context for trends

# How long to cache API responses (saves money!)
CACHE_TTL_DAYS = 90  # 90 days is recommended
```

"Analytics revealed our 'how-to' posts had 3x higher return visitor rates than news posts. We shifted our content strategy and built a more loyal audience."
"We found IP addresses from a known hosting provider making 500+ requests per day to admin pages. We blocked the range and reduced server load by 15%."
- Desktop vs Mobile vs Tablet breakdown
- Browser version tracking
- Operating system distribution
- User agent anomaly detection (fake browsers, bots)
- Clearer Reporting: Accounts for traffic from unknown device types.
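Device breakdowns are normally derived from the User-Agent string. A rough, heuristic sketch of that idea, including an explicit "unknown" bucket, is below; the keyword rules are assumptions rather than the tool's logic. Note that recent iPads often send a desktop Mac User-Agent, so heuristics like this will count some tablets as desktops.

```python
def classify_device(user_agent):
    """Very rough device bucket from a User-Agent string (heuristic sketch)."""
    ua = (user_agent or "").lower()
    if not ua or ua == "-":
        return "unknown"
    if any(tok in ua for tok in ("bot", "crawler", "spider")):
        return "bot"
    # Check tablets first: iPad UAs also contain "Mobile"; Android tablets usually omit it
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "mobi" in ua or "iphone" in ua:  # "mobi" covers "Mobile" and "Opera Mobi"
        return "mobile"
    if any(tok in ua for tok in ("windows nt", "macintosh", "x11")):
        return "desktop"
    return "unknown"  # e.g. curl, custom scripts, unusual clients


if __name__ == "__main__":
    print(classify_device("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36"))
```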
- ISP and hosting provider analysis
- VPN/Proxy detection
- Residential vs business network identification
- Risk scoring for different network types
- Beneficial: Google, Bing, Facebook crawlers
- Neutral: SEO tools, monitoring services
- Research: Academic institutions, security research
- Suspicious: Scrapers, unknown automation
- Malicious: Known attack tools and frameworks
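The five bot categories above could be assigned with an ordered list of User-Agent rules, checking the most dangerous category first. This is a minimal sketch with hypothetical, illustrative token lists, not the tool's classifier. Keep in mind that User-Agent strings are self-reported and easy to spoof; Google, for example, recommends verifying Googlebot with a reverse DNS lookup rather than trusting the header.

```python
# Illustrative token lists; a real classifier needs many more entries.
BOT_RULES = [
    ("malicious", ("sqlmap", "nikto", "masscan", "zgrab", "nmap")),
    ("beneficial", ("googlebot", "bingbot", "facebookexternalhit", "duckduckbot")),
    ("neutral", ("ahrefsbot", "semrushbot", "uptimerobot", "pingdom")),
    ("research", ("censysinspect",)),
    ("suspicious", ("python-requests", "curl/", "wget/", "scrapy", "go-http-client")),
]


def classify_bot(user_agent):
    """Map a User-Agent to a bot category, or None if it looks like a regular browser."""
    ua = (user_agent or "").lower()
    for category, tokens in BOT_RULES:  # first matching category wins
        if any(tok in ua for tok in tokens):
            return category
    if any(tok in ua for tok in ("bot", "crawler", "spider")):
        return "suspicious"  # self-declared but unrecognised automation
    return None


if __name__ == "__main__":
    print(classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```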
- Automatically compares current month to previous months
- Identifies growth trends and seasonal patterns
- Alerts for unusual activity spikes
- Builds long-term performance picture
- Tracks monthly views for your most important pages.
- Identifies which content is gaining or losing popularity.
- Helps you focus content strategy on what works.
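Spotting pages that are gaining or losing popularity comes down to comparing per-page view counts between two months. A minimal sketch follows; the `page_trends` helper is hypothetical and assumes you already have human page-view counts per URL path. It ranks by absolute change, which favours high-traffic pages; a percentage change would instead favour small pages with big swings.

```python
def page_trends(prev_month, this_month, top_n=3):
    """Rank pages by change in views between two months (hypothetical helper).

    Both arguments map URL paths to human page-view counts.
    Returns (gainers, losers) as lists of (path, change) tuples.
    """
    paths = set(prev_month) | set(this_month)
    # Pages missing from one month count as 0 views in that month
    changes = [(p, this_month.get(p, 0) - prev_month.get(p, 0)) for p in paths]
    changes.sort(key=lambda item: (item[1], item[0]))  # ascending change; path breaks ties
    losers = [c for c in changes if c[1] < 0][:top_n]
    gainers = [c for c in reversed(changes) if c[1] > 0][:top_n]
    return gainers, losers


if __name__ == "__main__":
    october = {"/": 500, "/blog/how-to": 120, "/news/launch": 300}
    november = {"/": 520, "/blog/how-to": 260, "/news/launch": 180, "/pricing": 40}
    print(page_trends(october, november))
```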
Automatically finds all log files in your input directory (e.g. `access.log`, `ssl_log`, `access-Nov-2024.log.gz`):
- Groups files by month based on timestamps or filenames
- Handles compressed (`.gz`) files automatically
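The discovery step can be sketched with the standard library alone: read a month from names like `access-Nov-2024.log.gz`, otherwise fall back to the first timestamp inside the file, and open `.gz` files transparently. This is an illustration under stated assumptions, not the tool's `log_registry.py`: the "filename contains `log`" filter and the helper names are hypothetical, and a file spanning two months is assigned to the month of its first timestamp.

```python
import gzip
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path

MONTH_IN_NAME = re.compile(r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(\d{4})")
FIRST_TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")  # e.g. [03/Oct/2024:10:00:00 ...


def open_log(path):
    """Open plain or gzip-compressed logs as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def month_of(path):
    """Return 'YYYY-MM' from the filename, else from the first timestamp in the file."""
    m = MONTH_IN_NAME.search(path.name)
    if m:
        return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%b %Y").strftime("%Y-%m")
    with open_log(path) as fh:
        for line in fh:
            t = FIRST_TIMESTAMP.search(line)
            if t:
                return datetime.strptime(t.group(1), "%d/%b/%Y").strftime("%Y-%m")
    return None  # no recognisable timestamp


def group_logs_by_month(input_dir="input"):
    """Map 'YYYY-MM' -> list of log filenames found in input_dir (hypothetical helper)."""
    groups = defaultdict(list)
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and "log" in path.name:  # crude filter: name contains "log"
            groups[month_of(path)].append(path.name)
    return dict(groups)
```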
- Extracts visitor IPs, timestamps, pages visited, devices used
- Detects attack patterns automatically
- Classifies traffic as human visitors vs bots
- Identifies suspicious activity and security threats
- Looks up IP locations and threat scores using APIs
- Validates data across multiple sources for accuracy
- Caches results to minimize API usage and costs
- Respects free-tier limits automatically
- Separates real customers from bots and crawlers
- Tracks visitor loyalty and return rates
- Analyzes content performance and popular pages
- Generates trend comparisons with historical data
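One way to arrive at a figure like the sample report's "23% (287 visitors came back)" is to count this month's human visitor IPs that were already seen in earlier months. This is a minimal sketch of that definition; the helper is hypothetical and the tool may define "return visitor" differently. Treating one IP as one visitor is only an approximation: shared NAT addresses merge several people, and dynamic IPs split one person into several.

```python
def return_visitor_rate(current_ips, previous_months_ips):
    """Return (returning_count, percent) for this month's human visitor IPs.

    `previous_months_ips` is an iterable of sets, one per earlier month.
    """
    current = set(current_ips)
    if not current:
        return 0, 0.0
    seen_before = set().union(*previous_months_ips)  # every IP seen in any earlier month
    returning = len(current & seen_before)
    return returning, round(100.0 * returning / len(current), 1)


if __name__ == "__main__":
    # 287 of 1,247 visitors seen before -> about 23%
    this_month = [f"ip{i}" for i in range(1247)]
    earlier = [{f"ip{i}" for i in range(287)}]
    print(return_visitor_rate(this_month, earlier))
```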
```
weblog-analyzer/
├── weblog_analyzer.py          # Main application
├── config_template.py          # Copy this to config.py
├── config.py                   # Your settings (don't commit to git!)
├── modules/
│   ├── log_parser.py           # Reads and parses log files
│   ├── analyzer.py             # Core data analysis
│   ├── enrichment.py           # Adds geographic and threat data
│   ├── reporter.py             # Creates business and security reports
│   ├── trend_manager.py        # Handles historical comparisons
│   ├── log_registry.py         # Tracks processed files
│   ├── content_classifier.py   # Classifies content (business vs. technical)
│   ├── sitemap_analyzer.py     # Analyzes sitemap coverage
│   ├── seo_analyzer.py         # Provides SEO health insights
│   ├── ai_narrator.py          # Generates AI-powered summaries
│   └── logger_setup.py         # Configures logging
├── input/                      # Put your log files here
├── output/                     # Generated reports appear here
├── cache/                      # Caches API data and AI narratives
└── README.md                   # This file
```
- Sensitive data filtering: Your admin/developer IP addresses are automatically excluded from visitor analytics
- Threat identification: Potential attackers are flagged but their attempts are already blocked by your web server
- Privacy compliance: The tool analyzes visitor patterns but doesn't store personally identifiable information
- API key protection: Never commit your `config.py` file to version control
- Local processing: All analysis runs locally; no data is sent to third parties except the IP lookup APIs and, if enabled, your chosen AI provider for narratives
- Lookup caching: Lookup data is cached locally to minimize external API calls
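Excluding developer traffic amounts to dropping log entries whose IP matches `DEVELOPER_IPS`. The sketch below shows one way to do that with Python's `ipaddress` module. Accepting CIDR ranges (e.g. an office `/24`) is an assumption added for illustration; the template only shows single addresses, and the helper names are hypothetical.

```python
import ipaddress


def build_exclusion(developer_ips):
    """Parse DEVELOPER_IPS entries; each may be a single IP or (assumed) a CIDR range."""
    # A bare address becomes a /32 (IPv4) or /128 (IPv6) network
    return [ipaddress.ip_network(entry, strict=False) for entry in developer_ips]


def is_developer(ip, networks):
    """True if `ip` falls inside any excluded network; unparseable IPs are never excluded."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in networks)  # IPv4 vs IPv6 mismatches are simply False


def drop_developer_hits(entries, developer_ips):
    """Filter parsed log entries (dicts with an 'ip' key) coming from developer addresses."""
    networks = build_exclusion(developer_ips)
    return [e for e in entries if not is_developer(e["ip"], networks)]
```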
Your security report will provide specific recommendations like:
- "Consider enabling Cloudflare for additional protection"
- "IP range X.X.X.X/24 shows attack patterns - consider blocking"
- "Your mobile visitors are growing - ensure mobile security features are enabled"
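A recommendation such as "IP range X.X.X.X/24 shows attack patterns" implies grouping flagged addresses by network. This minimal sketch counts distinct flagged IPv4 addresses per `/24`; the threshold of three IPs and the helper name are illustrative assumptions. Review ranges before blocking them, since hosting and ISP ranges can also contain legitimate visitors.

```python
import ipaddress
from collections import Counter


def suspicious_ranges(flagged_ips, min_ips=3, prefix=24):
    """Group flagged IPv4 addresses by /prefix; return ranges with >= min_ips distinct IPs."""
    counts = Counter()
    for ip in set(flagged_ips):  # count each address once
        if ipaddress.ip_address(ip).version == 4:
            counts[ipaddress.ip_network(f"{ip}/{prefix}", strict=False)] += 1
    return sorted(
        ((str(net), n) for net, n in counts.items() if n >= min_ips),
        key=lambda item: (-item[1], item[0]),  # busiest ranges first
    )


if __name__ == "__main__":
    flagged = ["198.51.100.4", "198.51.100.9", "198.51.100.200", "203.0.113.5"]
    print(suspicious_ranges(flagged))
```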
- Small sites (1,000-5,000 visitors/month): 2-5 minutes
- Medium sites (10,000-50,000 visitors/month): 10-20 minutes
- Larger sites (100,000+ visitors/month): 30-60 minutes
- Smart caching: 70-90% of repeat IP lookups use cached data (free)
- Quota monitoring: Automatic warnings before hitting free-tier limits
- Selective validation: Only validates high-value IPs to conserve quota
- Progressive enhancement: Works without API keys, gets better with them
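The caching idea behind "repeat IP lookups use cached data" can be sketched as a small JSON file with a timestamp per entry, expiring after `CACHE_TTL_DAYS`. The `LookupCache` class below is hypothetical and does not reflect the tool's actual cache format; `fetch` stands in for whatever function performs the real API call.

```python
import json
import time
from pathlib import Path


class LookupCache:
    """Tiny JSON-file cache with per-entry expiry (sketch, not the tool's cache format)."""

    def __init__(self, path, ttl_days=90):
        self.path = Path(path)
        self.ttl_seconds = ttl_days * 86400
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}
        self.hits = self.misses = 0

    def get_or_fetch(self, ip, fetch, now=None):
        """Return cached data for `ip` if still fresh, otherwise call `fetch(ip)` and store it."""
        now = time.time() if now is None else now
        entry = self.data.get(ip)
        if entry and now - entry["fetched_at"] < self.ttl_seconds:
            self.hits += 1
            return entry["value"]
        self.misses += 1
        value = fetch(ip)  # the only place an external API would be called
        self.data[ip] = {"fetched_at": now, "value": value}
        return value

    def save(self):
        """Write the cache back to disk, creating the directory if needed."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data))
```

The ratio `hits / (hits + misses)` after a run is a simple way to see how much quota the cache saved.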
- No API keys: $0 (basic analytics)
- Free tiers only: $0 (rich analytics for most small businesses)
- Growing business: $5-15/month (if you exceed free tiers)
- Track real customer visits vs bot traffic
- Identify your most loyal customers (return visitors)
- Understand geographic distribution for shipping planning
- Monitor for payment page attacks or card testing
- See which articles drive the most engagement
- Understand your audience's device preferences
- Track return readers vs one-time visitors
- Identify content that builds audience loyalty
- Geographic analysis for local market understanding
- Mobile usage patterns (important for local search)
- Contact page and service page performance tracking
- Local competition insights through referrer analysis
- Business vs residential visitor identification
- Content performance for different decision-makers
- International market analysis
- Lead quality assessment through visit patterns
- Portfolio page performance analysis
- Client geographic distribution
- Mobile vs desktop portfolio viewing preferences
- Contact form and inquiry pattern analysis
Solution:
- Ensure log files are in the `input/` directory
- Check that files have `.log` or `.log.gz` extensions
- Verify the log format is Apache/Nginx Combined Log Format
Solution:
- Check that `DEVELOPER_IPS` includes your actual IP address
- Verify your site actually has human visitors (not just bots)
- Look at the raw data in the JSON export to understand the traffic
Solution:
- This is normal! The tool will use cached data for repeat IPs
- Wait 24 hours for daily quotas to reset
- Consider upgrading to paid tiers if you need real-time data
- The reports will still be valuable with partial enrichment
This is normal!
- 50-80% bot traffic is typical for most websites
- The reports separate bot and human traffic clearly
- Focus on the "human visitors" metrics for business insights
This tool is built for small business owners. Contributions and feature requests are welcome!
- Check existing issues on GitHub before opening a new one.
- For bugs, please provide a log sample, your configuration (with keys removed), and the error message.
- We are considering features like WordPress integration, email scheduling, and more. Feel free to contribute or make a request!
License: MIT License - free for personal and commercial use
Built With:
- IPinfo for geographic data
- AbuseIPDB for threat intelligence
- GreyNoise for scanner identification
- Shodan InternetDB for infrastructure analysis
- BGPView for network information
After running your first analysis:
- Review both reports - Business for growth insights, Security for peace of mind
- Set up API keys if you haven't already - the insights get much richer
- Run monthly to build historical context and trend analysis
- Act on insights - optimize for mobile, improve popular content, address security issues
- Track improvements - use month-over-month comparisons to measure success
Built for small business owners who want enterprise-grade insights without the enterprise complexity or cost.
Ready to understand your website's true performance? Download, configure, and run your first analysis in under 10 minutes.