Make your Dash applications AI-friendly with automatic documentation, bot management, and SEO optimization.
dash-improve-my-llms is a comprehensive plugin that automatically generates five types of AI-friendly documentation and SEO resources for your Dash application:
- `llms.txt` - Comprehensive, context-rich markdown optimized for LLM understanding
- `page.json` - Detailed technical architecture with interactivity and data flow
- `architecture.txt` - ASCII-art representation of the entire application
- `robots.txt` - Intelligent bot control with AI training bot blocking
- `sitemap.xml` - SEO-optimized sitemap with intelligent priority inference
- Static HTML - Bot-friendly pages with structured data
- `mark_hidden()` - Hide sensitive pages from AI bots and search engines
- Bot Detection - Differentiate between AI training, AI search, and traditional bots
- Configurable Policies - Fine-grained control over which bots can access what
```bash
pip install dash-improve-my-llms
```

```python
from dash import Dash
from dash_improve_my_llms import add_llms_routes

app = Dash(__name__, use_pages=True)
add_llms_routes(app)  # That's it! 🎉

if __name__ == '__main__':
    app.run(debug=True)
```

Now visit:

- `http://localhost:8050/llms.txt` - LLM-friendly page context
- `http://localhost:8050/page.json` - Technical architecture
- `http://localhost:8050/architecture.txt` - App overview
- `http://localhost:8050/robots.txt` - Bot access control NEW!
- `http://localhost:8050/sitemap.xml` - SEO sitemap NEW!
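To verify the routes are live, you can fetch them from a separate shell (a quick sketch using the `requests` library):

```python
# Smoke-test the generated endpoints while the app is running
import requests

for path in ("/llms.txt", "/page.json", "/architecture.txt",
             "/robots.txt", "/sitemap.xml"):
    resp = requests.get(f"http://localhost:8050{path}")
    print(f"{path}: HTTP {resp.status_code}")
```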
- Three comprehensive formats (llms.txt, page.json, architecture.txt)
- Smart context extraction - Understands your app structure
- Callback tracking - Documents all data flows
- Component categorization - Automatic classification by purpose
- Navigation mapping - Tracks all internal/external links
- AI Training Bot Blocking - Block GPTBot, Claude-Web, CCBot, etc.
- AI Search Allowance - Allow ChatGPT-User, ClaudeBot, PerplexityBot
- Traditional Search Engines - Full support for Google, Bing, etc.
- Configurable Policies - Fine-grained control via `RobotsConfig`
- Bot Detection - Accurately identify bot types from user agents
- Hide Sensitive Pages - `mark_hidden()` excludes pages from AI/bots
- Component Hiding - Hide specific components from extraction
- Automatic Exclusion - Hidden pages removed from sitemaps/robots.txt
- 404 for Hidden Routes - Bots get 404 on hidden page docs
- Smart Sitemap Generation - Automatic priority inference
- Priority System - Homepage=1.0, Dashboards=0.9, Reports=0.8, Docs=0.7
- Change Frequency - Intelligent frequency detection (daily, weekly, monthly)
- Static HTML for Bots - Schema.org structured data, Open Graph tags
- Noscript Fallbacks - Content for non-JS crawlers
- 88 comprehensive tests - 100% pass rate
- 98-100% coverage - All new modules fully tested
- Integration tests - Real-world scenario coverage
- Fast execution - 0.22s for entire test suite
```python
from dash import Dash, html, dcc
from dash_improve_my_llms import (
    add_llms_routes,
    mark_important,
    mark_hidden,
    register_page_metadata,
    RobotsConfig
)

# Create app
app = Dash(__name__, use_pages=True)

# Configure bot policies
robots_config = RobotsConfig(
    block_ai_training=True,   # Block GPTBot, CCBot, etc.
    allow_ai_search=True,     # Allow ClaudeBot, ChatGPT-User
    allow_traditional=True,   # Allow Googlebot, Bingbot
    crawl_delay=10,           # 10-second delay between requests
    disallowed_paths=["/admin", "/api/*"]  # Block specific paths
)

# Set base URL for SEO
app._base_url = "https://myapp.com"
app._robots_config = robots_config

# Add LLMS routes with all features
add_llms_routes(app)

# Hide admin pages from AI bots
mark_hidden("/admin")
mark_hidden("/settings")

# Add custom metadata for better SEO
register_page_metadata(
    path="/",
    name="Equipment Management System",
    description="Comprehensive equipment tracking and analytics platform"
)

if __name__ == '__main__':
    app.run(debug=True)
```

```python
# pages/equipment.py
from dash import html, Input, Output, callback, register_page
from dash_improve_my_llms import mark_important, register_page_metadata
import dash_mantine_components as dmc

# Register the page with Dash Pages
register_page(__name__, path="/equipment", name="Equipment Catalog")

register_page_metadata(
    path="/equipment",
    name="Equipment Catalog",
    description="Browse and filter the complete equipment catalog"
)

def layout():
    return html.Div([
        html.H1("Equipment Catalog"),
        # Mark filters as important for AI understanding
        mark_important(
            html.Div([
                html.H2("Filters"),
                dmc.TextInput(
                    id="equipment-search",
                    placeholder="Search equipment...",
                ),
                dmc.Select(
                    id="equipment-category",
                    data=[
                        {"value": "all", "label": "All Categories"},
                        {"value": "tools", "label": "Tools"},
                        {"value": "machinery", "label": "Machinery"},
                    ],
                    value="all"
                ),
            ], id="filters")
        ),
        html.Div(id="equipment-list"),
    ])

@callback(
    Output("equipment-list", "children"),
    Input("equipment-search", "value"),
    Input("equipment-category", "value"),
)
def update_list(search, category):
    # Your filtering logic here
    return html.Div("Equipment items...")
```

**Hidden Admin Page**

```python
# pages/admin.py
from dash import html, register_page
from dash_improve_my_llms import mark_hidden

register_page(__name__, path="/admin", name="Admin Panel")

# This page won't appear in sitemaps or llms.txt
mark_hidden("/admin")

def layout():
    return html.Div([
        html.H1("Admin Panel"),
        html.P("Sensitive administrative controls")
    ])
```

```python
from dash_improve_my_llms import RobotsConfig
# Default configuration (recommended)
config = RobotsConfig(
    block_ai_training=True,   # Block AI training bots
    allow_ai_search=True,     # Allow AI search bots
    allow_traditional=True,   # Allow traditional search engines
    crawl_delay=None,         # No delay
    custom_rules=[],          # No custom rules
    disallowed_paths=[]       # No additional blocks
)

# Strict configuration (block everything except traditional search engines)
strict_config = RobotsConfig(
    block_ai_training=True,
    allow_ai_search=False,
    allow_traditional=True,
    crawl_delay=30,
    disallowed_paths=["/admin", "/api", "/internal/*"]
)

# Open configuration (allow everything)
open_config = RobotsConfig(
    block_ai_training=False,
    allow_ai_search=True,
    allow_traditional=True
)

# Apply to app
app._robots_config = config
```

The plugin automatically detects and handles different bot types:
| Bot Type | Examples | Default Policy |
|---|---|---|
| AI Training | GPTBot, Claude-Web, CCBot, Google-Extended, anthropic-ai | ❌ Blocked |
| AI Search | ChatGPT-User, ClaudeBot, PerplexityBot | ✅ Allowed |
| Traditional | Googlebot, Bingbot, Yahoo, DuckDuckBot | ✅ Allowed |
```python
from dash_improve_my_llms.bot_detection import (
    is_ai_training_bot,
    is_ai_search_bot,
    is_traditional_bot,
    get_bot_type
)

user_agent = "Mozilla/5.0 (compatible; GPTBot/1.0)"

if is_ai_training_bot(user_agent):
    print("AI training bot detected - blocking")

bot_type = get_bot_type(user_agent)  # Returns: "training", "search", "traditional", or "unknown"
```

The plugin automatically generates an SEO-optimized sitemap with intelligent priority inference:
```python
# Automatic priority based on page type:
# - Homepage (/)        → Priority 1.0
# - Dashboards          → Priority 0.9
# - Reports/Analytics   → Priority 0.8
# - Documentation/Help  → Priority 0.7
# - Other pages         → Priority 0.5

# Change frequency inference:
# - Dashboards/Live     → daily
# - Reports/Analytics   → weekly
# - Documentation       → monthly
# - Static pages        → yearly
```
Example sitemap entry:

```xml
<url>
  <loc>https://myapp.com/</loc>
  <lastmod>2025-11-04</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.0</priority>
</url>
```

**Problem:** AI crawlers cannot execute JavaScript, so they see empty `<div id="react-entry-point">` placeholders instead of your actual content.

**Solution:** The middleware automatically detects bots and serves them llms.txt content wrapped in readable HTML.
What bots receive:

```
✅ Search Bots (ClaudeBot, ChatGPT-User)  → llms.txt content in HTML
✅ Traditional Bots (Googlebot, Bingbot)  → llms.txt content in HTML
❌ Training Bots (GPTBot, anthropic-ai)   → 403 Forbidden
✅ Regular Users (Chrome, Firefox)        → Full Dash React app
```

**Before Middleware (❌ Bad):**
```html
<!-- Bots saw this - empty until JavaScript executes -->
<div id="react-entry-point">
  <div class="_dash-loading">Loading...</div>
</div>
```

**After Middleware (✅ Good):**
```html
<!-- Bots now see this - readable content immediately -->
<div class="bot-notice">
  🤖 Bot-Optimized Content
  Also available: llms.txt | page.json | architecture.txt
</div>
<pre>
# Equipment Catalog
> Browse and filter the complete equipment catalog

## Key Content
- Equipment search and filtering
- Category selection
...
</pre>
```

Features:
- Automatic Detection: Identifies bot type from user agent
- Smart Serving: llms.txt content for bots, React app for users
- SEO Optimized: Includes Schema.org, Open Graph, meta tags
- Privacy Enforced: Training bots get 403 when blocked
- No JavaScript Required: Bots see content immediately
The HTML served to bots includes:
- Schema.org JSON-LD - Structured data for search engines
- Open Graph tags - Social media previews
- Meta tags - Description, robots, viewport
- Navigation links - Accessible site structure
- Bot notice banner - Links to documentation formats
- llms.txt content - Full page context in a `<pre>` tag
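For intuition, here is a rough sketch of the decision flow described above, using the package's documented bot-detection helpers. It is illustrative only: `serve_bot_html` is a hypothetical stand-in for the plugin's HTML generator, and the real middleware (installed by `add_llms_routes`) also honors your `RobotsConfig`:

```python
# Illustrative sketch only -- add_llms_routes() installs the real middleware.
# serve_bot_html() below is a hypothetical placeholder, not a plugin API.
from flask import abort, request
from dash_improve_my_llms.bot_detection import is_ai_training_bot, is_any_bot

@app.server.before_request
def _route_bots():
    user_agent = request.headers.get("User-Agent", "")
    if is_ai_training_bot(user_agent):
        abort(403)  # training bots get 403 when blocking is enabled
    if is_any_bot(user_agent):
        return serve_bot_html(request.path)  # llms.txt content wrapped in HTML
    return None  # regular users fall through to the Dash React app
```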
```python
from dash_improve_my_llms import mark_hidden, is_hidden

# Hide sensitive pages
mark_hidden("/admin")
mark_hidden("/settings")
mark_hidden("/internal/metrics")

# Check if a page is hidden
if is_hidden("/admin"):
    print("Admin page is hidden from bots")

# Hidden pages are automatically:
# - Excluded from sitemap.xml
# - Blocked in robots.txt
# - Return 404 for /page-path/llms.txt
# - Return 404 for /page-path/page.json
```

```python
from dash_improve_my_llms import mark_component_hidden, is_component_hidden
from dash import html
# Hide sensitive components from extraction
api_key_display = html.Div([
    html.P("API Key: sk-..."),
    html.P("Secret: abc123"),
], id="api-keys")

mark_component_hidden(api_key_display)

# Check if a component is hidden
if is_component_hidden("api-keys"):
    print("Component excluded from llms.txt")
```

Example `llms.txt` output:

```markdown
# Equipment Catalog
> Browse and filter the complete equipment catalog
## Application Context
This page is part of a multi-page Dash application with 3 total pages.
## Page Purpose
- **Data Input**: Contains form elements
- **Interactive**: Responds to user interactions
## Interactive Elements
**User Inputs:**
- TextInput (ID: equipment-search)
- Select (ID: equipment-category)
## Data Flow & Callbacks
**Callback 1:**
- Updates: equipment-list.children
- Triggered by: equipment-search.value, equipment-category.value
```

Example `page.json` output:

```json
{
"path": "/equipment",
"components": {
"ids": {
"equipment-search": {
"type": "TextInput",
"module": "dash_mantine_components"
}
},
"categories": {
"inputs": ["equipment-search", "equipment-category"],
"interactive": ["equipment-search", "equipment-category"]
}
},
"callbacks": {
"list": [
{
"output": "equipment-list.children",
"inputs": ["equipment-search.value"]
}
]
}
}# Robots.txt for Dash Application
# Block AI training bots, allow search bots
User-agent: GPTBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Allow: /
User-agent: *
Allow: /
Crawl-delay: 10
Disallow: /admin
Disallow: /api/*
Sitemap: https://myapp.com/sitemap.xml
```
The package has comprehensive test coverage:
```bash
# Run all 88 tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=dash_improve_my_llms --cov-report=term-missing

# Test results:
# ✅ Bot Detection:     14/14 tests (100% coverage)
# ✅ HTML Generator:    20/20 tests (100% coverage)
# ✅ Robots Generator:  16/16 tests (100% coverage)
# ✅ Sitemap Generator: 33/33 tests (98% coverage)
# ✅ Integration:       15/15 tests (complete workflows)
# ✅ Total:             88/88 tests passing in 0.22s
```

See TEST_REPORT.md for detailed test documentation.
Add all LLMS routes to your Dash app (llms.txt, page.json, architecture.txt, robots.txt, sitemap.xml).
```python
from dash_improve_my_llms import add_llms_routes, LLMSConfig

config = LLMSConfig(
    enabled=True,
    max_depth=20,
    include_css=True,
    include_callbacks=True
)

add_llms_routes(app, config)
```

Mark a component as important for LLM context. All children inherit importance.
```python
important_section = mark_important(
    html.Div([...], id="key-metrics")
)
```

`mark_hidden(page_path)`
Hide a page from AI bots, search engines, and sitemaps.
mark_hidden("/admin")
mark_hidden("/settings")Register custom metadata for better SEO and documentation.
```python
register_page_metadata(
    path="/analytics",
    name="Analytics Dashboard",
    description="Real-time business analytics",
    category="reporting"
)
```

`RobotsConfig` - Configuration for robots.txt generation.

Parameters:

- `block_ai_training` (bool): Block AI training bots (default: True)
- `allow_ai_search` (bool): Allow AI search bots (default: True)
- `allow_traditional` (bool): Allow traditional search engines (default: True)
- `crawl_delay` (int, optional): Delay between requests in seconds
- `custom_rules` (list, optional): Additional robots.txt rules
- `disallowed_paths` (list, optional): Paths to block
```python
from dash_improve_my_llms import RobotsConfig

config = RobotsConfig(
    block_ai_training=True,
    crawl_delay=15,
    disallowed_paths=["/admin", "/api/*"]
)

app._robots_config = config
```

```python
from dash_improve_my_llms.bot_detection import (
    is_ai_training_bot,
    is_ai_search_bot,
    is_traditional_bot,
    is_any_bot,
    get_bot_type
)

from flask import request  # needed for request.headers below

user_agent = request.headers.get('User-Agent', '')

# Check bot type
is_ai_training_bot(user_agent)   # Returns bool
is_ai_search_bot(user_agent)     # Returns bool
is_traditional_bot(user_agent)   # Returns bool
is_any_bot(user_agent)           # Returns bool
get_bot_type(user_agent)         # Returns "training", "search", "traditional", or "unknown"
```

```python
from dash_improve_my_llms.sitemap_generator import SitemapEntry
custom_entry = SitemapEntry(
    loc="https://myapp.com/special",
    changefreq="monthly",
    priority=0.6
)

# Add to sitemap via configuration
```

```python
from dash_improve_my_llms import (
    generate_llms_txt,
    generate_page_json,
    generate_architecture_txt
)
from dash_improve_my_llms.robots_generator import generate_robots_txt
from dash_improve_my_llms.sitemap_generator import generate_sitemap_xml

# Generate documentation programmatically
llms_content = generate_llms_txt("/mypage", layout_func, "My Page", app)
page_arch = generate_page_json("/mypage", layout_func, app)
app_arch = generate_architecture_txt(app)

# Generate SEO files
robots_content = generate_robots_txt(robots_config, sitemap_url, base_url)
sitemap_content = generate_sitemap_xml(pages, base_url)
```
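If you prefer to serve these files from a static host or CDN, one option is to write the generated content to disk at build time (a sketch; the output directory is arbitrary):

```python
# Hypothetical build step: persist generated files for static hosting
from pathlib import Path

out_dir = Path("static_seo")  # arbitrary output directory
out_dir.mkdir(exist_ok=True)
(out_dir / "llms.txt").write_text(llms_content)
(out_dir / "robots.txt").write_text(robots_content)
(out_dir / "sitemap.xml").write_text(sitemap_content)
```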
v0.2.0 is fully backward compatible. All v0.1.0 code works without changes.

New features (optional):
```python
# 1. Configure bot policies
app._robots_config = RobotsConfig(block_ai_training=True)

# 2. Set base URL for SEO
app._base_url = "https://myapp.com"

# 3. Hide sensitive pages
from dash_improve_my_llms import mark_hidden
mark_hidden("/admin")

# That's it! Enjoy:
# - /robots.txt
# - /sitemap.xml
# - Better SEO
# - Bot control
```

- ✅ Bot Detection - Identify AI training, AI search, and traditional bots
- ✅ Robots.txt Generation - Automatic with configurable policies
- ✅ Sitemap.xml Generation - Smart priorities and change frequencies
- ✅ Static HTML for Bots - Schema.org structured data
- ✅ Privacy Controls - mark_hidden() for sensitive pages
- ✅ Component Hiding - Exclude components from extraction
- ✅ 88 Comprehensive Tests - 100% pass rate in 0.22s
- ✅ 98-100% Coverage - All new modules fully tested
- ✅ Better SEO - Priority inference, change frequency detection
- ✅ Bot Differentiation - Fine-grained control per bot type
- `dash_improve_my_llms/bot_detection.py` - Bot user agent detection
- `dash_improve_my_llms/robots_generator.py` - robots.txt generation
- `dash_improve_my_llms/sitemap_generator.py` - sitemap.xml generation
- `dash_improve_my_llms/html_generator.py` - Static HTML for bots
- `tests/test_bot_detection.py` - 14 comprehensive tests
- `tests/test_robots_generator.py` - 16 comprehensive tests
- `tests/test_sitemap_generator.py` - 33 comprehensive tests
- `tests/test_html_generator.py` - 20 comprehensive tests
- `tests/test_integration.py` - 15 integration tests
- `TEST_REPORT.md` - Complete test documentation
- Python: 3.8, 3.9, 3.10, 3.11, 3.12+
- Dash: 3.2.0+
- Dash Mantine Components: 2.3.0+ (optional)
Works with:
- ✅ Dash Pages (`dash.register_page`)
- ✅ Manual routing (`dcc.Location`)
- ✅ Multi-page apps
- ✅ Single-page apps
- ✅ All Dash component libraries
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all 88 tests pass
- Submit a pull request
```bash
# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=dash_improve_my_llms --cov-report=html

# Format code
black dash_improve_my_llms/ tests/
```

MIT License - see LICENSE file for details.
Built by Pip Install Python LLC for the Dash community.
Special thanks to the Dash community and Plotly team.
- Documentation: CLAUDE.md
- Test Report: TEST_REPORT.md
- PyPI: dash-improve-my-llms
- Dash: dash.plotly.com
- Plotly Pro: plotly.pro
- Issues: GitHub Issues
Made with ❤️ for the Dash community