A solution for parsing Excel files of tagged resources, dynamically extracting every field from the tags, storing the data in MongoDB, and analyzing it with custom queries and interactive visualizations.
This application provides a complete end-to-end workflow for:
- Uploading Excel files with tagged data
- Parsing ALL fields from tags dynamically (no predefined schema)
- Storing data in MongoDB with an optimized structure
- Analyzing data with custom queries and visualizations
- Querying via MCP tools for AI-powered insights
Dynamic tag parsing:
- Automatically extracts ALL fields from tags
- No need to predefine field names
- Supports unlimited custom fields
- Works with multiple tag formats (JSON, key-value, pipe-separated)
Application pages:
- AI Assistant: Ask natural-language questions about Azure cost and infrastructure data
- Excel Upload: Upload and process large Excel files (100K+ rows)
- Query Builder: Build custom queries with dynamic filters and field selection
- Cost Analysis: Analyze costs by Application, Environment, Owner, and date range
- Drill Down Analysis: Hierarchical cost analysis (Application → Environment → Owner)
- Monthly Comparison: Compare monthly costs across applications
- Help: Interactive documentation viewer with search and navigation
MongoDB storage:
- Optimized document structure for analytics
- All dynamic fields stored at top level for easy querying
- Automatic indexing for performance
- Real-time statistics
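To illustrate the top-level layout, here is a minimal Python sketch that flattens one parsed row into a document shaped like the example document structure in this README. The function `build_document`, the `RESERVED_KEYS` set, and the rule that a tag never overwrites the `tags`, `originalData`, or `metadata` keys are assumptions for illustration, not code taken from `mongodb_operations.py`.

```python
from datetime import datetime, timezone

# Keys reserved for metadata; tag fields with these names are kept only under
# tags.parsed and are not promoted (an assumption for this sketch).
RESERVED_KEYS = {"tags", "originalData", "metadata"}


def build_document(row, parsed_tags, raw_tags, source_file, data_date):
    """Build one MongoDB document with every parsed tag field at top level."""
    doc = {
        "tags": {"raw": raw_tags, "parsed": dict(parsed_tags)},
        "originalData": dict(row),
        "metadata": {
            "importDate": datetime.now(timezone.utc).isoformat(),
            "sourceFile": source_file,
            "dataDate": data_date,
        },
    }
    # Promote each dynamic field so it can be queried directly,
    # e.g. {"department": "IT"}, without knowing field names in advance.
    for key, value in parsed_tags.items():
        if key not in RESERVED_KEYS:
            doc[key] = value
    return doc
```

Keeping the untouched `tags.parsed` copy alongside the promoted fields means nothing is lost when a tag name collides with a reserved key.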
MCP tools:
- 10+ tools for advanced data analysis
- Query by any field combination
- Cost analysis by any dimension
- Cross-tabulation and aggregations
- AI-friendly API for Claude and other LLMs
Prerequisites:
- Python 3.8+
- MongoDB 4.0+
- Excel files with tagged data
Installation:
- Clone the repository:
git clone https://github.com/mudakara/excel-tags-parser-mongodb.git
cd excel-tags-parser-mongodb
- Install dependencies:
pip3 install -r requirements.txt
- Start MongoDB:
# macOS (Homebrew)
brew services start mongodb-community
# Or run it directly
mongod
- Run the application:
streamlit run src/ui/streamlit_app.py
- Open in browser: the app opens automatically at http://localhost:8501
excel-tags-parser-mongodb/
├── README.md                      # This file - project overview
├── Documents/                     # All documentation (26 files)
│   ├── INDEX.md                   # Documentation index with links
│   ├── Getting Started/
│   ├── Core Features/
│   ├── MCP Integration/
│   ├── UI Components/
│   ├── Performance Optimizations/
│   └── Troubleshooting/
├── src/
│   ├── database/
│   │   ├── mongodb_client.py      # MongoDB connection
│   │   └── mongodb_operations.py  # CRUD operations with dynamic fields
│   ├── parser/
│   │   ├── excel_reader.py        # Excel reading with chunking
│   │   ├── excel_writer.py        # Excel writing with progress
│   │   └── tag_parser.py          # Dynamic tag parsing engine
│   ├── ui/
│   │   ├── streamlit_app.py       # Main app entry point
│   │   └── pages/
│   │       ├── 0_🏠_Home.py                  # AI Assistant page
│   │       ├── 1_📤_Excel_Upload.py          # Upload page
│   │       ├── 2_🔍_Query_Builder.py         # Query Builder page
│   │       ├── 3_💰_Cost_Analysis.py         # Cost Analysis page
│   │       ├── 4_📊_Drill_Down_Analysis.py   # Drill-down page
│   │       ├── 5_📈_Monthly_Comparison.py    # Comparison page
│   │       └── 6_❓_Help.py                  # Help & Documentation page
│   └── utils/
│       └── validators.py          # File and data validation
├── mcp_server/
│   ├── server.py                  # MCP server with 10+ tools
│   └── test_mcp.py                # MCP server tests
├── config.py                      # Configuration settings
└── requirements.txt               # Python dependencies
Complete Documentation Index (Documents/INDEX.md): all documentation, organized and searchable
- Documents/MONGODB_SETUP.md - MongoDB installation and setup
- Documents/PROJECT_CONTEXT.md - Complete project overview
- Documents/IMPLEMENTATION_SUMMARY.md - Implementation summary
- Documents/SETUP_AI_ASSISTANT.md - AI assistant setup (5-minute quick start)
- Documents/AI_QUERY_ASSISTANT.md - AI query guide with examples
- Documents/STREAMLIT_MULTIPAGE_APP.md - Multi-page app guide
- Documents/DYNAMIC_PARSING_GUIDE.md - How dynamic tag parsing works
- Documents/MCP_QUICKSTART.md - Quick start for MCP tools
- Documents/MONGODB_DYNAMIC_FIELDS_UPDATE.md - MongoDB schema and dynamic fields
- Documents/MCP_DYNAMIC_QUERY_TOOLS.md - Complete MCP tools reference
- Documents/GITHUB_SETUP.md - Repository setup guide
- Documents/MONTHLY_COMPARISON_PAGE_OPTIMIZATION.md - 10-100x faster (Nov 17, 2025)
- Documents/DRILL_DOWN_ANALYSIS_PAGE_OPTIMIZATION.md - 20-30x faster (Nov 18, 2025)
- Documents/REPORTS_PAGE_PERFORMANCE_OPTIMIZATION.md - Query optimization
- Documents/TROUBLESHOOTING.md - General troubleshooting guide
- Documents/INDEX.md - Find specific fix documents
Tests:
- test_dynamic_parsing.py - Tag parsing validation
- test_dynamic_mongodb.py - MongoDB field insertion tests
- test_mcp.py - MCP server tests
Tip: See Documents/INDEX.md for the complete documentation, with all 26 files organized by topic.
AI Assistant features:
- Natural Language Queries: Ask questions about Azure cost and infrastructure data
- Multiple LLM Support: OpenRouter (20+ models), Claude, or custom LLMs
- Automatic Tool Use: The AI selects and calls the appropriate MongoDB MCP tools
- Interactive Chat: ChatGPT-style interface with message history
- Transparent Operations: See which tools the AI uses
- Real-time Analysis: Get insights, aggregations, and cost breakdowns
- Persistent Settings: LLM configuration saved to MongoDB, survives page refreshes
Example Questions:
- "What's the total cost by department?"
- "Show me all IT resources in production"
- "Which cost center has the highest spend?"
- "Find resources without proper tags"
See Documents/AI_QUERY_ASSISTANT.md and Documents/SETUP_AI_ASSISTANT.md for details.
Excel Upload:
- File upload with validation
- Progress tracking during processing
- Extract ALL tag fields dynamically
- Download processed Excel file
- Push data to MongoDB with progress bar
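The upload flow processes rows in chunks and reports progress while pushing to MongoDB. A minimal sketch of that pattern in plain Python follows; `iter_chunks` and `push_in_batches` are hypothetical helpers, and the `insert_many` and `on_progress` callables are injected so the sketch runs without a database.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional

CHUNK_SIZE = 10000  # mirrors the default shown in config.py


def iter_chunks(rows: Iterable[dict], size: int = CHUNK_SIZE) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows without materializing all rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def push_in_batches(
    rows: Iterable[dict],
    insert_many: Callable[[List[dict]], object],
    total: int,
    on_progress: Optional[Callable[[float], object]] = None,
    size: int = CHUNK_SIZE,
) -> int:
    """Insert rows chunk by chunk, reporting the completed fraction (0.0 to 1.0)."""
    done = 0
    for chunk in iter_chunks(rows, size):
        insert_many(chunk)
        done += len(chunk)
        if on_progress and total:
            on_progress(min(done / total, 1.0))
    return done
```

With pymongo and Streamlit installed, you could pass `collection.insert_many` as `insert_many` and the `.progress` method of a bar created with `st.progress(0.0)` as `on_progress`; whether the app wires it up exactly this way is not confirmed here.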
Query Builder:
- Build custom queries with any field combination
- Add multiple filters dynamically
- Dynamic field explorer in sidebar
- Performance optimization tools (indexing)
- Database statistics on demand
- Export results to CSV
- Cache management
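Dynamic filters of the form field / operator / value map naturally onto a MongoDB filter document. The sketch below assumes a hypothetical operator list (`=`, `!=`, `>`, `>=`, `<`, `<=`, `in`, `contains`) and combines multiple filters with `$and`; the Query Builder's actual operators may differ.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical UI operator labels mapped to real MongoDB query operators.
OPERATORS = {
    "=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte",
    "<": "$lt", "<=": "$lte", "in": "$in", "contains": "$regex",
}


def build_filter(filters: List[Tuple[str, str, Any]]) -> Dict[str, Any]:
    """Combine (field, operator, value) rows into one MongoDB filter document."""
    clauses = []
    for field, op, value in filters:
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        if op == "=":
            clauses.append({field: value})  # implicit equality match
        elif op == "contains":
            # Escape user input so it is matched literally, case-insensitively.
            clauses.append({field: {"$regex": re.escape(str(value)), "$options": "i"}})
        else:
            clauses.append({field: {OPERATORS[op]: value}})
    if not clauses:
        return {}  # no filters: match every document
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

The resulting dict can be passed to pymongo's `collection.find(...)`.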
Cost Analysis:
- Analyze costs by Application, Environment, Owner
- Single Month or Month Range selection
- Multi-select filters with $in operator support
- Total, average, min, max cost breakdown
- Monthly cost trend visualization
- Bar and pie chart visualizations
- Execution time tracking
- MongoDB query details display
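The total, average, min, and max breakdown over a month range with multi-select filters can be expressed as a single aggregation pipeline. This sketch assumes the field names from the example document structure (`applicationName`, `environment`, `owner`, `cost`, and `date` stored as `"YYYY-MM"`); `cost_pipeline` is a hypothetical helper, not the page's code. A single month is the case where `start_month` equals `end_month`.

```python
from typing import Dict, List, Optional


def cost_pipeline(
    start_month: str,
    end_month: str,
    applications: Optional[List[str]] = None,
    environments: Optional[List[str]] = None,
    owners: Optional[List[str]] = None,
) -> List[Dict]:
    """Aggregation pipeline for total/avg/min/max cost over a month range."""
    # Zero-padded "YYYY-MM" strings sort chronologically, so a string range works.
    match: Dict = {"date": {"$gte": start_month, "$lte": end_month}}
    for field, values in (
        ("applicationName", applications),
        ("environment", environments),
        ("owner", owners),
    ):
        if values:  # None or empty list means "no filter on this field"
            match[field] = {"$in": list(values)}
    return [
        {"$match": match},
        {"$group": {
            "_id": None,
            "totalCost": {"$sum": "$cost"},
            "avgCost": {"$avg": "$cost"},
            "minCost": {"$min": "$cost"},
            "maxCost": {"$max": "$cost"},
            "count": {"$sum": 1},
        }},
    ]
```

With pymongo, the list would be passed as `collection.aggregate(cost_pipeline("2025-09", "2025-11", applications=["myapp"]))`.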
Drill Down Analysis:
- Hierarchical Navigation: Application → Environment → Owner
- Interactive Charts: Click-based drill-down with Plotly
- Time Period Filter: Last 3/6/9/12 months
- Top N Filter: View All or Top 5/10 applications
- Lazy Loading: On-demand data loading (20-30x faster)
- Caching: 5-minute intelligent caching
- Download: Export owner cost data to CSV
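The Last 3/6/9/12 months filter needs the list of month labels to match. A small stdlib sketch follows; it assumes dates are stored as `"YYYY-MM"` strings and that "last N months" includes the current month, which may not match the page's exact definition. `last_n_months` is a hypothetical helper.

```python
from datetime import date
from typing import List


def last_n_months(today: date, n: int) -> List[str]:
    """Return the n most recent "YYYY-MM" labels, oldest first, including today's month."""
    if n <= 0:
        return []
    # Count months as a single integer so subtraction crosses year boundaries.
    current = today.year * 12 + (today.month - 1)
    months = []
    for offset in range(n - 1, -1, -1):
        year, month0 = divmod(current - offset, 12)
        months.append(f"{year:04d}-{month0 + 1:02d}")
    return months
```

The result could then be used as `{"date": {"$in": last_n_months(date.today(), 3)}}`.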
Monthly Comparison:
- Multi-Application Analysis: Compare 1-5 applications
- Custom Date Range: Select any month range
- Form-Based Input: Settings are submitted together in a form, so the page does not rerun on every change
- Line Chart: Monthly cost trends visualization
- Pivot Table: Monthly breakdown by application
- Summary Metrics: Total, average, and month count
- Download: Export comparison data to CSV
- Ultra-Fast: 10-100x faster with distinct() query optimization
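The pivot table (months as rows, applications as columns) can be built from aggregated `(application, month, cost)` rows. The sketch below is a plain-Python stand-in using a hypothetical `pivot_monthly` helper; it sums duplicate entries and orders months chronologically.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def pivot_monthly(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Pivot (application, month, cost) rows into {month: {application: total}}."""
    table: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for app, month, cost in rows:
        table[month][app] += cost
    # "YYYY-MM" labels sort chronologically as text; convert to plain dicts.
    return {month: dict(table[month]) for month in sorted(table)}
```

The input rows could come from a `$group` stage keyed on `applicationName` and `date`.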
Help:
- Interactive Viewer: Read all 28 documentation files in-app
- Categorized Sidebar: Quick access to docs by category
- Search Functionality: Find specific topics and keywords
- Navigation: Back/Home buttons with history tracking
- Download: Export any document as .md file
- No External Links: Everything accessible within the app
See Documents/HELP_PAGE_IMPLEMENTATION.md for details.
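In-app search over the documentation can be as simple as a case-insensitive substring scan. The sketch below is hypothetical (`search_docs` and its `max_hits` cap are not from the project); it takes the documents as a `{filename: text}` dict, which could be loaded with `pathlib.Path("Documents").glob("*.md")`, and returns matching lines with their line numbers.

```python
from typing import Dict, List, Tuple


def search_docs(docs: Dict[str, str], query: str, max_hits: int = 20) -> List[Tuple[str, int, str]]:
    """Case-insensitive keyword search returning (filename, line number, line)."""
    needle = query.strip().lower()
    hits: List[Tuple[str, int, str]] = []
    if not needle:
        return hits  # an empty query matches nothing
    for name in sorted(docs):  # stable, alphabetical result order
        for number, line in enumerate(docs[name].splitlines(), start=1):
            if needle in line.lower():
                hits.append((name, number, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits
```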
The MCP server provides 10+ tools for advanced data analysis:
| Tool | Description |
|---|---|
| get_available_fields | List all queryable fields |
| advanced_query | Query by any field combination |
| aggregate_by_any_field | Group and aggregate by any field |
| cost_analysis_by_field | Cost breakdown by dimension |
| multi_dimensional_analysis | Cross-tabulate two fields |
| query_resources | Basic resource queries |
| get_statistics | Database overview stats |
| get_total_cost | Total cost with filters |
| create_bar_chart | Generate bar charts |
| create_pie_chart | Generate pie charts |
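As an example of what a tool such as multi_dimensional_analysis might run, the sketch below builds an aggregation pipeline that totals cost for every combination of two fields. It is an assumption about the approach, not the server's implementation; `cross_tab_pipeline` is a hypothetical name, and field names containing `.` or starting with `$` would need extra handling.

```python
from typing import Any, Dict, List, Optional


def cross_tab_pipeline(
    field_a: str,
    field_b: str,
    match: Optional[Dict[str, Any]] = None,
    cost_field: str = "cost",
) -> List[Dict[str, Any]]:
    """Pipeline that totals cost for every (field_a, field_b) combination."""
    pipeline: List[Dict[str, Any]] = []
    if match:
        pipeline.append({"$match": match})  # optional pre-filter
    pipeline += [
        {"$group": {
            # Compound _id: one group per distinct pair of values.
            "_id": {field_a: f"${field_a}", field_b: f"${field_b}"},
            "totalCost": {"$sum": f"${cost_field}"},
            "count": {"$sum": 1},
        }},
        {"$sort": {"totalCost": -1}},  # largest cost combinations first
    ]
    return pipeline
```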
Start the MCP server:
cd mcp_server
python3 mongodb_mcp_server.py

See Documents/MCP_QUICKSTART.md for usage examples.
Uploading and parsing a file:
- Navigate to the Excel Upload page
- Upload your Excel file
- The parser extracts ALL fields from tags automatically
- Download the processed file or push to MongoDB
Querying data:
- Navigate to the Query Builder page
- Add filters (e.g., department = "IT", environment = "production")
- Run the query and export the results
Analyzing costs:
- Navigate to the Cost Analysis page
- Select Application, Environment, Owner filters (multi-select supported)
- Choose Single Month or Month Range
- Click "Calculate Total Cost"
- View detailed breakdown with charts and metrics
Drilling down:
- Navigate to the Drill Down Analysis page
- Select time period (Last 3/6/9/12 months or All)
- Choose Top N applications or view all
- Click on application to drill into environments
- Click on environment to see owner breakdown
Edit config.py to customize:
# MongoDB Settings
MONGODB_URI = "mongodb://localhost:27017/"
MONGODB_DATABASE = "azure"
MONGODB_COLLECTION = "resources"
# File Processing
CHUNK_SIZE = 10000
MAX_FILE_SIZE_MB = 100
ALLOWED_EXTENSIONS = ['.xlsx', '.xls']
# Tag Column
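TAG_COLUMN = "Tags"

Supported tag formats (examples):
- JSON-style pairs without braces: "primarycontact":"john doe","usage":"databricks prod","department":"IT"
- Key-value pairs: applicationname:myapp,environment:prod,owner:john,usage:databricks
- JSON object: {"owner": "john", "environment": "production", "department": "IT"}
- Pipe-separated values: myapp|production|john|1234.56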
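The tag formats can be told apart with a few ordered checks. A minimal sketch, assuming this detection order (JSON first, then pipe-separated, then `key:value` pairs) and a hypothetical positional field list for pipe-separated tags; the project's `tag_parser.py` may use different rules.

```python
import json

# Hypothetical positional field names for pipe-separated tags.
PIPE_FIELDS = ["applicationName", "environment", "owner", "cost"]


def parse_tags(raw: str) -> dict:
    """Best-effort parse of a tag string into a flat dict of fields."""
    text = (raw or "").strip()
    if not text:
        return {}

    # 1. Full JSON object: {"owner": "john", ...}
    # 2. JSON-style pairs without braces: "owner":"john","usage":"x"
    for candidate in (text, "{" + text + "}"):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return dict(parsed)

    # 3. Pipe-separated positional values: myapp|production|john|1234.56
    #    (values stay strings; extra values beyond PIPE_FIELDS are dropped)
    if "|" in text:
        values = [v.strip() for v in text.split("|")]
        return dict(zip(PIPE_FIELDS, values))

    # 4. Comma-separated key:value pairs: applicationname:myapp,owner:john
    fields = {}
    for pair in text.split(","):
        key, sep, value = pair.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields
```

Trying JSON first keeps values such as `"x|y"` inside a JSON string from being mistaken for pipe-separated input.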
Each processed row is stored in MongoDB as a document like this:
{
// Standard fields
"applicationName": "myapp",
"environment": "production",
"owner": "john",
"cost": 1234.56,
"date": "2025-11",
// ALL dynamic fields extracted from tags
"primaryContact": "jane doe",
"usage": "databricks prod",
"department": "IT",
"costCenter": "CC123",
"team": "analytics",
// ... unlimited custom fields
// Tags metadata
"tags": {
"raw": "original tag string",
"parsed": { /* all extracted fields */ }
},
// Original Excel data
"originalData": { /* complete row data */ },
// Import metadata
"metadata": {
"importDate": "2025-11-15T...",
"sourceFile": "filename.xlsx",
"dataDate": "2025-11"
}
}

Use cases:
- Track all infrastructure resources
- Analyze costs by department, team, or owner
- Identify unused resources
- Analyze cloud spending by dimension
- Identify cost drivers
- Track usage patterns
- Ensure proper tagging compliance
- Identify untagged or mis-tagged resources
- Generate compliance reports
- Slice and dice by any dimension
- Create custom reports
- Export data for further analysis
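For the compliance use case, untagged or mis-tagged resources can be found with a filter on the top-level tag fields. In this sketch the required tag list and the `missing_tags_filter` name are hypothetical; in MongoDB, `{field: None}` (null) matches documents where the field is missing or null, and an explicit empty string is checked separately.

```python
from typing import Any, Dict, Iterable

# Hypothetical set of tags every resource is expected to carry.
REQUIRED_TAGS = ("applicationName", "environment", "owner")


def missing_tags_filter(required: Iterable[str] = REQUIRED_TAGS) -> Dict[str, Any]:
    """Match documents where any required tag is missing, null, or empty."""
    clauses = []
    for field in required:
        clauses.append({field: None})  # field absent or explicitly null
        clauses.append({field: ""})    # present but empty (whitespace-only is not caught)
    if not clauses:
        raise ValueError("at least one required tag is needed")  # $or cannot be empty
    return {"$or": clauses}
```

Passing the result to `collection.find(...)` or `collection.count_documents(...)` would list or count the non-compliant resources.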
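Troubleshooting:

MongoDB connection errors:
# Make sure MongoDB is running
brew services start mongodb-community
# Or
mongod

Missing module or import errors:
# Reinstall dependencies
pip3 install -r requirements.txt

Large files:
- Tune CHUNK_SIZE in config.py (larger chunks reduce per-batch overhead but use more memory per chunk)
- Ensure sufficient RAM
- Process files in batches

Tags not parsing correctly:
- Check that the tag format matches one of the supported formats
- Enable debug logging in config.py
- Run test_dynamic_parsing.py to validate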
Performance:
- Large File Support: Handles 100K+ rows efficiently
- Chunked Processing: Memory-efficient streaming
- MongoDB Indexing: Optimized query performance
- Batch Insertion: Fast data loading
- Progress Tracking: Real-time updates
Contributing:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is open source and available under the MIT License.
- Repository: https://github.com/mudakara/excel-tags-parser-mongodb
- Issues: https://github.com/mudakara/excel-tags-parser-mongodb/issues
- Streamlit Docs: https://docs.streamlit.io
- MongoDB Docs: https://docs.mongodb.com
Highlights:
- Processed 200K+ rows in under 2 minutes
- Extracted 50+ unique dynamic fields automatically
- Reduced manual tagging analysis from hours to seconds
- Enabled AI-powered querying via MCP tools
For help:
- Check the documentation in this README
- Review the troubleshooting section
- Open an issue on GitHub
Built with ❤️ using Streamlit, MongoDB, and Python