Skip to content

Excel Tags Parser with MongoDB Integration - Process large Excel files, extract metadata, and integrate with MongoDB. Includes MCP server for LLM analytics.

Notifications You must be signed in to change notification settings

mudakara/excel-tags-parser-mongodb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Excel Tags Parser & Analytics Platform

A comprehensive solution for parsing Excel files with tagged resources, extracting all fields dynamically, storing data in MongoDB, and performing advanced analytics with powerful visualization capabilities.

🎯 Overview

This application provides a complete end-to-end workflow for:

  • πŸ“€ Uploading Excel files with tagged data
  • πŸ” Parsing ALL fields from tags dynamically (no predefined schema)
  • πŸ’Ύ Storing data in MongoDB with optimized structure
  • πŸ“Š Analyzing data with custom queries and visualizations
  • πŸ€– Querying via MCP tools for AI-powered insights

✨ Key Features

Dynamic Tag Parsing

  • Automatically extracts ALL fields from tags
  • No need to predefine field names
  • Supports unlimited custom fields
  • Works with multiple tag formats (JSON, key-value, pipe-separated)

Multi-Page Web Interface

  • πŸ€– AI Assistant: Ask questions about Azure cost and Infrastructure related data
  • πŸ“€ Excel Upload: Upload and process large Excel files (100K+ rows)
  • πŸ”Ž Query Builder: Build custom queries with dynamic filters and field selection
  • πŸ’° Cost Analysis: Analyze costs by Application, Environment, Owner, and Date range
  • πŸ“Š Drill Down Analysis: Hierarchical cost analysis (Application β†’ Environment β†’ Owner)
  • πŸ“ˆ Monthly Comparison: Compare monthly costs across applications
  • ❓ Help: Interactive documentation viewer with search and navigation

MongoDB Integration

  • Optimized document structure for analytics
  • All dynamic fields stored at top level for easy querying
  • Automatic indexing for performance
  • Real-time statistics

MCP Server

  • 10+ tools for advanced data analysis
  • Query by any field combination
  • Cost analysis by any dimension
  • Cross-tabulation and aggregations
  • AI-friendly API for Claude and other LLMs

πŸš€ Quick Start

Prerequisites

  • Python 3.8+
  • MongoDB 4.0+
  • Excel files with tagged data

Installation

  1. Clone the repository

    git clone https://github.com/mudakara/excel-tags-parser-mongodb.git
    cd excel-tags-parser-mongodb
  2. Install dependencies

    pip3 install -r requirements.txt
  3. Start MongoDB

    # macOS (Homebrew)
    brew services start mongodb-community
    
    # Or run directly
    mongod
  4. Run the application

    streamlit run src/ui/streamlit_app.py
  5. Open in browser

    • The app will automatically open at http://localhost:8501

πŸ“ Project Structure

excel-tags-parser-mongodb/
β”œβ”€β”€ README.md                        # This file - project overview
β”œβ”€β”€ Documents/                       # πŸ“š All documentation (26 files)
β”‚   β”œβ”€β”€ INDEX.md                    # Documentation index with links
β”‚   β”œβ”€β”€ πŸš€ Getting Started/
β”‚   β”œβ”€β”€ πŸ—οΈ Core Features/
β”‚   β”œβ”€β”€ πŸ”Œ MCP Integration/
β”‚   β”œβ”€β”€ 🎨 UI Components/
β”‚   β”œβ”€β”€ ⚑ Performance Optimizations/
β”‚   └── πŸ”§ Troubleshooting/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ database/
β”‚   β”‚   β”œβ”€β”€ mongodb_client.py       # MongoDB connection
β”‚   β”‚   └── mongodb_operations.py   # CRUD operations with dynamic fields
β”‚   β”œβ”€β”€ parser/
β”‚   β”‚   β”œβ”€β”€ excel_reader.py         # Excel reading with chunking
β”‚   β”‚   β”œβ”€β”€ excel_writer.py         # Excel writing with progress
β”‚   β”‚   └── tag_parser.py           # Dynamic tag parsing engine
β”‚   β”œβ”€β”€ ui/
β”‚   β”‚   β”œβ”€β”€ streamlit_app.py        # Main app entry point
β”‚   β”‚   └── pages/
β”‚   β”‚       β”œβ”€β”€ 0_🏠_Home.py        # AI Assistant page
β”‚   β”‚       β”œβ”€β”€ 1_πŸ“€_Excel_Upload.py # Upload page
β”‚   β”‚       β”œβ”€β”€ 2_πŸ”Ž_Query_Builder.py # Query Builder page
β”‚   β”‚       β”œβ”€β”€ 3_πŸ’°_Cost_Analysis.py # Cost Analysis page
β”‚   β”‚       β”œβ”€β”€ 4_πŸ“Š_Drill_Down_Analysis.py # Drill-down page
β”‚   β”‚       β”œβ”€β”€ 5_πŸ“ˆ_Monthly_Comparison.py  # Comparison page
β”‚   β”‚       └── 6_❓_Help.py        # Help & Documentation page
β”‚   └── utils/
β”‚       └── validators.py           # File and data validation
β”œβ”€β”€ mcp_server/
β”‚   β”œβ”€β”€ server.py                   # MCP server with 10+ tools
β”‚   └── test_mcp.py                 # MCP server tests
β”œβ”€β”€ config.py                       # Configuration settings
└── requirements.txt                # Python dependencies

πŸ“– Documentation

πŸ“š Complete Documentation Index - All documentation organized and searchable

Quick Links

Getting Started (Start Here!)

User Guides

Technical Documentation

Performance Optimizations ⚑

Troubleshooting

Testing

  • test_dynamic_parsing.py - Tag parsing validation
  • test_dynamic_mongodb.py - MongoDB field insertion tests
  • test_mcp.py - MCP server tests

πŸ’‘ Tip: See Documents/INDEX.md for the complete organized documentation with 26 files categorized by topic.

🎨 Application Pages

πŸ€– AI Assistant

  • Natural Language Queries: Ask questions about Azure cost and Infrastructure related data
  • Multiple LLM Support: OpenRouter (20+ models), Claude, or custom LLMs
  • Automatic Tool Use: AI intelligently uses MongoDB MCP tools
  • Interactive Chat: ChatGPT-style interface with message history
  • Transparent Operations: See which tools the AI uses
  • Real-time Analysis: Get insights, aggregations, and cost breakdowns
  • Persistent Settings: LLM configuration saved to MongoDB, survives page refreshes

Example Questions:

  • "What's the total cost by department?"
  • "Show me all IT resources in production"
  • "Which cost center has the highest spend?"
  • "Find resources without proper tags"

See Documents/AI_QUERY_ASSISTANT.md and Documents/SETUP_AI_ASSISTANT.md for details.

πŸ“€ Excel Upload Page

  • File upload with validation
  • Progress tracking during processing
  • Extract ALL tag fields dynamically
  • Download processed Excel file
  • Push data to MongoDB with progress bar

πŸ”Ž Query Builder Page

  • Build custom queries with any field combination
  • Add multiple filters dynamically
  • Dynamic field explorer in sidebar
  • Performance optimization tools (indexing)
  • Database statistics on demand
  • Export results to CSV
  • Cache management

πŸ’° Cost Analysis Page

  • Analyze costs by Application, Environment, Owner
  • Single Month or Month Range selection
  • Multi-select filters with $in operator support
  • Total, average, min, max cost breakdown
  • Monthly cost trend visualization
  • Bar and pie chart visualizations
  • Execution time tracking
  • MongoDB query details display

πŸ“Š Drill Down Analysis Page

  • Hierarchical Navigation: Application β†’ Environment β†’ Owner
  • Interactive Charts: Click-based drill-down with Plotly
  • Time Period Filter: Last 3/6/9/12 months
  • Top N Filter: View All or Top 5/10 applications
  • Lazy Loading: On-demand data loading (20-30x faster)
  • Caching: 5-minute intelligent caching
  • Download: Export owner cost data to CSV

πŸ“ˆ Monthly Comparison Page

  • Multi-Application Analysis: Compare 1-5 applications
  • Custom Date Range: Select any month range
  • Form-Based Input: Optimized for no-lag configuration
  • Line Chart: Monthly cost trends visualization
  • Pivot Table: Monthly breakdown by application
  • Summary Metrics: Total, average, and month count
  • Download: Export comparison data to CSV
  • Ultra-Fast: 10-100x faster with distinct() query optimization

❓ Help & Documentation Page

  • Interactive Viewer: Read all 28 documentation files in-app
  • Categorized Sidebar: Quick access to docs by category
  • Search Functionality: Find specific topics and keywords
  • Navigation: Back/Home buttons with history tracking
  • Download: Export any document as .md file
  • No External Links: Everything accessible within the app

See Documents/HELP_PAGE_IMPLEMENTATION.md for details.

πŸ€– MCP Server Tools

The MCP server provides 10+ tools for advanced data analysis:

Tool Description
get_available_fields List all queryable fields
advanced_query Query by any field combination
aggregate_by_any_field Group and aggregate by any field
cost_analysis_by_field Cost breakdown by dimension
multi_dimensional_analysis Cross-tabulate two fields
query_resources Basic resource queries
get_statistics Database overview stats
get_total_cost Total cost with filters
create_bar_chart Generate bar charts
create_pie_chart Generate pie charts

Start MCP Server:

cd mcp_server
python3 mongodb_mcp_server.py

See MCP_QUICKSTART.md for usage examples.

πŸ’‘ Usage Examples

Upload and Process Excel File

  1. Navigate to πŸ“€ Excel Upload page
  2. Upload your Excel file
  3. The parser extracts ALL fields from tags automatically
  4. Download the processed file or push to MongoDB

Query Data

  1. Navigate to πŸ”Ž Query Builder page
  2. Add filters (e.g., department = "IT", environment = "production")
  3. Run query and export results

Cost Analysis

  1. Navigate to πŸ’° Cost Analysis page
  2. Select Application, Environment, Owner filters (multi-select supported)
  3. Choose Single Month or Month Range
  4. Click "Calculate Total Cost"
  5. View detailed breakdown with charts and metrics

Drill Down Analysis

  1. Navigate to πŸ“Š Drill Down Analysis page
  2. Select time period (Last 3/6/9/12 months or All)
  3. Choose Top N applications or view all
  4. Click on application to drill into environments
  5. Click on environment to see owner breakdown

πŸ”§ Configuration

Edit config.py to customize:

# MongoDB Settings
MONGODB_URI = "mongodb://localhost:27017/"
MONGODB_DATABASE = "azure"
MONGODB_COLLECTION = "resources"

# File Processing
CHUNK_SIZE = 10000
MAX_FILE_SIZE_MB = 100
ALLOWED_EXTENSIONS = ['.xlsx', '.xls']

# Tag Column
TAG_COLUMN = "Tags"

πŸ“Š Supported Tag Formats

1. Escaped JSON (Recommended)

"primarycontact":"john doe","usage":"databricks prod","department":"IT"

2. Key-Value Pairs

applicationname:myapp,environment:prod,owner:john,usage:databricks

3. JSON Format

{"owner": "john", "environment": "production", "department": "IT"}

4. Pipe-Separated (Limited)

myapp|production|john|1234.56

πŸ—ƒοΈ MongoDB Document Structure

{
  // Standard fields
  "applicationName": "myapp",
  "environment": "production",
  "owner": "john",
  "cost": 1234.56,
  "date": "2025-11",

  // ALL dynamic fields extracted from tags
  "primaryContact": "jane doe",
  "usage": "databricks prod",
  "department": "IT",
  "costCenter": "CC123",
  "team": "analytics",
  // ... unlimited custom fields

  // Tags metadata
  "tags": {
    "raw": "original tag string",
    "parsed": { /* all extracted fields */ }
  },

  // Original Excel data
  "originalData": { /* complete row data */ },

  // Import metadata
  "metadata": {
    "importDate": "2025-11-15T...",
    "sourceFile": "filename.xlsx",
    "dataDate": "2025-11"
  }
}

🎯 Use Cases

IT Asset Management

  • Track all infrastructure resources
  • Analyze costs by department, team, or owner
  • Identify unused resources

Cloud Cost Optimization

  • Analyze cloud spending by dimension
  • Identify cost drivers
  • Track usage patterns

Resource Governance

  • Ensure proper tagging compliance
  • Identify untagged or mis-tagged resources
  • Generate compliance reports

Data Analysis

  • Slice and dice by any dimension
  • Create custom reports
  • Export data for further analysis

πŸ› Troubleshooting

MongoDB Connection Error

# Make sure MongoDB is running
brew services start mongodb-community

# Or
mongod

Import Errors

# Reinstall dependencies
pip3 install -r requirements.txt

Large File Processing

  • Increase CHUNK_SIZE in config.py
  • Ensure sufficient RAM
  • Process files in batches

Tag Parsing Issues

  • Check tag format matches supported formats
  • Enable debug logging in config.py
  • Run test_dynamic_parsing.py to validate

πŸš€ Performance

  • Large File Support: Handles 100K+ rows efficiently
  • Chunked Processing: Memory-efficient streaming
  • MongoDB Indexing: Optimized query performance
  • Batch Insertion: Fast data loading
  • Progress Tracking: Real-time updates

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“œ License

This project is open source and available under the MIT License.

πŸ”— Links

πŸŽ‰ Success Stories

  • βœ… Processed 200K+ rows in under 2 minutes
  • βœ… Extracted 50+ unique dynamic fields automatically
  • βœ… Reduced manual tagging analysis from hours to seconds
  • βœ… Enabled AI-powered querying via MCP tools

πŸ“ž Support

For help:

  1. Check documentation in this README
  2. Review troubleshooting section
  3. Open an issue on GitHub

Built with ❀️ using Streamlit, MongoDB, and Python

About

Excel Tags Parser with MongoDB Integration - Process large Excel files, extract metadata, and integrate with MongoDB. Includes MCP server for LLM analytics.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •