This project provides a complete end-to-end analysis of Vendor Performance using Python, MySQL, and Power BI.
It covers every stage of a professional data analytics workflow — from data upload and SQL-based preprocessing to Exploratory Data Analysis (EDA), performance evaluation, and dashboard visualization.
The goal of this project is to derive actionable insights on vendor efficiency, purchasing behavior, and profitability using real-world-style data.
Download all CSV files from the link and import them into your project directory or database before starting the analysis.
| File Name | Description |
|---|---|
| Data_upload_using_python.ipynb | Uploads and cleans data using Python and SQL connections to MySQL. Handles missing values and merges data tables. |
| Exploratory_Data_Analysis.ipynb | Performs data exploration, correlation study, and outlier detection to understand dataset behavior. |
| Vendor_Performance_Analysis.ipynb | Conducts performance analysis by vendor and brand using statistical and SQL-based aggregations. |
| EXPLORATORY DATA ANALYSIS REPORT.pdf | Contains summary statistics, data filtering results, and correlation insights. |
| DASHBOARD PDF.pdf | Power BI visualization of all key findings and KPIs. |
- Python 3.x – For scripting and data analysis
- MySQL – For database management, querying, and data transformation
- Power BI Desktop – For building interactive dashboards
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, SQLAlchemy
- Environment: VS Code & MySQL
📘 File: Data_upload_using_python.ipynb
The analysis began by connecting Python to MySQL using SQLAlchemy and mysql.connector to efficiently manage and query data.
-
Data Import into MySQL:
All CSV datasets (sales, purchase, vendor) were imported into MySQL tables for structured storage and SQL-based transformations. -
SQL Querying & Extraction:
SQL queries were used to:- Check data integrity and record counts.
- Identify missing or inconsistent values.
- Perform conditional filtering using SQL WHERE clauses for clean and accurate results.
-
Python-MySQL Integration:
Python was used to execute SQL queries and retrieve dataframes for analysis using:- SQL connections for reading data.
to_sql()andread_sql_query()functions for transferring data between MySQL and Python.
-
Data Cleaning:
- Removed duplicates and irrelevant rows.
- Replaced or dropped missing values.
- Converted date columns into datetime format.
- Removed transactions with zero sales or negative gross profit.
-
Final Output:
The cleaned, structured data was exported from MySQL to a CSV file (vendor_sales_summary.csv) for further analysis in Python.
📘 File: Exploratory_Data_Analysis.ipynb
📄 Report: EXPLORATORY DATA ANALYSIS REPORT.pdf
EDA focused on understanding the dataset’s characteristics, patterns, and relationships between key variables.
-
Summary Statistics:
Generated descriptive measures such as mean, median, and standard deviation to understand the data distribution. -
Outlier Detection:
Identified extreme values in fields like freight cost, purchase price, and stock turnover.
These outliers indicated potential logistic inefficiencies or special product categories. -
Data Filtering:
Excluded unprofitable or inconsistent records using conditions such as:- Gross Profit ≤ 0
- Profit Margin ≤ 0
- Sales Quantity = 0
-
Correlation Analysis:
- Weak correlation between purchase price and profit margin → price fluctuations have limited effect on profit.
- Strong correlation between purchase and sales quantities → efficient stock flow.
- Negative correlation between sales price and profit margin → competitive pricing behavior.
-
Findings Summary:
After cleaning and filtering, the dataset represented only meaningful, profitable transactions for reliable business insights.
📘 File: Vendor_Performance_Analysis.ipynb
This stage focused on evaluating the financial and operational performance of each vendor.
-
Data Aggregation:
Vendor-wise and brand-wise data were aggregated to calculate total purchases, sales, and gross profits. -
Performance Metrics Calculated:
- Purchase Contribution % – the vendor’s share in overall purchase volume.
- Sales-to-Purchase Ratio – indicates how effectively each vendor converts purchases into sales.
- Profit Margin % – highlights the profitability of each vendor’s sales.
-
Vendor Classification:
- Top-performing vendors were identified based on gross profit and high sales-to-purchase ratios.
- Low-performing vendors were flagged for low profitability or underperformance.
-
Visual Insights:
The results were visualized to display:- Top 10 Vendors by Gross Profit.
- Low-performing vendors by profit margin.
- Unsold inventory or capital tied with specific vendors.
-
MySQL Usage in This Stage:
- SQL queries were used to group and summarize vendor performance directly in the database before exporting to Python.
- Complex joins and aggregations were executed using SQL for speed and accuracy.
- Clean vendor summaries were then visualized in Python.
📄 File: DASHBOARD PDF.pdf
After Python and SQL-based processing, the final cleaned dataset was imported into Power BI to visualize the findings.
-
Data Loading:
Imported the cleaned vendor dataset into Power BI and established relationships between different tables (Sales, Purchase, Vendor). -
DAX Measures:
Created custom calculations to compute total sales, total purchases, gross profit, profit margin percentage, and unsold capital. -
Dashboard Components:
- KPI Cards displaying Total Sales ($441.41M), Total Purchase ($307.34M), Gross Profit ($134.07M), and Profit Margin (38.72%).
- Top Vendors by Sales: Diageo North America, Pernod Ricard USA, Jim Beam Brands Co.
- Top Brands by Sales: Jack Daniels No.7, Tito’s Handmade Vodka, Grey Goose Vodka.
- Low-Performing Vendors and Brands: Shown via bar charts and contribution metrics.
- Unsold Capital by Vendor: Visualized using tree maps and pie charts.
-
Insights from Dashboard:
- A few major vendors contributed nearly 80% of total sales.
- Vendors purchasing in bulk benefited from significant unit cost reductions (~72%).
- High levels of unsold capital were traced to specific underperforming vendors.
| Category | Key Finding |
|---|---|
| Profitability | SQL-based data cleaning removed loss-making entries, improving accuracy of profit calculations. |
| Efficiency | Strong correlation between purchase and sales quantities confirmed efficient vendor inventory cycles. |
| Pricing Strategy | Bulk purchase discounts positively impacted profit margins. |
| Inventory Optimization | Identified vendors with high unsold stock, allowing targeted stock clearance strategies. |
| Automation | MySQL queries automated data filtering, improving performance and reusability across updates. |
- Data successfully cleaned, structured, and stored using MySQL.
- Unprofitable and inconsistent records removed for accuracy.
- Key metrics and vendor KPIs generated through Python and SQL collaboration.
- Insights visualized through Power BI Dashboard.
- Actionable findings derived to guide vendor strategy and purchasing efficiency.
Ravindra Kumar Nayak
Data Analyst | MySQL | Python | Power BI | Data Visualization