E-Commerce Data Analysis with Python and MySQL
Transform raw e-commerce CSV data into a MySQL database and extract actionable business insights using Python.
This project imports multiple CSV files related to an e-commerce business into MySQL, cleans and structures the data using Python, and performs analysis to understand sales trends, customer distribution, and revenue contribution by product category. Visualizations are generated to provide clear insights.
E-commerce businesses generate large amounts of raw data. Extracting meaningful insights from this data manually is time-consuming and error-prone. This project automates data cleaning, storage, and analysis to enable faster, accurate business decision-making.
- Customers – customer_id, name, city, state, etc.
- Orders – order_id, customer_id, purchase_timestamp, etc.
- Products – product_id, category, price, etc.
- Payments – payment_id, order_id, payment_value, payment_installments, etc.
- Sellers – seller_id, name, location, etc.
- Geolocation – seller_zip_code, city, state, etc.
- Order Items – order_item_id, order_id, product_id, etc.
All data is stored in CSV format in the E-Commerce folder.
- Python – Data cleaning and automation
- Pandas – CSV handling and data manipulation
- MySQL – Relational database for structured storage
- Matplotlib & Seaborn – Data visualization
- OS Library – File path management
-
Data Import & Cleaning : Read CSV files using pandas. : Clean column names to remove spaces, hyphens, or dots. : Replace missing values (NaN) with None for SQL compatibility.
-
Database Creation: Create MySQL tables with appropriate data types. : Insert cleaned data into respective tables.
-
Data Analysis & Visualization: Write SQL queries in Python to extract business insights. : Generate bar charts and statistical plots to visualize trends.
- List all unique cities where customers are located.
- Count the number of orders placed in 2017.
- Find the total sales per category.
- Calculate the percentage of orders that were paid in installments.
- Count the number of customers from each state.
- Calculate the number of orders per month in 2018.
- Find the average number of products per order, grouped by customer city.
- Calculate the percentage of total revenue contributed by each product category.
- Identify the correlation between product price and the number of times a product has been purchased.
- Clone this repository: git clone https://github.com/pracheta07/Python-SQL-Project
- Install Python 3.x and required libraries: pip install pandas mysql-connector-python matplotlib seaborn
- Create a MySQL database named ecommerce. 4.Update MySQL credentials in import_csv_to_mysql.py and data_analysis.py.
- Run the data import script: python import_csv_to_mysql.py
- Run the analysis and visualization script: python data_analysis.py
The project successfully transformed raw CSV data into a structured MySQL database and provided actionable business insights. The analysis revealed sales trends, customer distribution patterns, and revenue contribution by product category, enabling better decision-making for the e-commerce business. Visualizations made it easier to interpret and communicate the results.
Name: Pracheta Sharma Linkedin Profile: https://www.linkedin.com/in/pracheta-sharma-a6933725b/