This project involves an in-depth analysis of an e-commerce dataset using SQL for data extraction and Python (Pandas, Numpy, Matplotlib, and Seaborn) for exploratory data analysis (EDA). The goal is to uncover insights and trends within the data, ranging from basic to advanced queries.
- Project Description
- Dataset
- Setup and Installation
- EDA and SQL Queries
- Results and Visualizations
- Conclusion
- Acknowledgements
This project utilizes SQL for querying data and Python libraries such as Pandas, Numpy, Matplotlib, and Seaborn for EDA. The analysis aims to answer various business questions, ranging from basic statistics to advanced insights, which are crucial for making informed decisions in the e-commerce domain.
The dataset used in this project includes information on customers, orders, products, and sellers. It is assumed to be stored in a relational database and can be accessed using SQL queries.
-
Clone the Repository:
git clone https://github.com/yourusername/SQL-Python-E-commerce-Project.git cd SQL-Python-E-commerce-Project -
Install Dependencies:
pip install pandas numpy matplotlib seaborn sqlalchemy
-
Database Connection: Configure your database connection in the
config.pyfile.
Basic Queries
- List all unique cities where customers are located.
- Count the number of orders placed in 2017.
- Find the total sales per category.
- Calculate the percentage of orders that were paid in installments.
- Count the number of customers from each state.
Intermediate Queries
- Calculate the number of orders per month in 2018.
- Find the average number of products per order, grouped by customer city.
- Calculate the percentage of total revenue contributed by each product category.
- Identify the correlation between product price and the number of times a product has been purchased.
- Calculate the total revenue generated by each seller, and rank them by revenue.
Advanced Queries
- Calculate the moving average of order values for each customer over their order history.
- Calculate the cumulative sales per month for each year.
- Calculate the year-over-year growth rate of total sales.
- Calculate the retention rate of customers, defined as the percentage of customers who make another purchase within 6 months of their first purchase.
- Identify the top 3 customers who spent the most money in each year.
Results of the queries and visualizations will be generated and displayed using Python libraries such as Pandas, Matplotlib, and Seaborn. Example visualizations include bar charts for sales per category, line plots for monthly sales trends, and scatter plots for correlation analysis.
This project demonstrates how to combine SQL and Python for comprehensive data analysis in the e-commerce domain. By answering various business questions, you can gain valuable insights to make informed decisions.