Skip to content

joshdewhurst/DataAnalysisInPython

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 

Repository files navigation

Project: Analyzing Sales Data for an E-commerce Store

Problem Statement:

A fictional e-commerce store has provided us with their sales data for the past year. The data includes information on sales revenue, products sold, customer demographics, and shipping information. Our task is to analyze the data and provide insights on the following questions:

  1. What is the overall revenue for the year?
  2. What were the top-selling products?
  3. What was the average order value?
  4. What is the customer demographics like?
  5. What was the shipping cost distribution?
  6. Can we identify any trends or patterns in the sales data?

Libraries Used:

For this project, I will be using the following libraries:

  • Pandas: for data manipulation and analysis
  • NumPy: for numerical computing
  • Matplotlib: for data visualization

Functions:

Loading the Data:

I will start by loading the data into a Pandas DataFrame using the read_csv() function:

import pandas as pd

Load the data from a CSV file

sales_data = pd.read_csv("sales_data.csv")

Overall Revenue:

To calculate the overall revenue for the year, we can use the sum() function to sum up the revenue column: overall_revenue = sales_data["revenue"].sum()

Top-Selling Products:

To identify the top-selling products, we can use the groupby() function to group the data by product, and then use the sum() function to calculate the total revenue for each product: product_revenue = sales_data.groupby("product")["revenue"].sum()

top_products = product_revenue.sort_values(ascending=False).head(10)

Average Order Value:

To calculate the average order value, we can divide the overall revenue by the number of orders:

avg_order_value = overall_revenue / sales_data["order_id"].nunique()

Customer Demographics:

To analyze the customer demographics, we can use the groupby() function to group the data by customer age and gender, and then calculate the average order value for each group:

customer_demographics = sales_data.groupby(["age", "gender"])["revenue"].mean()

Shipping Cost Distribution:

To analyze the shipping cost distribution, we can use the hist() function from Matplotlib to create a histogram of the shipping costs:

import matplotlib.pyplot as plt

plt.hist(sales_data["shipping_cost"], bins=20) plt.xlabel("Shipping Cost") plt.ylabel("Frequency") plt.show()

Trends and Patterns:

To identify trends and patterns in the sales data, we can use various data visualization techniques such as line charts, scatter plots, and heat maps:

sales_data["date"] = pd.to_datetime(sales_data["date"])

sales_data.set_index("date", inplace=True)

sales_data.resample("M")["revenue"].sum().plot()

plt.xlabel("Date") plt.ylabel("Revenue") plt.show()

plt.scatter(sales_data["age"], sales_data["revenue"]) plt.xlabel("Age") plt.ylabel("Revenue")

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published