Load the data from a CSV file

Project: Analyzing Sales Data for an E-commerce Store

Problem Statement:

A fictional e-commerce store has provided us with their sales data for the past year. The data includes information on sales revenue, products sold, customer demographics, and shipping information. Our task is to analyze the data and provide insights on the following questions:

What is the overall revenue for the year?
What were the top-selling products?
What was the average order value?
What is the customer demographics like?
What was the shipping cost distribution?
Can we identify any trends or patterns in the sales data?

Libraries Used:

For this project, I will be using the following libraries:

Pandas: for data manipulation and analysis
NumPy: for numerical computing
Matplotlib: for data visualization

Functions:

Loading the Data:

I will start by loading the data into a Pandas DataFrame using the read_csv() function:

import pandas as pd

Load the data from a CSV file

sales_data = pd.read_csv("sales_data.csv")

Overall Revenue:

To calculate the overall revenue for the year, we can use the sum() function to sum up the revenue column: overall_revenue = sales_data["revenue"].sum()

Top-Selling Products:

To identify the top-selling products, we can use the groupby() function to group the data by product, and then use the sum() function to calculate the total revenue for each product: product_revenue = sales_data.groupby("product")["revenue"].sum()

top_products = product_revenue.sort_values(ascending=False).head(10)

Average Order Value:

To calculate the average order value, we can divide the overall revenue by the number of orders:

avg_order_value = overall_revenue / sales_data["order_id"].nunique()

Customer Demographics:

To analyze the customer demographics, we can use the groupby() function to group the data by customer age and gender, and then calculate the average order value for each group:

customer_demographics = sales_data.groupby(["age", "gender"])["revenue"].mean()

Shipping Cost Distribution:

To analyze the shipping cost distribution, we can use the hist() function from Matplotlib to create a histogram of the shipping costs:

import matplotlib.pyplot as plt

plt.hist(sales_data["shipping_cost"], bins=20) plt.xlabel("Shipping Cost") plt.ylabel("Frequency") plt.show()

Trends and Patterns:

To identify trends and patterns in the sales data, we can use various data visualization techniques such as line charts, scatter plots, and heat maps:

sales_data["date"] = pd.to_datetime(sales_data["date"])

sales_data.set_index("date", inplace=True)

sales_data.resample("M")["revenue"].sum().plot()

plt.xlabel("Date") plt.ylabel("Revenue") plt.show()

plt.scatter(sales_data["age"], sales_data["revenue"]) plt.xlabel("Age") plt.ylabel("Revenue")

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Project: Analyzing Sales Data for an E-commerce Store

Problem Statement:

Libraries Used:

Functions:

Loading the Data:

Load the data from a CSV file

Overall Revenue:

Top-Selling Products:

Average Order Value:

Customer Demographics:

Shipping Cost Distribution:

Trends and Patterns:

About

Uh oh!

Releases

Packages

joshdewhurst/DataAnalysisInPython

Folders and files

Latest commit

History

Repository files navigation

Project: Analyzing Sales Data for an E-commerce Store

Problem Statement:

Libraries Used:

Functions:

Loading the Data:

Load the data from a CSV file

Overall Revenue:

Top-Selling Products:

Average Order Value:

Customer Demographics:

Shipping Cost Distribution:

Trends and Patterns:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages