---
title: Apple Sales & Warranty Analysis
author: Karandeep Singh
date: 3 March 2025
categories: [SQL,PYTHON]
description: Sales & Warranty Analysis
type: website 
jupyter: python3
code-overflow: scroll
warning: False
toc: True
toc-title: On this page
code-links:
    icon: github
    text: "Project Repository"
    href: "https://github.com/gitbykaran/Apple-Sales-Warranty-Analysis-using-SQL-Pandas"
---

![](project-pic.jpg)

### Project Overview
This project analyzes Apple's sales, product, store, and warranty data using advanced SQL queries run from a Jupyter notebook. The dataset contains **over 1 million records** across five tables:


- Sales 🛒
- Products 📱
- Categories 🎯
- Warranty Claims 🛠️
- Stores 🏬

By leveraging complex SQL queries, we extract **key business insights**, such as store performance, product demand, warranty claim trends, and year-over-year sales growth.

### Dataset Structure
| Table Name | Description |
|------------|-------------|
| `sales` | Records of all product sales including date, store, quantity, and revenue. |
| `products` | List of all Apple products with their specifications and pricing. |
| `categories` | Classification of products into different categories. |
| `warranty` | Details of all warranty claims, including status and resolution. |
| `stores` | Information on Apple store locations worldwide. |


### Data Preprocessing & EDA

In [1]:
import pandas as pd 

sales = pd.read_csv('sales.csv')
products = pd.read_csv('products.csv')
stores = pd.read_csv('stores.csv')
categories = pd.read_csv('categories.csv')
warranty = pd.read_csv('warranty.csv')

sales.head(), products.head(), stores.head(), categories.head(), warranty.head()

(   sale_id   sale_date  store_id  product_id  quantity
 0        1  2022-03-29        27         263         1
 1        2  2022-10-04        36         336         3
 2        3  2022-04-28        29         324         2
 3        4  2024-04-02        38         322         4
 4        5  2022-03-24        57         282         4,
    product_id     product_name  category_id launch_date  price
 0           1       Offer Many            7  2024-11-19   1072
 1           2        Down Hair            4  2023-06-10   2021
 2           3        World Its            8  2021-04-05   1496
 3           4  Certain Improve            5  2024-01-03   2309
 4           5    Audience Join            7  2021-12-28    795,
    store_id                  store_name                city           country
 0         1               Miller-Walker       Port Karlland         Sri Lanka
 1         2      Peck, Hughes and Wolfe         South Jerry             Kenya
 2         3               Valdez-Weaver 

In [2]:
sales.shape, products.shape, stores.shape, categories.shape, warranty.shape

((900000, 5), (500, 5), (100, 4), (10, 2), (100000, 4))
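
As a quick check on the "over 1 million records" figure, the row counts reported above can simply be added up. The dictionary below copies the row counts from the `.shape` output; nothing is read from disk.

```python
# Row counts copied from the .shape output above (rows only, columns ignored).
row_counts = {
    "sales": 900_000,
    "products": 500,
    "stores": 100,
    "categories": 10,
    "warranty": 100_000,
}

total_rows = sum(row_counts.values())
print(f"Total rows across all tables: {total_rows:,}")
```

Almost all of the volume sits in `sales` and `warranty`; the other three tables are small lookup tables.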

In [3]:
sales.info(), products.info(), stores.info(), categories.info(), warranty.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900000 entries, 0 to 899999
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   sale_id     900000 non-null  int64 
 1   sale_date   900000 non-null  object
 2   store_id    900000 non-null  int64 
 3   product_id  900000 non-null  int64 
 4   quantity    900000 non-null  int64 
dtypes: int64(4), object(1)
memory usage: 34.3+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   product_id    500 non-null    int64 
 1   product_name  500 non-null    object
 2   category_id   500 non-null    int64 
 3   launch_date   500 non-null    object
 4   price         500 non-null    int64 
dtypes: int64(3), object(2)
memory usage: 19.7+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):


(None, None, None, None, None)

### Connecting to Database

In [3]:
from sqlalchemy import create_engine as ce 

engine = ce('mysql://root:Karandeep2417@localhost:3306/apple_db')
conn = engine.connect()

df_to_sql = sales.to_sql('sales',con=conn,if_exists='replace',index=False)
df_to_sql = products.to_sql('products',con=conn,if_exists='replace',index=False)
df_to_sql = stores.to_sql('stores',con=conn,if_exists='replace',index=False)
df_to_sql = categories.to_sql('categories',con=conn,if_exists='replace',index=False)
df_to_sql = warranty.to_sql('warranty',con=conn,if_exists='replace',index=False)

%load_ext sql 
%sql mysql://root:Karandeep2417@localhost/apple_db
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = 0
%config SqlMagic.displaycon = False
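
The connection string above embeds the database password directly in the notebook. One common alternative, sketched here under assumptions, is to assemble the URL from environment variables. The helper `build_mysql_url` and the `APPLE_DB_*` variable names are hypothetical and not part of the original project.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ):
    """Assemble a MySQL connection URL from environment variables.

    The APPLE_DB_* names are made up for this sketch; any names would do.
    """
    user = env.get("APPLE_DB_USER", "root")
    # quote_plus escapes characters such as '@' or '/' that would break the URL.
    password = quote_plus(env.get("APPLE_DB_PASSWORD", ""))
    host = env.get("APPLE_DB_HOST", "localhost")
    port = env.get("APPLE_DB_PORT", "3306")
    database = env.get("APPLE_DB_NAME", "apple_db")
    return f"mysql://{user}:{password}@{host}:{port}/{database}"


# Example with an explicit mapping instead of the real environment.
print(build_mysql_url({"APPLE_DB_PASSWORD": "p@ss/word"}))
```

The resulting string can be passed to `create_engine` in place of the literal URL, which keeps credentials out of the published notebook.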


<br>

### Analysis

<br>

1. Find the number of stores in each country.

In [10]:
%%sql 
SELECT 
country,
COUNT(store_id) AS cnt
FROM stores
GROUP BY country
ORDER BY COUNT(store_id) desc;

Unnamed: 0,country,cnt
0,Cote d'Ivoire,3
1,Libyan Arab Jamahiriya,2
2,Kenya,2
3,Saint Barthelemy,2
4,Zambia,2
...,...,...
79,Venezuela,1
80,French Southern Territories,1
81,United States Minor Outlying Islands,1
82,Cocos (Keeling) Islands,1


<br>

2. Calculate the total number of units sold by each store.

In [11]:
%%sql
SELECT 
s.store_name,
SUM(sl.quantity) AS units_sold
FROM stores s JOIN sales sl 
ON s.store_id = sl.store_id
GROUP BY s.store_name;

Unnamed: 0,store_name,units_sold
0,Cunningham LLC,22584
1,"Clark, Wilkinson and Flores",23078
2,Monroe LLC,22421
3,Johnson Ltd,22814
4,Snyder-Everett,21920
...,...,...
95,Bell Group,21798
96,"Simmons, Green and Oconnell",22603
97,Goodman-Banks,22838
98,Williams-Mooney,22328


<br>

3. Identify how many units were sold in December 2023.

In [12]:
%%sql 
SELECT
'December 2023' AS Date, 
SUM(quantity) sales
FROM sales
WHERE  MONTH(sale_date) = 12 AND YEAR(sale_date) = 2023 ;

Unnamed: 0,Date,sales
0,December 2023,63478


<br>

4. Determine how many stores have never had a warranty claim filed.

In [13]:
%%sql 
SELECT COUNT(*) AS cnt FROM stores 
WHERE store_id NOT IN (
    -- Inner join keeps only claims with a matching sale, so the list never contains NULL
    SELECT DISTINCT s.store_id
    FROM sales s 
    JOIN warranty w 
    ON s.sale_id = w.sale_id
    );

Unnamed: 0,cnt
0,0
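
A result of zero from a `NOT IN` query deserves a second look, because `NOT IN` returns no rows at all whenever the subquery yields a `NULL`. The sketch below reproduces that behaviour on a tiny made-up dataset, using SQLite (from Python's standard library) in place of MySQL; the `claims` table is a hypothetical stand-in for the subquery result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER);
    CREATE TABLE claims (store_id INTEGER);   -- toy stand-in for the subquery result
    INSERT INTO stores VALUES (1), (2), (3);
    INSERT INTO claims VALUES (1), (NULL);    -- an unmatched claim shows up as NULL
""")

# NOT IN: the NULL turns every non-matching comparison into "unknown",
# so stores 2 and 3 are filtered out even though they have no claims.
not_in_count = conn.execute(
    "SELECT COUNT(*) FROM stores "
    "WHERE store_id NOT IN (SELECT store_id FROM claims)"
).fetchone()[0]

# NOT EXISTS: checks for a matching row and is unaffected by the NULL.
not_exists_count = conn.execute(
    "SELECT COUNT(*) FROM stores s "
    "WHERE NOT EXISTS (SELECT 1 FROM claims c WHERE c.store_id = s.store_id)"
).fetchone()[0]

print("NOT IN:", not_in_count, "| NOT EXISTS:", not_exists_count)
```

In this dataset the zero appears to be genuine, since the per-store claim counts later in the analysis cover the stores, but `NOT EXISTS` or a `NULL`-free subquery is the safer pattern in general.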


<br>

5. Calculate the percentage of warranty claims marked as "Warranty Void". 

In [14]:
%%sql 
SELECT 
(COUNT(claim_id)/(SELECT COUNT(*) FROM warranty)) * 100 AS percentage
FROM warranty
WHERE repair_status = 'Warranty Void'

Unnamed: 0,percentage
0,19.992
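
The same percentage can also be computed in a single pass with conditional aggregation instead of a scalar subquery. This is a minimal sketch on made-up rows, run on SQLite rather than MySQL; the `'Pending'` status is a hypothetical value used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warranty (claim_id INTEGER, repair_status TEXT);
    INSERT INTO warranty VALUES
        (1, 'Warranty Void'), (2, 'Paid Repaired'),
        (3, 'Paid Repaired'), (4, 'Warranty Void'), (5, 'Pending');
""")

# AVG over a 0/1 flag gives the share of matching rows in one scan.
void_pct = conn.execute("""
    SELECT 100.0 * AVG(CASE WHEN repair_status = 'Warranty Void' THEN 1 ELSE 0 END)
    FROM warranty
""").fetchone()[0]

print(f"{void_pct:.1f}% of claims are 'Warranty Void'")
```

Applied to the real table, 19.992% of 100,000 claims corresponds to 19,992 "Warranty Void" claims.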


<br>

6. Identify which product at which store had the highest total units sold in the last year (2024).

In [15]:
%%sql
SELECT 
s.product_id,
st.store_name,
SUM(s.quantity) as units 
FROM sales s JOIN stores st
ON s.store_id = st.store_id
WHERE YEAR(s.sale_date) = 2024
GROUP BY s.product_id , st.store_name
ORDER BY SUM(s.quantity) DESC
LIMIT 1;

Unnamed: 0,product_id,store_name,units
0,467,Wright-Allen,50


<br>

7. Count the number of unique products sold in the last year. 

In [16]:
%%sql
SELECT
COUNT(DISTINCT product_id) AS unique_products
FROM sales 
WHERE YEAR(sale_date) = 2024;

Unnamed: 0,unique_products
0,500


<br>

8. Find the average price of products in each category.

In [17]:
%%sql
SELECT 
p.category_id,
c.category_name,
AVG(p.price) AS avg_price
FROM products p JOIN categories c 
ON p.category_id = c.category_id
GROUP BY p.category_id ,c.category_name ;

Unnamed: 0,category_id,category_name,avg_price
0,7,Place,1630.7818
1,4,Free,1545.6458
2,8,Edge,1673.8627
3,5,Sure,1487.102
4,10,Market,1492.7222
5,3,On,1435.8947
6,6,Above,1764.8974
7,2,Scientist,1745.3902
8,1,Majority,1619.5323
9,9,Record,1560.1818


<br>

9. How many warranty claims were filed in 2022? 

In [18]:
%%sql
SELECT COUNT(*) cnt
FROM warranty
WHERE YEAR(claim_date) = 2022

Unnamed: 0,cnt
0,27837


<br>

10. For each store, identify the best-selling day of the week based on the highest quantity sold (`WEEKDAY` returns 0 for Monday through 6 for Sunday).

In [19]:
%%sql
SELECT * 
FROM 
(SELECT 
store_id,
WEEKDAY(sale_date) day,
SUM(quantity) units_sold,
RANK() OVER(PARTITION BY store_id ORDER BY SUM(quantity) DESC) AS rnk
FROM sales
GROUP BY store_id,WEEKDAY(sale_date)
) AS t1
WHERE rnk = 1;

Unnamed: 0,store_id,day,units_sold,rnk
0,1,4,3499,1
1,2,4,3379,1
2,3,2,3356,1
3,4,5,3424,1
4,5,0,3312,1
...,...,...,...,...
95,96,6,3357,1
96,97,2,3410,1
97,98,4,3361,1
98,99,3,3322,1


<br>

11. Identify the least-selling product in each country based on total units sold.

In [20]:
%%sql
WITH product_rank
AS
(SELECT 
p.product_name,
st.country,
SUM(s.quantity) units_sold,
RANK() OVER(PARTITION BY st.country ORDER BY SUM(s.quantity)) AS rnk
FROM sales s JOIN products p 
ON s.product_id = p.product_id
JOIN stores st 
ON s.store_id = st.store_id
GROUP BY 2,1
)
SELECT *
FROM product_rank
WHERE rnk = 1;

Unnamed: 0,product_name,country,units_sold,rnk
0,Factor Research,Afghanistan,13,1
1,Power Memory,Albania,10,1
2,Focus Reflect,Antarctica (the territory South of 60 deg S),14,1
3,Race Stock,Aruba,14,1
4,Phone Nothing,Aruba,14,1
...,...,...,...,...
90,Focus Reflect,Vanuatu,18,1
91,If Build,Venezuela,19,1
92,Assume Serious,Western Sahara,17,1
93,Rather Affect,Yemen,48,1


<br>

12. Calculate how many warranty claims were filed within 180 days of a product sale.

In [21]:
%%sql
WITH claims AS 
(SELECT 
w.*,
s.sale_date,
DATEDIFF(w.claim_date,s.sale_date) AS day_diff
FROM warranty w
LEFT JOIN sales s
ON w.sale_id = s.sale_id
WHERE DATEDIFF(w.claim_date,s.sale_date) > 0 AND DATEDIFF(w.claim_date,s.sale_date) <=180
)
SELECT COUNT(*) AS cnt FROM claims

Unnamed: 0,cnt
0,15049
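
The filter in the query keeps claims filed between 1 and 180 days after the sale, so same-day claims are excluded. A minimal Python sketch of the same rule is shown below; the helper name and the example dates are made up for illustration.

```python
from datetime import date


def claimed_within(sale_date, claim_date, days=180):
    """True when the claim falls 1..days days after the sale (both ends as in the query)."""
    diff = (claim_date - sale_date).days
    return 0 < diff <= days


sale = date(2022, 3, 29)
print(claimed_within(sale, date(2022, 9, 25)))  # exactly 180 days later
print(claimed_within(sale, date(2022, 9, 26)))  # 181 days later
print(claimed_within(sale, sale))               # same day, excluded by "> 0"
```

With 15,049 matches out of 100,000 claims, roughly 15% of claims fall inside this window.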


<br>

13. Determine how many warranty claims were filed for products launched in 2022.

In [22]:
%%sql
SELECT COUNT(w.claim_id) as cnt
FROM warranty w 
JOIN sales s 
ON w.sale_id = s.sale_id
JOIN products p
ON s.product_id = p.product_id
WHERE YEAR(p.launch_date) = 2022

Unnamed: 0,cnt
0,20319


<br>

14. List the months in the last three years where sales exceeded 1,000 units in the USA.

In [None]:
%%sql
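SELECT 
YEAR(s.sale_date) AS year,
MONTH(s.sale_date) AS month,
SUM(s.quantity) AS units
FROM sales s
JOIN stores st 
ON s.store_id = st.store_id
WHERE st.country = 'United States of America' 
AND YEAR(s.sale_date) IN (2024,2023,2022)
-- Group by year and month so the same month in different years is not merged
GROUP BY YEAR(s.sale_date), MONTH(s.sale_date)
HAVING SUM(s.quantity) > 1000
ORDER BY year, month;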

<br>

15. Identify the product category with the most warranty claims filed in the last two years.

In [23]:
%%sql
SELECT 
ct.category_name,
COUNT(w.claim_id) claims
FROM warranty w 
LEFT JOIN sales s
ON w.sale_id = s.sale_id
JOIN products p 
ON p.product_id = s.product_id
JOIN categories ct 
ON ct.category_id = p.category_id
WHERE YEAR(w.claim_date) IN(2023,2024) 
GROUP BY ct.category_name
ORDER BY 2 DESC

Unnamed: 0,category_name,claims
0,Majority,8309
1,On,7614
2,Place,7577
3,Market,7207
4,Edge,6637
5,Sure,6553
6,Free,6442
7,Record,5867
8,Scientist,5375
9,Above,5195


<br>

16. Determine the percentage chance of receiving warranty claims after each purchase for each country.

In [24]:
%%sql
SELECT 
country,
(total_units_sold/total_claims) * 100  AS claim_ratio
FROM
(SELECT 
st.country,
SUM(s.quantity) total_units_sold,
SUM(w.claim_id) total_claims
FROM sales s
JOIN stores st 
ON s.store_id = st.store_id
LEFT JOIN warranty w 
ON s.sale_id = w.sale_id
GROUP BY 1
) t1
GROUP BY 1

Unnamed: 0,country,claim_ratio
0,Congo,0.0454
1,Saint Barthelemy,0.0435
2,British Virgin Islands,0.0427
3,Tanzania,0.0451
4,Russian Federation,0.0449
...,...,...
79,Saint Martin,0.0432
80,Zambia,0.0450
81,Turks and Caicos Islands,0.0453
82,Lithuania,0.0446
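
Note that the query above divides units sold by `SUM(w.claim_id)`, which adds up claim identifiers rather than counting claims, so the figures are not claim percentages in the usual sense. A claim rate is more commonly defined as claims divided by units sold. The sketch below shows that calculation on a tiny made-up dataset, using SQLite in place of MySQL. It assumes at most one claim per sale; with several claims per sale the `LEFT JOIN` would count that sale's quantity more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER, country TEXT);
    CREATE TABLE sales (sale_id INTEGER, store_id INTEGER, quantity INTEGER);
    CREATE TABLE warranty (claim_id INTEGER, sale_id INTEGER);
    INSERT INTO stores VALUES (1, 'Kenya'), (2, 'Zambia');
    INSERT INTO sales VALUES (1, 1, 4), (2, 1, 6), (3, 2, 5);
    INSERT INTO warranty VALUES (101, 1), (102, 3);
""")

rows = conn.execute("""
    SELECT st.country,
           COUNT(w.claim_id) AS total_claims,   -- counts claims, ignores NULLs
           SUM(s.quantity)   AS total_units,
           100.0 * COUNT(w.claim_id) / SUM(s.quantity) AS claim_pct
    FROM sales s
    JOIN stores st ON st.store_id = s.store_id
    LEFT JOIN warranty w ON w.sale_id = s.sale_id
    GROUP BY st.country
    ORDER BY st.country
""").fetchall()

for country, claims, units, pct in rows:
    print(f"{country}: {claims} claims / {units} units = {pct:.1f}%")
```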


<br>

17. Analyze the year-by-year growth ratio for each store.

In [25]:
%%sql
WITH yearly_sales 
AS
(SELECT 
s.store_id,
st.store_name,
YEAR(s.sale_date) as year,
SUM(s.quantity * p.price) as current_year_sales
FROM sales s 
JOIN products p
ON s.product_id = p.product_id
JOIN stores st 
ON st.store_id = s.store_id
GROUP BY 1,2,3
ORDER BY 2,3 
),
growth_ratio 
AS
(
SELECT 
store_id,
store_name,
year,
current_year_sales,
LAG(current_year_sales,1) OVER(PARTITION BY store_name ORDER BY year) AS previous_year_sale
FROM yearly_sales
)
SELECT
store_id,
store_name,
year,
current_year_sales,
previous_year_sale,
((current_year_sales - previous_year_sale)/previous_year_sale) * 100 AS growth
FROM growth_ratio

Unnamed: 0,store_id,store_name,year,current_year_sales,previous_year_sale,growth
0,66,Adams and Sons,2022,10282506,,
1,66,Adams and Sons,2023,12133014,10282506,17.9967
2,66,Adams and Sons,2024,11869887,12133014,-2.1687
3,66,Adams and Sons,2025,1860443,11869887,-84.3264
4,6,Adams Ltd,2022,9844527,,
...,...,...,...,...,...,...
395,26,"Wu, Floyd and Clark",2025,1963779,11696911,-83.2111
396,84,Young Inc,2022,9588247,,
397,84,Young Inc,2023,11534997,9588247,20.3035
398,84,Young Inc,2024,11883712,11534997,3.0231
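
The `growth` column is the percentage change against the previous year's revenue, with `LAG` supplying the prior value. The arithmetic can be replayed in plain Python using the "Adams and Sons" figures from the output above; the function name `yoy_growth` is made up for this sketch.

```python
def yoy_growth(current, previous):
    """Percentage change from the previous year; None when there is no prior year."""
    if previous is None or previous == 0:
        return None
    return (current - previous) / previous * 100


# Yearly revenue for "Adams and Sons", copied from the result above.
adams = {2022: 10282506, 2023: 12133014, 2024: 11869887, 2025: 1860443}

previous = None
for year in sorted(adams):
    growth = yoy_growth(adams[year], previous)
    label = "n/a" if growth is None else f"{growth:.4f}%"
    print(year, label)
    previous = adams[year]
```

The first year has no predecessor, which is why `LAG` returns an empty value for 2022. The steep drop in 2025 most likely reflects a partial year of data rather than a real collapse in sales.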


<br>

18. Compare warranty claim counts across product price ranges as a rough view of how price relates to claims (the data starts in 2022, so every sale falls within the last five years).

In [26]:
%%sql
WITH range_table AS 
(SELECT 
p.product_id,
p.price,
w.claim_id,
CASE 
	WHEN p.price < 500 THEN 'Cheap Product'
    WHEN p.price BETWEEN 500 AND 1500 THEN 'Affordable Product'
    ELSE 'Expensive'
    END AS price_range
FROM warranty w
LEFT JOIN sales s 
ON w.sale_id = s.sale_id
JOIN products p 
ON p.product_id = s.product_id
)
SELECT 
price_range,
COUNT(claim_id) claims
FROM range_table
GROUP BY 1 

Unnamed: 0,price_range,claims
0,Expensive,53134
1,Affordable Product,35609
2,Cheap Product,11257
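
The `CASE` expression buckets each product by price, and `BETWEEN` is inclusive at both ends. The sketch below mirrors that bucketing in Python and turns the claim counts from the output into shares; `price_range` is a hypothetical helper, and the sample prices come from the `products` preview shown earlier.

```python
from collections import Counter


def price_range(price):
    """Mirror of the CASE expression above; 500 and 1500 both count as 'Affordable Product'."""
    if price < 500:
        return "Cheap Product"
    if 500 <= price <= 1500:
        return "Affordable Product"
    return "Expensive"


# Prices from the first five rows of the products table.
sample_prices = [1072, 2021, 1496, 2309, 795]
bucket_counts = Counter(price_range(p) for p in sample_prices)
print(bucket_counts)

# Claim counts copied from the query result above.
claims = {"Expensive": 53134, "Affordable Product": 35609, "Cheap Product": 11257}
total_claims = sum(claims.values())
for name, count in claims.items():
    print(f"{name}: {count / total_claims:.1%} of claims")
```

The three counts add up to 100,000, the full size of the `warranty` table. Raw counts also depend on how many units were sold in each range, so dividing claims by units sold per range would give a fairer comparison.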


<br>

19. For each store, calculate the percentage of "Paid Repaired" claims relative to total claims filed, to identify the store with the highest share.

In [27]:
%%sql
WITH claims_paid_repaired AS 
(SELECT 
s.store_id,
COUNT(*) paid_repaired 
FROM warranty w
LEFT JOIN sales s 
ON w.sale_id = s.sale_id
WHERE w.repair_status = 'Paid Repaired'
GROUP BY 1
),
total_claims AS 
(SELECT 
s.store_id,
COUNT(*) no_claims 
FROM warranty w
LEFT JOIN sales s 
ON w.sale_id = s.sale_id
GROUP BY 1
)
SELECT 
tc.store_id,
pc.paid_repaired,
tc.no_claims ,
(pc.paid_repaired/tc.no_claims)*100 paid_claims_percentage
FROM total_claims tc
JOIN claims_paid_repaired pc
ON tc.store_id = pc.store_id

Unnamed: 0,store_id,paid_repaired,no_claims,paid_claims_percentage
0,77,801,1006,79.6223
1,96,763,957,79.7283
2,54,795,1007,78.9474
3,47,838,1041,80.4995
4,1,783,971,80.6385
...,...,...,...,...
95,92,809,1020,79.3137
96,12,771,964,79.9793
97,38,832,1006,82.7038
98,76,798,998,79.9599


<br>

20. Write a query to calculate the monthly running total of sales for each store over the past four years and compare trends during this period.

In [28]:
%%sql
WITH monthly_sales AS 
(SELECT 
s.store_id,
YEAR(s.sale_date) AS year,
MONTH(s.sale_date) AS month,
SUM(s.quantity * p.price) sales
FROM sales s
JOIN products p 
ON s.product_id = p.product_id
GROUP BY s.store_id ,YEAR(s.sale_date),MONTH(s.sale_date)
ORDER BY 1 ,2 ,3 
)
SELECT 
store_id,
year,
month,
SUM(sales) OVER(PARTITION BY store_id ORDER BY year,month) AS running_total
FROM monthly_sales

Unnamed: 0,store_id,year,month,running_total
0,1,2022,3,910190
1,1,2022,4,1908425
2,1,2022,5,3022207
3,1,2022,6,3892525
4,1,2022,7,4882024
...,...,...,...,...
3595,100,2024,10,32409712
3596,100,2024,11,33456626
3597,100,2024,12,34592094
3598,100,2025,1,35609928
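
A running total is a cumulative sum over rows ordered by year and month within each store. To make that concrete, the sketch below takes the first five running totals for store 1 from the output above, recovers the monthly sales by differencing, and rebuilds the totals with `itertools.accumulate`.

```python
from itertools import accumulate

# Running totals for store 1 (2022-03 to 2022-07), copied from the output above.
running_totals = [910190, 1908425, 3022207, 3892525, 4882024]

# Differencing consecutive totals recovers the monthly sales behind them.
monthly_sales = [running_totals[0]] + [
    later - earlier
    for earlier, later in zip(running_totals, running_totals[1:])
]

# accumulate() plays the role of SUM(sales) OVER (PARTITION BY store_id ORDER BY year, month).
rebuilt = list(accumulate(monthly_sales))

print("monthly sales:", monthly_sales)
print("running total:", rebuilt)
```

Comparing the monthly differences rather than the totals themselves makes trends easier to spot, since a running total can only grow.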


<br>

> **NOTE** <br>
> This Apple dataset is synthetic and was generated by ChatGPT.