<h1>Sales Data Analysis with a Dataset from Amazon S3</h1>

The Sales Data Analysis use case demonstrates how to use SingleStore to analyze sales data stored as a CSV file in Amazon S3. The demo covers operations that businesses commonly run to gain insight from their sales data: calculating total sales, identifying top-selling products, and examining how sales vary over time. By working through this example, new users will learn how to load CSV data into SingleStore with a pipeline, run aggregate queries, and group results by date, which are core skills for using SingleStore in a business intelligence context.

<h3>Demo Flow</h3>

<img src="https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png" width="100%" height="50%"/>

<h3>Create Table</h3>

In [None]:
%%sql
CREATE TABLE `SalesData` (
  `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Store_ID` bigint(20) DEFAULT NULL,
  `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Quantity_Sold` bigint(20) DEFAULT NULL,
  `Price` float DEFAULT NULL,
  `Total_Sales` float DEFAULT NULL
)
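`Price` and `Total_Sales` are declared as `float`, which in SingleStore (as in MySQL) is a 4-byte single-precision type, so a value such as 417.78 is stored only approximately. This is a likely reason the aggregates later in the notebook end in long binary fractions such as `23683180.01171875`. The plain-Python sketch below imitates 4-byte storage with `struct`; `as_float32` is a hypothetical helper for illustration, not SingleStore code. When exact cents matter, a `DECIMAL` column is the usual alternative.

```python
import struct
from decimal import Decimal

def as_float32(x: float) -> float:
    """Hypothetical helper: round-trip a 64-bit Python float through 4-byte IEEE 754 storage."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Total_Sales of the first sample row, as a 4-byte float would hold it.
stored = as_float32(417.78)
print(stored)             # near 417.78, but not exactly equal
print(Decimal(stored))    # the exact binary value behind it

# The same three prices summed as exact decimals and as 4-byte floats.
prices = ["46.42", "54.78", "11.07"]
exact_total = sum(Decimal(p) for p in prices)
float32_total = sum(as_float32(float(p)) for p in prices)
print(exact_total, float32_total)
```

The decimal sum is exactly 112.27, while the float-based sum carries a small rounding error; over 15.4 million rows such errors show up in the trailing digits of the totals.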

<h3>Load Data</h3>

In [None]:
%%sql
CREATE PIPELINE SalesData_Pipeline AS
LOAD DATA S3 's3://singlestoreloaddata/SalesData/sales_data.csv'
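CONFIG '{"region": "ap-south-1"}'
/* Credentials are omitted because this demo reads from a bucket that does not require them.
   For a private bucket, move this clause out of the comment and fill in your keys:
CREDENTIALS '{"aws_access_key_id": "<access key id>",
              "aws_secret_access_key": "<secret access key>"}'
*/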
INTO TABLE SalesData
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
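The `FIELDS TERMINATED BY`, `LINES TERMINATED BY`, and `IGNORE 1 LINES` clauses together describe how the file is cut into records. A minimal Python sketch of that splitting, applied to two rows copied from the sample output further down; the header line is assumed to match the table's column order, and `parse_like_pipeline` is a hypothetical helper. Because the pipeline declares no `ENCLOSED BY` clause, the sketch splits on every comma and does not handle quoted fields.

```python
# Two data lines taken from the sample output; the header line is an assumption.
raw = (
    "Date,Store_ID,ProductID,Product_Name,Product_Category,Quantity_Sold,Price,Total_Sales\r\n"
    "2023-11-28,1075,PRD81,Digital Thermometer,Pharmacy,9,46.42,417.78\r\n"
    "2023-10-03,1035,PRD57,Swimsuits,Clothing,14,54.78,766.92\r\n"
)

def parse_like_pipeline(text, field_sep=",", line_sep="\r\n", ignore_lines=1):
    """Hypothetical helper: split lines on line_sep and fields on field_sep,
    skipping the first ignore_lines lines, as the pipeline clauses describe."""
    lines = [line for line in text.split(line_sep) if line]  # drop the empty piece after the last terminator
    rows = []
    for line in lines[ignore_lines:]:
        date, store, pid, name, category, qty, price, total = line.split(field_sep)
        rows.append((date, int(store), pid, name, category, int(qty), float(price), float(total)))
    return rows

rows = parse_like_pipeline(raw)
for row in rows:
    print(row)
```

In this sketch, a file with bare `\n` line endings would be read as one long "line" and then dropped by the header skip. The real pipeline's behavior in that case is not modeled here; the point is that `LINES TERMINATED BY` has to match how the file was written.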

In [None]:
%%sql
START PIPELINE SalesData_Pipeline

In [9]:
%%sql
SELECT COUNT(*) FROM SalesData

COUNT(*)
15400000


In [14]:
%%sql
SELECT * FROM SalesData LIMIT 10

Date,Store_ID,ProductID,Product_Name,Product_Category,Quantity_Sold,Price,Total_Sales
2023-11-28,1075,PRD81,Digital Thermometer,Pharmacy,9,46.42,417.78
2023-10-03,1035,PRD57,Swimsuits,Clothing,14,54.78,766.92
2023-09-24,1073,PRD46,Monitors,Electronics,8,11.07,88.56
2023-09-07,1099,PRD96,Bird Grooming Kits,Pet Supplies,3,25.29,75.87
2023-12-26,1057,PRD25,Minoxidil 5% Topical Solution,Pharmacy,4,67.56,270.24
2023-08-30,1093,PRD69,Knee-High Boots,Clothing,19,30.88,586.72
2024-03-25,1064,PRD68,Ankle Boots,Clothing,8,36.41,291.28
2024-06-08,1081,PRD50,Doxycycline 100 mg,Pharmacy,15,31.91,478.65
2023-08-31,1009,PRD50,Mozzarella Cheese,Groceries,20,87.4,1748.0
2024-04-16,1024,PRD100,Photo Printers,Electronics,5,62.08,310.4
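In every sample row above, `Total_Sales` equals `Quantity_Sold` multiplied by `Price`. A quick sanity check of that relationship, sketched in Python with `Decimal` so the comparison is exact to the cent; the ten rows are copied from the output, and whether the relationship holds for all 15.4 million rows is not established by this sample.

```python
from decimal import Decimal

# (Quantity_Sold, Price, Total_Sales) copied from the ten sample rows above.
sample = [
    (9, "46.42", "417.78"),
    (14, "54.78", "766.92"),
    (8, "11.07", "88.56"),
    (3, "25.29", "75.87"),
    (4, "67.56", "270.24"),
    (19, "30.88", "586.72"),
    (8, "36.41", "291.28"),
    (15, "31.91", "478.65"),
    (20, "87.4", "1748.0"),
    (5, "62.08", "310.4"),
]

# Collect any row where quantity * price does not reproduce the stored total.
mismatches = [
    (qty, price, total)
    for qty, price, total in sample
    if qty * Decimal(price) != Decimal(total)
]
print(f"{len(sample) - len(mismatches)} of {len(sample)} rows satisfy Total_Sales = Quantity_Sold * Price")
```

Running the same check inside the database with exact equality would be less reliable, because the `float` columns store approximate values.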


<h3>Queries</h3>

The following analytical queries summarize the loaded data by product, date, and store.

<b>Top-Selling Products</b>

In [19]:
%%sql
SELECT product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData
    GROUP BY product_name ORDER BY total_quantity_sold DESC LIMIT 5;


product_name,total_quantity_sold
Coats,695037
Jeans,693984
Vests,691671
Jackets,691598
Sweaters,691548
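`GROUP BY product_name` with `SUM(quantity_sold)`, followed by `ORDER BY ... DESC LIMIT 5`, is a group-and-sum followed by a top-k pick. The same shape in plain Python on (product name, quantity) pairs from the sample rows, using `collections.Counter`; this mirrors the query's logic only and says nothing about how SingleStore executes it.

```python
from collections import Counter

# (Product_Name, Quantity_Sold) pairs from the sample rows above.
sales = [
    ("Digital Thermometer", 9), ("Swimsuits", 14), ("Monitors", 8),
    ("Bird Grooming Kits", 3), ("Minoxidil 5% Topical Solution", 4),
    ("Knee-High Boots", 19), ("Ankle Boots", 8), ("Doxycycline 100 mg", 15),
    ("Mozzarella Cheese", 20), ("Photo Printers", 5),
]

totals = Counter()
for name, qty in sales:
    totals[name] += qty          # SUM(quantity_sold) ... GROUP BY product_name

top5 = totals.most_common(5)     # ORDER BY total DESC LIMIT 5
for name, total in top5:
    print(name, total)
```

One difference worth knowing: `Counter.most_common` orders tied counts by first insertion, whereas SQL makes no guarantee about which tied rows survive a `LIMIT` unless the `ORDER BY` includes a tie-breaking column.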


<b>Sales Trends Over Time</b>

In [20]:
%%sql
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
GROUP BY date ORDER BY total_sales DESC LIMIT 5;


date,total_sales
2023-10-04,23683180.01171875
2024-03-12,23643707.98828125
2023-08-10,23579240.79296875
2024-05-11,23566254.0546875
2023-10-08,23562311.21484375
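The query above ranks individual days by revenue rather than ordering them in time. To look at a trend, totals are usually bucketed (for example by month) and sorted chronologically; because `Date` is stored as `YYYY-MM-DD` text, its first seven characters identify the month. The sketch below runs that idea against an in-memory SQLite database loaded with the sample rows. SQLite's `substr` stands in here, and the matching string or date function in SingleStore should be checked against its documentation.

```python
import sqlite3

# (Date, Total_Sales) from the ten sample rows above.
rows = [
    ("2023-11-28", 417.78), ("2023-10-03", 766.92), ("2023-09-24", 88.56),
    ("2023-09-07", 75.87), ("2023-12-26", 270.24), ("2023-08-30", 586.72),
    ("2024-03-25", 291.28), ("2024-06-08", 478.65), ("2023-08-31", 1748.0),
    ("2024-04-16", 310.4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesData (Date TEXT, Total_Sales REAL)")
conn.executemany("INSERT INTO SalesData VALUES (?, ?)", rows)

# Bucket by 'YYYY-MM' and order by time, not by revenue.
monthly = conn.execute(
    """
    SELECT substr(Date, 1, 7) AS month, ROUND(SUM(Total_Sales), 2) AS total_sales
    FROM SalesData
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()

for month, total in monthly:
    print(month, total)
```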


<b>Total Sales by Store</b>

In [21]:
%%sql
SELECT Store_ID, SUM(total_sales) AS total_sales FROM SalesData
GROUP BY Store_ID ORDER BY total_sales DESC LIMIT 5;

Store_ID,total_sales
1046,84749903.3125
1085,84698935.921875
1059,84515027.5625
1070,84467964.34375
1023,84456465.9375
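Per-store totals follow the same group-then-rank pattern, and only the five largest groups are needed. In Python, `heapq.nlargest` selects the top k without sorting every group, the same kind of shortcut a database can take for `ORDER BY ... LIMIT`. The data is the sample rows above, and the comparison is an analogy, not a description of SingleStore's query planner.

```python
import heapq
from collections import defaultdict

# (Store_ID, Total_Sales) from the ten sample rows above.
rows = [
    (1075, 417.78), (1035, 766.92), (1073, 88.56), (1099, 75.87),
    (1057, 270.24), (1093, 586.72), (1064, 291.28), (1081, 478.65),
    (1009, 1748.0), (1024, 310.4),
]

store_totals = defaultdict(float)
for store_id, total in rows:
    store_totals[store_id] += total      # SUM(total_sales) ... GROUP BY Store_ID

# ORDER BY total_sales DESC LIMIT 5, via a bounded heap instead of a full sort.
top_stores = heapq.nlargest(5, store_totals.items(), key=lambda item: item[1])
for store_id, total in top_stores:
    print(store_id, round(total, 2))
```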


<b>Sales Contribution by Product (Percentage)</b>

In [118]:
%%sql
SELECT product_name,
       SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage
FROM SalesData
GROUP BY product_name ORDER BY sales_percentage DESC LIMIT 5;

product_name,sales_percentage
Shorts,0.435872907309105
Jackets,0.4339830686314366
Hoodies,0.4308459576413966
Sweaters,0.4242101353194775
Vests,0.4184480826345888
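The percentage query divides each product's revenue by the grand total supplied by the uncorrelated subquery. A useful property to check is that the shares across all products add up to 100. A Python sketch over the sample rows (where each product name appears once) shows both the calculation and that check; it only mirrors the arithmetic of the query.

```python
from collections import defaultdict

# (Product_Name, Total_Sales) from the ten sample rows above.
rows = [
    ("Digital Thermometer", 417.78), ("Swimsuits", 766.92), ("Monitors", 88.56),
    ("Bird Grooming Kits", 75.87), ("Minoxidil 5% Topical Solution", 270.24),
    ("Knee-High Boots", 586.72), ("Ankle Boots", 291.28), ("Doxycycline 100 mg", 478.65),
    ("Mozzarella Cheese", 1748.0), ("Photo Printers", 310.4),
]

# Plays the role of (SELECT SUM(total_sales) FROM SalesData).
grand_total = sum(total for _, total in rows)

by_product = defaultdict(float)
for name, total in rows:
    by_product[name] += total            # SUM(total_sales) ... GROUP BY product_name

shares = {name: total * 100.0 / grand_total for name, total in by_product.items()}

top5 = sorted(shares.items(), key=lambda item: item[1], reverse=True)[:5]
for name, pct in top5:
    print(f"{name}: {pct:.2f}%")
print(f"sum of all shares: {sum(shares.values()):.6f}%")
```

With only ten rows a single product dominates; in the full table the top product holds well under one percent, as the output above shows.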


<b>Top Days with Highest Sales</b>

In [125]:
%%sql
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
    GROUP BY date ORDER BY total_sales DESC LIMIT 5;


date,total_sales
2024-01-01,693590.2890625
2023-12-08,678730.6484375
2024-03-01,662192.734375
2024-02-17,655928.375
2023-10-04,651127.13671875
