<div id="singlestore-header" style="display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Sales Data Analysis Dataset From Amazon S3</h1>
    </div>
</div>

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Note</b></p>
        <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace, navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this notebook.</p>
    </div>
</div>

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Note</b></p>
        <p>This notebook creates a pipeline; data may take up to 1 minute to populate.</p>
    </div>
</div>

The Sales Data Analysis use case demonstrates how to use SingleStore's querying capabilities in a business intelligence context, here analyzing sales data stored in CSV files on Amazon S3.

This demo showcases typical operations that businesses perform to gain insights from their sales data, such as:
- calculating total sales
- identifying top-selling products
- analyzing sales trends over time

By working through this example, new users will:
- learn how to load CSV data into SingleStore from S3
- execute aggregate functions
- perform time-series analysis

<h3>Demo Flow</h3>

<img src="https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png" width="100%" height="50%"/>

## How to use this notebook

<img src="https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/notebookuse.gif" width="75%" height="50%"/>

## Create a database (you can skip this step if you are using the Free Starter Tier)

We need to create a database to work with in the following examples.

In [1]:
shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    %sql DROP DATABASE IF EXISTS SalesAnalysis;
    %sql CREATE DATABASE SalesAnalysis;
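
The branch in the cell above can be read as a small decision rule. The sketch below restates it as a plain Python function so the cases are easy to see. The function name `should_create_database` and the sample rows are hypothetical, and the reading of an empty result as "not on the shared tier" is an assumption drawn from the `if not shared_tier_check` condition.

```python
def should_create_database(rows):
    """Return True when the database should be (re)created.

    `rows` mimics the result of SHOW VARIABLES LIKE 'is_shared_tier':
    a sequence of (variable_name, value) pairs, possibly empty.
    """
    if not rows:
        # Nothing returned for the variable: assumed to mean a
        # Standard or Premium workspace rather than the shared tier.
        return True
    # The value sits in the second column of the first row.
    return rows[0][1] == "OFF"


# Illustrative inputs only; real results come from the %sql magic.
print(should_create_database([]))
print(should_create_database([("is_shared_tier", "OFF")]))
print(should_create_database([("is_shared_tier", "ON")]))
```

On the Free Starter Tier the value is expected to be `ON`, so the `DROP`/`CREATE` statements are skipped and the notebook works in the database already attached to the workspace.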

<h3>Create Table</h3>

In [2]:
%%sql
CREATE TABLE `SalesData` (
  `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Store_ID` bigint(20) DEFAULT NULL,
  `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `Quantity_Sold` bigint(20) DEFAULT NULL,
  `Price` float DEFAULT NULL,
  `Total_Sales` float DEFAULT NULL
)

<h3>Load Data Using Pipelines</h3>

In [3]:
%%sql
CREATE PIPELINE SalesData_Pipeline AS
LOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'
CONFIG '{"region": "ap-south-1"}'
/*
CREDENTIALS '{"aws_access_key_id": "<access key id>",
               "aws_secret_access_key": "<access_secret_key>"}'
               */
INTO TABLE SalesData
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;


START PIPELINE SalesData_Pipeline;

### Data may take a couple of seconds to load after the pipeline is started; rerun the cell below to verify

In [4]:
%%sql
SELECT count(*) FROM SalesData
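
Instead of rerunning the count cell by hand, the check can be wrapped in a small polling loop. This is a minimal sketch, not part of the original notebook: `wait_for_rows`, its parameters, and the fake counts are all hypothetical. In the notebook, `fetch_count` would be a function of your own that runs the `SELECT count(*)` query above and returns the number.

```python
import time


def wait_for_rows(fetch_count, timeout_s=60.0, interval_s=2.0, sleep=time.sleep):
    """Poll `fetch_count()` until it returns a positive number or time runs out.

    Returns the last positive count, or 0 if nothing arrived before the timeout.
    """
    waited = 0.0
    while True:
        count = fetch_count()
        if count > 0:
            return count
        if waited >= timeout_s:
            return 0
        sleep(interval_s)
        waited += interval_s


# Stand-in for the real query: reports 0 rows twice, then a made-up 1000.
fake_counts = iter([0, 0, 1000])
result = wait_for_rows(lambda: next(fake_counts), sleep=lambda s: None)
print(result)
```

The default timeout of 60 seconds mirrors the note at the top of the notebook that data may take up to 1 minute to populate.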

<h3>Sample Queries</h3>

We will now run some analytical queries.

<b>Top-Selling Products</b>

In [5]:
%%sql
SELECT  product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData
    GROUP BY  product_name ORDER BY total_quantity_sold DESC LIMIT 5;

<b>Sales Trends Over Time</b>

In [6]:
%%sql
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
GROUP BY date ORDER BY total_sales DESC LIMIT 5;
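
The query above ranks dates by revenue. To read the same daily totals as a trend, one would typically walk through them in chronological order instead. The sketch below shows that idea in plain Python on made-up rows. It assumes the dates are ISO `YYYY-MM-DD` strings, which sort chronologically as text; the actual format of the `Date` column is not shown in this notebook, so treat that as an assumption.

```python
from collections import defaultdict

# Made-up rows: (date, total_sales). Dates assumed to be ISO 'YYYY-MM-DD' text.
rows = [
    ("2024-01-02", 120.0),
    ("2024-01-01", 80.0),
    ("2024-01-02", 30.0),
    ("2024-01-03", 50.0),
]

# Equivalent of SUM(total_sales) ... GROUP BY date.
daily_totals = defaultdict(float)
for date, total_sales in rows:
    daily_totals[date] += total_sales

# Sorting by the date key gives chronological order for ISO date strings.
trend = sorted(daily_totals.items())
for date, total in trend:
    print(date, total)
```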

<b>Total Sales by Store</b>

In [7]:
%%sql
SELECT  Store_ID, SUM(total_sales) AS total_sales FROM SalesData
GROUP BY  Store_ID ORDER BY total_sales DESC LIMIT 5;

<b>Sales Contribution by Product (Percentage)</b>

In [8]:
%%sql
SELECT product_name, SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage FROM SalesData
    GROUP BY product_name ORDER BY sales_percentage DESC LIMIT 5;
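
The percentage query divides each product's summed sales by the grand total returned by the scalar subquery. The arithmetic can be checked by hand with a small Python sketch. The product names and figures below are made up for illustration and do not come from the S3 dataset.

```python
from collections import defaultdict

# Made-up rows: (product_name, total_sales).
rows = [
    ("Laptop", 600.0),
    ("Phone", 300.0),
    ("Laptop", 100.0),
]

# Equivalent of SUM(total_sales) ... GROUP BY product_name.
per_product = defaultdict(float)
for name, total_sales in rows:
    per_product[name] += total_sales

# Equivalent of the scalar subquery (SELECT SUM(total_sales) FROM SalesData).
grand_total = sum(per_product.values())

# Each product's share of the grand total, largest first.
percentages = {name: total * 100.0 / grand_total for name, total in per_product.items()}
ranked = sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Here the two products account for 70% and 30% of sales, and the shares add up to 100% because every row belongs to exactly one product.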

<b>Top Days with Highest Sales</b>

In [9]:
%%sql
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
    GROUP BY date ORDER BY total_sales DESC LIMIT 5;

## Conclusion

We have shown how to load data from Amazon S3 into SingleStoreDB using `Pipelines`. These techniques should enable you to
integrate your Amazon S3 data with SingleStoreDB.

## Clean up

Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table created.

#### Drop Pipeline

In [10]:
%%sql
#STOP PIPELINE SalesData_Pipeline;

#DROP PIPELINE SalesData_Pipeline;

#### Drop Data

In [11]:
#shared_tier_check = %sql show variables like 'is_shared_tier'
#if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
#    %sql DROP DATABASE IF EXISTS SalesAnalysis;
#else:
#    %sql DROP TABLE SalesData;

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>