# Customer - Orders Retail Genie
As of 11/7/2025, there is no YAML or JSON configuration file for Genie spaces. Below are the settings, instructions, and data included in the Customer - Orders Retail Genie for this demo.

## Data
Tables to include in the Genie space.

**NOTE: We highly recommend adding table and column descriptions for these tables.**
- Descriptions can be added in the Catalog tab, where they can also be generated with AI.
- Double-check AI-generated descriptions and edit them for accuracy before saving.
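
Descriptions can also be set with SQL. The sketch below assumes Databricks SQL, standard Delta tables that you own, and the `workspace.sim_retail_demo` schema used elsewhere in this guide. Tables managed by a pipeline may need their descriptions set elsewhere (for example, in Catalog Explorer). The comment text is illustrative, paraphrased from the general instructions, and should be replaced with descriptions you have verified.

```sql
-- Table-level description (illustrative text; verify before saving)
COMMENT ON TABLE workspace.sim_retail_demo.status_silver_pyspark IS
  'One record per order status. order_id repeats once for each status an order has.';

-- Column-level description (illustrative text; verify before saving)
ALTER TABLE workspace.sim_retail_demo.gold_current_orders_pyspark
  ALTER COLUMN current_order_status COMMENT 'Up-to-date status for each order.';
```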

### Tables
- `customers_silver_pyspark`
- `gold_customer_order_history_pyspark`
- `gold_orders_by_city_state_pyspark`
- `gold_returns_by_customer_pyspark`
- `orders_silver_pyspark`
- `status_silver_pyspark`
- `gold_current_orders_pyspark`

## Instructions

### General Instructions
_Use an asterisk (`*`) to separate each instruction in the config window._

* States in the `state` column use two-character state codes (for example, `CA` for California).
* An order can have more than one status.
* Order statuses are not always recorded in chronological order, so they cannot be relied on to determine an order's most recent (current) status.
* Any table or row with a status or status timestamp represents a record for that specific order status. Order IDs are duplicated for each status, so any calculation or query about the orders themselves must be deduplicated (for example, with `DISTINCT`). To answer questions about total orders, orders by geography, or orders over time, prefer tables without status columns.
* Only tables or columns with "current" in their name contain up-to-date order statuses. Use these tables and columns when asked about the current status of an order or when grouping the number of orders by status.
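
To illustrate the deduplication rule, the sketch below contrasts a raw row count with a distinct order count after joining orders to their statuses. It uses only the `order_id` column and the `workspace.sim_retail_demo` schema that appear elsewhere in this guide; treat it as an example to adapt rather than a required query.

```sql
-- Joining orders to statuses repeats each order_id once per status,
-- so COUNT(*) over-counts orders while COUNT(DISTINCT ...) counts each order once.
SELECT
  COUNT(*) AS order_status_rows,
  COUNT(DISTINCT o.order_id) AS distinct_orders
FROM
  workspace.sim_retail_demo.orders_silver_pyspark o
    JOIN workspace.sim_retail_demo.status_silver_pyspark s
      ON o.order_id = s.order_id;
```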

### Joins
Predefined joins that guide Genie when combining tables:

```sql
-- Left table: orders_silver_pyspark
-- Right table: customers_silver_pyspark
-- Join condition:
`orders_silver_pyspark`.`customer_id` = `customers_silver_pyspark`.`customer_id`
-- Relationship type: Many to One
```

```sql
-- Left table: orders_silver_pyspark
-- Right table: status_silver_pyspark
-- Join condition:
`orders_silver_pyspark`.`order_id` = `status_silver_pyspark`.`order_id`
-- Relationship type: One to Many
```

```sql
-- Left table: gold_current_orders_pyspark
-- Right table: customers_silver_pyspark
-- Join condition:
`gold_current_orders_pyspark`.`customer_id` = `customers_silver_pyspark`.`customer_id`
-- Relationship type: Many to One
```
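
As an illustration of the `gold_current_orders_pyspark` to `customers_silver_pyspark` join, the query below lists customers with orders currently on the way. Because the relationship is many-to-one, each current order matches at most one customer, so no deduplication is needed. The status value `'on the way'` is taken from this guide's benchmarks; other status values are not listed here and would need to be checked against your data.

```sql
-- Customers with at least one order currently on the way.
-- Many-to-one join: each current order row matches at most one customer row.
SELECT
  c.email,
  COUNT(*) AS orders_on_the_way
FROM
  workspace.sim_retail_demo.gold_current_orders_pyspark g
    JOIN workspace.sim_retail_demo.customers_silver_pyspark c
      ON g.customer_id = c.customer_id
WHERE
  g.current_order_status = 'on the way'
GROUP BY
  c.email
ORDER BY
  orders_on_the_way DESC;
```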

### SQL Queries
Ground-truth SQL queries for the Genie space. Change the catalog (`workspace` in these examples) to the catalog where your data is stored. We recommend adding more ground-truth SQL during experimentation to improve Genie's accuracy.

```sql
-- Current customer count
SELECT count(*)
FROM workspace.sim_retail_demo.customers_silver_pyspark;
```

```sql
-- Returns by state
SELECT state, count(*)
FROM workspace.sim_retail_demo.gold_returns_by_customer_pyspark
GROUP BY state
ORDER BY count(*) DESC;
```

```sql
-- Repeat customers
SELECT customer_id, count(*) AS total_orders
FROM workspace.sim_retail_demo.orders_silver_pyspark
GROUP BY customer_id
HAVING count(*) > 1
ORDER BY total_orders DESC;
```
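
One more ground-truth query you might add, following the instruction that only "current" tables hold up-to-date statuses: a count of orders in each current status. It assumes `gold_current_orders_pyspark` has one row per order, as the benchmark that counts orders on the way with `COUNT(*)` implies.

```sql
-- Orders by current status
SELECT
  current_order_status,
  COUNT(*) AS total_orders
FROM
  workspace.sim_retail_demo.gold_current_orders_pyspark
GROUP BY
  current_order_status
ORDER BY
  total_orders DESC;
```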

## Settings
Title: Customer - Orders Retail Genie

Description: Retail Genie for querying customers and orders from the PySpark pipeline and tables. Built to drive insights on current customer records, order history, and order status. Handles questions about the order fulfillment process, customer information, and geographic order analysis.

Sample questions:
- What are the states with the most customers?
- Which customers have more than one order?
- How many customers do we have?
- How many returns do we have in the state of California?
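
Two of the sample questions above have no ground-truth SQL in this guide. A possible sketch follows. It assumes `customers_silver_pyspark` has a `state` column (not confirmed here), follows the general instruction that states use two-character codes (so California is `'CA'`), and counts returns the same way as the "Returns by state" query.

```sql
-- What are the states with the most customers?
-- Assumes a `state` column on customers_silver_pyspark (hypothetical).
SELECT
  state,
  COUNT(*) AS total_customers
FROM
  workspace.sim_retail_demo.customers_silver_pyspark
GROUP BY
  state
ORDER BY
  total_customers DESC;

-- How many returns do we have in the state of California?
-- States use two-character codes, so filter on 'CA' rather than 'California'.
SELECT
  COUNT(*) AS california_returns
FROM
  workspace.sim_retail_demo.gold_returns_by_customer_pyspark
WHERE
  state = 'CA';
```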

## Benchmarks
Benchmarks are used to automatically evaluate the Genie space and catch drift or other issues. Below are five benchmark questions with ground-truth SQL answers to add to your Genie space. We recommend adding more as your data and Genie usage evolve.

**Update catalog if not using `workspace`**

```sql
-- How many orders are currently on the way to the customer?
SELECT
  COUNT(*) AS orders_on_the_way
FROM
  workspace.sim_retail_demo.gold_current_orders_pyspark
WHERE
  current_order_status = 'on the way';
```

```sql
-- Get me the emails of all repeat customers
SELECT DISTINCT
  c.email
FROM
  workspace.sim_retail_demo.orders_silver_pyspark o
    JOIN workspace.sim_retail_demo.customers_silver_pyspark c
      ON o.customer_id = c.customer_id
GROUP BY
  c.email,
  o.customer_id
HAVING
  COUNT(o.order_id) > 1;
```

```sql
-- How many customers do we have?
SELECT
  COUNT(*) AS total_customers
FROM
  `workspace`.`sim_retail_demo`.`customers_silver_pyspark`;
```

```sql
-- Show me orders by state in a bar chart. Only show the top 5 states.
SELECT
  `state`,
  SUM(`total_orders`) AS total_orders
FROM
  workspace.sim_retail_demo.gold_orders_by_city_state_pyspark
WHERE
  `state` IS NOT NULL
GROUP BY
  `state`
ORDER BY
  total_orders DESC
LIMIT 5;
```

```sql
-- What year and month did we get the most orders?
SELECT
  YEAR(`order_timestamp`) AS order_year,
  MONTH(`order_timestamp`) AS order_month,
  COUNT(*) AS total_orders
FROM
  workspace.sim_retail_demo.orders_silver_pyspark
WHERE
  `order_timestamp` IS NOT NULL
GROUP BY
  YEAR(`order_timestamp`),
  MONTH(`order_timestamp`)
ORDER BY
  total_orders DESC
LIMIT 1;
```