# BigQuery: A Powerful Tool for Data Analysis

### Introduction to BigQuery for Data Analysis

BigQuery is Google's serverless, highly scalable, multi-cloud data warehouse, designed for fast, ad hoc data analysis. Most interaction with BigQuery happens through SQL, and in particular through **`SELECT`** statements, which the queries in this notebook build on step by step.

### Order Item Overview


This query retrieves a list of order items from the `order_items` table within the `demo_retail` dataset. Specifically, it:

- Selects the `order_item_id` and `order_item_order_id` columns.
- Aliases `order_item_id` as `id` and `order_item_order_id` as `oid` to give shorter column names in the results.
- Limits the output to 10 rows for a quick overview. Because there is no `ORDER BY`, which 10 rows come back is not guaranteed.

Because the table is referenced by its fully qualified name (`project.dataset.table`), the query reads from the `pp-bigquery-02` project's `demo_retail` dataset regardless of the session's default project.

In [None]:
%%bigquery 

SELECT
    order_item_id AS id,
    order_item_order_id AS oid
FROM
    `pp-bigquery-02.demo_retail.order_items`
LIMIT 10 ;

### Unique Order Status

This query retrieves distinct order statuses from the `orders` table within the `demo_retail` dataset, providing an overview of all unique order states present in the data. Specifically, it:

- Selects the `order_status` column.
- Uses the `DISTINCT` keyword to ensure each order status is listed only once, offering a clear view of all possible states an order can be in.

The table lives in the `pp-bigquery-02` project's `demo_retail` dataset.


In [None]:
%%bigquery

SELECT
  DISTINCT order_status
FROM
  `pp-bigquery-02.demo_retail.orders`;

### Completed and Closed Orders

This query returns orders from the `orders` table in the `demo_retail` dataset that have reached the end of their processing cycle, meaning their status is either completed or closed. Specifically, it:

- Selects the `order_id`, `order_customer_id`, and `order_status` columns.
- Filters with `IN` so that only rows whose `order_status` is 'COMPLETE' or 'CLOSED' are returned.

The table lives in the `pp-bigquery-02` project's `demo_retail` dataset.

In [None]:
%%bigquery 

SELECT
  order_id,
  order_customer_id,
  order_status
FROM
  `pp-bigquery-02.demo_retail.orders`
WHERE
  order_status IN ('COMPLETE', 'CLOSED');

### High-Value Products

This query is designed to identify products from the `products` table within the `demo_retail` dataset that have a price exceeding 200. It's particularly useful for analyzing higher-end items in the product catalog. 


In [None]:
%%bigquery 

SELECT
  product_id,
  product_name,
  product_price
FROM
  `pp-bigquery-02.demo_retail.products`
WHERE
  product_price > 200;

### Customers Without Orders

This query identifies customers from the `customers` table within the `demo_retail` dataset who have not placed any orders. It's useful for understanding customer engagement and identifying customers who might need targeted marketing or engagement strategies. The query:

- Selects `customer_id`, `customer_fname`, and `customer_lname` from the `customers` table to provide basic information about each customer.
- Performs a left outer join on the `orders` table to match each customer with their orders.
- Keeps only the rows where `o.order_customer_id` is null. A null on the `orders` side means the join found no matching order, so these are the customers who have not placed any orders.

Both tables come from the `pp-bigquery-02` project's `demo_retail` dataset.

In [None]:
%%bigquery 

SELECT
  c.customer_id,
  c.customer_fname,
  c.customer_lname
FROM
  `pp-bigquery-02.demo_retail.customers` c
LEFT OUTER JOIN
  `pp-bigquery-02.demo_retail.orders` o
ON
  o.order_customer_id = c.customer_id
WHERE
  o.order_customer_id IS NULL;
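
To see why the `IS NULL` filter isolates customers without orders, the same pattern can be tried locally. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for BigQuery; the three customers and three orders are made-up rows, and only the column names are taken from the query above.

```python
import sqlite3

# Hypothetical toy data; only the column names mirror the notebook's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_fname TEXT, customer_lname TEXT);
    CREATE TABLE orders (order_id INTEGER, order_customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# LEFT OUTER JOIN keeps every customer. Customers with no matching order get
# NULL in the order columns, so the IS NULL filter keeps exactly those rows.
customers_without_orders = conn.execute("""
    SELECT c.customer_id, c.customer_fname, c.customer_lname
    FROM customers AS c
    LEFT OUTER JOIN orders AS o
      ON o.order_customer_id = c.customer_id
    WHERE o.order_customer_id IS NULL
    ORDER BY c.customer_id
""").fetchall()

print(customers_without_orders)  # [(2, 'Bob', 'Ray')]
```

Customer 1 has two orders and customer 3 has one, so only customer 2 survives the filter. This join-then-filter shape is often called an anti-join.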


### Daily Revenue by Product

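This query calculates the daily revenue generated by each product, counting only orders that are 'COMPLETE' or 'CLOSED'. It joins the `orders` and `order_items` tables from the `demo_retail` dataset, sums `order_item_subtotal` per date and product, and sorts the result by date and then by revenue in descending order. The numbers in `GROUP BY` and `ORDER BY` refer to columns by their position in the `SELECT` list.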

In [None]:
%%bigquery 

SELECT
  o.order_date,
  oi.order_item_product_id,
  ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
FROM
  `pp-bigquery-02.demo_retail.orders` AS o
JOIN
  `pp-bigquery-02.demo_retail.order_items` AS oi
ON
  o.order_id = oi.order_item_order_id
WHERE
  o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY
  1, 2
ORDER BY
  1, 3 DESC;
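
The join, filter, and grouping steps can be checked by hand on a few rows. The sketch below runs the same query shape in Python's built-in `sqlite3` module rather than BigQuery; the orders and order items are made up for illustration, and the grouping columns are spelled out by name instead of by position.

```python
import sqlite3

# Hypothetical toy data; only the column names mirror the notebook's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT);
    CREATE TABLE order_items (
        order_item_order_id INTEGER,
        order_item_product_id INTEGER,
        order_item_subtotal REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 'COMPLETE'),
        (2, '2024-01-01', 'CLOSED'),
        (3, '2024-01-01', 'PENDING'),
        (4, '2024-01-02', 'COMPLETE');
    INSERT INTO order_items VALUES
        (1, 100, 50.0), (1, 200, 20.0),
        (2, 100, 25.0),
        (3, 200, 99.0),
        (4, 200, 10.0);
""")

# Same shape as the notebook query: join, keep finished orders, then sum
# the subtotals for each (date, product) pair.
daily_product_revenue = conn.execute("""
    SELECT o.order_date,
           oi.order_item_product_id,
           ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
    FROM orders AS o
    JOIN order_items AS oi
      ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date, oi.order_item_product_id
    ORDER BY o.order_date, revenue DESC
""").fetchall()

for row in daily_product_revenue:
    print(row)
# ('2024-01-01', 100, 75.0)
# ('2024-01-01', 200, 20.0)
# ('2024-01-02', 200, 10.0)
```

The 99.0 subtotal belongs to a 'PENDING' order, so the `WHERE` clause removes it before the sums are computed.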


### Daily and Cumulative Monthly Revenue

This query reports daily revenue alongside a running monthly total for orders marked 'COMPLETE' or 'CLOSED'. A common table expression (CTE) first aggregates revenue per day. A `SUM(...) OVER` window function, partitioned by month and ordered by date, then accumulates that revenue within each month, so the running total starts over on the first order date of every month.


In [None]:
%%bigquery 

WITH
  daily_revenue AS (
  SELECT
    o.order_date,
    ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
  FROM
    `pp-bigquery-02.demo_retail.orders` AS o
  JOIN
    `pp-bigquery-02.demo_retail.order_items` AS oi
  ON
    o.order_id = oi.order_item_order_id
  WHERE
    o.order_status IN ('COMPLETE', 'CLOSED')
  GROUP BY
    1 )
SELECT
  FORMAT_DATE('%Y%m', order_date) AS order_month,
  order_date,
  revenue,
  ROUND(SUM(revenue) OVER (PARTITION BY FORMAT_DATE('%Y%m', order_date)
    ORDER BY
      order_date ), 2) AS revenue_cumulative
FROM
  daily_revenue
ORDER BY
  2;
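
The running total is easiest to follow on a handful of rows that cross a month boundary. This is a minimal sketch in Python's built-in `sqlite3` module, not BigQuery: the `daily_revenue` rows are made up, and SQLite's `strftime` stands in for BigQuery's `FORMAT_DATE`, which SQLite does not have.

```python
import sqlite3

# Window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_revenue (order_date TEXT, revenue REAL);
    INSERT INTO daily_revenue VALUES
        ('2024-01-30', 100.0),
        ('2024-01-31', 50.0),
        ('2024-02-01', 30.0),
        ('2024-02-02', 20.0);
""")

# PARTITION BY month restarts the sum for each month; ORDER BY order_date
# makes it a running total up to and including the current row's date.
cumulative = conn.execute("""
    SELECT strftime('%Y%m', order_date) AS order_month,
           order_date,
           revenue,
           ROUND(SUM(revenue) OVER (
               PARTITION BY strftime('%Y%m', order_date)
               ORDER BY order_date
           ), 2) AS revenue_cumulative
    FROM daily_revenue
    ORDER BY order_date
""").fetchall()

for row in cumulative:
    print(row)
# ('202401', '2024-01-30', 100.0, 100.0)
# ('202401', '2024-01-31', 50.0, 150.0)
# ('202402', '2024-02-01', 30.0, 30.0)
# ('202402', '2024-02-02', 20.0, 50.0)
```

The cumulative column reaches 150.0 at the end of January and drops back to 30.0 on the first day of February, which is the effect of the `PARTITION BY` clause.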

### Top 3 Daily Product Revenue

This query finds the top 3 revenue-generating products for each day, counting only orders marked 'COMPLETE' or 'CLOSED'. A common table expression (CTE) computes revenue per day and product, and a `DENSE_RANK()` window function then ranks the products within each day by revenue. Because `DENSE_RANK()` gives tied products the same rank and leaves no gaps, a day with ties can return more than three rows.


In [None]:
%%bigquery 

WITH
  daily_product_revenue AS (
  SELECT
    o.order_date,
    oi.order_item_product_id,
    ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
  FROM
    `pp-bigquery-02.demo_retail.orders` AS o
  JOIN
    `pp-bigquery-02.demo_retail.order_items` AS oi
  ON
    o.order_id = oi.order_item_order_id
  WHERE
    o.order_status IN ('COMPLETE', 'CLOSED')
  GROUP BY
    1, 2 )
SELECT
  *
FROM (
  SELECT
    FORMAT_DATE('%Y%m', order_date) AS order_month,
    order_date,
    order_item_product_id,
    revenue,
    DENSE_RANK() OVER (PARTITION BY order_date ORDER BY revenue DESC ) AS denserank
  FROM
    daily_product_revenue )
WHERE
  denserank <= 3
ORDER BY
  2, 4 DESC;
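
The effect of ties on `DENSE_RANK()` can be checked on a single day of data. The sketch below uses Python's built-in `sqlite3` module in place of BigQuery, with a made-up `daily_product_revenue` table in which two products tie for first place.

```python
import sqlite3

# Window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_product_revenue (
        order_date TEXT, order_item_product_id INTEGER, revenue REAL);
    INSERT INTO daily_product_revenue VALUES
        ('2024-01-01', 1, 90.0),
        ('2024-01-01', 2, 90.0),
        ('2024-01-01', 3, 70.0),
        ('2024-01-01', 4, 50.0),
        ('2024-01-01', 5, 10.0);
""")

# Rank products within each day by revenue, then keep ranks 1 to 3.
# The product id is added to the final ORDER BY only to make the
# order of tied rows deterministic.
top_products = conn.execute("""
    SELECT *
    FROM (
        SELECT order_date,
               order_item_product_id,
               revenue,
               DENSE_RANK() OVER (
                   PARTITION BY order_date ORDER BY revenue DESC
               ) AS denserank
        FROM daily_product_revenue
    )
    WHERE denserank <= 3
    ORDER BY order_date, revenue DESC, order_item_product_id
""").fetchall()

for row in top_products:
    print(row)
# ('2024-01-01', 1, 90.0, 1)
# ('2024-01-01', 2, 90.0, 1)
# ('2024-01-01', 3, 70.0, 2)
# ('2024-01-01', 4, 50.0, 3)
```

Four rows come back for a "top 3" filter because products 1 and 2 share rank 1. If exactly three rows per day are required, `ROW_NUMBER()` is the usual alternative, at the cost of breaking ties arbitrarily unless a tiebreaker column is added to the window's `ORDER BY`.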


### Order Status Action Classification

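This query classifies orders from the `orders` table within the `demo_retail` dataset into categories based on their `order_status`, which helps in quickly identifying orders that might require attention. 'CLOSED' and 'COMPLETE' orders are labeled 'No Action Needed', orders that are on hold, pending, or still being processed are labeled 'Action Needed', and every other status falls through to 'Risky Orders'.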



In [None]:
%%bigquery 

SELECT
  *,
  CASE
    WHEN order_status IN ('CLOSED', 'COMPLETE') THEN 'No Action Needed'
    WHEN order_status IN ('ON_HOLD', 'PAYMENT_REVIEW', 'PENDING', 'PENDING_PAYMENT', 'PROCESSING') THEN 'Action Needed'
  ELSE
    'Risky Orders'
  END AS order_action_category
FROM
  `pp-bigquery-02.demo_retail.orders`
LIMIT
  100;
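
One detail of the `CASE` expression is worth checking on sample rows: the `ELSE` branch catches every status that is not listed, including a missing one. The sketch below uses Python's built-in `sqlite3` module instead of BigQuery. The rows are made up, and the 'SUSPECTED_FRAUD' status is a hypothetical value chosen only to exercise the `ELSE` branch.

```python
import sqlite3

# Hypothetical toy data; 'SUSPECTED_FRAUD' is an assumed example status.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_status TEXT);
    INSERT INTO orders VALUES
        (1, 'COMPLETE'),
        (2, 'PENDING'),
        (3, 'SUSPECTED_FRAUD'),
        (4, NULL);
""")

# WHEN branches are tested top to bottom. A NULL status makes both IN
# tests evaluate to NULL rather than true, so it also reaches ELSE.
classified = conn.execute("""
    SELECT order_id,
           order_status,
           CASE
             WHEN order_status IN ('CLOSED', 'COMPLETE') THEN 'No Action Needed'
             WHEN order_status IN ('ON_HOLD', 'PAYMENT_REVIEW', 'PENDING',
                                   'PENDING_PAYMENT', 'PROCESSING') THEN 'Action Needed'
             ELSE 'Risky Orders'
           END AS order_action_category
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in classified:
    print(row)
# (1, 'COMPLETE', 'No Action Needed')
# (2, 'PENDING', 'Action Needed')
# (3, 'SUSPECTED_FRAUD', 'Risky Orders')
# (4, None, 'Risky Orders')
```

If orders with a missing status should not be treated as risky, an explicit `WHEN order_status IS NULL` branch placed before the `ELSE` would separate them out.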

## Summary: Key Learnings from BigQuery SQL Queries

Throughout this notebook, we have worked through a range of SQL queries and techniques in BigQuery. Here are the key takeaways:

1. **Basic SQL Clauses**: We started with fundamental SQL clauses such as `WHERE`, `GROUP BY`, and `ORDER BY`. These clauses form the backbone of data querying, enabling us to filter datasets, summarize data, and organize results.

2. **Query Refinement**: We learned how to refine our queries for more precise results using aliases (`AS`) for readability, `DISTINCT` for removing duplicate values, `IN` for specifying multiple values in a `WHERE` clause, and `LIMIT` for restricting the number of results.

3. **Advanced SQL Operations**: We also used more complex SQL operations, including:
    - **Aggregates**: Using aggregate functions such as `SUM()` (and, in the same way, `COUNT()` or `AVG()`) with `GROUP BY` to compute values per group.
    - **Conditionals**: Implementing conditional logic with `CASE` expressions to classify rows.
    - **Joins**: Combining data from multiple tables with inner and left outer joins.
    - **CTEs**: Breaking complex queries into named, readable steps with `WITH`.
    - **Window functions**: Computing running totals and rankings with `SUM(...) OVER` and `DENSE_RANK() OVER`.