## 1069. Product Sales Analysis II
### Table: Sales

| Column Name | Type |
|-------------|------|
| sale_id     | int  |
| product_id  | int  |
| year        | int  |
| quantity    | int  |
| price       | int  |

**Primary Key**: `sale_id`  
**Foreign Key**: `product_id` references `Product.product_id`  
**Note**: `price` is per unit

---

### Table: Product

| Column Name  | Type    |
|--------------|---------|
| product_id   | int     |
| product_name | varchar |

**Primary Key**: `product_id`

---

### 🧠 Business Logic:
Write an SQL query that reports the **total quantity sold** for every `product_id`.

---

### ✅ Example Input and Expected Output:

#### Sales Table:
| sale_id | product_id | year | quantity | price |
|---------|------------|------|----------|-------|
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |

#### Product Table:
| product_id | product_name |
|------------|--------------|
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |

#### Output:
| product_id | total_quantity |
|------------|----------------|
| 100        | 22             |
| 200        | 15             |

**Explanation**:  
- Product 100 (Nokia): 10 + 12 = 22  
- Product 200 (Apple): 15  
- Product 300 (Samsung): no sales → excluded
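
The arithmetic in the explanation can be checked with a small plain-Python sketch. It is only an illustration of the grouping logic, not part of the Spark solution: the helper name `total_quantity_by_product` is hypothetical, and the rows are copied from the example Sales table.

```python
from collections import defaultdict

# Rows copied from the example Sales table:
# (sale_id, product_id, year, quantity, price)
sales_rows = [
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000),
]


def total_quantity_by_product(rows):
    """Sum quantity per product_id, mirroring GROUP BY product_id with SUM(quantity)."""
    totals = defaultdict(int)
    for _sale_id, product_id, _year, quantity, _price in rows:
        totals[product_id] += quantity
    return dict(totals)


totals = total_quantity_by_product(sales_rows)
print(totals)  # {100: 22, 200: 15}
```

Product 300 never appears in `sales_rows`, so it has no entry in the result, which matches the "no sales → excluded" rule.
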

In [0]:
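from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Reuse the notebook's existing session (as on Databricks) or create one if none exists
spark = SparkSession.builder.getOrCreate()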

# Sample data for Sales
sales_data = [
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000)
]

sales_schema = StructType([
    StructField("sale_id", IntegerType(), False),
    StructField("product_id", IntegerType(), True),
    StructField("year", IntegerType(), True),
    StructField("quantity", IntegerType(), True),
    StructField("price", IntegerType(), True)
])

# Sample data for Product
product_data = [
    (100, "Nokia"),
    (200, "Apple"),
    (300, "Samsung")
]

product_schema = StructType([
    StructField("product_id", IntegerType(), False),
    StructField("product_name", StringType(), True)
])

# Create DataFrames
sales_df = spark.createDataFrame(sales_data, sales_schema)
product_df = spark.createDataFrame(product_data, product_schema)

# Register Temp Views
sales_df.createOrReplaceTempView("Sales")
product_df.createOrReplaceTempView("Product")

# SQL logic to compute total quantity per product_id
result = spark.sql("""
    SELECT s.product_id, SUM(s.quantity) AS total_quantity
    FROM Sales s
    GROUP BY s.product_id
""")

# Display result
display(result)

In [0]:
%sql

SELECT p.product_id, SUM(s.quantity) AS total_quantity
FROM Product p
INNER JOIN Sales s ON p.product_id = s.product_id
GROUP BY p.product_id
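
Because every `Sales.product_id` references an existing `Product.product_id`, the inner join with `Product` should not change the totals compared with grouping `Sales` alone. The sketch below checks this on the example data. It uses Python's built-in `sqlite3` module as a stand-in for Spark SQL (an assumption made only so the check runs without a cluster), and the variable names `group_only` and `with_join` are illustrative.

```python
import sqlite3

# In-memory SQLite database loaded with the example rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE Sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES Product(product_id),
        year       INTEGER,
        quantity   INTEGER,
        price      INTEGER
    );
    INSERT INTO Product VALUES (100, 'Nokia'), (200, 'Apple'), (300, 'Samsung');
    INSERT INTO Sales VALUES
        (1, 100, 2008, 10, 5000),
        (2, 100, 2009, 12, 5000),
        (7, 200, 2011, 15, 9000);
""")

# Variant 1: aggregate the Sales table on its own
group_only = conn.execute("""
    SELECT product_id, SUM(quantity) AS total_quantity
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

# Variant 2: inner join with Product before aggregating
with_join = conn.execute("""
    SELECT p.product_id, SUM(s.quantity) AS total_quantity
    FROM Product p
    INNER JOIN Sales s ON p.product_id = s.product_id
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()

conn.close()

print(group_only)               # [(100, 22), (200, 15)]
print(with_join == group_only)  # True
```

Both variants leave out product 300, since it has no matching row in `Sales`. The join only becomes necessary if the output also needs columns from `Product`, such as `product_name`.
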

In [0]:
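# Import explicitly: a wildcard import would shadow Python's built-in sum
from pyspark.sql.functions import col, sum as spark_sum

# Rename columns so the join keys are unambiguous
product_df_new = product_df.selectExpr("product_id AS p_id", "product_name AS p_name")
sales_df_new = sales_df.selectExpr("sale_id AS s_id", "product_id AS sp_id", "year", "quantity", "price")

# Inner join on the product id, then total the quantity per product.
# .display() is a Databricks notebook method; use .show() elsewhere.
product_df_new.join(sales_df_new, col("p_id") == col("sp_id"), "inner") \
    .groupBy("p_id") \
    .agg(spark_sum("quantity").alias("total_quantity")) \
    .withColumnRenamed("p_id", "product_id") \
    .display()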