### 2. Write a PySpark program to create a DataFrame containing information about various products, including ProductID, ProductName, Category, Price, StockQuantity, and Rating, and perform the following operations:
### • Insert at least 10 rows for the given columns.
### • Sort the DataFrame first by Price in descending order and then by Category in ascending order.
### • Find the total sales amount for each product by category.
### • Find the total sales amount and the total quantity sold for each product.

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum  # note: Spark's sum() aggregate shadows Python's built-in sum()

In [2]:
spark = SparkSession.builder.appName("Program 2").getOrCreate()

In [29]:
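# QuantitySold is not in the task's column list; it is added so sales amounts (Price * QuantitySold) can be computed.
cols = ["ProductID", "ProductName", "Category", "Price", "StockQuantity", "Rating", "QuantitySold"]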

#### • Insert at least 10 rows for the given columns.

In [30]:
data = [
    (1, "Laptop", "Electronics", 1000, 50, 4.5, 20),
    (2, "Smartphone", "Electronics", 700, 200, 4.7, 30),
    (3, "Tablet", "Electronics", 300, 100, 4.2, 25),
    (4, "Refrigerator", "Appliances", 1000, 20, 4.8, 40),
    (5, "Microwave", "Appliances", 200, 150, 4.3, 15),
    (6, "T-Shirt", "Clothing", 30, 500, 4.1, 11),
    (7, "Jeans", "Clothing", 50, 300, 4.4, 31),
    (8, "Blender", "Appliances", 100, 80, 4.5, 22),
    (9, "Smartwatch", "Electronics", 250, 300, 4.6, 10),
    (10, "Sneakers", "Clothing", 80, 400, 4.2, 13)
]
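
As an optional sanity check (a plain-Python sketch, not part of the original program), the rows can be validated before they reach Spark: at least 10 rows, one value per column, and unique `ProductID`s. The helper name `validate_rows` is hypothetical, and the `cols` and `data` values are repeated so the snippet runs on its own without PySpark.

```python
# Column names and rows copied from the notebook cells above.
cols = ["ProductID", "ProductName", "Category", "Price",
        "StockQuantity", "Rating", "QuantitySold"]
data = [
    (1, "Laptop", "Electronics", 1000, 50, 4.5, 20),
    (2, "Smartphone", "Electronics", 700, 200, 4.7, 30),
    (3, "Tablet", "Electronics", 300, 100, 4.2, 25),
    (4, "Refrigerator", "Appliances", 1000, 20, 4.8, 40),
    (5, "Microwave", "Appliances", 200, 150, 4.3, 15),
    (6, "T-Shirt", "Clothing", 30, 500, 4.1, 11),
    (7, "Jeans", "Clothing", 50, 300, 4.4, 31),
    (8, "Blender", "Appliances", 100, 80, 4.5, 22),
    (9, "Smartwatch", "Electronics", 250, 300, 4.6, 10),
    (10, "Sneakers", "Clothing", 80, 400, 4.2, 13),
]


def validate_rows(rows, columns, min_rows=10):
    """Hypothetical helper: check row count, row width, and unique IDs."""
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    for row in rows:
        # Every tuple must supply exactly one value per column name.
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} has {len(row)} fields, expected {len(columns)}")
    # ProductID is the first field; duplicates would make it useless as a key.
    ids = [row[0] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("ProductID values are not unique")
    return True


print(validate_rows(data, cols))  # True
```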

In [31]:
df = spark.createDataFrame(data, cols)

In [32]:
print("Initial DataFrame")
df.show()

Initial DataFrame
+---------+------------+-----------+-----+-------------+------+------------+
|ProductID| ProductName|   Category|Price|StockQuantity|Rating|QuantitySold|
+---------+------------+-----------+-----+-------------+------+------------+
|        1|      Laptop|Electronics| 1000|           50|   4.5|          20|
|        2|  Smartphone|Electronics|  700|          200|   4.7|          30|
|        3|      Tablet|Electronics|  300|          100|   4.2|          25|
|        4|Refrigerator| Appliances| 1000|           20|   4.8|          40|
|        5|   Microwave| Appliances|  200|          150|   4.3|          15|
|        6|     T-Shirt|   Clothing|   30|          500|   4.1|          11|
|        7|       Jeans|   Clothing|   50|          300|   4.4|          31|
|        8|     Blender| Appliances|  100|           80|   4.5|          22|
|        9|  Smartwatch|Electronics|  250|          300|   4.6|          10|
|       10|    Sneakers|   Clothing|   80|          400|   4.2|          13|
+---------+------------+-----------+-----+-------------+------+------------+

#### • Sort the DataFrame first by Price in descending order and then by Category in ascending order.

In [33]:
sorted_df = df.orderBy(col("Price").desc(), col("Category"))

In [34]:
print("Sorted DataFrame")
sorted_df.show()

Sorted DataFrame
+---------+------------+-----------+-----+-------------+------+------------+
|ProductID| ProductName|   Category|Price|StockQuantity|Rating|QuantitySold|
+---------+------------+-----------+-----+-------------+------+------------+
|        4|Refrigerator| Appliances| 1000|           20|   4.8|          40|
|        1|      Laptop|Electronics| 1000|           50|   4.5|          20|
|        2|  Smartphone|Electronics|  700|          200|   4.7|          30|
|        3|      Tablet|Electronics|  300|          100|   4.2|          25|
|        9|  Smartwatch|Electronics|  250|          300|   4.6|          10|
|        5|   Microwave| Appliances|  200|          150|   4.3|          15|
|        8|     Blender| Appliances|  100|           80|   4.5|          22|
|       10|    Sneakers|   Clothing|   80|          400|   4.2|          13|
|        7|       Jeans|   Clothing|   50|          300|   4.4|          31|
|        6|     T-Shirt|   Clothing|   30|          500|   4.1|          11|
+---------+------------+-----------+-----+-------------+------+------------+
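
The ordering above can be reproduced in plain Python to show how the two sort keys interact. This sketch needs no PySpark; the `(ProductName, Category, Price)` tuple layout is an assumption chosen for brevity. It negates `Price` to get descending order and leaves `Category` ascending, which is what `df.orderBy(col("Price").desc(), col("Category"))` asks for.

```python
# (ProductName, Category, Price) taken from the notebook's data.
products = [
    ("Laptop", "Electronics", 1000),
    ("Smartphone", "Electronics", 700),
    ("Tablet", "Electronics", 300),
    ("Refrigerator", "Appliances", 1000),
    ("Microwave", "Appliances", 200),
    ("T-Shirt", "Clothing", 30),
    ("Jeans", "Clothing", 50),
    ("Blender", "Appliances", 100),
    ("Smartwatch", "Electronics", 250),
    ("Sneakers", "Clothing", 80),
]

# Negating the numeric Price sorts it descending; Category stays ascending
# and only matters when two rows share the same Price.
sorted_products = sorted(products, key=lambda p: (-p[2], p[1]))

for name, category, price in sorted_products:
    print(f"{price:>5} {category:<12} {name}")
```

The two rows priced at 1000 show the tie-breaker at work: `Appliances` sorts before `Electronics`, so Refrigerator precedes Laptop. Negation only works for numeric keys; for a descending string key in plain Python, two stable sorting passes are the usual alternative. In Spark, rows that tie on every sort key have no guaranteed order, so adding `col("ProductID")` as a final key would make the output fully deterministic.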

#### • Find the total sales amount for each product by category.
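
The task does not define "sales amount"; the code for this step treats it as unit price times units sold, which is why `QuantitySold` was added to the columns. For category $c$ and product $p$:

```latex
\text{Total Sales}(c, p) = \sum_{r \in \text{rows}(c,\,p)} \text{Price}_r \times \text{QuantitySold}_r
```

Because every product appears in exactly one row here, each group's sum has a single term, e.g. Laptop: $1000 \times 20 = 20000$.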

In [36]:
# Sales amount per row = Price * QuantitySold, summed within each (Category, ProductName) group.
total_sales_by_category = df.groupBy("Category", "ProductName").agg(sum(col("Price") * col("QuantitySold")).alias("Total Sales"))

In [37]:
print("Total Sales Amount for each Product by Category.")
total_sales_by_category.show()

Total Sales Amount for each Product by Category.
+-----------+------------+-----------+
|   Category| ProductName|Total Sales|
+-----------+------------+-----------+
|Electronics|      Laptop|      20000|
|Electronics|  Smartphone|      21000|
|Electronics|      Tablet|       7500|
| Appliances|   Microwave|       3000|
| Appliances|Refrigerator|      40000|
|   Clothing|     T-Shirt|        330|
|   Clothing|       Jeans|       1550|
| Appliances|     Blender|       2200|
|Electronics|  Smartwatch|       2500|
|   Clothing|    Sneakers|       1040|
+-----------+------------+-----------+
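
Two points about this result. First, Spark does not guarantee the row order of a `groupBy` result, which is why the rows come back in a different order from the input; appending `.orderBy("Category", "ProductName")` to the query would fix the order. Second, "for each product by category" could also be read as one total per category. The plain-Python sketch below (data copied from the notebook; no PySpark required) computes both the per-(Category, ProductName) totals that the Spark code produces and a per-category roll-up, which in PySpark would correspond to grouping on `"Category"` alone.

```python
from collections import defaultdict

# (Category, ProductName, Price, QuantitySold) copied from the notebook's data.
sales_rows = [
    ("Electronics", "Laptop", 1000, 20),
    ("Electronics", "Smartphone", 700, 30),
    ("Electronics", "Tablet", 300, 25),
    ("Appliances", "Refrigerator", 1000, 40),
    ("Appliances", "Microwave", 200, 15),
    ("Clothing", "T-Shirt", 30, 11),
    ("Clothing", "Jeans", 50, 31),
    ("Appliances", "Blender", 100, 22),
    ("Electronics", "Smartwatch", 250, 10),
    ("Clothing", "Sneakers", 80, 13),
]

# Same grouping keys as df.groupBy("Category", "ProductName").
per_product = defaultdict(int)
# Coarser roll-up: one total per Category only.
per_category = defaultdict(int)

for category, name, price, qty in sales_rows:
    amount = price * qty  # sales amount for this row
    per_product[(category, name)] += amount
    per_category[category] += amount

# Sorting the keys makes the printout deterministic, unlike a raw groupBy.
for (category, name), total in sorted(per_product.items()):
    print(f"{category:<12} {name:<13} {total:>6}")
for category, total in sorted(per_category.items()):
    print(f"{category:<12} {'(all)':<13} {total:>6}")
```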



#### • Find the total sales amount and the total quantity sold for each product.

In [38]:
# Per product: summed sales amount (Price * QuantitySold) and summed units sold.
total_sales_and_quantity = df.groupBy("ProductName").agg(sum(col("Price") * col("QuantitySold")).alias("Total Sales"), sum(col("QuantitySold")).alias("Total Quantity Sold"))

In [39]:
print("Total Sales Amount and the Total Quantity Sold for each Product")
total_sales_and_quantity.show()

Total Sales Amount and the Total Quantity Sold for each Product
+------------+-----------+-------------------+
| ProductName|Total Sales|Total Quantity Sold|
+------------+-----------+-------------------+
|      Laptop|      20000|                 20|
|  Smartphone|      21000|                 30|
|      Tablet|       7500|                 25|
|Refrigerator|      40000|                 40|
|   Microwave|       3000|                 15|
|     T-Shirt|        330|                 11|
|       Jeans|       1550|                 31|
|     Blender|       2200|                 22|
|    Sneakers|       1040|                 13|
|  Smartwatch|       2500|                 10|
+------------+-----------+-------------------+
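
Grouping on `ProductName` alone works here because every name is unique, but two distinct products that share a name would be merged into one row. The plain-Python sketch below (data copied from the notebook; the key choice is an assumption, not the original program's) keys on `(ProductID, ProductName)` instead, which in PySpark would be `df.groupBy("ProductID", "ProductName")`, and also prints grand totals as a cross-check.

```python
from collections import defaultdict

# (ProductID, ProductName, Price, QuantitySold) copied from the notebook's data.
rows = [
    (1, "Laptop", 1000, 20),
    (2, "Smartphone", 700, 30),
    (3, "Tablet", 300, 25),
    (4, "Refrigerator", 1000, 40),
    (5, "Microwave", 200, 15),
    (6, "T-Shirt", 30, 11),
    (7, "Jeans", 50, 31),
    (8, "Blender", 100, 22),
    (9, "Smartwatch", 250, 10),
    (10, "Sneakers", 80, 13),
]

# Keyed on (ProductID, ProductName) rather than ProductName alone, so two
# different products that happen to share a name stay separate.
totals = defaultdict(lambda: {"sales": 0, "quantity": 0})
for product_id, name, price, qty in rows:
    entry = totals[(product_id, name)]
    entry["sales"] += price * qty
    entry["quantity"] += qty

for (product_id, name), entry in sorted(totals.items()):
    print(f"{product_id:>2} {name:<13} {entry['sales']:>6} {entry['quantity']:>3}")

# Built-in sum() is safe here: this snippet does not import Spark's sum.
grand_sales = sum(e["sales"] for e in totals.values())
grand_quantity = sum(e["quantity"] for e in totals.values())
print("Grand total:", grand_sales, grand_quantity)  # Grand total: 99120 217
```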



In [40]:
spark.stop()